[2D1] Elasticsearch Performance Optimization
Howook Jeong (정호욱), Principal Engineer / BigDataPlatform Team
Gruter
Understanding ElasticSearch and
Optimizing Its Performance
About me…
• Howook Jeong (정호욱)
• BigdataPlatform, Gruter Corp
• hwjeong@gruter.com
• http://jjeong.tistory.com
• E-book: 실무 예제로 배우는 Elasticsearch 검색엔진 - 입문편 (Learning the Elasticsearch Search Engine through Practical Examples: Introductory Edition)
CONTENTS
1. Understanding ElasticSearch
2. Understanding ElasticSearch Performance Optimization
3. Using ElasticSearch for Big Data
1. Understanding ElasticSearch
1.1. ElasticSearch and How It Works
1.2. Installing and Running
1.3. Modeling
What is ElasticSearch?
An open-source search engine built on Lucene
1.1. ElasticSearch and How It Works
ElasticSearch features
Easy
Real time search & analytics
Distributed & highly available search engine
ElasticSearch layout
1.1. ElasticSearch and How It Works
[Physical layout] Cluster > Node > Indice > Shard
[Logical layout] Index > Type > Document > field:value
ElasticSearch nodes
1.1. ElasticSearch and How It Works
Master node: node.master: true
Data node: node.data: true
Search load balancer (LB) node: node.master: false, node.data: false
Client node: node.client: true
ElasticSearch node configuration examples
1.1. ElasticSearch and How It Works
Case 1) All-round player (three nodes shown)
  Every node: node.master: true, node.data: true
Case 2) Master + Data (two nodes of each role shown)
  Master nodes: node.master: true, node.data: false
  Data nodes: node.master: false, node.data: true
Case 3) Master + Data + Search LB (two nodes of each role shown)
  Master nodes: node.master: true, node.data: false
  Data nodes: node.master: false, node.data: true
  Search LB nodes: node.master: false, node.data: false
ElasticSearch vs. RDBMS
1.1. ElasticSearch and How It Works
Relational Database      ElasticSearch
Database                 Index
Table                    Type
Row                      Document
Column                   Field
Index                    Analyze
Primary key              _id
Schema                   Mapping
Physical partition       Shard
Logical partition        Route
Relational               Parent/Child, Nested
SQL                      Query DSL
ElasticSearch shard replication
1.1. ElasticSearch and How It Works
PUT /my_index/_settings { "number_of_replicas": 1 }
PUT /my_index/_settings { "number_of_replicas": 2 }
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/replica-shards
Creating, indexing and deleting a document
1.1. ElasticSearch and How It Works
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/distrib-write.html
Retrieve, query and fetch a document
1.1. ElasticSearch and How It Works
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/distrib-read.html
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_query_phase.html
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_fetch_phase.html
Installing
Download
Extract the archive
1.2. Installing and Running
Running
Start the server
Test
Create index
Add document
Get document
Search document
Indice/type design
Time-based/User-based data
Relational data
1TB
1.3. Modeling
Field design
Fields to search
Fields to analyze
Fields to sort on
Fields to store
Primary key field
Modeling example
1.3. Modeling
[Diagram: two groups of indices (Indice1, Indice2, Indice3 and IndiceA, IndiceB, IndiceC), each holding types; parent types relate to child types 1 : N]
Shard design
number_of_shards >= number_of_data_nodes
number_of_replica <= number_of_data_nodes - 1
1.3. Modeling
Shard sizing
Maximum number of shards per index: 200 or fewer
Maximum size per shard: 20 ~ 50GB
Minimum size per shard: ~ 3GB
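The shard design rules above can be turned into a small calculation. The following is a minimal sketch, assuming the 50GB and 200-shard limits from the slide are applied as hard caps; the class and method names are hypothetical and the numbers are rules of thumb, not guarantees.

```java
final class ShardPlan {
    // Rules of thumb from the slide; the constant and method names are hypothetical.
    static final int MAX_SHARDS_PER_INDEX = 200;
    static final long MAX_SHARD_SIZE_GB = 50;

    /** Smallest primary shard count that respects the size cap and is >= the data node count. */
    static int primaryShards(long indexSizeGb, int dataNodes) {
        // Ceiling division: shards needed so that no shard exceeds the size cap.
        int bySize = (int) ((indexSizeGb + MAX_SHARD_SIZE_GB - 1) / MAX_SHARD_SIZE_GB);
        int shards = Math.max(bySize, dataNodes);
        if (shards > MAX_SHARDS_PER_INDEX) {
            throw new IllegalArgumentException("index too large; split it into more indices");
        }
        return shards;
    }

    /** Upper bound on replicas: number_of_data_nodes - 1. */
    static int maxReplicas(int dataNodes) {
        return Math.max(dataNodes - 1, 0);
    }

    public static void main(String[] args) {
        System.out.println(primaryShards(1000, 5)); // 20
        System.out.println(maxReplicas(5));         // 4
    }
}
```

When the sketch rejects an index as too large, the slide's advice points toward splitting the data across more indices (for example time-based ones) rather than adding shards to a single index.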
Hash partition test
1.3. Modeling
public class EsHashPartitionTest {
    @Test
    public void testHashPartition() {
        // ... (omitted) ...
        // Route one million sequential ids and count the documents per shard.
        for (int i = 0; i < 1000000; i++) {
            int shardId = MathUtils.mod(hash(String.valueOf(i)), shardSize);
            shards.add(shardId, (long) ++partSize[shardId]);
        }
        // ... (omitted) ...
    }

    // Hash a routing value with the configured hash function.
    public int hash(String routing) {
        return hashFunction.hash(routing);
    }
}
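The test above elides its setup, so here is a self-contained sketch of the same idea: route sequential ids to shards and count how many land on each. It assumes `String.hashCode()` as a stand-in for the cluster's real routing hash, which differs between ElasticSearch versions; the class and method names are hypothetical.

```java
import java.util.Arrays;

final class HashPartitionSketch {
    // Assumption: String.hashCode() stands in for the real routing hash function.
    static int shardFor(String routing, int numberOfPrimaryShards) {
        // floorMod keeps the result non-negative even for negative hash values.
        return Math.floorMod(routing.hashCode(), numberOfPrimaryShards);
    }

    /** Counts how many of the ids 0..docs-1 land on each shard. */
    static long[] distribute(int docs, int numberOfPrimaryShards) {
        long[] counts = new long[numberOfPrimaryShards];
        for (int i = 0; i < docs; i++) {
            counts[shardFor(String.valueOf(i), numberOfPrimaryShards)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Print the per-shard document counts for one million ids on five shards.
        System.out.println(Arrays.toString(distribute(1_000_000, 5)));
    }
}
```

A heavily skewed result would suggest that the chosen ids or routing values do not spread well over the shard count being tested.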
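2. Understanding ElasticSearch Performance Optimization
2.1. Factors That Affect Performance
2.2. Configuration Optimization
2.3. Indexing Optimization
2.4. Query Optimization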
Hardware perspective
Network bandwidth?
Disk I/O?
RAM?
CPU cores?
2.1. Factors That Affect Performance
Document perspective
Document size?
Total index data size?
Data size increase?
Retention period?
Service perspective
Analyzer?
Analyzed fields?
Indexed field size?
Boosting?
Real time or batch?
Queries?
From the ElasticSearch site:
If 1 shard is too few and 1,000 shards are too many, how do I know how many shards I need?
This is a question that is impossible to answer in the general case. There are just too many variables: the hardware that you use, the size and complexity of your documents, how you index and analyze those documents, the types of queries that you run, the aggregations that you perform, how you model your data, etc., etc.
2.1. Factors That Affect Performance
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/capacity-planning.html
From the ElasticSearch site:
Fortunately, it is an easy question to answer in the specific case: yours.
1. Create a cluster consisting of a single server, with the hardware that you are considering using in production.
2. Create an index with the same settings and analyzers that you plan to use in production, but with only one primary shard and no replicas.
3. Fill it with real documents (or as close to real as you can get).
4. Run real queries and aggregations (or as close to real as you can get).
2.1. Factors That Affect Performance
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/capacity-planning.html
Operating system perspective
Increase the file descriptor limit
Avoid swap
2.2. Configuration Optimization
Search engine perspective
Avoid swap
Thread pool
Segment merge
Index buffer size
Storage device
Use a recent version
Cluster restart perspective
Optimize (max segments: 5)
Close index
Restart after setting "disable_allocation: true"
Increase recovery limits
2.2. Configuration Optimization
Modeling
Disable the "_all" field
Disable the "_source" field, as far as possible
Set an appropriate value for the "_id" field
Set "store" to false on fields, as far as possible
2.3. Indexing Optimization
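Sizing
Use indices as the unit for managing data size.
The number of primary shards per indice should be greater than or equal to the number of data nodes. (number_of_shards >= number_of_data_nodes)
Keep the number of shards per indice below 200.
Keep each shard smaller than 50GB.
2.3. Indexing Optimization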
Client
Use the Bulk API.
Check hardware performance.
Check for exceptions.
Check the thread pools.
Test with a 1-1-1-0 setup (Node, Indice, Shard, Replica).
Use Flush and Refresh instead of Optimize.
2.3. Indexing Optimization
Bulk indexing
Size per request: 5 ~ 15MB
Documents per request: 1,000 ~ 5,000
Server bulk thread pool size: less than or equal to core size × 5
Client bulk connection pool size: 3 ~ 10 × number_of_data_nodes
Client ping timeout: 30 ~ 90 seconds
Client node sampler interval: 30 ~ 90 seconds
Client transport sniff: true
Client network TCP blocking: false
2.3. Indexing Optimization
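The request-size guidance above can be enforced on the client before any request is sent. The following is a minimal sketch, assuming documents arrive as JSON strings and that a batch is closed as soon as it would exceed either cap; the class name, method name and chosen caps are hypothetical, and no ElasticSearch client is involved.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class BulkBatcher {
    // Caps picked from within the ranges on the slide; both values are illustrative.
    static final int MAX_DOCS = 5_000;
    static final long MAX_BYTES = 15L * 1024 * 1024;

    /** Splits documents into batches bounded by document count and UTF-8 byte size. */
    static List<List<String>> batches(List<String> jsonDocs, int maxDocs, long maxBytes) {
        List<List<String>> out = new ArrayList<>();
        List<String> current = new ArrayList<>();
        long currentBytes = 0;
        for (String doc : jsonDocs) {
            long size = doc.getBytes(StandardCharsets.UTF_8).length;
            // Close the current batch before it would exceed either cap.
            if (!current.isEmpty()
                    && (current.size() >= maxDocs || currentBytes + size > maxBytes)) {
                out.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(doc);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            out.add(current);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 12_000; i++) {
            docs.add("{\"id\":" + i + "}");
        }
        System.out.println(batches(docs, MAX_DOCS, MAX_BYTES).size()); // 3
    }
}
```

Each resulting batch would then be sent as one bulk request; a single document larger than the byte cap still gets a batch of its own rather than being dropped.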
Bulk indexing
Disable refresh_interval
Disable replicas
Use flush & refresh (instead of optimize)
2.3. Indexing Optimization
Bulk indexing flow
Update settings -> Bulk request -> Flush & refresh -> Update settings
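The bulk indexing flow above is essentially an ordering constraint, which the following sketch makes explicit. It assumes a hypothetical `IndexAdmin` facade rather than any real client API, uses "-1" to disable refresh during the load, and restores "1s" and a caller-supplied replica count afterwards; those restore values are assumptions and should match whatever the index used before.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class BulkLoadFlow {
    /** Hypothetical facade over whatever client the application uses. */
    interface IndexAdmin {
        void updateSettings(String index, String refreshInterval, int replicas);
        void bulk(String index, List<String> docs);
        void flushAndRefresh(String index);
    }

    static void load(IndexAdmin admin, String index, List<List<String>> batches, int replicasAfter) {
        admin.updateSettings(index, "-1", 0); // disable refresh and replicas for the load
        try {
            for (List<String> batch : batches) {
                admin.bulk(index, batch);
            }
            admin.flushAndRefresh(index);
        } finally {
            // Restore the settings even if a bulk request failed.
            admin.updateSettings(index, "1s", replicasAfter);
        }
    }

    /** Recording stub so the sketch runs without a cluster. */
    static final class Recorder implements IndexAdmin {
        final List<String> calls = new ArrayList<>();
        public void updateSettings(String index, String refresh, int replicas) {
            calls.add("settings:" + refresh + ":" + replicas);
        }
        public void bulk(String index, List<String> docs) {
            calls.add("bulk:" + docs.size());
        }
        public void flushAndRefresh(String index) {
            calls.add("flush+refresh");
        }
    }

    public static void main(String[] args) {
        Recorder rec = new Recorder();
        List<List<String>> batches = new ArrayList<>();
        batches.add(Arrays.asList("{}", "{}"));
        load(rec, "my_index", batches, 1);
        System.out.println(rec.calls); // [settings:-1:0, bulk:2, flush+refresh, settings:1s:1]
    }
}
```

The `finally` block is the design choice worth noting: an index left with refresh disabled and zero replicas after a failed load is harder to notice than the failed load itself.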
Shards
Increase the number of shards to distribute the data.
Increase the number of replica shards.
2.4. Query Optimization
Data distribution
Use routing
Check _id
ShardId = hash(_id) % number_of_primary_shards
Query
Make sure queries do not always hit the same node.
Reduce zero-hit queries.
Cache query results.
Avoid deep pagination.
Sorting: number_of_shard × (from + size)
When using scripts, use doc['field'] instead of _source or _fields.
2.4. Query Optimization
Search type
Query and fetch
Query then fetch
Count
Scan
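The sorting cost given above, number_of_shard × (from + size), is easy to underestimate, so a short worked calculation may help. This is only an arithmetic sketch of that formula; the class and method names are hypothetical.

```java
final class DeepPaginationCost {
    /** Entries that have to be gathered and sorted: number_of_shards × (from + size). */
    static long entriesToSort(int numberOfShards, long from, long size) {
        return numberOfShards * (from + size);
    }

    public static void main(String[] args) {
        System.out.println(entriesToSort(5, 0, 10));      // 50
        System.out.println(entriesToSort(5, 10_000, 10)); // 50050
    }
}
```

With five shards, the first page costs 50 entries while a page starting at offset 10,000 costs 50,050, although both return the same ten documents.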
Queries vs. Filters
Use a filtered query with a filter instead of a plain query.
Use the bool filter instead of and/or/not filters.
2.4. Query Optimization
Queries          Filters
Relevance        Binary yes/no
Full text        Exact values
Not cached       Cached
Slower           Faster
"query" : {
  "match_all" : {}
}
"query" : {
  "filtered" : {
    "query" : {
      "match_all" : {}
    }
  }
}
3. Using ElasticSearch for Big Data
3.1. Hadoop Integration
3.2. SQL on ElasticSearch
Using ElasticSearch with Hadoop
A tool for big data analysis
A repository for Snapshot & Restore
Tooling provided by the ElasticSearch Hadoop plugin
3.1. Hadoop Integration
Indexing
3.1. Hadoop Integration
ElasticSearch Hadoop plugin
Read raw data
Integrate natively
Bulk indexing
Java client application
BulkRequestBuilder
REST API
Control concurrent requests
Indexing: ElasticSearch Hadoop plugin (MapReduce)
3.1. Hadoop Integration
Configuration conf = new Configuration();
// ... (omitted) ...
// ES_NODES and ES_RESOURCE are constants of the plugin's ConfigurationOptions.
conf.set(ConfigurationOptions.ES_NODES, "localhost:9200");
conf.set(ConfigurationOptions.ES_RESOURCE, "blog/post");
// ... (omitted) ...
Job job = new Job(conf);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(EsOutputFormat.class);
job.setMapOutputValueClass(LinkedMapWritable.class);
job.setMapperClass(TabMapper.class);
job.setNumReduceTasks(0); // map-only job
// Cap the split size at a third of the input file.
File fl = new File("blog/post.txt");
long splitSize = fl.length() / 3;
TextInputFormat.setMaxInputSplitSize(job, splitSize);
TextInputFormat.setMinInputSplitSize(job, 50);
boolean result = job.waitForCompletion(true);
Indexing: Java client application (MapReduce)
3.1. Hadoop Integration
public static void main(String[] args) throws Exception {
    // ... (omitted) ...
    settings = Connector.buildSettings(esCluster);
    client = Connector.buildClient(settings, esNodes.split(","));
    runBeforeConfig(esIndice);
    Job job = new Job(conf);
    // ... (omitted) ...
    // Put the required jars on the task classpath through the distributed cache.
    for (String distJar : esDistributedCacheJars) {
        DistributedCache.addFileToClassPath(
            new Path(esDistributedCachePath + "/" + distJar),
            job.getConfiguration());
    }
    // ... (omitted) ...
    // Finish with either an optimize or a refresh and flush, depending on configuration.
    if ("true".equalsIgnoreCase(esOptimize)) {
        runOptimize(esIndice);
    } else {
        runRefreshAndFlush(esIndice);
    }
    runAfterConfig(esIndice, replica);
}
Indexing: Java client application (MapReduce)
3.1. Hadoop Integration
public void map(Object key, Object value, Context context)
        throws Exception {
    // ... (omitted) ...
    // Build one index request per input record.
    IndexRequest indexRequest = new IndexRequest();
    indexRequest = indexRequest.index(esIndice)
                               .type(esType)
                               .source(doc);
    // ... (omitted) ...
    bulkRequest.add(indexRequest);
    // ... (omitted) ...
    // Send the accumulated requests as a single bulk request.
    bulkResponse = bulkRequest.setConsistencyLevel(QUORUM)
                              .setReplicationType(ASYNC)
                              .setRefresh(false)
                              .execute()
                              .actionGet();
    // ... (omitted) ...
}
Searching
3.1. Hadoop Integration
ElasticSearch Hadoop plugin
Integrate natively
Query request
Java client application
Query request
Searching: ElasticSearch Hadoop plugin (MapReduce)
3.1. Hadoop Integration
public static class SearchMapper extends Mapper {
    @Override
    public void map(Object key, Object value, Context context)
            throws IOException, InterruptedException {
        // The key is the document id and the value is the document itself.
        Text docId = (Text) key;
        LinkedMapWritable doc = (LinkedMapWritable) value;
        System.out.println(docId);
    }
}
public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // ... (omitted) ...
    Job job = new Job(conf);
    // ... (omitted) ...
    // Job copies conf when it is created, so set the query on the job's own configuration.
    job.getConfiguration().set(ConfigurationOptions.ES_QUERY,
        "{ \"query\" : { \"match_all\" : {} } }");
    job.setNumReduceTasks(0); // map-only job
    boolean result = job.waitForCompletion(true);
}
Searching: Java client application
3.1. Hadoop Integration
SearchResponse searchResponse;
MatchAllQueryBuilder matchAllQueryBuilder = new MatchAllQueryBuilder();
searchResponse = client.prepareSearch(esIndice)
                       .setQuery(matchAllQueryBuilder)
                       .execute()
                       .actionGet();
System.out.println(searchResponse.toString());
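What is ElasticSearch SQL?
It provides easy access and a data analysis tool.
It translates standard SQL syntax into Query DSL.
It lets you perform CRUD operations against the search engine using standard SQL syntax.
It provides a JDBC driver and a CLI.
It uses the SQL analyzer built for Apache Tajo.
3.2. SQL on ElasticSearch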
ElasticSearch JDBC driver
3.2. SQL on ElasticSearch
[Diagram: Client Application -> (SQL) -> JDBC Driver -> (DSL) -> ElasticSearch. Stages inside the JDBC Driver: SQL Analyzer, Algebra Expression, Query DSL Planner, Query Execution]
ElasticSearch SQL syntax
Create database/table
Drop database/table
Select/Insert/Upsert/Delete
Use database
Show databases/tables
Desc table
3.2. SQL on ElasticSearch
ElasticSearch Analytics (Aggregations) SQL
Min/max/sum/avg/stats/extended_stats
Value_count/percentiles/cardinality
Global_*
Terms/range/date_range
3.2. SQL on ElasticSearch
ElasticSearch SQL vs. Query DSL
3.2. SQL on ElasticSearch
SQL:
SELECT *
FROM type_name
LIMIT 0/10
Query DSL:
"match_all": {}
…
"from" : 0,
"size" : 10
SQL:
SELECT field1, field2
FROM type_name
WHERE search_field = 'elasticsearch'
Query DSL:
"term": {
  "search_field": {
    "value": "elasticsearch"
  }
}
…
"fields": [
  "field1", "field2"
]
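The two comparisons above suggest a direct mapping: an equality predicate becomes a term query and LIMIT from/size becomes the "from" and "size" keys. The following sketch builds such a request body by plain string concatenation. It is an illustration of the mapping only, not the SQL tool's actual translator; the class and method names are hypothetical and the inputs are not JSON-escaped.

```java
final class SqlToDslSketch {
    /**
     * Query DSL body for: SELECT * FROM type_name WHERE field = 'value' LIMIT from/size.
     * Inputs are inserted verbatim, so they must not contain quotes or backslashes.
     */
    static String termQuery(String field, String value, int from, int size) {
        return "{ \"query\": { \"term\": { \"" + field + "\": { \"value\": \"" + value + "\" } } }, "
             + "\"from\": " + from + ", \"size\": " + size + " }";
    }

    public static void main(String[] args) {
        System.out.println(termQuery("search_field", "elasticsearch", 0, 10));
    }
}
```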
ElasticSearch SQL vs. Query DSL
3.2. SQL on ElasticSearch
SQL:
SELECT *
FROM type_name
WHERE search_field > '20140624235959'
ORDER BY search_field DESC
Query DSL:
"range": {
  "search_field": {
    "gt": "20140624235959"
  }
}
…
"sort": [
  {
    "search_field": {
      "order": "desc"
    }
  }
]
SQL on ElasticSearch 
Demo
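Wrapping up…
Understanding ElasticSearch
A distributed search engine built on Lucene
Understanding ElasticSearch performance optimization
There is no single right answer, but…
Always use good hardware and the latest version.
Design modeling and sizing so that they can scale.
Always monitor the bottlenecks.
Use queries and filters according to their purpose.
Use the Bulk API.
Using ElasticSearch for big data
Use Hadoop and SQL to turn it into an easy analysis tool.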
Q&A 
E-mail : sophistlv@gmail.com
THANK YOU
