Debugging and Testing ES Systems
Chris Birchall
2013/8/29
Elasticsearch Study Group, Session 1
#elasticsearchjp
Elasticsearch and me
● At Infoscience, helped build a log management product based on ES + Hadoop
● At M3, ES evangelist (??)
○ Maintain ES cluster
○ Help dev teams integrate ES into their apps
Twitter: @cbirchall
GitHub: https://github.com/cb372
Search at M3
● Using ES for all new services
○ Search, recommendation (MoreLikeThis)
● Slowly migrating other services from Solr
● A few legacy services use Lucene directly
● Running all indices on one ES cluster
● Kuromoji for Japanese content
Debugging
Mostly debugging of queries
● “Why doesn’t doc X match query Y?”
● “Why does this search return no results?”
Operational issues are very rare
● ES’s clustering magic is surprisingly stable!
● No performance issues so far
Debugging - Step 1
Check for typos!
ES silently ignores many typos in settings and mapping definitions
Typo - Example
Let’s create a new index...
$ curl -X PUT localhost:9200/myapp -d '{
  "settings": {
    "number_of_shards": 3
  },
  "mapping" : {
    "article" : {
      "_source": { "enabled": false },
      "properties": {
        "title": { "type": "string", "store": "true" },
        "body": { "type": "string", "store": "true" },
        ...
      }
    },
    ...
}'
Typo - Example (cont’d)
Response from ES:
{"ok":true,"acknowledged":true}
OK, seems fine...
Typo - Example (cont’d)
Now check the mappings...
$ curl localhost:9200/myapp/_mappings?pretty
Response from ES:
{
  "myapp" : { }
}
Eh?
Where are my lovingly-crafted mappings?!
Typo - Example (cont’d)
$ curl -X PUT localhost:9200/myapp -d '{
  "settings": {
    "number_of_shards": 3
  },
  "mappings" : {
    "article" : {
      "_source": { "enabled": false },
      "properties": {
        "title": { "type": "string", "store": "true" },
        "body": { "type": "string", "store": "true" },
        ...
      }
    },
    ...
}'
Oops! The key is "mappings", not "mapping". ES ignored the unknown key and created the index with no mappings at all.
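ES accepted the misspelled `mapping` key without complaint, so one cheap safeguard is to lint the request body before sending it. The sketch below uses a hypothetical helper that is not part of Elasticsearch. It assumes the only top-level keys you intend to send are `settings` and `mappings`, flags any other key, and uses Python's `difflib` to suggest the closest known key.

```python
import difflib
import json

# Assumption: the top-level keys this project intends to send when creating
# an index. Extend the set if you use other index-creation options.
KNOWN_TOP_LEVEL_KEYS = {"settings", "mappings"}


def find_unknown_keys(body):
    """Return (unknown_key, suggestion_or_None) pairs for a create-index body."""
    problems = []
    for key in body:
        if key not in KNOWN_TOP_LEVEL_KEYS:
            # Suggest the closest known key, e.g. "mapping" -> "mappings".
            close = difflib.get_close_matches(key, sorted(KNOWN_TOP_LEVEL_KEYS), n=1)
            problems.append((key, close[0] if close else None))
    return problems


if __name__ == "__main__":
    body = json.loads("""{
      "settings": {"number_of_shards": 3},
      "mapping": {"article": {"properties": {"title": {"type": "string"}}}}
    }""")
    for key, suggestion in find_unknown_keys(body):
        print(f"unknown key {key!r}; did you mean {suggestion!r}?")
```

Running it against the broken body from the example prints `unknown key 'mapping'; did you mean 'mappings'?`. The same idea could be applied recursively to the mapping properties if typos there are a recurring problem.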
Debugging - Step 2
Set up a local environment
● Makes it easy to wipe & rebuild index
Setting up a local env (OSX)
# Install
$ brew install elasticsearch
# Kuromoji plugin (optional)
$ /usr/local/opt/elasticsearch/bin/plugin -install elasticsearch/elasticsearch-analysis-kuromoji/1.5.0
# Start
$ elasticsearch
# Create index
$ curl -X PUT localhost:9200/my_app -d '{ ... }'
# Insert some documents
$ curl -X PUT localhost:9200/my_app/my_type/1 -d '{ ... }'
$ curl -X PUT localhost:9200/my_app/my_type/2 -d '{ ... }'
# Done!
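During local debugging you will repeat the create and insert steps above many times, so it can help to script them. This is a minimal sketch that uses only the Python standard library. It makes three assumptions: ES listens on `http://localhost:9200`, and the hypothetical `INDEX_BODY` and `DOCS` placeholders stand in for your real index definition and test documents. The request plan is built separately from sending it, so you can inspect or unit-test it without a running server.

```python
import json
import urllib.error
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: local ES on the default port

# Hypothetical placeholders: substitute your real index definition and documents.
INDEX_BODY = {"settings": {"number_of_shards": 1}, "mappings": {}}
DOCS = {"1": {"title": "first doc"}, "2": {"title": "second doc"}}


def rebuild_plan(index, doc_type, index_body, docs):
    """Return the (method, path, body) requests that wipe and rebuild an index."""
    plan = [("DELETE", f"/{index}", None), ("PUT", f"/{index}", index_body)]
    for doc_id, doc in docs.items():
        plan.append(("PUT", f"/{index}/{doc_type}/{doc_id}", doc))
    # Refresh so the new documents are visible to searches straight away.
    plan.append(("POST", f"/{index}/_refresh", None))
    return plan


def run_plan(plan, base_url=BASE_URL):
    """Send each request in order and print the HTTP status code."""
    for method, path, body in plan:
        data = None if body is None else json.dumps(body).encode("utf-8")
        req = urllib.request.Request(base_url + path, data=data, method=method)
        req.add_header("Content-Type", "application/json")
        try:
            with urllib.request.urlopen(req) as resp:
                print(method, path, resp.status)
        except urllib.error.HTTPError as err:
            # e.g. 404 on the first DELETE when the index does not exist yet.
            print(method, path, err.code)


if __name__ == "__main__":
    # Dry run: print the plan. Call run_plan(...) to send it to a live node.
    for step in rebuild_plan("my_app", "my_type", INDEX_BODY, DOCS):
        print(step)
```

The `/{index}/{type}/{id}` document paths follow the ES version used in this talk. Newer releases removed mapping types, so the paths would need adjusting there.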
Useful commands - Analyze
How is my document/query being tokenized?
$ curl 'localhost:9200/myindex/_analyze?pretty' \
    -d '東京特許許可局許可局長'
{
  "tokens" : [ {
    "token" : "東京",
    "start_offset" : 0,
    "end_offset" : 2,
    "type" : "word",
    "position" : 1
  }, {
    "token" : "特許",
    "start_offset" : 2,
    "end_offset" : 4,
    "type" : "word",
    ...
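When reading `_analyze` output, it helps to check each token against the offsets it claims. The sketch below parses a response shaped like the one above. It uses only the two tokens shown on the slide and does not fill in the rest. For each token it slices the original text with `start_offset`/`end_offset`. A token can legitimately differ from its source slice, for example after a reading-form filter, so a mismatch is a hint to look closer, not necessarily a bug.

```python
import json

TEXT = "東京特許許可局許可局長"

# The first two tokens of the response shown above; the remainder is elided.
RESPONSE = json.loads("""{
  "tokens": [
    {"token": "東京", "start_offset": 0, "end_offset": 2, "type": "word", "position": 1},
    {"token": "特許", "start_offset": 2, "end_offset": 4, "type": "word"}
  ]
}""")


def describe_tokens(text, response):
    """Return 'token [start:end] source-slice' lines for each analyzed token."""
    lines = []
    for tok in response["tokens"]:
        source = text[tok["start_offset"]:tok["end_offset"]]
        # Filters (lowercasing, reading form, ...) can make token != source.
        flag = "" if source == tok["token"] else "  <- differs from source text"
        lines.append(f'{tok["token"]} [{tok["start_offset"]}:{tok["end_offset"]}] {source}{flag}')
    return lines


if __name__ == "__main__":
    print("\n".join(describe_tokens(TEXT, RESPONSE)))
```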
Useful commands - Explain
Why does this document (not) match this query?
$ curl 'localhost:9200/kuro/docs/123/_explain?pretty' \
    -d '{ "query": { "term": { "body": "東京" } } }'
{
  ...
  "matched" : true,
  "explanation" : {
    "value" : 0.375,
    "description" : "weight(body:東京 in 0) [PerFieldSimilarity], result of:",
    "details" : [ {
      "value" : 0.375,
      "description" : "fieldWeight in 0, product of:",
      "details" : [ {
        "value" : 1.0,
        "description" : "tf(freq=1.0), with freq of:",
        "details" : [ {
          "value" : 1.0,
          "description" : "termFreq=1.0"
          ...
Specify the document ID (123 here) in the URL.
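The nested `details` arrays in an `_explain` response are easier to follow as an indented tree. The sketch below flattens the part of the response shown above. In the real response, the `fieldWeight` node has further children that the slide elides. This sketch does not invent them; it walks whatever nodes are present.

```python
import json

# The part of the _explain response shown above; deeper nodes are elided.
EXPLAIN = json.loads("""{
  "matched": true,
  "explanation": {
    "value": 0.375,
    "description": "weight(body:東京 in 0) [PerFieldSimilarity], result of:",
    "details": [{
      "value": 0.375,
      "description": "fieldWeight in 0, product of:",
      "details": [{
        "value": 1.0,
        "description": "tf(freq=1.0), with freq of:",
        "details": [{"value": 1.0, "description": "termFreq=1.0"}]
      }]
    }]
  }
}""")


def flatten(node, depth=0):
    """Yield one indented 'value  description' line per explanation node."""
    yield "  " * depth + f'{node["value"]}  {node["description"]}'
    for child in node.get("details", []):
        yield from flatten(child, depth + 1)


if __name__ == "__main__":
    print("matched:", EXPLAIN["matched"])
    for line in flatten(EXPLAIN["explanation"]):
        print(line)
```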
Tuning queries
Parameters to tweak
● default_operator (AND/OR)
● auto_generate_phrase_queries
● minimum_should_match
● Stop words/stop tags (Kuromoji part-of-speech filter)
● Kuromoji
○ Segmentation mode
○ Reading form filter
○ Disable Kuromoji! (for some fields)
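Several of these parameters (`default_operator`, `auto_generate_phrase_queries`, `minimum_should_match`) are options of the `query_string` query. The sketch below builds such a search body as a Python dict. The field name `body`, the function name and the defaults are illustrative assumptions. Check the option names against the query DSL reference for your ES version, since some of them may not exist in later releases.

```python
import json


def query_string_body(text, field="body", operator="AND",
                      min_should_match=None, phrase_queries=False):
    """Build a query_string search body exposing the tuning knobs above."""
    if operator not in ("AND", "OR"):
        raise ValueError("default_operator must be AND or OR")
    qs = {
        "query": text,
        "default_field": field,
        "default_operator": operator,
        "auto_generate_phrase_queries": phrase_queries,
    }
    if min_should_match is not None:
        # e.g. "75%": with OR, require most (but not all) terms to match.
        qs["minimum_should_match"] = min_should_match
    return {"query": {"query_string": qs}}


if __name__ == "__main__":
    print(json.dumps(query_string_body("東日本病院", operator="OR",
                                       min_should_match="75%"),
                     ensure_ascii=False, indent=2))
```

Sending this body to `_search` on a local index makes it quick to compare AND, plain OR, and OR with a `minimum_should_match` threshold for the same query text.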
Why disable Kuromoji?
Problem: occasionally weird tokenization
Phrase → Terms
特定医療法人財団 日本会 東日本病院 (document field) → 特定、医療、法人、財団、日本、会、東日本、病院
東日本 (query) → 東日、東日本、本
東日本病院 (query) → 東、東日本、日本、病院
● AND query fails, because not all query terms appear in the document
● OR query matches any document containing 病院 (hospital) → low precision
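To see why neither operator works well here, the sketch below applies plain boolean AND/OR matching to the terms listed above. It ignores scoring and is a model of the problem, not of how ES evaluates queries internally. `doc2` is a hypothetical second document that shares only 病院 with the query.

```python
# Terms from the table above. doc2 is hypothetical: it shares only 病院.
DOCS = {
    "doc1": {"特定", "医療", "法人", "財団", "日本", "会", "東日本", "病院"},
    "doc2": {"大阪", "病院"},
}

QUERIES = {
    "東日本": ["東日", "東日本", "本"],
    "東日本病院": ["東", "東日本", "日本", "病院"],
}


def matches(doc_terms, query_terms, operator):
    """Boolean match: AND needs every query term, OR needs at least one."""
    if operator == "AND":
        return all(t in doc_terms for t in query_terms)
    return any(t in doc_terms for t in query_terms)


def search(query_terms, operator):
    """Return sorted IDs of documents matching the query terms."""
    return sorted(doc_id for doc_id, terms in DOCS.items()
                  if matches(terms, query_terms, operator))


if __name__ == "__main__":
    for text, terms in QUERIES.items():
        for op in ("AND", "OR"):
            print(text, op, search(terms, op))
```

AND rejects doc1, the very document being searched for. OR for 東日本病院 also returns doc2 on the strength of 病院 alone, which is the precision problem described above.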
Useful plugin - Head
$ bin/plugin -install mobz/elasticsearch-head
http://mobz.github.io/elasticsearch-head/
Testing
Main goal: Ensure that queries return the results that we expect
● Test coverage of representative queries
○ Freedom to tune for a given query without breaking other queries
Ideally, tests should:
● Run fast
● Run standalone (i.e. no need to have an ES server running)
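The "representative queries" idea can be expressed as a table of queries and expected result IDs that is checked on every change. This is a minimal sketch: the `search` function here is a hypothetical in-memory stand-in using whitespace tokens and AND matching, not ES. In a real suite it would query an in-process ES node, which is what elasticsearch-test does for Java. The documents and expectations are made up for illustration.

```python
import unittest

# Hypothetical stand-in corpus; a real suite would index these into ES.
DOCS = {
    "1": "tokyo general hospital",
    "2": "osaka general hospital",
    "3": "tokyo station",
}


def search(query):
    """Return sorted IDs of docs containing every whitespace-separated term."""
    terms = query.split()
    return sorted(doc_id for doc_id, text in DOCS.items()
                  if all(t in text.split() for t in terms))


# Representative queries and the results we expect. Tuning for one query
# must not break the others, so every entry runs on every change.
EXPECTED = {
    "tokyo": ["1", "3"],
    "tokyo hospital": ["1"],
    "general hospital": ["1", "2"],
}


class RepresentativeQueryTest(unittest.TestCase):
    def test_expected_results(self):
        for query, expected in EXPECTED.items():
            with self.subTest(query=query):
                self.assertEqual(search(query), expected)
```

Run it with `python -m unittest` on the module. Each failing query is reported separately thanks to `subTest`, which makes it clear which query a tuning change broke.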
Testing - Java
elasticsearch-test is awesome
● DSL to set up/tear down ES
● Annotations + JUnit runner
● ES runs in-process
○ No need to start an external ES server
● Index is stored in-memory
○ Runs quickly
https://github.com/tlrx/elasticsearch-test
Testing - Java
Simple elasticsearch-test example
https://github.com/cb372/elasticsearch-test-example
Testing - Ruby
Simple Rails + Tire + RSpec example
https://github.com/cb372/elasticsearch-rspec-example
We’re hiring!
http://bit.ly/m3jobs