Itamar Syn-Hershko
http://code972.com
@synhershko
Practical Elasticsearch
Me?
• Itamar Syn-Hershko / @synhershko
• Lucene.NET PMC and lead committer
• Microsoft MVP
• RavenDB
– X-Core developer
– “RavenDB in Action” author
Consulting
Partner
Practical Elasticsearch - real world use cases
An index
Elasticsearch
• Powered by Apache Lucene
• Open-source
• Rapid growth
• High profile users world-wide
REST API
• Indexes
• Types
• IDs
$ curl -XPUT 'http://localhost:9200/twitter/tweet/1' -d '{
    "user" : "synhershko",
    "post_date" : "2013-05-30T14:12:12",
    "message" : "trying out Elastic Search",
    "followers": 3,
    "registered": true
}'
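The same request can be sketched in Python with only the standard library. This is a minimal sketch, not an official client: the helper name `build_index_request` is hypothetical, the host is assumed to be a local node as in the curl call, and the `Content-Type` header is an addition that is not in the slide. The index/type/id path mirrors the slide; newer Elasticsearch releases have moved away from mapping types, so the path may need adjusting.

```python
import json
import urllib.request


def build_index_request(index, doc_type, doc_id, document,
                        host="http://localhost:9200"):
    """Build (but do not send) a PUT request that indexes one document.

    Hypothetical helper: the URL layout /index/type/id follows the slide.
    """
    url = f"{host}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},  # assumption, not in the slide
    )


tweet = {
    "user": "synhershko",
    "post_date": "2013-05-30T14:12:12",
    "message": "trying out Elastic Search",
    "followers": 3,
    "registered": True,
}

request = build_index_request("twitter", "tweet", 1, tweet)
print(request.get_method(), request.full_url)

# To send it against a running node, uncomment:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Building the request separately from sending it keeps the example runnable without a live cluster.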
Full-Text Search
Full-text Search 101:
The inverted index

6 documents to index:
1. The old night keeper keeps the keep in the town
2. In the big old house in the big old gown.
3. The house in the town had the big old keep
4. Where the old night keeper never did sleep.
5. The night keeper keeps the keep in the night
6. And keeps in the dark and sleeps in the light.

The index: dictionary and posting lists

Term     Documents
and      <6>
big      <2> <3>
dark     <6>
did      <4>
gown     <2>
had      <3>
house    <2> <3>
in       <1> <2> <3> <5> <6>
keep     <1> <3> <5>
keeper   <1> <4> <5>
keeps    <1> <5> <6>
light    <6>
never    <4>
night    <1> <4> <5>
old      <1> <2> <3> <4>
sleep    <4>
sleeps   <6>
the      <1> <2> <3> <4> <5> <6>
town     <1> <3>
where    <4>

Example from: Justin Zobel, Alistair Moffat, "Inverted files for text search engines", ACM Computing Surveys (CSUR), v.38 n.2, p.6-es, 2006
User queries for “keeper”
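A query for a single term then reduces to one dictionary lookup. The sketch below rebuilds the index from the six example documents, assuming a deliberately simple tokenizer (lowercase, letters only); the function names are illustrative and are not Lucene or Elasticsearch APIs.

```python
import re
from collections import defaultdict

DOCUMENTS = {
    1: "The old night keeper keeps the keep in the town",
    2: "In the big old house in the big old gown.",
    3: "The house in the town had the big old keep",
    4: "Where the old night keeper never did sleep.",
    5: "The night keeper keeps the keep in the night",
    6: "And keeps in the dark and sleeps in the light.",
}


def tokenize(text):
    """Lowercase the text and split it into runs of letters."""
    return re.findall(r"[a-z]+", text.lower())


def build_inverted_index(documents):
    """Map each term (the dictionary) to a sorted list of document ids (its posting list)."""
    postings = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in sorted(postings.items())}


def search(index, term):
    """Return the posting list for a single term, or an empty list."""
    return index.get(term.lower(), [])


index = build_inverted_index(DOCUMENTS)
print(search(index, "keeper"))  # prints [1, 4, 5]
```

The lookup never scans the documents themselves, which is what makes the structure fast at query time.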
Term Normalization
• Lowercasing
• Stop words (grey)
– Not best practice anymore
• Stemming
– Porter stemmer
– s-stemmer
• Relevance++
• SizeOnDisk--
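The normalization steps listed above can be sketched as a small pipeline. Everything here is an assumption made for illustration: the stop-word list is a tiny hand-picked set, and `naive_s_stem` is a simplified plural-stripping rule in the spirit of an s-stemmer, not the Porter stemmer and not the analyzer Elasticsearch ships.

```python
import re

# Illustrative stop-word list; real analyzers use longer, language-specific lists.
STOP_WORDS = frozenset({"the", "in", "and"})


def naive_s_stem(token):
    """Strip a trailing 's' from longer tokens, leaving '-ss' and '-us' endings alone."""
    if len(token) > 3 and token.endswith("s") and not token.endswith(("ss", "us")):
        return token[:-1]
    return token


def normalize(text, stop_words=STOP_WORDS):
    """Lowercase, tokenize, drop stop words, then stem what is left."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_s_stem(token) for token in tokens if token not in stop_words]


print(normalize("And keeps in the dark and sleeps in the light."))
# prints ['keep', 'dark', 'sleep', 'light']
```

After normalization "keeps" and "keep" share one dictionary entry, so a query for either matches both, and the dictionary itself gets smaller.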
Full-Text Search
Your data store
How hard is it to get search right, anyway?
Relevance
• Precision
– The fraction of the retrieved documents that are relevant
• Recall
– The fraction of the relevant documents that are retrieved
• Order of results
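Both definitions come down to set arithmetic over document ids. A minimal sketch, assuming the set of relevant documents is known in advance (in practice it comes from human judgments); the function name and sample ids are illustrative.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = |retrieved AND relevant| / |retrieved|
    recall    = |retrieved AND relevant| / |relevant|
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# The engine returned four documents; three of them are the relevant ones.
precision, recall = precision_recall(retrieved=[1, 4, 5, 6], relevant=[1, 4, 5])
print(precision, recall)  # prints 0.75 1.0
```

Returning more documents tends to raise recall and lower precision, which is why neither number is useful on its own and why the order of results matters.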
Challenges with search
• Relevance
• Getting the tokens right
– Tokenization
– Stemming
• Multi-lingual content
– Or other cross-cutting search concerns
• Tolerance
Real-time Analytics
Queue
(Redis)
“Shippers”
“Indexer”
Scaling out
Moar use cases!
#1: Real-Time Alerting System
Percolation
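Percolation turns search around: queries are stored ahead of time, and each incoming document is checked against them. The toy class below only illustrates that idea, under the assumption that a query is a set of terms that must all appear in the document. `ToyPercolator` and its methods are hypothetical and are not the Elasticsearch percolator API.

```python
import re


class ToyPercolator:
    """Stores queries up front, then matches each incoming document against them."""

    def __init__(self):
        self._queries = {}

    def register(self, query_id, required_terms):
        """Store a query: every listed term must appear in a document for it to match."""
        self._queries[query_id] = {term.lower() for term in required_terms}

    def percolate(self, text):
        """Return the ids of all stored queries that the document matches."""
        tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
        return sorted(
            query_id
            for query_id, terms in self._queries.items()
            if terms <= tokens
        )


percolator = ToyPercolator()
percolator.register("disk-alert", ["disk", "full"])
percolator.register("auth-alert", ["login", "failed"])

matches = percolator.percolate("Login failed for user frank")
print(matches)  # prints ['auth-alert']
```

Each matching query id can then trigger an alert the moment the document arrives, without waiting for someone to run a search.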
#2: Smarter query parsing
Matching inexact queries
• Phrase slop
– “Bridge of London” -> “London Bridge”
• Word-level edit distance with fuzzy queries
– ditsance -> distance
– color -> colour
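Word-level edit distance counts the single-character edits needed to turn one word into another. The sketch below implements the optimal string alignment variant, which also counts a swap of two adjacent characters as one edit. It is a plain textbook implementation for illustration, not the code Lucene runs.

```python
def edit_distance(a, b):
    """Optimal string alignment distance.

    Allowed edits, each costing 1: insert, delete, substitute,
    and transpose two adjacent characters.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


print(edit_distance("ditsance", "distance"))  # prints 1
print(edit_distance("color", "colour"))       # prints 1
```

A fuzzy query accepts terms within a small distance of the query term, so both misspellings above would still find their intended word.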
#3: Offline Classification
Structuring the unstructured
• Record linkage
– Bag of words model
– “More Like This” functionality
• NLP
• Entity extraction
#4: Everything is searchable
Geo-spatial search
• Distance
• Shape interactions
• Multiple algorithms
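As an illustration of the distance case, the sketch below filters named points by great-circle distance using the haversine formula. It assumes a spherical Earth with a mean radius of 6371 km; the function names and sample coordinates are made up, and this is one common formula rather than a statement of which algorithm Elasticsearch applies.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius, spherical-Earth assumption


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_distance(origin, points, max_km):
    """Return the names of all points no farther than max_km from origin."""
    return [
        name
        for name, (lat, lon) in points.items()
        if haversine_km(origin[0], origin[1], lat, lon) <= max_km
    ]


points = {"near": (0.0, 0.5), "far": (0.0, 5.0)}
print(within_distance((0.0, 0.0), points, max_km=100))  # prints ['near']
```

Shape interactions (contains, intersects, and so on) need polygon geometry and are outside the scope of this sketch.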
Image search
http://colors.qbox.io/
http://cs.stanford.edu/people/karpathy/deepimagesent
Deep Visual-Semantic Alignments for Generating Image Descriptions
#5: Anomaly detection
The Significant Terms Aggregation
Uncommonly common
Mark Harwood’s talk at
http://www.infoq.com/presentations/elasticsearch-revealing-uncommonly-common
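"Uncommonly common" describes terms that are far more frequent in a subset of documents than in the data as a whole. The sketch below scores terms by a plain frequency ratio. This is a simplified heuristic chosen for illustration, not the scoring used by the Significant Terms aggregation, and the sample log tokens are invented.

```python
from collections import Counter


def uncommonly_common(foreground_docs, background_docs):
    """Rank terms by (share of foreground docs) / (share of background docs).

    Each document is a list of tokens. The background set is assumed to
    include the foreground documents.
    """
    fg_counts = Counter(t for doc in foreground_docs for t in set(doc))
    bg_counts = Counter(t for doc in background_docs for t in set(doc))
    scores = {}
    for term, fg_count in fg_counts.items():
        bg_count = bg_counts.get(term, 0)
        if bg_count == 0:
            continue  # cannot score a term the background never saw
        fg_rate = fg_count / len(foreground_docs)
        bg_rate = bg_count / len(background_docs)
        scores[term] = fg_rate / bg_rate
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ok = ["the", "request", "ok"]
slow = ["the", "request", "timeout"]
background = [ok, ok, ok, ok, slow, slow, ok, ok]  # all log lines
foreground = [slow, slow, ok]                      # lines from one suspect host

ranked = uncommonly_common(foreground, background)
print(ranked[0][0])  # prints timeout
```

Words such as "the" score about 1 because they are equally common everywhere, while "timeout" stands out for the suspect host.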
#6: Debugging a distributed system
Queue
(Redis)
127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"
System.NullReferenceException: Object reference not set to an instance of an object.
at System.Collections.Generic.Dictionary`2.Insert(TKey key, TValue value, Boolean add)
at AjaxControlToolkit.ToolkitScriptManager.GetScriptCombineAttributes(Assembly assembly)
at AjaxControlToolkit.ToolkitScriptManager.IsScriptCombinable(ScriptEntry scriptEntry)
at AjaxControlToolkit.ToolkitScriptManager.OnResolveScriptReference(ScriptReferenceEventArgs e)
at System.Web.UI.ScriptManager.RegisterScripts()
at System.Web.UI.ScriptManager.OnPagePreRenderComplete(Object sender, EventArgs e)
at System.Web.UI.Page.OnPreRenderComplete(EventArgs e)
at System.Web.UI.Page.ProcessRequestMain(Boolean includeStagesBeforeAsyncPoint, Boolean includeStagesAfterAsyncPoint)
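Before log lines such as the access-log entry above can be searched and aggregated, they have to be split into fields. The sketch below does that for one line with a regular expression. It assumes the Apache combined log layout; the field names are chosen for illustration and are not a fixed schema.

```python
import re

# One named group per field of the combined log format.
COMBINED_LOG = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_access_log(line):
    """Turn one access-log line into a dict ready for indexing, or None if it does not match."""
    match = COMBINED_LOG.match(line)
    if match is None:
        return None
    event = match.groupdict()
    event["status"] = int(event["status"])
    # A size of "-" means no response body was sent.
    event["size"] = 0 if event["size"] == "-" else int(event["size"])
    return event


line = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
        '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
        '"http://www.example.com/start.html" '
        '"Mozilla/4.08 [en] (Win98; I ;Nav)"')

event = parse_access_log(line)
print(event["user"], event["status"], event["request"])
```

Once status and size are numeric fields, questions such as "how many 500s per minute" become simple aggregations.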
#7: Distributed git storage
• PoC in C# using libgit2sharp
• https://github.com/synhershko/libgit2sharp.Elasticsearch
• Kudos @nulltoken
Putting this to practice
• Search on your data
– Data doesn’t have to be structured to be queried
• Use your logs to gain insight
– Metrics
– Establish a baseline
– Investigate unexpected or unfamiliar behaviors
Thank you.
Questions?
Itamar Syn-Hershko
http://code972.com
@synhershko
