Introduction to Elasticsearch
By Melvyn Peignon
What will I cover?
- Company and products presentation
- Elasticsearch architecture
- Presentation of Kibana
- Presentation of the search API
- Analyzer
- TF/IDF and relevance
- Elasticsearch use case
- Conclusion
Elastic
Founded in 2012
- Is behind:
- Kibana
- Elasticsearch
- Logstash
- Beats
What is elasticsearch?
- Full text search engine
- Based on Lucene
- Highly available
- Distributed
- Scalable
- RESTful
- Open Source
Shay Banon
Search engine trends compared (Elasticsearch is in blue)
How do they make money?
CRUD
CREATE
READ
UPDATE
DELETE
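As a rough illustration of how the four CRUD operations map onto the Elasticsearch REST API, the sketch below builds the HTTP method and path for each one. It assumes the typeless document endpoints of Elasticsearch 7 and later (`_create`, `_doc`, `_update`); older versions that still use mapping types address documents as `/index/type/id` instead. The function name `crud_request` and the index name `recipes` are made up for the example, and nothing is sent over the network.

```python
def crud_request(operation, index, doc_id):
    """Return the (HTTP method, path) pair for one CRUD operation on a document.

    Assumes the typeless endpoints of Elasticsearch 7+.
    """
    routes = {
        "create": ("PUT", f"/{index}/_create/{doc_id}"),
        "read": ("GET", f"/{index}/_doc/{doc_id}"),
        "update": ("POST", f"/{index}/_update/{doc_id}"),
        "delete": ("DELETE", f"/{index}/_doc/{doc_id}"),
    }
    return routes[operation]


# "recipes" and the id 1 are hypothetical values used only for illustration.
for op in ("create", "read", "update", "delete"):
    method, path = crud_request(op, "recipes", 1)
    print(f"{op.upper():<6} -> {method} {path}")
```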
Some concepts to know
- Near real time (NRT)
- Cluster
- Node
- Index
- Document
- Shards and Replicas
Documents, Types, Indexes
- An index is a collection of documents that share similar properties.
- A document is the basic piece of information that can be indexed.
- A type is a logical partition of the data in your index.
Cluster, Nodes, Shards and Replicas
[Diagram sequence showing a cluster as it grows]
- One node: Node 1 holds shards S1, S2, S3 and S4.
- Two nodes: the four shards are split between Node 1 and Node 2.
- Four nodes: Node 1 to Node 4 hold shards S1, S2, S3, S4 and replicas R2, R1, R4, R3.
- The nodes exchange Ping / Pong messages.
Responsibilities of the master
- Cluster health
- Creation of indexes
- Allocation of the shards
- Allocation of the replicas
Cluster recommendation
- Keep your servers in the same data center
- Put your machines on different racks
- Keep at least 3 master-eligible nodes (with only 2 nodes the quorum is still 2, so losing one node leaves no majority)
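The quorum remark can be checked with a little arithmetic. This is a minimal sketch, assuming the usual majority rule quorum = floor(n / 2) + 1 for n master-eligible nodes; the function names are illustrative only.

```python
def quorum(master_eligible_nodes):
    """Majority of the master-eligible nodes: floor(n / 2) + 1."""
    return master_eligible_nodes // 2 + 1


def tolerated_failures(master_eligible_nodes):
    """Number of master-eligible nodes that can be lost while a majority remains."""
    return master_eligible_nodes - quorum(master_eligible_nodes)


for n in (1, 2, 3, 5):
    print(f"nodes={n} quorum={quorum(n)} tolerated_failures={tolerated_failures(n)}")
```

With 2 nodes the quorum is 2 and no failure is tolerated, while 3 nodes keep the same quorum of 2 and survive the loss of one node.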
What’s Kibana?
- Another Elastic product
- A tool that lets you communicate with your Elasticsearch cluster in a more “human” way
- A product that lets you build dashboards and data visualizations
Introduction to elasticsearch
Let’s go for a demonstration
Demonstration done on Kibana
Queries can be found on GitHub:
The analyzer
Standard Analyzer
Inverted index after the first document:
{“a”: [id_0], “walk”: [id_0], “in”: [id_0], “the”: [id_0], “wood”: [id_0]}
Inverted index after the second document:
{“a”: [id_0, id_1], “walk”: [id_0], “in”: [id_0], “the”: [id_0], “wood”: [id_0], “probability”: [id_1], “complete”: [id_1], “guide”: [id_1]}
Search results against this index:
[id_0, id_1]
[]
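The inverted index on these slides can be reproduced with a short sketch. The two document texts and the two queries are assumptions reconstructed from the tokens shown (the transcript does not include them), and `standard_analyze` is only a simplified stand-in for the real standard analyzer: it lowercases and splits on non-alphanumeric characters.

```python
import re
from collections import defaultdict


def standard_analyze(text):
    """Simplified stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return [token for token in re.split(r"[^a-z0-9]+", text.lower()) if token]


def build_index(docs):
    """Map each token to the list of ids of the documents that contain it."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for token in standard_analyze(text):
            if doc_id not in index[token]:
                index[token].append(doc_id)
    return dict(index)


def search(index, query):
    """Return the ids of documents containing at least one token of the query."""
    hits = []
    for token in standard_analyze(query):
        for doc_id in index.get(token, []):
            if doc_id not in hits:
                hits.append(doc_id)
    return hits


# Assumed document texts, chosen to yield the tokens shown on the slides.
docs = {"id_0": "A walk in the wood", "id_1": "Probability: a complete guide"}
index = build_index(docs)

print(index)
print(search(index, "a"))        # hypothetical query matching both documents
print(search(index, "walking"))  # hypothetical query: "walking" was never indexed
```

A query is analyzed the same way as the documents, so a token such as "walking" finds nothing when only "walk" is in the index.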
The English analyzer
English Analyzer
{“walk”: [id_0], “wood”: [id_0]}
Search result against this index:
[]
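To illustrate why the English analyzer keeps only “walk” and “wood”, here is a rough sketch of its two main effects: stop-word removal and stemming. The stop-word set is a small illustrative subset, `crude_stem` is a deliberately naive replacement for a real stemmer, and the sample sentences are assumptions.

```python
import re

# Small illustrative subset of English stop words (an assumption, not the built-in list).
STOP_WORDS = {"a", "an", "and", "in", "of", "the", "to"}


def crude_stem(token):
    """Very rough suffix stripping; a stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def english_analyze(text):
    """Lowercase, tokenize, drop stop words, then stem what is left."""
    tokens = [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


print(english_analyze("A walk in the wood"))
print(english_analyze("Walking in the woods"))
print(english_analyze("the"))
```

Because stop words are dropped at both index and query time, a query made only of stop words produces no tokens and therefore no hits.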
What is relevance?
Two theories to know:
- Boolean model
- Vector space model
Boolean model
O0 = “Eric is ... always feeding”
O1 = “Jherez is ... with the friends”
….
O6 = “Manage Idea… to Melvyn)”
QT= {“lab”, “manager”} QO = “OR”
T = {t1: “lab”, t2: “manager”, t3: “Idea”, …, t4: “feeding”}
D = {D0, D1, …, D6}
D0 = {Eric, is, …, feeding}
D1 = {Jherez, is, …, friends}
D6 = {Manage, idea, …, Melvyn}
S1 = {D0, D1, D6}
S2 = {D0, D6}
SF = S1 ∪ S2 = S1
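The Boolean model boils down to set operations. In the sketch below the token sets are hypothetical: the slide elides most of each document, so only the presence of “lab” and “manager” is chosen to match S1 and S2, and `D2` is an invented non-matching document.

```python
# Hypothetical token sets; only the membership of "lab" and "manager"
# is taken from the slide (S1 and S2).
documents = {
    "D0": {"eric", "is", "lab", "manager", "feeding"},
    "D1": {"jherez", "is", "lab", "friends"},
    "D2": {"unrelated", "text"},
    "D6": {"manage", "idea", "lab", "manager", "melvyn"},
}


def matching(term):
    """Set of ids of the documents that contain the term."""
    return {doc_id for doc_id, tokens in documents.items() if term in tokens}


def boolean_search(terms, operator="OR"):
    """Combine the per-term document sets with a union (OR) or an intersection (AND)."""
    sets = [matching(term) for term in terms]
    if not sets:
        return set()
    if operator == "OR":
        return set.union(*sets)
    if operator == "AND":
        return set.intersection(*sets)
    raise ValueError(f"unsupported operator: {operator}")


s1 = matching("lab")
s2 = matching("manager")
print(sorted(s1), sorted(s2))
print(sorted(boolean_search(["lab", "manager"], "OR")))
```

The Boolean model only says which documents match; it does not rank them, which is what the vector space model adds.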
Vector space model
S1 = {D0, D1, D6}
T0 = D0 ∩ QT (“lab”, “manager”) ⇒ V0 = (L0, M0)
T1 = D1 ∩ QT (“lab”) ⇒ V1 = (L1, 0)
T6 = D6 ∩ QT (“lab”, “manager”) ⇒ V6 = (L6, M6)
Weight of a token in a document
- Term frequency
TF = √Frequency
- Inverse Document Frequency
IDF = 1 + log(numDocs / (docFrequency + 1))
- Field length
FL = 1 / √TokenInField
Weight = TF x IDF x FL
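The weight can be computed directly from the three factors. This is a sketch under one assumption: it uses the classic Lucene form of IDF, 1 + ln(numDocs / (docFrequency + 1)), with the total number of documents in the numerator. The sample values (7 documents, a term present in 3 of them, a field of 4 tokens) are illustrative.

```python
import math


def term_frequency(frequency):
    """TF: square root of the number of times the term appears in the field."""
    return math.sqrt(frequency)


def inverse_document_frequency(num_docs, doc_frequency):
    """IDF in the classic Lucene form (assumed): 1 + ln(numDocs / (docFrequency + 1))."""
    return 1 + math.log(num_docs / (doc_frequency + 1))


def field_length_norm(tokens_in_field):
    """FL: shorter fields give each token more weight."""
    return 1 / math.sqrt(tokens_in_field)


def weight(frequency, num_docs, doc_frequency, tokens_in_field):
    """Weight = TF x IDF x FL."""
    return (
        term_frequency(frequency)
        * inverse_document_frequency(num_docs, doc_frequency)
        * field_length_norm(tokens_in_field)
    )


# Illustrative values only.
print(weight(frequency=1, num_docs=7, doc_frequency=3, tokens_in_field=4))
```

A term that is rare across the index, frequent in the field, and located in a short field gets the highest weight.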
Relevance
Vq = [1, 1.47]
V0 = [0.81, 0.85]
V1 = [0.37, 0]
V6 = [0.8, 1.2]
Relevance(Vq, Vx) = cos(Vq, Vx) = (Vq · Vx) / (‖Vq‖ ‖Vx‖)
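The cosine formula can be applied to the vectors on this slide. A minimal sketch: the vector values are taken from the slide, while the helper name `cosine` and the dictionary layout are choices made for the example.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two vectors: (u . v) / (|u| |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


query = [1, 1.47]
vectors = {"V0": [0.81, 0.85], "V1": [0.37, 0], "V6": [0.8, 1.2]}

scores = {name: cosine(query, vector) for name, vector in vectors.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

for name in ranking:
    print(name, round(scores[name], 3))
```

V6 points in almost the same direction as the query, so it ranks first, followed by V0 and then V1, which only contains the first term.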
Let’s Kaggle with elasticsearch
https://www.kaggle.com/c/whats-cooking
Results of our “Classifier”
Explanation of the methodology:
http://melvyn.pythonanywhere.com/posts/1/
Final advice
- Mapping: I highly recommend defining a mapping explicitly. You cannot change the type of a field once it is defined in the mapping.
- Elasticsearch alongside a database: I prefer having both. It makes reindexing easier and gives me a backup; I run search and analytics on ES and use my database for identification, etc.
- Elasticsearch as a NoSQL database: I wouldn’t do it on a serious project, but it is nice to have if you want a quick implementation for a POC.
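For the mapping advice, the sketch below builds the kind of request body that could be sent with `PUT /recipes` when creating an index. It assumes the typeless mapping format of Elasticsearch 7 and later; the index name and the field names (`title`, `cuisine`, `created_at`) are hypothetical, and the body is only printed, not sent.

```python
import json

# Hypothetical explicit mapping; field names are invented for illustration.
mapping_body = {
    "mappings": {
        "properties": {
            "title": {"type": "text", "analyzer": "english"},
            "cuisine": {"type": "keyword"},
            "created_at": {"type": "date"},
        }
    }
}

# This JSON would be the body of the index creation request.
print(json.dumps(mapping_body, indent=2))
```

Since a field type cannot be changed afterwards, fixing a wrong type means creating a new index with the corrected mapping and reindexing the data into it.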
Hope you enjoyed the presentation!
Thank you for your attention!
Questions?
