INTRODUCTION TO
ELASTICSEARCH
2

Agenda
• Me
• ElasticSearch Basics
• Concepts
• Network / Discovery
• Data Structure
• Inverted Index
• The REST API
• Sample Deployment
3

Me
• Roy Russo
• JBoss Portal Co-Founder
• LoopFuse Co-Founder
• ElasticHQ
• http://www.elastichq.org
• AltiSource Labs Architect
4

ElasticSearch in One Slide
• Document - Oriented Search Engine
• JSON
• Apache Lucene
• No Schema
• Mapping Types
• Horizontal Scale, Distributed
• REST API
• Vibrant Ecosystem
• Tooling, Plugins, Hosting, Client-Libs
5

When to use ElasticSearch
• Full-Text Search
• Fast Read Database
• “Simple” Data Structures
• Minimize Impedance Mismatch
6

When to use ElasticSearch - Logs
• Logstash + ElasticSearch + Kibana
7

How to use ElasticSearch - CQRS
Client

Command Sent
Ack Resp.

Remote Interfaces
Services
Domain Objects
Data
Storage

Request DTO
DTO Returned
8

How to use ElasticSearch - CQRS
Request DTO
DTO Returned

Client

Command Sent
Ack Resp.

Remote Interfaces

Remote Interfaces

Services

DTO Read Layer

Domain Objects
Event
Storage

?

Data
Storage
9

A note on Rivers
• JDBC
• CouchDB
• MongoDB
• RabbitMQ

• Twitter
• And more…

"type" : "jdbc",
"jdbc" : {
"driver" : "com.mysql.jdbc.Driver",
"url" : "jdbc:mysql://localhost:3306/my_db",
"user" : "root",
"password" : "mypassword",
"sql" : "select * from products"
}
10

ElasticSearch at Work
REALTrans

REALServicing
REALSearch

ElasticSearch

REALDoc
11

What sucks about ElasticSearch
• No AUTH/AUTHZ
• No Usage Metrics
12

How the World Uses ElasticSearch
13

The Basics - Distro
• Download and Run
├── bin                                   (executables)
│   ├── elasticsearch
│   ├── elasticsearch.in.sh
│   └── plugin
├── config                                (node configs)
│   ├── elasticsearch.yml
│   └── logging.yml
├── data                                  (data storage)
│   └── cluster1
├── lib
│   ├── elasticsearch-x.y.z.jar
│   └── ...
└── logs                                  (log files)
    ├── elasticsearch.log
    ├── elasticsearch_index_search_slowlog.log
    └── elasticsearch_index_indexing_slowlog.log
14

The Basics - Glossary
• Node = One ElasticSearch instance (1 java proc)
• Cluster = 1..N Nodes w/ same Cluster Name
• Index = Similar to a DB
• Named Collection of Documents
• Maps to 1..N Primary shards && 0..N Replica shards
• Mapping Type = Similar to a DB Table
• Document Definition
• Shard = One Lucene instance
• Distributed across all nodes in the cluster.
15

The Basics - Document Structure
• Modeled as a JSON object
{
  "genre": "Crime",
  "language": "English",
  "country": "USA",
  "runtime": 170,
  "title": "Scarface",
  "year": 1983
}

{
  "_index": "imdb",
  "_type": "movie",
  "_id": "u17o8zy9RcKg6SjQZqQ4Ow",
  "_version": 1,
  "_source": {
    "genre": "Crime",
    "language": "English",
    "country": "USA",
    "runtime": 170,
    "title": "Scarface",
    "year": 1983
  }
}
16

The Basics - Document Structure
• Document Metadata fields
• _id
• _type : mapping type
• _source : enabled/disabled
• _timestamp
• _ttl
• _size : size of uncompressed _source
• _version
17

The Basics - Document Structure
• Mapping:
• ES will auto-map (type) fields
• You can specify mapping, if needed
• Data Types:
• String
• Number
• Int, long, float, double, short, byte

• Boolean
• Datetime
• formatted

• geo_point, geo_shape
• Array
• Nested
• IP
18

A Mapping Type
"imdb": {
  "movie": {
    "properties": {
      "country": {
        "type": "string",
        "store": true,
        "index": false
      },
      "genre": {
        "type": "string",
        "null_value": "na",
        "store": false,
        "index": true
      },
      "year": {
        "type": "long"
      }
    }
  }
}
19

Lucene – Inverted Index
• Which presidential speeches contain the word “fair”?
• Naive approach: go over every speech, word by word, and mark the speeches that contain it
• Fails at large scale
20

Lucene – Inverted Index
• Inverting
• Take all the speeches
• Break them down by word (tokenize)
• For each word, store the IDs of the speeches
• Sort all words (tokens)
• Searching
• Finding the word is fast
• Iterate over document IDs that are referenced
Token    Doc Frequency    Doc IDs
Jobs     2                4,8
Fair     5                1,2,4,8,42
Bush     300              1,2,3,4,5,6, …
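
The inverting and searching steps above can be sketched in a few lines of Python. This is a toy illustration, not how Lucene stores its index: the `tokenize`, `build_inverted_index`, and `search` names and the sample speeches are made up for this example.

```python
import string
from collections import defaultdict


def tokenize(text):
    """Split on whitespace, strip surrounding punctuation, lowercase."""
    tokens = []
    for raw in text.split():
        word = raw.strip(string.punctuation).lower()
        if word:
            tokens.append(word)
    return tokens


def build_inverted_index(docs):
    """Map each token to the sorted list of document IDs that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            postings[token].add(doc_id)
    # Tokens are emitted in sorted order, mirroring the "sort all words" step.
    return {token: sorted(ids) for token, ids in sorted(postings.items())}


def search(index, word):
    """Look the word up directly instead of scanning every document."""
    return index.get(word.lower(), [])


# Hypothetical sample data, not taken from the slides.
speeches = {
    1: "A fair deal for every worker",
    2: "Jobs, jobs, jobs",
    4: "Fair trade creates jobs",
}
index = build_inverted_index(speeches)
print(search(index, "fair"))
print(search(index, "jobs"))
```

The document frequency column in the table corresponds to `len(index[token])` in this sketch.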
21

Lucene – Inverted Index
• Not an algorithm
• Implementations vary
22

Cluster Topology
• 4 Node Cluster
• Index Configuration:
• “A”: 2 Shards, 1 Replica
• “B”: 3 Shards, 1 Replica

[Diagram: the primary and replica copies of shards A1, A2, B1, B2, and B3 spread across the four nodes]
23

Building a Cluster
Start Cluster…
start cmd.exe /C elasticsearch -Des.node.name=Primus
start cmd.exe /C elasticsearch -Des.node.data=true -Des.node.master=false -Des.node.name=Slayer
start cmd.exe /C elasticsearch -Des.node.data=true -Des.node.master=true -Des.node.name=Maiden

Create Index…
curl -XPUT 'http://localhost:9200/imdb/' -d '{
"settings" : {
"index" : {
"number_of_shards" : 3,
"number_of_replicas" : 1
}
}
}'

Index Document…
curl -XPOST 'http://localhost:9200/imdb/movie/' -d '{
"genre": "Comedy",
"language": "English",
"country": "USA",
"runtime": 99,
"title": "Big Trouble in Little China",
"year": 1986
}'
24

Cluster State
• Cluster State
• Node Membership
• Indices Settings and Mappings (Types)
• Shard Allocation Table
• Shard State
• curl -XGET 'http://localhost:9200/_cluster/state?pretty=1'
25

Cluster State
• Changes in State published from Master to other nodes
[Diagram: a PUT /newIndex request reaches the master, node 1 (M); the master moves from cluster state CS1 to CS2 and publishes CS2 to nodes 2 and 3]
26

Discovery
• Nodes discover each other using multicast.
• Unicast is an option
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["host1", "host2:port", "host3"]

• Each cluster has an elected master node
• Beware of split-brain
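
One common guard against split-brain with zen discovery is to require a majority of master-eligible nodes before a master can be elected (the `discovery.zen.minimum_master_nodes` setting). The arithmetic is sketched below; the function names are made up for this example, and the model ignores everything about elections except the node count.

```python
def minimum_master_nodes(master_eligible_nodes):
    """Smallest majority of the master-eligible nodes: N // 2 + 1."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


def can_elect_master(visible_nodes, master_eligible_nodes):
    """A partition may elect a master only if it sees a majority."""
    return visible_nodes >= minimum_master_nodes(master_eligible_nodes)


# With 3 master-eligible nodes split 2 / 1, only the larger side has quorum,
# so two independent masters cannot be elected at the same time.
print(can_elect_master(2, 3), can_elect_master(1, 3))
```

Note that with only 2 master-eligible nodes the majority is 2, so losing either node leaves the cluster without a master; an odd number of master-eligible nodes is the usual recommendation.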
27

The Basics - Shards
• Primary Shard:
• First time Indexing
• Index has 1..N primary shards (default: 5)
• Number of primary shards cannot be changed once the index is created
• Replica Shard:
• Copy of the primary shard
• Can be changed later
• Each primary has 0..N replicas
• HA:
• Promoted to primary if primary fails
• Get/Search handled by primary||replica
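
Why the primary shard count is fixed: a document is routed to a shard by hashing its routing value (the document ID by default) modulo the number of primary shards. The sketch below uses `zlib.crc32` as a stand-in hash, which is an assumption for illustration only and not the hash function ElasticSearch uses; `shard_for` is a hypothetical helper.

```python
import zlib


def shard_for(routing_key, number_of_primary_shards):
    """shard = hash(routing) % number_of_primary_shards (stand-in hash)."""
    h = zlib.crc32(routing_key.encode("utf-8"))
    return h % number_of_primary_shards


doc_ids = [f"doc-{i}" for i in range(100)]

# The same ID always lands on the same shard for a fixed shard count.
placement = {d: shard_for(d, 5) for d in doc_ids}

# Changing the shard count changes where many existing IDs would be looked up,
# which is why the number of primaries cannot change without reindexing.
moved = [d for d in doc_ids if shard_for(d, 5) != shard_for(d, 6)]
print(len(moved), "of", len(doc_ids), "documents would map to a different shard")
```

Replicas do not take part in this calculation, which is consistent with the replica count being adjustable later.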
28

Shard Auto-Allocation
• Add a node - Shards Relocate

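[Diagram: primary shards 0P, 1P and replica shards 0R, 1R spread across Node 1 and Node 2; after a node is added, replica 0R relocates to the new node]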

• Shard Stages
• UNASSIGNED
• INITIALIZING
• STARTED
• RELOCATING
29

The Basics – Searching
• How it works:
• Search request hits a node
• Node broadcasts to every shard in the index
• Each shard performs query
• Each shard returns results
• Results merged, sorted, and returned to client.
• Problems:
• ES has no idea where your document is
• Broadcast query to 100 nodes
• Performance degrades
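
The scatter-gather flow described above can be sketched as follows. This is a simplified model under stated assumptions: shards are plain Python lists, the "score" is a raw term count rather than Lucene's relevance score, and `shard_search` and `search` are hypothetical names.

```python
import heapq


def shard_search(shard_docs, term, size):
    """Each shard scores its own documents and returns its local top hits."""
    hits = []
    for doc in shard_docs:
        score = doc["text"].lower().split().count(term)
        if score > 0:
            hits.append((score, doc["id"]))
    return heapq.nlargest(size, hits)


def search(shards, term, size):
    """Broadcast to every shard, then merge and sort the partial results."""
    per_shard = [shard_search(shard, term, size) for shard in shards]
    merged = heapq.nlargest(size, (hit for hits in per_shard for hit in hits))
    return [doc_id for _score, doc_id in merged]


# Hypothetical two-shard index.
shards = [
    [{"id": "a", "text": "scarface scarface"}, {"id": "b", "text": "star wars"}],
    [{"id": "c", "text": "scarface"}, {"id": "d", "text": "scarface scarface scarface"}],
]
print(search(shards, "scarface", 2))
```

Every shard does work for every query in this model, which is the cost the "Problems" bullets point at: the coordinating node cannot know in advance which shards hold matching documents.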
30

The Basics - Shards
• Shard Allocation Awareness
• cluster.routing.allocation.awareness.attributes: rack_id
• Example:
• 2 Nodes with node.rack_id=rack_one
• Create Index 5 shards / 1 replica (10 shards)
• Add 2 Nodes with node.rack_id=rack_two
• Shards RELOCATE to even distribution
• Primary & Replica will NOT be on the same rack_id value.

• Shard Allocation Filtering
• node.tag=val1
• index.routing.allocation.include.tag:val1,val2
curl -XPUT localhost:9200/newIndex/_settings -d '{
"index.routing.allocation.include.tag" : "val1,val2"
}'
31

Nodes
• Master node handles cluster-wide (Meta-API) events:
• Node participation
• New indices create/delete
• Re-Allocation of shards
• Data Nodes
• Indexing / Searching operations
• Client Nodes
• REST calls
• Light-weight load balancers
32

REST API
• Create Index
• action.auto_create_index: 0
• Index Document
• Dynamic type mapping
• Versioning
• ID specification
• Parent / Child (/1122?parent=1111)
33

REST API – Versioning
• Every document is Versioned
• Version assigned on creation
• Version number can be assigned
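
Versioning is typically used for optimistic concurrency control: a write that names the version it expects is rejected if the document has changed in the meantime. The in-memory store below is a minimal sketch of that idea; `VersionedStore` and `VersionConflict` are hypothetical names, not ElasticSearch classes.

```python
class VersionConflict(Exception):
    """Raised when the caller's expected version is stale."""


class VersionedStore:
    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)

    def index(self, doc_id, source, expected_version=None):
        """Store the document and return its new version number."""
        current_version, _ = self._docs.get(doc_id, (0, None))
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(
                f"expected version {expected_version}, found {current_version}"
            )
        new_version = current_version + 1
        self._docs[doc_id] = (new_version, dict(source))
        return new_version

    def get(self, doc_id):
        return self._docs[doc_id]


store = VersionedStore()
v1 = store.index("1", {"title": "Scarface"})            # version assigned on creation
v2 = store.index("1", {"title": "Scarface (1983)"}, expected_version=v1)
print(v1, v2)
```

A second writer still holding `v1` would now get a `VersionConflict` instead of silently overwriting the newer document.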
34

REST API - Update
• Update using partial data
• Partial doc merged with existing
• Fails if document doesn’t exist
• “Upsert” data used to create a doc, if doesn’t exist
{
"upsert" : {
"title": "Blade Runner"
}
}
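
The update semantics listed above (merge a partial document, fail if the document is missing, fall back to the upsert body) can be sketched with a plain dictionary. This is an illustration only: the `update` helper is hypothetical, and the merge here is shallow, whereas nested objects would need a recursive merge.

```python
def update(store, doc_id, partial_doc, upsert=None):
    """Apply a partial update to store[doc_id], or create it from `upsert`."""
    if doc_id in store:
        # Partial doc merged with the existing one (shallow merge in this sketch).
        store[doc_id] = {**store[doc_id], **partial_doc}
    elif upsert is not None:
        # Document missing: the upsert body becomes the new document.
        store[doc_id] = dict(upsert)
    else:
        raise KeyError(f"document {doc_id!r} does not exist")
    return store[doc_id]


movies = {"3": {"title": "Blade Runner", "year": 1982}}
print(update(movies, "3", {"genre": "Sci-Fi"}))
print(update(movies, "4", {"genre": "Crime"}, upsert={"title": "Scarface"}))
```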
35

REST API
• Exists
• No overhead in loading
• Status Code Result
• Delete
• Get
• Multi-Get

{
  "docs" : [
    {
      "_id" : "1",
      "_index" : "imdb",
      "_type" : "movie"
    },
    {
      "_id" : "5",
      "_index" : "oldmovies",
      "_type" : "movie",
      "_fields" : ["title", "genre"]
    }
  ]
}
36

REST API - Search
• Free Text Search
• URL Request
• http://localhost:9200/imdb/movie/_search?q=scar*
• Complex Query
• http://localhost:9200/imdb/movie/_search?q=scarface+OR+star
• http://localhost:9200/imdb/movie/_search?q=(scarface+OR+star)+AND+year:[1981+TO+1984]
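
When such URLs are built in code rather than typed by hand, the query string should be percent-encoded. A minimal sketch using only the Python standard library follows; the `search_url` helper is hypothetical, while the host, index, and type are the ones used on this slide.

```python
from urllib.parse import urlencode


def search_url(base, index, doc_type, query, **params):
    """Build a URI search request with a safely encoded `q` parameter."""
    query_string = urlencode({"q": query, **params})
    return f"{base}/{index}/{doc_type}/_search?{query_string}"


url = search_url(
    "http://localhost:9200",
    "imdb",
    "movie",
    "(scarface OR star) AND year:[1981 TO 1984]",
)
print(url)
```

Spaces become `+` and the parentheses, colon, and brackets are percent-encoded, so the Lucene query reaches the server unchanged regardless of how the HTTP client treats special characters.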
37

REST API - Search
• Search Types:
• http://localhost:9200/imdb/movie/_search?q=(scarface+OR+star)+AND+year:[1941+TO+1984]&search_type=count
• http://localhost:9200/imdb/movie/_search?q=(scarface+OR+star)+AND+year:[1941+TO+1984]&search_type=query_then_fetch
• Query and Fetch (fastest):
• Executes on all shards and returns results
• Query then Fetch (default):
• Executes on all shards. Only the information needed for ranking/sorting is returned; then only the relevant shards are asked for data
38

REST API – Query DSL
http://localhost:9200/imdb/movie/_search?q=(scarface+OR+star)+AND+year:[1981+TO+1984]

Becomes…
curl -XPOST 'localhost:9200/_search?pretty' -d '{
"query" : {
"bool" : {
"must" : [
{
"query_string" : {
"query" : "scarface OR star"
}
},
{
"range" : {
"year" : { "gte" : 1981, "lte" : 1984 }
}
}
]
}
}
}'
39

REST API – Query DSL
• Query string requests use Lucene query syntax
• Limited
• Instead use “match” query
• Automatically builds a boolean query

curl -XPOST 'localhost:9200/_search?pretty' -d '{
"query" : {
"bool" : {
"must" : [
{
"match" : {
"message" : "scarface star"
}
},
{
"range" : {
"year" : { "gte" : 1981 }
}
}
]
…
40

REST API – Query DSL
• Match Query
{
"match":{
"title":{
"type":"phrase",
"query":"quick fox",
"slop":1
}
}
}

• Boolean Query
• Must: document must match query
• Must_not: document must not match query
• Should: document doesn’t have to match
• If it matches… higher score

{
"bool":{
"must":[
{
"match":{
"color":"blue"
}
},
{
"match":{
"title":"shirt"
}
}
],
"must_not":[
{
"match":{
"size":"xxl"
}
}
],
"should":[
{
"match":{
"textile":"cotton"
}
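
The must / must_not / should semantics can be modelled with a toy evaluator. This is a sketch under loud assumptions: `matches` is a naive word-containment check and the "score" is just a count of matching clauses, not Lucene's relevance scoring. The query mirrors the shirt example on this slide; the sample documents are invented.

```python
def matches(clause, doc):
    """Toy 'match': the field contains the word (case-insensitive)."""
    field, word = next(iter(clause["match"].items()))
    return word.lower() in str(doc.get(field, "")).lower().split()


def bool_score(query, doc):
    """Return None if the document is excluded, else a toy score."""
    clauses = query["bool"]
    if not all(matches(c, doc) for c in clauses.get("must", [])):
        return None  # every must clause has to match
    if any(matches(c, doc) for c in clauses.get("must_not", [])):
        return None  # any must_not match excludes the document
    # should clauses are optional; each match raises the score
    bonus = sum(matches(c, doc) for c in clauses.get("should", []))
    return len(clauses.get("must", [])) + bonus


query = {
    "bool": {
        "must": [{"match": {"color": "blue"}}, {"match": {"title": "shirt"}}],
        "must_not": [{"match": {"size": "xxl"}}],
        "should": [{"match": {"textile": "cotton"}}],
    }
}

cotton = {"color": "blue", "title": "Oxford shirt", "size": "m", "textile": "cotton"}
linen = {"color": "blue", "title": "Oxford shirt", "size": "m", "textile": "linen"}
too_big = {"color": "blue", "title": "Oxford shirt", "size": "xxl", "textile": "cotton"}
print(bool_score(query, cotton), bool_score(query, linen), bool_score(query, too_big))
```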
41

REST API – Query DSL
• Range Query
• Numeric / Date Types
• Prefix/Wildcard Query
• Match on partial terms
• RegExp Query

{
"range":{
"founded_year":{
"gte":1990,
"lt":2000
}
}
}
42

REST API – Query DSL
• Geo_bbox
• Bounding box filter
• Geo_distance
• Geo_distance_range

{
"query":{
"filtered":{
"query":{
"match_all":{
}
},
"filter":{
"geo_bbox":{
"location":{
"top_left":{
"lat":40.73,
"lon":-74.1
},
"bottom_right":{
"lat":40.717,
"lon":-73.99
}

{
"query":{
"filtered":{
"query":{
"match_all":{

}
},
"filter":{
"geo_distance":{
"distance":"400km",
"location":{
"lat":40.73,
"lon":-74.1
}
}

…
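
Conceptually, a geo_distance filter keeps the documents whose point lies within the given distance of a center. One common way to compute that distance is the haversine formula, sketched below; this is an illustration, not a statement of how ElasticSearch computes distances internally, and the helper names and the mean Earth radius are assumptions of the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_distance(location, center, distance_km):
    """True if `location` is no farther than `distance_km` from `center`."""
    d = haversine_km(location["lat"], location["lon"], center["lat"], center["lon"])
    return d <= distance_km


center = {"lat": 40.73, "lon": -74.1}            # the point used on this slide
nearby = {"lat": 40.717, "lon": -73.99}          # a few kilometres away
far_away = {"lat": 51.5, "lon": -0.12}           # hypothetical point in London
print(within_distance(nearby, center, 400), within_distance(far_away, center, 400))
```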
43

REST API – Bulk Operations
• Bulk API
• Minimize round trips with index/delete ops
• Individual response for every request action
• In order

• Failure of one action will not stop subsequent actions.

• localhost:9200/_bulk

{ "delete" : { "_index" : "imdb", "_type" : "movie", "_id" : "2" } }\n
{ "index" : { "_index" : "imdb", "_type" : "actor", "_id" : "1" } }\n
{ "first_name" : "Tony", "last_name" : "Soprano" }\n
...
{ "update" : { "_index" : "imdb", "_type" : "movie", "_id" : "3" } }\n
{ "doc" : { "title" : "Blade Runner" } }\n
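
The bulk body is newline-delimited JSON: one action line, optionally followed by a payload line, with a trailing newline after the last line. A minimal sketch for assembling such a body with the standard library follows; `bulk_body` is a hypothetical helper, and the actions mirror the ones on this slide.

```python
import json


def bulk_body(actions):
    """Serialize (action, metadata, payload) tuples into a bulk request body."""
    lines = []
    for action, meta, payload in actions:
        lines.append(json.dumps({action: meta}))
        if payload is not None:  # delete has no payload line
            lines.append(json.dumps(payload))
    # Every line, including the last one, must end with a newline.
    return "\n".join(lines) + "\n"


body = bulk_body([
    ("delete", {"_index": "imdb", "_type": "movie", "_id": "2"}, None),
    ("index", {"_index": "imdb", "_type": "actor", "_id": "1"},
     {"first_name": "Tony", "last_name": "Soprano"}),
    ("update", {"_index": "imdb", "_type": "movie", "_id": "3"},
     {"doc": {"title": "Blade Runner"}}),
])
print(body, end="")
```

Using `json.dumps` per line keeps each object on a single line, which matters because an embedded newline would split one action into two.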
44

Percolate API
• Reversing Search
• Store queries and filter (percolate) documents through them.
• Useful for Alert/Monitoring systems
curl -XPUT localhost:9200/_percolator/stocks/alert-on-nokia -d '{
"query" : {
"bool" : {
"must" : [
{ "term" : { "company" : "NOK" }},
{ "range" : { "value" : { "lt" : "2.5" }}}
]
}
}
}'

curl -XPUT 'localhost:9200/stocks/stock/1?percolate=*' -d '{
"doc" : {
"company" : "NOK",
"value" : 2.4
}
}'
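
The "reversed search" idea can be shown with a toy in-memory percolator: queries are registered first, and each incoming document is tested against all of them. This sketch uses plain Python predicates in place of stored queries; the `Percolator` class is hypothetical, and only the alert name and the NOK example come from the slide.

```python
class Percolator:
    def __init__(self):
        self._queries = {}  # query name -> predicate over a document

    def register(self, name, predicate):
        """Store a query under a name, before any document arrives."""
        self._queries[name] = predicate

    def percolate(self, doc):
        """Return the names of all stored queries that match the document."""
        return sorted(name for name, pred in self._queries.items() if pred(doc))


percolator = Percolator()
percolator.register(
    "alert-on-nokia",
    lambda doc: doc.get("company") == "NOK" and doc.get("value", float("inf")) < 2.5,
)

print(percolator.percolate({"company": "NOK", "value": 2.4}))
print(percolator.percolate({"company": "NOK", "value": 3.0}))
```

An alerting system would act on the returned query names, for example by notifying whoever registered the matching query.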
45

Clients
• Client list: http://www.elasticsearch.org/guide/clients/
• Java Client, JS, PHP, Perl, Python, Ruby
• Spring Data:
• Uses TransportClient
• Implementation of ElasticsearchRepository aligns with generic Repository interfaces.
• ElasticsearchCrudRepository extends PagingAndSortingRepository
• https://github.com/spring-projects/spring-data-elasticsearch
@Document(indexName = "book", type = "book", indexStoreType = "memory", shards = 1, replicas = 0, refreshInterval = "-1")
public class Book {
…
}
public interface ElasticSearchBookRepository extends ElasticsearchRepository<Book, String> {
}
46

B’what about Mongo?
• Mongo:
• General purpose DB
• ElasticSearch:
• Distributed text search engine

… that’s all I have to say about that.
47

Questions?

ElasticSearch - DevNexus Atlanta - 2014
