Battle of the Giants
Rafał Kuć – Sematext Group, Inc.
@kucrafal @sematext sematext.com
I am a…
Sematext consultant & engineer
Solr Cookbook series author
„ElasticSearch Server” author
„Mastering ElasticSearch” author
Solr.pl co-founder
Father and husband 
Copyright 2013 Sematext Group. Inc. All rights reserved
Under the Hood
Lucene 4.3
Expectations
Scalability
Fault tolerance
High availability
Features
Manageability
Ease of installation
Tools
Support
Expectations vs Reality
Only ElasticSearch nodes
Single leader
Solr + ZooKeeper
Leader per shard
Distributed
Fault tolerant
Automatic leader election
All Time Top Committers
Active Contributors
The Code
The Mailing Lists
Trends
Collection vs Index
Collections and indices can be spread among different nodes in the cluster
Collection (Apache Solr) – main logical index
Index (ElasticSearch) – main logical structure
Apache Solr Index Structure
Fields and types defined in schema
Automatic value copying
Dynamic fields
Custom similarity
Custom postings format
Multiple document types require shared schema
Can be read using API
ElasticSearch Index Structure
Schema-less
Fields and types defined with HTTP API
Multi-field support
Nested and parent-child documents
Custom similarity
Custom postings format
Multiple documents with different structures
Can be read and written using API
Shards and Replicas
Many shards
0 or more replicas
Replica can become leader
Replicas can be created on a live cluster
Configuration
Static in solrconfig.xml
Can be reloaded with core reload
Static in elasticsearch.yml
Changeable at runtime
Discovery
Zen Discovery
Apache ZooKeeper
Solr & ZooKeeper
Requires additional software
Prevents split-brain situations
Holds collection configurations
ZooKeeper ensemble needed
ElasticSearch Zen Discovery
Automatic node discovery
Multicast and unicast discovery methods
Automatic master detection
Two-way failure detection
HTTP FTW
HTTP REST API in ElasticSearch or Query String for simple queries
HTTP with Query String in Apache Solr
Both provide specialized Java API
Results Grouping
Group on:
field value
query result
function query
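As a rough illustration of the three grouping modes on the Solr side, the sketch below builds a query string from Solr's result grouping parameters (group, group.field, group.query, group.func). The helper name grouping_params and the field name category are made up for the example, and nothing is sent to a server.

```python
from urllib.parse import urlencode

def grouping_params(q, field=None, query=None, func=None):
    """Build a Solr query string with result grouping (hypothetical helper)."""
    params = [("q", q), ("group", "true")]
    if field:
        params.append(("group.field", field))  # group on field value
    if query:
        params.append(("group.query", query))  # group on query result
    if func:
        params.append(("group.func", func))    # group on function query
    return urlencode(params)

print(grouping_params("solr", field="category"))
```

Appending the result to a select URL of a running Solr instance should return one group per distinct category value, assuming such a field exists in the schema.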
Prospective Search
Called Percolator
Matches documents to stored queries
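The idea of prospective search can be sketched in a few lines: queries are stored first, and each incoming document is matched against them. This is a toy model only, not the Percolator API; the query names and the "all terms must occur" matching rule are assumptions made for the example.

```python
def percolate(doc_text, stored_queries):
    """Return names of stored queries whose terms all occur in the document."""
    doc_terms = set(doc_text.lower().split())
    return sorted(
        name for name, terms in stored_queries.items()
        if set(terms) <= doc_terms
    )

# Queries are registered up front; documents arrive later.
stored_queries = {
    "solr-news": ["solr"],
    "es-percolator": ["elasticsearch", "percolator"],
}

print(percolate("ElasticSearch ships a percolator", stored_queries))
```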
Full Text Search Capabilities
Variety of queries
Control score calculation
Different query parsers
Advanced Lucene queries
Score Calculation
Leverage Lucene scoring
Control importance of:
documents
queries
terms
phrases
Similarity configuration
Apache Solr and Score Influence
Index-time boosting
Query-time
Term boosts
Field boosts
Phrase boosts
Function queries
Sub-queries used for boosting
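One way to picture the query-time options above is as parameters of Solr's extended DisMax parser. The sketch below only assembles a query string; the field names (title, body, popularity, category) and boost values are assumptions for the example, not taken from the slides.

```python
from urllib.parse import urlencode

# Query-time boosts expressed as edismax parameters (illustrative values).
params = {
    "defType": "edismax",
    "q": "lucene^2 search",     # term boost on "lucene"
    "qf": "title^3 body",       # field boosts
    "pf": "title^5",            # phrase boost when the terms appear together
    "bf": "log(popularity)",    # function query added to the score
    "bq": "category:books^2",   # sub-query used only for boosting
}
query_string = urlencode(params)
print(query_string)
```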
ElasticSearch and Score Influence
Index-time
Query-time
Different queries provide different boost controls
Can calculate distributed term frequencies
Negative and Positive boosting queries
Custom score filters
Scripts
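As an example of negative and positive boosting, a boosting query body might look like the sketch below. The field name title, the terms, and the 0.2 factor are assumptions; the dictionary is only serialized to JSON and not sent anywhere.

```python
import json

# Documents matching "negative" stay in the results but are demoted.
body = {
    "query": {
        "boosting": {
            "positive": {"term": {"title": "elasticsearch"}},
            "negative": {"term": {"title": "deprecated"}},
            "negative_boost": 0.2,
        }
    }
}
print(json.dumps(body, indent=2))
```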
ElasticSearch Query Rescore
Reorders top N hits using another query
Executed on shards before results are returned to the node handling the request
Not executed with scan and count
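The rescoring idea can be modelled as follows: only the top window of hits is scored a second time and reordered, while the rest keep their order. This is a conceptual sketch on a single hit list, not ElasticSearch code; the weighted sum and the parameter names are assumptions loosely modelled on the rescore options.

```python
def rescore(hits, window_size, second_score, query_weight=1.0, rescore_weight=1.0):
    """Reorder only the top `window_size` hits using a second scoring function.

    `hits` is a list of (doc_id, score) pairs sorted by score, descending.
    """
    window, rest = hits[:window_size], hits[window_size:]
    rescored = [
        (doc_id, query_weight * score + rescore_weight * second_score(doc_id))
        for doc_id, score in window
    ]
    rescored.sort(key=lambda hit: hit[1], reverse=True)
    return rescored + rest

hits = [("a", 3.0), ("b", 2.0), ("c", 1.0)]
phrase_bonus = {"b": 5.0}  # hypothetical scores from the second query
print(rescore(hits, 2, lambda doc_id: phrase_bonus.get(doc_id, 0.0)))
```

Because the second query runs only on the window, an expensive query (a phrase query, for instance) touches a handful of documents instead of the whole result set.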
ElasticSearch Nested Objects
Indexed as separate documents
Stored in the same part of index as root doc
Hidden from standard queries and filters
Need appropriate queries and filters (nested)
Top-level documents can be sorted on the basis of nested ones
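Why nested objects are indexed as separate documents can be shown with a small model. When an array of objects is flattened into per-field value lists, the pairing between values of one object is lost; a nested-style match checks each object on its own. The document, field names, and helper functions below are invented for the illustration.

```python
book = {
    "title": "ElasticSearch Server",
    "reviews": [
        {"author": "john", "stars": 2},
        {"author": "anna", "stars": 5},
    ],
}

def flattened_match(doc, author, min_stars):
    """Object array flattened into value lists: author/stars pairs are lost."""
    authors = [r["author"] for r in doc["reviews"]]
    stars = [r["stars"] for r in doc["reviews"]]
    return author in authors and any(s >= min_stars for s in stars)

def nested_match(doc, author, min_stars):
    """Each nested object is checked on its own, like a separate document."""
    return any(
        r["author"] == author and r["stars"] >= min_stars
        for r in doc["reviews"]
    )

print(flattened_match(book, "john", 4))  # matches across two different reviews
print(nested_match(book, "john", 4))     # john's own review has only 2 stars
```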
Solr Parent-Child Relationship
Used at query time
Multi-core joins possible
select?q={!join from=parent to=id}color:Yellow
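What the join in the query above does can be emulated on plain dictionaries: find the documents matching color:Yellow, collect their parent values, and return the documents whose id is one of those values. The sample documents and the helper name join are assumptions for the sketch; Solr itself does this on index terms.

```python
def join(docs, from_field, to_field, matches):
    """Emulate {!join from=... to=...}: follow from_field values of the
    matching docs to the docs whose to_field holds one of those values."""
    keys = {d[from_field] for d in docs if from_field in d and matches(d)}
    return [d for d in docs if d.get(to_field) in keys]

docs = [
    {"id": "1", "name": "T-shirt"},
    {"id": "2", "name": "Jeans"},
    {"id": "10", "parent": "1", "color": "Yellow"},
    {"id": "11", "parent": "2", "color": "Blue"},
]

parents = join(docs, "parent", "id", lambda d: d.get("color") == "Yellow")
print([d["name"] for d in parents])
```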
ElasticSearch Parent-Child
Proper indexing required
Indexed as separate documents
Standard queries don’t return child documents
Retrieve parent docs using queries and filters (has_child, has_parent, top_children)
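A sketch of what "proper indexing" and a has_child query could look like as request bodies, assuming ElasticSearch of that era: the child type declares its parent type in the mapping, and the query returns parents that have a matching child. The type names book and review and the field stars are made up; the bodies are only built, not sent.

```python
import json

# Mapping for the child type: it must name its parent type.
child_mapping = {"review": {"_parent": {"type": "book"}}}

# Query that returns parent (book) documents with at least one 5-star review.
has_child_query = {
    "query": {
        "has_child": {
            "type": "review",
            "query": {"term": {"stars": 5}},
        }
    }
}

print(json.dumps(child_mapping))
print(json.dumps(has_child_query))
```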
Filters
Used to narrow down query results
Good candidates for caching and reuse
Apache Solr:
Additive
Can use different query parsers
Can use local params
Narrows down faceting results
ElasticSearch:
Defined using Query DSL
Can be used for score calculation
Doesn’t narrow down faceting results
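Why filters are good candidates for caching, and what "additive" means, can be sketched with sets of document ids: each filter is computed once, cached under its name, and intersected with the hits. The documents, filter names, and cache layout are assumptions for the illustration and do not mirror either server's internals.

```python
docs = {
    1: {"category": "books", "in_stock": True},
    2: {"category": "books", "in_stock": False},
    3: {"category": "music", "in_stock": True},
}
filter_cache = {}

def filter_set(name, predicate):
    """Compute the matching doc ids once and reuse them on later requests."""
    if name not in filter_cache:
        filter_cache[name] = frozenset(
            doc_id for doc_id, doc in docs.items() if predicate(doc)
        )
    return filter_cache[name]

def apply_filters(hits, filters):
    """Additive filters: each one narrows the hit list further."""
    allowed = set(hits)
    for name, predicate in filters:
        allowed &= filter_set(name, predicate)
    return [doc_id for doc_id in hits if doc_id in allowed]

filters = [
    ("category:books", lambda d: d["category"] == "books"),
    ("in_stock:true", lambda d: d["in_stock"]),
]
print(apply_filters([3, 1, 2], filters))
```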
Faceting
Terms
Range & query
Terms statistics
Spatial distance
Pivot
Histograms
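Two of the facet types listed above, terms and range, boil down to counting. The sketch below computes them over a small in-memory list; the documents and function names are invented, and real engines compute these counts from the index rather than from raw documents.

```python
from collections import Counter

docs = [
    {"category": "books", "price": 12},
    {"category": "books", "price": 40},
    {"category": "music", "price": 8},
]

def terms_facet(docs, field):
    """Count documents per distinct field value."""
    return Counter(d[field] for d in docs if field in d)

def range_facet(docs, field, start, end, gap):
    """Count documents per [lower, lower + gap) bucket between start and end."""
    buckets = {lower: 0 for lower in range(start, end, gap)}
    for d in docs:
        value = d.get(field)
        if value is not None and start <= value < end:
            buckets[start + (value - start) // gap * gap] += 1
    return buckets

print(terms_facet(docs, "category"))
print(range_facet(docs, "price", 0, 50, 25))
```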
Real Time Or Not ?
Get not yet indexed docs from transaction log
Don’t need searcher reopening
ElasticSearch: separate Get and Multi Get API
Apache Solr: separate Realtime Get Handler
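The realtime get behaviour can be pictured with a toy index that keeps a transaction log next to the searchable view. A document is retrievable by id immediately, yet it shows up in search only after the searcher is reopened. The class and method names are hypothetical and the model ignores durability, segments, and versions.

```python
class ToyIndex:
    def __init__(self):
        self.searchable = {}  # docs visible to the current searcher
        self.tlog = {}        # docs so far written to the transaction log only

    def add(self, doc_id, doc):
        self.tlog[doc_id] = doc

    def search_ids(self):
        return sorted(self.searchable)

    def realtime_get(self, doc_id):
        # Look in the transaction log first, then in the searchable view.
        if doc_id in self.tlog:
            return self.tlog[doc_id]
        return self.searchable.get(doc_id)

    def reopen_searcher(self):
        self.searchable.update(self.tlog)
        self.tlog.clear()

index = ToyIndex()
index.add("12345", {"enabled": True})
print(index.search_ids())           # not searchable yet
print(index.realtime_get("12345"))  # but already retrievable by id
```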
Data Handling
Single and batch indexing supported
ElasticSearch: JSON in / JSON out (and YAML)
Apache Solr: different formats allowed (XML, JSON, CSV, binary)
Partial Document Updates
Not based on LUCENE-3837
Server-side doc reindexing
Both servers use versioning
Decreases network traffic
Apache Solr Partial Doc Update
Sent to the standard update handler
Requires _version_ field
curl 'localhost:8983/solr/update?commit=true' \
  -H 'Content-type:application/json' -d '[ {
  "id" : "12345",
  "enabled" : {
    "set" : true
  }
} ]'
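What "server-side doc reindexing" means for a request like the one above can be sketched as a merge: the server loads the stored document, applies the modifiers, and indexes the result as a new version. The function below is a simplified model, not Solr's code; it handles only the set and inc modifiers, and the version counter is a stand-in for the real _version_ values.

```python
def apply_atomic_update(stored_doc, update):
    """Merge an atomic-update request into the stored document."""
    doc = dict(stored_doc)
    for field, value in update.items():
        if field == "id" or not isinstance(value, dict):
            continue
        if "set" in value:
            doc[field] = value["set"]                      # replace the value
        elif "inc" in value:
            doc[field] = doc.get(field, 0) + value["inc"]  # numeric increment
    # Simplified: real _version_ values are assigned by Solr, not incremented.
    doc["_version_"] = doc.get("_version_", 0) + 1
    return doc

stored = {"id": "12345", "name": "Test", "enabled": False, "_version_": 1}
update = {"id": "12345", "enabled": {"set": True}}
print(apply_atomic_update(stored, update))
```

Only the changed field travels over the network; the untouched name field is carried over on the server, which is the traffic saving the previous slide refers to.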
ElasticSearch Partial Doc Update
Special endpoint exposed: _update
Supports parameters like routing, parent, replication, percolate, etc. (similar to Index API)
Uses scripts to perform document updates
curl -XPOST 'localhost:9200/sematext/test/12345/_update' -d '{
"script" : "ctx._source.enabled = enabled",
"params" : {
"enabled" : true
}
}'
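For readers who prefer code over curl, the same request can be assembled with Python's standard library. The sketch only builds the request object; the helper name update_request and the default host are assumptions, and actually sending it would need a running node of a version that accepts this script syntax.

```python
import json
from urllib.request import Request

def update_request(index, doc_type, doc_id, script, params, host="localhost:9200"):
    """Build (but do not send) a scripted partial-update request."""
    body = json.dumps({"script": script, "params": params}).encode("utf-8")
    return Request(
        f"http://{host}/{index}/{doc_type}/{doc_id}/_update",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = update_request(
    "sematext", "test", "12345",
    "ctx._source.enabled = enabled", {"enabled": True},
)
print(req.get_method(), req.full_url)
```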
Solr Collections API
Collection
creation
reload
deletion
shard splitting
ElasticSearch Indices REST API
Index
creation
deletion
closing and opening
refreshing
existence checking
Apache Solr Shard Splitting
admin/collections?action=SPLITSHARD&collection=collection1&shard=shard1
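The call above can be built programmatically as in the sketch below. The base URL is an assumption (a default local Solr), and the helper name split_shard_url is hypothetical; the parameters are the ones shown on the slide.

```python
from urllib.parse import urlencode

def split_shard_url(collection, shard, base="http://localhost:8983/solr"):
    """Build the Collections API URL that asks Solr to split one shard."""
    query = urlencode(
        {"action": "SPLITSHARD", "collection": collection, "shard": shard}
    )
    return f"{base}/admin/collections?{query}"

print(split_shard_url("collection1", "shard1"))
```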
Cluster State Monitoring
Apache Solr: multiple MBeans exposed by JMX
ElasticSearch: multiple REST endpoints exposed to get different statistics
ElasticSearch Statistics API
Health and state check
Nodes information
Cache statistics
Segments information
Index information
Mappings information
SPM – „One to rule them all”
ElasticSearch Cluster Settings Update
Control
rebalancing
recovery
allocation
Change cluster configuration properties
ElasticSearch Custom Shard Allocation
Cluster level:
curl -XPUT localhost:9200/_cluster/settings -d '{
  "persistent" : {
    "cluster.routing.allocation.exclude._ip" : "192.168.2.1"
  }
}'
Index level:
curl -XPUT localhost:9200/sematext/_settings/ -d '{
  "index.routing.allocation.include.tag" : "nodeOne,nodeTwo"
}'
Moving Shards and Replicas
Move shards between nodes on demand
curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
  "commands" : [
    {"move" : {"index" : "sematext", "shard" : 0, "from_node" : "node1", "to_node" : "node2"}},
    {"allocate" : {"index" : "sematext", "shard" : 1, "node" : "node3"}}
  ]
}'
The Verdict
And The Winner Is ?
We Are Hiring !
Dig Search ?
Dig Analytics ?
Dig Big Data ?
Dig Performance ?
Dig working with and in open-source ?
We’re hiring worldwide !
http://sematext.com/about/jobs.html
Rafał Kuć
@kucrafal
rafal.kuc@sematext.com
Sematext
@sematext
http://sematext.com
http://blog.sematext.com
ElasticSearch Server 25% off:
MREESS25
Thank You !

Battle of the Giants Round 2 - Apache Solr vs. Elasticsearch
