Elasticsearch - index server
used as a document database
(with examples)
Robert Lujo, 2014
about me 
software 
professionally 17 y. 
freelancer 
more info -> linkedin
Elasticsearch 
search server based on Apache Lucene 
distributed, multitenant-capable 
full-text search engine 
RESTful web interface 
schema-free JSON documents 
NoSQL capabilities 
https://en.wikipedia.org/wiki/Elasticsearch
Elasticsearch 
first release in February 2010 
until now raised total funding > $100M 
latest release 1.3 & 1.4 beta 
+ Logstash + Kibana => ELK stack
Apache 2 Open Source License
Very popular
and used by
… Wikimedia, Mozilla, Stack Exchange, Quora, CERN …
Professional services also available
What about docs?
Features 
Sys:
real time data, distributed, multi-tenancy, real time
analytics, high availability
Dev:
RESTful API, document oriented, schema free, full text
search, per-operation persistence, conflict management
http://www.elasticsearch.org/overview/elasticsearch/
Install, run … 
prerequisite: JDK - Java (Lucene, remember?)
wget https://download.elasticsearch.org/.../elasticsearch-1.3.4.zip
unzip elasticsearch-1.3.4.zip
elasticsearch-1.3.4/bin/elasticsearch
& use! 
# curl localhost:9200
{
  "status" : 200,
  "name" : "The Night Man",
  "version" : {
    "number" : "1.3.4",
    "build_hash" : "…",
    "build_timestamp" : "…",
    "build_snapshot" : false,
    "lucene_version" : "4.9"
  },
  "tagline" : "You Know, for Search"
}
create index 
& put some data 
# curl -XPUT localhost:9200/mainindex/company/1 -d '{
  "name" : "CoolComp Ltd.",
  "employees" : 10,
  "founded" : "2014-10-05",
  "services" : ["software", "consulting"],
  "management": [
    {"role" : "CEO",
     "name" : "Petar Petrovich"},
    {"name" : "Ivan Ivić"}
  ],
  "updated" : "2014-10-05T22:31:55"
}'
=>
{"_index":"mainindex","_type":"company","_id":"1","_version":4,"created":false}
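The same indexing call can be issued from a program. The following is a minimal sketch in Python using only the standard library; build_index_request is a hypothetical helper name (not part of any Elasticsearch client), and the request is only built, not sent, so no running node is assumed.

```python
import json
import urllib.request


def build_index_request(index, doc_type, doc_id, document,
                        host="http://localhost:9200"):
    """Build (but do not send) PUT /<index>/<type>/<id> with a JSON body."""
    url = "{}/{}/{}/{}".format(host, index, doc_type, doc_id)
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(url, data=body, method="PUT")


# Same document as in the curl example.
company = {
    "name": "CoolComp Ltd.",
    "employees": 10,
    "founded": "2014-10-05",
    "services": ["software", "consulting"],
    "management": [
        {"role": "CEO", "name": "Petar Petrovich"},
        {"name": "Ivan Ivić"},
    ],
    "updated": "2014-10-05T22:31:55",
}

request = build_index_request("mainindex", "company", 1, company)
print(request.get_method(), request.full_url)
# Sending it needs a running node: urllib.request.urlopen(request)
```

Keeping the request construction separate from the network call makes it easy to inspect exactly what would be sent.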
fetch document by id 
(key/value database) 
# curl -XGET localhost:9200/mainindex/company/1
=>
{"_index":"mainindex",
 "_type":"company",
 "_id":"1","_version" : 4, "found":true,
 "_source":{
   "name" : "CoolComp Ltd.",
   "employees" : 10,
   …
}}
search documents 
# curl -XGET 'http://localhost:9200/mainindex/_search?q=management.name:petar' # no type!
{"took":128,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},
 "hits":{
   "total":1,
   "max_score":0.15342641,
   "hits" : [
     {"_index":"mainindex","_type":"company",
      "_id":"1",
      "_score":0.15342641,
      "_source":{
        "name" : "CoolComp Ltd.",
        …
        "updated" : "2014-10-05T22:31:55"}}]}}
Database is … 
an organized (or structured) collection of data
Database management system (DBMS) is …
a software system that provides an interface between users and database(s)
4 common groups of interactions: 
1. Data definition 
2. Update - CrUD 
3. Retrieval - cRud 
4. Administration
Is Elasticsearch a database?
1. Data definition 
2. Update - CrUD 
3. Retrieval - cRud 
4. Administration
Data representation 
- document-oriented database
Document-oriented database - “NoSQL branch”? Not really but …
Document is … blah blah blah … something like this:
{
  "_id" : 1,
  "_type" : "company",
  "name" : "CoolComp Ltd.",
  "employees" : 10,
  "founded" : "2014-10-05",
  "services" : ["software", "consulting"],
  "management": [
    {"role" : "CEO",
     "name" : "Petar Petrovich"},
    {"name" : "Ivan Ivić"}
  ],
  "updated" : "2014-10-05T22:31:55"
}
Data representation 
- relational databases 
company:
  id  name             employees  founded
  --  ---------------  ---------  ------------
  1   'CoolComp Ltd.'  10         '2014-10-05'

services:
  id  value
  --  ------------
  1   'software'
  2   'consulting'

company_services:
  id  id_comp  id_serv
  --  -------  -------
  1   1        1
  2   1        2

person:
  id  name
  --  -----------------
  1   'Petar Petrovich'
  2   'Ivan Ivić'

comp_management:
  id  id_comp  id_pers  role
  --  -------  -------  ----
  1   1        1        CEO
  2   1        2        MEM
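To make the relation between the two representations concrete, here is a sketch that loads the five tables above into an in-memory SQLite database and assembles the company document from them. The function name company_document and the choice of SQLite are assumptions made for illustration only. Unlike the JSON document shown earlier, the second manager carries the role 'MEM' here, because that is what the relational rows contain.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table company (id integer primary key, name text, employees integer, founded text);
create table services (id integer primary key, value text);
create table company_services (id integer primary key, id_comp integer, id_serv integer);
create table person (id integer primary key, name text);
create table comp_management (id integer primary key, id_comp integer, id_pers integer, role text);
insert into company values (1, 'CoolComp Ltd.', 10, '2014-10-05');
insert into services values (1, 'software');
insert into services values (2, 'consulting');
insert into company_services values (1, 1, 1);
insert into company_services values (2, 1, 2);
insert into person values (1, 'Petar Petrovich');
insert into person values (2, 'Ivan Ivić');
insert into comp_management values (1, 1, 1, 'CEO');
insert into comp_management values (2, 1, 2, 'MEM');
""")


def company_document(conn, company_id):
    """Denormalize one company and its related rows into a single dict."""
    name, employees, founded = conn.execute(
        "select name, employees, founded from company where id = ?",
        (company_id,)).fetchone()
    services = [row[0] for row in conn.execute(
        "select s.value from company_services cs "
        "join services s on s.id = cs.id_serv "
        "where cs.id_comp = ? order by cs.id", (company_id,))]
    management = [{"role": role, "name": person} for role, person in conn.execute(
        "select cm.role, p.name from comp_management cm "
        "join person p on p.id = cm.id_pers "
        "where cm.id_comp = ? order by cm.id", (company_id,))]
    return {"name": name, "employees": employees, "founded": founded,
            "services": services, "management": management}


doc = company_document(conn, 1)
print(doc)
```

Three queries and two joins collapse into one self-contained document, which is the denormalization trade-off discussed later in the deck.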
Data definition 
Elasticsearch is “schemaless” 
But it lets you define a schema - mappings
Very important when setting up for search:
• data types - string, integer, float, date/timestamp,
boolean, binary, array, object, nested, geo,
attachment
• search analysers, boosting, etc.
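As an illustration, a mapping for the company document could look roughly like the sketch below. It assumes the 1.x mapping format of the release used in these slides (type names such as "string" changed in later versions), and the field choices, including marking management as nested and role as not_analyzed, are assumptions rather than the author's actual setup. In 1.x such a body would typically be sent with PUT to /mainindex/_mapping/company.

```python
import json

# Hypothetical mapping for the "company" type, written in the 1.x format.
# Arrays need no special type: "services" is simply a multi-valued string.
company_mapping = {
    "company": {
        "properties": {
            "name": {"type": "string"},
            "employees": {"type": "integer"},
            "founded": {"type": "date"},
            "services": {"type": "string"},
            "management": {
                "type": "nested",
                "properties": {
                    # not_analyzed keeps the role as one exact term
                    "role": {"type": "string", "index": "not_analyzed"},
                    "name": {"type": "string"},
                },
            },
            "updated": {"type": "date"},
        }
    }
}

print(json.dumps(company_mapping, indent=2))
```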
Data definition 
- compared to RDBMS 
But we lose some things that an RDBMS offers:
• data validation / integrity 
• removing data redundancy - normalization 
• “fine grained” structure definition 
• standard and common usage (SQL)
Retrieval 
We had this example before:
# curl -XGET 'http://localhost:9200/mainindex/_search?q=management.name:petar' # no type!
equivalent SQL query:
select *
from company c
where exists(
  select 1
  from comp_management cm
  inner join person p
  on p.id=cm.id_pers
  where cm.id_comp=c.id
  and lower(p.name) like '%petar%');
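The query-string search above also has a request-body form in the Elasticsearch query DSL. The sketch below builds both variants with the Python standard library; the helper names and the HOST constant are assumptions, and nothing is sent over the network.

```python
import json
import urllib.parse
import urllib.request

HOST = "http://localhost:9200"  # assumed local node, as in the slides


def query_string_url(index, field, text):
    """URL for the ?q=field:text form used in the curl example."""
    params = urllib.parse.urlencode({"q": "{}:{}".format(field, text)})
    return "{}/{}/_search?{}".format(HOST, index, params)


def match_search_request(index, field, text):
    """Same search expressed as a query DSL 'match' query in the body."""
    body = {"query": {"match": {field: text}}}
    return urllib.request.Request(
        "{}/{}/_search".format(HOST, index),
        data=json.dumps(body).encode("utf-8"),
        method="POST",
    )


url = query_string_url("mainindex", "management.name", "petar")
request = match_search_request("mainindex", "management.name", "petar")
print(url)
print(request.data.decode("utf-8"))
```

The body form is the one that scales to more complex searches, since clauses can be combined and nested as JSON.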
Retrieval - ES-QDSL 
based on my experience, I would rather use ES: 
• for searches: full text, fuzzy, multi-field, multiple
document types, multiple indexes/databases
• in programming - easier to convert/deal with
JSON than with ORM/raw SQL results
• single-page web applications
Retrieval - SQL 
on the other hand, I would rather use SQL and 
RDBMS: 
• when composing a complex query - easier to
do with SQL
• for data exploring/researching
SQL is a much more expressive DSL
Joining & denormalization 
object hierarchy … must be denormalized. 
increases retrieval performance (since no query joining is 
necessary), 
uses more space 
makes keeping things consistent and up-to-date more difficult 
They’re excellent for write-once-read-many-workloads 
https://www.found.no/foundation/elasticsearch-as-nosql/
Joining options 
ES has several ways to “join” objects/documents/types: 
1. embedding objects 
2. “nested” objects 
3. parent / child relation between types 
4. compose manual query 
When fetching by id - very handy (1 & 2).
When querying - not so handy.
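One reason querying is less handy with plain embedded objects is that an array of objects is indexed as flat, multi-valued fields, so the link between values of the same object is lost, while "nested" objects are matched one object at a time. The sketch below only simulates that difference in plain Python; it does not call Elasticsearch, the function names are hypothetical, and substring matching stands in for real term matching.

```python
def flatten_embedded(doc, field):
    """Mimic embedded-object indexing: one multi-valued field per key."""
    flat = {}
    for obj in doc[field]:
        for key, value in obj.items():
            flat.setdefault("{}.{}".format(field, key), []).append(value.lower())
    return flat


def matches_embedded(doc, field, **criteria):
    """Each criterion may be satisfied by a different object in the array."""
    flat = flatten_embedded(doc, field)
    return all(
        any(term in value for value in flat.get("{}.{}".format(field, key), []))
        for key, term in criteria.items())


def matches_nested(doc, field, **criteria):
    """All criteria must be satisfied by the same object."""
    return any(
        all(term in obj.get(key, "").lower() for key, term in criteria.items())
        for obj in doc[field])


company = {"management": [
    {"role": "CEO", "name": "Petar Petrovich"},
    {"name": "Ivan Ivić"},
]}

# "Is Ivan the CEO?" The flattened form says yes, the nested form says no.
print(matches_embedded(company, "management", role="ceo", name="ivan"))
print(matches_nested(company, "management", role="ceo", name="ivan"))
```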
Updating - CrUD
Elasticsearch
I would rather use Elasticsearch:
• when creating, updating and deleting a single
nested document
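A sketch of such an update, assuming a read-modify-write cycle: take the response of a fetch by id, change the nested part in memory, and write the whole document back with the version that was read, so that a concurrent change is rejected as a version conflict instead of being overwritten. The helper name build_versioned_put and the added manager "Ana Anić" are hypothetical; the request is built but not sent.

```python
import copy
import json
import urllib.request


def build_versioned_put(get_response, change, host="http://localhost:9200"):
    """Apply `change` to a copy of _source and build a PUT pinned to _version."""
    document = copy.deepcopy(get_response["_source"])
    change(document)
    url = "{}/{}/{}/{}?version={}".format(
        host, get_response["_index"], get_response["_type"],
        get_response["_id"], get_response["_version"])
    return urllib.request.Request(
        url, data=json.dumps(document).encode("utf-8"), method="PUT")


# Shape of the response shown in the "fetch document by id" slide.
fetched = {
    "_index": "mainindex", "_type": "company", "_id": "1",
    "_version": 4, "found": True,
    "_source": {
        "name": "CoolComp Ltd.",
        "management": [
            {"role": "CEO", "name": "Petar Petrovich"},
            {"name": "Ivan Ivić"},
        ],
    },
}


def add_manager(document):
    # Hypothetical new manager, for illustration only.
    document["management"].append({"role": "CTO", "name": "Ana Anić"})


request = build_versioned_put(fetched, add_manager)
print(request.full_url)
```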
Updating - CrUD
RDBMS
on the other hand, I found an RDBMS handy:
• for flat entities/documents
• for mass object manipulation
• transactions & integrity (ACID)
Administration 
install, configure, maintenance, monitoring, scaling
… quite satisfying!
OS specific install - apt-get, yum, zypper, brew, …
plugins installation
./bin/plugin -i elasticsearch/marvel/latest
Administration - tools
Elasticsearch as Database 
to avoid maintenance and development time overhead
Hybrid solution 
Elasticsearch + …
ES + … - hybrid solution 
So why can you use Elasticsearch as a single point
of truth (SPOT)?
Elasticsearch … used in addition to another 
database. 
A database system for constraints, correctness and 
robustness, transactionally updatable, 
master record which is then asynchronously 
pushed/pulled to Elasticsearch
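A minimal sketch of the "push" side of such a hybrid setup, under stated assumptions: the master record lives in a relational table (SQLite here, purely for illustration), and rows changed since the last sync are turned into a payload in the newline-delimited format of the Elasticsearch bulk API. The function name bulk_payload, the table layout and the second sample row are hypothetical; the payload is only built, not posted to /_bulk.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table company (id integer primary key, name text, employees integer, updated text);
insert into company values (1, 'CoolComp Ltd.', 10, '2014-10-05T22:31:55');
insert into company values (2, 'OldComp Ltd.', 3, '2014-01-01T00:00:00');
""")


def bulk_payload(conn, changed_since, index="mainindex", doc_type="company"):
    """Build bulk 'index' actions for rows updated after `changed_since`."""
    lines = []
    rows = conn.execute(
        "select id, name, employees, updated from company "
        "where updated > ? order by id", (changed_since,))
    for row_id, name, employees, updated in rows:
        action = {"index": {"_index": index, "_type": doc_type, "_id": str(row_id)}}
        source = {"name": name, "employees": employees, "updated": updated}
        lines.append(json.dumps(action))   # action line
        lines.append(json.dumps(source))   # document line
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n" if lines else ""


payload = bulk_payload(conn, "2014-06-01T00:00:00")
print(payload)
```

Because timestamps are stored as ISO 8601 strings, a plain string comparison is enough to select the changed rows.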
Hybrid solution
Elasticsearch rivers 
besides classic indexing, rivers provide an alternative way of inserting
data into ES
a river is a service that fetches data from an external source (one shot or
periodically) and puts the data into the cluster
Among those listed on the official site:
• RDBMS/JDBC 
• MongoDB 
• Redis 
• Couchbase 
• …
Use-case - RDBMS & Elasticsearch 
• Indexing & reindexing subdocuments is a major
job
• upsert mode
• issues - not indexing, memory hungry, full
reindex when a new field/subdoc is added
• building an AST when building a query - quite
demanding
• satisfied with the final result!
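To give a feel for the "AST to query" step, here is a small sketch that turns a tuple-based expression tree into an Elasticsearch query DSL body. The tuple layout and the function name to_query_dsl are assumptions for illustration; the actual project described here is not shown in the slides and may have worked quite differently.

```python
def to_query_dsl(node):
    """Translate ("and"|"or", child, ...), ("match", field, text) or
    ("range", field, bounds) tuples into a query DSL dict."""
    op = node[0]
    if op == "and":
        return {"bool": {"must": [to_query_dsl(child) for child in node[1:]]}}
    if op == "or":
        return {"bool": {"should": [to_query_dsl(child) for child in node[1:]]}}
    if op == "match":
        _, field, text = node
        return {"match": {field: text}}
    if op == "range":
        _, field, bounds = node
        return {"range": {field: bounds}}
    raise ValueError("unknown node type: {!r}".format(op))


# Companies managed by someone named petar, with at least 5 employees.
ast = ("and",
       ("match", "management.name", "petar"),
       ("range", "employees", {"gte": 5}))

body = {"query": to_query_dsl(ast)}
print(body)
```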
What about others?
Riak & Solr 
September 16, 2014 - With 2.0, we have added distributed Solr to Riak Search. For every 
instance of Riak, there is an instance of Solr. While this drastically improves full-text search, it also 
improves Riak’s overall functionality. Riak Search now allows for Riak to act as a document store 
(instead of key/value) if needed. 
Despite being a part of Riak, Riak Search is a separate Erlang application. It monitors for changes 
to data in Riak and propagates those changes to indexes managed by Solr.
Couchbase and Elasticsearch 
integrates Couchbase Server and Elasticsearch, 
by streaming data in real-time from Couchbase to Elasticsearch. 
combined solution … with full-text search, indexing and querying and real-time 
analytics … content store or aggregation of data from different data sources. 
Couchbase Server provides easy scalability, low-latency document access, 
indexing and querying of JSON documents and real-time analytics with 
incremental map reduce.
MongoDB and Elasticsearch 
“addition of Elasticsearch 
represents only a first step 
in its mission to enable 
developers to choose the 
database that's right for 
their needs” 
“big weakness of 
MongoDB is the free text 
search, which MongoDB 
tried to address in version 
2.4 in some aspects.”
Not to forget good old school 
…
RDBMS with FTS
Use Elasticsearch when …
you need a very good, reliable, handy, web-oriented
search index engine
you have a read-intensive, document-oriented
application
“write” balance - depending on how much - ES as
a NoSQL only or as a hybrid solution
Summary 
no silver bullet, “the right tool for the job”
learn & get familiar with different solutions and
choose the optimal one
be objective & productive
the general trend is heterogeneous systems => lots of
integration tasks lately
learn new things & have fun!
Thank you for your patience! 
Questions? 
robert.lujo@gmail.com 
@trebor74hr
