ElasticSearch - index server used as a document database

Elasticsearch - index server
used as a document database
!
(with examples)
!
Robert Lujo, 2014

about me
software
professionally 17 y.
freelancer
more info -> linkedin

Elasticsearch
search server based on Apache Lucene
distributed, multitenant-capable
full-text search engine
RESTful web interface
schema-free JSON documents
NoSQL capabilities
https://en.wikipedia.org/wiki/Elasticsearch

Elasticsearch
first release in February 2010
until now raised total funding > $100M
latest release 1.3 & 1.4 beta
+ Logstash+ Kibana => ELK stack
Apache 2 Open Source License

Very popular
and used by
!
!
!
!
!
… Wikimedia, Mozilla, Stack Exchange, Quora, CERN …!
!

Professional services also available

Features
Sys: !
real time data, distributed, multi-tenancy, real time
analytics, high availability
Dev:!
restful api, document oriented, schema free, full text
search, per-operation persistence, conflict management
http://www.elasticsearch.org/overview/elasticsearch/

Install, run …
prerequisite: JDK - Java (Lucene remember?)
wget https://download.elasticsearch.org/.../
elasticsearch-1.3.4.zip
unzip elasticsearch-1.3.4.zip
elasticsearch-1.3.4/bin/elasticsearch

& use!
# curl localhost:9200
{
"status" : 200,
"name" : "The Night Man",
"version" : {
"number" : "1.3.4",
"build_hash" : “…”,
"build_timestamp" : “…”,
"build_snapshot" : false,
"lucene_version" : "4.9"
},
"tagline" : "You Know, for Search"
}

create index
& put some data
# curl -XPUT localhost:9200/mainindex/company/1
-d '{
"name" : "CoolComp Ltd.",
"employees" : 10,
"founded" : "2014-10-05",
"services" : ["software", "consulting"],
"management": [
{"role" : "CEO",
"name" : "Petar Petrovich"},
{"name" : "Ivan Ivić"}
],
"updated" : "2014-10-05T22:31:55"
}’
=>
{"_index":"mainindex","_type":"company","_id":"1","_ver
sion":4,"created":false}

fetch document by id
(key/value database)
# curl -XGET localhost:9200/mainindex/company/1
!
=>
!
{“_index":"mainindex",
“_type":"company",
“_id”:"1","_version" : 4, ”found”:true,
"_source":{
"employees" : 10,
…
}}

search documents
# curl -XGET 'http://localhost:9200/maindex/
_search?q=management.name:petar' # no type!
{“took”:128,"timed_out":false,"_shards":{"total":
5,"successful":5,"failed":0},
“hits”:{
"total":1,
"max_score":0.15342641,
“hits” : [
{“_index":"mainindex","_type":"company",
“_id":"1",
“_score":0.15342641,
"_source":{
…
"updated" : "2014-10-05T22:31:55"

Database is …
an organized (or structured) collection of data
!
Database management system (DBMS) is …!
software system provides interface between users and database(s)
4 common groups of interactions:
1. Data definition
2. Update - CrUD
3. Retrieval - cRud
4. Administration

Elasticsearch is a database?
1. Data definition
2. Update - CrUD
3. Retrieval - cRud
4. Administration

Data representation
- document-oriented-database
Document-oriented-database - “NoSql branch”? Not really but …
Document is … blah blah blah … something like this:
!
{
“_id” : 1,
“_type” : “company”,
"employees" : 10,
"founded" : "2014-10-05",
"services" : ["software", "consulting"],
"management": [
{"role" : "CEO",
"name" : "Petar Petrovich"},
{"name" : "Ivan Ivić"}
],
"updated" : "2014-10-05T22:31:55"
}

Data representation
- relational databases
company: id name employees founded
-- --------------- --------- ------------
1 'CoolComp Ltd.' 10 '2014-10-05'
!
services: id value
--- ----------------
1 'software'
2 'consulting'
!
company_services: id id_comp id_serv
--- ------- --------
1 1 1
2 1 2
!
person: id name
-- -----------------
1 'Petar Petrovich'
2 'Ivan Ivić'
comp_management: id id_comp id_pers role
--- ------- ------- -----
1 1 1 CEO
2 1 2 MEM

Data definition
Elasticsearch is “schemaless”
But it provides defining schema - mappings
Very important when setting up for search:
• data types - string, integer, float, date/tst,
boolean, binary, array, object, nested, geo,
attachment
• search analysers, boosting, etc.

Data definition
- compared to RDBMS
But we loose some things what RDBMS offers:
• data validation / integrity
• removing data redundancy - normalization
• “fine grained” structure definition
• standard and common usage (SQL)

Retrieval
We had this example before:!
!
# curl -XGET 'http://localhost:9200/maindex/
_search?q=management.name:petar' # no type!
!
equivalent SQL query:!
!
select *
from company
where exists(
select 1
from comp_management cm
inner join peron p
on p.id=cm.id_pers
where lower(p.name) like '%peter%');

Retrieval - ES-QDSL
based on my experience, I would rather use ES:
• for searches: full text, fuzzy, multi field, multi
document types, multi indexes/databases
• in programming - better to convert/deal with
JSON than with ORM/raw SQL results
• single web page applications

Retrieval - SQL
on the other hand, I would rather use SQL and
RDBMS:
• when composing complex query - easier to
do with SQL
• for data exploring/researching
!
SQL is much more expressive DSL

Joining & denormalization
object hierarchy … must be denormalized.
increases retrieval performance (since no query joining is
necessary),
uses more space
makes keeping things consistent and up-to-date more difficult
They’re excellent for write-once-read-many-workloads
https://www.found.no/foundation/elasticsearch-as-nosql/

Joining options
ES has several ways to “join” objects/documents/types:
1. embedding objects
2. “nested” objects
3. parent / child relation between types
4. compose manual query
When fetching by id - very handy (1 & 2).
When quering - not so handy.

Updating - CrUD !
Elasticsearch
I would rather use Elasticsearch:
• when creating, updating and deleting single
nested document

Updating - CrUD!
RDBMS
on the other hand, RDBS I found handy:
• for flat entities/documents
• for mass objects manipulation
• transactions & integrity (ACID)

Administration
install, configure, maintenance, monitoring, scaling
… quite satisfing!
!
OS specific install - apt-get, yum, zypper, brew, …
!
plugins installation
./bin/plugin -i Elasticsearch/marvel/latest

Elasticsearch as Database
!
!
!
!
!
to avoid maintenance and development time overhead

Hybrid solution
Elasticsearch + …

ES + … - hybrid solution
So why can you use ElasticSearch as a single point
of truth (SPOT)?
Elasticsearch … used in addition to another
database.
A database system for constraints, correctness and
robustness, transactionally updatable,
master record which is then asynchronously
pushed/pulled to Elasticsearch

Elasticsearch rivers
besides classic indexing - rivers provide alternative way for inserting
data into ES
service that fetches the data from an external source (one shot or
periodically) and puts the data into the cluster
Besides listed on official site:
• RDBMS/JDBC
• MongoDB
• Redis
• Couchbase
• …

Use-case - RDBMS & Elasticsearch
• Indexing & reindexing subdocuments is major
job
• upsert mode
• issues - not indexing, memory hungry, full
reindex when new field/subdoc
• building AST when building a query - quite
demanding
• satisfied with the final result!

Riak & Solr
September 16, 2014 - With 2.0, we have added distributed Solr to Riak Search. For every
instance of Riak, there is an instance of Solr. While this drastically improves full-text search, it also
improves Riak’s overall functionality. Riak Search now allows for Riak to act as a document store
(instead of key/value) if needed.
Despite being a part of Riak, Riak Search is a separate Erlang application. It monitors for changes
to data in Riak and propagates those changes to indexes managed by Solr.

Couchbase and Elasticsearch
integrates Couchbase Server and Elasticsearch,
by streaming data in real-time from Couchbase to Elasticsearch.
combined solution … with full-text search, indexing and querying and real-time
analytics … content store or aggregation of data from different data sources.
Couchbase Server provides easy scalability, low-latency document access,
indexing and querying of JSON documents and real-time analytics with
incremental map reduce.

MongoDB and Elasticsearch
“addition of Elasticsearch
represents only a first step
in its mission to enable
developers to choose the
database that's right for
their needs”
“big weakness of
MongoDB is the free text
search, which MongoDB
tried to address in version
2.4 in some aspects.”

Not to forget good old school
…

Elasticsearch use when …
you need very good, reliable, handy, web oriented
search index engine
you have intensive read and document oriented
application
“write” balance - depending on how much - ES as
a NoSQL only or as a hybrid solution

Summary
no silver bullet, “the right tool for the job”
learn & get familiar with different solutions and
choose optimal one
be objective & productive
General trend are heterogenous => lot of
integration tasks lately
learn new things & have fun!

Thank you for your patience!
Questions?
!
robert.lujo@gmail.com
@trebor74hr

ElasticSearch - index server used as a document database

More Related Content

What's hot (19)

Viewers also liked (20)

Similar to ElasticSearch - index server used as a document database (20)

More from Robert Lujo (6)

Recently uploaded (20)

ElasticSearch - index server used as a document database