Growing with ElasticSearch
Devi A S L @ RootConf
11th May, 2018
About me
● Over a decade of experience in building software
● Lead developer/Architect at PowerToFly
Our journey with ElasticSearch
2014: launched with Postgres Full text search
2015: Faceted Search with ES v1.4
2016: Log monitoring system with ELK 2.3
2017: Analytics pipeline with ELK 5.5
Search for a search engine
Feature | Postgres v9.3 | Sphinx v2.1 | Solr v4.x | ElasticSearch v1.4
Full text search | ✓ | ✓ | ✓ | ✓
Support for facets | ❌ | ✓ | ✓ | ✓
Cluster ready | ❌ | ❌ | Limited | ✓
Search in PDFs | ❌ | ❌ | ✓ | ✓
REST APIs | ❌ | ❌ | ❌ | ✓
Nested docs, Parent-Child relations | ❌ | NA | Limited | ✓
Powerful and Flexible Query DSL | ❌ | NA | ❌ | ✓
The goodness of ElasticSearch
A distributed, multitenant-capable, full-text search engine.
● Built upon battle-tested Lucene
● Powerful and flexible Query DSL
● Powerful Aggregations
● REST APIs for everything
● Ease with nested documents and parent-child relationships
● Suitable ecosystem for data pipelines
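As a rough illustration of how the Query DSL and aggregations combine into a faceted search, here is a minimal sketch. The field names (`title`, `location`, `skills`) and the helper `build_faceted_query` are hypothetical and not from the talk. The function only builds the JSON body that would be sent to `POST /<index>/_search`, and it assumes the facet fields are mapped for exact-value aggregation (for example as `keyword`).

```python
import json


def build_faceted_query(text, location=None, facet_fields=("location", "skills"), size=10):
    """Build a search body: full-text match, optional filter, one terms facet per field."""
    bool_query = {"must": [{"match": {"title": text}}]}
    if location is not None:
        # Filter clauses restrict the result set without affecting relevance scores.
        bool_query["filter"] = [{"term": {"location": location}}]
    return {
        "size": size,
        "query": {"bool": bool_query},
        # Each terms aggregation returns the most frequent values with document counts.
        "aggs": {f"by_{field}": {"terms": {"field": field}} for field in facet_fields},
    }


if __name__ == "__main__":
    print(json.dumps(build_faceted_query("python developer", location="remote"), indent=2))
```

The counts returned under each `by_<field>` aggregation are what a UI would render as facet checkboxes next to the hits.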
What sits where?
[Architecture diagram. Labels: Internet; Search Service; ES cluster; Periodic Indexing job; Postgres DB (primary datastore for core data); jobs, candidates data]
Log monitoring with ELK
Log monitoring: From a third-party solution to ELK based
[Pipeline diagram. Labels: web & worker nodes with filebeat; logstash; ElasticSearch cluster; Daily indices; Dashboards on Kibana; logs; AWS S3]
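Daily indices make it cheap to retire old log data one index at a time. The sketch below assumes the `logstash-YYYY.MM.dd` naming that Logstash's Elasticsearch output uses by default; the talk does not say which naming scheme was used, and both helper names are hypothetical.

```python
from datetime import date, timedelta


def daily_index_name(day, prefix="logstash"):
    """Return the index name for a given day, e.g. logstash-2018.05.11."""
    return f"{prefix}-{day:%Y.%m.%d}"


def indices_older_than(today, keep_days, history_days, prefix="logstash"):
    """List daily index names older than keep_days, looking back history_days."""
    return [
        daily_index_name(today - timedelta(days=age), prefix)
        for age in range(keep_days + 1, history_days + 1)
    ]


if __name__ == "__main__":
    # Candidates for closing or deleting when only the last 7 days are kept.
    for name in indices_older_than(date(2018, 5, 11), keep_days=7, history_days=10):
        print(name)
```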
Analytics pipeline with ELK stack
[Pipeline diagram. Labels: Web Application; web nodes with filebeat; logstash; User activity; ElasticSearch cluster; Daily indices; Kibana Dashboards; Recommendation engine]
Handling growth
Search performance tuning
● Enable the slow query log; its thresholds can be customized per index
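The search slow log is controlled by dynamic index settings, so it can be switched on per index without a restart. A minimal sketch follows; the threshold values and the index name `jobs` are illustrative assumptions, and `slowlog_request` is a hypothetical helper that only describes the HTTP request rather than sending it.

```python
import json


def slowlog_settings(query_warn="5s", query_info="1s", fetch_warn="1s"):
    """Return index settings that log searches slower than the given thresholds."""
    return {
        "index.search.slowlog.threshold.query.warn": query_warn,
        "index.search.slowlog.threshold.query.info": query_info,
        "index.search.slowlog.threshold.fetch.warn": fetch_warn,
    }


def slowlog_request(index, **thresholds):
    """Describe the request as (method, path, body) for any HTTP client."""
    return ("PUT", f"/{index}/_settings", slowlog_settings(**thresholds))


if __name__ == "__main__":
    method, path, body = slowlog_request("jobs", query_warn="2s")
    print(method, path)
    print(json.dumps(body, indent=2))
```

Starting with generous thresholds and tightening them gradually keeps the log readable on a busy index.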
Document modelling
● Avoid nested documents, if you can
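One way to avoid a `nested` field is to denormalize the inner objects into parallel arrays at indexing time. The sketch below uses a hypothetical candidate document with an `experiences` list; none of these field names come from the talk. The trade-off is that the pairing between `company` and `title` within one experience is lost, so this only works when queries never need to match both on the same inner object.

```python
def flatten_experiences(candidate):
    """Replace a list of experience objects with parallel arrays of plain values."""
    flat = {key: value for key, value in candidate.items() if key != "experiences"}
    experiences = candidate.get("experiences", [])
    flat["companies"] = [exp["company"] for exp in experiences]
    flat["titles"] = [exp["title"] for exp in experiences]
    return flat


if __name__ == "__main__":
    doc = {
        "name": "A. Candidate",
        "experiences": [
            {"company": "Acme", "title": "Engineer"},
            {"company": "Globex", "title": "Architect"},
        ],
    }
    print(flatten_experiences(doc))
```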
Use scroll API where applicable
● Deep pagination is costly with the search API
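A minimal sketch of walking a large result set with the scroll API instead of `from`/`size` paging. The `send(method, path, body)` callable is an assumption: it stands in for whatever HTTP client is used and must return the decoded JSON response. The index name and query are up to the caller.

```python
def scroll_all(send, index, query, page_size=1000, keep_alive="1m"):
    """Yield every hit for `query`, one scroll page at a time."""
    # Sorting by _doc skips scoring and is the cheapest order for a full scan.
    body = {"size": page_size, "query": query, "sort": ["_doc"]}
    response = send("POST", f"/{index}/_search?scroll={keep_alive}", body)
    scroll_id = response.get("_scroll_id")
    try:
        while response["hits"]["hits"]:
            yield from response["hits"]["hits"]
            response = send(
                "POST",
                "/_search/scroll",
                {"scroll": keep_alive, "scroll_id": scroll_id},
            )
            # The scroll id may change between pages, so always use the latest one.
            scroll_id = response.get("_scroll_id", scroll_id)
    finally:
        # Release the search context on the cluster once iteration stops.
        if scroll_id is not None:
            send("DELETE", "/_search/scroll", {"scroll_id": scroll_id})
```

Scroll suits batch jobs such as exports or reindexing; for user-facing result pages, limiting how deep users can page is usually the simpler fix.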
Manage your indexes
● POST /unused_index/_close
● POST /index_with_more_segments/_forcemerge
● Use _rollover API to let hot/recent indexes use best servers
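The index management calls above can be sketched as small request builders. The alias name `logs_write`, the node attribute `box_type` and all helper names are hypothetical; the endpoints and the `index.routing.allocation.require.*` setting are standard Elasticsearch features. Pinning the current write index to better hardware through a custom node attribute is one common reading of "let hot/recent indexes use best servers", not necessarily what the speaker did.

```python
def close_index(index):
    """Close an index that is no longer queried to free cluster resources."""
    return ("POST", f"/{index}/_close", None)


def force_merge(index, max_num_segments=1):
    """Merge segments; best reserved for indexes that no longer receive writes."""
    return ("POST", f"/{index}/_forcemerge?max_num_segments={max_num_segments}", None)


def rollover(alias, max_age="1d", max_docs=None):
    """Roll the write alias over to a new index once a condition is met."""
    conditions = {"max_age": max_age}
    if max_docs is not None:
        conditions["max_docs"] = max_docs
    return ("POST", f"/{alias}/_rollover", {"conditions": conditions})


def require_node_attribute(index, attribute, value):
    """Restrict an index's shards to nodes tagged with a custom attribute."""
    setting = f"index.routing.allocation.require.{attribute}"
    return ("PUT", f"/{index}/_settings", {setting: value})


if __name__ == "__main__":
    for request in (
        close_index("unused_index"),
        force_merge("index_with_more_segments"),
        rollover("logs_write", max_age="1d", max_docs=50_000_000),
        require_node_attribute("logs-000002", "box_type", "hot"),
    ):
        print(request)
```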
Index performance tuning
● Disable indexing, storing, norms and _source for fields where you don't need them
● Use the smallest numeric data type that fits, or map the field as keyword
● Optimize the number of primary shards
● Use bulk requests, and optimize their size
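A small sketch of two of these points: trimming the mapping and batching writes. The field names and the chunk size are made up for illustration; the mapping parameters (`index`, `norms`, `keyword`, `byte`) are written as they appear in Elasticsearch 5.x and later. The bulk action line shown here omits `_type`, which 5.x and 6.x still require either in the action line or in the URL path.

```python
import json

# Hypothetical fields; only the mapping parameters are real Elasticsearch options.
PROPERTIES = {
    "title": {"type": "text", "norms": False},          # skip length normalisation data
    "internal_note": {"type": "text", "index": False},  # kept in _source, not searchable
    "status_code": {"type": "keyword"},                 # numeric-looking value, exact match only
    "years_experience": {"type": "byte"},               # smallest integer type covering the range
}


def bulk_bodies(index, docs, chunk_size=500):
    """Yield newline-delimited JSON bodies for POST /_bulk, chunk_size docs each.

    `docs` is a list of (doc_id, source_dict) pairs.
    """
    for start in range(0, len(docs), chunk_size):
        lines = []
        for doc_id, source in docs[start:start + chunk_size]:
            lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
            lines.append(json.dumps(source))
        # The bulk API requires the body to end with a newline.
        yield "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = [(i, {"title": f"job {i}"}) for i in range(3)]
    for body in bulk_bodies("jobs", sample, chunk_size=2):
        print(body)
```

The right chunk size depends on document size and cluster capacity, so it is worth measuring a few values rather than trusting a single default.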
Summary
● Elastic stack is growing and improving - see if it fits your needs
● Defaults are good only to start - know what they are and tune them
● Different indexes for different data
● Understand your needs and model your documents well
Thank You!
@asldevi
