ELASTICSEARCH –
SCALABILITY AND
MULTITENANCY
Bozhidar Bozhanov
ABOUT ME
• Founder at LogSentinel, an information security startup
• LogSentinel SIEM – product that indexes billions of logs with Elasticsearch
• https://techblog.bozho.net
• https://twitter.com/bozhobg
SCALABILITY AND MULTITENANCY
• Scalability – how to process millions (billions) of documents on multiple machines
• Multitenancy – how to have our system support multiple users/organizations while
segregating their data
• One can exist without the other
• Both are architectural and implementation tasks, not (just) work for Ops.
• “We’ll push the data in whatever form and Ops will take care of the scaling”
ELASTICSEARCH BASICS
• “You know, for search”
• Indexing documents (document = anything)
• Full-text search and keyword search
• Allows for large clusters
• Licensing issues
USE-CASE: TIME-SERIES DATA
• Indexing events (logs, metrics, etc.)
• Widespread and widely applicable scenario
• Documents almost always have a timestamp
SHARDS
ZOOM-IN
LIMITING FACTORS
• A single shard shouldn’t be too large
• Ideally between 10 and 50 GB; otherwise recovery after a failure may be slow or may not complete
• The number of shards on a node is limited by RAM
• Lucene segments are write-once: data is only appended as new segments, never modified in place
• A large number of segments reduces performance
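
As a rough worked example of the 10-50 GB guideline, the sketch below estimates how many primary shards an index of a given size would need. The function name and the 30 GB default target are illustrative assumptions, not values from the talk.

```python
import math


def primary_shards_needed(index_size_gb: float, target_shard_gb: float = 30.0) -> int:
    """Estimate the primary shard count that keeps each shard near a target size.

    target_shard_gb defaults to an assumed midpoint of the 10-50 GB guideline.
    """
    if index_size_gb < 0 or target_shard_gb <= 0:
        raise ValueError("index size must be >= 0 and target shard size must be > 0")
    # Round up, and always keep at least one shard even for an empty index.
    return max(1, math.ceil(index_size_gb / target_shard_gb))


if __name__ == "__main__":
    # A 200 GB index at roughly 30 GB per shard needs 7 primary shards.
    print(primary_shards_needed(200))
```

This only covers size; the RAM limit on shards per node still has to be checked separately.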
MULTITENANCY
• Cluster-per-tenant
  • Heavy on administration
  • No real multitenancy
  • Expensive
• Index-per-tenant
  • Also heavy on administration
  • Doesn’t scale well
• Tenant-based routing
  • Recommended in most cases
TENANT-BASED ROUTING
• _routing=<tenantId> or _routing=<tenantOwnedResourceId>
• E.g. userId or dataSourceId
• The routing parameter determines which shard stores the document
• Passing the routing value on search requests tells Elasticsearch which shard holds the data, so only that shard is queried => faster search
• shard_num = hash(_routing) % num_primary_shards
• mappings._routing.required: true
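
The routing formula above can be illustrated with a small sketch. It assumes a stand-in hash (`zlib.crc32`); Elasticsearch itself uses a Murmur3 hash, so the actual shard numbers will differ. The function name and tenant IDs are hypothetical.

```python
import zlib


def shard_for(routing: str, num_primary_shards: int) -> int:
    """Return the shard number for a routing value.

    zlib.crc32 is a stand-in hash for illustration only; it is NOT the
    hash Elasticsearch uses, so real shard assignments will differ.
    """
    if num_primary_shards < 1:
        raise ValueError("an index has at least one primary shard")
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


if __name__ == "__main__":
    # Hypothetical tenant IDs; each one always maps to the same shard.
    for tenant_id in ("tenant-a", "tenant-b", "tenant-c"):
        print(tenant_id, "->", shard_for(tenant_id, num_primary_shards=5))
```

Because the result depends on `num_primary_shards`, changing the shard count would send the same tenant to a different shard, which is one reason the shard count of an existing index cannot simply be changed.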
STRUCTURE OF INDEXED DATA
• One field can have only one type
• The type is determined on index creation or when the first document containing that field is indexed
• User1 creates a custom param “duration” of type String
• User2 wants to create “duration” of a numeric type -> error
• Solution: custom parameter hierarchies by type: params, numericParams,
dateParams, …
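
A minimal sketch of the per-type hierarchy idea, assuming the application groups custom parameters before indexing. The group names `params`, `numericParams` and `dateParams` come from the slide; the function name and the handling of booleans are assumptions.

```python
from datetime import date, datetime
from numbers import Number


def split_params_by_type(custom_params: dict) -> dict:
    """Group custom parameters into per-type objects, so that the same
    parameter name can hold a string for one tenant and a number for another."""
    doc = {"params": {}, "numericParams": {}, "dateParams": {}}
    for name, value in custom_params.items():
        if isinstance(value, bool):
            # bool is a subclass of int in Python; store it as a string here (assumption).
            doc["params"][name] = str(value)
        elif isinstance(value, Number):
            doc["numericParams"][name] = value
        elif isinstance(value, (date, datetime)):
            doc["dateParams"][name] = value.isoformat()
        else:
            doc["params"][name] = str(value)
    return doc


if __name__ == "__main__":
    # User1 sends "duration" as text, User2 as a number: no mapping conflict,
    # because the values land in params.duration and numericParams.duration.
    print(split_params_by_type({"duration": "5 minutes"}))
    print(split_params_by_type({"duration": 300}))
```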
SCALABILITY
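• “We add more machines and it’s good”?
• Recommended shard size (10-50 GB)
• The number of primary shards of an existing index can’t be changed in place
• Lucene segments are read-only:
  • Deleting a document = bad (it is only marked as deleted until segments are merged)
  • Updating a document = bad (a delete plus a new insert)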
OPTIONS FOR STRUCTURING INDEXES
• We need a structure to allow indexing and searching in an arbitrarily large amount
of data
• One big, ever-growing index
  • Convenient for small amounts of data, but faces all scalability problems
• Index-per-day / index-per-week / index-per-size
• Index-per-day-per-retention
• Rollover
• Deletion should be done by deleting whole indexes, not individual documents
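
The index-per-day option with whole-index deletion can be sketched as follows. The `prefix-YYYY.MM.DD` naming pattern and both function names are assumptions for illustration; the sketch only computes which indexes are past retention and does not call Elasticsearch.

```python
from datetime import date, datetime, timedelta


def daily_index_name(prefix: str, day: date) -> str:
    """Build a per-day index name such as 'logs-2024.03.05' (assumed pattern)."""
    return f"{prefix}-{day:%Y.%m.%d}"


def expired_indexes(index_names, prefix: str, today: date, retention_days: int):
    """Return the daily indexes older than the retention period.

    Each returned index would be dropped as a whole, instead of deleting
    individual documents from it.
    """
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in index_names:
        if not name.startswith(prefix + "-"):
            continue  # belongs to another index family
        try:
            day = datetime.strptime(name[len(prefix) + 1:], "%Y.%m.%d").date()
        except ValueError:
            continue  # does not follow the daily naming pattern
        if day < cutoff:
            expired.append(name)
    return expired


if __name__ == "__main__":
    today = date(2024, 3, 5)
    names = [daily_index_name("logs", today - timedelta(days=n)) for n in range(5)]
    print(expired_indexes(names, "logs", today, retention_days=2))
```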
MANY INDEXES FOR SEARCH, ONE FOR
INDEXING
• One search query can be directed to many indexes based on an index alias
• Supporting one (or several) active indexes for ingesting documents
• All other indexes are read-only
• This solves the problem with:
  • Growing data and growing size of shards
  • Deleting old data
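
One way to express this setup is an alias that covers all indexes for search while marking a single index as the write target. The sketch below builds a request body for the Elasticsearch `_aliases` API with `is_write_index`; the function name and the index names are hypothetical, and the body is only printed, not sent.

```python
import json


def alias_actions(alias: str, indexes: list, write_index: str) -> dict:
    """Build an `_aliases` request body: every index is searchable through
    the alias, and exactly one of them accepts writes."""
    if write_index not in indexes:
        raise ValueError("write_index must be one of the aliased indexes")
    return {
        "actions": [
            {"add": {"index": name, "alias": alias, "is_write_index": name == write_index}}
            for name in indexes
        ]
    }


if __name__ == "__main__":
    # Hypothetical index names; the newest one receives new documents.
    body = alias_actions("logs", ["logs-000001", "logs-000002"], write_index="logs-000002")
    print(json.dumps(body, indent=2))
```

An alias accepts only one write index at a time, so the "several active indexes" variant would need more than one alias or explicit index names on ingestion.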
EFFECTIVE INDEXING
• In real time (problem: too many requests to Elasticsearch)
• Storing in a database and indexing with a batch job
• Message queue (complex to implement) (we use Kafka)
• In-memory queue (might lose data)
• Batch-indexing when a given size or time threshold is reached
• Hybrid: bulk processing + database
  • Quick indexing with in-memory queue + subsequent check based on the data in the database
• Avoid updates (an update is a delete plus an insert)
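
The "batch-indexing when a given size or time threshold is reached" idea can be sketched as an in-memory buffer. This is a minimal, single-threaded illustration: the class name, the default thresholds and the `flush` callback (which would send one bulk request in a real system) are all assumptions, and, as the slide warns, buffered documents are lost if the process dies.

```python
import time
from typing import Callable, List, Optional


class BatchBuffer:
    """Collect documents and hand them to `flush` in batches once a size
    or age threshold is reached. Not thread-safe."""

    def __init__(self, flush: Callable[[List[dict]], None], max_size: int = 500,
                 max_age_seconds: float = 5.0,
                 clock: Callable[[], float] = time.monotonic):
        self._flush = flush
        self._max_size = max_size
        self._max_age = max_age_seconds
        self._clock = clock
        self._items: List[dict] = []
        self._oldest: Optional[float] = None  # arrival time of the first buffered doc

    def add(self, doc: dict) -> None:
        if not self._items:
            self._oldest = self._clock()
        self._items.append(doc)
        self.flush_if_due()

    def flush_if_due(self) -> None:
        """Flush if either threshold is hit; call this from a timer as well,
        otherwise the age threshold is only checked when documents arrive."""
        if not self._items:
            return
        too_big = len(self._items) >= self._max_size
        too_old = self._clock() - self._oldest >= self._max_age
        if too_big or too_old:
            self.flush()

    def flush(self) -> None:
        if self._items:
            batch, self._items = self._items, []
            self._oldest = None
            self._flush(batch)


if __name__ == "__main__":
    buffer = BatchBuffer(lambda batch: print("bulk-indexing", len(batch), "docs"), max_size=2)
    for n in range(3):
        buffer.add({"n": n})
    buffer.flush()  # flush the remainder on shutdown
```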
CONCLUSION
• Elasticsearch is easy to get running
• …and complex for scaling
• Changes to a production setup are hard
• We must not hand scalability and multitenancy tasks off to the Ops teams – they are
application problems
• Elasticsearch internals impose unintuitive limitations (“The law of leaky
abstractions”)
THANK YOU
Contacts: https://www.linkedin.com/in/bozhidar-bozhanov/
https://techblog.bozho.net
https://twitter.com/bozhobg
RESOURCES
• https://www.elastic.co/guide/en/elasticsearch/reference/current/size-your-shards.html
• https://techblog.bozho.net/elasticsearch-multitenancy-with-routing/
• https://techblog.bozho.net/near-real-time-indexing-with-elasticsearch/
• https://www.elastic.co/guide/en/elasticsearch/reference/master/tune-for-indexing-speed.html
• https://www.loggly.com/blog/nine-tips-configuring-elasticsearch-for-high-performance/
• https://tech.ebayinc.com/engineering/elasticsearch-performance-tuning-practice-at-ebay/
