Apache®, Apache Druid®, Druid®, and the Druid logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries.
peter.marshall@imply.io
20 years in Enterprise Architecture
CRM, EDRM, ERP, EIP, Digital Services,
Security, BI, RI, and MDM
BA Theology (!) and Computer Studies
TOGAF certified
Book collector & A/V buyer
Prime Timeline = proper timeline
#werk
petermarshall.io
What the collaborations do
Some principles
Each collaboration
Wrap-up
Questions!
Query
Distributed execution of SQL / Druid
Native queries on the cluster
Ingestion
Ingestion tasks that bring data into Druid
from storage and delivery services
Distribution
Replication and distribution of the
ingested data according to rules
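As a rough illustration of the Query collaboration, the sketch below builds the HTTP request a client could send to a Broker's SQL endpoint (`/druid/v2/sql`). The Broker address and the `example_events` datasource are placeholders that do not come from the slides, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_sql_request(broker_url: str, sql: str) -> urllib.request.Request:
    """Build (but do not send) a Druid SQL request addressed to a Broker."""
    body = json.dumps({"query": sql, "resultFormat": "object"}).encode("utf-8")
    return urllib.request.Request(
        url=broker_url.rstrip("/") + "/druid/v2/sql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical Broker address and datasource name, for illustration only.
request = build_sql_request(
    "http://broker.example:8082",
    "SELECT COUNT(*) AS events FROM example_events",
)
print(request.get_method(), request.full_url)
# Sending it would look like: urllib.request.urlopen(request).read()
```

The Broker is the process that accepts the query and fans it out across the cluster, so a client only needs this one endpoint.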
 Apache Druid®: A Dance of Distributed Processes
● A job to do
● Compute to do the ingestion
● A place to store optimised data
● A question to answer
● Data to process!
● Compute to answer queries
● Somewhere to put the data that’s
near to the query process
● Some rules to follow
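The "rules to follow" in the list above can be sketched as a retention rule chain of the kind the Coordinator applies. The rule types `loadByPeriod` and `dropForever` and the tier name `_default_tier` follow Druid's documented rule format; the one-month period, the replica count, and the `example_events` datasource are assumptions chosen only for illustration.

```python
import json


def load_by_period(period: str, replicants: dict) -> dict:
    """Keep segments newer than `period` loaded, with replicas per tier."""
    return {"type": "loadByPeriod", "period": period, "tieredReplicants": replicants}


def drop_forever() -> dict:
    """Drop every segment that no earlier rule matched."""
    return {"type": "dropForever"}


# Rules are evaluated in order and the first matching rule applies, so the
# catch-all drop rule goes last.
rules = [
    load_by_period("P1M", {"_default_tier": 2}),  # two replicas of the last month
    drop_forever(),
]

# Hypothetical datasource name; the rule list would be POSTed to this
# Coordinator path.
rules_path = "/druid/coordinator/v1/rules/example_events"
print(rules_path)
print(json.dumps(rules, indent=2))
```

With this chain, recent segments are replicated across Historical processes and older ones are no longer loaded for querying.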
https://www.archimatetool.com/
[Diagrams: Zookeeper, Overlord, Middle Manager (Indexer), Deep Store, Metadata Store. Labels: Ingest Spec; Task Log; sources S3, GCS, ABS, HDFS, HTTP, Local, MySql, Postgres, Druid, Kafka, Kinesis. Middle Manager (Indexer) annotations: columnarise, index & encode, time-shard.]
[Diagrams: the same components plus Broker (SQL and Native) and Historical.]
[Diagrams: the same components plus Coordinator and additional Historical processes.]
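The Ingest Spec handed to the Overlord in the diagrams above can be sketched as a native batch task payload. This is a minimal example under stated assumptions, not the spec used in the talk: the datasource name, the input URI, the `timestamp` column, and the dimension names are all hypothetical, and the HTTP input source is just one of the sources listed on the slide.

```python
import json


def build_batch_task(datasource: str, uris: list, dimensions: list) -> dict:
    """Return a native batch (index_parallel) task payload as a dict."""
    return {
        "type": "index_parallel",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                "timestampSpec": {"column": "timestamp", "format": "iso"},
                "dimensionsSpec": {"dimensions": dimensions},
                # segmentGranularity controls how the data is sharded by time.
                "granularitySpec": {
                    "segmentGranularity": "day",
                    "queryGranularity": "none",
                    "rollup": False,
                },
            },
            "ioConfig": {
                "type": "index_parallel",
                "inputSource": {"type": "http", "uris": uris},
                "inputFormat": {"type": "json"},
            },
            "tuningConfig": {"type": "index_parallel"},
        },
    }


# Hypothetical datasource, URI, and dimensions, for illustration only.
task = build_batch_task(
    "example_events",
    ["https://example.com/events.json"],
    ["user", "page"],
)
# A client would POST this JSON to the Overlord at /druid/indexer/v1/task.
print(json.dumps(task, indent=2))
```

The Overlord assigns the resulting task to a Middle Manager (Indexer), which builds the segments and publishes them to Deep Store.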
Zookeeper
Coordinator
Overlord
Broker
http://druid.apache.org
Imply Distribution
https://imply.io/get-started
@druidio
Add Apache Druid as a skill
Apache Distribution
https://github.com/apache/druid
ASF Slack
#druid
Druid Community
https://druid.apache.org/community/
Meetup Groups
https://www.meetup.com/pro/apache-druid/
Google Groups Druid User Forum
https://groups.google.com/
