SlideShare a Scribd company logo
@allenxwang
Multi-cluster, Multi-tenant and
Hierarchical Kafka Messaging Service
Allen Wang
Growing Pains for A Kafka Cluster
● A few brokers, handful topics, tens of partitions
○ Wonderful!
● Tens of brokers, tens of topics, hundreds of
partitions
○ Life is good!
● A hundred brokers, a hundred topics, thousands of
partitions
○ … OK
● Hundreds of brokers, hundreds of topics, one
hundred thousand partitions
○ ???
Why Huge Kafka Cluster Does Not Work
● Significant time increase on operations
○ Rolling binary update
■ Three minutes per broker, 500 brokers = 1 whole day
○ Rolling AMI (image) update with data copying
■ One hour per broker, 500 brokers = 20 days
● Increased latency due to number of partitions
○ https://www.confluent.io/blog/how-to-choose-the-number
-of-topicspartitions-in-a-kafka-cluster/
● Vulnerability to ZK/Controller failures
Scaling and Data Balancing Challenge
● The problem with partition reassignment
○ Time consuming
○ Replication traffic taking bandwidth
○ Complexity of bin packing for data balancing
The Consumer Fan-out Problem
BytesOut = (numberOfConsumers + replicationFactor - 1) ✕ BytesIn
● A single cluster may easily fit for bytes in, but not
necessarily for bytes out
Solve Consumer Fan-out with Hierarchies
Inevitability of Multi-cluster
The Idea
● Create many small and mostly “immutable”
clusters
● Organize them in a topology with routing service
connecting the clusters
Multi-Cluster Kafka Service At Netflix
Router
(w/ simple ETL)
Fronting
Kafka
Event
Producer
Consumer
Kafka
Management
HTTP
PROXY
Consumers
Multi-cluster Producers
● Support producing to multiple clusters at the same
time
● High level producer API implemented by multiple
embedded Kafka producers
public interface KsProducer<V> {
// ...
<T extends V> CompletableFuture<SendResult> send(T obj)
}
● Dynamic topic to cluster mapping
○ Enabled by NetflixOSS/Archaius
"t1, t2" : {
"where" : [{
"sink" : "fronting-kafka-1"
}]
},
"t3" : {
"where" : [{
"sink" : "fronting-kafka-2"
}]
},
"__default__" : {
"where" : [ {
"sink" : "fronting-kafka-2"
}]
}
@Stream("foo") // send to topic “foo”
public class Foo {
// ...
}
@Stream("bar") // send to topic “bar”
public class Bar {
// ...
}
KsProducer<Object> producer = // …
producer.send(new Foo()); // Send to Kafka cluster which has “foo” topic
producer.send(new Bar()); // Send to Kafka cluster which has “bar” topic
Fronting Kafka
● For data collection and buffering
● Optimized for producers
○ Only consumers are routers
Scaling of Fronting Kafka
● Creating / destroying Kafka clusters
○ E.g., create new topic on new clusters and update topic to
cluster mapping
● No partition reassignment
Data Balancing
● Assign the same number of partitions of any topic
to every brokers
○ E.g., for clusters of 12 brokers, create topics with partitions
of 12, 24, 36
○ Guaranteed even distribution of data (aside from
occasional leader imbalance)
● Balance data among clusters by moving topics
○ Must dynamically update topic to cluster mapping
Topic Move
RouterFronting
Kafka
Event
Producer
Consumer
Kafka
Create topic “foo”
Consumer
“foo”
“foo”
Consumer Kafka
● Scaling
○ Add brokers and partitions for small cluster for non-keyed
topics
○ Create same topics on a new cluster and move consumers
Future Plan
● Cross-cluster topic
○ load sharing beyond single cluster
○ Auto-scale
○ Consumer/producer support needed
Multi-Cluster Consumer (Ongoing work)
● Same Kafka consumer interface
● Consume from multiple clusters with dynamic
topic to cluster mapping
○ Keep subscription state
○ Receive mapping updates
○ Create and delegate to underlying Kafka consumer for each
associated cluster on the fly
Multi-Cluster Consumer Topic to Cluster Mapping and
Code Example
{
"foo": [
{"vip": "cluster1"},
{"vip": "cluster2"}
],
“bar”: [
{“vip”: “cluster2”}
]
}
// Create a multi-cluster consumer
Consumer<String, String> multiClusterConsumer = ...
// subscribe as usual and keep subscription state
consumer.subscribe(new ArrayList<String>(“foo”));
while (...) {
// fetch from both clusters for topic “foo” and
// return the aggregated records
ConsumerRecords<String, String> records =
multiClusterConsumer.poll(2000);
process(records);
}
Topic move for Multi-cluster Consumers
Multi-cluster Consumer
Producer
“foo”: “cluster1” “foo”: [“cluster1”]
“foo”: “cluster2”
“foo”: [“cluster1”, “cluster2”]
“foo”: [“cluster2”]
cluster1
cluster2
Our Vision
Producers
“foo”
“foo”
“bar”
“bar”
“bar”
Multi-cluster
Consumer
Advanced Consumer
Router
Fronting Kafka w/
Cross-cluster Topics
Consumer Kafka
Multi-cluster
Consumer
What About Keyed Messages
● Few topics requiring keyed messages in Netflix
● A word of caution for keyed messages
○ Inflexible/skewed load balancing
○ Difficult to scale
● Handling of keyed messages
○ Currently only produced by routers to consumer Kafka
○ Hard to guarantee message ordering in multi-cluster setting
○ Key-consumer affinity is guaranteed
Think Differently on Scaling Kafka
The “broker” way The “cluster” way
Scale up Add brokers Add clusters
Data balance Move partitions to
different brokers
Move/expand topics to
different clusters
Producer Produce to different
brokers at the same time
Produce to different clusters at
the same time
Consumer Consume from different
brokers at the same time
Consume from different
clusters at the same time
Thank You
https://medium.com/netflix-techblog
https://jobs.netflix.com/

More Related Content

What's hot (20)

PPTX
Apache Kafka at LinkedIn
Discover Pinterest
 
PDF
Flink powered stream processing platform at Pinterest
Flink Forward
 
PDF
Capacity Planning Your Kafka Cluster | Jason Bell, Digitalis
HostedbyConfluent
 
PDF
Apache Kafka - Martin Podval
Martin Podval
 
PPTX
Extending Flink SQL for stream processing use cases
Flink Forward
 
PPTX
Apache Kafka
Saroj Panyasrivanit
 
PDF
So You Want to Write a Connector?
confluent
 
PDF
From Zero to Hero with Kafka Connect
confluent
 
PDF
ksqlDB: A Stream-Relational Database System
confluent
 
PPTX
Apache Kafka Best Practices
DataWorks Summit/Hadoop Summit
 
PDF
Producer Performance Tuning for Apache Kafka
Jiangjie Qin
 
PPTX
Apache kafka
Long Nguyen
 
PPTX
Kafka replication apachecon_2013
Jun Rao
 
PPTX
Introduction to Kafka and Zookeeper
Rahul Jain
 
PPTX
Apache kafka
Viswanath J
 
PPTX
Introduction to Kafka Cruise Control
Jiangjie Qin
 
PPTX
No data loss pipeline with apache kafka
Jiangjie Qin
 
PDF
A Hitchhiker's Guide to Apache Kafka Geo-Replication with Sanjana Kaundinya ...
HostedbyConfluent
 
PPTX
kafka
Amikam Snir
 
PDF
Introduction to Kafka Streams
Guozhang Wang
 
Apache Kafka at LinkedIn
Discover Pinterest
 
Flink powered stream processing platform at Pinterest
Flink Forward
 
Capacity Planning Your Kafka Cluster | Jason Bell, Digitalis
HostedbyConfluent
 
Apache Kafka - Martin Podval
Martin Podval
 
Extending Flink SQL for stream processing use cases
Flink Forward
 
Apache Kafka
Saroj Panyasrivanit
 
So You Want to Write a Connector?
confluent
 
From Zero to Hero with Kafka Connect
confluent
 
ksqlDB: A Stream-Relational Database System
confluent
 
Apache Kafka Best Practices
DataWorks Summit/Hadoop Summit
 
Producer Performance Tuning for Apache Kafka
Jiangjie Qin
 
Apache kafka
Long Nguyen
 
Kafka replication apachecon_2013
Jun Rao
 
Introduction to Kafka and Zookeeper
Rahul Jain
 
Apache kafka
Viswanath J
 
Introduction to Kafka Cruise Control
Jiangjie Qin
 
No data loss pipeline with apache kafka
Jiangjie Qin
 
A Hitchhiker's Guide to Apache Kafka Geo-Replication with Sanjana Kaundinya ...
HostedbyConfluent
 
Introduction to Kafka Streams
Guozhang Wang
 

Similar to Multi cluster, multitenant and hierarchical kafka messaging service slideshare (20)

PDF
Kafka Summit SF 2017 - MultiCluster, MultiTenant and Hierarchical Kafka Messa...
confluent
 
PDF
I can't believe it's not a queue: Kafka and Spring
Joe Kutner
 
PDF
Enabling Data Scientists to easily create and own Kafka Consumers | Stefan Kr...
HostedbyConfluent
 
PDF
Enabling Data Scientists to easily create and own Kafka Consumers
Stefan Krawczyk
 
PDF
Updating materialized views and caches using kafka
Zach Cox
 
PPTX
Exactly-once Stream Processing with Kafka Streams
Guozhang Wang
 
PDF
Apache Kafka - Scalable Message-Processing and more !
Guido Schmutz
 
PDF
Uber Real Time Data Analytics
Ankur Bansal
 
PDF
Kafka Summit SF 2017 - Exactly-once Stream Processing with Kafka Streams
confluent
 
PDF
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
Guido Schmutz
 
PDF
Kafka Workshop
Alexandre André
 
PDF
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Monal Daxini
 
PPTX
Data Pipeline at Tapad
Toby Matejovsky
 
PDF
TDEA 2018 Kafka EOS (Exactly-once)
Erhwen Kuo
 
PDF
Introduction to apache kafka
Samuel Kerrien
 
PDF
Exactly-once Data Processing with Kafka Streams - July 27, 2017
confluent
 
PDF
Integration for real-time Kafka SQL
Amit Nijhawan
 
PPTX
From a Kafkaesque Story to The Promised Land at LivePerson
LivePerson
 
PDF
Follow the (Kafka) Streams
confluent
 
PDF
🎶🎵Bo-stream-ian Rhapsody: A Musical Demo of Kafka Connect and Kafka Streams 🎵🎶
HostedbyConfluent
 
Kafka Summit SF 2017 - MultiCluster, MultiTenant and Hierarchical Kafka Messa...
confluent
 
I can't believe it's not a queue: Kafka and Spring
Joe Kutner
 
Enabling Data Scientists to easily create and own Kafka Consumers | Stefan Kr...
HostedbyConfluent
 
Enabling Data Scientists to easily create and own Kafka Consumers
Stefan Krawczyk
 
Updating materialized views and caches using kafka
Zach Cox
 
Exactly-once Stream Processing with Kafka Streams
Guozhang Wang
 
Apache Kafka - Scalable Message-Processing and more !
Guido Schmutz
 
Uber Real Time Data Analytics
Ankur Bansal
 
Kafka Summit SF 2017 - Exactly-once Stream Processing with Kafka Streams
confluent
 
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
Guido Schmutz
 
Kafka Workshop
Alexandre André
 
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Monal Daxini
 
Data Pipeline at Tapad
Toby Matejovsky
 
TDEA 2018 Kafka EOS (Exactly-once)
Erhwen Kuo
 
Introduction to apache kafka
Samuel Kerrien
 
Exactly-once Data Processing with Kafka Streams - July 27, 2017
confluent
 
Integration for real-time Kafka SQL
Amit Nijhawan
 
From a Kafkaesque Story to The Promised Land at LivePerson
LivePerson
 
Follow the (Kafka) Streams
confluent
 
🎶🎵Bo-stream-ian Rhapsody: A Musical Demo of Kafka Connect and Kafka Streams 🎵🎶
HostedbyConfluent
 
Ad

Recently uploaded (20)

PPTX
Digital Circuits, important subject in CS
contactparinay1
 
PDF
“Computer Vision at Sea: Automated Fish Tracking for Sustainable Fishing,” a ...
Edge AI and Vision Alliance
 
PDF
ICONIQ State of AI Report 2025 - The Builder's Playbook
Razin Mustafiz
 
PDF
Mastering Financial Management in Direct Selling
Epixel MLM Software
 
PDF
Bitcoin for Millennials podcast with Bram, Power Laws of Bitcoin
Stephen Perrenod
 
PDF
Newgen 2022-Forrester Newgen TEI_13 05 2022-The-Total-Economic-Impact-Newgen-...
darshakparmar
 
PDF
The Rise of AI and IoT in Mobile App Tech.pdf
IMG Global Infotech
 
PDF
Automating Feature Enrichment and Station Creation in Natural Gas Utility Net...
Safe Software
 
PDF
POV_ Why Enterprises Need to Find Value in ZERO.pdf
darshakparmar
 
PDF
CIFDAQ Market Wrap for the week of 4th July 2025
CIFDAQ
 
DOCX
Cryptography Quiz: test your knowledge of this important security concept.
Rajni Bhardwaj Grover
 
PDF
Staying Human in a Machine- Accelerated World
Catalin Jora
 
PDF
UPDF - AI PDF Editor & Converter Key Features
DealFuel
 
PDF
Future-Proof or Fall Behind? 10 Tech Trends You Can’t Afford to Ignore in 2025
DIGITALCONFEX
 
PDF
UiPath DevConnect 2025: Agentic Automation Community User Group Meeting
DianaGray10
 
PDF
“NPU IP Hardware Shaped Through Software and Use-case Analysis,” a Presentati...
Edge AI and Vision Alliance
 
PPTX
Mastering ODC + Okta Configuration - Chennai OSUG
HathiMaryA
 
PDF
Transforming Utility Networks: Large-scale Data Migrations with FME
Safe Software
 
PDF
Newgen Beyond Frankenstein_Build vs Buy_Digital_version.pdf
darshakparmar
 
PPT
Ericsson LTE presentation SEMINAR 2010.ppt
npat3
 
Digital Circuits, important subject in CS
contactparinay1
 
“Computer Vision at Sea: Automated Fish Tracking for Sustainable Fishing,” a ...
Edge AI and Vision Alliance
 
ICONIQ State of AI Report 2025 - The Builder's Playbook
Razin Mustafiz
 
Mastering Financial Management in Direct Selling
Epixel MLM Software
 
Bitcoin for Millennials podcast with Bram, Power Laws of Bitcoin
Stephen Perrenod
 
Newgen 2022-Forrester Newgen TEI_13 05 2022-The-Total-Economic-Impact-Newgen-...
darshakparmar
 
The Rise of AI and IoT in Mobile App Tech.pdf
IMG Global Infotech
 
Automating Feature Enrichment and Station Creation in Natural Gas Utility Net...
Safe Software
 
POV_ Why Enterprises Need to Find Value in ZERO.pdf
darshakparmar
 
CIFDAQ Market Wrap for the week of 4th July 2025
CIFDAQ
 
Cryptography Quiz: test your knowledge of this important security concept.
Rajni Bhardwaj Grover
 
Staying Human in a Machine- Accelerated World
Catalin Jora
 
UPDF - AI PDF Editor & Converter Key Features
DealFuel
 
Future-Proof or Fall Behind? 10 Tech Trends You Can’t Afford to Ignore in 2025
DIGITALCONFEX
 
UiPath DevConnect 2025: Agentic Automation Community User Group Meeting
DianaGray10
 
“NPU IP Hardware Shaped Through Software and Use-case Analysis,” a Presentati...
Edge AI and Vision Alliance
 
Mastering ODC + Okta Configuration - Chennai OSUG
HathiMaryA
 
Transforming Utility Networks: Large-scale Data Migrations with FME
Safe Software
 
Newgen Beyond Frankenstein_Build vs Buy_Digital_version.pdf
darshakparmar
 
Ericsson LTE presentation SEMINAR 2010.ppt
npat3
 
Ad

Multi cluster, multitenant and hierarchical kafka messaging service slideshare

  • 1. @allenxwang Multi-cluster, Multi-tenant and Hierarchical Kafka Messaging Service Allen Wang
  • 2. Growing Pains for A Kafka Cluster ● A few brokers, handful topics, tens of partitions ○ Wonderful! ● Tens of brokers, tens of topics, hundreds of partitions ○ Life is good!
  • 3. ● A hundred brokers, a hundred topics, thousands of partitions ○ … OK ● Hundreds of brokers, hundreds of topics, one hundred thousand partitions ○ ???
  • 4. Why Huge Kafka Cluster Does Not Work ● Significant time increase on operations ○ Rolling binary update ■ Three minutes per broker, 500 brokers = 1 whole day ○ Rolling AMI (image) update with data copying ■ One hour per broker, 500 brokers = 20 days
  • 5. ● Increased latency due to number of partitions ○ https://www.confluent.io/blog/how-to-choose-the-number -of-topicspartitions-in-a-kafka-cluster/ ● Vulnerability to ZK/Controller failures
  • 6. Scaling and Data Balancing Challenge ● The problem with partition reassignment ○ Time consuming ○ Replication traffic taking bandwidth ○ Complexity of bin packing for data balancing
  • 8. BytesOut = (numberOfConsumers + replicationFactor - 1) ✕ BytesIn ● A single cluster may easily fit for bytes in, but not necessarily for bytes out
  • 9. Solve Consumer Fan-out with Hierarchies
  • 11. The Idea ● Create many small and mostly “immutable” clusters ● Organize them in a topology with routing service connecting the clusters
  • 12. Multi-Cluster Kafka Service At Netflix Router (w/ simple ETL) Fronting Kafka Event Producer Consumer Kafka Management HTTP PROXY Consumers
  • 13. Multi-cluster Producers ● Support producing to multiple clusters at the same time ● High level producer API implemented by multiple embedded Kafka producers public interface KsProducer<V> { // ... <T extends V> CompletableFuture<SendResult> send(T obj) }
  • 14. ● Dynamic topic to cluster mapping ○ Enabled by NetflixOSS/Archaius "t1, t2" : { "where" : [{ "sink" : "fronting-kafka-1" }] }, "t3" : { "where" : [{ "sink" : "fronting-kafka-2" }] }, "__default__" : { "where" : [ { "sink" : "fronting-kafka-2" }] }
  • 15. @Stream("foo") // send to topic “foo” public class Foo { // ... } @Stream("bar") // send to topic “bar” public class Bar { // ... } KsProducer<Object> producer = // … producer.send(new Foo()); // Send to Kafka cluster which has “foo” topic producer.send(new Bar()); // Send to Kafka cluster which has “bar” topic
  • 16. Fronting Kafka ● For data collection and buffering ● Optimized for producers ○ Only consumers are routers
  • 17. Scaling of Fronting Kafka ● Creating / destroying Kafka clusters ○ E.g., create new topic on new clusters and update topic to cluster mapping ● No partition reassignment
  • 18. Data Balancing ● Assign the same number of partitions of any topic to every brokers ○ E.g., for clusters of 12 brokers, create topics with partitions of 12, 24, 36 ○ Guaranteed even distribution of data (aside from occasional leader imbalance) ● Balance data among clusters by moving topics ○ Must dynamically update topic to cluster mapping
  • 20. Consumer Kafka ● Scaling ○ Add brokers and partitions for small cluster for non-keyed topics ○ Create same topics on a new cluster and move consumers
  • 21. Future Plan ● Cross-cluster topic ○ load sharing beyond single cluster ○ Auto-scale ○ Consumer/producer support needed
  • 22. Multi-Cluster Consumer (Ongoing work) ● Same Kafka consumer interface ● Consume from multiple clusters with dynamic topic to cluster mapping ○ Keep subscription state ○ Receive mapping updates ○ Create and delegate to underlying Kafka consumer for each associated cluster on the fly
  • 23. Multi-Cluster Consumer Topic to Cluster Mapping and Code Example { "foo": [ {"vip": "cluster1"}, {"vip": "cluster2"} ], “bar”: [ {“vip”: “cluster2”} ] } // Create a multi-cluster consumer Consumer<String, String> multiClusterConsumer = ... // subscribe as usual and keep subscription state consumer.subscribe(new ArrayList<String>(“foo”)); while (...) { // fetch from both clusters for topic “foo” and // return the aggregated records ConsumerRecords<String, String> records = multiClusterConsumer.poll(2000); process(records); }
  • 24. Topic move for Multi-cluster Consumers Multi-cluster Consumer Producer “foo”: “cluster1” “foo”: [“cluster1”] “foo”: “cluster2” “foo”: [“cluster1”, “cluster2”] “foo”: [“cluster2”] cluster1 cluster2
  • 26. What About Keyed Messages ● Few topics requiring keyed messages in Netflix ● A word of caution for keyed messages ○ Inflexible/skewed load balancing ○ Difficult to scale ● Handling of keyed messages ○ Currently only produced by routers to consumer Kafka ○ Hard to guarantee message ordering in multi-cluster setting ○ Key-consumer affinity is guaranteed
  • 27. Think Differently on Scaling Kafka The “broker” way The “cluster” way Scale up Add brokers Add clusters Data balance Move partitions to different brokers Move/expand topics to different clusters Producer Produce to different brokers at the same time Produce to different clusters at the same time Consumer Consume from different brokers at the same time Consume from different clusters at the same time