A Deep Dive into Kafka Controller
Jun Rao
VP of Apache Kafka
Co-founder of Confluent
Apache Kafka overview
• Core: pub/sub
• Connect: integration
• Streams: processing
Kafka adoption in enterprises
• 6 of the top 10 travel companies
• 8 of the top 10 insurance companies
• 7 of the top 10 global banks
• 9 of the top 10 telecom companies
Kafka Replication
• Configurable replication factor
• Tolerating f – 1 failures with f replicas
• Automated failover
[Figure: four brokers (broker 1 to broker 4), each holding logs with replicas of the partitions topic1-part1, topic1-part2, topic2-part1 and topic2-part2]
High Level Data Flow in Replication
[Figure: a producer writes to the leader of topic1-part1 on broker 1; brokers 2 and 3 hold follower replicas of topic1-part1; the numbered steps 1 to 4 include commit and ack; a consumer reads the partition]
What is the controller?
• One broker in a cluster acts as controller
• Monitor the liveness of brokers
• Elect new leaders on broker failure
• Communicate new leaders to brokers
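The election duty listed above can be sketched in a few lines. This is a simplified illustration, not Kafka's actual code: the function names, the partition names and the rule "pick the first surviving replica" are assumptions made for the example, and a real controller restricts the choice further (for instance to in-sync replicas).

```python
from typing import Dict, List, Optional, Set


def elect_leader(replicas: List[int], live_brokers: Set[int]) -> Optional[int]:
    """Return the first replica that is still alive, or None if none is."""
    for broker_id in replicas:
        if broker_id in live_brokers:
            return broker_id
    return None


def handle_broker_failure(
    assignment: Dict[str, List[int]],
    leaders: Dict[str, Optional[int]],
    live_brokers: Set[int],
    failed: int,
) -> Dict[str, Optional[int]]:
    """Re-elect leaders for every partition that was led by the failed broker.

    Returns only the changed partitions, i.e. what the controller would
    then communicate to the remaining brokers.
    """
    live_brokers.discard(failed)
    changed: Dict[str, Optional[int]] = {}
    for partition, leader in leaders.items():
        if leader == failed:
            new_leader = elect_leader(assignment[partition], live_brokers)
            leaders[partition] = new_leader  # existing key, safe while iterating
            changed[partition] = new_leader
    return changed


# Hypothetical cluster state: partition -> replica list, partition -> leader.
assignment = {"t-0": [1, 2], "t-1": [1, 2], "t-2": [2, 3]}
leaders = {"t-0": 1, "t-1": 1, "t-2": 2}
live = {1, 2, 3}

changed = handle_broker_failure(assignment, leaders, live, failed=1)
print(changed)  # {'t-0': 2, 't-1': 2}
```

Only the partitions whose leader sat on the failed broker change, which is why the controller needs to notify brokers about a subset of partitions rather than the whole cluster state.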
Controller election
[Figure: ZooKeeper holds /controller → broker 0; brokers 0 to 3, with broker 0 acting as controller]
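One way to picture the election is as a race to create the /controller path: whoever creates it first wins, and the path disappears when the winner's session ends. The sketch below uses an in-memory stand-in, not the real ZooKeeper client; `FakeZooKeeper`, `create_if_absent` and `expire_session` are hypothetical names introduced only for this illustration.

```python
from typing import Dict, Iterable, Optional


class FakeZooKeeper:
    """In-memory stand-in for ZooKeeper; not the real client API."""

    def __init__(self) -> None:
        self.nodes: Dict[str, int] = {}

    def create_if_absent(self, path: str, broker_id: int) -> bool:
        """Create the node only if it does not exist; True means we won."""
        if path in self.nodes:
            return False
        self.nodes[path] = broker_id
        return True

    def expire_session(self, broker_id: int) -> None:
        """Drop every node owned by the broker, like ephemeral-node cleanup."""
        self.nodes = {p: b for p, b in self.nodes.items() if b != broker_id}


def elect_controller(zk: FakeZooKeeper, candidates: Iterable[int]) -> Optional[int]:
    """Every candidate tries to create /controller; only the first succeeds."""
    for broker_id in candidates:
        zk.create_if_absent("/controller", broker_id)
    return zk.nodes.get("/controller")


zk = FakeZooKeeper()
first = elect_controller(zk, [0, 1, 2, 3])   # broker 0 gets there first
zk.expire_session(first)                     # controller's session ends
second = elect_controller(zk, [2, 3, 1])     # remaining brokers race again
print(first, second)  # 0 2
```

The order of the candidate lists stands in for which broker happens to reach ZooKeeper first; nothing in the scheme prefers a particular broker.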
Partition state: stored in ZK, cached in controller
[Figure: ZooKeeper stores the leader of each partition; the controller on broker 0 caches this state; brokers 0 to 3]
/topic/t1/0 → leader:1
/topic/t1/1 → leader:3
…
/topic/t1/9 → leader:2
Controlled shutdown
[Figure, steps 1 to 2: a SIGTERM starts the controlled shutdown of broker 1. At this point broker 1 is leader for partitions t-0 and t-1, broker 2 is follower for both, and broker 0 is the controller]
[Figure, steps 3 to 5: the controller updates ZooKeeper and the brokers; broker 2 becomes leader for t-0 and t-1, and broker 1 no longer holds a role for either partition]
/topics/t/0 → 2
/topics/t/1 → 2
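The leadership move during a controlled shutdown can be sketched as follows. This is an assumed simplification: `controlled_shutdown`, the action names and the choice of "first other replica" are made up for the example, and a plain dict plays the role of ZooKeeper.

```python
from typing import Dict, List, Tuple


def controlled_shutdown(
    shutting_down: int,
    assignment: Dict[str, List[int]],
    zk_leaders: Dict[str, int],
) -> List[Tuple[str, str, int]]:
    """Move leadership off one broker and return the ordered actions taken."""
    actions: List[Tuple[str, str, int]] = []
    for partition in sorted(zk_leaders):
        if zk_leaders[partition] != shutting_down:
            continue
        # Pick any other replica; a real controller restricts this choice.
        candidates = [b for b in assignment[partition] if b != shutting_down]
        if not candidates:
            continue  # no other replica, so leadership cannot move
        new_leader = candidates[0]
        zk_leaders[partition] = new_leader  # persist in the (fake) ZK store
        actions.append(("zk_write", partition, new_leader))
        actions.append(("notify_brokers", partition, new_leader))
    return actions


# Same situation as the slide: broker 1 leads both partitions of topic t.
zk_leaders = {"/topics/t/0": 1, "/topics/t/1": 1}
assignment = {"/topics/t/0": [1, 2], "/topics/t/1": [1, 2]}

actions = controlled_shutdown(1, assignment, zk_leaders)
print(zk_leaders)    # {'/topics/t/0': 2, '/topics/t/1': 2}
print(len(actions))  # 4
```

Each partition costs one write and one notification in this naive version, so the work grows linearly with the number of partitions led by the departing broker.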
Issues with controlled shutdown (pre 1.1)
[Figure: controlled-shutdown diagram, steps 3 to 5, with the controller on broker 0; broker 2 is now leader for t-0 and t-1, and broker 1 holds no role]
• Writes to ZK are serial. Impact: longer shutdown time
• Communication of new leaders not batched. Impact: client timeout
/topics/t/0 → 2
/topics/t/1 → 2
Controller failover
[Figure, three stages, steps 1 to 3: (1) /controller → broker 0, with broker 0 as controller; (2) /controller → broker 2, with broker 2 as the new controller; (3) the new controller together with the partition state stored in ZooKeeper]
/controller → broker 2
/topic/t1/0 → leader:1
/topic/t1/1 → leader:3
…
/topic/t1/9 → leader:2
Issues with controller failover (pre 1.1)
[Figure: controller failover diagram, steps 1 to 3, with the old controller on broker 0 and the new controller on broker 2]
• Reads from ZK are serial. Impact: availability
• Zombie old controller. Impact: inconsistency
/controller → broker 2
/topic/t1/0 → leader:1
/topic/t1/1 → leader:3
…
/topic/t1/9 → leader:2
Performance improvements in 1.1
• Controller uses the async ZK API for reads and writes
• Controller communicates new leaders to brokers in batches
[Figure: old (serial) versus new (pipelined) handling of the requests for part 1 to part 4]
/topics/t/0 → 2
/topics/t/1 → 2
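The difference between serial and pipelined requests can be shown with a toy async store. This is only a sketch: `FakeAsyncZk` and its `set_data` coroutine are hypothetical stand-ins, not the ZooKeeper client API, and the sleep simulates one network round trip.

```python
import asyncio
from typing import Dict


class FakeAsyncZk:
    """Toy async store that records how many requests are in flight."""

    def __init__(self) -> None:
        self.in_flight = 0
        self.max_in_flight = 0
        self.data: Dict[str, int] = {}

    async def set_data(self, path: str, value: int) -> None:
        self.in_flight += 1
        self.max_in_flight = max(self.max_in_flight, self.in_flight)
        await asyncio.sleep(0.001)  # simulated round trip
        self.data[path] = value
        self.in_flight -= 1


async def write_serial(zk: FakeAsyncZk, updates: Dict[str, int]) -> None:
    """Old style: wait for each response before sending the next request."""
    for path, value in updates.items():
        await zk.set_data(path, value)


async def write_pipelined(zk: FakeAsyncZk, updates: Dict[str, int]) -> None:
    """New style: issue all requests, then wait for the responses together."""
    await asyncio.gather(*(zk.set_data(p, v) for p, v in updates.items()))


updates = {f"/topics/t/{i}": 2 for i in range(4)}
serial_zk, pipelined_zk = FakeAsyncZk(), FakeAsyncZk()
asyncio.run(write_serial(serial_zk, updates))
asyncio.run(write_pipelined(pipelined_zk, updates))
print(serial_zk.max_in_flight, pipelined_zk.max_in_flight)  # 1 4
```

Both runs end with the same stored data; the serial version pays one round trip per partition, while the pipelined version overlaps the round trips.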
Controlled shutdown (post 1.1)
[Figure: controlled-shutdown diagram, steps 3 to 5, with the controller on broker 0; broker 2 is now leader for t-0 and t-1, and broker 1 holds no role]
• Writes to ZK pipelined
• Communication of new leaders batched
Controller failover (post 1.1)
[Figure: controller failover diagram, steps 1 to 3, with the old controller on broker 0 and the new controller on broker 2]
• Reads from ZK pipelined
/controller → broker 2
/topic/t1/0 → leader:1
/topic/t1/1 → leader:3
…
/topic/t1/9 → leader:2
Results for controlled shutdown
• 5 ZK nodes and 5 brokers on different racks
• 25K topics, 1 partition, 2 replicas
• 10K partitions per broker
                          Kafka 1.0.0    Kafka 1.1.0
Controlled shutdown time  6.5 minutes    3 seconds
Results for controller failover
• 5 ZK nodes and 5 brokers on different racks
• 2K topics, 50 partitions, 1 replica
• Controller failover: reload 100K partitions from ZK
                   Kafka 1.0.0    Kafka 1.1.0
State reload time  28 seconds     14 seconds
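As a quick check of the two result tables and their test setups, the arithmetic can be written out. The helper name `speedup` is introduced here only for illustration; all numbers come from the slides.

```python
def speedup(before_seconds: float, after_seconds: float) -> float:
    """Ratio of the old duration to the new one."""
    return before_seconds / after_seconds


# Controlled shutdown: 6.5 minutes in 1.0.0 versus 3 seconds in 1.1.0.
shutdown_speedup = speedup(6.5 * 60, 3)

# Controller failover: 28 seconds versus 14 seconds to reload state.
reload_speedup = speedup(28, 14)

# Setup checks: 25K topics x 1 partition x 2 replicas over 5 brokers,
# and 2K topics x 50 partitions reloaded on failover.
replicas_per_broker = 25_000 * 1 * 2 // 5
partitions_reloaded = 2_000 * 50

print(shutdown_speedup, reload_speedup)          # 130.0 2.0
print(replicas_per_broker, partitions_reloaded)  # 10000 100000
```

The setup figures are consistent with the bullets: 10K partition replicas per broker in the shutdown test and 100K partitions in the failover test.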
Fencing zombie controller
• ZK session expiration
• Better handling in the controller (1.1)
• Controller path deletion
• Writes to ZK conditioned on controller epoch (to be in 2.1)
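The idea of conditioning writes on the controller epoch can be sketched with a small in-memory store. The slides do not describe how Kafka implements this, so everything here (`EpochFencedStore`, `become_controller`, `conditional_write`) is a hypothetical illustration of the general technique: a write is accepted only if the writer's epoch matches the current one.

```python
from typing import Dict


class EpochFencedStore:
    """Toy store that rejects writes carrying a stale controller epoch."""

    def __init__(self) -> None:
        self.controller_epoch = 0
        self.data: Dict[str, str] = {}

    def become_controller(self) -> int:
        """A newly elected controller bumps the epoch and remembers it."""
        self.controller_epoch += 1
        return self.controller_epoch

    def conditional_write(self, path: str, value: str, writer_epoch: int) -> bool:
        """Apply the write only if the writer still holds the current epoch."""
        if writer_epoch != self.controller_epoch:
            return False  # stale (zombie) controller is fenced off
        self.data[path] = value
        return True


store = EpochFencedStore()
old_epoch = store.become_controller()   # old controller, epoch 1
new_epoch = store.become_controller()   # failover, new controller, epoch 2

zombie_ok = store.conditional_write("/topic/t1/0", "leader:3", old_epoch)
current_ok = store.conditional_write("/topic/t1/0", "leader:1", new_epoch)
print(zombie_ok, current_ok)  # False True
```

A zombie controller that has not yet noticed the failover keeps its old epoch, so its late writes are rejected instead of overwriting the new controller's decisions.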
Controller failover (expected in 2.1)
[Figure: controller failover diagram, steps 1 to 2, with the new controller on broker 2; the zombie old controller is fenced]
/controller → broker 2
Summary
• Significant performance improvement in controller in 1.1
• Allow 10X more partitions in a Kafka cluster
• Better fencing of zombie controller in 1.1 and 2.1
• More details in KAFKA-5027
Future work in controller
• Further improvement on controller failover
• Standby controller
• Better handling of quick broker restart (KAFKA-1120)
• Broker generation
Q/A
• Acknowledgment: Onur Karaman, Manikumar Reddy,
Prasanna Gautam, Ismael Juma, Mickael Maison, Sandor
Murakozi, Rajini Sivaram, Ted Yu, Zhanxiang Huang
• Apache Kafka: http://kafka.apache.org/
• Confluent: http://confluent.io/
