Introducing KRaft: Kafka without ZooKeeper
Colin McCabe
Principal Engineer
Confluent
About Me
I work on Apache Kafka at Confluent.
Kafka Committer & PMC Member
Table of Contents
● Introduction
● Demo
● Background
● KRaft Architecture
● Deploying KRaft
● Roadmap
● Q&A
Introduction
Removing ZooKeeper
[Diagram: the current ZK-based architecture next to the new KRaft-based architecture]
Demo
Background
Apache ZooKeeper
● ZooKeeper is a service for maintaining configuration, naming, and distributed synchronization
● ZK is typically deployed in 3-node clusters
● Traditionally, every Kafka cluster needed an associated ZK cluster
How Kafka Used ZooKeeper
As a metadata store for:
● Topics
● Partitions
● Configurations
● Quotas
● ACLs
And as a way to coordinate cluster membership and elect a controller.
Example znodes:
/brokers/ids/0
/brokers/ids/1
/brokers/topics/foo
/brokers/topics/foo/partitions/0/state
/brokers/topics/foo/partitions/1/state
/brokers/topics/foo/partitions/2/state
...
Kafka Clients and ZooKeeper
● Initially: admin tools and clients talked to ZK
Kafka Clients and ZooKeeper: Problems
● Initially: admin tools and clients talked to ZK
● Limited security
● Complex config
● Perf issues
● Limited isolation
Isolating Kafka Clients from ZooKeeper
● Solution: admin tools and clients talk to brokers
● ACLs
● Quotas
● Auditing
Brokers and ZooKeeper: Problems
● Limited metadata scalability
● Long controller failover times
● Complex programming paradigm
● Limited requests per second
KRaft Mode
● KRaft = Kafka on Raft
● Kafka stores its own metadata in a Raft quorum
● The quorum leader is the active controller
● The other quorum members are hot standbys
● One system to run instead of two
KRaft Architecture
Controller Startup with ZK
● One broker wins the election in ZooKeeper to be the controller
● The new controller loads the full metadata from ZooKeeper
● The controller then sends UpdateMetadataRequest and LeaderAndIsrRequest for all partitions to the brokers
Problems with ZK-based Controller Startup
● Have to load all metadata synchronously on startup:
○ O(num_partitions), O(num_brokers)
○ Controller is unavailable during this time
■ Cold start: cluster unavailable.
■ Controller restart: admin ops and ISR changes unavailable
● Have to send all metadata to all brokers on startup
[Slide annotation: 3 minutes]
How KRaft Replaces ZooKeeper
● Instead of ZooKeeper, we have an internal topic named __cluster_metadata
○ Replicated with KRaft
○ Single partition
○ The leader is the active controller
● KRaft: Raft for Kafka
○ The Raft protocol implemented in Kafka
■ Records committed by a majority of nodes
○ Self-managed quorum
■ Doesn’t rely on an external system for leader election
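The "committed by a majority of nodes" rule can be illustrated with a small sketch. This is a simplification and not Kafka's implementation: the function name and the dict of per-voter log end offsets are assumptions made for the example, and Raft's additional rules about leader terms are left out.

```python
def committed_offset(log_end_offsets):
    """Return the highest offset stored on a majority of voters.

    log_end_offsets maps a hypothetical voter id to the offset of the last
    record that voter has stored.
    """
    if not log_end_offsets:
        raise ValueError("quorum has no voters")
    # Sort descending; the value at index (majority - 1) is present on at
    # least `majority` voters.
    ordered = sorted(log_end_offsets.values(), reverse=True)
    majority = len(ordered) // 2 + 1
    return ordered[majority - 1]


# Three controllers: one fully caught up, one slightly behind, one lagging.
voters = {1: 10439, 2: 10437, 3: 10431}
print(committed_offset(voters))
```

With three voters, a record counts as committed once two of them have it, so one lagging or failed controller does not block progress.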
Metadata Records
● Binary records containing metadata
○ KRPC format
○ Auto-generated from protocol schemas
○ Can also be translated into JSON for human readability
● Two ways to evolve format
○ New record versions
○ Tagged fields
● Some records are deltas that apply changes to existing state
{
  "type": "REGISTER_BROKER_RECORD",
  "version": 0,
  "data": {
    "brokerId": 1,
    "incarnationId": "P3UFsWoNR-erL9PK98YLsA",
    "brokerEpoch": 0,
    "endPoints": [
      {
        "name": "PLAINTEXT",
        "host": "localhost",
        "port": 9092,
        "securityProtocol": 0
      }
    ],
    "features": [],
    "rack": null
  }
}
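Because records can be rendered as JSON, ordinary tooling can consume them. The following is a minimal sketch that parses the record shown above and folds it into an in-memory table; the `apply_record` function and the shape of the `brokers` dict are hypothetical and only illustrate the idea.

```python
import json

# The JSON rendering of the REGISTER_BROKER_RECORD from the slide.
RECORD_JSON = """
{
  "type": "REGISTER_BROKER_RECORD",
  "version": 0,
  "data": {
    "brokerId": 1,
    "incarnationId": "P3UFsWoNR-erL9PK98YLsA",
    "brokerEpoch": 0,
    "endPoints": [
      {"name": "PLAINTEXT", "host": "localhost", "port": 9092, "securityProtocol": 0}
    ],
    "features": [],
    "rack": null
  }
}
"""


def apply_record(brokers, record):
    """Apply one decoded record to a hypothetical in-memory broker table."""
    if record["type"] != "REGISTER_BROKER_RECORD":
        raise ValueError(f"unhandled record type: {record['type']}")
    data = record["data"]
    brokers[data["brokerId"]] = {
        "incarnationId": data["incarnationId"],
        "endpoints": [(e["name"], e["host"], e["port"]) for e in data["endPoints"]],
        "rack": data["rack"],
    }
    return brokers


brokers = apply_record({}, json.loads(RECORD_JSON))
print(brokers[1]["endpoints"])
```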
Metadata as An Ordered Log
Offset  Record
10434   TopicRecord(name=foo, id=rtkInsMkQPiEBj6uz67rrQ)
10435   PartitionRecord(id=rtkInsMkQPiEBj6uz67rrQ, index=0, …)
10436   PartitionRecord(id=rtkInsMkQPiEBj6uz67rrQ, index=1, …)
10437   PartitionRecord(id=rtkInsMkQPiEBj6uz67rrQ, index=2, …)
10438   ConfigRecord(name=num.io.threads, value=...)
10439   RegisterBrokerRecord(id=4, endpoints=…, …)
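Treating metadata as an ordered log means any node can rebuild the same state by applying records in offset order. The sketch below assumes the offsets on the slide pair with the records in the order listed; the tuple layout, the `replay` function, and the placeholder config value are hypothetical.

```python
def replay(log, up_to_offset):
    """Build metadata state by applying records in offset order."""
    state = {"topics": {}, "configs": {}, "brokers": {}}
    for offset, kind, fields in sorted(log, key=lambda entry: entry[0]):
        if offset > up_to_offset:
            break
        if kind == "TopicRecord":
            state["topics"][fields["id"]] = {"name": fields["name"], "partitions": []}
        elif kind == "PartitionRecord":
            # A delta: it only makes sense after the TopicRecord it refers to.
            state["topics"][fields["id"]]["partitions"].append(fields["index"])
        elif kind == "ConfigRecord":
            state["configs"][fields["name"]] = fields["value"]
        elif kind == "RegisterBrokerRecord":
            state["brokers"][fields["id"]] = fields
    return state


TOPIC_ID = "rtkInsMkQPiEBj6uz67rrQ"
LOG = [
    (10434, "TopicRecord", {"name": "foo", "id": TOPIC_ID}),
    (10435, "PartitionRecord", {"id": TOPIC_ID, "index": 0}),
    (10436, "PartitionRecord", {"id": TOPIC_ID, "index": 1}),
    (10437, "PartitionRecord", {"id": TOPIC_ID, "index": 2}),
    # The slide elides the value, so a placeholder stands in for it here.
    (10438, "ConfigRecord", {"name": "num.io.threads", "value": "<elided>"}),
    (10439, "RegisterBrokerRecord", {"id": 4}),
]

# A node that has applied up to 10436 knows two of the three partitions.
print(replay(LOG, 10436)["topics"][TOPIC_ID]["partitions"])
print(sorted(replay(LOG, 10439)["brokers"]))
```

Two nodes that have applied the log up to the same offset hold identical state, which is why a single offset is enough to describe how current a node is.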
Controller Startup with KRaft
● Some nodes are designated KRaft controllers (the blue nodes in the diagram)
● The KRaft controllers will elect a single leader
● The leader will have all previously committed records
● The newly elected KRaft controller is ready immediately
● Brokers fetch only the metadata they need
● New brokers and brokers that are behind fetch snapshots
Controller Failover
ZK Mode
• Win controller election
• Load all topics and partitions
• Send LeaderAndIsr + UpdateMetadata to all brokers
KRaft Mode
• Win KRaft election
• Start handling requests from brokers
Rolling Nodes
ZK Mode
Restarted node begins with no metadata. Must wait for full metadata update RPCs from controller.
KRaft Mode
Restarted node consults its local metadata cache, transfers only what it doesn’t have, based on metadata offset.
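The KRaft-mode catch-up can be sketched as a decision based on the node's last applied offset: fetch only the missing tail, or fall back to a snapshot when the node is new or too far behind. This is an illustrative model, not Kafka's fetch protocol; `catch_up` and its argument shapes are assumptions.

```python
def catch_up(local_offset, log, snapshot):
    """Return (records_to_apply, used_snapshot) for a restarting node.

    log is a list of (offset, record) pairs still retained by the controller.
    snapshot is (snapshot_offset, state) covering everything up to and
    including snapshot_offset. local_offset is the last offset the node has
    applied, or -1 for a brand-new node.
    """
    first_retained = log[0][0] if log else snapshot[0] + 1
    if local_offset + 1 < first_retained:
        # The records the node needs next are gone: load the snapshot,
        # then apply whatever follows it.
        start = snapshot[0] + 1
        used_snapshot = True
    else:
        start = local_offset + 1
        used_snapshot = False
    return [entry for entry in log if entry[0] >= start], used_snapshot


retained = [(offset, f"record-{offset}") for offset in range(100, 105)]
snap = (99, {"note": "hypothetical state as of offset 99"})

print(catch_up(102, retained, snap))  # restarted node with a warm cache
print(catch_up(-1, retained, snap))   # new node with no metadata
```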
Calling an Admin API with ZK
● External client connects to a random broker
● Broker alters ZooKeeper
○ Example: create znodes for new topic
● Znode watch triggers
● Controller loads changes
● Controller pushes out changes to brokers
○ Incremental LeaderAndIsr
○ Incremental UpdateMetadata
ZK-based Programming Model
[Sequence diagram between Request Handler, ZooKeeper, and Controller Thread: Create /brokers/topics/foo; /brokers/topics Changed; List /brokers/topics; Compute changes; List /brokers/topics/foo]
Problems
● What if the controller fails to send an update to a specific broker?
○ Metadata divergence
○ No easy way to know what we have and what we don’t have
● Difficult programming model
○ Multiple writers to ZooKeeper
○ Can’t assume your cache is up-to-date!
● ZK is the bottleneck for all admin operations
○ Often 1 admin operation leads to many ZK operations
○ Blocking
Calling an Admin API with KRaft
● External client connects to a random broker
● Broker forwards the request to the active controller for processing
● Controller creates metadata records and persists them to __cluster_metadata
● Once the records have been committed to the metadata log, the active controller returns the result to the forwarding broker, which returns it to the external client
● Brokers are continuously fetching metadata from the active controller
● They become aware of the admin changes by reading the new metadata records
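The request path above can be modeled in a few lines. This is a toy, single-process sketch under strong assumptions: the `Controller` and `Broker` classes are hypothetical, a Python list stands in for __cluster_metadata, and records are treated as committed as soon as they are appended.

```python
class Controller:
    def __init__(self):
        self.log = []  # stands in for the __cluster_metadata log

    def create_topic(self, name, partitions):
        records = [("TopicRecord", name)]
        records += [("PartitionRecord", name, i) for i in range(partitions)]
        self.log.extend(records)  # assumption: the quorum commits immediately
        return {"topic": name, "error": None}

    def fetch(self, from_offset):
        return self.log[from_offset:]


class Broker:
    def __init__(self, controller):
        self.controller = controller
        self.next_offset = 0
        self.topics = {}

    def handle_admin_request(self, name, partitions):
        # Brokers do not apply admin changes themselves; they forward them.
        return self.controller.create_topic(name, partitions)

    def poll_metadata(self):
        # Pull-based propagation: read whatever is new since the last poll.
        for record in self.controller.fetch(self.next_offset):
            if record[0] == "TopicRecord":
                self.topics[record[1]] = []
            elif record[0] == "PartitionRecord":
                self.topics[record[1]].append(record[2])
            self.next_offset += 1


controller = Controller()
broker_a, broker_b = Broker(controller), Broker(controller)
response = broker_a.handle_admin_request("foo", 3)
broker_b.poll_metadata()
print(response, broker_b.topics)
```

Note that `broker_a` itself only learns about the new topic on its next `poll_metadata()` call, the same way every other broker does.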
KRaft-based Programming Model
[Sequence diagram between Request Handler, Controller, and Raft Quorum: CreateTopicsRequest; Write new records; Last stable offset advances; CreateTopicsResponse]
KRaft-based Programming Model: Pipelining
[Sequence diagram with the same participants and messages: CreateTopicsRequest; Write new records; Last stable offset advances; CreateTopicsResponse]
Admin Operations in KRaft
● Pull-based metadata propagation model can recover from RPC send failures easily
● Simpler programming model: single-writer
● Pipelining means that we can have multiple metadata operations in flight at once
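Pipelining can be sketched as a queue of requests whose records have been written but not yet committed. The class below is an illustrative model only: `PipelinedController`, its method names, and the request ids are hypothetical, and "committed offset" stands in for the offset that advances in the diagrams.

```python
from collections import deque


class PipelinedController:
    def __init__(self):
        self.next_offset = 0
        self.pending = deque()  # (last offset written for request, request id)
        self.completed = []

    def submit(self, request_id, record_count):
        """Write records for a request without waiting for earlier commits."""
        self.next_offset += record_count
        self.pending.append((self.next_offset - 1, request_id))

    def on_commit(self, committed_offset):
        """Complete every pending request whose records are now committed."""
        while self.pending and self.pending[0][0] <= committed_offset:
            self.completed.append(self.pending.popleft()[1])


controller = PipelinedController()
controller.submit("create-foo", 4)    # offsets 0-3
controller.submit("create-bar", 2)    # offsets 4-5
controller.submit("alter-config", 1)  # offset 6
in_flight = len(controller.pending)

controller.on_commit(4)  # only create-foo is fully committed
after_first = list(controller.completed)
controller.on_commit(6)
print(in_flight, after_first, controller.completed)
```

Because there is a single writer, requests complete in the order their records were written, and a response is only released once the last record of that request is committed.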
Deploying KRaft
[Diagram: deployment layouts; only the node-count labels "X 4" and "X 3" survive]
● Combined mode: each node runs both the broker and the controller role
● Can run with 3 brokers and a replication factor (RF) of 3
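As a rough illustration of combined mode, the sketch below renders a partial server.properties for one node of a three-node cluster. The property keys are standard KRaft settings, but the host names, ports, listener names, and the `combined_mode_properties` helper are assumptions for the example, and the output is not a complete configuration.

```python
def combined_mode_properties(node_id, hosts, broker_port=9092, controller_port=9093):
    """Render a partial server.properties for one combined-mode node.

    hosts maps node id -> hostname for every node in the quorum
    (hypothetical names; substitute your own).
    """
    voters = ",".join(
        f"{nid}@{host}:{controller_port}" for nid, host in sorted(hosts.items())
    )
    host = hosts[node_id]
    lines = [
        # Both roles in one process is what makes this "combined mode".
        "process.roles=broker,controller",
        f"node.id={node_id}",
        f"controller.quorum.voters={voters}",
        f"listeners=PLAINTEXT://{host}:{broker_port},CONTROLLER://{host}:{controller_port}",
        "controller.listener.names=CONTROLLER",
    ]
    return "\n".join(lines)


cluster = {1: "kafka-1", 2: "kafka-2", 3: "kafka-3"}
props = combined_mode_properties(2, cluster)
print(props)
```

A dedicated (non-combined) deployment would instead set `process.roles` to either `broker` or `controller` on each node.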
New Tools
ZK mode:
● “Double roll” process for upgrading inter.broker.protocol.version
● zookeeper-shell.sh
KRaft mode:
● kafka-features.sh with the metadata flag
● kafka-metadata-shell.sh
● kafka-dump-log.sh
kafka-dump-log
$ ./bin/kafka-dump-log.sh \
    --cluster-metadata-decoder \
    --files /tmp/logs/__cluster_metadata-0/00000000000000000000.log
Dumping /tmp/logs/__cluster_metadata-0/00000000000000000000.log
Starting offset: 0
baseOffset: 0 lastOffset: 0 count: 1 baseSequence: -1
[...]
| offset: 0 CreateTime: 1650857270775 keySize: 4 valueSize: 19
sequence: -1 headerKeys: [] controlType: LEADER_CHANGE(2)
baseOffset: 1 lastOffset: 1 count: 1 baseSequence: -1 [...]
{"type":"REGISTER_BROKER_RECORD","version":0,"data":{"...
Interactive Metadata Shell
● Replaces zookeeper-shell in KRaft clusters
● Data sources
○ Snapshot
○ Running controller cluster
● Reads __cluster_metadata log entries into memory
● Constructs a “virtual filesystem” with the cluster’s information
● Commands available: ls, pwd, cd, find, etc.
kafka-metadata-shell
$ ./bin/kafka-metadata-shell.sh --snapshot \
    /tmp/logs/__cluster_metadata-0/00000000000000000000.log
Loading...
Starting...
[ Kafka Metadata Shell ]
>> ls
brokers configs local metadataQuorum topicIds topics
>> cat /brokers/1/registration
RegisterBrokerRecord(brokerId=1, incarnationId=tHo3Z8dYSuONV5hA82BVug,
brokerEpoch=0, endPoints=[BrokerEndpoint(name='PLAINTEXT',
host='localhost', port=9092, securityProtocol=0)], features=[],
rack=null, fenced=true)
Monitoring the Quorum
● Metrics
○ MetadataOffset
○ SnapshotLag
○ SnapshotSizeBytes
○ MetadataCommitRate
○ MetadataCommitLatency
○ Current Raft state (follower, leader, candidate, observer)
■ Important for controller health
● DescribeQuorum RPC
○ Leader ID
○ Leader epoch
○ High water mark
○ Current voters
■ Controllers
○ Current observers
■ Brokers
■ Possible metadata shell instances, etc.
○ Log end offset of all followers
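The fields listed for DescribeQuorum are enough to compute how far each node trails the leader. The sketch below works on a plain dict whose field names are hypothetical; it does not reflect the RPC's wire schema or any client API.

```python
def follower_lag(quorum):
    """Return {node_id: records behind the leader} for voters and observers."""
    log_ends = {
        replica["id"]: replica["log_end_offset"]
        for replica in quorum["voters"] + quorum["observers"]
    }
    leader_end = log_ends[quorum["leader_id"]]
    return {
        node: leader_end - end
        for node, end in log_ends.items()
        if node != quorum["leader_id"]
    }


# Hypothetical snapshot of a quorum: three controllers and one broker.
quorum = {
    "leader_id": 1,
    "leader_epoch": 5,
    "high_watermark": 10437,
    "voters": [
        {"id": 1, "log_end_offset": 10439},
        {"id": 2, "log_end_offset": 10437},
        {"id": 3, "log_end_offset": 10431},
    ],
    "observers": [{"id": 4, "log_end_offset": 10439}],
}

print(follower_lag(quorum))
```

A voter whose lag keeps growing is worth alerting on, since losing a second voter in a three-node quorum would stall commits.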
Roadmap
● AK 2.8: first KRaft release (early access)
● AK 3.3: KRaft is production-ready for new clusters
● KRaft deployed in Confluent Cloud
● AK 3.4: Upgrade from ZK (early access)
● AK 3.5: ZK mode deprecated
● AK 4.0: ZK mode removed
Feature Gaps in Kafka 3.3
● Upgrade from ZK mode
● Dynamic configurations on the controller
● SCRAM support
● Delegation tokens
● JBOD support
Thank You!
Colin McCabe
cmccabe@apache.org


Introducing KRaft: Kafka Without Zookeeper With Colin McCabe | Current 2022
