Autonomous workload rebalancing in Kafka
Indrajeet Kumar
Site Reliability Engineer - LinkedIn
Agenda
● Workload distribution problem
● Manual - Built-in tools
● Semi-automated - Kafka-assigner
● Autonomous - Cruise Control
Workload distribution problem
● Important for distributed systems
● Harder to work around in stateful systems
Kafka Overview
[Diagram: App 1, App 2 and App 3 exchanging data through Topics A, B, C and D in a Kafka cluster]
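Kafka Overview
[Diagram: App 1 uses Topic A, which has partitions P1, P2 and P3. The leaders of P1, P2 and P3 live on Broker X, Broker Y and Broker Z respectively, and each broker also holds a replica (R P1, R P2, R P3) of a partition led elsewhere]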
Workload in Kafka
○ Leader Partitions
○ Total Partitions
○ Partition Sizes
[Diagram: leaders P1, P2, P3 on Broker X, Broker Y, Broker Z, each with a replica (R P1, R P2, R P3) on a different broker]
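The three workload measures above can be computed directly from a partition assignment. A minimal Python sketch, assuming a hypothetical `Partition` record that carries the replica list, the current leader and the on-disk size; brokers X, Y and Z are mapped to ids 1, 2 and 3, and the sizes are made up for illustration:

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Partition:  # hypothetical record, not a Kafka API type
    topic: str
    partition: int
    replicas: list[int]  # assigned brokers; replicas[0] is the preferred leader
    leader: int          # broker currently serving as leader
    size_bytes: int      # on-disk size of one replica

def broker_workload(partitions):
    """Per broker: leader count, total replica count and bytes stored."""
    load = defaultdict(lambda: {"leaders": 0, "replicas": 0, "bytes": 0})
    for p in partitions:
        load[p.leader]["leaders"] += 1
        for b in p.replicas:
            load[b]["replicas"] += 1
            load[b]["bytes"] += p.size_bytes
    return dict(load)

# Hypothetical layout in the spirit of the slide: brokers X, Y, Z = 1, 2, 3.
parts = [
    Partition("A", 1, [1, 2], leader=1, size_bytes=100),
    Partition("A", 2, [2, 3], leader=2, size_bytes=300),
    Partition("A", 3, [3, 1], leader=3, size_bytes=50),
]
for broker, stats in sorted(broker_workload(parts).items()):
    print(broker, stats)
```

Here every broker leads one partition and hosts two replicas, yet broker 2 stores 400 bytes against 150 on broker 1: counting partitions alone can hide a size imbalance.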
Workload distribution problem - Some causes
● Major factors that affect workload balance:
○ Bad partition distribution
○ Hard host failures
○ Soft host failures
○ Traffic patterns
● Rebalance the partitions, based on:
○ Disk usage
○ Network usage
○ Number of partitions
○ Partition leadership count
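As an illustration of rebalancing along one of these dimensions, here is a minimal greedy sketch that evens out replica counts, for example after adding an empty broker. It is a hypothetical helper, not the algorithm of any tool in this talk, and it ignores disk, network and leadership, which a real rebalancer weighs together:

```python
from collections import Counter

def rebalance_replica_counts(assignment, brokers):
    """Greedily move replicas from the most- to the least-loaded broker.

    assignment: {(topic, partition): [broker ids]}; it is not modified.
    brokers: every broker id allowed to host replicas (may include empty ones).
    Returns a new assignment whose per-broker replica counts differ by at most 1.
    """
    plan = {tp: list(reps) for tp, reps in assignment.items()}
    while True:
        counts = Counter({b: 0 for b in brokers})
        for reps in plan.values():
            counts.update(reps)
        hi = max(brokers, key=lambda b: counts[b])
        lo = min(brokers, key=lambda b: counts[b])
        if counts[hi] - counts[lo] <= 1:
            return plan
        # Move one replica off `hi`, picking a partition that `lo` does not
        # already host (two replicas of a partition on one broker is invalid).
        # If `hi` was the preferred leader (index 0), `lo` now takes that role.
        for reps in plan.values():
            if hi in reps and lo not in reps:
                reps[reps.index(hi)] = lo
                break
        else:
            return plan  # no legal move left (cannot happen when hi - lo >= 2)

assignment = {("A", 0): [1, 2], ("A", 1): [2, 3], ("A", 2): [3, 1], ("A", 3): [1, 2]}
plan = rebalance_replica_counts(assignment, brokers=[1, 2, 3, 4])  # broker 4 is new
print(plan)
```

Each move lowers the most-loaded count and raises the least-loaded one, so the loop always terminates; the catch is that it may shift preferred leadership, which is why leadership count is listed as a separate dimension.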
Kafka workload distribution - Solution
Usual operations in Kafka
● Preferred leader election
● Partition rebalance
● Increase partition counts
● Add/remove brokers
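In Kafka the preferred leader of a partition is the first replica in its assigned list, and a preferred leader election moves leadership back to it only if that replica is in sync. A small sketch of deciding which partitions such an election would touch, assuming partition state is available as plain dicts (a hypothetical shape, not a Kafka client type):

```python
def partitions_needing_election(partitions):
    """Return the (topic, partition) pairs a preferred leader election would fix.

    Each partition is a dict with keys:
      tp       - (topic, partition)
      replicas - assigned replica list; replicas[0] is the preferred leader
      leader   - broker currently leading the partition
      isr      - in-sync replica set
    """
    needed = []
    for p in partitions:
        preferred = p["replicas"][0]
        # Skip partitions already led by their preferred replica, and those
        # whose preferred replica is out of sync (the election cannot move them).
        if p["leader"] != preferred and preferred in p["isr"]:
            needed.append(p["tp"])
    return needed

state = [
    {"tp": ("A", 0), "replicas": [1, 2], "leader": 2, "isr": [1, 2]},  # needs election
    {"tp": ("A", 1), "replicas": [2, 3], "leader": 2, "isr": [2, 3]},  # already preferred
    {"tp": ("A", 2), "replicas": [3, 1], "leader": 1, "isr": [1]},     # broker 3 out of sync
]
print(partitions_needing_election(state))  # [('A', 0)]
```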
Kafka at LinkedIn
● 4.5 trillion messages a day
● 2,500+ Kafka brokers
● 1 PB in
● 3.9 PB out
Kafka admin utilities
● Out of the box tools:
○ bin/kafka-reassign-partitions.sh
○ bin/kafka-preferred-replica-election.sh
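Both stock tools are driven by JSON files. The sketch below only produces the file contents from a plan such as the ones computed earlier; the invocations in the comments are for older ZooKeeper-based releases, and the exact flags differ between Kafka versions, so check `--help` for yours:

```python
import json

# Typical invocations (older ZooKeeper-based releases):
#   bin/kafka-reassign-partitions.sh --zookeeper zk:2181 \
#       --reassignment-json-file reassign.json --execute
#   bin/kafka-preferred-replica-election.sh --zookeeper zk:2181 \
#       --path-to-json-file elect.json
# Later releases accept --bootstrap-server for reassignment and replace the
# election script with bin/kafka-leader-election.sh.

def reassignment_json(plan):
    """plan: {(topic, partition): [broker ids]} -> reassignment file contents."""
    return json.dumps({
        "version": 1,
        "partitions": [
            {"topic": t, "partition": p, "replicas": reps}
            for (t, p), reps in sorted(plan.items())
        ],
    }, indent=2)

def election_json(tps):
    """tps: iterable of (topic, partition) -> preferred-election file contents."""
    return json.dumps({"partitions": [{"topic": t, "partition": p} for t, p in tps]})

print(reassignment_json({("A", 1): [4, 3], ("A", 0): [4, 2]}))
print(election_json([("A", 0)]))
```

Every partition and every target replica list must be written out by hand or by script, which is part of why these tools feel manual and slow at scale.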
Example run of built-in tools
● Rebalancing partitions
● Preferred replica election: <Example of kafka-preferred-replica-election.sh>
Problems with stock tools
● Manual
● Suboptimal results
● Slow
Kafka Assigner
● High-level administrative commands
● Under the hood, it uses the ‘kafka-utils/bin/’ scripts
● Supports complex rebalances with multiple goals
Kafka assigner
● Preferred leader election
● Case of URPs (under-replicated partitions)
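A partition is under-replicated when its in-sync replica set is smaller than its assigned replica set. Grouping the out-of-sync replicas by broker often points at the host at fault. A minimal sketch, assuming the same hypothetical dict shape as in the election example:

```python
from collections import Counter

def under_replicated(partitions):
    """(topic, partition) pairs whose ISR is smaller than the assigned replica set."""
    return [p["tp"] for p in partitions if len(set(p["isr"])) < len(set(p["replicas"]))]

def missing_replicas_by_broker(partitions):
    """How many assigned-but-out-of-sync replicas each broker holds."""
    missing = Counter()
    for p in partitions:
        missing.update(b for b in p["replicas"] if b not in p["isr"])
    return dict(missing)

state = [
    {"tp": ("A", 0), "replicas": [1, 2], "isr": [1, 2]},
    {"tp": ("A", 1), "replicas": [2, 3], "isr": [2]},
    {"tp": ("A", 2), "replicas": [3, 1], "isr": [1]},
]
print(under_replicated(state))            # [('A', 1), ('A', 2)]
print(missing_replicas_by_broker(state))  # {3: 2}
```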
Kafka assigner
● Pros:
○ High-level admin commands
○ Simple to use
○ Allows chaining rebalance goals
○ Easy to remove all partitions from a broker
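Removing all partitions from a broker amounts to swapping that broker out of every replica list. A minimal sketch of the idea, choosing the least-loaded broker that does not already host the partition; this is a hypothetical helper, not kafka-assigner's actual implementation:

```python
from collections import Counter

def drain_broker(assignment, victim, brokers):
    """Reassign every replica on `victim` to the least-loaded eligible broker.

    assignment: {(topic, partition): [broker ids]}; it is not modified.
    """
    plan = {tp: list(reps) for tp, reps in assignment.items()}
    targets = [b for b in brokers if b != victim]
    counts = Counter({b: 0 for b in targets})
    for reps in plan.values():
        counts.update(b for b in reps if b != victim)
    for tp, reps in sorted(plan.items()):
        if victim not in reps:
            continue
        candidates = [b for b in targets if b not in reps]
        if not candidates:
            raise ValueError(f"no broker can take the replica of {tp}")
        new = min(candidates, key=lambda b: (counts[b], b))  # ties -> lowest id
        reps[reps.index(victim)] = new  # keeps the replica's position in the list
        counts[new] += 1
    return plan

assignment = {("A", 0): [1, 2], ("A", 1): [2, 3], ("A", 2): [3, 1]}
drained = drain_broker(assignment, victim=3, brokers=[1, 2, 3, 4])
print(drained)  # {('A', 0): [1, 2], ('A', 1): [2, 4], ('A', 2): [4, 1]}
```

The output can be fed to `kafka-reassign-partitions.sh` as a reassignment plan; the tool still has to be run and watched by hand, which is the main con listed next.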
Kafka assigner
● Cons:
○ Where did you run it?
○ Suboptimal balances in certain cases
○ Needs manual invocation and supervision
Cruise Control
● Central system
● Complete, live view of cluster health
● Manual or automatic workload management
Design
[Diagram: User → REST API → Kafka Cruise Control, with components labelled Analyzer, Failure Detector, Workload Monitor and Goal Executor, operating on the KAFKA cluster]
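A failure detector has to avoid triggering a costly self-healing rebalance for a broker that is only briefly away, such as during a rolling restart. Cruise Control handles this with configurable alert and self-healing thresholds; the class below is a hypothetical illustration of the grace-period idea, not Cruise Control code:

```python
import time

class BrokerFailureDetector:
    """Hypothetical detector: flag brokers missing for longer than a grace period."""

    def __init__(self, expected, grace_s):
        self.expected = set(expected)   # broker ids that should be alive
        self.grace_s = grace_s          # seconds a broker may be absent
        self.first_missing = {}         # broker id -> time first seen missing

    def check(self, alive, now=None):
        """Return brokers that have been missing for at least grace_s seconds."""
        now = time.monotonic() if now is None else now
        alive = set(alive)
        for b in self.expected - alive:
            self.first_missing.setdefault(b, now)
        for b in list(self.first_missing):
            if b in alive:
                del self.first_missing[b]  # broker came back; forget it
        return sorted(b for b, t in self.first_missing.items() if now - t >= self.grace_s)

d = BrokerFailureDetector(expected=[1, 2, 3], grace_s=60)
print(d.check([1, 2], now=0))   # [] : broker 3 just went missing
print(d.check([1, 2], now=60))  # [3] : past the grace period, time to self-heal
```

A broker flagged here would then be drained, for instance with a helper like the one sketched for kafka-assigner.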
Cruise Control
CC setup requirements
● Kafka > 0.11.0.0
● Drop-in jar
Features already built-in
● Resource utilization tracking
● Multi-goal rebalance
● Anomaly detection
● Admin operations
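Admin operations are exposed through the REST API shown in the design. A sketch that builds requests against it, assuming the default port 9090 and the `/kafkacruisecontrol/` endpoints (`state`, `load`, `rebalance`, `add_broker`, `remove_broker`) as described in the Cruise Control wiki; the host name is hypothetical, and parameters should be checked against the docs for your version:

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical host; 9090 is Cruise Control's default web server port.
CC = "http://cruise-control.example.com:9090/kafkacruisecontrol"

def cc_url(endpoint, **params):
    """Build a Cruise Control REST URL; booleans become 'true'/'false'."""
    query = {k: (str(v).lower() if isinstance(v, bool) else v) for k, v in params.items()}
    return f"{CC}/{endpoint}" + (f"?{urlencode(query)}" if query else "")

def cc_request(endpoint, method="GET", **params):
    """Prepare (not send) a request; pass it to urllib.request.urlopen to run it.

    Read-only endpoints such as state and load use GET; operations such as
    rebalance, add_broker and remove_broker use POST.
    """
    return Request(cc_url(endpoint, **params), method=method)

print(cc_url("state"))
print(cc_url("rebalance", dryrun=True))  # dry run: propose, do not execute
print(cc_url("remove_broker", brokerid=3, dryrun=False))
```

Running every operation through one central service is what answers kafka-assigner's "Where did you run it?" problem.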
How is CC doing?
● Saves SREs' time debugging and fixing Kafka workload issues
● Very fast operations
● A central place to look for globally distributed teams
● Self-healing!
Resources
Admin tools shipped with Kafka:
https://github.com/apache/kafka/tree/trunk/bin
Kafka Assigner:
https://github.com/linkedin/kafka-tools/wiki/Kafka-Assigner
Cruise Control:
https://github.com/linkedin/cruise-control
Connect with me: https://www.linkedin.com/in/indrajeetkm/
Questions
