A Bridge over Troubled Water - Implementing
Exactly-Once Semantics and Escaping Kafka
Rebalance Storms
Antonovsky Yulia
© 2023 Akamai
About Me
Senior Software Engineer II at Akamai Technologies since 2020
Big Data Engineering experience since 2016
Started career as student intern at SAP Labs Israel in 2007
yulia-antonovsky
Agenda
➢ Introduction
➢ CSI Ingest architecture
➢ Managing Kafka Transactions
➢ Avoid Kafka endless rebalancing
➢ Q&A
About Akamai Technologies
Akamai Technologies is the largest content delivery network (CDN) services
provider in the world that also offers cloud and security services.
In numbers:
● 350K servers across the world
● 8B requests per day
● ~ 30% of the global internet traffic
We power and protect life online
About CSI Group (Cloud Security Intelligence)
Our team is responsible for the ongoing development and maintenance of a platform
designed to collect, analyze, and distill high-quality security intelligence information. We
handle a daily traffic of about 10GB/s, processing approximately 150 billion raw data
events per day.
CSI Cluster
CSI Ingest Architecture
Drill Down
Standard iteration flow:
1. Consume Kafka messages
2. Read files from Blob
3. Process the data
4. Write results to Blob
5. Produce Kafka messages
Guardians of the Data
Just like the Guardians of the
Galaxy protect the universe, we
are dedicated to protecting the
accuracy of our customers' data
How can we prevent data loss or duplication when application pods are
continuously scaled in and out to handle data traffic?
Managing Kafka Transactions
● We actively manage partition offsets to ensure that we consume data from Kafka exactly
once.
● We rely on the Kafka Transactional API's support for idempotent writes to prevent duplicate data,
even in the event of failures or retries.
● We leverage Kafka's Transactional API to write data to multiple Kafka topics, ensuring that all
writes either succeed or fail together.
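As a rough illustration of the idempotent-write idea above, the toy model below mimics how a log can discard a retried write by tracking a sequence number per producer. The class ToyPartitionLog and its method names are hypothetical and are not part of any Kafka client; real brokers track more state than this sketch shows.

```python
class ToyPartitionLog:
    """Toy single-partition log that drops retried (duplicate) appends."""

    def __init__(self):
        self.records = []
        # Highest sequence number accepted so far, per producer id.
        self.last_sequence = {}

    def append(self, producer_id, sequence, value):
        """Append value unless this (producer_id, sequence) was already stored.

        Returns True if the record was written, False if it was a duplicate.
        """
        last = self.last_sequence.get(producer_id, -1)
        if sequence <= last:
            return False  # a retry of something already in the log
        if sequence != last + 1:
            raise ValueError("gap in sequence numbers: a record is missing")
        self.records.append(value)
        self.last_sequence[producer_id] = sequence
        return True


log = ToyPartitionLog()
log.append("producer-1", 0, "event-a")
log.append("producer-1", 1, "event-b")
# The acknowledgement for sequence 1 was lost, so the producer retries it.
retry_written = log.append("producer-1", 1, "event-b")
```

In this model the retry is acknowledged without being stored a second time, which is the property the second bullet relies on.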
KafkaTransactionManager
● To simplify the use of Kafka transactions across all our applications, we developed a
component called KafkaTransactionManager.
● It supports seamless processing of transactional data across one or more source and target
topics.
● The component handles the entire process from message consumption to committing or
aborting Kafka transactions.
KafkaTransactionManager API
● kafkaTransactionManager.beginTransaction() starts a new transaction and resets offsets.
● kafkaTransactionManager.consumeRecords(pollTimeout) executes one poll on the subscribed Kafka topics,
returns the consumed messages, and updates offsets if needed. It can be called multiple times during the same transaction
to retrieve additional messages.
● kafkaTransactionManager.produceRecord(topics, key, value) produces a record to one or more target
topics. It can be called multiple times within the same transaction to send additional messages.
● kafkaTransactionManager.commitTransaction() finalizes the current transaction, sends the updated
consumed offsets to the consumer group, and commits both consumed and produced messages on all topics. If a failure
occurs, abortTransaction() must be called to ensure that the transaction is rolled back.
● kafkaTransactionManager.abortTransaction() closes the current transaction and resets consumed offsets by
executing the seek API for all assigned TopicPartitions on the Kafka consumer client. If aborting the transaction fails, the Kafka
producer client is closed and a new one is created.
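To make the call sequence concrete, here is a minimal in-memory sketch of one consume-process-produce iteration. It is not the KafkaTransactionManager from the talk: ToyTransactionManager, run_iteration, and flaky_upper are hypothetical names, the source and target topics are plain Python lists, and the poll timeout, keys, and multiple topics are left out for brevity.

```python
class ToyTransactionManager:
    """In-memory stand-in for the five operations described above."""

    def __init__(self, source_records):
        self.source = list(source_records)
        self.committed_offset = 0  # offset stored for the consumer group
        self.position = 0          # next offset the consumer will read
        self.pending = []          # records produced in the open transaction
        self.target = []           # records visible after a commit

    def begin_transaction(self):
        self.pending = []
        self.position = self.committed_offset

    def consume_records(self, max_records):
        batch = self.source[self.position:self.position + max_records]
        self.position += len(batch)
        return batch

    def produce_record(self, value):
        self.pending.append(value)

    def commit_transaction(self):
        # Consumed offsets and produced records become visible together.
        self.target.extend(self.pending)
        self.committed_offset = self.position
        self.pending = []

    def abort_transaction(self):
        # Drop uncommitted output and seek back to the last committed offset.
        self.pending = []
        self.position = self.committed_offset


def run_iteration(manager, process, max_records=2):
    """Run one iteration; return True if it committed, False if it aborted."""
    manager.begin_transaction()
    try:
        for record in manager.consume_records(max_records):
            manager.produce_record(process(record))
        manager.commit_transaction()
        return True
    except Exception:
        manager.abort_transaction()
        return False


failures = {"remaining": 1}


def flaky_upper(record):
    # Fails once on record "b" to simulate a crash in the middle of a batch.
    if record == "b" and failures["remaining"] > 0:
        failures["remaining"] -= 1
        raise RuntimeError("simulated processing failure")
    return record.upper()


manager = ToyTransactionManager(["a", "b", "c"])
results = [run_iteration(manager, flaky_upper) for _ in range(3)]
```

The first iteration aborts, so its partial output is discarded and the same records are consumed again; the target ends up with each record exactly once.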
Kafka Clients’ “Transactional” Configurations
● Kafka consumer client configurations:
○ enable.auto.commit = false
○ isolation.level = read_committed
● Kafka producer client configurations:
○ transactional.id = randomUUID()
○ transaction.timeout.ms - depends on application
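The settings above can be collected into two plain dictionaries, as sketched below. The function name, the group name, and the timeout value are illustrative assumptions; only the configuration keys and the use of a random UUID come from the slide, and group.id is added because a consumer group needs one.

```python
import uuid


def build_transactional_configs(group_id, transaction_timeout_ms):
    """Return (consumer_config, producer_config) with the settings above."""
    consumer_config = {
        "group.id": group_id,
        # Offsets are committed through the transaction, not automatically.
        "enable.auto.commit": False,
        # Hide records that belong to open or aborted transactions.
        "isolation.level": "read_committed",
    }
    producer_config = {
        "transactional.id": str(uuid.uuid4()),
        "transaction.timeout.ms": transaction_timeout_ms,
    }
    return consumer_config, producer_config


consumer_config, producer_config = build_transactional_configs(
    group_id="example-ingest-group",  # hypothetical group name
    transaction_timeout_ms=60_000,    # illustrative value only
)
```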
Avoid Kafka endless rebalancing
Within a consumer group, Kafka moves ownership of a partition from one consumer to
another on certain events. This process of changing partition ownership across
consumers is called a rebalance.
What Triggers a Rebalancing?
● The topic partition or partition replica count changes
● Consumer group properties are changed
● Consumer joins or leaves a group
Why can it rebalance forever?
● Networking issues
● System complexity
● Inappropriate configurations
● Scale up/down, k8s moves pods
● Application/pod restarts
● Not all pods start synchronously
Kafka “Rebalance” Configurations
All of the related configurations are Kafka consumer configurations.
session.timeout.ms: The maximum time the consumer coordinator waits for
a heartbeat from a consumer before removing it from the group.
heartbeat.interval.ms: The expected time between heartbeats sent to
the consumer coordinator.
max.poll.interval.ms: The maximum delay between invocations of poll()
when using consumer group management.
group.instance.id: A unique identifier provided by the end user for the consumer instance. When set,
the consumer joins as a static member, so a restart within session.timeout.ms does not trigger a rebalance.
partition.assignment.strategy: A list of class names or class types, ordered by preference, of
supported partition assignment strategies that the client will use to distribute partition ownership.
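One way to reason about these settings together is a small sanity check like the sketch below. The function check_rebalance_config and the example values are hypothetical; the one-third ratio between heartbeat.interval.ms and session.timeout.ms follows the usual guidance in the Kafka consumer configuration documentation, and the worst-case iteration time is something each application has to measure for itself.

```python
def check_rebalance_config(config, worst_case_iteration_ms):
    """Return a list of warnings for timeout settings that look risky."""
    problems = []
    session = config["session.timeout.ms"]
    heartbeat = config["heartbeat.interval.ms"]
    max_poll = config["max.poll.interval.ms"]

    if heartbeat * 3 > session:
        problems.append(
            "heartbeat.interval.ms is more than a third of session.timeout.ms"
        )
    if max_poll <= worst_case_iteration_ms:
        problems.append(
            "max.poll.interval.ms does not cover the slowest iteration"
        )
    if not config.get("group.instance.id"):
        problems.append("group.instance.id is not set")
    return problems


example_config = {
    "session.timeout.ms": 45_000,
    "heartbeat.interval.ms": 20_000,   # too large relative to the session timeout
    "max.poll.interval.ms": 300_000,
    "group.instance.id": "ingest-pod-0",  # hypothetical static member name
}
problems = check_rebalance_config(example_config, worst_case_iteration_ms=120_000)
```

Here only the heartbeat warning is reported, because the poll interval comfortably covers the assumed slowest iteration and a static member id is present.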
Partition Assignment Strategy
CooperativeStickyAssignor - Follows the same StickyAssignor logic, but allows for cooperative
rebalancing. Available since version 2.4.
RangeAssignor - Assigns partitions on a per-topic basis, where each consumer is assigned a
contiguous range of partitions.
RoundRobinAssignor - Assigns partitions to consumers in a round-robin fashion.
StickyAssignor - Guarantees an assignment that is maximally balanced while preserving as
many existing partition assignments as possible.
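The difference between the range and round-robin strategies is easier to see on a small example. The sketch below is a simplified re-implementation for illustration only, not the Kafka assignor classes: range_assign and round_robin_assign are hypothetical helpers, and every consumer is assumed to subscribe to every topic.

```python
def range_assign(consumers, topics):
    """Per topic, give each consumer a contiguous range of partitions."""
    members = sorted(consumers)
    assignment = {member: [] for member in members}
    for topic in sorted(topics):
        per_member, extra = divmod(topics[topic], len(members))
        start = 0
        for index, member in enumerate(members):
            # The first `extra` consumers receive one additional partition.
            count = per_member + (1 if index < extra else 0)
            assignment[member].extend(
                (topic, partition) for partition in range(start, start + count)
            )
            start += count
    return assignment


def round_robin_assign(consumers, topics):
    """Lay out all partitions of all topics and deal them out in turn."""
    members = sorted(consumers)
    assignment = {member: [] for member in members}
    all_partitions = [
        (topic, partition)
        for topic in sorted(topics)
        for partition in range(topics[topic])
    ]
    for index, topic_partition in enumerate(all_partitions):
        assignment[members[index % len(members)]].append(topic_partition)
    return assignment


# Two consumers, two topics with three partitions each.
topics = {"t0": 3, "t1": 3}
by_range = range_assign(["c0", "c1"], topics)
by_round_robin = round_robin_assign(["c0", "c1"], topics)
```

With these inputs the range strategy gives c0 four partitions and c1 two, because the extra partition of each topic goes to the first consumer, while round-robin gives each consumer three.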
Kafka Rebalance Listener
ConsumerPartitionAssignor is a high-level interface that allows you to implement your own
custom partition assignment strategy.
ConsumerRebalanceListener is a low-level interface that allows you to receive notifications before
and after the partition assignment.
● A rebalance listener can't prevent rebalancing but can minimize its impact
● It can only be triggered during polling
● In transactional iterations, it can save processing costs
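A sketch of how a rebalance listener might save processing costs in a transactional iteration is shown below. The class AbortOnRevokeListener and its bookkeeping methods are hypothetical; the two callbacks only mirror the shape of onPartitionsAssigned and onPartitionsRevoked on ConsumerRebalanceListener, and no Kafka client is involved.

```python
class AbortOnRevokeListener:
    """Tracks owned partitions and flags in-flight work that should be dropped."""

    def __init__(self):
        self.owned = set()
        # Partitions with records that were read but not yet committed.
        self.in_flight = set()
        self.abort_requested = False

    def on_partitions_assigned(self, partitions):
        self.owned.update(partitions)

    def on_partitions_revoked(self, partitions):
        revoked = set(partitions)
        # Uncommitted work on a revoked partition should not be committed
        # by this consumer, so ask the processing loop to abort early.
        if revoked & self.in_flight:
            self.abort_requested = True
        self.owned -= revoked
        self.in_flight -= revoked

    def mark_consumed(self, partition):
        if partition in self.owned:
            self.in_flight.add(partition)

    def transaction_finished(self):
        # Called after a commit or an abort to start the next iteration clean.
        self.in_flight.clear()
        self.abort_requested = False


listener = AbortOnRevokeListener()
listener.on_partitions_assigned([0, 1, 2])
listener.mark_consumed(1)
# A rebalance takes partitions 1 and 2 away while partition 1 has open work.
listener.on_partitions_revoked([1, 2])
```

A processing loop could check abort_requested before starting an expensive step and abort the transaction instead of finishing work whose commit would be rejected anyway.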
Summary
★ Manage consumed offsets manually when using Kafka's transactional API.
★ Disable auto commit, use read committed mode on consumer client config,
and add transactional.id to producer config.
★ Use ConsumerRebalanceListener to minimize the impact of Kafka rebalance.
★ Configure appropriate timeouts on consumer client and define
group.instance.id, when possible, to skip Kafka rebalances.
★ Choose a partition assignment strategy carefully, and experiment with
different strategies to determine the best fit.
Q&A
Thank you :)
Feel free to reach out to me: yulia-antonovsky
