Implementing Exactly-once Delivery and Escaping Kafka Rebalance Storms with Yulia Antonovsky

A Bridge over Troubled Water: Implementing Exactly-Once Semantics and Escaping Kafka Rebalance Storms
Yulia Antonovsky

© 2023 Akamai
About Me
Senior Software Engineer II at Akamai Technologies since 2020
Big Data Engineering experience since 2016
Started career as student intern at SAP Labs Israel in 2007
yulia-antonovsky

Agenda
➢ Introduction
➢ CSI Ingest architecture
➢ Managing Kafka Transactions
➢ Avoid Kafka endless rebalancing
➢ Q&A

About Akamai Technologies
Akamai Technologies is the largest content delivery network (CDN) services
provider in the world that also offers cloud and security services.
In numbers:
● 350K servers across the world
● 8B requests per day
● ~ 30% of the global internet traffic
We power and protect life online

About CSI Group (Cloud Security Intelligence)
Our team is responsible for the ongoing development and maintenance of a platform
designed to collect, analyze, and distill high-quality security intelligence information. We
handle about 10 GB/s of traffic, processing approximately 150 billion raw data
events per day.
CSI Cluster

CSI Ingest Architecture

Drill Down
Standard iteration flow:
1. Consume Kafka messages
2. Read files from Blob
3. Process the data
4. Write results to Blob
5. Produce Kafka messages

Guardians of the Data
Just like the Guardians of the
Galaxy protect the universe, we
are dedicated to protecting the
accuracy of our customers' data
How can we prevent data loss or duplication when application pods are
continuously scaled in and out to handle data traffic?

Managing Kafka Transactions
● We actively manage partition offsets to ensure that we consume data from Kafka exactly
once.
● We rely on the idempotent writes supported by Kafka's Transactional API to prevent duplicate
data, even in the event of failures or retries.
● We use Kafka's Transactional API to write data to multiple Kafka topics, ensuring that all
writes either succeed or fail together.

KafkaTransactionManager
● To simplify the use of Kafka transactions across all our applications, we developed a
component called KafkaTransactionManager.
● It supports seamless processing of transactional data across one or more source and target
topics.
● The component handles the entire process from message consumption to committing or
aborting Kafka transactions.

KafkaTransactionManager API
kafkaTransactionManager.beginTransaction(): starts a new transaction and resets offsets.
kafkaTransactionManager.consumeRecords(pollTimeout): executes one poll on the subscribed Kafka topics, returns the consumed messages, and updates offsets if needed. It can be called multiple times within the same transaction to retrieve additional messages.
kafkaTransactionManager.produceRecord(topics, key, value): produces a record to one or more target topics. It can be called multiple times within the same transaction to send additional messages.
kafkaTransactionManager.commitTransaction(): finalizes the current transaction, sends the updated consumed offsets to the consumer group, and commits both consumed and produced messages on all topics. If a failure occurs, abortTransaction() must be called to ensure that the transaction is rolled back.
kafkaTransactionManager.abortTransaction(): closes the current transaction and resets consumed offsets by calling the seek API for all assigned TopicPartitions on the Kafka consumer client. If aborting the transaction fails, the Kafka producer client is closed and a new one is created.
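
The call order described above can be sketched as one consume-process-produce iteration. This is only an illustration: KafkaTransactionManager is an internal component whose exact signatures are not shown in the slides, so the TransactionManager interface, the record types, and the in-memory RecordingTransactionManager below are hypothetical stand-ins that reuse the method names from the slide.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface reusing the method names from the slide; the real
// component's parameter and return types are assumptions.
interface TransactionManager {
    void beginTransaction();
    List<String> consumeRecords(Duration pollTimeout);
    void produceRecord(List<String> topics, String key, String value);
    void commitTransaction();
    void abortTransaction();
}

final class IterationRunner {
    // Runs one iteration and returns true if the transaction committed.
    static boolean runIteration(TransactionManager tm, List<String> targetTopics) {
        tm.beginTransaction();
        try {
            List<String> records = tm.consumeRecords(Duration.ofSeconds(1));
            for (String record : records) {
                // Placeholder for the real processing step.
                tm.produceRecord(targetTopics, record, record.toUpperCase());
            }
            tm.commitTransaction();
            return true;
        } catch (RuntimeException e) {
            // On failure, abort so the consumed offsets are reset and the
            // same records are read again in the next iteration.
            tm.abortTransaction();
            return false;
        }
    }
}

// In-memory stand-in that only records the call order; it talks to no broker.
final class RecordingTransactionManager implements TransactionManager {
    final List<String> calls = new ArrayList<>();
    private final boolean failOnCommit;

    RecordingTransactionManager(boolean failOnCommit) {
        this.failOnCommit = failOnCommit;
    }

    public void beginTransaction() { calls.add("begin"); }

    public List<String> consumeRecords(Duration pollTimeout) {
        calls.add("consume");
        return List.of("a", "b");
    }

    public void produceRecord(List<String> topics, String key, String value) {
        calls.add("produce:" + value);
    }

    public void commitTransaction() {
        if (failOnCommit) {
            throw new IllegalStateException("simulated commit failure");
        }
        calls.add("commit");
    }

    public void abortTransaction() { calls.add("abort"); }
}
```

Keeping the abort in a single catch block means every failure path, whether in consuming, processing, producing, or committing, ends with the same rollback.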

Kafka Clients’ “Transactional” Configurations
● Kafka consumer client configurations:
○ enable.auto.commit = false
○ isolation.level = read_committed
● Kafka producer client configurations:
○ transactional.id = randomUUID()
○ transaction.timeout.ms - depends on application
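
As a minimal sketch, the settings listed above could be collected into java.util.Properties objects before being passed to the Kafka clients. The class name TransactionalClientConfigs and the timeout parameter are assumptions made for illustration, and connection settings such as bootstrap servers and serializers are deliberately left out.

```java
import java.util.Properties;
import java.util.UUID;

// Hypothetical helper; only the property keys and values come from the slide.
final class TransactionalClientConfigs {

    static Properties consumerProps() {
        Properties props = new Properties();
        // Offsets are committed as part of the transaction, not automatically.
        props.setProperty("enable.auto.commit", "false");
        // Only read messages from committed transactions.
        props.setProperty("isolation.level", "read_committed");
        return props;
    }

    static Properties producerProps(long transactionTimeoutMs) {
        Properties props = new Properties();
        // A random transactional.id per producer instance, as on the slide.
        props.setProperty("transactional.id", UUID.randomUUID().toString());
        // Application-dependent; the caller chooses the value.
        props.setProperty("transaction.timeout.ms", Long.toString(transactionTimeoutMs));
        return props;
    }
}
```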

Avoid Kafka endless rebalancing
Within a consumer group, Kafka moves the ownership of a partition from one consumer to
another on certain events. The process of changing partition ownership across the
consumers is called a rebalance.

What Triggers a Rebalance?
● The topic partition or partition replica count changes
● Consumer group properties are changed
● Consumer joins or leaves a group
Why can it rebalance forever?
● Networking issues
● System complexity
● Inappropriate configurations
● Scale up/down, k8s moves pods
● Application/pod restarts
● Not all pods start synchronously

Kafka “Rebalance” Configurations
All of the related configurations are Kafka consumer configurations.
session.timeout.ms: The maximum time the consumer coordinator waits for a heartbeat from a consumer before removing it from the group.
heartbeat.interval.ms: The expected time between heartbeats sent to the consumer coordinator.
max.poll.interval.ms: The maximum delay between invocations of poll() when using consumer group management. A consumer that exceeds it is considered failed, and the group rebalances.
group.instance.id: A unique identifier provided by the end user for the consumer instance. Setting it makes the consumer a static member of the group.
partition.assignment.strategy: A list of class names or class types, ordered by preference, of supported partition assignment strategies that the client will use to distribute partition ownership.
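
The following sketch shows how these five settings might be assembled for one consumer. The helper class, its parameters, and the example values in the usage note are assumptions, not the values used in the talk. The one-third check reflects the general Kafka guidance that heartbeat.interval.ms should be no higher than a third of session.timeout.ms.

```java
import java.util.Properties;

// Hypothetical helper; the property keys are real Kafka consumer configs,
// the values are supplied by the caller.
final class RebalanceConsumerConfigs {

    static Properties build(String instanceId,
                            int sessionTimeoutMs,
                            int heartbeatIntervalMs,
                            int maxPollIntervalMs) {
        // Several heartbeats should fit into one session timeout.
        if ((long) heartbeatIntervalMs * 3 > sessionTimeoutMs) {
            throw new IllegalArgumentException(
                "heartbeat.interval.ms should be at most 1/3 of session.timeout.ms");
        }
        Properties props = new Properties();
        props.setProperty("session.timeout.ms", Integer.toString(sessionTimeoutMs));
        props.setProperty("heartbeat.interval.ms", Integer.toString(heartbeatIntervalMs));
        props.setProperty("max.poll.interval.ms", Integer.toString(maxPollIntervalMs));
        // A stable id per instance (for example a pod name) enables static membership.
        props.setProperty("group.instance.id", instanceId);
        props.setProperty("partition.assignment.strategy",
            "org.apache.kafka.clients.consumer.CooperativeStickyAssignor");
        return props;
    }
}
```

For example, `RebalanceConsumerConfigs.build("ingest-pod-0", 30000, 10000, 300000)` passes the check, while a heartbeat interval of 20000 with the same session timeout is rejected. The instance id "ingest-pod-0" is illustrative only.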

Partition Assignment Strategy
CooperativeStickyAssignor - Follows the same StickyAssignor logic, but allows for cooperative
rebalancing. Available since version 2.4.
RangeAssignor - Assigns partitions on a per-topic basis, where each consumer is assigned a
contiguous range of partitions.
RoundRobinAssignor - Assigns partitions to consumers in a round-robin fashion.
StickyAssignor - Guarantees an assignment that is maximally balanced while preserving as
many existing partition assignments as possible.
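
To make the difference between the range and round-robin strategies concrete, here is a simplified, single-topic model of the two. It is not Kafka's implementation: the real assignors work on member ids and topic subscriptions, and the class AssignmentSketch, its methods, and the consumer names are assumptions for illustration. The consumer list is assumed to be non-empty and already sorted.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified single-topic model of two assignment strategies.
final class AssignmentSketch {

    // Range: each consumer gets a contiguous block of partitions; the first
    // (partitions % consumers) consumers receive one extra partition.
    static Map<String, List<Integer>> range(List<String> consumers, int partitions) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        int base = partitions / consumers.size();
        int extra = partitions % consumers.size();
        int next = 0;
        for (int i = 0; i < consumers.size(); i++) {
            int count = base + (i < extra ? 1 : 0);
            List<Integer> owned = new ArrayList<>();
            for (int p = next; p < next + count; p++) {
                owned.add(p);
            }
            next += count;
            result.put(consumers.get(i), owned);
        }
        return result;
    }

    // Round robin: partitions are dealt out one at a time, cycling through consumers.
    static Map<String, List<Integer>> roundRobin(List<String> consumers, int partitions) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (String consumer : consumers) {
            result.put(consumer, new ArrayList<>());
        }
        for (int p = 0; p < partitions; p++) {
            result.get(consumers.get(p % consumers.size())).add(p);
        }
        return result;
    }
}
```

With three consumers and seven partitions, the range model gives the first consumer partitions 0 to 2, while the round-robin model gives it partitions 0, 3, and 6.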

Kafka Rebalance Listener
ConsumerPartitionAssignor is a high-level interface that allows you to implement your own
custom partition assignment strategy.
ConsumerRebalanceListener is a low-level interface that allows you to receive notifications before
and after the partition assignment.
● A rebalance listener can't prevent rebalancing but can minimize its impact
● Its callbacks can only be triggered during polling
● In transactional iterations, it can save processing costs
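
One possible reading of the "save processing costs" point is that a listener can flag a revocation so that the current iteration is aborted early instead of doing work that can no longer be committed. The sketch below assumes that reading. To stay self-contained it uses a local stand-in interface, RebalanceCallbacks, with the same two callback names as Kafka's ConsumerRebalanceListener; the real interface takes Collection<TopicPartition>, while plain strings are used here. All other names are hypothetical.

```java
import java.util.Collection;
import java.util.concurrent.atomic.AtomicBoolean;

// Local stand-in for Kafka's ConsumerRebalanceListener (same callback names,
// simplified parameter type).
interface RebalanceCallbacks {
    void onPartitionsRevoked(Collection<String> partitions);
    void onPartitionsAssigned(Collection<String> partitions);
}

final class AbortOnRevokeListener implements RebalanceCallbacks {
    // Set when partitions are taken away in the middle of an iteration.
    private final AtomicBoolean revokedDuringIteration = new AtomicBoolean(false);

    @Override
    public void onPartitionsRevoked(Collection<String> partitions) {
        if (!partitions.isEmpty()) {
            revokedDuringIteration.set(true);
        }
    }

    @Override
    public void onPartitionsAssigned(Collection<String> partitions) {
        // Nothing to do in this sketch; offsets are managed by the transaction.
    }

    // Called at the start of each transactional iteration.
    void startIteration() {
        revokedDuringIteration.set(false);
    }

    // Checked after each poll: if true, abort instead of processing further.
    boolean shouldAbort() {
        return revokedDuringIteration.get();
    }
}
```

In a processing loop, the application would call startIteration() when a transaction begins and check shouldAbort() after each poll, aborting the transaction as soon as it returns true.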

Summary
★ Manage consumed offsets manually when using Kafka's transactional API.
★ Disable auto commit, set isolation.level to read_committed on the consumer client,
and add transactional.id to the producer config.
★ Use ConsumerRebalanceListener to minimize the impact of Kafka rebalance.
★ Configure appropriate timeouts on the consumer client and define
group.instance.id, when possible, to avoid unnecessary Kafka rebalances.
★ Choose a partition assignment strategy carefully, and experiment with
different strategies to determine the best fit.
