Apache Kafka: New Features That You Might Not Know About
Yaroslav Tkachenko
Software Architect at Activision
Apache Kafka Versions
• 0.10.0.0 (May 2016); later releases: 0.10.0.1, 0.10.2.2
• 0.11.0.0 (June 2017); later releases: 0.11.0.1, 0.11.0.3
• 1.0.0 (November 2017); later releases: 1.0.1, 1.1.1
• 2.0.0 (July 2018); later releases: 2.0.1, 2.1.0
0.11: New Message Format
Message Format v2
Record Batch:
    ...
    magic: 2
    ...
    attributes:
        ...
        bit 4: isTransactional
        ...
    producerId: int64
    producerEpoch: int16
    records: [Record]
Record:
    ...
    key: byte[]
    value: byte[]
    headers: [Header]
Header:
    ...
    headerKey: String
    value: byte[]
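A minimal sketch of how the isTransactional flag could be read from the batch attributes. It assumes the attributes field has already been decoded into a 16-bit integer; the class and method names are hypothetical, and only bit 4 is taken from the format above.

```java
final class BatchAttributes {
    // Bit 4 of the attributes field marks a transactional batch (per the format above).
    static final int TRANSACTIONAL_MASK = 1 << 4; // 0x10

    // Returns true when bit 4 is set in the given attributes value.
    static boolean isTransactional(short attributes) {
        return (attributes & TRANSACTIONAL_MASK) != 0;
    }

    public static void main(String[] args) {
        System.out.println(isTransactional((short) 0x10)); // bit 4 set
        System.out.println(isTransactional((short) 0x00)); // bit 4 clear
    }
}
```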
0.11: Headers
Message Headers
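// org.apache.kafka.common.header.Header
public interface Header {
    String key();
    byte[] value();
}

// Requires java.util.Arrays, java.util.List, java.nio.charset.StandardCharsets,
// org.apache.kafka.common.header.internals.RecordHeader and
// org.apache.kafka.clients.producer.ProducerRecord
List<Header> headers = Arrays.asList(
    new RecordHeader("hkey1", "hvalue1".getBytes(StandardCharsets.UTF_8)),
    new RecordHeader("hkey2", "hvalue2".getBytes(StandardCharsets.UTF_8))
);
// Arguments: topic, partition, key, value, headers
new ProducerRecord<>("topic", 0, "key", "value", headers);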
Message Headers
Pros
• No need to deserialize the whole message payload for routing / filtering use-cases
Cons
• Harder to save the headers together with the payload when archiving, persisting to data stores or integrating with 3rd party systems
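The routing use-case can be sketched without touching the payload at all. This is an illustration only: the nested Header interface is a stand-in that mirrors the one shown earlier, and the "event-type" header name, the "events." topic prefix and the class name are all hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

final class HeaderRouting {
    // Stand-in mirroring the Kafka Header interface (key + raw bytes).
    interface Header {
        String key();
        byte[] value();
    }

    // Small helper to build a header from two strings.
    static Header header(String key, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        return new Header() {
            public String key() { return key; }
            public byte[] value() { return bytes; }
        };
    }

    // Hypothetical rule: the "event-type" header picks the output topic.
    // The message payload is never read or deserialized here.
    static String route(List<Header> headers, String defaultTopic) {
        for (Header h : headers) {
            if ("event-type".equals(h.key())) {
                return "events." + new String(h.value(), StandardCharsets.UTF_8);
            }
        }
        return defaultTopic;
    }
}
```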
0.11: Transactions
Transactions
• Atomic writes to multiple Kafka topics and partitions
• Offset commits happen in the same transaction
• transactional.id + epoch for every producer
• Consumers must use the “read_committed” isolation level to read only committed transactional data
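A minimal configuration sketch for the points above, using plain java.util.Properties. The config keys are standard Kafka client settings; the class and method names are hypothetical, and bootstrap servers and (de)serializers are left out for brevity.

```java
import java.util.Properties;

final class TransactionalConfigs {
    // Producer side: transactional.id identifies the producer across restarts.
    static Properties producerProps(String transactionalId) {
        Properties p = new Properties();
        p.put("transactional.id", transactionalId);
        p.put("enable.idempotence", "true"); // transactions rely on idempotent writes
        return p;
    }

    // Consumer side: read only committed transactional data,
    // and leave offset commits to the transactional producer.
    static Properties consumerProps(String groupId) {
        Properties p = new Properties();
        p.put("group.id", groupId);
        p.put("isolation.level", "read_committed");
        p.put("enable.auto.commit", "false");
        return p;
    }
}
```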
Transactions
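KafkaProducer<String, String> producer = ...
producer.initTransactions();

KafkaConsumer<String, String> consumer = ...
consumer.subscribe(Collections.singletonList("inputTopic"));

// poll(Duration) is available from 2.0; earlier clients use poll(long)
ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(Long.MAX_VALUE));
try {
    producer.beginTransaction();
    for (ConsumerRecord<String, String> record : records) {
        // processAndProduceRecord: application helper that builds the output ProducerRecord
        producer.send(processAndProduceRecord("outputTopic", record));
    }
    // currentOffsets: application helper returning Map<TopicPartition, OffsetAndMetadata>
    producer.sendOffsetsToTransaction(currentOffsets(consumer), groupId);
    producer.commitTransaction();
} catch (ProducerFencedException | OutOfOrderSequenceException | AuthorizationException e) {
    // Fatal errors: the transaction cannot be aborted, close the producer instead
    producer.close();
} catch (KafkaException e) {
    // Any other error: abort so none of the writes become visible
    producer.abortTransaction();
}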
Transactions
“In practice, for a producer producing 1KB records at maximum throughput, committing messages every 100ms results in only a 3% degradation in throughput.”
https://www.confluent.io/blog/transactions-apache-kafka/
0.11: Exactly-Once Delivery
Exactly-Once: Why is it so Hard?
Delivery Guarantees
At most once
• May or may not be received
• No duplicates
• Possibly missing data
At least once
• Delivery guaranteed
• Possible duplicates
• No missing data
Exactly once
• Delivery guaranteed
• No duplicates
• No missing data
• Idempotence: idempotent producer writes
• Transactions: Transactions API, atomic writes and reads
Idempotence
• Unique producer ID is assigned to each producer
• Monotonically increasing sequence number is generated for every topic/partition write
• Broker persists and validates sequence numbers:
    • lower number → duplicate, reject
    • higher number → out-of-sequence error, reject
    • exactly one greater than the last → allow
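The validation rules above can be sketched as a small in-memory check. This is a simplified illustration, not the broker's implementation: it assumes sequence numbers start at 0, ignores producer epochs, batching and persistence, and all names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class SequenceValidator {
    enum Result { ACCEPTED, DUPLICATE, OUT_OF_SEQUENCE }

    // Last accepted sequence number per (producer ID, topic/partition).
    private final Map<String, Integer> lastSequence = new HashMap<>();

    Result validate(long producerId, String topicPartition, int sequence) {
        String key = producerId + "/" + topicPartition;
        int last = lastSequence.getOrDefault(key, -1); // -1: nothing written yet
        if (sequence <= last) {
            return Result.DUPLICATE;        // already seen: reject
        }
        if (sequence > last + 1) {
            return Result.OUT_OF_SEQUENCE;  // gap detected: reject
        }
        lastSequence.put(key, sequence);    // exactly last + 1: allow
        return Result.ACCEPTED;
    }
}
```

A retried write that reuses its original sequence number is reported as a duplicate instead of being appended twice, which is what makes producer retries safe.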
Enabling Exactly-Once in Kafka Streams?
Just set “processing.guarantee” to “exactly_once”. That’s it!
No need to think about checkpointing and related challenges (like in some other frameworks...)
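A minimal sketch of that setting, written with plain java.util.Properties so it stays self-contained. The class name, method name and parameter values are hypothetical; a real application would pass these properties to KafkaStreams together with its topology.

```java
import java.util.Properties;

final class StreamsExactlyOnceConfig {
    static Properties build(String applicationId, String bootstrapServers) {
        Properties p = new Properties();
        p.put("application.id", applicationId);
        p.put("bootstrap.servers", bootstrapServers);
        // The only setting needed to switch from the default at_least_once.
        p.put("processing.guarantee", "exactly_once");
        return p;
    }
}
```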
1.1: Controller Improvements
Controller Improvements
• One Controller per cluster
• Responsible for state management of partitions and replicas
• Communicates with Zookeeper
Before 1.1.0
• Updating partition leaders one by one, sequentially during the controlled shutdown
• Zookeeper Synchronous API is used during the controlled shutdown and controller failover
• Controlled shutdown time: 6.5 minutes
After 1.1.0
• Updating partition leaders in batches during the controlled shutdown
• Zookeeper Asynchronous API is used during the controlled shutdown and controller failover
• Controlled shutdown time: 3 seconds
2.0: Kafka Streams Improvements
Kafka Streams Improvements
• Message header support in the Processor API
• TopicNameExtractor for dynamic routing
• kafka-streams-test-utils helper for unit-testing
• Scala wrapper for the Streams DSL
Thanks!
@sap1ens

