Exactly-once Stream Processing
Matthias J. Sax, Software Engineer
Apache Kafka committer and PMC member
matthias@confluent.io | @MatthiasJSax
Exactly-once: Delivery vs Semantics
Exactly-once Delivery
• Academic distributed system problem:
• Can we send a message and ensure it’s delivered to the receiver exactly once?
• Two Generals’ Problem (https://en.wikipedia.org/wiki/Byzantine_fault)
• Provably not possible!
Delivery != Semantics
What is Exactly-once Semantics About?
Take input record, process it, update result, and record progress.
No Error. No Problem.
What happens if something goes wrong?
Error during read, processing, write, or record progress.
We retry!
But is it safe?
Are retries safe? With exactly-once, yes!
Exactly-once is about masking errors via safe retries.
The result of an exactly-once retry is semantically the same as if no error had occurred.
Common Misconceptions
Kafka as an intermediate
• Pattern: Produce -> Kafka -> Consume
• No exactly-once semantics:
• Upstream write-only producer!
There is no* Write-only Exactly-once!
(*) Write-only exactly-once is possible for idempotent updates (but Kafka is append-only…)
Common Misconceptions
Kafka as an intermediate
• Pattern: Produce -> Kafka -> Consume
• No exactly-once semantics:
• Upstream write-only producer!
• Downstream read-only consumer!
There is NO Read-only Exactly-once!
Common Misconceptions
Kafka as an intermediate
• Pattern: Produce -> Kafka -> Consume
• No exactly-once semantics.
Kafka for processing
• Pattern: Consume -> Process -> Produce
• Built-in exactly-once via Kafka Streams (or DIY).
• Also possible with external source/target system!
Let’s Break it Down
Steps in a Processing Pipeline
• Read input:
• Does not modify state; re-reading is always safe.
• Process data:
• Stateless re-processing (filter, map, etc.) is always safe.
• Stateful re-processing: need to roll back state before we can retry.
• Update result:
• Need to “retract” (partial) results.
• Or: rely on idempotent updates. (There are dragons!)
• Record progress:
• Modifies state in the source system (or does it?)
Exactly-once == At-least-once + Idempotency
It depends…
Idempotent Updates (Internal State)?
Stateful processing
Stateful processing is usually a “read and modify” pattern, e.g., increase a counter.
• It’s context sensitive!
[Diagram: incrementing a counter]
First attempt: Cnt: 73 -> 73+1 -> Cnt: 74
Retry: Cnt: 74 -> 74+1 -> Cnt: 75 ☹
Idempotent Updates? Maybe…
Stateful processing
Stateful processing is usually a “read and modify” pattern, e.g., increase a counter.
• It’s context sensitive!
• Idempotency requires context-agnostic state modifications, e.g., set a new address.
[Diagram: setting a value]
First attempt: City: LA -> Set “NY” -> City: NY
Retry: City: NY -> Set “NY” -> City: NY ☺
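The contrast between the counter and the address update can be sketched in a few lines of plain Java. This is a minimal illustration, not Kafka code; the class and method names (RetryDemo, incrementTwice, setTwice) are hypothetical and exist only for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo: apply the same update twice, as an at-least-once retry would.
class RetryDemo {
    // Context-sensitive update: the new value depends on the current value.
    static int incrementTwice(int start) {
        Map<String, Integer> state = new HashMap<>();
        state.put("cnt", start);
        state.merge("cnt", 1, Integer::sum); // first attempt
        state.merge("cnt", 1, Integer::sum); // retry of the same input record
        return state.get("cnt");
    }

    // Context-agnostic update: the new value ignores the current value.
    static String setTwice(String start, String newCity) {
        Map<String, String> state = new HashMap<>();
        state.put("city", start);
        state.put("city", newCity); // first attempt
        state.put("city", newCity); // retry of the same input record
        return state.get("city");
    }

    public static void main(String[] args) {
        System.out.println(incrementTwice(73));   // 75, not the intended 74
        System.out.println(setTwice("LA", "NY")); // NY, same as without the retry
    }
}
```

The retry corrupts the counter but leaves the address unchanged, which is why only the second kind of update can be combined safely with at-least-once processing.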
Idempotent Updates (External State)
The issue of time travel…
[Diagram: timeline of external state and concurrent reads]
City: LA -> Set “NY” -> City: NY -> Set “BO” -> City: BO
Reads over time: LA, then NY, then BO
Idempotent Updates (External State)
Retrying a sequence of updates:
[Diagram: retrying both updates after City is already BO]
City: BO -> Set “NY” -> City: NY -> Set “BO” -> City: BO
Reads over time: BO ☺, then NY ☹, then BO ☺
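The time-travel effect can be reproduced with a small simulation. It is a sketch under simple assumptions: the external state is a single field, and a reader looks at it after every write. TimeTravelDemo and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: an external reader observes every intermediate value.
class TimeTravelDemo {
    // Applies the updates in order and records what a reader would see after each one.
    static List<String> applyAndObserve(String[] city, List<String> updates) {
        List<String> observed = new ArrayList<>();
        for (String update : updates) {
            city[0] = update;      // idempotent "set" on external state
            observed.add(city[0]); // a concurrent read right after the write
        }
        return observed;
    }

    static List<String> run() {
        String[] city = {"LA"};
        List<String> updates = List.of("NY", "BO");
        // First attempt applies both updates.
        List<String> reads = new ArrayList<>(applyAndObserve(city, updates));
        // The retry replays both updates against the already updated state.
        reads.addAll(applyAndObserve(city, updates));
        return reads;
    }

    public static void main(String[] args) {
        System.out.println(run()); // [NY, BO, NY, BO]
    }
}
```

The final state is correct, yet a reader sees BO followed by NY, so each update being idempotent does not hide the retry from the outside world.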
Idempotency is not enough.
All State Changes must be Atomic!
All State Changes must be Atomic
What is “state”?
• Internal processing state.
• External state, i.e., result state.
• External state, i.e., source progress.
Transactions to the rescue!
Do we want to (can we) do a cross-system distributed transaction?
Good news: we don’t have to…
Exactly-Once with Kafka and External Systems
Example: Downstream target RDBMS
[Diagram: State, Result, and Offsets are written to the RDBMS atomically via an ACID transaction. The (async) offset update is not part of the transaction.]
[Diagram: same setup with State, Result, and Offsets in the RDBMS. On failure: reset offsets and retry.]
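The idea of storing source progress next to the result can be sketched with an in-memory stand-in for the target database. This is a toy model under stated assumptions: TargetDb and its commit method are hypothetical and merely imitate an ACID transaction by updating both fields in one synchronized step; no real RDBMS or Kafka client is involved.

```java
import java.util.List;

class SinkDemo {
    // Hypothetical in-memory stand-in for the target RDBMS.
    static class TargetDb {
        private long sum = 0;       // result
        private int nextOffset = 0; // source progress, stored next to the result

        // Stand-in for an ACID transaction: both fields change together or not at all.
        synchronized void commit(long newSum, int newNextOffset) {
            sum = newSum;
            nextOffset = newNextOffset;
        }
        synchronized long sum() { return sum; }
        synchronized int nextOffset() { return nextOffset; }
    }

    // Processes records starting at the offset stored in the target system.
    // If failAtOffset >= 0, a crash is simulated before that record is committed.
    static void runOnce(List<Integer> input, TargetDb db, int failAtOffset) {
        int offset = db.nextOffset(); // "reset offsets": resume from the target's view
        while (offset < input.size()) {
            long newSum = db.sum() + input.get(offset);
            if (offset == failAtOffset) {
                throw new IllegalStateException("simulated crash before commit");
            }
            db.commit(newSum, offset + 1);
            offset++;
        }
    }

    static long runWithOneCrash(List<Integer> input) {
        TargetDb db = new TargetDb();
        try {
            runOnce(input, db, 2); // crashes while handling the record at offset 2
        } catch (IllegalStateException e) {
            // fall through and retry
        }
        runOnce(input, db, -1);    // retry resumes at offset 2; nothing is applied twice
        return db.sum();
    }

    public static void main(String[] args) {
        System.out.println(runWithOneCrash(List.of(1, 2, 3, 4))); // 10
    }
}
```

Because the offset is only ever advanced together with the result, the retry neither skips nor double-counts a record.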
Kafka Connect (Part 1)
Exactly-once Sink
• Has “nothing” to do with Kafka:
• Kafka provides source system progress tracking via offsets.
• Connect provides an API to fetch start offsets from the target system.
• Depends on target system properties / features.
• Each individual connector must implement it.
How does Kafka Tackle Exactly-once?
Kafka Transactions
[Diagram: partitions t1-p0, t1-p1, t2-p0, t2-p1, and t2-p2, each holding records at increasing offsets; a transaction writes to several of them atomically]
Multi-partition/multi-topic atomic write:
producer.beginTransaction();
// state updates (changelogs + result)
producer.send(…);
producer.send(…);
…
producer.commitTransaction(); // or .abortTransaction()
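The all-or-nothing visibility of such a transaction can be illustrated with a toy model. ToyTxLog is a hypothetical class, not the Kafka client: it buffers writes until commit, whereas Kafka appends transactional records to the partitions right away and relies on commit/abort markers so that read-committed consumers skip aborted data. The observable outcome for a read-committed reader is what the sketch tries to capture.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of a transactional writer; it is not the Kafka client.
class ToyTxLog {
    private final List<String> committed = new ArrayList<>(); // visible to read-committed readers
    private List<String> pending = null;                      // writes of the open transaction

    void beginTransaction() {
        if (pending != null) throw new IllegalStateException("transaction already open");
        pending = new ArrayList<>();
    }

    void send(String topicPartition, String value) {
        if (pending == null) throw new IllegalStateException("no open transaction");
        pending.add(topicPartition + ":" + value);
    }

    void commitTransaction() {
        if (pending == null) throw new IllegalStateException("no open transaction");
        committed.addAll(pending); // all writes become visible together
        pending = null;
    }

    void abortTransaction() {
        pending = null;            // none of the writes become visible
    }

    List<String> readCommitted() {
        return new ArrayList<>(committed);
    }

    static List<String> demo() {
        ToyTxLog log = new ToyTxLog();
        log.beginTransaction();
        log.send("changelog-0", "cnt=74");
        log.send("result-1", "cnt=74");
        log.abortTransaction();            // simulated failure: nothing is visible
        log.beginTransaction();
        log.send("changelog-0", "cnt=74"); // retry
        log.send("result-1", "cnt=74");
        log.commitTransaction();
        return log.readCommitted();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [changelog-0:cnt=74, result-1:cnt=74]
    }
}
```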
Exactly-Once with Kafka
Kafka as Sink
Requirement: ability to track source system progress.
Written to Kafka in one transaction:
• result
• state (via changelogs)
• source progress (via custom metadata topic)
Kafka Connect (Part 2)
Exactly-once Source
• “Exactly-once, Again: Adding EOS Support for Kafka Connect Source Connectors”
• Tomorrow: 2pm
• Chris Egerton, Aiven
• KIP-618 (Apache Kafka 3.3):
• https://cwiki.apache.org/confluence/display/KAFKA/KIP-618%3A+Exactly-Once+Support+for+Source+Connectors
Kafka Streams
Kafka Transactions
Atomic read-process-write pattern:
[Diagram: a single transaction atomically writes to __consumer_offsets, changelogs, and result]
Multi-partition/multi-topic atomic write:
producer.beginTransaction();
// state updates (changelogs + result)
producer.send(…);
producer.send(…);
…
producer.sendOffsetsToTransaction(…);
producer.commitTransaction(); // or .abortTransaction()
Kafka Streams
Single vs Multi-cluster
Kafka Streams (currently) only works against a single broker cluster:
• Does not really matter. We still rely on the brokers as target system.
• Need source offsets but commit them via the producer.
• Single broker cluster only avoids “dual” commit of source offsets.
Supporting cross-cluster EOS with Kafka Streams is possible:
• Add custom metadata topic to target cluster.
• Replace sendOffsetsToTransaction() with send().
• Fetch consumer offset manually from metadata topic.
• Issues:
• EOS v2 implementation (producer per thread) not possible.
• Limited to single target cluster.
The Big Challenge
Error Handling in a (Distributed) Application
Kafka transactions allow fencing “zombie” producers.
Any EOS target system needs to support something similar (or rely on idempotency if possible).
Kafka Connect Sink Connectors:
• Idempotency or sink system fencing required—Connect framework cannot help at all.
Kafka Connect Source Connectors:
• Relies on producer fencing.
• Uses a producer per task (similar to Kafka Streams’ EOS v1 implementation).
Kafka Streams:
• Relies on producer fencing (EOS v1) or consumer fencing (EOS v2).
• EOS v2 implementation (producer per thread) relies on consumer/producer integration inside the same broker cluster.
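Kafka's producer fencing is built on an epoch tied to the producer's transactional.id: a newly initialized producer gets a higher epoch, and writes from an older epoch are rejected. The sketch below models only that idea in plain Java. Coordinator, init, and write are hypothetical names and do not mirror the broker's actual interfaces.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of epoch-based fencing; names are illustrative only.
class FencingDemo {
    static class Coordinator {
        private final Map<String, Integer> latestEpoch = new HashMap<>();
        private final List<String> log = new ArrayList<>();

        // A (re)starting producer registers its transactional id and gets a higher epoch.
        int init(String transactionalId) {
            return latestEpoch.merge(transactionalId, 1, Integer::sum);
        }

        // Writes carrying an outdated epoch are rejected.
        void write(String transactionalId, int epoch, String value) {
            if (epoch < latestEpoch.getOrDefault(transactionalId, 0)) {
                throw new IllegalStateException("fenced: stale epoch " + epoch);
            }
            log.add(value);
        }

        List<String> log() { return new ArrayList<>(log); }
    }

    static List<String> demo() {
        Coordinator coordinator = new Coordinator();
        int oldEpoch = coordinator.init("app-task-0"); // original instance
        coordinator.write("app-task-0", oldEpoch, "a");
        int newEpoch = coordinator.init("app-task-0"); // replacement after a suspected failure
        try {
            // The old instance is still alive and tries to write.
            coordinator.write("app-task-0", oldEpoch, "zombie-write");
        } catch (IllegalStateException fenced) {
            // The zombie's write is rejected.
        }
        coordinator.write("app-task-0", newEpoch, "b");
        return coordinator.log();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [a, b]
    }
}
```

A sink system that offers a comparable check (or purely idempotent writes) can play the same role for a sink connector.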
What to do in Practice?
Publishing with producer-only app?
The important thing is to figure out where to resume on restart:
• Is there any “source progress” information you can store?
• You need to add a consumer to your app!
• On app restart:
• Initialize the producer to fence potential zombies and to force any pending transaction to complete.
• Use consumer (in read-committed mode) to inspect the target cluster’s data.
Reading with consumer-only app?
• If there is no target data system, only idempotency can help.
• With no target data system, everything is basically a side-effect.
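One way to find the resume point for a producer-only app is to embed source progress in every published record and scan the committed data on restart. The sketch below assumes exactly that. The Published record layout and the helper names are hypothetical, and the in-memory list stands in for what a read-committed consumer would return from the target cluster.

```java
import java.util.ArrayList;
import java.util.List;

class ResumeDemo {
    // Hypothetical record layout: each published record carries the position
    // of the source item it was derived from.
    static final class Published {
        final long sourcePosition;
        final String payload;

        Published(long sourcePosition, String payload) {
            this.sourcePosition = sourcePosition;
            this.payload = payload;
        }
    }

    // Scans the committed records and returns the next source position to publish.
    static long resumePosition(List<Published> committedRecords) {
        long next = 0;
        for (Published record : committedRecords) {
            next = Math.max(next, record.sourcePosition + 1);
        }
        return next;
    }

    // Publishes all source items from the resume position onward.
    static void publishFrom(List<String> source, List<Published> target) {
        for (long pos = resumePosition(target); pos < source.size(); pos++) {
            target.add(new Published(pos, source.get((int) pos)));
        }
    }

    static int demo() {
        List<String> source = List.of("a", "b", "c", "d");
        List<Published> target = new ArrayList<>();
        target.add(new Published(0, "a")); // committed before the crash
        target.add(new Published(1, "b")); // committed before the crash
        publishFrom(source, target);       // restart: resumes at position 2
        return target.size();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 4
    }
}
```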
Exactly-once Key Takeaways
(A) no producer-only EOS
(B) no consumer-only EOS
(C) read-process-write pattern
(1) need ability to track source system read progress
(2) require target system atomic write (plus fencing)
(3) source system progress is recorded in target system
Kafka built-in support via transactions + Zero coding with Kafka Streams
✅
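In Kafka Streams, exactly-once is switched on through configuration rather than code. A minimal sketch follows, using plain java.util.Properties so that it compiles without the Kafka libraries. The application id and broker address are placeholders, and exactly_once_v2 is the value used by recent Kafka Streams releases (older releases used exactly_once).

```java
import java.util.Properties;

class EosStreamsConfig {
    static Properties build() {
        Properties props = new Properties();
        props.put("application.id", "my-eos-app");        // placeholder name
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address
        // Enables exactly-once processing for the whole Kafka Streams application.
        props.put("processing.guarantee", "exactly_once_v2");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build().getProperty("processing.guarantee")); // exactly_once_v2
    }
}
```

These properties would be passed to the KafkaStreams constructor together with the topology.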