1
Introducing Exactly Once
Semantics in Apache Kafka™
Apurva Mehta, Software Engineer,
Gehrig Kunz, Technical Product Marketing Manager
2
Agenda
• Why exactly-once?
• An overview of messaging semantics
• Why are duplicates introduced?
• What is exactly-once semantics?
• Exactly-once semantics in Kafka: Is it Practical?
• Next Steps
3
Exactly Once Semantics is a hard problem
4
An overview of messaging semantics
• At-most once
• At-least once
• Exactly-once
5
Why exactly-once?
• Stream processing is becoming the norm; it’s more natural.
• Apache Kafka is the most popular streaming platform.
• Mission critical applications require stronger guarantees.
In other words: make stream processing easy,
simple, and reliable enough for everyone.
7
Apache Kafka’s existing semantics
At Least Once
8
Kafka’s Existing Semantics
14
Kafka’s Existing Semantics
What do we do now?
15
Kafka’s Existing Semantics: At Least Once
18
Why are duplicates introduced?
Various failures must be handled correctly:
• Broker can fail
• Producer-to-Broker RPC can fail
• Producer or Consumer client can fail
19
TL;DR – What we have today
• At-least-once, in-order delivery per partition.
• Producer retries can introduce duplicates and headaches.
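How a retry turns into a duplicate can be sketched with a toy model. This is not Kafka code: `Broker`, `append`, and `sendWithRetry` are hypothetical names, and the "lost ack" is simulated with a flag. The sketch only assumes what the slides state, namely that the write can succeed while the producer never sees the acknowledgement.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (not the Kafka client): a broker that appends every request it
// receives, and a producer that retries until it sees an acknowledgement.
class RetryDuplicateDemo {

    // Hypothetical in-memory partition log.
    static final class Broker {
        final List<String> log = new ArrayList<>();

        // Appends the message and reports whether the ack reached the producer.
        boolean append(String message, boolean ackLost) {
            log.add(message);   // the write itself succeeds
            return !ackLost;    // but the response may be lost on the way back
        }
    }

    // Sends one message: the first attempt loses its ack, so the producer retries.
    static List<String> sendWithRetry(String message) {
        Broker broker = new Broker();
        boolean acked = broker.append(message, true);   // attempt 1: ack lost
        while (!acked) {
            acked = broker.append(message, false);      // retry: ack delivered
        }
        return broker.log;
    }

    public static void main(String[] args) {
        // The log ends up with two copies of the same message.
        System.out.println(sendWithRetry("send Bob $10"));
    }
}
```

The producer cannot tell "write failed" apart from "ack lost", so without extra information on the broker side a retry is the only safe choice, and it can duplicate the message.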
20
The age old engineering question
Before we make this work, are we sure we should?
21
KafkaCash: A Peer to Peer Lending App
A peer-to-peer lending platform.
22
Help Bob reach $1000, send him $10
23
KafkaCash, powered by Kafka
24
Offset commits
25
Reprocessed transfer, eek!
26
Lost money! Eek eek!
27
How did Kafka add exactly once semantics?
28
Exactly-once semantics in Kafka, explained
Apache Kafka’s guarantees are stronger in 3 ways:
• Idempotent producer: Exactly-once, in-order delivery per partition.
• Transactions: Atomic writes across partitions.
• Exactly-once stream processing across read-process-write tasks.
29
Part 1/3 : Idempotent Producer
Exactly-once, in-order delivery per partition
30
Idempotent Producer Semantics
A single successful producer.send() results in exactly one copy of the
message in the log, in all circumstances.
31
Producer Configs
• enable.idempotence = true
• max.in.flight.requests.per.connection = 1
• acks = “all”
• retries > 0 (preferably Integer.MAX_VALUE)
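As a minimal sketch, the settings above could be assembled into a `java.util.Properties` object like this. The property keys are the Kafka Java client's configuration names; the class name, the `build` helper, and the `localhost:9092` address are placeholders. A real producer would also need key and value serializers before it could be constructed.

```java
import java.util.Properties;

// Sketch only: builds the configuration listed on the slide.
// The class and method names are illustrative, not part of Kafka.
class IdempotentProducerProps {

    static Properties build(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        // Turns on broker-side de-duplication of producer retries.
        props.setProperty("enable.idempotence", "true");
        // One in-flight request per connection, as recommended on the slide.
        props.setProperty("max.in.flight.requests.per.connection", "1");
        // Wait for all in-sync replicas to acknowledge each write.
        props.setProperty("acks", "all");
        // Retry (effectively) forever; duplicates are filtered by the broker.
        props.setProperty("retries", String.valueOf(Integer.MAX_VALUE));
        return props;
    }

    public static void main(String[] args) {
        // Placeholder address; replace with a real broker list.
        build("localhost:9092").forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```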
32
The idempotent producer
40
TL;DR: idempotent producer
• Works transparently -- only one config change.
• Sequence numbers and producer ids are in the log.
• Resilient to broker failures, producer retries, etc.
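The role of producer ids and sequence numbers can be illustrated with a small simulation. This is a toy model under stated assumptions, not the broker's actual implementation: `PartitionLog`, `append`, and `demo` are hypothetical names, and the log tracks only the last sequence number per producer id.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of de-duplication by (producer id, sequence number).
class SequenceDedupDemo {

    static final class PartitionLog {
        final List<String> records = new ArrayList<>();
        // Last sequence number appended for each producer id.
        private final Map<Long, Integer> lastSequence = new HashMap<>();

        // Returns true if the record was appended, false if it was a duplicate.
        boolean append(long producerId, int sequence, String value) {
            int last = lastSequence.getOrDefault(producerId, -1);
            if (sequence <= last) {
                return false;   // already in the log: acknowledge, do not append
            }
            if (sequence != last + 1) {
                throw new IllegalStateException("gap before sequence " + sequence);
            }
            records.add(value);
            lastSequence.put(producerId, sequence);
            return true;
        }
    }

    // Replays a send, a retry of the same send, and a new send.
    static List<String> demo() {
        PartitionLog log = new PartitionLog();
        log.append(7L, 0, "a");   // first attempt
        log.append(7L, 0, "a");   // retry with the same sequence number: dropped
        log.append(7L, 1, "b");   // next message
        return log.records;
    }

    public static void main(String[] args) {
        System.out.println(demo());   // two records, no duplicate
    }
}
```

Because the retry carries the same sequence number as the original attempt, the log can recognize it and acknowledge it without appending a second copy.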
41
Part 2/3 : Transactions
Atomic writes across multiple partitions.
42
Transactions semantics
• Atomic writes across multiple partitions.
• All messages in a transaction are made visible together,
or none are.
• Consumers must be configured to skip uncommitted
messages.
43
Producer config for transactions
• transactional.id = ‘some string’
• Typically based on the partition identifier in a partitioned, stateful app.
• Enables transaction recovery across producer sessions.
44
The transaction API
producer.initTransactions();
try {
    producer.beginTransaction();
    producer.send(record0);
    producer.send(record1);
    producer.commitTransaction();
} catch (KafkaException e) {
    producer.abortTransaction();
}
45
Transactions
46
1. Initialize the producer
producer.initTransactions();
try {
    producer.beginTransaction();
    producer.send(record0);
    producer.send(record1);
    producer.commitTransaction();
} catch (KafkaException e) {
    producer.abortTransaction();
}
47
Initializing ‘transactions’
48
2. Begin transactions and send data
producer.initTransactions();
try {
    producer.beginTransaction();
    producer.send(record0);
    producer.send(record1);
    producer.commitTransaction();
} catch (KafkaException e) {
    producer.abortTransaction();
}
49
Transactional sends
51
3. Commit transaction
producer.initTransactions();
try {
    producer.beginTransaction();
    producer.send(record0);
    producer.send(record1);
    producer.commitTransaction();
} catch (KafkaException e) {
    producer.abortTransaction();
}
52
Commit
55
Success!
56
Consumer configs
• isolation.level:
• “read_committed”, or
• “read_uncommitted”
57
What do you get with isolation levels?
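• read_committed: consumers read only up to the first offset that belongs
to a still-open transaction, and skip messages from aborted transactions.
• read_uncommitted: consumers read everything, including messages from
open or aborted transactions.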
• Messages read in offset order.
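The difference between the two isolation levels can be sketched with a toy log. This is an illustrative model, not the consumer's real implementation: `Entry`, `Kind`, `read`, and `sampleLog` are hypothetical names, every record is assumed to be transactional, and commit or abort markers are modeled as ordinary log entries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of isolation.level filtering on a single partition.
class IsolationLevelDemo {

    enum Kind { DATA, COMMIT, ABORT }

    static final class Entry {
        final Kind kind;
        final String txn;     // id of the transaction the entry belongs to
        final String value;   // payload (null for markers)

        Entry(Kind kind, String txn, String value) {
            this.kind = kind;
            this.txn = txn;
            this.value = value;
        }
    }

    // Returns the values an application would see, in offset order.
    static List<String> read(List<Entry> log, boolean readCommitted) {
        // Pass 1: learn each transaction's outcome from its marker, if any.
        Map<String, Kind> outcome = new HashMap<>();
        for (Entry e : log) {
            if (e.kind != Kind.DATA) {
                outcome.put(e.txn, e.kind);
            }
        }
        // Pass 2: deliver data records according to the isolation level.
        List<String> visible = new ArrayList<>();
        for (Entry e : log) {
            if (e.kind != Kind.DATA) {
                continue;   // markers are never handed to the application
            }
            if (readCommitted) {
                Kind result = outcome.get(e.txn);
                if (result == null) {
                    break;      // open transaction: stop reading here
                }
                if (result == Kind.ABORT) {
                    continue;   // aborted transaction: skip its records
                }
            }
            visible.add(e.value);
        }
        return visible;
    }

    // t1 commits, t3 aborts, t2 is still open at the end of the log.
    static List<Entry> sampleLog() {
        return Arrays.asList(
            new Entry(Kind.DATA, "t1", "a"),
            new Entry(Kind.DATA, "t3", "x"),
            new Entry(Kind.ABORT, "t3", null),
            new Entry(Kind.DATA, "t1", "b"),
            new Entry(Kind.COMMIT, "t1", null),
            new Entry(Kind.DATA, "t2", "c"));
    }

    public static void main(String[] args) {
        System.out.println("read_committed:   " + read(sampleLog(), true));
        System.out.println("read_uncommitted: " + read(sampleLog(), false));
    }
}
```

On the sample log, read_committed yields `a` and `b` only, while read_uncommitted yields `a`, `x`, `b`, `c`. In both modes the records arrive in offset order.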
58
TL;DR: Transactions
• Atomic, multi-partition writes.
• Use the new producer APIs for transactions.
• Consumers can filter out uncommitted or aborted
transactional messages.
59
Part 3/3 : Stream Processing
Stream Processing with
Exactly Once Semantics
60
Streams config
• processing.guarantee = “exactly_once”
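A minimal configuration sketch, assuming the Kafka Streams property name `processing.guarantee`: the class name, the `build` helper, and the application id and broker address used in `main` are placeholders. Newer Kafka releases also offer an `exactly_once_v2` value, so the value to use depends on the version in play.

```java
import java.util.Properties;

// Sketch only: the properties a Kafka Streams app would be started with.
// Class and method names are illustrative, not part of Kafka.
class ExactlyOnceStreamsProps {

    static Properties build(String applicationId, String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("application.id", applicationId);
        props.setProperty("bootstrap.servers", bootstrapServers);
        // The single switch that enables exactly-once processing in Streams.
        props.setProperty("processing.guarantee", "exactly_once");
        return props;
    }

    public static void main(String[] args) {
        // Placeholder values for illustration.
        System.out.println(build("kafkacash", "localhost:9092"));
    }
}
```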
61
End-to-end exactly-once semantics
• The read-process-write operation is atomic.
• Thus streams tasks produce valid answers even when
failures happen.
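Why atomicity matters for read-process-write can be shown with a toy simulation. It is not Kafka Streams code: `run`, `crashAt`, and the "processed:" prefix are made up for illustration. In the non-atomic mode the output is written before the input offset is committed, so a crash in between causes reprocessing and a duplicate output. In the atomic mode the output and the offset become visible in one step, which is the effect the Java producer's `sendOffsetsToTransaction` call is used for in a real transactional application.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of a read-process-write task that crashes once and restarts
// from the last committed input offset.
class ReadProcessWriteDemo {

    // crashAt: input offset at which the task crashes, once, after producing
    // its result but before the offset commit completes.
    static List<String> run(List<String> input, int crashAt, boolean atomic) {
        List<String> output = new ArrayList<>();   // what downstream readers see
        int committedOffset = 0;                   // survives the crash
        boolean crashed = false;

        while (committedOffset < input.size()) {
            int offset = committedOffset;          // (re)start from last commit
            String result = "processed:" + input.get(offset);

            if (atomic) {
                if (!crashed && offset == crashAt) {
                    crashed = true;
                    continue;                      // staged write is discarded
                }
                output.add(result);                // output and offset commit
                committedOffset = offset + 1;      // happen as a single step
            } else {
                output.add(result);                // output is visible right away
                if (!crashed && offset == crashAt) {
                    crashed = true;
                    continue;                      // offset commit never happens
                }
                committedOffset = offset + 1;
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> transfers = Arrays.asList("t1", "t2");
        System.out.println("non-atomic: " + run(transfers, 0, false));
        System.out.println("atomic:     " + run(transfers, 0, true));
    }
}
```

With a crash at offset 0, the non-atomic run emits the first transfer twice, while the atomic run emits each transfer exactly once.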
62
Back to KafkaCash
63
Exactly Once Semantics in Kafka
Is it practical?
64
Performance boost for Apache Kafka 0.11!
• Up to +20% producer throughput
• Up to +50% consumer throughput
• Up to -20% disk utilization
• Details: https://bit.ly/kafka-eos-perf
65
Gains due to more efficient message format
66
What about the idempotent producer and transactions?
• Transactions: 3-5% overhead for 100ms transactions, 1KB
messages.
• Longer transactions and better batching result in better
performance.
• 20% overhead relative to at-most once delivery without
ordering guarantees.
• Idempotent producer alone has negligible overhead.
67
Putting it together
• We talked through an idempotent producer
• How we added transactions with atomic writes
• The impact it has on stream processing
68
When is it available?
Available to use in Kafka 0.11, June 2017.
69
Where we’ve come
• 2007: High throughput messaging broker
• 2008: Highly available replicated log
• 2012: Top Level Apache Project
• 2016: Streams API, Connect API
• 2017: Exactly Once Semantics
70
San Francisco
August 28, 2017
Organized by Confluent
71
What’s next for you
• Try it: Download Confluent Open Source
• Join the Community: slackpass.io/confluentcommunity
• Let us know what you think: @Confluent
72
Thank You!
