v = 346.39 m/s (speed of sound at 25 °C)
v = 349.29 m/s (speed of sound at 30 °C)
front to audience: 2 m
latency: 5.7 ms
latency (back of room): 15 ms
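The latency figures on this slide follow from dividing distance by the speed of sound. A minimal sketch of that arithmetic, assuming the 25 °C value from the slide; the class and method names are illustrative only:

```java
// Acoustic latency: the time sound needs to travel a given distance.
final class AcousticLatency {
    /** Travel time in milliseconds for sound covering distanceMeters at speedMetersPerSecond. */
    static double latencyMs(double distanceMeters, double speedMetersPerSecond) {
        return distanceMeters / speedMetersPerSecond * 1000.0;
    }

    public static void main(String[] args) {
        double v25 = 346.39; // m/s at 25 °C, taken from the slide
        System.out.printf("front row (2 m): %.2f ms%n", latencyMs(2.0, v25));
        // Distance that corresponds to the 15 ms quoted for the back of the room
        System.out.printf("15 ms corresponds to %.2f m%n", 0.015 * v25);
    }
}
```

For 2 m this gives roughly 5.77 ms, which the slide shows as 5.7 ms.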
#atix #streamingworkshop
take home message: I am faster than my ISP!
Source: speedcheck.org
#atix #streamingworkshop
How to tune Kafka for production
Dr. Bernhard Hopfenmüller
24 July 2019
#atix #streamingworkshop
What do you need?
Latency Throughput
Durability Availability
???
IoT
#atix #streamingworkshop
whoami
Bernhard Hopfenmüller
Senior IT Consultant @ ATIX AG
current latency 8 ms
hopfenmueller@atix.de
github.com/Fobhep
Twitter: @fobhep
IRC: Fobhep
#atix #streamingworkshop
Throughput
#atix #streamingworkshop
Throughput: Batching
increase batch size
increase linger time
#atix #streamingworkshop
Throughput: Compression
Compress FULL batches
lz4, snappy, zstd, gzip
be consistent: avoid recompression!
#atix #streamingworkshop
Throughput: Replication Guarantees
did my message arrive?
acks=1 (default)
#atix #streamingworkshop
Throughput: Memory
Increase producer buffer memory
Increase consumer fetch size
#atix #streamingworkshop
Throughput: Consumer Groups
Parallelize data processing
#atix #streamingworkshop
Throughput: JVM
-Xms6g -Xmx6g -XX:MetaspaceSize=96m -XX:+UseG1GC
-XX:MaxGCPauseMillis=20
-XX:InitiatingHeapOccupancyPercent=35
-XX:G1HeapRegionSize=16M
-XX:MinMetaspaceFreeRatio=50
-XX:MaxMetaspaceFreeRatio=80
JVM GC pauses that are too long can
cause ZooKeeper session timeouts
and hurt throughput
#atix #streamingworkshop
Throughput: Partitions
More partitions == higher throughput?!
yes, but!
More open file handles
More memory needed for producer and consumer
#atix #streamingworkshop
Throughput: Summary
Producer:
batch.size: 100000 – 200000 (default 16384)
linger.ms: 10 – 100 (default 0)
compression.type: lz4 (default none, i.e., no compression)
acks: 1 (default 1)
buffer.memory: increase according to partitions (default 33554432)
Consumer:
fetch.min.bytes: ~100000 (default 1)
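The summary above can be sketched as plain `java.util.Properties`, using the configuration keys exactly as listed. The concrete values are mid-range picks from the suggested ranges, and the buffer sizing rule (room for at least one full batch per partition) is an assumption made for illustration, not a rule from the talk. A real client would additionally need `bootstrap.servers` and serializer settings before these properties can be handed to a producer or consumer.

```java
import java.util.Properties;

// Sketch only: throughput-oriented client settings as plain key/value pairs.
final class ThroughputConfigSketch {
    static final long DEFAULT_BUFFER_MEMORY = 33_554_432L; // producer default from the summary

    /** Producer settings biased towards throughput. */
    static Properties producerProps(int partitions) {
        int batchSize = 150_000; // within the suggested 100000 to 200000 range
        // Assumption: reserve room for at least one full batch per partition.
        long bufferMemory = Math.max(DEFAULT_BUFFER_MEMORY, (long) batchSize * partitions);
        Properties p = new Properties();
        p.setProperty("batch.size", Integer.toString(batchSize));
        p.setProperty("linger.ms", "20"); // within the suggested 10 to 100 range
        p.setProperty("compression.type", "lz4");
        p.setProperty("acks", "1");
        p.setProperty("buffer.memory", Long.toString(bufferMemory));
        return p;
    }

    /** Consumer settings biased towards throughput. */
    static Properties consumerProps() {
        Properties p = new Properties();
        p.setProperty("fetch.min.bytes", "100000");
        return p;
    }
}
```

With few partitions the default buffer is kept; only when the per-partition batches no longer fit does the sketch raise `buffer.memory`.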
#atix #streamingworkshop
Latency
#atix #streamingworkshop
Latency: Batching
small batches! (linger.ms=0)
but not: no batching
#atix #streamingworkshop
Latency: Compression
no compression: less CPU load
but! benchmark: compression reduces bandwidth utilization,
so codecs might be good after all
#atix #streamingworkshop
Latency: Replication Guarantees
did my message arrive?
acks=1 (0 if you are brave)
#atix #streamingworkshop
Latency: Memory
fetch.min.bytes=1 (fetch data asap)
#atix #streamingworkshop
Latency: Kafka Streams - External Queries
Source: Confluent
Query external databases
high latency
#atix #streamingworkshop
Latency: Kafka Streams - Table Stream Duality
Source: Confluent
A table is a stream is a table ...
#atix #streamingworkshop
Latency: Kafka Streams - Local Table Joins
Source: Confluent
Include external data via Connect
join tables locally with streams
decrease latency
#atix #streamingworkshop
Latency: Kafka Streams - Topologies
Source: Confluent
Input from Kafka
Operations (filter, map, aggregation, join)
Back to Kafka
#atix #streamingworkshop
Latency: Kafka Streams - Topologies
Source: Confluent
Repartitioning may occur,
which increases latency
#atix #streamingworkshop
Latency: Kafka Streams - Topologies
config.setProperty(
    StreamsConfig.TOPOLOGY_OPTIMIZATION,
    StreamsConfig.OPTIMIZE);
Avoid unnecessary repartitioning
decrease latency
#atix #streamingworkshop
Latency: Partitions
Source: Confluent
More partitions == higher throughput?!
yes, but!
broker has a single thread for replication (default)
latency might increase
limit topics per broker
increase num.replica.fetchers
#atix #streamingworkshop
Latency: Summary
Producer:
linger.ms: 0 (default)
compression.type: none (default)
acks: 1 (default)
Consumer:
fetch.min.bytes: 1 (default)
Streams:
StreamsConfig.TOPOLOGY_OPTIMIZATION: StreamsConfig.OPTIMIZE
use Kafka Connect and local joins
Broker:
num.replica.fetchers: increase (default 1)
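A rough sketch of the client side of this summary as plain `java.util.Properties`. The producer and consumer keys are the ones listed above. The Streams key `topology.optimization` and value `all` are assumed to be the string forms of `StreamsConfig.TOPOLOGY_OPTIMIZATION` and `StreamsConfig.OPTIMIZE` in Kafka 2.x, so verify them against your client version. `num.replica.fetchers` is a broker setting and is therefore not part of these client properties.

```java
import java.util.Properties;

// Sketch only: latency-oriented client settings as plain key/value pairs.
final class LatencyConfigSketch {
    /** Producer: send immediately, skip compression, wait for the leader only. */
    static Properties producerProps() {
        Properties p = new Properties();
        p.setProperty("linger.ms", "0");
        p.setProperty("compression.type", "none");
        p.setProperty("acks", "1");
        return p;
    }

    /** Consumer: return a fetch as soon as any data is available. */
    static Properties consumerProps() {
        Properties p = new Properties();
        p.setProperty("fetch.min.bytes", "1");
        return p;
    }

    /** Streams: enable topology optimization (assumed string key and value, see lead-in). */
    static Properties streamsProps() {
        Properties p = new Properties();
        p.setProperty("topology.optimization", "all");
        return p;
    }
}
```

In Kafka Streams the optimization only takes effect if the same properties are also passed to `StreamsBuilder.build(Properties)` when the topology is built.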
#atix #streamingworkshop
Durability
#atix #streamingworkshop
Durability: Producer Replication
Source: Confluent
choose a higher replication factor
don’t forget internal topics!
__consumer_offsets
internal streams topics
exactly once semantics
#atix #streamingworkshop
Durability: Producer Acks
acks=all
#atix #streamingworkshop
Durability: Producer Retries
Source: Confluent
retries can cause duplicates
and ordering problems
#atix #streamingworkshop
Durability: Producer Idempotency
Source: Confluent
produce idempotently!
#atix #streamingworkshop
Durability: Delivery Semantics
at-least-once delivery
at-most-once delivery
exactly-once delivery (Kafka only!)
#atix #streamingworkshop
Durability: Exactly Once Semantics
Send data with PID and increasing seq
Broker: check if seq > seq_old
write only if true
1. Idempotent Producer
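The sequence check on this slide can be illustrated with a toy in-memory log. This is a simplified model of the rule as stated here (write only if seq > seq_old), not Kafka's broker code; real brokers track sequences per partition and apply stricter checks. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model: a log that drops records whose sequence number was already seen.
final class SequenceCheckingLog {
    private final Map<Long, Integer> lastSeqByProducer = new HashMap<>();
    private final List<String> log = new ArrayList<>();

    /** Appends value only if seq is greater than the last sequence seen for this producer id. */
    boolean append(long producerId, int seq, String value) {
        Integer last = lastSeqByProducer.get(producerId);
        if (last != null && seq <= last) {
            return false; // duplicate caused by a retry: do not write it again
        }
        lastSeqByProducer.put(producerId, seq);
        log.add(value);
        return true;
    }

    /** Returns a copy of everything written so far. */
    List<String> contents() {
        return new ArrayList<>(log);
    }
}
```

A retried send with an already-seen sequence number returns `false` and leaves the log unchanged, which is what makes retries safe.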
#atix #streamingworkshop
Durability: Exactly Once Semantics
Send data
Read data (write to commit log)
Either write all of that or none
Source: Confluent
1. Idempotent Producer
2. Multiple Partition Transactions
#atix #streamingworkshop
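Durability: Exactly Once Semantics
// register the transactional producer with the broker (once, at startup)
producer.initTransactions();
[...]
while (true) {
    ConsumerRecords records = consumer.poll(Long.MAX_VALUE);
    producer.beginTransaction();
    // producerRecord(...) and currentOffsets(...) are helpers not shown on the slide
    for (ConsumerRecord record : records)
        producer.send(producerRecord("outputTopic", record));
    // commit the consumed offsets as part of the same transaction
    producer.sendOffsetsToTransaction(currentOffsets(consumer), group);
    producer.commitTransaction();
}
Source: Confluent
1. Idempotent Producer
2. Multiple Partition Transactions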
#atix #streamingworkshop
Durability: Streams EOS
StreamsConfig.PROCESSING_GUARANTEE_CONFIG: StreamsConfig.EXACTLY_ONCE
StreamsConfig.REPLICATION_FACTOR_CONFIG: 3 (default 1)
EOS for Streams is very simple
don’t forget replication of streams topics
#atix #streamingworkshop
Durability: Consumer
no auto commit!
enable.auto.commit=false
for EOS: isolation.level=read_committed
#atix #streamingworkshop
Durability: Replica
min.insync.replicas > 1
#atix #streamingworkshop
Durability: Summary
Producer:
replication.factor=3 (topic override available)
acks=all (default 1)
enable.idempotence=true (default false), to handle message duplication and ordering
Consumer:
enable.auto.commit=false (default true)
isolation.level=read_committed (for EOS transactions)
#atix #streamingworkshop
Durability: Summary
Streams:
StreamsConfig.REPLICATION_FACTOR_CONFIG: 3 (default 1)
StreamsConfig.PROCESSING_GUARANTEE_CONFIG: StreamsConfig.EXACTLY_ONCE
Broker:
default.replication.factor=3 (default 1)
auto.create.topics.enable=false (default true)
min.insync.replicas=2 (default 1); topic override available
broker.rack: rack of the broker (default null)
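How the replication factor and min.insync.replicas interact can be checked with a line of arithmetic: with acks=all, writes are accepted as long as at least min.insync.replicas replicas are in sync. A small sketch with hypothetical names:

```java
// Sketch: how many replicas may fall out of sync before acks=all writes are rejected.
final class ReplicaMath {
    /** Replicas that may be lost while producers using acks=all can still write. */
    static int tolerableReplicaLoss(int replicationFactor, int minInsyncReplicas) {
        if (minInsyncReplicas < 1 || minInsyncReplicas > replicationFactor) {
            throw new IllegalArgumentException(
                "need 1 <= min.insync.replicas <= replication.factor");
        }
        return replicationFactor - minInsyncReplicas;
    }
}
```

With the values from this summary (replication factor 3, min.insync.replicas=2) one replica can be lost without blocking writes, while every acknowledged message still exists on at least two replicas.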
#atix #streamingworkshop
Availability
Source: Confluent
#atix #streamingworkshop
Availability: Replica
min.insync.replicas = 1
unclean.leader.election.enable = true
#atix #streamingworkshop
Availability: Log Recovery
broker start: scan all log files to sync
usually one thread per data dir
increase to number of dirs in log.dirs
#atix #streamingworkshop
Availability: Stream State Restoration
Source: Confluent
store data for stateful stream ops
create copies of state for reuse
num.standby.replicas > 0
#atix #streamingworkshop
Availability: Summary
Consumer:
session.timeout.ms: as low as feasible (default 10000)
Streams:
StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG: 1 or more (default 0)
Broker:
unclean.leader.election.enable=true (default false); topic override available
min.insync.replicas=1 (default 1); topic override available
num.recovery.threads.per.data.dir: number of directories in log.dirs (default 1)
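The value suggested above for num.recovery.threads.per.data.dir can be derived from the broker's log.dirs setting, which is a comma-separated list of directories. A minimal sketch; the helper name and the example paths in the usage note are made up for illustration:

```java
// Sketch: count the directories in a comma-separated log.dirs value.
final class RecoveryThreads {
    /** Number of non-empty entries in a log.dirs value such as "/a,/b". */
    static int countLogDirs(String logDirs) {
        int count = 0;
        for (String dir : logDirs.split(",")) {
            if (!dir.trim().isEmpty()) {
                count++;
            }
        }
        return count;
    }
}
```

For a broker configured with `log.dirs=/data/kafka1,/data/kafka2` this yields 2 as the candidate value.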
#atix #streamingworkshop
And now?: Next steps
1. Decide what you want
Repeat:
2. Benchmark - Control Center!
3. Act on results
4. MONITOR! - Control Center!
#atix #streamingworkshop
And now?: Sources and Material
What is Kafka: https://www.heise.de/select/i-x/2019/4/1553935521978043
Kafka Optimization: https://www.confluent.io/white-paper/optimizing-your-apache-kafka-deployment/
Kafka Partition number: https://www.confluent.io/blog/how-choose-number-topics-partitions-kafka-cluster
RTFM: https://docs.confluent.io/current/administer.html
#atix #streamingworkshop
And now?: Talk to us
Datacenter Automation: Deploy, Configure, Release
#atix #streamingworkshop
Let’s optimize our latency!
#atix #streamingworkshop
