Non-Kafkaesque
Apache Kafka
Semantics and performance tips & tricks
Yottabyte 2018 – Beijing
Otávio Carvalho
Disclaimer
This is not (only) about Franz Kafka
● "Kafkaesque" describes, as
the Oxford Dictionaries
would put it, "oppressive or
nightmarish qualities,"
or as Merriam-Webster
suggests, "having a
nightmarishly complex,
bizarre, or illogical quality."
Is it designed to be
Kafkaesque?
Overview
Is it a Message queue?
Brief pause for a flame war
Common misconceptions
● Common misconceptions usually are related to either
Semantics or Reliability
● AMQP does not provide ordering guarantees in use cases
with multiple consumers, which is a very common scenario
with multiple microservices and app instances
● Kafka and AMQP-based solutions are pretty much
interchangeable for the most common use cases
(publish-subscribe):
● Modest throughput (on the order of 10k messages/sec)
● At-least-once semantics
● No strict ordering guarantees
Common misconceptions
If you have multiple processors in the queue,
there is no longer a guarantee that messages
will be processed in order
● AMQP 0.9.1
● “Messages published in one channel, passing through one exchange
and one queue and one outgoing channel will be received in the
same order that they were sent.”
● RabbitMQ
● “Messages can be returned to the queue using AMQP methods that
feature a requeue parameter, or due to a channel closing while
holding unacknowledged messages. Any of these scenarios caused
messages to be requeued at the back of the queue for RabbitMQ
releases earlier than 2.7.0. From RabbitMQ release 2.7.0, messages
are always held in the queue in publication order, even in the
presence of requeueing or channel closure.”
Consumer Groups
● Consumer groups are a great way to balance data
consumption and achieve two main goals:
● Split message consumption inside a group
● Each consumer receives messages from one or more
partitions and the same messages won't be received by
other consumers within the same group
● Provide publish/subscribe for multiple groups of
consumers
● Rules to balance message distribution are only for
members within the group. Two distinct consumer groups
assigned to the same topic will receive the same
messages (as expected in a publish/subscribe pattern)
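The two goals above can be illustrated with a toy model. This is only a sketch: the round-robin rule and the group and consumer names (`billing-1`, `audit-1`, ...) are assumptions made for illustration, not Kafka's actual assignor.

```python
def assign_partitions(partitions, consumers):
    """Spread the partitions of a topic over the consumers of ONE group."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(sorted(partitions)):
        owner = consumers[index % len(consumers)]  # simple round-robin
        assignment[owner].append(partition)
    return assignment


partitions = [0, 1, 2, 3]

# Inside a group the partitions are split, so no message is read twice.
billing = assign_partitions(partitions, ["billing-1", "billing-2"])

# A second group is assigned independently and sees every partition again.
audit = assign_partitions(partitions, ["audit-1"])

print(billing)  # {'billing-1': [0, 2], 'billing-2': [1, 3]}
print(audit)    # {'audit-1': [0, 1, 2, 3]}
```

Each group covers all four partitions on its own, which is the publish/subscribe behaviour; within the `billing` group no partition has two owners, which is the queue-like behaviour.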
Consumer Groups
● You should be careful with two things:
● Idle consumers
● With more consumers than partitions in a consumer
group, the extra consumers sit idle. A consumer
can read one or more partitions, but within a group
each partition is read by only one consumer
● Rebalancing
● Whenever a consumer joins or leaves a consumer
group, the group's partitions are reassigned across
its current members. This way, Kafka load balances
the number of partitions assigned per application
instance
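A rough sketch of both caveats, assuming a simplified round-robin rule rather than Kafka's real assignors (the member names `c1` to `c4` are hypothetical):

```python
def rebalance(partitions, members):
    """Recompute the whole assignment for the current group membership."""
    assignment = {member: [] for member in members}
    for index, partition in enumerate(partitions):
        assignment[members[index % len(members)]].append(partition)
    return assignment


partitions = [0, 1, 2]

before = rebalance(partitions, ["c1", "c2"])

# Two consumers join: the group now has more members than partitions.
after_join = rebalance(partitions, ["c1", "c2", "c3", "c4"])
idle = [member for member, owned in after_join.items() if not owned]

# Everyone but c1 leaves: c1 takes over every partition.
after_leave = rebalance(partitions, ["c1"])

print(before)       # {'c1': [0, 2], 'c2': [1]}
print(after_join)   # {'c1': [0], 'c2': [1], 'c3': [2], 'c4': []}
print(idle)         # ['c4']
print(after_leave)  # {'c1': [0, 1, 2]}
```

Every membership change recomputes the assignment, which is why consumers that join and leave frequently are worth watching.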
Semantics
Kafka semantics
● In a distributed publish-subscribe messaging system, the
computers that make up the system can always fail
independently of one another.
● In the case of Kafka, an individual broker can crash, or a
network failure can happen while the producer is sending a
message to a topic.
● Depending on the action the producer takes to handle such
a failure, you can get different semantics:
● At-least-once
● At-most-once
● Exactly-once
Semantics: At-least-once
● At-least-once
● acks=all (request.required.acks=-1 in the legacy
producer) means that, from the producer's standpoint,
a write succeeds only once the message has been
written to the partition leader and to all of its
in-sync replicas
● Even with this strongest acknowledgement level
enabled, if a producer ack times out or returns an
error, the producer might retry sending the
message.
● If the broker had failed right before it sent the ack but
after the message was successfully written to the
Kafka topic, this retry leads to a message being written
twice and hence delivered more than once to the end
consumer.
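The duplicate described above can be reproduced with a toy simulation. `FlakyBroker` and `send_at_least_once` are hypothetical names invented for this sketch; no real Kafka client is involved.

```python
class FlakyBroker:
    """Toy broker: appends every write, but loses the ack of the first one."""

    def __init__(self):
        self.log = []
        self.acks_to_drop = 1

    def append(self, message):
        self.log.append(message)  # the write itself succeeds
        if self.acks_to_drop > 0:
            self.acks_to_drop -= 1
            raise TimeoutError("ack lost after the write")
        return len(self.log) - 1  # offset of the appended message


def send_at_least_once(broker, message, max_retries=3):
    """Retry until an ack arrives: never lost, but possibly written twice."""
    for _ in range(max_retries + 1):
        try:
            return broker.append(message)
        except TimeoutError:
            continue
    raise RuntimeError("gave up after retries")


broker = FlakyBroker()
offset = send_at_least_once(broker, "order-42")
print(broker.log)  # ['order-42', 'order-42']
```

The producer cannot tell a lost ack from a lost write, so retrying is the only way to guarantee delivery, and the duplicate is the price.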
Semantics: At-most-once
● At-most-once
● If the producer does not retry when an ack times out or
returns an error, the message might end up not being
written to the Kafka topic, and hence not delivered to
the consumer.
● In most cases it will be, but in order to avoid the
possibility of duplication, we accept that sometimes
messages will not get through
Semantics: Exactly-once
● Exactly-once
● Hard, and it incurs a performance penalty, but doable
since Kafka 0.11
● Built on idempotent producers (per-message sequence
numbers, an idea borrowed from TCP) and transactions
● Even if a producer retries sending a message, it leads
to the message being delivered exactly once to the end
consumer.
● Exactly-once semantics is the most desirable
guarantee, but also a poorly understood one. This is
because it requires cooperation between the
messaging system itself and the application producing
and consuming the messages.
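A minimal sketch of the sequence-number idea, covering only the idempotent-write half (transactions are left out). `IdempotentBroker` is a hypothetical toy class; Kafka itself tracks sequences per producer ID and partition, which is simplified here to one counter per producer.

```python
class IdempotentBroker:
    """Toy broker that remembers the last sequence number per producer."""

    def __init__(self):
        self.log = []
        self.last_sequence = {}  # producer_id -> highest sequence appended
        self.acks_to_drop = 1    # lose the very first ack to force a retry

    def append(self, producer_id, sequence, message):
        if sequence > self.last_sequence.get(producer_id, -1):
            self.log.append(message)
            self.last_sequence[producer_id] = sequence
        # Otherwise this is a retry of an already stored message: skip the write.
        if self.acks_to_drop > 0:
            self.acks_to_drop -= 1
            raise TimeoutError("ack lost after the write")
        return True


def send_with_retries(broker, producer_id, sequence, message, max_retries=3):
    for _ in range(max_retries + 1):
        try:
            return broker.append(producer_id, sequence, message)
        except TimeoutError:
            continue
    raise RuntimeError("gave up after retries")


broker = IdempotentBroker()
send_with_retries(broker, producer_id=7, sequence=0, message="order-42")
send_with_retries(broker, producer_id=7, sequence=1, message="order-43")
print(broker.log)  # ['order-42', 'order-43']
```

The retry of sequence 0 is acknowledged without being appended again, so the log holds each message once even though one ack was lost.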
Partitioning
Replication
Anatomy of a Topic
Partitions on the filesystem
● Partitioning trade-offs
● Each partition maps to a directory in the file system in
the broker. Within that log directory, there will be two
files (one for the index and another for the actual data)
per log segment.
● Each broker keeps a file handle open for both the index
and the data file of every log segment.
● The more partitions a broker hosts, the higher the open
file handle limit must be set in the underlying
operating system.
● IMPORTANT: Raise the operating system's limit on
open file descriptors accordingly (there are Kafka
clusters running with more than 30 thousand open file
handles per broker)
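A back-of-the-envelope estimate of the handle count follows directly from the two-files-per-segment rule above. The function name, the partition count and the sizes are illustrative assumptions, and sockets or any extra per-segment files a given broker version keeps are not counted.

```python
import math


def estimate_open_segment_files(partitions, partition_bytes, segment_bytes,
                                files_per_segment=2):
    """Estimate file handles held for log segments on one broker."""
    segments_per_partition = max(1, math.ceil(partition_bytes / segment_bytes))
    return partitions * segments_per_partition * files_per_segment


GIB = 1024 ** 3
handles = estimate_open_segment_files(
    partitions=2000,           # partitions hosted by this broker (assumed)
    partition_bytes=10 * GIB,  # retained data per partition (assumed)
    segment_bytes=1 * GIB,     # size of one log segment (assumed)
)
print(handles)  # 2000 partitions * 10 segments * 2 files = 40000
```

Comparing such an estimate with the limit reported by `ulimit -n` for the broker process gives an early warning before the broker runs out of descriptors.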
General Perf Tuning
General Tuning
● If throughput is less than network capacity, adjust and evaluate
● Add more threads
● Increase batch size
● Add more producer instances
● Add more partitions
● Otherwise, try to identify the throughput bottleneck
● Is the bottleneck in user threads?
● Increase the number of threads and see if throughput increases
● Pay attention to lock contention
● Is the bottleneck in the sender thread?
● Are queue times large? Is the average batch size close to batch.size?
● Is the bottleneck in the broker?
● Is the request latency very large?
OS/System Tuning
● OS/System Tuning
● For cross datacenter or high latency scenarios
● Tune buffer settings for sockets
● Tune buffer settings for network (OS TCP settings)
● Adjust message size according to network bandwidth
● Java/JVM Tuning
● Minimize GC pauses by using G1GC
● Try to keep heap size below 4GB
● ZooKeeper tuning
● Load on ZooKeeper is lower now that consumers commit
offsets to Kafka instead of writing them to ZK
● Increasing maxClientCnxns may be needed in special cases
Producer tuning
● Producer configurations
● acks (request.required.acks in the legacy producer)
● Affects durability and should match the expected semantics
● batch.size
● Should be evaluated for each latency/throughput scenario
● linger.ms
● Maximum time to wait before sending a batch (which may
not be full yet). Should be tuned together with
batch.size. May cause contention on send if set to a very low
number.
● compression.type
● Potentially large impact with large message sizes
● max.in.flight.requests.per.connection
● Multiple in-flight requests per connection can increase
throughput in some scenarios, but a retried request can then
land after a later one and break message ordering
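As a starting point, the settings above could be collected as follows. The values are illustrative assumptions to be measured against your own workload, not recommendations, and `acks` is the Java producer's name for the setting the slide calls `request.required.acks`.

```python
def to_properties(config):
    """Render a config mapping as the text of a .properties file."""
    return "\n".join(f"{key}={value}" for key, value in sorted(config.items()))


# Hypothetical throughput-oriented profile that still preserves ordering.
throughput_oriented = {
    "acks": "all",               # wait for all in-sync replicas
    "batch.size": 65536,         # bytes per partition batch
    "linger.ms": 20,             # give batches up to 20 ms to fill
    "compression.type": "lz4",   # compress whole batches
    "max.in.flight.requests.per.connection": 1,  # no reordering on retry
}

print(to_properties(throughput_oriented))
```

Changing one value at a time and re-running the same benchmark makes it easier to attribute a throughput or latency change to a single setting.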
Broker tuning
● Broker configurations
● num.io.threads
● At least as many threads as you have disks
● log.flush.interval.messages
● Number of messages accumulated on a partition before they are flushed
to disk. Higher values improve performance but increase the risk of
losing data in the event of a crash
● num.network.threads: adjust based on
● <number-of-producers> + <number-of-consumers> + <replication-factor>
● Number of partitions
● A large number of partitions on a single broker can increase latency
● Partitions per physical disk
● Ideally one partition per physical disk to avoid I/O bottlenecks
Consumer tuning
● Consumer configurations
● Consumers vs Producers
● Consumers should keep up with the rate at which producers write
● Number of consumers in the Consumer Group
● Up to as many consumers in a group as there are
partitions; any beyond that sit idle
● replica.high.watermark.checkpoint.interval.ms
● A broker-side setting: how often the high watermark is
checkpointed to disk. Checkpointing after every event
protects against message loss but significantly
impacts performance
Leverage Kafka Perf Tools
● Use tools provided by Kafka to validate its performance
● kafka-producer-perf-test.sh
● kafka-consumer-perf-test.sh
● Monitor system resources during performance evaluation
with standard Linux tools
● dstat
● iostat
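A small helper for assembling a producer benchmark run might look like this. The broker address, topic name and record counts are placeholders, and the flags shown are the common ones of `kafka-producer-perf-test.sh`; check `--help` on your Kafka version before relying on them.

```python
import shlex


def producer_perf_command(bootstrap_servers, topic, num_records, record_size,
                          throughput=-1):
    """Build the argv for one kafka-producer-perf-test.sh run."""
    return [
        "kafka-producer-perf-test.sh",
        "--topic", topic,
        "--num-records", str(num_records),
        "--record-size", str(record_size),  # bytes per record
        "--throughput", str(throughput),    # -1 disables throttling
        "--producer-props", f"bootstrap.servers={bootstrap_servers}",
    ]


command = producer_perf_command("localhost:9092", "perf-test", 1_000_000, 1024)
print(shlex.join(command))
```

Keeping the command as a list makes it easy to pass to `subprocess.run` and to sweep one parameter at a time while `dstat` or `iostat` runs alongside.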
Observability
● Monitoring Kafka while maintaining sanity in three steps:
● Retention
● How much data can we store on disk for each
topic partition?
● Replication
● How many copies of the data can we make?
● Consumer Lag
● How do we monitor how far behind our consumer
applications are from the producers?
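Consumer lag is simple arithmetic once the two offsets are known. The sketch below assumes the offsets have already been fetched (for instance by one of the monitoring tools in Appendix A); the numbers are made up, and a partition with no committed offset is treated as starting from 0.

```python
def consumer_lag(log_end_offsets, committed_offsets):
    """Per-partition lag: messages written but not yet committed by the group."""
    return {
        partition: end - committed_offsets.get(partition, 0)
        for partition, end in log_end_offsets.items()
    }


log_end_offsets = {0: 1500, 1: 1480, 2: 1510}    # latest offset per partition
committed_offsets = {0: 1500, 1: 1200, 2: 1505}  # group's committed offsets

lag = consumer_lag(log_end_offsets, committed_offsets)
print(lag)                # {0: 0, 1: 280, 2: 5}
print(max(lag.values()))  # 280
```

Tracking the maximum as well as the per-partition values helps spot a single slow partition that an average would hide.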
Observability
● Kafka-side key metrics
● UnderReplicatedPartitions
● Increases when followers lag behind in replication. High
availability targets cannot be met without replication
● IsrShrinksPerSec/IsrExpandsPerSec
● Should stay stable unless we are expanding the
broker cluster or removing partitions. A replica leaves
the ISR when it falls too far behind the leader's offset
(replica.lag.max.messages, removed in Kafka 0.9) or has
not caught up with the leader for some time
(replica.lag.time.max.ms)
● UncleanLeaderElectionsPerSec
● Counts elections in which no in-sync replica was
available and an out-of-sync replica became partition
leader, which can lose data
Observability
● TotalTimeMs
● Total time taken to service a request
(produce/fetch-consume/fetch-follower)
● Sum of queue time, local time (leader), remote time
(follower response) and response time
● PurgatorySize
● Produce and fetch requests waiting to be satisfied
● BytesInPerSec/BytesOutPerSec
● Network throughput hints at where potential
bottlenecks may lie
● LeaderElectionRateAndTimeMs
● Leader elections can indicate an offline broker
● ActiveControllerCount
● OfflinePartitionsCount
Observability
● Host-level broker key metrics
● Page cache hits ratio
● Kafka leverages the OS kernel's page cache to provide a
message pipeline that is both reliable (disk-based) and
performant (in-memory)
● Disk usage
● CPU usage
● Not a common bottleneck, even with compression enabled
● Network bytes sent/received
● Correlate with TCP retransmissions and packets being dropped
to identify network issues
● JVM metrics
● GC count
● GC time
Observability
● Kafka producer key metrics
● Request rate
● Response rate
● Rate of responses received from brokers. Behavior
depends on the semantics chosen via request.required.acks
● Request latency
● Outgoing byte rate
● Identify sources of excessive traffic
● I/O wait time
● Excessive wait times mean the producers can't get data
fast enough
Observability
● Kafka consumer key metrics
● ConsumerLag/MaxLag
● Related to the use case, but should be low in cases
where consumer is processing real-time data
● BytesPerSec
● MessagesPerSec
● MinFetchRate
Thanks!
Please reach out with questions and feedback
Otávio Carvalho
ocarvalh@thoughtworks.com
@otaviocarvalho
ThoughtWorks Brazil – Porto Alegre Office
Appendix A - Monitoring Tools
● Kafka Manager
● https://github.com/yahoo/kafka-manager
● Confluent Control Center
● https://www.confluent.io/confluent-control-center/
● LinkedIn Burrow
● https://github.com/linkedin/Burrow
● Zalando Remora
● https://github.com/zalando-incubator/remora
● Manual ingestion via JMX
● https://softwaremill.com/monitoring-apache-kafka-with-influxdb-grafana/
Appendix B - Operations Tools
● Confluent Auto Data Rebalancing
● https://docs.confluent.io/current/kafka/rebalancer/rebalancer.html
● Rebalance manually
● https://blog.imaginea.com/how-to-rebalance-topics-in-kafka-cluster/
● https://gquintana.github.io/2016/10/17/Scaling-Kafka.html
● Automated cluster healing
● https://github.com/pinterest/doctorkafka
● Dynamic workload rebalance and self-healing of a Kafka
cluster
● https://github.com/linkedin/cruise-control
Links & References
● Kafka: The Definitive Guide –
http://shop.oreilly.com/product/0636920044123.do
● Kafka Official Documentation –
https://kafka.apache.org/documentation/
● Amazon Best Practices –
https://aws.amazon.com/blogs/big-data/best-practices-for-running-apache-kafka-on-aws/
● Tuning Kafka for Low Latency Guaranteed Messaging (LinkedIn) –
https://www.youtube.com/watch?v=oQe7PpDDdzA
● Partitioning Trade-offs (Confluent) –
https://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/
● Topic Partitioning (New Relic) –
https://blog.newrelic.com/engineering/effective-strategies-kafka-topic-partitioning/
● Exactly-once semantics (Confluent) –
https://www.confluent.io/blog/exactly-once-semantics-are-possible-heres-how-apache-kafka-does-it/
● RabbitMQ semantics – https://www.rabbitmq.com/semantics.html
● Kafka at Scale in the Cloud (Netflix) –
https://www.slideshare.net/ConfluentInc/kafka-at-scale-in-the-cloud
● Kafka In-Depth Log Index Structure –
https://medium.com/@mukeshkumar_46704/in-depth-kafka-message-queue-principles-of-high-reliability-42e464e66172
● Kafkapocalypse (New Relic) –
https://blog.newrelic.com/engineering/new-relic-kafkapocalypse/
● Monitoring Kafka Performance Metrics (Datadog) –
https://www.datadoghq.com/blog/monitoring-kafka-performance-metrics/

More Related Content

What's hot (20)

PDF
RabbitMQ vs Apache Kafka - Part 1
Erlang Solutions
 
PDF
Producer Performance Tuning for Apache Kafka
Jiangjie Qin
 
PPTX
Apache Kafka: Next Generation Distributed Messaging System
Edureka!
 
PDF
Messaging queue - Kafka
Mayank Bansal
 
PDF
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
mumrah
 
PPTX
Improving Kafka at-least-once performance at Uber
Ying Zheng
 
PDF
Kafka Evaluation - High Throughout Message Queue
Shafaq Abdullah
 
PPTX
Understanding kafka
AmitDhodi
 
PDF
Kafka Technical Overview
Sylvester John
 
PDF
Apache Kafka - Free Friday
Otávio Carvalho
 
PPTX
Apache kafka
Viswanath J
 
PDF
ES & Kafka
Diego Pacheco
 
PDF
Introduction to Apache ActiveMQ Artemis
Yoshimasa Tanabe
 
PPTX
Apache Kafka
emreakis
 
PPTX
Kafka 101
Clement Demonchy
 
PPTX
Apache kafka
Srikrishna k
 
PDF
Cluster_Performance_Apache_Kafak_vs_RabbitMQ
Shameera Rathnayaka
 
PDF
Kafka At Scale in the Cloud
confluent
 
PPTX
Kafka reliability velocity 17
Gwen (Chen) Shapira
 
RabbitMQ vs Apache Kafka - Part 1
Erlang Solutions
 
Producer Performance Tuning for Apache Kafka
Jiangjie Qin
 
Apache Kafka: Next Generation Distributed Messaging System
Edureka!
 
Messaging queue - Kafka
Mayank Bansal
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
mumrah
 
Improving Kafka at-least-once performance at Uber
Ying Zheng
 
Kafka Evaluation - High Throughout Message Queue
Shafaq Abdullah
 
Understanding kafka
AmitDhodi
 
Kafka Technical Overview
Sylvester John
 
Apache Kafka - Free Friday
Otávio Carvalho
 
Apache kafka
Viswanath J
 
ES & Kafka
Diego Pacheco
 
Introduction to Apache ActiveMQ Artemis
Yoshimasa Tanabe
 
Apache Kafka
emreakis
 
Kafka 101
Clement Demonchy
 
Apache kafka
Srikrishna k
 
Cluster_Performance_Apache_Kafak_vs_RabbitMQ
Shameera Rathnayaka
 
Kafka At Scale in the Cloud
confluent
 
Kafka reliability velocity 17
Gwen (Chen) Shapira
 

Similar to Non-Kafkaesque Apache Kafka - Yottabyte 2018 (20)

PDF
Kafka in action - Tech Talk - Paytm
Sumit Jain
 
PPTX
Introduction to Kafka
Ducas Francis
 
PDF
Introduction to Kafka and Event-Driven
arconsis
 
PPTX
Introduction to Kafka and Event-Driven
Dimosthenis Botsaris
 
PDF
Apache Kafka - From zero to hero
Apache Kafka TLV
 
PDF
Kafka zero to hero
Avi Levi
 
PPTX
Streaming in Practice - Putting Apache Kafka in Production
confluent
 
PDF
Kafka Deep Dive
Knoldus Inc.
 
PDF
Perfug 20-11-2019 - Kafka Performances
Florent Ramiere
 
PPTX
Fundamentals and Architecture of Apache Kafka
Angelo Cesaro
 
PDF
Grokking TechTalk #24: Kafka's principles and protocols
Grokking VN
 
PDF
Building zero data loss pipelines with apache kafka
Avinash Ramineni
 
PDF
Kafka syed academy_v1_introduction
Syed Hadoop
 
PPTX
Kafka101
Aparna Pillai
 
PPTX
Putting Kafka Into Overdrive
Todd Palino
 
PDF
Apache Kafka Introduction
Amita Mirajkar
 
PDF
Why is My Stream Processing Job Slow? with Xavier Leaute
Databricks
 
PPTX
Distributed messaging with Apache Kafka
Saumitra Srivastav
 
PDF
Apache Kafka Women Who Code Meetup
Snehal Nagmote
 
PPTX
Apache Kafka Best Practices
DataWorks Summit/Hadoop Summit
 
Kafka in action - Tech Talk - Paytm
Sumit Jain
 
Introduction to Kafka
Ducas Francis
 
Introduction to Kafka and Event-Driven
arconsis
 
Introduction to Kafka and Event-Driven
Dimosthenis Botsaris
 
Apache Kafka - From zero to hero
Apache Kafka TLV
 
Kafka zero to hero
Avi Levi
 
Streaming in Practice - Putting Apache Kafka in Production
confluent
 
Kafka Deep Dive
Knoldus Inc.
 
Perfug 20-11-2019 - Kafka Performances
Florent Ramiere
 
Fundamentals and Architecture of Apache Kafka
Angelo Cesaro
 
Grokking TechTalk #24: Kafka's principles and protocols
Grokking VN
 
Building zero data loss pipelines with apache kafka
Avinash Ramineni
 
Kafka syed academy_v1_introduction
Syed Hadoop
 
Kafka101
Aparna Pillai
 
Putting Kafka Into Overdrive
Todd Palino
 
Apache Kafka Introduction
Amita Mirajkar
 
Why is My Stream Processing Job Slow? with Xavier Leaute
Databricks
 
Distributed messaging with Apache Kafka
Saumitra Srivastav
 
Apache Kafka Women Who Code Meetup
Snehal Nagmote
 
Apache Kafka Best Practices
DataWorks Summit/Hadoop Summit
 
Ad

More from Otávio Carvalho (8)

PDF
GaruaGeo: Global Scale Data Aggregation in Hybrid Edge and Cloud Computing En...
Otávio Carvalho
 
PDF
IoT Workload Distribution Impact Between Edge and Cloud Computing in a Smart ...
Otávio Carvalho
 
PDF
Stream Processing - ThoughtWorks Architecture Group - 2017
Otávio Carvalho
 
PDF
Stream Processing: Uma visão geral - TDC Porto Alegre / FISL 17
Otávio Carvalho
 
PDF
Distributed Near Real-Time Processing of Sensor Network Data Flows for Smart ...
Otávio Carvalho
 
PDF
A Survey of the State-of-the-art in Event Processing
Otávio Carvalho
 
PDF
Análise e Caracterização das Novas Ferramentas para Computação em Nuvem
Otávio Carvalho
 
PDF
Utilização de traços de execução para migração de aplicações para a nuvem
Otávio Carvalho
 
GaruaGeo: Global Scale Data Aggregation in Hybrid Edge and Cloud Computing En...
Otávio Carvalho
 
IoT Workload Distribution Impact Between Edge and Cloud Computing in a Smart ...
Otávio Carvalho
 
Stream Processing - ThoughtWorks Architecture Group - 2017
Otávio Carvalho
 
Stream Processing: Uma visão geral - TDC Porto Alegre / FISL 17
Otávio Carvalho
 
Distributed Near Real-Time Processing of Sensor Network Data Flows for Smart ...
Otávio Carvalho
 
A Survey of the State-of-the-art in Event Processing
Otávio Carvalho
 
Análise e Caracterização das Novas Ferramentas para Computação em Nuvem
Otávio Carvalho
 
Utilização de traços de execução para migração de aplicações para a nuvem
Otávio Carvalho
 
Ad

Recently uploaded (20)

PPTX
Fundamentals_of_Microservices_Architecture.pptx
MuhammadUzair504018
 
PPTX
Java Native Memory Leaks: The Hidden Villain Behind JVM Performance Issues
Tier1 app
 
PPTX
Revolutionizing Code Modernization with AI
KrzysztofKkol1
 
PDF
GetOnCRM Speeds Up Agentforce 3 Deployment for Enterprise AI Wins.pdf
GetOnCRM Solutions
 
PDF
MiniTool Partition Wizard 12.8 Crack License Key LATEST
hashhshs786
 
PPTX
3uTools Full Crack Free Version Download [Latest] 2025
muhammadgurbazkhan
 
PDF
Alarm in Android-Scheduling Timed Tasks Using AlarmManager in Android.pdf
Nabin Dhakal
 
PPTX
The Role of a PHP Development Company in Modern Web Development
SEO Company for School in Delhi NCR
 
PPT
MergeSortfbsjbjsfk sdfik k
RafishaikIT02044
 
PDF
Linux Certificate of Completion - LabEx Certificate
VICTOR MAESTRE RAMIREZ
 
PPTX
Feb 2021 Cohesity first pitch presentation.pptx
enginsayin1
 
PDF
Powering GIS with FME and VertiGIS - Peak of Data & AI 2025
Safe Software
 
PDF
Automate Cybersecurity Tasks with Python
VICTOR MAESTRE RAMIREZ
 
PDF
Revenue streams of the Wazirx clone script.pdf
aaronjeffray
 
PDF
Streamline Contractor Lifecycle- TECH EHS Solution
TECH EHS Solution
 
PDF
Why Businesses Are Switching to Open Source Alternatives to Crystal Reports.pdf
Varsha Nayak
 
PDF
Efficient, Automated Claims Processing Software for Insurers
Insurance Tech Services
 
PDF
Understanding the Need for Systemic Change in Open Source Through Intersectio...
Imma Valls Bernaus
 
PPTX
An Introduction to ZAP by Checkmarx - Official Version
Simon Bennetts
 
PDF
Thread In Android-Mastering Concurrency for Responsive Apps.pdf
Nabin Dhakal
 
Fundamentals_of_Microservices_Architecture.pptx
MuhammadUzair504018
 
Java Native Memory Leaks: The Hidden Villain Behind JVM Performance Issues
Tier1 app
 
Revolutionizing Code Modernization with AI
KrzysztofKkol1
 
GetOnCRM Speeds Up Agentforce 3 Deployment for Enterprise AI Wins.pdf
GetOnCRM Solutions
 
MiniTool Partition Wizard 12.8 Crack License Key LATEST
hashhshs786
 
3uTools Full Crack Free Version Download [Latest] 2025
muhammadgurbazkhan
 
Alarm in Android-Scheduling Timed Tasks Using AlarmManager in Android.pdf
Nabin Dhakal
 
The Role of a PHP Development Company in Modern Web Development
SEO Company for School in Delhi NCR
 
MergeSortfbsjbjsfk sdfik k
RafishaikIT02044
 
Linux Certificate of Completion - LabEx Certificate
VICTOR MAESTRE RAMIREZ
 
Feb 2021 Cohesity first pitch presentation.pptx
enginsayin1
 
Powering GIS with FME and VertiGIS - Peak of Data & AI 2025
Safe Software
 
Automate Cybersecurity Tasks with Python
VICTOR MAESTRE RAMIREZ
 
Revenue streams of the Wazirx clone script.pdf
aaronjeffray
 
Streamline Contractor Lifecycle- TECH EHS Solution
TECH EHS Solution
 
Why Businesses Are Switching to Open Source Alternatives to Crystal Reports.pdf
Varsha Nayak
 
Efficient, Automated Claims Processing Software for Insurers
Insurance Tech Services
 
Understanding the Need for Systemic Change in Open Source Through Intersectio...
Imma Valls Bernaus
 
An Introduction to ZAP by Checkmarx - Official Version
Simon Bennetts
 
Thread In Android-Mastering Concurrency for Responsive Apps.pdf
Nabin Dhakal
 

Non-Kafkaesque Apache Kafka - Yottabyte 2018

  • 1. Non-Kafkaesque Apache Kafka Semantics and performance tips & tricks Yottabyte 2018 – Beijing Otávio Carvalho
  • 3. 3 This is not (only) about Franz Kafka ● "Kafkaesque" describes, as the Oxford Dictionaries would put it, "oppressive or nightmarish qualities," or as Merriam-Webster suggests, "having a nightmarishly complex, bizarre, or illogical quality."
  • 4. 4 Is it designed to be Kafkaesque?
  • 7. 7 Is it a Message queue?
  • 8. 8 Brief pause for a flame war
  • 9. 9 Common misconceptions ● Common misconceptions usually are related to either Semantics or Reliability ● AMQP does not provide ordering guarantees in use cases with multiple consumers, which is a very common scenario with multiple microservices and app instances ● Kafka and AMQP-based solutions are pretty much interchangeable for the most common use cases (publish- subscribe) ● Small order of messages per second (around 10k/sec) ● At-least-once semantics ● No strict ordering guarantees
  • 10. 10 Common misconceptions If you have multiple processors in the queue, there is no longer a guarantee that messages will be processed in order ● AMQP 0.9.1 ● “Messages published in one channel, passing through one exchange and one queue and one outgoing channel will be received in the same order that they were sent.” ● RabbitMQ ● “Messages can be returned to the queue using AMQP methods that feature a requeue parameter, or due to a channel closing while holding unacknowledged messages. Any of these scenarios caused messages to be requeued at the back of the queue for RabbitMQ releases earlier than 2.7.0. From RabbitMQ release 2.7.0, messages are always held in the queue in publication order, even in the presence of requeueing or channel closure.”
  • 11. 11 Consumer Groups ● Consumer groups are a great way to balance data consumption and achieve two main goals: ● Split message consumption inside a group ● Each consumer receives messages from one or more partitions and the same messages won't be received by other consumers within the same group ● Provide publish/subscribe for multiple groups of consumers ● Rules to balance message distribution are only for members within the group. Two distinct consumer groups assigned to the same topic will receive the same messages (as expected in a publish/subscribe pattern)
  • 13. 13 Consumer Groups ● You should be careful with two things: ● Idle consumers ● More consumers than partitions in a given consumer group will cause consumers to be idle. A consumer can read one or more partitions at a given point in time ● Rebalancing ● Whenever a consumer joins or leaves a consumer group, the brokers rebalance the partitions across consumers. This way, Kafka will load balance the number of partitions assigned per application instance
  • 16. 16 Kafka semantics ● In a distributed publish-subscribe messaging system, the computers that make up the system can always fail independently of one another. ● In the case of Kafka, an individual broker can crash, or a network failure can happen while the producer is sending a message to a topic. ● Depending on the action the producer takes to handle such a failure, you can get different semantics: ● At-least-once ● At-most-once ● Exactly-once
  • 17. 17 Semantics: At-least-once ● At-least-once ● request.required.acks=-1, which means from a producer standpoint that a successful write is done when it has written the message to the topic and all of the in-sync-replicas successfully ● Even with the highest producer acknowledgement level of consistency enabled, in the case a producer ack times out or receives in error, it might retry sending the message. ● If the broker had failed right before it sent the ack but after the message was successfully written to the Kafka topic, this retry leads to a message being written twice and hence delivered more than once to the end consumer.
  • 18. 18 Semantics: At-most-once ● At-most-once ● If the producer does not retry when an ack times out or returns an error, the message might end up not being written to the Kafka topic, and hence not delivered to the consumer. ● In most cases it will be, but in order to avoid the possibility of duplication, we accept that sometimes messages will not get through
  • 19. 19 Semantics: Exactly-once ● Exactly-once ● Hard and incurs on performance penalties, but doable since Kafka 0.11 ● Idempotent operations, relying on transactions, sequence numbers and inspired by TCP protocol ● Even if a producer retries sending a message, it leads to the message being delivered exactly once to the end consumer. ● Exactly-once semantics is the most desirable guarantee, but also a poorly understood one. This is because it requires a cooperation between the messaging system itself and the application producing and consuming the messages.
  • 23. 23 Partitions on the filesystem ● Partitioning trade-offs ● Each partition maps to a directory in the file system in the broker. Within that log directory, there will be two files (one for the index and another for the actual data) per log segment. ● Each broker opens a file handle of both the index and the data file of every log segment. ● The more partitions, the higher that one needs to configure the open file handle limit in the underlying operating system. ● IMPORTANT: Adjust the filesystem to support a high number of file descriptors opened at the same time (There are Kafka clusters running with more than 30 thousand open file handles per broker)
  • 24. 24 Partitions on the filesystem
  • 26. 26 General Tuning ● If throughput is less than network capacity, adjust and evaluate ● Add more threads ● Increase batch size ● Add more producer instances ● Add more partitions ● Otherwise, try to identify the throughput bottleneck ● Is the bottleneck in user threads? ● Increase num of threads and see if throughput increases ● Pay attention to lock contention ● Is the bottleneck in sender thread ● Are queue times large? Are batch sizes avgs similar to batch.size? ● Is the bottleneck in the broker? ● Is the request latency very large?
  • 27. 27 OS/System Tuning ● OS/System Tuning ● For cross datacenter or high latency scenarios ● Tune buffer settings for sockets ● Tune buffer settings for network (OS TCP settings) ● Adjust message size according to network bandwidth ● Java/JVM Tuning ● Minimize GC pauses by using G1GC ● Try to keep heap size below 4GB ● Zookeeper tuning ● Lower load after consumer offset commits from Kafka stopped writing to ZK ● Increase maxClientCnxns was needed in especial cases
  • 28. 28 Producer tuning ● Producer configurations ● request.required.acks ● Affect durability and should match the expected semantics ● batch.size ● Should be evaluated for each latency/throughput scenario ● linger.ms ● Maximum time to wait before sending a batch (that could potentially not be full yet). Should be tuned together with batch.size. May cause contention on send if set to a very low number. ● compression.type ● Potentially large impact with large message sizes ● max.in.flight.requests.per.connection ● Multiple requests per connection can increase throughput in some scenarios but also affect message ordering
  • 30. 30 Broker tuning ● Broker configurations ● num.io.threads ● As many threads as you have disks ● log.flush.interval.messages ● Time prior to writing to disk. Higher improve performance but increases risk of losing data in the event of a crash ● adjust num.network.threads based on ● <number-of-producers> + <number-of-consumers> + <replication-factor> ● Number of partitions ● A large number of partitions in a single broker can increase latency ● Partitions per physical disk storage ● Ideally one partition per physical disk storage to avoid I/O bottlenecks
  • 31. 31 Consumer tuning ● Consumer configurations ● Consumers vs Producers ● Should keep up with the number of producers ● Number of consumers in the Consumer Group ● Should have as many consumers in a group as there are partitions ● replica.high.watermark.checkpoint.interval.ms ● If you have a checkpoint watermark for every event, you will never lose a message, but it will significantly impact performance
  • 32. 32 Leverage Kafka Perf Tools ● Use tools provided by Kafka to validate its performance ● kafka-producer-perf-test.sh ● kafka-consumer-perf-test.sh ● Monitor system resources during performance evaluation with standard linux tools ● dstat ● iostat
  • 34. 34 Observability ● Monitoring Kafka while maintaining sanity in three steps: ● Retention ● How much data can we store on disk for each topic partition? ● Replication ● How many copies of the data can we make? ● Consumer Lag ● How do we monitor how far behind our consumer applications are from the producers?
  • 35. 35 Observability ● Kafka-side key metrics ● UnderReplicatedPartitions ● Increase when cluster is lagging to replicate. High availability metrics cannot be met without replication ● IsrShrinksPerSec/IsrExpandsPerSec ● Should keep stable if we are not expanding the broker cluster or removing partitions. It could be failing behind leaders offset (replica.lag.max.messages) or it has not contacted the leader for some time (replica.socket.timeout.ms) ● UncleanLeaderElectionsPerSec ● Unable to identify a qualified partition leader among Kafka brokers
  • 36. 36 Observability ● TotalTimeMs ● Total time taken to service a request (produce/fetch-consumer/fetch-follower) ● Sum of queue time, local time (leader), remote time (follower response) and response time ● PurgatorySize ● Produce and fetch requests waiting to be satisfied ● BytesInPerSec/BytesOutPerSec ● Network throughput gives insight into where potential bottlenecks may lie ● LeaderElectionRateAndTimeMs ● A nonzero rate could indicate an offline broker ● ActiveControllerCount ● OfflinePartitionsCount
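Because TotalTimeMs is a sum of its parts, breaking it down shows where a slow request spends its time. A minimal sketch, assuming the four components named on this slide and made-up timings in milliseconds:

```python
def breakdown(components):
    """Return the total request time and the component that contributes the most."""
    total = sum(components.values())
    dominant = max(components, key=components.get)
    return total, dominant


# Hypothetical timings for one produce request, in milliseconds.
timings = {"queue": 2.0, "local": 5.0, "remote": 40.0, "response": 1.0}
total_ms, dominant = breakdown(timings)
print(f"total={total_ms} ms, dominant component: {dominant}")
```

In this made-up case the remote time dominates, which would point at followers being slow to acknowledge rather than at the leader itself.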
  • 37. 37 Observability ● Host-level broker key metrics ● Page cache hit ratio ● Kafka leverages the OS kernel's page cache to provide a message pipeline that is both reliable (disk-based) and performant (in-memory) ● Disk usage ● CPU usage ● Not a common bottleneck, even with compression enabled ● Network bytes sent/received ● Correlate with TCP retransmissions and dropped packets to identify network issues ● JVM metrics ● GC count ● GC time
  • 38. 38 Observability ● Kafka producer key metrics ● Request rate ● Response rate ● Rate of responses received from brokers. Behavior depends on the delivery semantics configured through request.required.acks ● Request latency ● Outgoing byte rate ● Identify sources of excessive traffic ● I/O wait time ● Excessive wait times mean the producers can't get data fast enough
  • 39. 39 Observability ● Kafka consumer key metrics ● ConsumerLag/MaxLag ● Acceptable values depend on the use case, but lag should be low when consumers are processing real-time data ● BytesPerSec ● MessagesPerSec ● MinFetchRate
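Consumer lag is the distance between the newest offset in each partition and the offset the consumer group has committed. The sketch below computes it from two plain dictionaries keyed by partition; the offsets are made-up, and in practice they would come from a monitoring tool or the brokers.

```python
def consumer_lag(log_end_offsets, committed_offsets):
    """Per-partition lag: log end offset minus the group's committed offset."""
    return {
        partition: end - committed_offsets[partition]
        for partition, end in log_end_offsets.items()
    }


# Hypothetical offsets for a two-partition topic.
lag = consumer_lag({0: 1500, 1: 900}, {0: 1450, 1: 900})
max_lag = max(lag.values())
print(lag, max_lag)
```

Alerting on lag that keeps growing over time is usually more informative than alerting on a single absolute value.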
  • 40. Thanks! Please reach out with questions and feedback Otávio Carvalho @otaviocarvalho ThoughtWorks Brazil – Porto Alegre Office
  • 41. 41 Appendix A - Monitoring Tools ● Kafka Manager ● https://github.com/yahoo/kafka-manager ● Confluent Control Center ● https://www.confluent.io/confluent-control-center/ ● LinkedIn Burrow ● https://github.com/linkedin/Burrow ● Zalando Remora ● https://github.com/zalando-incubator/remora ● Manual ingestion via JMX ● https://softwaremill.com/monitoring-apache-kafka-with-influxdb-grafana/
  • 42. 42 Appendix B - Operations Tools ● Confluent Auto Data Rebalancing ● https://docs.confluent.io/current/kafka/rebalancer/rebalancer.html ● Rebalance manually ● https://blog.imaginea.com/how-to-rebalance-topics-in-kafka-cluster/ ● https://gquintana.github.io/2016/10/17/Scaling-Kafka.html ● Automated cluster healing ● https://github.com/pinterest/doctorkafka ● Dynamic workload rebalance and self-healing of a Kafka cluster ● https://github.com/linkedin/cruise-control
  • 43. 43 Links & References ● Kafka: The Definitive Guide – http://shop.oreilly.com/product/0636920044123.do ● Kafka Official Documentation – https://kafka.apache.org/documentation/ ● Amazon Best Practices – https://aws.amazon.com/blogs/big-data/best-practices-for-running-apache-kafka-on-aws/ ● Tuning Kafka for Low Latency Guaranteed Messaging (LinkedIn) – https://www.youtube.com/watch?v=oQe7PpDDdzA ● Partitioning Trade-offs (Confluent) – https://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/ ● Topic Partitioning (New Relic) – https://blog.newrelic.com/engineering/effective-strategies-kafka-topic-partitioning/ ● Exactly-once semantics (Confluent) – https://www.confluent.io/blog/exactly-once-semantics-are-possible-heres-how-apache-kafka-does-it/ ● RabbitMQ semantics – https://www.rabbitmq.com/semantics.html
  • 44. 44 Links & References ● Kafka at Scale in the Cloud (Netflix) – https://www.slideshare.net/ConfluentInc/kafka-at-scale-in-the-cloud ● Kafka In-Depth Log Index Structure – https://medium.com/@mukeshkumar_46704/in-depth-kafka-message-queue-principles-of-high-reliability-42e464e66172 ● Kafkapocalypse (New Relic) – https://blog.newrelic.com/engineering/new-relic-kafkapocalypse/ ● Monitoring Kafka Performance Metrics (Datadog) – https://www.datadoghq.com/blog/monitoring-kafka-performance-metrics/