Non-Kafkaesque Apache Kafka - Yottabyte 2018

Non-Kafkaesque
Apache Kafka
Semantics and performance tips & tricks
Yottabyte 2018 – Beijing
Otávio Carvalho

This is not (only) about Franz Kafka
● "Kafkaesque" describes, as
the Oxford Dictionaries
would put it, "oppressive or
nightmarish qualities,"
or as Merriam-Webster
suggests, "having a
nightmarishly complex,
bizarre, or illogical quality."

Is it designed to be
Kafkaesque?

Common misconceptions
● Common misconceptions usually relate to either
semantics or reliability
● AMQP does not provide ordering guarantees in use cases
with multiple consumers, which is a very common scenario
with multiple microservices and app instances
● Kafka and AMQP-based solutions are pretty much
interchangeable for the most common use cases
(publish-subscribe):
● Modest message rates (around 10k messages/sec)
● At-least-once semantics
● No strict ordering guarantees

Common misconceptions
If you have multiple processors in the queue,
there is no longer a guarantee that messages
will be processed in order
● AMQP 0.9.1
● “Messages published in one channel, passing through one exchange
and one queue and one outgoing channel will be received in the
same order that they were sent.”
● RabbitMQ
● “Messages can be returned to the queue using AMQP methods that
feature a requeue parameter, or due to a channel closing while
holding unacknowledged messages. Any of these scenarios caused
messages to be requeued at the back of the queue for RabbitMQ
releases earlier than 2.7.0. From RabbitMQ release 2.7.0, messages
are always held in the queue in publication order, even in the
presence of requeueing or channel closure.”

Consumer Groups
● Consumer groups are a great way to balance data
consumption and achieve two main goals:
● Split message consumption inside a group
● Each consumer receives messages from one or more
partitions and the same messages won't be received by
other consumers within the same group
● Provide publish/subscribe for multiple groups of
consumers
● Rules to balance message distribution are only for
members within the group. Two distinct consumer groups
assigned to the same topic will receive the same
messages (as expected in a publish/subscribe pattern)
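The two goals above can be illustrated with a small in-memory model. This is only a sketch: the group names, consumer names and the `deliver` helper are made up for illustration, and CRC32 stands in for Kafka's real partitioner.

```python
from collections import defaultdict
from zlib import crc32


def partition_for(key: str, num_partitions: int) -> int:
    # Stand-in hash for illustration; Kafka's default partitioner differs.
    return crc32(key.encode("utf-8")) % num_partitions


def deliver(messages, num_partitions, groups):
    """groups maps group name -> {consumer name: [partitions it owns]}."""
    received = defaultdict(list)
    for key, value in messages:
        p = partition_for(key, num_partitions)
        for group, members in groups.items():
            for consumer, owned in members.items():
                if p in owned:
                    received[(group, consumer)].append(value)
    return dict(received)


messages = [(f"user-{i}", f"event-{i}") for i in range(8)]
groups = {
    # Two consumers split the four partitions inside the "billing" group.
    "billing": {"billing-1": [0, 1], "billing-2": [2, 3]},
    # A second group with a single consumer owns every partition.
    "audit": {"audit-1": [0, 1, 2, 3]},
}
received = deliver(messages, 4, groups)
```

In this model `audit-1` sees every message, while `billing-1` and `billing-2` each see a disjoint share of the same stream.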

Consumer Groups
● You should be careful with two things:
● Idle consumers
● More consumers than partitions in a given consumer
group will cause consumers to be idle. A consumer
can read one or more partitions at a given point in
time
● Rebalancing
● Whenever a consumer joins or leaves a consumer
group, the brokers rebalance the partitions across
consumers. This way, Kafka will load balance the
number of partitions assigned per application
instance
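Both caveats can be seen in a toy round-robin assignment. This is a simplified sketch, not one of Kafka's actual assignors (which have their own rules); `assign` and `idle` are hypothetical helper names.

```python
def assign(partitions, consumers):
    """Round-robin: the i-th partition goes to consumer i mod N (sorted by name)."""
    members = sorted(consumers)
    assignment = {c: [] for c in members}
    if not members:
        return assignment
    for i, p in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(p)
    return assignment


def idle(assignment):
    """Consumers that own no partition at all."""
    return [c for c, owned in assignment.items() if not owned]


partitions = [0, 1, 2, 3]

# Six consumers for four partitions: two of them stay idle.
crowded = assign(partitions, ["c1", "c2", "c3", "c4", "c5", "c6"])
idle_consumers = idle(crowded)

# "Rebalance": c2 left the group, so the assignment is recomputed.
before = assign(partitions, ["c1", "c2", "c3", "c4"])
after = assign(partitions, ["c1", "c3", "c4"])
```

After `c2` leaves, its partition is handed to a surviving consumer and some partitions change owner, which is why frequent rebalances are worth watching.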

Kafka semantics
● In a distributed publish-subscribe messaging system, the
computers that make up the system can always fail
independently of one another.
● In the case of Kafka, an individual broker can crash, or a
network failure can happen while the producer is sending a
message to a topic.
● Depending on the action the producer takes to handle such
a failure, you can get different semantics:
● At-least-once
● At-most-once
● Exactly-once
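One plausible mapping from these three semantics to producer settings is sketched below. The property names are standard Kafka producer configs; the values (retry count, the `transactional.id` string) and the `to_properties` helper are illustrative assumptions, not recommendations from the talk.

```python
PRESETS = {
    # No retries: a lost ack can mean a lost message, but never a duplicate.
    "at-most-once": {"acks": "1", "retries": "0"},
    # Wait for all in-sync replicas and retry on failure: duplicates possible.
    "at-least-once": {"acks": "all", "retries": "5"},
    # Idempotent, transactional producer (Kafka 0.11+). Consumers would also
    # need isolation.level=read_committed for end-to-end exactly-once.
    "exactly-once": {
        "acks": "all",
        "enable.idempotence": "true",
        "transactional.id": "orders-producer-1",  # hypothetical id
    },
}


def to_properties(config):
    """Render a config dict in .properties format, keys sorted."""
    return "\n".join(f"{k}={v}" for k, v in sorted(config.items()))


at_most_once_properties = to_properties(PRESETS["at-most-once"])
```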

Semantics: At-least-once
● At-least-once
● With acks=all (request.required.acks=-1 in the legacy
producer), a write is successful from the producer's
standpoint once the message has been written to the
topic and to all of the in-sync replicas
● Even with the highest producer acknowledgement level
enabled, if a producer ack times out or returns an
error, the producer might retry sending the
message.
● If the broker had failed right before it sent the ack but
after the message was successfully written to the
Kafka topic, this retry leads to a message being written
twice and hence delivered more than once to the end
consumer.
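The lost-ack scenario can be reproduced with a toy broker. This is a sketch under simplifying assumptions: `Broker`, `send` and the `drop_acks_for` parameter are hypothetical, and the write itself never fails in this model, only the acknowledgement.

```python
class Broker:
    def __init__(self, drop_acks_for=()):
        self.log = []
        self._drop = set(drop_acks_for)  # attempt numbers whose ack is lost
        self._attempt = 0

    def append(self, message):
        """Write the message, then report whether the ack reached the producer."""
        self._attempt += 1
        self.log.append(message)  # the write itself succeeds
        return self._attempt not in self._drop


def send(broker, message, max_retries):
    """Resend until an ack arrives or the retries are exhausted."""
    for _ in range(max_retries + 1):
        if broker.append(message):
            return True
    return False


broker = Broker(drop_acks_for={1})  # the ack for the first attempt is lost
acked = send(broker, "order-42", max_retries=3)
```

The producer eventually gets its ack, but `broker.log` now holds `order-42` twice, so a consumer reading the log would process it twice.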

Semantics: At-most-once
● At-most-once
● If the producer does not retry when an ack times out or
returns an error, the message might end up not being
written to the Kafka topic, and hence not delivered to
the consumer.
● In most cases it will be, but in order to avoid the
possibility of duplication, we accept that sometimes
messages will not get through

Semantics: Exactly-once
● Exactly-once
● Hard, and it incurs performance penalties, but doable
since Kafka 0.11
● Built on idempotent operations and transactions, relying
on sequence numbers (inspired by the TCP protocol)
● Even if a producer retries sending a message, it leads
to the message being delivered exactly once to the end
consumer.
● Exactly-once semantics is the most desirable
guarantee, but also a poorly understood one. This is
because it requires cooperation between the
messaging system itself and the applications producing
and consuming the messages.
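The sequence-number idea can be sketched as a log that remembers the last sequence it accepted from each producer. This is a deliberately reduced model: the class and return values are hypothetical, and Kafka's real protocol also involves per-partition state and producer epochs.

```python
class IdempotentLog:
    def __init__(self):
        self.log = []
        self._last_seq = {}  # producer id -> highest sequence number appended

    def append(self, producer_id, seq, message):
        last = self._last_seq.get(producer_id, -1)
        if seq <= last:
            return "duplicate"  # already written: acknowledge, do not append
        if seq != last + 1:
            return "out_of_order"  # gap in the sequence: reject
        self.log.append(message)
        self._last_seq[producer_id] = seq
        return "appended"


log = IdempotentLog()
results = [
    log.append(7, 0, "a"),
    log.append(7, 0, "a"),  # retry of the same message after a lost ack
    log.append(7, 1, "b"),
    log.append(7, 3, "d"),  # sequence 2 never arrived
]
```

The retried message is acknowledged but written only once, which is what makes producer retries safe.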

Partitions on the filesystem
● Partitioning trade-offs
● Each partition maps to a directory in the file system in
the broker. Within that log directory, there will be two
files (one for the index and another for the actual data)
per log segment.
● Each broker opens a file handle of both the index and
the data file of every log segment.
● The more partitions, the higher the open file handle
limit must be configured in the underlying
operating system.
● IMPORTANT: Raise the operating system limits to support a
high number of file descriptors opened at the same
time (There are Kafka clusters running with more than
30 thousand open file handles per broker)
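A back-of-the-envelope estimate follows directly from the description above (two files per log segment, both held open). The partition and segment counts below are made-up inputs, and network sockets, which also consume descriptors, are ignored.

```python
def estimated_open_files(partitions_on_broker, segments_per_partition,
                         files_per_segment=2):
    """Index file + data file per segment, for every segment of every partition."""
    return partitions_on_broker * segments_per_partition * files_per_segment


# Hypothetical broker hosting 2,000 partitions with 8 segments each.
estimate = estimated_open_files(2000, 8)
```

Comparing such an estimate with the broker process's open-file limit (for example the value reported by `ulimit -n`) gives a quick sanity check before adding partitions.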


General Tuning
● If throughput is less than network capacity, adjust and evaluate
● Add more threads
● Increase batch size
● Add more producer instances
● Add more partitions
● Otherwise, try to identify the throughput bottleneck
● Is the bottleneck in user threads?
● Increase the number of threads and see if throughput increases
● Pay attention to lock contention
● Is the bottleneck in the sender thread?
● Are queue times large? Is the average batch size close to batch.size?
● Is the bottleneck in the broker?
● Is the request latency very large?

OS/System Tuning
● OS/System Tuning
● For cross datacenter or high latency scenarios
● Tune buffer settings for sockets
● Tune buffer settings for network (OS TCP settings)
● Adjust message size according to network bandwidth
● Java/JVM Tuning
● Minimize GC pauses by using G1GC
● Try to keep heap size below 4GB
● ZooKeeper tuning
● ZooKeeper load is lower now that Kafka consumers no
longer commit offsets to ZK
● Increasing maxClientCnxns may be needed in special cases

Producer tuning
● Producer configurations
● acks (request.required.acks in the legacy producer)
● Affects durability and should match the expected semantics
● batch.size
● Should be evaluated for each latency/throughput scenario
● linger.ms
● Maximum time to wait before sending a batch (that could
potentially not be full yet). Should be tuned together with
batch.size. May cause contention on send if set to a very low
number.
● compression.type
● Potentially large impact with large message sizes
● max.in.flight.requests.per.connection
● Multiple requests per connection can increase throughput in
some scenarios but also affect message ordering

Broker tuning
● Broker configurations
● num.io.threads
● At least as many threads as you have disks
● log.flush.interval.messages
● Number of messages accumulated on a partition before they are flushed
to disk. Higher values improve performance but increase the risk of
losing data in the event of a crash
● num.network.threads
● Adjust based on
<number-of-producers> + <number-of-consumers> + <replication-factor>
● Number of partitions
● A large number of partitions in a single broker can increase latency
● Partitions per physical disk storage
● Ideally one partition per physical disk storage to avoid I/O bottlenecks

Consumer tuning
● Consumer configurations
● Consumers vs Producers
● Consumer capacity should keep up with the number of producers
● Number of consumers in the Consumer Group
● Should have as many consumers in a group as there
are partitions
● replica.high.watermark.checkpoint.interval.ms
● If the high watermark is checkpointed for every event,
you will never lose a message, but it will significantly
impact performance

Leverage Kafka Perf Tools
● Use tools provided by Kafka to validate its performance
● kafka-producer-perf-test.sh
● kafka-consumer-perf-test.sh
● Monitor system resources during performance evaluation
with standard linux tools
● dstat
● iostat

Observability
● Monitoring Kafka while maintaining sanity in three steps:
● Retention
● How much data can we store on disk for each
topic partition?
● Replication
● How many copies of the data can we make?
● Consumer Lag
● How do we monitor how far behind our consumer
applications are from the producers?
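Consumer lag is usually computed per partition as the log end offset minus the group's committed offset. A minimal sketch with made-up offsets follows; treating a partition with no committed offset as starting from 0 is an assumption of this example.

```python
def consumer_lag(log_end_offsets, committed_offsets):
    """Per-partition lag = log end offset - committed offset."""
    return {
        partition: end - committed_offsets.get(partition, 0)
        for partition, end in log_end_offsets.items()
    }


log_end = {0: 1500, 1: 980, 2: 2000}   # latest offset written per partition
committed = {0: 1500, 1: 900}          # partition 2 has no commit yet

lag = consumer_lag(log_end, committed)
total_lag = sum(lag.values())
max_lag = max(lag.values())
```

Tracking both the total and the per-partition maximum helps distinguish a group that is slow overall from a single stuck partition.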

Observability
● Kafka-side key metrics
● UnderReplicatedPartitions
● Increases when replicas fall behind the leader. High
availability cannot be guaranteed without healthy replication
● IsrShrinksPerSec/IsrExpandsPerSec
● Should stay stable unless we are expanding the
broker cluster or removing partitions. A replica leaves
the ISR when it falls behind the leader's offset
(replica.lag.max.messages, replaced by
replica.lag.time.max.ms since Kafka 0.9) or has not
contacted the leader for some time
(replica.socket.timeout.ms)
● UncleanLeaderElectionsPerSec
● Counts elections where no in-sync replica was
available and an out-of-sync replica became leader,
which risks data loss

Observability
● TotalTimeMs
● Total time taken to service a request (produce/fetch-
consume/fetch-follower)
● Sum of queue time, local time (leader), remote time
(follower response) and response time
● PurgatorySize
● Produce and fetch requests waiting to be satisfied
● BytesInPerSec/BytesOutPerSec
● Network throughput gives insight into where potential
bottlenecks may lie
● LeaderElectionRateAndTimeMs
● Frequent leader elections could indicate an offline broker
● ActiveControllerCount
● OfflinePartitionsCount
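Since TotalTimeMs is described above as a sum of four stages, breaking it down shows where a request spends its time. The sketch below uses made-up timings and hypothetical stage labels rather than actual metric names.

```python
def slowest_stage(timings_ms):
    """Return (total time, name of the stage contributing the most)."""
    total = sum(timings_ms.values())
    stage = max(timings_ms, key=timings_ms.get)
    return total, stage


# Hypothetical breakdown of one produce request, in milliseconds.
produce_request = {"queue": 2.0, "local": 5.0, "remote": 40.0, "response": 3.0}
total_ms, dominant = slowest_stage(produce_request)
```

Here the remote (follower response) stage dominates, which would point at replication rather than at the leader's own disk or request queue.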

Observability
● Host-level broker key metrics
● Page cache hits ratio
● Kafka leverages the OS kernel's page cache to provide a message
pipeline that is both reliable (disk-based) and performant
(in-memory)
● Disk usage
● CPU usage
● Not a common bottleneck, even with compression enabled
● Network bytes sent/received
● Correlate with TCP retransmissions and packets being dropped
to identify network issues
● JVM metrics
● GC count
● GC time

Observability
● Kafka producer key metrics
● Request rate
● Response rate
● Rate of responses received from brokers. Behavior
depends on the semantics chosen through the acks setting
● Request latency
● Outgoing byte rate
● Identify sources of excessive traffic
● I/O wait time
● Excessive wait times mean the producers can't get data
fast enough

Observability
● Kafka consumer key metrics
● ConsumerLag/MaxLag
● Acceptable values depend on the use case, but lag should
be low when the consumer is processing real-time data
● BytesPerSec
● MessagesPerSec
● MinFetchRate

Thanks!
Please reach out with questions and feedback
Otávio Carvalho
ocarvalh@thoughtworks.com
@otaviocarvalho
ThoughtWorks Brazil – Porto Alegre Office

Appendix A - Monitoring Tools
● Kafka Manager
● https://github.com/yahoo/kafka-manager
● Confluent Control Center
● https://www.confluent.io/confluent-control-center/
● LinkedIn Burrow
● https://github.com/linkedin/Burrow
● Zalando Remora
● https://github.com/zalando-incubator/remora
● Manual ingestion via JMX
● https://softwaremill.com/monitoring-apache-kafka-with-influxdb-grafana/

Appendix B - Operations Tools
● Confluent Auto Data Rebalancing
● https://docs.confluent.io/current/kafka/rebalancer/rebalancer.html
● Rebalance manually
● https://blog.imaginea.com/how-to-rebalance-topics-in-kafka-cluster/
● https://gquintana.github.io/2016/10/17/Scaling-Kafka.html
● Automated cluster healing
● https://github.com/pinterest/doctorkafka
● Dynamic workload rebalance and self-healing of a Kafka
cluster
● https://github.com/linkedin/cruise-control

Links & References
● Kafka: The Definitive Guide –
http://shop.oreilly.com/product/0636920044123.do
● Kafka Official Documentation –
https://kafka.apache.org/documentation/
● Amazon Best Practices –
https://aws.amazon.com/blogs/big-data/best-practices-for-running-apache-kafka-on-aws/
● Tuning Kafka for Low Latency Guaranteed Messaging (LinkedIn) –
https://www.youtube.com/watch?v=oQe7PpDDdzA
● Partitioning Trade-offs (Confluent) –
https://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/
● Topic Partitioning (New Relic) –
https://blog.newrelic.com/engineering/effective-strategies-kafka-topic-partitioning/
● Exactly-once semantics (Confluent) –
https://www.confluent.io/blog/exactly-once-semantics-are-possible-heres-how-apache-kafka-does-it/
● RabbitMQ semantics – https://www.rabbitmq.com/semantics.html

Links & References
● Kafka at Scale in the Cloud (Netflix) –
https://www.slideshare.net/ConfluentInc/kafka-at-scale-in-the-cloud
● Kafka In-Depth Log Index Structure –
https://medium.com/@mukeshkumar_46704/in-depth-kafka-message-queue-principles-of-high-reliability-42e464e66172
● Kafkapocalypse (New Relic) –
https://blog.newrelic.com/engineering/new-relic-kafkapocalypse/
● Monitoring Kafka Performance Metrics (Datadog) –
https://www.datadoghq.com/blog/monitoring-kafka-performance-metrics/
