Introduction to Apache Kafka
Shiao-An Yuan
@sayuan
2017-12-13
Kafka Overview
[Diagram: Producer → (push) → Kafka topic (messages 0 … 8) → (pull) → Consumer, with ZooKeeper alongside the cluster]
● Developed by LinkedIn
● Written in Scala
● Message queues
Why Do We Need Message Queues?
● Buffering
● Decoupling
https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
Use Cases
● Messaging
● Website Activity Tracking
● Metrics
● Log Aggregation
● Stream Processing
● Event Sourcing
● Commit Log
Features
● Durability
● Scalability
● Publish-subscribe
● Ordering
● High Availability
● High Throughput
● Delivery Semantics
Let’s Add These Features One by One
[Diagram: Producer → (push) → Kafka topic (messages 0 … 8) → (pull) → Consumer]
Durability
[Diagram: Segment Files. The Kafka topic (messages 0 … 9) between Producer and Consumer is persisted on disk as segment files]
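Kafka splits each partition's log into segment files, each named after the offset of its first message (its base offset). A reader that wants offset N therefore looks for the segment with the greatest base offset that is not larger than N. The sketch below models this with a hypothetical in-memory list of base offsets; it is an illustration of the lookup rule, not Kafka's own code.

```scala
// Hypothetical model of one partition's segment files. Each segment is
// identified by its base offset, the offset of its first message.
object SegmentLookup {
  // Segment file name for a base offset, zero-padded to 20 digits
  // (e.g. 00000000000000000368.log).
  def fileName(baseOffset: Long): String = f"$baseOffset%020d.log"

  // Base offset of the segment holding `offset`: the greatest base offset
  // that is <= offset, or None if the offset precedes every segment.
  // A linear scan keeps the sketch short; a sorted index would make it O(log n).
  def segmentFor(baseOffsets: Seq[Long], offset: Long): Option[Long] =
    baseOffsets.filter(_ <= offset).reduceOption(_ max _)
}
```

For segments starting at 0, 368 and 1024, offset 500 lives in the segment file `00000000000000000368.log`.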
Scalability
[Diagram: Partitioning. The topic is split into Partition 0 on Broker 1 and Partition 1 on Broker 2; the Producer writes to both and the Consumer reads from both]
[Diagram: Partitioning with several Producers writing to Partition 0 (Broker 1) and Partition 1 (Broker 2), read by Consumers A1, A2 and A3]
Publish-subscribe
[Diagram: Consumer Groups. The Producer writes to Partition 0 (Broker 1) and Partition 1 (Broker 2); Consumer Group A (A1, A2) and Consumer Group B (B1) each read the whole topic]
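Inside one consumer group every partition is read by exactly one member, while each group independently receives all messages. The sketch below spreads partitions over group members round-robin; Kafka ships its own range and round-robin assignors, and this simplified version is only meant to show why a group with more members than partitions leaves some members idle.

```scala
// Simplified sketch: spread partitions over the members of one consumer
// group round-robin. Each partition goes to exactly one member.
object GroupAssignment {
  def assign(partitions: Seq[Int], members: Seq[String]): Map[String, Seq[Int]] = {
    require(members.nonEmpty, "a group needs at least one member")
    val sortedMembers = members.sorted
    val owned = partitions.sorted.zipWithIndex
      .groupBy { case (_, i) => sortedMembers(i % sortedMembers.size) }
      .map { case (m, ps) => m -> ps.map(_._1) }
    // Members that received nothing still appear, with no partitions (idle).
    sortedMembers.map(m => m -> owned.getOrElse(m, Seq.empty)).toMap
  }
}
```

With two partitions, group A (A1, A2, A3) leaves A3 without work, while group B's single member B1 reads both partitions.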
Ordering
[Diagram: Partitioned by Key. Messages 2, 4, …, 14 go to Partition 0 on Broker 1 and messages 3, 5, …, 13 go to Partition 1 on Broker 2; in Consumer Group A, A1 and A2 each read one partition in order]
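Kafka only guarantees ordering within a partition, so records that must stay in order share a key, and the producer maps equal keys to the same partition. Kafka's default partitioner hashes the serialized key with murmur2; this sketch substitutes `String.hashCode` to stay dependency-free, so its partition numbers will not match a real cluster.

```scala
// Simplified key-based partitioner. Kafka's default partitioner uses a
// murmur2 hash of the key bytes; String.hashCode is a stand-in here.
object KeyPartitioner {
  def partitionFor(key: String, numPartitions: Int): Int = {
    require(numPartitions > 0)
    // Mask the sign bit so negative hash codes still give a valid index.
    (key.hashCode & 0x7fffffff) % numPartitions
  }

  // Group (key, value) records by partition, preserving send order.
  def route(records: Seq[(String, String)], numPartitions: Int): Map[Int, Seq[String]] =
    records
      .groupBy { case (k, _) => partitionFor(k, numPartitions) }
      .map { case (p, rs) => p -> rs.map(_._2) }
}
```

Every value sent with key `user-1` lands in one partition and keeps its send order there, whatever happens to other keys.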
High Availability
[Diagram: Replication. Broker 1 holds the leader of Partition 0 and a follower of Partition 1; Broker 2 holds the leader of Partition 1 and a follower of Partition 0; a Producer and a Consumer connect to the cluster]
Only Produce to / Consume from Leader
[Diagram: the Producer and Consumer talk only to each partition's leader (Partition 0 on Broker 1, Partition 1 on Broker 2)]
Follower Acts Like A Consumer
[Diagram: each follower replica fetches from its partition's leader on the other broker, just as a Consumer does]
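Because followers fetch from the leader, a follower's copy may trail the leader's. Kafka only exposes messages to consumers up to the high watermark, the offset that every in-sync replica (ISR) has reached, and a follower that falls too far behind is dropped from the ISR. A minimal sketch of that rule, assuming a hypothetical map from replica id to log end offset:

```scala
// Sketch of the high-watermark rule: a message is committed (visible to
// consumers) once every in-sync replica has copied it.
object HighWatermark {
  // logEndOffsets: replica id -> next offset that replica will write
  def compute(logEndOffsets: Map[Int, Long], isr: Set[Int]): Long = {
    require(isr.nonEmpty && isr.subsetOf(logEndOffsets.keySet))
    isr.iterator.map(logEndOffsets).min
  }
}
```

If the leader on broker 1 has written offsets 0 to 8 and the follower on broker 2 has copied 0 to 6, consumers see offsets up to 6 only; once broker 2 leaves the ISR, the leader alone decides.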
High Throughput
● Constant time suffices
● End-to-end batch compression
● Sequential access
● OS page cache
● Zero-copy
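End-to-end batch compression means the producer compresses a whole batch, the broker stores it compressed, and the consumer decompresses it. Compressing many similar messages together lets the codec exploit redundancy across them. The sketch below uses the JDK's GZIP (one of the codecs Kafka supports) on made-up sample messages to compare per-message and per-batch compression.

```scala
import java.io.ByteArrayOutputStream
import java.nio.charset.StandardCharsets.UTF_8
import java.util.zip.GZIPOutputStream

object BatchCompression {
  def gzip(bytes: Array[Byte]): Array[Byte] = {
    val out = new ByteArrayOutputStream()
    val gz = new GZIPOutputStream(out)
    try gz.write(bytes) finally gz.close() // close() writes the GZIP trailer
    out.toByteArray
  }

  // Total compressed size when each message is compressed on its own.
  def perMessageSize(messages: Seq[String]): Int =
    messages.map(m => gzip(m.getBytes(UTF_8)).length).sum

  // Compressed size when the whole batch is compressed together.
  def perBatchSize(messages: Seq[String]): Int =
    gzip(messages.mkString("\n").getBytes(UTF_8)).length
}
```

With a hundred near-identical JSON click events, the batch compresses far better than the messages do individually, since each small message pays the codec's fixed overhead and gains little from its own redundancy.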
Sequential Access
http://queue.acm.org/detail.cfm?id=1563874
https://www.slideshare.net/popcornylu/jcconf-apache-kafka
Zero-copy
https://www.ibm.com/developerworks/linux/library/j-zerocopy/
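On the JVM, zero-copy is exposed as `FileChannel.transferTo`, which can hand file bytes to the target channel inside the kernel (e.g. via `sendfile` on Linux) instead of copying them through user space. A minimal sketch, assuming temporary files stand in for a segment file and for the socket a broker would write to; the kernel-level shortcut only applies to suitable targets such as sockets.

```scala
import java.nio.channels.FileChannel
import java.nio.file.{Files, Path, StandardOpenOption}

object ZeroCopy {
  // Copy `count` bytes starting at `position` from a file to another channel
  // with transferTo, looping because one call may transfer fewer bytes.
  def send(source: Path, target: Path, position: Long, count: Long): Long = {
    val in = FileChannel.open(source, StandardOpenOption.READ)
    val out = FileChannel.open(target, StandardOpenOption.WRITE,
      StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING)
    try {
      var sent = 0L
      var progressing = true
      while (progressing && sent < count) {
        val n = in.transferTo(position + sent, count - sent, out)
        if (n > 0) sent += n else progressing = false // stop at end of file
      }
      sent
    } finally {
      in.close()
      out.close()
    }
  }
}
```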
Benchmark
Pulling & Offset
[Diagram: the Producer writes to Partition 0 (leader on Broker 1) and Partition 1 (leader on Broker 2); Consumer Group A (A1, A2) and Consumer Group B (B1) each pull at their own offsets]
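Because consumers pull, the broker does not track delivery per message; each consumer group only needs one offset per partition, and groups advance independently. A minimal in-memory sketch of that bookkeeping with hypothetical names; here a group with no committed offset starts at 0, whereas in Kafka this is governed by `auto.offset.reset`.

```scala
import scala.collection.mutable

// In-memory model of one partition: an append-only log plus one committed
// offset per consumer group.
class PartitionLog {
  private val log = mutable.ArrayBuffer.empty[String]
  private val committed = mutable.Map.empty[String, Long].withDefaultValue(0L)

  // Producer side: append and return the new message's offset.
  def append(msg: String): Long = { log += msg; (log.size - 1).toLong }

  // Consumer side: fetch up to `max` messages starting at the group's offset.
  def poll(group: String, max: Int): Seq[(Long, String)] = {
    val from = committed(group)
    (from until math.min(from + max, log.size.toLong)).map(o => o -> log(o.toInt))
  }

  // Record that the group has consumed everything before `nextOffset`.
  def commit(group: String, nextOffset: Long): Unit = committed(group) = nextOffset
}
```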
Delivery Semantics
● At least once
● At most once
● Exactly once
[Diagram: Producer → Kafka → Consumer]
Delivery Semantics (Produce)
● At least once
○ Retry if no ACK
● At most once
○ Never retry
● Exactly once
○ Idempotence (new in 0.11)
[Diagram: Producer → Kafka]
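With idempotence (turned on with the producer setting `enable.idempotence=true`), the broker assigns each producer an id and the producer numbers its messages per partition, so a retried send that had in fact succeeded is recognized and dropped instead of being written twice. A simplified broker-side sketch of that check, ignoring producer epochs, batching and out-of-order errors:

```scala
import scala.collection.mutable

// Simplified broker-side deduplication for one partition: remember the last
// sequence number seen from each producer id and drop retries.
class DedupLog {
  val messages = mutable.ArrayBuffer.empty[String]
  private val lastSeq = mutable.Map.empty[Long, Int]

  // Returns true if appended, false if recognized as a duplicate retry.
  def append(producerId: Long, seq: Int, msg: String): Boolean =
    if (lastSeq.get(producerId).exists(_ >= seq)) false
    else { messages += msg; lastSeq(producerId) = seq; true }
}
```

If the ACK for sequence 1 is lost, the producer retries it, and the broker keeps a single copy.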
Delivery Semantics (Consume)
● At least once
○ Process messages, then commit offsets
● At most once
○ Commit offsets, then process messages
● Exactly once
○ At least once & idempotence
[Diagram: Kafka → Consumer]
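The order of processing and committing decides what a crash between the two steps costs. The simulation below assumes a hypothetical consumer that crashes once, right after its first step for a given offset, and then restarts from its last committed offset.

```scala
import scala.collection.mutable

// Hypothetical simulation of one consumer reading `log` one message at a time.
object CommitOrder {
  // commitFirst = false: process, then commit (at least once).
  // commitFirst = true:  commit, then process (at most once).
  // Returns every message that was processed, in order.
  def run(log: Seq[String], crashAt: Int, commitFirst: Boolean): Seq[String] = {
    val processed = mutable.ArrayBuffer.empty[String]
    var committed = 0 // offset to resume from after a restart
    var crashed = false
    var offset = 0
    while (offset < log.size) {
      // First step.
      if (commitFirst) committed = offset + 1
      else processed += log(offset)
      if (!crashed && offset == crashAt) {
        crashed = true
        offset = committed // restart from the committed offset
      } else {
        // Second step.
        if (commitFirst) processed += log(offset)
        else committed = offset + 1
        offset += 1
      }
    }
    processed.toList
  }
}
```

Crashing at offset 1 of `a, b, c` yields `a, b, b, c` (duplicate) when processing comes first, and `a, c` (loss) when committing comes first.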
Producer
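import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord, RecordMetadata}

val props = new Properties()
props.put("bootstrap.servers", "localhost:9092")
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
val producer = new KafkaProducer[String, String](props)
// send() is asynchronous: the Future completes when the broker acknowledges the record
val f: java.util.concurrent.Future[RecordMetadata] =
  producer.send(new ProducerRecord[String, String]("my-topic", "key", "value"))
f.get() // block until acknowledged, making this send synchronous
producer.close()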
Consumer
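import java.util.{Arrays, Properties}
import org.apache.kafka.clients.consumer.KafkaConsumer
import scala.collection.JavaConverters._ // records.asScala (scala.jdk.CollectionConverters._ on Scala 2.13+)

val props = new Properties()
props.put("bootstrap.servers", "localhost:9092")
props.put("group.id", "group1") // consumer group
props.put("enable.auto.commit", "false") // offsets are committed manually below
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
val consumer = new KafkaConsumer[String, String](props)
consumer.subscribe(Arrays.asList("my-topic"))
while (true) {
  val records = consumer.poll(1000) // long-poll timeout in ms (poll(Duration) in clients 2.0+)
  for (record <- records.asScala)
    println((record.partition, record.offset, record.key, record.value))
  consumer.commitSync() // commit only after processing: at least once
}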
Describe Topic
$ bin/kafka-topics.sh --describe --zookeeper "$zookeeper" --topic "test"
Topic:test PartitionCount:8 ReplicationFactor:2 Configs:retention.ms=7200000
Topic: test Partition: 0 Leader: 1 Replicas: 1,2 Isr: 1,2
Topic: test Partition: 1 Leader: 2 Replicas: 2,3 Isr: 2,3
Topic: test Partition: 2 Leader: 3 Replicas: 3,4 Isr: 3,4
Topic: test Partition: 3 Leader: 4 Replicas: 4,1 Isr: 1,4
Topic: test Partition: 4 Leader: 1 Replicas: 1,3 Isr: 1,3
Topic: test Partition: 5 Leader: 2 Replicas: 2,4 Isr: 2,4
Topic: test Partition: 6 Leader: 3 Replicas: 3,1 Isr: 1,3
Topic: test Partition: 7 Leader: 4 Replicas: 4,2 Isr: 4,2
Consumer Offset Checker
$ bin/kafka-consumer-groups.sh --bootstrap-server "localhost:9092" \
    --describe --group "consumer1"
Group Topic Pid Offset logSize Lag
consumer1 test 0 60057 80539 20482
consumer1 test 1 47632 66548 18916
consumer1 test 2 11099 30020 18921
consumer1 test 3 45640 65214 19574
consumer1 test 4 60408 61840 1432
consumer1 test 5 76674 96495 19821
consumer1 test 6 63305 82647 19342
consumer1 test 7 06678 25373 18695
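The Lag column is simply the partition's log size (its latest offset) minus the group's committed offset. A small sketch that recomputes it from the rows above and sums it over the topic:

```scala
object ConsumerLag {
  // (partition, committed offset, log size) for group consumer1 on topic test
  val rows: Seq[(Int, Long, Long)] = Seq(
    (0, 60057L, 80539L), (1, 47632L, 66548L), (2, 11099L, 30020L),
    (3, 45640L, 65214L), (4, 60408L, 61840L), (5, 76674L, 96495L),
    (6, 63305L, 82647L), (7, 6678L, 25373L))

  def lag(offset: Long, logSize: Long): Long = logSize - offset

  def totalLag: Long = rows.map { case (_, off, size) => lag(off, size) }.sum
}
```

The total lag for consumer1 comes to 137183 messages; partition 4 (lag 1432) is the only one that is nearly caught up.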
https://github.com/sayuan/kafka-offset-exporter
Monitoring
Cleanup Policy
● delete
● compact: retain the latest value for each key
Compaction
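With the compact policy, Kafka retains at least the latest value for every key, so a compacted topic can rebuild current state. A minimal sketch of that retention rule over an in-memory log, ignoring segments, the segment currently being written, and deletion markers:

```scala
// Sketch of log compaction: keep only the last record for each key,
// preserving the original offsets and their relative order.
object Compaction {
  case class Record(offset: Long, key: String, value: String)

  def compact(log: Seq[Record]): Seq[Record] = {
    val latestOffset: Map[String, Long] =
      log.groupBy(_.key).map { case (k, rs) => k -> rs.map(_.offset).max }
    log.filter(r => latestOffset(r.key) == r.offset)
  }
}
```

Offsets are never renumbered: after compaction the surviving records simply leave gaps where older values for the same key used to be.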
Old / New Consumer API
● Old API
○ --zookeeper localhost:2181
○ Offsets stored in ZooKeeper
● New API
○ --bootstrap-server localhost:9092
○ Offsets stored in Kafka’s internal topic (__consumer_offsets)
About Partitions
● The number of partitions can only be increased, never decreased
● Configs (broker & topic)
○ retention.ms
○ retention.bytes
○ segment.bytes
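Retention is enforced per partition by deleting whole segments (whose size is capped by segment.bytes), oldest first. The sketch below applies both limits to a hypothetical list of closed segments; a value of -1 disables a limit, as in Kafka's configs, but the segment currently being written and other broker details are ignored.

```scala
object Retention {
  // Hypothetical description of one closed segment of a partition.
  case class Segment(baseOffset: Long, sizeBytes: Long, newestTimestampMs: Long)

  // Base offsets of the segments to delete, oldest first.
  def toDelete(segments: Seq[Segment], nowMs: Long,
               retentionMs: Long, retentionBytes: Long): Seq[Long] = {
    // A segment may go if all its messages are older than retentionMs, or if
    // the partition would still hold at least retentionBytes without it.
    def expired(s: Segment, totalBytes: Long): Boolean =
      (retentionMs >= 0 && nowMs - s.newestTimestampMs > retentionMs) ||
        (retentionBytes >= 0 && totalBytes - s.sizeBytes >= retentionBytes)

    // Only a prefix of the log (the oldest segments) is ever deleted.
    @annotation.tailrec
    def loop(rest: List[Segment], totalBytes: Long, acc: List[Long]): List[Long] =
      rest match {
        case s :: tail if expired(s, totalBytes) =>
          loop(tail, totalBytes - s.sizeBytes, s.baseOffset :: acc)
        case _ => acc.reverse
      }

    val oldestFirst = segments.sortBy(_.baseOffset).toList
    loop(oldestFirst, oldestFirst.map(_.sizeBytes).sum, Nil)
  }
}
```

Because only whole segments are removed, a partition can stay somewhat above retention.bytes until another segment fills up and becomes eligible.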
Adding Brokers
● Reassign partitions
○ Load balancing
○ Overhead
○ Throttle
● SiftScience/kafka-assigner
○ Minimized data movement
○ ekoontz/kafka-assigner
https://www.confluent.io/blog/hello-world-kafka-connect-kafka-streams/
Kafka Connect
● Problems
○ Schema management
○ Fault tolerance
○ Parallelism
○ Latency
○ Delivery semantics
○ Operations and monitoring
● A framework
● Connectors developed by Confluent
○ S3, Elasticsearch, HDFS, JDBC
● 19 certified connectors developed by vendors
● 67 other connectors listed on the official site
Kafka Streams
● Problems
○ Partitioning & scalability
○ Semantics & fault tolerance
○ State
○ Windowing & time
○ Re-processing
● A client library
https://www.confluent.io/blog/introducing-kafka-streams-stream-processing-made-simple/
https://www.confluent.io/blog/chain-services-exactly-guarantees/
Thanks