Distributed messaging with
Apache Kafka
Saumitra Srivastav
@_saumitra_
http://www.meetup.com/Bangalore-Apache-Kafka-Group/
Introduction
Kafka is a:
• distributed
• replicated
• persistent
• partitioned
• high throughput
• pub-sub
messaging system.
Originally developed at LinkedIn. Written in Scala and Java.
Demo Application
Twitter stream analytics
[Diagram: the Twitter Streaming API feeds a Stream Producer, which publishes to a Kafka Cluster (Broker-1, Broker-2, Broker-3). Consumers of the cluster: Solr-1 and Solr-2 (realtime search), Cassandra-1 and Cassandra-2 (data store for longer retention), and Sentiment Analysis.]
Terminology
Topics: categories under which feeds of messages are maintained.
Producers: processes that publish messages to a Kafka topic.
Consumers: processes that subscribe to topics and process the feed of published messages.
Brokers: servers that form a Kafka cluster and act as the data transport channel between producers and consumers.
[Diagram: two producers and two consumers connected to a Kafka Cluster of three brokers.]
Simplified View of a Kafka System
[Diagram: Zookeeper; Broker 1, Broker 2, Broker 3; Producer 1, Producer 2; Consumer 1, Consumer 2, Consumer 3.]
Topics and Partitions
[Diagram: Topic 1 (error log) and Topic 2 (security log).]
Partitions
• Each partition is an ordered, immutable sequence of
messages.
• Messages are continuously appended to it.
• Each message in a partition is assigned a unique
sequential id number called the offset.
• Any message in a partition can be accessed using this
offset.
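The partition properties above can be sketched with a toy in-memory log. This is only an illustration of append-only storage with sequential offsets; the class name `PartitionLog` and its methods are hypothetical and are not part of any Kafka API.

```python
class PartitionLog:
    """Toy stand-in for one partition: an append-only list of byte messages."""

    def __init__(self):
        self._messages = []

    def append(self, payload: bytes) -> int:
        """Append a message at the end and return the offset assigned to it."""
        self._messages.append(payload)
        return len(self._messages) - 1

    def read(self, offset: int) -> bytes:
        """Return the message stored at the given offset."""
        if not 0 <= offset < len(self._messages):
            raise IndexError(f"offset {offset} is not in the log")
        return self._messages[offset]


log = PartitionLog()
first = log.append(b"disk full")
second = log.append(b"disk ok")
print(first, second, log.read(first))  # offsets grow by one per append
```

Existing entries are never rewritten here, which is what makes an offset a stable address for a message.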
Partitions
• Partitions serve 2 purposes:
1. Scaling
2. Parallelism
• Scaling
A topic can be divided into multiple partitions, and
each partition can be on a different server.
• Parallelism
A consumer can consume from multiple partitions at the
same time (ordering is guaranteed within each partition).
Distribution & Replication
• The partitions of the log are distributed over the servers in the Kafka cluster.
• Each server handles data and requests for some number of
partitions.
• Each partition is replicated for fault tolerance.
• Each partition has one server which acts as the leader.
• The leader handles all read and write requests for the
partition.
• Followers keep replicating the leader.
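As a rough sketch of the leader and follower idea, the snippet below places replicas on brokers and picks a leader per partition. The placement rule (consecutive brokers) and the election rule (first live replica) are simplifying assumptions for illustration. They are not Kafka's actual algorithms, which also track which replicas are in sync. The function names `assign_replicas` and `leader_for` are hypothetical.

```python
def assign_replicas(num_partitions, brokers, replication_factor):
    """Place each partition's replicas on consecutive brokers (toy scheme)."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    return {
        p: [brokers[(p + i) % len(brokers)] for i in range(replication_factor)]
        for p in range(num_partitions)
    }


def leader_for(replicas, live_brokers):
    """Pick the first replica that is still alive, or None if all are down."""
    for broker in replicas:
        if broker in live_brokers:
            return broker
    return None


brokers = ["broker-1", "broker-2", "broker-3"]
assignment = assign_replicas(num_partitions=4, brokers=brokers, replication_factor=2)

# Leaders while every broker is up, and again after broker-1 fails.
leaders = {p: leader_for(r, set(brokers)) for p, r in assignment.items()}
after_failure = {p: leader_for(r, {"broker-2", "broker-3"}) for p, r in assignment.items()}
print(leaders)
print(after_failure)
```

With a replication factor of 2, every partition in this toy model still has a leader after one broker is lost.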
Producers
• Producers publish data to the topics of their choice.
• A producer can choose which partition of the topic each
message is assigned to.
• Partitions can be selected in a round-robin manner for
load balancing.
• Kafka doesn’t care about the serialization format. All it
needs is a byte array.
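A minimal sketch of the two producer-side ideas above: choosing partitions round-robin and turning a message into bytes before it is sent. JSON is just one possible serialization format, and `RoundRobinPartitioner` and `serialize` are hypothetical names, not Kafka client classes.

```python
import itertools
import json


class RoundRobinPartitioner:
    """Cycles through partition numbers 0..num_partitions-1."""

    def __init__(self, num_partitions):
        self._cycle = itertools.cycle(range(num_partitions))

    def next_partition(self):
        return next(self._cycle)


def serialize(event: dict) -> bytes:
    """Encode an event as UTF-8 JSON; the broker would only ever see bytes."""
    return json.dumps(event, sort_keys=True).encode("utf-8")


partitioner = RoundRobinPartitioner(num_partitions=3)
partitions = {p: [] for p in range(3)}  # stands in for a 3-partition topic

for i in range(5):
    payload = serialize({"id": i})
    partitions[partitioner.next_partition()].append(payload)

print({p: len(msgs) for p, msgs in partitions.items()})  # load spread evenly
```

A producer that needs all messages for one key to stay in order would instead map the key to a fixed partition, since ordering only holds within a partition.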
Consumers
• Other messaging systems basically follow 2 models:
• Queuing
• Publish-Subscribe
• Kafka uses the concept of a consumer group, which generalizes
both of these models.
• Consumers label themselves with a consumer group name.
• Each message published to a topic is delivered to one
consumer instance within each subscribing consumer group.
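The delivery rule above can be sketched as follows. The group names, consumer names, and the round-robin ownership scheme are all assumptions made for illustration; real clients negotiate partition ownership through the cluster.

```python
def assign_partitions(partitions, consumers):
    """Spread partitions over the consumers of one group, round-robin (toy scheme)."""
    return {p: consumers[i % len(consumers)] for i, p in enumerate(partitions)}


# Hypothetical setup: one topic with 3 partitions and two subscribing groups.
groups = {"group-a": ["a1", "a2"], "group-b": ["b1"]}
partitions = [0, 1, 2]
owners = {g: assign_partitions(partitions, members) for g, members in groups.items()}


def receivers_of(partition):
    """Return {group: consumer} for a message written to the given partition."""
    return {g: owners[g][partition] for g in groups}


print(receivers_of(1))  # exactly one consumer per group receives the message
```

If all consumers share one group, the topic behaves like a queue (each message goes to one of them). If every consumer has its own group, it behaves like publish-subscribe (each message goes to all of them).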
Consumer Groups
[Diagram: Zookeeper; Broker 1, Broker 2, Broker 3; Producer 1, Producer 2; Consumer 1, Consumer 2, and Consumer 3 organized into Consumer-Group A and Consumer-Group B.]
Consumer groups
[Diagram: Zookeeper; Topic-1 spread across Broker 1, Broker 2, and Broker 3; Producer 1 and Producer 2; partitions P0, P3, P5, P2, P4 read by Consumer 1, Consumer 2, and Consumer 3 in Consumer-Group A and Consumer-Group B.]
Message Persistence
• Unlike in other messaging systems, messages are not
deleted on consumption.
• Messages are retained for a configurable period of
time, after which they are deleted (even if they have
NOT been consumed).
• Consumers can re-consume any chunk of older
messages using the message offset.
• Kafka's performance is effectively constant with respect
to data size, so retaining a large amount of data is not an issue.
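The retention behavior above can be sketched with a toy log that keeps a timestamp per message. This is a simplified model under stated assumptions: time is passed in explicitly as `now`, expiry is per message, and the class `RetainedLog` is hypothetical. A real broker manages retention on log files rather than on individual messages.

```python
class RetainedLog:
    """Toy log with time-based retention; reading never deletes anything."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self._entries = []  # (offset, timestamp, payload)
        self._next_offset = 0

    def append(self, payload, now):
        offset = self._next_offset
        self._entries.append((offset, now, payload))
        self._next_offset += 1
        return offset

    def expire(self, now):
        """Drop entries older than the retention period, consumed or not."""
        cutoff = now - self.retention_seconds
        self._entries = [e for e in self._entries if e[1] >= cutoff]

    def read_from(self, offset):
        """Return retained payloads whose offset is at or after the given one."""
        return [payload for off, _, payload in self._entries if off >= offset]


log = RetainedLog(retention_seconds=60)
log.append(b"m0", now=0)
log.append(b"m1", now=30)
log.append(b"m2", now=50)

first_pass = log.read_from(0)    # a consumer reads everything
replay = log.read_from(1)        # and can rewind to an older offset later
log.expire(now=70)               # m0 is now older than 60 seconds
after_expiry = log.read_from(0)
print(first_pass, replay, after_expiry)
```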
Demo
Running a multi-broker Kafka cluster
Guarantees
1. Ordering guarantee
• Messages sent by a producer to a particular topic partition will be
appended in the order they are sent.
• A consumer instance sees messages in the order they are stored in the
log.
2. At-least-once delivery
3. Fault tolerance
For a topic with replication factor N, up to N-1 server failures will not cause
any loss of committed messages.
4. No corruption of data:
• over the network
• on the disk
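To illustrate what at-least-once delivery means for a consumer, here is a small simulation, assuming the consumer processes a message first and records its position afterwards. The function `run_consumer` and its crash switch are hypothetical and only model the ordering of the two steps.

```python
messages = ["m0", "m1", "m2", "m3"]  # one partition, in log order


def run_consumer(committed_offset, processed, crash_before_commit_at=None):
    """Process from the committed offset onward; return the new committed offset."""
    offset = committed_offset
    while offset < len(messages):
        processed.append(messages[offset])   # step 1: process the message
        if offset == crash_before_commit_at:
            return committed_offset          # crash: the position update is lost
        committed_offset = offset + 1        # step 2: record the new position
        offset += 1
    return committed_offset


processed = []
committed = run_consumer(0, processed, crash_before_commit_at=2)
committed = run_consumer(committed, processed)  # restart from the last commit
print(processed)  # m2 appears twice, nothing is missing
```

Because the position is saved only after processing, a crash can cause a message to be handled again, but never skipped. Consumers that cannot tolerate duplicates need idempotent processing.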
Demo
Consumer/Producer Java API
Misc Design features
1. Stateless broker
• Each consumer maintains its own state (offset)
2. Load balancing
3. Asynchronous send
4. Push/pull model (producers push, consumers pull) instead of push/push
5. Consumer Position
6. Offline Data Load
7. Simple API
8. Low Overhead
9. Batch send and receive
10. No message caching in JVM
11. Rely on file system buffering
• mostly sequential access patterns
12. Zero-copy transfer: file->socket
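One of the features listed above, batch send, can be sketched as a small buffer that groups messages before handing them to the network. `BatchingSender` is a hypothetical helper for illustration, and a plain list stands in for the network. Real clients also flush on a timer, which is omitted here.

```python
class BatchingSender:
    """Buffers messages and hands them to `send` in lists of up to batch_size."""

    def __init__(self, batch_size, send):
        self.batch_size = batch_size
        self._send = send
        self._buffer = []

    def add(self, message):
        """Queue a message; send the batch once it is full."""
        self._buffer.append(message)
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        """Send whatever is buffered, if anything."""
        if self._buffer:
            self._send(self._buffer)
            self._buffer = []


sent_batches = []  # stands in for the network: one entry per request
sender = BatchingSender(batch_size=3, send=sent_batches.append)
for i in range(7):
    sender.add(f"msg-{i}")
sender.flush()  # push out the final partial batch

print([len(batch) for batch in sent_batches])  # 7 messages travel in 3 requests
```

Fewer, larger requests reduce per-message overhead at the cost of a small delay before a message leaves the producer.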
Use Cases
1. Messaging
2. Website Activity Tracking
3. Metrics
4. Log Aggregation
5. Stream Processing
Thanks
Website: http://kafka.apache.org/
Doc: http://kafka.apache.org/documentation.html
Mailing Lists: users@kafka.apache.org
Questions?