Apache Kafka is a distributed message broker designed to handle large volumes of real-time
data efficiently.
It is commonly used as a publish/subscribe (Pub/Sub) messaging system.
A Kafka cluster is highly scalable and fault tolerant.
It typically delivers much higher throughput than other message brokers such as ActiveMQ or
RabbitMQ.
Latency can be under 10 ms, which is close to real time.
It integrates with Spark, Flink, Storm, Hadoop and many other Big Data technologies.
Three key capabilities :
1. Publish and subscribe to streams of records, similar to a message queue or enterprise
messaging system.
2. Store streams of records in a fault-tolerant, durable way.
3. Process streams of records as they occur.
Topics :
A topic is a feed name to which records are published.
A topic can have zero, one, or many consumers that subscribe to the data written to it.
Partitions :
For each topic, the data stream is split into partitions.
Each partition is an ordered sequence of records.
Each record in a partition is assigned a sequential ID number, called the offset, that
uniquely identifies it within the partition.
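The relationship between a partition and its offsets can be sketched in a few lines of Python. This is only an in-memory illustration, not Kafka's storage format; the class name `Partition` and its methods are hypothetical names chosen for this example.

```python
class Partition:
    """Toy append-only log: a record's offset is its position in the log."""

    def __init__(self):
        self._records = []

    def append(self, value):
        # The next offset is always the current length, so offsets are
        # sequential and unique within this partition.
        offset = len(self._records)
        self._records.append(value)
        return offset

    def read_from(self, offset):
        # Return every record at or after the given offset, in order.
        if offset < 0:
            raise ValueError("offset must be non-negative")
        return list(self._records[offset:])


partition = Partition()
offsets = [partition.append(v) for v in ["a", "b", "c"]]
tail = partition.read_from(1)
print(offsets)  # [0, 1, 2]
print(tail)     # ['b', 'c']
```

A consumer that remembers the last offset it processed can resume from that point by calling `read_from` with the next offset.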
The Producer API allows an application to publish a stream of records to one or more Kafka topics.
The Consumer API allows an application to subscribe to one or more topics and process the stream
of records produced to them.
The Streams API allows an application to act as a stream processor, consuming an input stream
from one or more topics and producing an output stream to one or more output topics, effectively
transforming the input streams to output streams.
The Connector API allows building and running reusable producers or consumers that connect Kafka
topics to existing applications or data systems.
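The producer, consumer and stream-processor roles described above can be illustrated with a toy in-memory model. This sketch does not use the Kafka client library, and `ToyBroker`, `publish`, `poll` and `run_stream_processor` are hypothetical names invented for the example.

```python
from collections import defaultdict


class ToyBroker:
    """In-memory stand-in for a broker: one list of records per topic."""

    def __init__(self):
        self.topics = defaultdict(list)

    def publish(self, topic, record):
        # Producer role: append a record to a topic.
        self.topics[topic].append(record)

    def poll(self, topic, offset):
        # Consumer role: read a copy of all records from the given offset onward.
        return self.topics[topic][offset:]


def run_stream_processor(broker, source, sink, transform):
    # Stream-processor role: consume from one topic, transform each record,
    # and publish the result to an output topic.
    for record in broker.poll(source, 0):
        broker.publish(sink, transform(record))


broker = ToyBroker()
for word in ["kafka", "topic"]:
    broker.publish("words", word)

run_stream_processor(broker, "words", "upper-words", str.upper)
result = broker.poll("upper-words", 0)
print(result)  # ['KAFKA', 'TOPIC']
```

A real application would use a Kafka client for each role; the point here is only how the three roles relate to topics.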
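Order is guaranteed only within a partition, not across the partitions of a topic.
Once data is written to a partition, it can't be changed.
Without a key, the producer spreads records across partitions; when a key is provided,
all records with the same key go to the same partition (as long as the partition count
does not change).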
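How a key pins records to a partition can be sketched as follows. This is a simplified model under stated assumptions: CRC32 stands in for the hash function used by the real Kafka client, keyless records are distributed round-robin, and `ToyPartitioner` is a hypothetical name.

```python
import itertools
import zlib


class ToyPartitioner:
    """Choose a partition by key hash, or round-robin when there is no key."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self._round_robin = itertools.cycle(range(num_partitions))

    def partition_for(self, key=None):
        if key is None:
            # No key: spread records evenly across partitions.
            return next(self._round_robin)
        # Same key -> same hash -> same partition, so per-key order is kept.
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions


partitioner = ToyPartitioner(num_partitions=3)
keyless = [partitioner.partition_for() for _ in range(3)]
first = partitioner.partition_for("user-1")
second = partitioner.partition_for("user-1")
print(keyless)          # [0, 1, 2]
print(first == second)  # True
```

Because ordering is only guaranteed within a partition, choosing a key such as a user ID is what keeps all events for that user in order.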
Brokers :
A Kafka cluster is composed of multiple brokers (servers).
Each broker has its own unique ID.
[Diagram: the partitions of Topic 1 (0, 1, 2) and Topic 2 (0, 1) spread across Broker1, Broker2 and Broker3]
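Topic Replication Factor :
The replication factor is the number of copies of each partition kept on different brokers.
[Diagram: two copies each of Topic 1 Partition 0 and Topic 1 Partition 1 placed across Broker1, Broker2 and Broker3]
If Broker2 goes down, Broker1 and Broker3 can still serve the data.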
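The effect of a broker failure on replicated partitions can be sketched as below. The round-robin placement is an assumption made for illustration and is not Kafka's actual replica assignment algorithm; `assign_replicas` and `surviving_replicas` are hypothetical helper names.

```python
def assign_replicas(num_partitions, brokers, replication_factor):
    """Place each partition's replicas on consecutive brokers (round-robin)."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    return {
        p: [brokers[(p + r) % len(brokers)] for r in range(replication_factor)]
        for p in range(num_partitions)
    }


def surviving_replicas(assignment, down_brokers):
    """Return, per partition, the replicas hosted on brokers that are still up."""
    return {
        p: [b for b in replicas if b not in down_brokers]
        for p, replicas in assignment.items()
    }


brokers = ["Broker1", "Broker2", "Broker3"]
assignment = assign_replicas(num_partitions=2, brokers=brokers, replication_factor=2)
surviving = surviving_replicas(assignment, down_brokers={"Broker2"})
print(assignment)  # {0: ['Broker1', 'Broker2'], 1: ['Broker2', 'Broker3']}
print(surviving)   # {0: ['Broker1'], 1: ['Broker3']}
```

With a replication factor of 2, every partition still has one live copy after Broker2 fails, which is why the data remains available.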
ZooKeeper :
[Diagram: a Kafka cluster of Broker1, Broker2 and Broker3, hosting the partitions of Topic 1 and Topic 2, managed by ZooKeeper]
ZooKeeper keeps a list of the Kafka brokers.
ZooKeeper notifies Kafka of changes such as a new topic, a deleted topic, a broker going
down or a broker coming up.
In the ZooKeeper-based deployment described here, Kafka can't work without ZooKeeper
(newer Kafka releases can instead run in KRaft mode, which removes this dependency).
ZooKeeper usually operates as a quorum (cluster) with an odd number of servers (1, 3, 5, 7...).
[Diagram: a ZooKeeper ensemble of three servers, ZooKeeper1 (follower), ZooKeeper2 (leader) and ZooKeeper3 (follower), coordinating Kafka Broker1 through Broker5]
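The preference for an odd number of ZooKeeper servers follows from majority voting: the ensemble stays available only while more than half of its servers are up. A minimal sketch of that arithmetic, with hypothetical function names:

```python
def quorum_size(num_servers):
    """Smallest number of servers that forms a strict majority."""
    return num_servers // 2 + 1


def tolerated_failures(num_servers):
    """How many servers can fail while a majority is still reachable."""
    return num_servers - quorum_size(num_servers)


for n in range(1, 8):
    print(n, quorum_size(n), tolerated_failures(n))
```

Three servers tolerate one failure and so do four, so adding a server to an odd-sized ensemble increases cost without improving fault tolerance.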

Apache Kafka introduction
