APACHE KAFKA DEMYSTIFIED
Shanki Singh Gandhi
@shankisg
OVERVIEW
Apache Kafka is an open-source stream processing platform developed by
the Apache Software Foundation and written in Scala and Java. The project aims
to provide a unified, high-throughput, low-latency platform for handling
real-time data feeds.
KEY POINTS
• Kafka runs as a cluster on one or more servers.
• The Kafka cluster stores streams of records in categories called topics.
• Each record consists of a key, a value, and a timestamp.
CONCEPTS
• Producer: An application that sends messages.
• Consumer: An application that receives messages.
• Message: Information sent from a producer to a consumer through Apache Kafka.
• Connection: A TCP connection between your application and a Kafka broker.
• Topic: A category or feed name under which messages are published and stored.
• Topic partition: Each topic is divided into a number of partitions, which lets you split its data across
multiple brokers.
• Replica: A copy ("backup") of a partition kept on another broker. Follower replicas do not accept writes
from producers; they copy data from the partition's leader and can take over if the leader fails, which
prevents data loss.
• Consumer Group: A set of consumer processes that share a group name and together consume the topics
they subscribe to.
• Offset: A unique identifier of a record within a partition. A consumer's position in a partition is tracked
as the offset of the next record it will read.
• Node: A single computer in the Apache Kafka cluster.
• Cluster: A group of nodes, i.e., a group of computers.
KAFKA ARCHITECTURE
KAFKA TOPIC
• A topic is a category or feed name to which records are published.
• Topics in Kafka are always multi-subscriber; that is, a topic can have zero, one, or many consumers that
subscribe to the data written to it.
• Each partition is an ordered, immutable sequence of records that is continually appended to: a structured
commit log.
• The records in each partition are assigned a sequential ID number, called the offset, that uniquely
identifies each record within the partition.
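A minimal sketch of the partition-as-log idea, using an in-memory list rather than anything Kafka ships. The names Record and Partition are hypothetical and exist only for this illustration; the point is that appending assigns the next sequential offset and existing records are never modified.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Record:
    """One immutable record: key, value, and timestamp."""
    key: Optional[bytes]
    value: bytes
    timestamp: float


@dataclass
class Partition:
    """Hypothetical in-memory stand-in for a single topic partition."""
    _log: List[Record] = field(default_factory=list)

    def append(self, key: Optional[bytes], value: bytes) -> int:
        """Append a record to the end of the log and return its offset."""
        offset = len(self._log)  # offsets are sequential, starting at 0
        self._log.append(Record(key, value, time.time()))
        return offset

    def read(self, offset: int) -> Record:
        """Return the record stored at the given offset."""
        return self._log[offset]


partition = Partition()
first = partition.append(b"user-1", b"login")
second = partition.append(b"user-1", b"logout")
print(first, second, partition.read(second).value)
```

Because records are only ever appended, an offset handed out once keeps pointing at the same record for as long as that record is retained.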
PARTITION AND BROKER
KAFKA APIS
• Producer API
• Consumer API
• Streams API
• Connector API
PRODUCER
• Producers publish data to the topics of their choice.
• For each partition, producers write to that partition's single leader. Because leaders are spread across
brokers, production is load balanced: each write can be serviced by a separate broker and machine.
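One common way a producer can decide which partition (and therefore which leader) receives a record is to hash the record key, falling back to round-robin when there is no key. The sketch below illustrates that idea only: SketchPartitioner is a hypothetical name, and CRC32 is a stand-in hash chosen because it is in the standard library; real Kafka clients use their own hash function.

```python
import zlib
from itertools import count
from typing import Optional


class SketchPartitioner:
    """Hypothetical partition chooser, for illustration only."""

    def __init__(self, num_partitions: int) -> None:
        self.num_partitions = num_partitions
        self._counter = count()  # drives round-robin for keyless records

    def partition_for(self, key: Optional[bytes]) -> int:
        if key is None:
            # No key: spread records evenly over all partitions.
            return next(self._counter) % self.num_partitions
        # Same key always maps to the same partition (assumed CRC32 hash).
        return zlib.crc32(key) % self.num_partitions


partitioner = SketchPartitioner(num_partitions=3)
print(partitioner.partition_for(b"user-1"))
print([partitioner.partition_for(None) for _ in range(4)])
```

Keeping all records for one key in one partition is what preserves their relative order, since ordering is only defined within a partition.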
CONSUMERS AND CONSUMER GROUPS
• Consumers label themselves with a consumer group name, and each record published to a topic is
delivered to one consumer instance within each subscribing consumer group. Consumer instances
can be in separate processes or on separate machines.
• If all the consumer instances have the same consumer group, then the records will effectively be
load balanced over the consumer instances.
• If all the consumer instances have different consumer groups, then each record will be broadcast to
all the consumer processes.
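The delivery rule above can be simulated in a few lines. This is a sketch under stated assumptions, not Kafka's actual group coordination: the function names, the group and consumer names, and the simple round-robin assignment are all made up for illustration.

```python
from typing import Dict, List


def assign_partitions(consumers: List[str], num_partitions: int) -> Dict[str, List[int]]:
    """Spread partitions over the consumers of one group, round-robin."""
    assignment: Dict[str, List[int]] = {c: [] for c in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


def receivers(record_partition: int, groups: Dict[str, List[str]],
              num_partitions: int) -> Dict[str, str]:
    """For each group, return the one consumer that gets the record."""
    result: Dict[str, str] = {}
    for group, members in groups.items():
        for consumer, owned in assign_partitions(members, num_partitions).items():
            if record_partition in owned:
                result[group] = consumer
    return result


groups = {"billing": ["b1", "b2"], "audit": ["a1"]}
print(receivers(0, groups, num_partitions=4))
print(receivers(1, groups, num_partitions=4))
```

Within "billing" the two consumers split the partitions, so records are load balanced between them; "audit" has a single member, so it sees every record. Every group receives each record exactly once.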
START KAFKA SERVER
• Download Kafka 1.0.0 from here
• Extract the archive
tar -xzf kafka_2.11-1.0.0.tgz
cd kafka_2.11-1.0.0
• Start ZooKeeper
bin/zookeeper-server-start.sh config/zookeeper.properties
• Start the Kafka server
bin/kafka-server-start.sh config/server.properties
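After both processes are started, a quick way to confirm they are accepting connections is to try opening a TCP connection to each port. The sketch below assumes the default ports used elsewhere in this deck (2181 for ZooKeeper, 9092 for the broker) and a local install; port_open is a hypothetical helper, not part of Kafka.

```python
import socket


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    # Assumed default ports from the stock config files.
    for name, port in (("ZooKeeper", 2181), ("Kafka broker", 9092)):
        state = "listening" if port_open("localhost", port) else "not reachable"
        print(f"{name} on port {port}: {state}")
```

A successful connection only shows that something is listening on the port; it does not prove the broker is healthy.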
BASIC KAFKA CLI COMMANDS
Create topic
• bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
List topics
• bin/kafka-topics.sh --list --zookeeper localhost:2181
Start producer
• bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
Start consumer
• bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
Note: these flags match Kafka 1.0.0. From Kafka 3.0 on, kafka-topics.sh no longer accepts --zookeeper; use --bootstrap-server localhost:9092 instead.
CONNECTING TO KAFKA FROM PYTHON
• kafka-python
• Install kafka-python: pip install kafka-python
• GitHub: https://github.com/dpkp/kafka-python
• Documentation: https://kafka-python.readthedocs.io/en/master/index.html
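PRODUCER SAMPLE CODE
import json
from kafka import KafkaProducer

kafka_url = "localhost:9092"  # assumed address of the local broker

# Send JSON data to a Kafka topic; the serializer must return bytes
producer = KafkaProducer(bootstrap_servers=[kafka_url],
                         value_serializer=lambda v: json.dumps(v).encode("utf-8"))
data = {"key": "value"}
producer.send("my-topic", data)
producer.flush()  # wait until buffered records have been sent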
CONSUMER SAMPLE CODE
from kafka import KafkaConsumer

kafka_url = "localhost:9092"  # assumed address of the local broker

# Connect to Kafka and subscribe to a topic
consumer = KafkaConsumer("my-topic", group_id="my-group", bootstrap_servers=[kafka_url])
# Start consuming data; the loop blocks while waiting for new records
for msg in consumer:
    print(msg)
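Kafka itself only moves bytes, so JSON values have to be encoded on the way in and decoded on the way out. The sketch below shows that round trip without a broker; the function names serialize and deserialize and the sample payload are made up for illustration. Functions like these could be passed as value_serializer to KafkaProducer and value_deserializer to KafkaConsumer.

```python
import json
from typing import Any


def serialize(value: Any) -> bytes:
    """Encode a JSON-compatible value as UTF-8 bytes for sending."""
    return json.dumps(value).encode("utf-8")


def deserialize(raw: bytes) -> Any:
    """Decode UTF-8 JSON bytes back into a Python value."""
    return json.loads(raw.decode("utf-8"))


payload = {"user": "alice", "action": "login"}  # hypothetical sample data
wire = serialize(payload)
print(type(wire).__name__, deserialize(wire) == payload)
```

Without a deserializer, the consumer's msg.value stays a raw bytes object, which is why the two sides should agree on the encoding.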
IMPORTANT LINKS
• https://kafka.apache.org/intro
• https://kafka.apache.org/quickstart
• https://www.cloudkarafka.com/blog/2016-11-30-part1-kafka-for-beginners-what-is-apache-kafka.html
• http://blog.cloudera.com/blog/2014/09/apache-kafka-for-beginners/
• https://kafka-python.readthedocs.io/en/master/usage.html
DEMO
Q & A
THANKS
