Himani Arora
Software Consultant
Knoldus Software LLP
Satendra Kumar
Sr. Software Consultant
Knoldus Software LLP
Introduction to Apache Kafka-01
Topics Covered
➢ What is Kafka
➢ Why Kafka
➢ High level overview
➢ Use cases
➢ Key terminology
➢ Partitions distribution over brokers
➢ Replication protocol
➢ Demo
What is Kafka
➢ publish-subscribe messaging system
➢ fast
➢ distributed by design
➢ fault tolerant
➢ scalable
➢ durable
➢ written in Scala
➢ free and open source
Building Data Pipelines
This is bad data pipelining
Kafka decouples data pipelines
High level overview
Use cases
➢ Messaging
➢ Website Activity Tracking
➢ Metrics
➢ Log Aggregation
➢ Real-Time Stream Processing
➢ Event Sourcing
➢ Commit Log
➢ Internet Of Things (IoT)
Key Terminology
Anatomy of a Topic
For each topic, the Kafka cluster maintains a partitioned log that looks like this:
http://kafka.apache.org/images/log_anatomy.png
The number of partitions for a topic is configurable. In this example, the topic has three partitions.
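The partitioned log can be pictured with a small in-memory model. This is only a sketch for illustration, not how Kafka stores data: the class names Partition and Topic and the topic name "page-views" are made up here, and a Python list stands in for the on-disk log.

```python
class Partition:
    """An append-only sequence of messages; the list index doubles as the offset."""

    def __init__(self):
        self.messages = []

    def append(self, message):
        # New messages only ever go to the end of the log.
        self.messages.append(message)
        # The offset is the position the message now occupies in this partition.
        return len(self.messages) - 1

    def read(self, offset):
        # Reading does not remove anything, so any offset can be read again.
        return self.messages[offset]


class Topic:
    """A named group of independent partitions (hypothetical model)."""

    def __init__(self, name, num_partitions=3):
        self.name = name
        self.partitions = [Partition() for _ in range(num_partitions)]


topic = Topic("page-views", num_partitions=3)
first = topic.partitions[0].append("view-a")
second = topic.partitions[0].append("view-b")
other = topic.partitions[1].append("view-c")
```

Offsets are counted per partition, so the first message in partition 1 gets offset 0 even though partition 0 already holds two messages.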
Reading & Writing From Topic
https://content.linkedin.com/content/dam/engineering/en-us/blog/migrated/partitioned_log_0.png
Topic with two partitions:
Partitions distribution
Who is responsible for these tasks?
Responsibility Of Controller
● managing the states of partitions and replicas
● performing administrative tasks like reassigning partitions
Roles For Partition
➢ Each partition has one server which acts as the "leader" and zero or more servers which act as
"followers".
➢ The leader handles all read and write requests for the partition while the followers passively replicate
the leader.
➢ If the leader fails, one of the followers will automatically become the new leader.
➢ Each server acts as a leader for some of its partitions and a follower for others so load is well
balanced within the cluster.
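The leader and follower roles above can be sketched in a few lines. This is a simplified model under stated assumptions, not Kafka's actual algorithm: the function names assign_replicas and elect_leaders are hypothetical, placement is plain round-robin, and the first live replica becomes leader (a real cluster elects from the in-sync replicas and the controller drives the change).

```python
def assign_replicas(num_partitions, brokers, replication_factor):
    """Round-robin placement: replica r of partition p goes to broker (p + r) mod n."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    return {
        p: [brokers[(p + r) % len(brokers)] for r in range(replication_factor)]
        for p in range(num_partitions)
    }


def elect_leaders(assignment, live_brokers):
    """Pick the first live replica of each partition as leader (None if all are down)."""
    leaders = {}
    for partition, replicas in assignment.items():
        leaders[partition] = next((b for b in replicas if b in live_brokers), None)
    return leaders


brokers = [1, 2, 3]
assignment = assign_replicas(num_partitions=3, brokers=brokers, replication_factor=2)
before = elect_leaders(assignment, live_brokers={1, 2, 3})
after = elect_leaders(assignment, live_brokers={2, 3})  # broker 1 has failed
```

With all brokers alive, each broker leads one partition. After broker 1 fails, its partition is taken over by the follower on broker 2, which then leads two partitions.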
Replication Protocol
Demo
Basic Operations
● List all topics created:
bin/kafka-topics.sh --list --zookeeper localhost:2181
● Describe a topic:
bin/kafka-topics.sh --zookeeper localhost:2181 --topic topic-name --describe
Adding a topic:
$ bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 1 --topic topic_name
Modifying a topic:
$ bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic my_topic_name --partitions 4
Deleting a topic:
$ bin/kafka-topics.sh --zookeeper localhost:2181 --delete --topic my_topic_name
Balancing Leadership:
$ bin/kafka-preferred-replica-election.sh --zookeeper localhost:2181
Or configure Kafka to do this automatically by setting the following broker configuration:
auto.leader.rebalance.enable = true
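What "balancing leadership" means can be shown with a toy model. The sketch below assumes that the preferred replica of a partition is the first broker in its replica list; the function name, broker ids, and assignment are made up for illustration, and the real tool only moves leadership to a preferred replica that is in sync.

```python
from collections import Counter


def preferred_leader_election(assignment, current_leaders, live_brokers):
    """Hand leadership back to the first listed replica wherever that broker is alive."""
    new_leaders = dict(current_leaders)
    for partition, replicas in assignment.items():
        preferred = replicas[0]
        if preferred in live_brokers:
            new_leaders[partition] = preferred
    return new_leaders


# Replica lists per partition; the first entry is the preferred replica.
assignment = {0: [1, 2], 1: [2, 3], 2: [3, 1]}

# Broker 1 failed earlier, so broker 2 took over partition 0. Broker 1 is back now.
skewed = {0: 2, 1: 2, 2: 3}
balanced = preferred_leader_election(assignment, skewed, live_brokers={1, 2, 3})

load_before = Counter(skewed.values())
load_after = Counter(balanced.values())
```

Before the election broker 2 leads two partitions and broker 1 leads none; afterwards each broker leads exactly one.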
Question & Option[Answer]
Thanks
Presenters:
@_himaniarora
@_satendrakumar
Organizer:
@knolspeak
http://www.knoldus.com


Editor's Notes

  • #10: 1) 10 to 20% of time is spent on data integration. 2) It is not scalable. 3) A push-based system does not work.
  • #18: Topics are a high-level abstraction that Kafka provides. A topic is a category or feed name to which messages are published.
  • #19: The topics are further divided into partitions.
  • #20: Each partition is an ordered, immutable sequence of messages that is continually appended to—a commit log. The messages in the partitions are each assigned a sequential id number called the offset that uniquely identifies each message within the partition.
  • #22: Producers publish data to the topics of their choice. The producer is responsible for choosing which message to assign to which partition within the topic. This can be done in a round-robin fashion simply to balance load or it can be done according to some semantic partition function (say based on some key in the message). More on the use of partitioning in a second.
  • #24: 1) The key abstraction in Kafka is the topic. 2) Producers publish their records to a topic, and consumers subscribe to one or more topics. 3) A Kafka topic is just a sharded write-ahead log. 4) Producers append records to these logs and consumers subscribe to changes. 5) Each record is a key/value pair. The key is used for assigning the record to a log partition (unless the publisher specifies the partition directly).
  • #26: Each node in the cluster is called a Kafka broker.
  • #27: Each partition is an ordered, immutable sequence of messages that is continually appended to—a commit log. The messages in the partitions are each assigned a sequential id number called the offset that uniquely identifies each message within the partition.
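The two partitioning strategies mentioned in note #22 (round-robin and key-based) can be sketched as follows. The class name SketchPartitioner is hypothetical, and CRC32 is used only as a stand-in hash for illustration; it is not the hash function Kafka's own producer uses.

```python
import itertools
import zlib


class SketchPartitioner:
    """Chooses a partition for each message (illustrative only)."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        # Endless 0, 1, ..., n-1, 0, 1, ... sequence for messages without a key.
        self._round_robin = itertools.cycle(range(num_partitions))

    def partition_for(self, key=None):
        if key is None:
            # No key: spread messages evenly to balance load.
            return next(self._round_robin)
        # With a key: hash it so the same key always lands in the same partition.
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions


partitioner = SketchPartitioner(num_partitions=3)
keyless = [partitioner.partition_for() for _ in range(4)]
same_key = [partitioner.partition_for("user-42") for _ in range(3)]
```

Because all messages with one key go to one partition, and each partition is ordered, messages for that key are read back in the order they were written.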