www.edureka.co/apache-Kafka
How Apache Kafka is transforming
Hadoop, Spark & Storm
What will you learn today?
• Million Dollar Question! Why do we need Kafka?
• What is Kafka?
• Kafka Architecture
• Kafka with Hadoop
• Kafka with Spark
• Kafka with Storm
• Companies using Kafka
• Demo on Kafka Messaging Service…
Million Dollar Question!
Why do we need Kafka?
Why Kafka Cluster?
Why is Kafka preferred over more traditional brokers such as JMS and AMQP?
Kafka Producer Performance Compared with Other Systems
Kafka Consumer Performance Compared with Other Systems
Salient Features of Kafka

Feature           | Description
------------------|--------------------------------------------------------------------------
High Throughput   | Handles millions of messages with modest hardware
Scalability       | Highly scalable distributed system that can be expanded with no downtime
Replication       | Messages can be replicated across the cluster, so data survives broker failures; multiple subscribers are supported, and consumers are rebalanced in case of failure
Durability        | Messages are persisted to disk, so they can also be used for batch consumption
Stream Processing | Kafka can be used with real-time stream-processing frameworks such as Spark and Storm
Data Loss         | With the proper configuration, Kafka can ensure zero loss of acknowledged messages
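"The proper configuration" usually comes down to a handful of producer and broker/topic settings. The sketch below is a hypothetical checker (the function and the baseline are illustrations, not part of Kafka): the setting names `acks`, `enable.idempotence`, `unclean.leader.election.enable` and `min.insync.replicas` are real Kafka settings, but the recommended values are a common "avoid data loss" baseline that we assume here, not an official requirement. Producer and topic settings are merged into one dictionary only for brevity.

```python
from typing import Dict, List, Optional

# Real Kafka setting names; the values are an assumed, commonly recommended
# "avoid data loss" baseline rather than an official requirement.
DURABILITY_BASELINE: Dict[str, str] = {
    "acks": "all",                              # producer: wait for all in-sync replicas
    "enable.idempotence": "true",               # producer: no duplicates when retrying
    "unclean.leader.election.enable": "false",  # broker/topic: never promote an out-of-sync replica
}


def durability_warnings(config: Dict[str, str],
                        replication_factor: int,
                        min_insync_replicas: int) -> List[str]:
    """Hypothetical checker: list settings that weaken durability."""
    warnings: List[str] = []
    for key, expected in DURABILITY_BASELINE.items():
        actual: Optional[str] = config.get(key)
        if actual != expected:
            # Missing keys are flagged too, because defaults differ across Kafka versions.
            warnings.append(f"{key} is {actual!r}, expected {expected!r}")
    if min_insync_replicas < 2:
        warnings.append("min.insync.replicas < 2: an acknowledged write may exist on only one broker")
    if min_insync_replicas >= replication_factor:
        warnings.append("min.insync.replicas >= replication factor: losing one broker blocks acks=all writes")
    return warnings


if __name__ == "__main__":
    settings = {"acks": "all", "enable.idempotence": "true",
                "unclean.leader.election.enable": "false"}
    print(durability_warnings(settings, replication_factor=3, min_insync_replicas=2))  # prints []
```

A replication factor of 3 with `min.insync.replicas` of 2 is the usual trade-off: every acknowledged write lives on at least two brokers, yet one broker can fail without blocking producers.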
Kafka Advantages
• Kafka can easily handle hundreds of thousands of messages per second
• The cluster can be expanded with no downtime, making Kafka highly scalable
• Messages are replicated, which provides reliability and durability
• Fault tolerant and scalable
What is Kafka?
• A distributed publish-subscribe messaging system
• Originally developed at LinkedIn
• Provides a solution for handling all activity-stream data
• Fully supported on the Hadoop platform
• Partitions real-time consumption across a cluster of machines
• Provides a mechanism for loading data into Hadoop in parallel
Apache Kafka – Overview
[Diagram: frontends, an external tracking proxy and background services (producers) publish to Kafka; background services (consumers), Hadoop and the data warehouse (DWH) consume from it.]
Kafka Architecture
[Diagram: producers (front end, services, proxies, adapters, other producers) publish to a cluster of Kafka brokers coordinated by ZooKeeper; consumers (real-time, NoSQL, Hadoop, warehouses) read from the brokers.]
Kafka Core Components
The table below lists the core concepts of Kafka.

Component | Description
----------|---------------------------------------------------------------------
Topic     | A category or feed to which messages are published
Producer  | Publishes messages to a Kafka topic
Consumer  | Subscribes to a Kafka topic and consumes its messages
Broker    | A Kafka server; handles hundreds of megabytes of reads and writes per second
Kafka Topic
• A user-defined category to which messages are published
• For each topic, Kafka maintains a partitioned log
• Each partition is an ordered, immutable sequence of messages in which each message is assigned a sequential ID number called the offset
• Writes to a partition are sequential appends, which reduces the number of hard-disk seeks
• Messages can be read from any offset in a partition, not only from the end
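To make "ordered, immutable sequence" and "offset" concrete, here is a toy, in-memory model of a single partition. `PartitionLog` is a hypothetical class for illustration only: it ignores segments, retention and replication, and its offsets simply start at 0.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PartitionLog:
    """Toy model of one Kafka partition: an append-only list of messages."""
    messages: List[bytes] = field(default_factory=list)

    def append(self, message: bytes) -> int:
        """Append at the end (the only way to write) and return the new offset."""
        self.messages.append(message)
        return len(self.messages) - 1

    def read(self, offset: int, max_messages: int = 10) -> List[Tuple[int, bytes]]:
        """Read up to max_messages starting at any offset, not just the newest one."""
        if offset < 0:
            raise ValueError("offsets are non-negative")
        end = min(offset + max_messages, len(self.messages))
        return [(o, self.messages[o]) for o in range(offset, end)]
```

There is deliberately no update or delete method: existing entries never change, so an offset always identifies the same message, and a consumer can resume from any offset it has saved.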
Kafka Producers
• Applications publish messages to topics in the Kafka cluster
• Producers can be any kind of application, such as front ends or streaming services
• A key can be attached to each message when it is written; messages with the same key arrive in the same partition
• Can be configured not to wait for an acknowledgement from the Kafka cluster (acks=0); waiting for acknowledgements (acks=1 or acks=all) trades some throughput for durability
• Publishes messages as fast as the brokers in the cluster can handle them
[Diagram: several producers publishing to the Kafka cluster]
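The rule "same key, same partition" follows from choosing the partition as a hash of the key modulo the number of partitions. The sketch below is a simplified, hypothetical partitioner: Kafka's Java client hashes keys with murmur2, whereas this toy uses `zlib.crc32` as a stand-in, and it round-robins unkeyed messages, whereas recent Java clients use a "sticky" strategy instead.

```python
import itertools
import zlib
from typing import Optional


class ToyPartitioner:
    """Hypothetical partitioner: hash(key) mod N for keyed messages, round-robin otherwise."""

    def __init__(self, num_partitions: int) -> None:
        if num_partitions < 1:
            raise ValueError("need at least one partition")
        self.num_partitions = num_partitions
        self._next = itertools.count()  # round-robin counter for unkeyed messages

    def partition(self, key: Optional[bytes]) -> int:
        if key is None:
            # No key: spread messages evenly across partitions.
            return next(self._next) % self.num_partitions
        # crc32 stands in for the murmur2 hash used by Kafka's Java client.
        return zlib.crc32(key) % self.num_partitions
```

Because the mapping depends on `num_partitions`, adding partitions to an existing topic can send a key to a different partition than before, which breaks per-key ordering across that change.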
Kafka Consumers
• Applications subscribe to topics and consume messages from the brokers in the Kafka cluster
• Consumers can be any kind of application, such as real-time consumers or NoSQL consumers
• A consumer group can be configured with multiple consumers that consume a topic together
• Each consumer in a group reads messages from a unique subset of the partitions of each topic it subscribes to
• Messages with the same key arrive at the same consumer, because they are stored in the same partition
• Supports both queuing and publish-subscribe
• Consumers are responsible for tracking how many messages they have consumed (their offset in each partition)
[Diagram: the Kafka cluster serving several consumers]
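The consumer-group rules above can be sketched as a partition assignment. The functions below are hypothetical and simplified (Kafka ships several assignment strategies, including a round-robin one, and performs rebalancing through the group coordinator); they show why one group behaves like a queue, while several groups behave like publish-subscribe.

```python
from typing import Dict, List


def assign_round_robin(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Give each partition to exactly one consumer of the group (simplified round-robin)."""
    if not consumers:
        raise ValueError("a group needs at least one consumer")
    assignment: Dict[str, List[int]] = {c: [] for c in consumers}
    for i, p in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment


def deliveries(partitions: List[int], groups: Dict[str, List[str]]) -> Dict[str, Dict[str, List[int]]]:
    """Each group independently receives every partition (publish-subscribe across groups);
    inside a group, each partition goes to one member (queue semantics)."""
    return {group: assign_round_robin(partitions, members) for group, members in groups.items()}
```

If a group has more consumers than the topic has partitions, the extra consumers receive nothing, so the partition count caps the parallelism of a single group.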
Kafka Brokers
Each server in the cluster is called a broker.
• Handles hundreds of MBs per second of writes from producers and reads from consumers
• Retains all published messages, whether or not they have been consumed
• Retention is configured as a number of days, n
• A published message is available for consumption for the configured n days and is discarded thereafter
• Works like a queue if consumer instances belong to the same consumer group; otherwise, it works like publish-subscribe
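Time-based retention can be sketched as a filter that looks only at message age and never at consumer state. This is a toy model: `Record` and `apply_retention` are hypothetical names, and real brokers delete whole log segments rather than single messages, using settings such as `log.retention.hours` (broker-wide) or `retention.ms` (per topic).

```python
from typing import List, NamedTuple

SECONDS_PER_DAY = 86_400


class Record(NamedTuple):
    offset: int
    timestamp: float  # seconds since the epoch
    value: bytes


def apply_retention(log: List[Record], retention_days: float, now: float) -> List[Record]:
    """Drop records older than the retention period, whether or not they were consumed.

    Surviving records keep their original offsets, so saved consumer
    offsets still point at the same messages.
    """
    cutoff = now - retention_days * SECONDS_PER_DAY
    return [r for r in log if r.timestamp >= cutoff]
```

Because retention ignores consumption, a slow consumer whose saved offset falls before the oldest retained message has simply missed that data.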
Kafka Producer-Broker-Consumer
How Kafka can be used with Hadoop
Kafka with Hadoop using Camus
• Camus is LinkedIn's Kafka-to-HDFS pipeline
• It is a MapReduce job that distributes data loads out of Kafka
• At LinkedIn, it processes tens of billions of messages per day
• All work is done within a single Hadoop job
Courtesy: Confluent
How Kafka can be used with Spark
Kafka with Spark Streaming
• Kafka generally stores a topic's messages in multiple partitions
• If messages are stored in n partitions, reading them in parallel makes consumption faster
• Spark Streaming can achieve such parallel reads effectively
• Parallel reads are achieved by integrating Spark's KafkaInputDStream with the Kafka high-level consumer API
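The idea of "one reader per partition" can be sketched without Spark or Kafka at all. This toy uses a thread pool, and the names `read_partition` and `read_all_in_parallel` are hypothetical. The in-memory lists stand in for partitions, and uppercasing stands in for real processing. Spark's later direct Kafka integration follows a similar layout, mapping each Kafka partition to one Spark partition.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def read_partition(partition_id: int, log: List[str]) -> List[str]:
    """Hypothetical reader: a real consumer would fetch from the partition's leader broker."""
    return [message.upper() for message in log]  # stand-in for real processing


def read_all_in_parallel(partitions: Dict[int, List[str]]) -> Dict[int, List[str]]:
    """Read every partition concurrently, one task per partition."""
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        futures = {pid: pool.submit(read_partition, pid, log) for pid, log in partitions.items()}
        return {pid: future.result() for pid, future in futures.items()}
```

Order is preserved within each partition, but there is no global order across partitions, which is why key-based partitioning matters when per-key ordering is required.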
Kafka with Spark Streaming
[Diagram: apps publish events to Kafka, which feeds them to the streaming engine.]
How Kafka can be used with Storm
Kafka With Spark Streaming
Companies Using Kafka
Get Certified in Apache Kafka from Edureka
Edureka's Real-Time Analytics with Apache Kafka course:
• Carefully designed to provide the knowledge and skills needed to become a successful Kafka Big Data Developer
• Helps you master the concepts of Kafka clusters, producers and consumers, the Kafka API, and Kafka integration with Hadoop, Storm and Spark
• Covers fundamental concepts, such as the Kafka cluster and the Kafka API, through advanced topics such as Kafka integration with Hadoop, Storm, Spark, Maven, etc.
• Online Live Courses: 15 hours
• Assignments: 25 hours
• Project: 20 hours
• Lifetime Access + 24 X 7 Support
Go to www.edureka.co/apache-kafka
Batch starts on 10th October (Weekend Batch)
Thank You
Questions/Queries/Feedback/Survey
Recording and presentation will be made available to you within 24 hours
