Large scale, distributed and reliable messaging with Kafka
If I’m talking about some console
applications, stills, moonshine and
police now.
That means prayer didn’t work.
Fuck.
Rafał Hryniewski
@r_hryniewski
fb.me/hryniewskinet
.NET Dev
Blogger
Speaker
Community leader
https://hryniewski.net
rafal@hryniewski.net
Agenda
• History
• Use cases
• Producers and consumers
• Topics, partitions and clusters
• Streams, AdminClient and Connectors
• Kafka in .NET and Cloud
• External stream processing systems (Spark, Storm, Flink, Apex)
History
• Developed at LinkedIn
• Open sourced in 2011
• Named after the writer Franz Kafka, because it is optimized for writing
Kafka is basically:
• Open source
• Written in Scala and Java
• Message broker
• Stream processing platform
• High throughput and low latency
• Scalable
• Designed as a distributed commit (transaction) log
Messaging 101
Event driven architecture 101
Used by
Messaging
Event sourcing
Stream processing
Commit log
User activity tracking
Metrics
Log aggregation
Kafka APIs
• Producer API
• Consumer API
• Connector API
• Streams API
• AdminClient API
Producer
Producer API
• Publishes a stream of messages to one or more topics
• Asynchronous and thread safe (in the original implementation)
• Can deliver messages “at least once”, “at most once” or “exactly once”, depending on configuration
• Can batch messages
• Can spread messages across partitions for load balancing
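The load-balancing point can be illustrated with a toy partitioner. This is only a sketch in plain Python, not the Kafka client API: the names `choose_partition` and `partition_batch` are hypothetical, and CRC32 stands in for the murmur2 hash that Kafka's Java client uses for keyed messages. The idea it shows is that hashing the key spreads load across partitions while keeping all messages with the same key in one partition, in order.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition index (toy stand-in for a real partitioner)."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    # CRC32 is used only because it is deterministic and in the standard
    # library; this is an assumption of the sketch, not Kafka's actual hash.
    return zlib.crc32(key) % num_partitions


def partition_batch(messages, num_partitions):
    """Group (key, value) pairs by target partition, preserving per-partition order."""
    batches = {p: [] for p in range(num_partitions)}
    for key, value in messages:
        batches[choose_partition(key, num_partitions)].append(value)
    return batches
```

Calling `partition_batch([(b"user-1", "a"), (b"user-2", "b"), (b"user-1", "c")], 3)` puts "a" and "c" in the same partition, in that order, because they share a key.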
Consumer API
• Subscribes to a topic and receives messages from it
• Messages are pulled from the topic, so each consumer can process messages at its own pace
• Supports long polling, so a consumer does not have to busy-wait in a loop
• Each consumer tracks its own position (offset)
• Has no per-message acknowledgements; a consumer commits offsets instead and can rewind to any retained offset
• Supports consumer groups
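The pull model and consumer-owned offsets can be sketched with an in-memory log. This is a toy model in plain Python, not the Kafka consumer API: `PartitionLog` and `ToyConsumer` are hypothetical names, and a real consumer would talk to a broker and commit its offsets. It shows that the log itself is never modified by reading, so consumers advance independently and can rewind.

```python
class PartitionLog:
    """Append-only list of messages; a message's index is its offset."""

    def __init__(self):
        self._messages = []

    def append(self, value):
        self._messages.append(value)
        return len(self._messages) - 1  # offset of the appended message

    def read(self, offset, max_records):
        return self._messages[offset:offset + max_records]


class ToyConsumer:
    """Pulls from a log and keeps its own position, independent of other consumers."""

    def __init__(self, log):
        self._log = log
        self.position = 0

    def poll(self, max_records=10):
        # Pull up to max_records starting at the current position, then advance.
        records = self._log.read(self.position, max_records)
        self.position += len(records)
        return records

    def seek(self, offset):
        # Rewinding is just moving the position; nothing is deleted by reading.
        if offset < 0:
            raise ValueError("offset must be non-negative")
        self.position = offset
```

Two `ToyConsumer` instances on the same log can read at different speeds, and calling `seek(0)` on either one replays the log from the start without affecting the other.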
Topics
• Each topic has a name, is partitioned and is multi-subscriber
• Kafka persists each published message. The retention period is configurable.
• Each consumer controls its own offset
• A partition must fit on a single server, but a topic's partitions can be spread across multiple nodes
• Partitions are replicated across the cluster for fault tolerance; each partition has a leader replica
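Time-based retention can be sketched as a filter over timestamped records. This is a simplified model in plain Python with hypothetical names (`Record`, `apply_retention`); a real broker removes whole log segments rather than individual records. The point it illustrates is that messages survive for a configured period regardless of whether anyone has read them, and that surviving records keep their original offsets.

```python
from dataclasses import dataclass


@dataclass
class Record:
    offset: int
    timestamp_ms: int
    value: str


def apply_retention(records, now_ms, retention_ms):
    """Return the records still inside the retention window.

    Offsets are not renumbered: a consumer positioned before the oldest
    surviving offset would simply have to start from that oldest one.
    """
    cutoff = now_ms - retention_ms
    return [r for r in records if r.timestamp_ms >= cutoff]
```

With a retention of 1000 ms and `now_ms=2500`, a record stamped at 1000 ms is dropped while records stamped at 1500 ms or later are kept.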
Cluster
• Kafka runs as a cluster
• A cluster has multiple servers/nodes
• A cluster can span multiple datacenters
• A cluster stores messages in partitioned topics
• ZooKeeper coordinates the servers in the cluster (newer Kafka releases can run without it, using KRaft mode)
Streams
• Acts as a stream processor
• Consumes input from one or more topics and writes processed output to one or more other topics
• Works in (almost) real time
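The consume-transform-produce shape of a stream processor can be sketched over in-memory lists. This is plain Python, not the Kafka Streams API (which is a Java library): `run_processor` and the topic names are hypothetical, and the sketch makes a single pass over finite lists, whereas a real processor runs continuously on unbounded topics.

```python
from collections import defaultdict


def run_processor(topics, input_names, output_name, transform):
    """Read every record from the input topics, transform it, append it to the output topic.

    `topics` maps a topic name to a list of records. If `transform` returns
    None the record is dropped, which lets one function act as map or filter.
    """
    for name in input_names:
        for record in topics[name]:
            result = transform(record)
            if result is not None:
                topics[output_name].append(result)


# Example: merge two input topics into one upper-cased output topic.
topics = defaultdict(list, {"clicks": ["home", "cart"], "views": ["home"]})
run_processor(topics, ["clicks", "views"], "events-upper", str.upper)
```

The input topics are left untouched, so other consumers or processors could read the same records independently.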
AdminClient
Connector API
• Build your own reusable consumers/producers
• Integrate Kafka with existing applications
Example Connectors
Kafka in .NET
• The main client library is confluent-kafka-dotnet
• Supports Avro serialization/deserialization with Schema Registry
• Easy to learn, hard to master
Kafka in Azure
• Azure Event Hubs exposes a Kafka-compatible endpoint, so Kafka-enabled applications can use it by changing only their connection configuration
• You can set up a Kafka cluster in HDInsight (it’s not cheap)
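The "change only the connection configuration" point might look roughly like this for a librdkafka-style client. This is a sketch under assumptions: the helper name `event_hubs_kafka_config` and both arguments are hypothetical placeholders, and the settings (port 9093, SASL_SSL with the PLAIN mechanism, the literal user name `$ConnectionString`) reflect Microsoft's documented Kafka endpoint as best I recall, so check them against the current Event Hubs documentation before relying on them.

```python
def event_hubs_kafka_config(namespace: str, connection_string: str) -> dict:
    """Build Kafka client settings that point at an Event Hubs namespace.

    `namespace` and `connection_string` are placeholders supplied by the
    caller; nothing here contacts Azure.
    """
    return {
        # Assumed Kafka endpoint of the Event Hubs namespace.
        "bootstrap.servers": f"{namespace}.servicebus.windows.net:9093",
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        # Assumed fixed user name; the connection string acts as the password.
        "sasl.username": "$ConnectionString",
        "sasl.password": connection_string,
    }
```

The rest of the producer or consumer code would stay as it was for a regular Kafka cluster; only this settings dictionary changes.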
Kafka in AWS
• Amazon Managed Streaming for Apache Kafka (Amazon MSK)
• Amazon Kinesis has somewhat similar capabilities
Kafka in GCP
• Only in VMs/containers
Kafka in IBM Cloud
• IBM Event Streams is basically Kafka-as-a-service
External stream processing systems
Apache Apex
• Platform for developing stream- and batch-oriented applications
• Designed to process data in motion
• Performant
• Scalable
• Fault tolerant
• Lets you write processing functions without handling the distributed environment yourself
Apache Flink
• Focused on parallel, pipelined processing of streams
• Supports Java, Scala, Python and SQL
• Manages state
• Great for data analysis and event correlation
Apache Spark
• Analytics engine for big data processing
• Data processing framework
• Used for processing and transforming streams of data
• Also used for training machine learning models
• Great for ETL (extract, transform, load) processes
• Supports Java, Scala, Python and R
Apache Storm
• Distributed real-time computation system
• Great for real-time analytics systems (for example, fraud detection)
• Can handle MASSIVE amounts of data on the fly
• Works with ANY programming language
bit.ly/rh-kafka
Questions?
@r_hryniewski
fb.me/hryniewskinet

