Apache Kafka 
Introduction 
http://kafka.apache.org/
Joe Stein 
• Developer, Architect & Technologist 
• Founder & Principal Consultant => Big Data Open Source Security LLC - http://stealth.ly 
Big Data Open Source Security LLC provides professional services and product solutions for the collection, 
storage, transfer, real-time analytics, batch processing and reporting for complex data streams, data sets and 
distributed systems. BDOSS is all about the "glue" and helping companies to not only figure out what Big Data 
Infrastructure Components to use but also how to change their existing (or build new) systems to work with 
them. 
• Apache Kafka Committer & PMC member 
• Blog & Podcast - http://allthingshadoop.com 
• Twitter @allthingshadoop
Apache Kafka 
• Apache Kafka 
o http://kafka.apache.org 
• Apache Kafka Source Code 
o https://github.com/apache/kafka 
• Documentation 
o http://kafka.apache.org/documentation.html 
• Wiki 
o https://cwiki.apache.org/confluence/display/KAFKA/Index
Kafka decouples data-pipelines
Topics & Partitions
A high-throughput distributed messaging system 
rethought as a distributed commit log.
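The "distributed commit log" idea can be illustrated with a toy, in-memory sketch. Nothing below is Kafka's API: the `Topic` class, its `append` and `read` methods, and the CRC32 key hashing are hypothetical names and choices made only for illustration. The sketch assumes a topic is a fixed set of append-only partitions and that a record's offset is its position within one partition.

```python
import zlib


class Topic:
    """Toy topic: a fixed number of append-only partition logs (illustrative only)."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key, value):
        # Hash the key so that every record with the same key lands in the same partition.
        partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        log = self.partitions[partition]
        log.append((key, value))
        # The offset is simply the record's position in that partition's log.
        return partition, len(log) - 1

    def read(self, partition, offset, max_records=10):
        # Reading is a slice of the log; nothing is removed.
        return self.partitions[partition][offset:offset + max_records]


topic = Topic(num_partitions=3)
p1, o1 = topic.append("user-1", "login")
p2, o2 = topic.append("user-1", "click")
records = topic.read(p1, o1)
```

Because both records share a key, they end up in the same partition with consecutive offsets, which is where the per-partition ordering guarantee comes from in this model.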
More! 
• Producers - ** push ** 
o Batching 
o Compression 
o Sync (Ack), Async (auto batch) 
o Replication 
o Sequential writes, guaranteed ordering within each partition 
• Consumers - ** pull ** 
o No state held by broker 
o Consumers control reading from the stream 
• Zero copy (sendfile) on the broker when serving log data to consumers; producers, broker and consumers share a standard binary message format 
http://kafka.apache.org/documentation.html#maximizingefficiency 
• Messages stay on disk after they are consumed; they are deleted by TTL (time-based retention) or by compaction 
https://kafka.apache.org/documentation.html#compaction
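Two of the points above, consumers that track their own position and compaction that keeps the latest record per key, can be sketched in a few lines. This is a simplified model, not Kafka code: `PullConsumer`, `poll`, `seek` and `compact` are hypothetical names, and the log is a plain Python list of (key, value) pairs.

```python
class PullConsumer:
    """Toy pull consumer: the position lives in the consumer, not in the log."""

    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self, max_records):
        # Pull the next batch and advance our own offset; the log is not modified.
        batch = self.log[self.offset:self.offset + max_records]
        self.offset += len(batch)
        return batch

    def seek(self, offset):
        # The consumer can rewind or skip ahead at will.
        self.offset = offset


def compact(log):
    """Keep only the latest record for each key, preserving original offsets and order."""
    latest = {}
    for offset, (key, _value) in enumerate(log):
        latest[key] = offset
    return [(offset, log[offset][0], log[offset][1]) for offset in sorted(latest.values())]


log = [("a", 1), ("b", 2), ("a", 3)]
consumer = PullConsumer(log)
first = consumer.poll(2)
second = consumer.poll(2)
consumer.seek(0)
replayed = consumer.poll(1)
compacted = compact(log)
```

Rewinding with `seek(0)` replays data that was already consumed, which only works because consuming never deletes records. In the compacted view the older record for key "a" is gone, while the surviving records keep their original offsets.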
Client Libraries 
Community Clients https://cwiki.apache.org/confluence/display/KAFKA/Clients 
• Python - Pure Python implementation with full protocol support. Consumer and Producer 
implementations included, GZIP and Snappy compression supported. 
• C - High performance C library with full protocol support 
• C++ - Native C++ library with protocol support for Metadata, Produce, Fetch, and Offset. 
• Go (aka golang) - Pure Go implementation with full protocol support. Consumer and Producer 
implementations included, GZIP and Snappy compression supported. 
• Ruby - Pure Ruby, Consumer and Producer implementations included, GZIP and Snappy 
compression supported. Ruby 1.9.3 and up (CI runs MRI 2. 
• Clojure - Clojure DSL for the Kafka API 
• JavaScript (NodeJS) - NodeJS client in a pure JavaScript implementation 
• stdin & stdout 
Wire Protocol Developers Guide 
https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol
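For anyone writing a client against the wire protocol, the framing of a request can be sketched as follows. This is a rough illustration based on the request layout as described in the protocol guide (a 4-byte size prefix, then API key, API version, correlation id and a length-prefixed client id, all big-endian); please check the guide for the exact fields of the version you target. The helper names `encode_string` and `encode_request`, the client id "demo", and the use of API key 3 for a metadata request are assumptions made for this example.

```python
import struct


def encode_string(text):
    # Protocol strings: int16 length followed by the UTF-8 bytes.
    data = text.encode("utf-8")
    return struct.pack(">h", len(data)) + data


def encode_request(api_key, api_version, correlation_id, client_id, body=b""):
    # Header: int16 api_key, int16 api_version, int32 correlation_id, then client_id.
    header = struct.pack(">hhi", api_key, api_version, correlation_id)
    message = header + encode_string(client_id) + body
    # Every request is prefixed with its size as an int32.
    return struct.pack(">i", len(message)) + message


# Header-only frame; a real request would append an API-specific body.
frame = encode_request(api_key=3, api_version=0, correlation_id=7, client_id="demo")
```

The correlation id is chosen by the client and echoed back in the response, so a client can match responses to outstanding requests on one connection.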
Really Quick Start (Scala) 
1) Install Vagrant http://www.vagrantup.com/ 
2) Install VirtualBox https://www.virtualbox.org/ 
3) git clone https://github.com/stealthly/scala-kafka 
4) cd scala-kafka 
5) vagrant up 
Zookeeper will be running on 192.168.86.5 
BrokerOne will be running on 192.168.86.10 
All the tests in ./src/test/scala/* should pass (which is also /vagrant/src/test/scala/* in the vm) 
6) ./gradlew test
Really Quick Start (Go) 
1) Install Vagrant http://www.vagrantup.com/ 
2) Install VirtualBox https://www.virtualbox.org/ 
3) git clone https://github.com/stealthly/go-kafka 
4) cd go-kafka 
5) vagrant up 
6) vagrant ssh brokerOne 
7) cd /vagrant 
8) sudo ./test.sh
Questions? 
/******************************************* 
Joe Stein 
Founder, Principal Consultant 
Big Data Open Source Security LLC 
http://www.stealth.ly 
Twitter: @allthingshadoop 
********************************************/
