Apache Kafka and Real Time Stream Processing
Gwen	Shapira
System	Architect
Confluent
@gwenshap
GNW03: Stream Processing with Apache Kafka by Gwen Shapira
I’ll	tell	you	about
• What is stream processing and why it matters
• What	is	Apache	Kafka
• How	Kafka	helps	stream	processing
Stay	awake	for	
this	part
What	is	Stream	Processing?
Data	Processing	Paradigm
Request	/	Response	
Batch
Stream	Processing
Stream	Processing	Paradigm
• Data	is	generated	at	its	own	rate	as	“Streams”
• We	can	process	as	much	or	as	little	as	we	want
• Continuously
• Results	are	available	in	real-time
• But	nothing	waits	for	specific	results
• Time	for	data	availability?
• More than a “few ms”
• Less	than	“hours”
This	is	the	world	changing	bit
• Most	of	the	business	is…
• Not	urgent	enough	to	require	immediate	response
• But	can’t	wait	for	the	next	day
• “Streams of events” represent something fundamental
• In the same way that relational tables are fundamental
Ok,	got	the	streams	part.
But	what	about	Apache	Kafka?
A cross between a messaging system and a file system
Kafka	is	all	about	LOGS
If	you	understand	logs
You	understand	Kafka
Redo	Log:
The	most	crucial	structure	for	
recovery	operations	…	
store	all	changes	made	to	the	
database	as	they	occur.
Important	Point
The	redo	log	is	the	only reliable	
source	of	information	about	current	
state	of	the	database.
But	Logs	are	also	a	STREAM	of	events
And	Kafka	stores	those	logs
Allowing you to read the past
and keep getting updates in the future
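A minimal sketch of the log idea, in plain Python rather than Kafka itself. The `Log` class and its method names are hypothetical and only illustrate an append-only sequence in which each record has an offset, so a reader can replay the past and then resume from where it stopped:

```python
class Log:
    """A toy append-only log: records are only ever added at the end."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Store a record and return its offset (its position in the log)."""
        self._records.append(record)
        return len(self._records) - 1

    def read_from(self, offset):
        """Return (offset, record) pairs for everything at or after `offset`."""
        return list(enumerate(self._records[offset:], start=offset))


log = Log()
log.append("account opened")
log.append("deposit 100")

# A reader that starts at offset 0 sees the past...
history = log.read_from(0)
next_offset = len(history)

# ...and, by remembering where it stopped, picks up later events too.
log.append("withdraw 30")
updates = log.read_from(next_offset)
```

The reader, not the log, keeps track of its position. That is why several readers can consume the same log at their own pace.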
Stream	Processing
Read	a	stream
modify	it
output	another	stream
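The read / modify / output loop can be sketched with plain Python generators. This is an illustration only: the event fields (`type`, `user`, `price`, `qty`) and the function names are assumptions, and lists stand in for the input and output streams:

```python
def read_stream(events):
    """Read: yield events one at a time, as they would arrive."""
    for event in events:
        yield event


def purchase_totals(events):
    """Modify: keep only purchases and compute a total for each one."""
    for event in events:
        if event["type"] == "purchase":
            yield {"user": event["user"], "total": event["price"] * event["qty"]}


def write_stream(records, sink):
    """Output: append each result to another stream (a list here)."""
    for record in records:
        sink.append(record)


source = [
    {"type": "view", "user": "ann", "price": 10, "qty": 1},
    {"type": "purchase", "user": "ann", "price": 10, "qty": 3},
    {"type": "purchase", "user": "bob", "price": 7, "qty": 2},
]
output = []
write_stream(purchase_totals(read_stream(source)), output)
```

Each stage handles one event at a time, so nothing has to wait for the whole input before results start to appear.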
Example:	CDC-based	ETL
If	we	use	Kafka	for	CDC,	
does	it	mean	it	is	ACID?
Stream	Processing	is	Important
Kafka	is	a	collection	of	logs.
How	does	Kafka	help	with	stream	processing?
First, how do we actually
do stream processing?
Method	1:	
Do	it	yourself	(Hipster	stream	processing)
Method	2:
The	Stream	Processing	Frameworks
• Storm
• Spark
• Flink
• Samza
• Apex
• NiFi
• StreamBase
• InfoSphere Streams
• Google	DataFlow (AKA	Beam)
• I	can	go	on	for	5	more	pages…
Few	of	those	are	really	popular!
• Pro:	They	handle	some	hard	problems
• Con:	It	can	be	too	complex
What	do	I	mean	by	too	complex?
[Architecture diagram. Components shown: Clients ("Swipe here!"), Web App, Kafka, Flume Agents, HDFSEventSink, SolR Sink, Local Cache, HBase / Memory, Spark Streaming, and two Hadoop clusters (Storage, Processing) with HDFS, Hive/Impala, Map/Reduce, Spark, SolR and Search. Labeled flows: Fetching & Updating Profiles; Adjusting NRT Stats; Batch Time Adjustments; Automated & Manual Review of NRT Changes and Counters; Automated & Manual Analytical Adjustments and Pattern detection.]
Why	so	many	moving	parts?
We	needed…
HBase to handle complex state
Spark	requires	HDFS
Ingest	layer	
Batch	layer	to	handle	re-calculations
What	we	really	wanted	was…
Inputs
Kafka
Processor
output
Enter	KafkaStreams
3	Simplifications:
1. Uses	Kafka
2. No	Framework
3. Unify	Tables	and	Streams
Doesn’t all stream processing use Kafka?
We use Kafka for… Partitioning, Scalability, Fault Tolerance
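[Diagram: Kafka feeding two groups of application instances, Group A (three instances labeled A) and Group B (two instances labeled B)]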
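A rough sketch of how partitioning and groups provide scalability and fault tolerance. This is not Kafka's actual code: `zlib.crc32` is a stand-in for the hash Kafka's client uses, the round-robin `assign` function is a simplified assumption about how partitions are spread over group members, and the member names are hypothetical:

```python
import zlib

NUM_PARTITIONS = 6


def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Map a key to a partition; the same key always lands in the same one."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign(partitions, members):
    """Spread partitions over the members of one group, round-robin."""
    assignment = {member: [] for member in members}
    for i, partition in enumerate(partitions):
        assignment[members[i % len(members)]].append(partition)
    return assignment


# Each group reads every partition; within a group the work is divided.
group_a = assign(range(NUM_PARTITIONS), ["A1", "A2", "A3"])
group_b = assign(range(NUM_PARTITIONS), ["B1", "B2"])

# If A3 fails, its partitions are handed to the surviving members.
group_a_after_failure = assign(range(NUM_PARTITIONS), ["A1", "A2"])
```

Adding instances to a group spreads the partitions more thinly (scalability), and removing one redistributes its partitions (fault tolerance).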
Handling	Time
No	Framework
• It	is	just	a	library	that	does	transformations
• We	can	add	languages	on	top
• Kafka	does	everything	we	needed	the	framework	to	do
• You don’t need a “framework” to run queries, so why would you need one to run queries continuously?
The	really	important	bit:
Streams	meet	Tables
Streams:	Things	that	happen.	Events.
Tables:	State	of	things	as	they	are.
Databases:	Only	states.
Streams:	Only	events.
We	can	convert	tables	to	streams	and	back:
Stream	->	Apply	->	Table
Table	->	Change	Capture	->	Stream
This	is	called	Table-Stream	Duality.
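The two conversions can be sketched in plain Python, with a dict standing in for a table and a list of `(key, value)` pairs standing in for a stream. The function names are hypothetical, and treating a `None` value as a delete is an assumption made for this example:

```python
def apply_stream(changes):
    """Stream -> Apply -> Table: replay changes in order, keep the latest value per key."""
    table = {}
    for key, value in changes:
        if value is None:
            table.pop(key, None)  # a None value marks a delete in this sketch
        else:
            table[key] = value
    return table


def capture_changes(before, after):
    """Table -> Change Capture -> Stream: emit what changed between two table states."""
    changes = []
    for key, value in after.items():
        if before.get(key) != value:
            changes.append((key, value))
    for key in before:
        if key not in after:
            changes.append((key, None))
    return changes


stream = [("alice", "Paris"), ("bob", "Lima"), ("alice", "Rome"), ("bob", None)]
table = apply_stream(stream)

# Capturing the changes from an empty table and replaying them rebuilds the table.
changelog = capture_changes({}, table)
rebuilt = apply_stream(changelog)
```

The table holds only the current state, while the stream keeps every event that led to it. Going from the stream to the table loses history, and going back recovers only the net changes.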
Streams	and	Tables	sometimes	work	
the	same.
And	sometimes	are	very	different.
KafkaStreams handles	both.
But…
Where	do	streams	come	from?
We	really	like	streams
So	we	created	a	
Stream	Data	Platform
Where	can	we	learn	more?
• http://www.confluent.io/blog
• http://kafka.apache.org/documentation.html
• http://docs.confluent.io/current
