BASEL BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENF
HAMBURG KOPENHAGEN LAUSANNE MÜNCHEN STUTTGART WIEN ZÜRICH
Apache Kafka
Scalable Message Processing and more!
Guido Schmutz – 10.5.2017
@gschmutz guidoschmutz.wordpress.com
Guido Schmutz
Working at Trivadis for more than 20 years
Oracle ACE Director for Fusion Middleware and SOA
Consultant, Trainer, Software Architect for Java, Oracle, SOA and
Big Data / Fast Data
Head of Trivadis Architecture Board
Technology Manager @ Trivadis
More than 30 years of software development experience
Contact: guido.schmutz@trivadis.com
Blog: http://guidoschmutz.wordpress.com
Slideshare: http://www.slideshare.net/gschmutz
Twitter: gschmutz
Apache Kafka - Scalable Message Processing and more!
Branches: Copenhagen, Munich, Lausanne, Bern, Zurich, Brugg, Geneva, Hamburg, Düsseldorf, Frankfurt, Stuttgart, Freiburg, Basel, Vienna. With over 600 specialists and IT experts in your region.
14 Trivadis branches and more than 600 employees
200 Service Level Agreements
Over 4,000 training participants
Research and development budget: CHF 5.0 million
Financially self-supporting and sustainably profitable
Experience from more than 1,900 projects per year at over 800 customers
Agenda
1. Introduction & Motivation
2. Kafka Core
3. Kafka Connect
4. Kafka Streams
5. Kafka and "Big Data" / "Fast Data" Ecosystem
6. Kafka in Enterprise Architecture
7. Confluent Data Platform
8. Summary
Introduction & Motivation
Apache Kafka - Overview
Distributed publish-subscribe messaging system
Designed for processing of real-time activity stream data (logs, metrics collections, social media streams, …)
Initially developed at LinkedIn, now an Apache project
Does not use the JMS API or standards
Kafka maintains feeds of messages in topics
Apache Kafka - Motivation
LinkedIn’s motivation for Kafka was:
• "A unified platform for handling all the real-time data feeds a large company might
have."
Must haves
• High throughput to support high volume event feeds
• Support real-time processing of these feeds to create new, derived feeds.
• Support large data backlogs to handle periodic ingestion from offline systems
• Support low-latency delivery to handle more traditional messaging use cases
• Guarantee fault-tolerance in the presence of machine failures
Apache Kafka History
Source:	Confluent
Apache Kafka - Unix Analogy
$ cat < in.txt | grep "kafka" | tr a-z A-Z > out.txt
[Diagram: a Kafka Connect API source feeds the Kafka Core (Cluster), the Kafka Streams API transforms data within Kafka, and a Kafka Connect API sink writes the result out, mirroring cat | grep | tr]
Source:	Confluent
Kafka Core
Kafka High Level Architecture
The who is who
• Producers write data to brokers.
• Consumers read data from brokers.
• All this is distributed.
The data
• Data is stored in topics.
• Topics are split into partitions, which are replicated.
[Diagram: Producers write to a Kafka cluster of Broker 1, Broker 2 and Broker 3, coordinated by a Zookeeper ensemble; Consumers read from the cluster]
Apache Kafka - Architecture
[Diagram: a Truck producer sends events to the Movement Topic and the Engine-Metrics Topic on a Kafka broker; a Movement Processor and an Engine Processor consume messages 1–6 from those topics]
Apache Kafka - Architecture
[Diagram: the Movement Topic is now split into Partition 0 and Partition 1, each consumed by its own Movement Processor instance; the Engine-Metrics Topic keeps a single Partition 0, consumed by the Engine Processor]
Apache Kafka
[Diagram: the Movement Topic with three partitions (P 0, P 1, P 2), each replicated on two of the three brokers (Broker 1: P 0 and P 2, Broker 2: P 1 and P 2, Broker 3: P 0 and P 1); the Truck producer writes to the topic and two Movement Processor instances consume from it]
Kafka Topics
Creating a topic
• Command line interface
• Using AdminUtils.createTopic method
• Auto-create via auto.create.topics.enable = true
Modifying a topic
https://kafka.apache.org/documentation.html#basic_ops_modify_topic
Deleting a topic
• Command Line interface
$ kafka-topics.sh --zookeeper zk1:2181 --create \
--topic my.topic --partitions 3 \
--replication-factor 2 --config x=y
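For programmatic creation, a minimal Java sketch against the AdminUtils API of this Kafka generation could look as follows (the ZooKeeper connection settings, topic parameters and the RackAwareMode argument are illustrative and version-dependent):
ZkUtils zkUtils = ZkUtils.apply("zk1:2181", 30000, 30000, false);
Properties topicConfig = new Properties();   // optional per-topic settings, e.g. cleanup.policy
AdminUtils.createTopic(zkUtils, "my.topic", 3, 2,
    topicConfig, RackAwareMode.Enforced$.MODULE$);
zkUtils.close();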
Kafka Producer
• Write Ahead Log / Commit Log
• Producers always append to tail (think append to file)
• Preserves Order of messages within partition
[Diagram: the Truck producer appends message 6 to the tail of the Movement Topic log on the Kafka broker, after messages 1–5 already written]
Kafka Producer – High Level Overview
[Diagram: the producer client wraps each message in a Producer Record (Topic, optional Partition, optional Key, Value) and serializes it; the partitioner assigns the record to a partition and adds it to a per-partition batch (optionally compressed); batches are sent to the Movement Topic partitions on the Kafka broker; on failure the send is retried if possible, otherwise an exception is thrown; on success the record metadata is returned]
Kafka Producer - Durability Guarantees
Producer can configure acknowledgements
Value: 0
• Producer doesn't wait for the leader
→ Throughput: high, Latency: low, Durability: low (no guarantee)
Value: 1 (default)
• Producer waits for the leader
• Leader sends ack when the message is written to its log
• No wait for followers
→ Throughput: medium, Latency: medium, Durability: medium (leader)
Value: all (-1)
• Producer waits for the leader
• Leader sends ack when all In-Sync Replicas have acknowledged
→ Throughput: low, Latency: high, Durability: high (ISR)
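The acknowledgement level is a plain producer property; a minimal sketch for the strongest setting (the values shown are examples):
kafkaProps.put("acks", "all");     // "0", "1" or "all" (-1)
kafkaProps.put("retries", "3");    // retry transient send failures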
Kafka Producer - Java API
Constructing a Kafka Producer
private Properties kafkaProps = new Properties();
kafkaProps.put("bootstrap.servers","broker1:9092,broker2:9092");
kafkaProps.put("key.serializer", "...StringSerializer");
kafkaProps.put("value.serializer", "...StringSerializer");
Producer<String, String> producer = new KafkaProducer<String, String>(kafkaProps);
Kafka Producer - Java API
Sending a message fire-and-forget (no control over whether the message has been sent successfully)
Sending a message synchronously (wait until the reply from Kafka arrives back)
ProducerRecord<String, String> record =
new ProducerRecord<>("topicName", "Key", "Value");
try {
producer.send(record);
} catch (Exception e) {}
ProducerRecord<String, String> record =
new ProducerRecord<>("topicName", "Key", "Value");
try {
producer.send(record).get();
} catch (Exception e) {}
Kafka Producer - Java API
Sending Message Asynchronously
private class DemoProducerCallback implements Callback {
@Override
public void onCompletion(RecordMetadata recordMetadata,
Exception e) {
if (e != null) {
e.printStackTrace();
}
}
}
ProducerRecord<String, String> record = new
ProducerRecord<>("topicName", "key", "value");
producer.send(record, new DemoProducerCallback());
Kafka Consumer - Partition offsets
Offset: messages in the partitions are each assigned a unique (per partition) and
sequential id called the offset
• Consumers track their pointers via (offset, partition, topic) tuples
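Since an offset is simply a position in a partition's log, a consumer can also reposition itself explicitly. A small sketch (topic name, partition number and offset are made up for illustration):
TopicPartition partition = new TopicPartition("movement", 0);
consumer.assign(Collections.singletonList(partition));
consumer.seek(partition, 42L);                  // continue reading from offset 42
long position = consumer.position(partition);   // current read position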
Kafka Consumer - Consumer Groups
[Diagram: a Movement Topic with four partitions (Partition 0–3) consumed by Consumer Group 1 in four scenarios:
• 1 consumer / gets messages from all partitions
• 2 consumers / each gets messages from 2 partitions
• 4 consumers / each gets messages from 1 partition
• 5 consumers / one gets no messages]
Kafka Consumer - Consumer Groups
It is very common to have multiple applications that read data from the same topic, and each application should get all of the messages.
To achieve this, assign a unique consumer group to each application; the number of consumers (threads) per group can be different.
Kafka scales to a large number of consumers without impacting performance.
[Diagram: Consumer Group 1 (Consumers 1–4) and Consumer Group 2 (Consumers 1–2) each independently read all four partitions of the Movement Topic]
Kafka Consumer - Java API
Constructing a Kafka Consumer
private Properties kafkaProps = new Properties();
kafkaProps.put("bootstrap.servers","broker1:9092,broker2:9092");
kafkaProps.put("group.id","MovementsConsumerGroup");
kafkaProps.put("key.serializer", "...StringSerializer");
kafkaProps.put("value.serializer", "...StringSerializer");
consumer = new KafkaConsumer<String, String>(kafkaProps);
Kafka Consumer - Java API
Kafka Consumer Poll Loop (with synchronous offset commit)
consumer.subscribe(Collections.singletonList("topic"));
try {
while (true) {
ConsumerRecords<String, String> records = consumer.poll(100);
for (ConsumerRecord<String, String> record : records) {
// process message, available information:
// record.topic(), record.partition(), record.offset(),
// record.key(), record.value());
}
consumer.commitSync();
}
} finally {
consumer.close();
}
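commitSync() blocks until the broker confirms the commit. Where that latency matters and slightly weaker guarantees are acceptable, offsets can be committed asynchronously instead; a sketch of that variant inside the same loop:
consumer.commitAsync((offsets, exception) -> {
  if (exception != null) {
    // commit failed; a later commit will move the offsets forward again
    exception.printStackTrace();
  }
});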
Data Retention – 4 options
1. Never
2. Time based (TTL)
log.retention.{ms | minutes | hours}
3. Size based
log.retention.bytes
4. Log compaction based (entries with same key are removed)
kafka-topics.sh --zookeeper localhost:2181 \
--create --topic customers \
--replication-factor 1 --partitions 1 \
--config cleanup.policy=compact
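Time- and size-based retention can also be overridden per topic. A hedged sketch using the AdminUtils API and topic-level config names (the values and the zkUtils handle are illustrative):
Properties config = new Properties();
config.put("retention.ms", "604800000");       // keep data for 7 days
config.put("retention.bytes", "1073741824");   // or cap each partition at 1 GB
AdminUtils.changeTopicConfig(zkUtils, "customers", config);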
Apache Kafka – Some numbers
Kafka at LinkedIn => over 1,800 broker machines / 79K+ topics
Kafka Performance at our own infrastructure => 6 brokers (VM) / 1 cluster
• 445’622 messages/second
• 31 MB / second
• 3.0405 ms average latency between producer / consumer
1.3 trillion messages per day
330 Terabytes in/day
1.2 Petabytes out/day
Peak load for a single cluster:
2 million messages/sec
4.7 Gigabits/sec inbound
15 Gigabits/sec outbound
http://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
https://engineering.linkedin.com/kafka/running-kafka-scale
Kafka Connect
Kafka Connect Architecture
Source:	Confluent
Kafka Connector Hub – Certified Connectors
Source:	http://www.confluent.io/product/connectors
Kafka Connector Hub – Additional Connectors
Source:	http://www.confluent.io/product/connectors
Kafka Connect – Twitter example
./connect-standalone.sh ../demo-config/connect-simple-source-standalone.properties \
../demo-config/twitter-source.properties
name=twitter-source
connector.class=com.eneco.trading.kafka.connect.twitter.TwitterSourceConnector
tasks.max=1
topic=tweets
twitter.consumerkey=<consumer-key>
twitter.consumersecret=<consumer-secret>
twitter.token=<token>
twitter.secret=<token-secret>
track.terms=bigdata
bootstrap.servers=localhost:9095,localhost:9096,localhost:9097
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.storage.StringConverter
...
Kafka Streams
Kafka Streams
• Designed as a simple and lightweight library in Apache Kafka
• No external dependencies on systems other than Apache Kafka
• Part of open source Apache Kafka, introduced in 0.10+
• Leverages Kafka as its internal messaging layer
• Agnostic to resource management and configuration tools
• Supports fault-tolerant local state
• Event-at-a-time processing (not microbatch) with millisecond latency
• Windowing with out-of-order data using a Google DataFlow-like model
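A minimal sketch of a Kafka Streams application with the 0.10.x API (topic names and the filter condition are made up for illustration):
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "movement-processor");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
props.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
props.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());

KStreamBuilder builder = new KStreamBuilder();
KStream<String, String> movements = builder.stream("truck-movement");
movements.filter((key, value) -> value.contains("SPEEDING"))   // keep only the interesting events
         .to("dangerous-driving");                              // write the derived feed back to Kafka

KafkaStreams streams = new KafkaStreams(builder, props);
streams.start();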
Streams API in the context of Kafka
Source:	Confluent
Kafka and "Big Data" / "Fast Data"
Ecosystem
Kafka and the Big Data / Fast Data ecosystem
Kafka integrates with many popular products / frameworks:
• Apache Spark Streaming
• Apache Flink
• Apache Storm
• Apache NiFi
• Streamsets
• Apache Flume
• Oracle Stream Analytics
• Oracle Service Bus
• Oracle GoldenGate
• Spring Integration Kafka Support
• … (e.g. Storm ships with a built-in Kafka Spout to consume events from Kafka)
Kafka in "Enterprise Architecture"
Traditional Big Data Architecture
[Diagram: Billing & Ordering, CRM / Profile and Marketing Campaigns systems feed the Big Data (Hadoop) cluster via file import / SQL import; a distributed filesystem and parallel batch processing (machine learning, graph algorithms, natural language processing) populate SQL, NoSQL and Search stores, which serve BI tools, the Enterprise Data Warehouse and online & mobile apps]
Event Hub – handle event stream data
[Diagram: event stream sources (Location, Social, Clickstream, Sensor Data, Call Center, Weather Data, Mobile Apps) are ingested through Event Hubs and a Data Flow layer into the Big Data cluster, alongside the traditional Billing & Ordering, CRM / Profile and Marketing Campaigns sources; results are served via SQL, NoSQL and Search to BI tools, the Enterprise Data Warehouse and online & mobile apps]
Event Hub – taking Velocity into account
[Diagram: the same sources feed the Event Hubs; in addition to batch analytics (distributed filesystem, parallel batch processing), a Stream Analytics layer processes events as they arrive, using NoSQL reference data / models and feeding SQL, NoSQL, Search and dashboards next to the BI tools, the Enterprise Data Warehouse and online & mobile apps]
Event Hub – Asynchronous Microservice Architecture
[Diagram: microservices running in containers and exposing APIs communicate asynchronously via the Event Hubs, keeping their own RDBMS / NoSQL state, alongside the Big Data cluster (distributed filesystem, parallel batch processing) and the SQL / Search serving layer for BI tools, the Enterprise Data Warehouse and online & mobile apps]
Confluent Platform
Confluent Data Platform 3.2
Source:	Confluent
Confluent Data Platform 3.2
Source:	Confluent
Confluent Enterprise – Control Center
Source:	Confluent
Summary
Summary
• Kafka can scale to millions of messages per second, and more
• Easy to start with in a Proof of Concept (PoC), but more investment is needed to set up a production environment
• Monitoring is key
• Vibrant community and ecosystem
• Fast-paced technology
• Confluent provides a distribution of and support for Apache Kafka
• Oracle Event Hub Service offers a managed Kafka service
Guido Schmutz
Technology Manager
guido.schmutz@trivadis.com
@gschmutz guidoschmutz.wordpress.com