Apache Kafka
Scalable Message Processing and more!
Guido Schmutz - 24.4.2017
@gschmutz guidoschmutz.wordpress.com
Guido Schmutz
Working at Trivadis for more than 20 years
Oracle ACE Director for Fusion Middleware and SOA
Consultant, Trainer, Software Architect for Java, Oracle, SOA and
Big Data / Fast Data
Member of Trivadis Architecture Board
Technology Manager @ Trivadis
More than 30 years of software development experience
Contact: guido.schmutz@trivadis.com
Blog: http://guidoschmutz.wordpress.com
Slideshare: http://www.slideshare.net/gschmutz
Twitter: gschmutz
Apache Kafka - Scalable Message Processing and more!
Agenda
1. Introduction & Motivation
2. Kafka Core
3. Kafka Connect
4. Kafka Streams
5. Kafka and "Big Data" / "Fast Data" Ecosystem
6. Kafka in Enterprise Architecture
7. Confluent Data Platform
8. Summary
Introduction & Motivation
Apache Kafka - Overview
Distributed publish-subscribe messaging system
Designed for processing of real time activity stream data (logs, metrics
collections, social media streams, …)
Initially developed at LinkedIn, now part of Apache
Does not use JMS API and standards
Kafka maintains feeds of messages in topics
Apache Kafka - Motivation
LinkedIn’s motivation for Kafka was:
• "A unified platform for handling all the real-time data feeds a large company might
have."
Must haves
• High throughput to support high volume event feeds
• Support real-time processing of these feeds to create new, derived feeds.
• Support large data backlogs to handle periodic ingestion from offline systems
• Support low-latency delivery to handle more traditional messaging use cases
• Guarantee fault-tolerance in the presence of machine failures
Apache Kafka History
[Figure: Apache Kafka history. Source: Confluent]
Apache Kafka - Unix Analogy
$ cat < in.txt | grep "kafka" | tr a-z A-Z > out.txt
[Figure: the pipeline above annotated with the labels Kafka Connect API (twice), Kafka Streams API and Kafka Core (Cluster). Source: Confluent]
Kafka Core
Kafka High Level Architecture
The who is who
• Producers write data to brokers.
• Consumers read data from brokers.
• All this is distributed.
The data
• Data is stored in topics.
• Topics are split into partitions, which are replicated.
[Figure: Kafka cluster with three producers, three consumers, Broker 1 to Broker 3 and a ZooKeeper ensemble]
Apache Kafka - Architecture
[Figure: a Truck, a Kafka Broker holding a Movement Topic and an Engine-Metrics Topic (messages 1 to 6 each), a Movement Processor and an Engine Processor]
Apache Kafka - Architecture
[Figure: the same components, now with topic partitions (labels Partition 0, Partition 0 and Partition 1, messages 1 to 6 each) and a second Movement Processor]
[Figure: Apache Kafka cluster with three brokers, a Truck and Movement Processors. The Movement Topic has partitions P 0, P 1 and P 2 (messages 1 to 5 each): Kafka Broker 1 holds P 0 and P 2, Kafka Broker 2 holds P 2 and P 1, Kafka Broker 3 holds P 0 and P 1]
Apache Kafka - Architecture
• Write Ahead Log / Commit Log
• Producers always append to tail
• Think of it as appending to a file
[Figure: a Truck, and a Kafka Broker whose Movement Topic holds messages 1 to 5 with message 6 being appended at the tail]
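The append-only idea can be illustrated with a few lines of plain Java. This is only a toy in-memory sketch, not how a broker stores data: the class name CommitLogSketch and the "movement-N" payloads are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy in-memory commit log: records are only ever appended at the tail. */
public class CommitLogSketch {
    private final List<String> records = new ArrayList<>();

    /** Appends a record and returns its 0-based position in the log. */
    public long append(String record) {
        records.add(record);
        return records.size() - 1;
    }

    /** Reads the record at the given position; existing entries are never modified. */
    public String read(long position) {
        return records.get((int) position);
    }

    public static void main(String[] args) {
        CommitLogSketch log = new CommitLogSketch();
        for (int i = 1; i <= 5; i++) {
            log.append("movement-" + i);
        }
        // The sixth record lands at the tail, after the five existing ones.
        long position = log.append("movement-6");
        System.out.println(position + " -> " + log.read(position)); // 5 -> movement-6
    }
}
```

Because writes only touch the tail, readers of older positions are never disturbed by new records.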
Kafka Topics
Creating a topic
• Command line interface
$ kafka-topics.sh --zookeeper zk1:2181 --create \
    --topic my.topic --partitions 3 \
    --replication-factor 2 --config x=y
• Using AdminUtils.createTopic method
• Auto-create via auto.create.topics.enable = true
Modifying a topic
https://kafka.apache.org/documentation.html#basic_ops_modify_topic
Deleting a topic
• Command line interface
Kafka Producer
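import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Configure the producer. The "..." in the serializer names is kept from
// the slide and stands for the package prefix of StringSerializer.
Properties kafkaProps = new Properties();
kafkaProps.put("bootstrap.servers", "broker1:9092,broker2:9092");
kafkaProps.put("key.serializer", "...StringSerializer");
kafkaProps.put("value.serializer", "...StringSerializer");
KafkaProducer<String, String> producer = new KafkaProducer<>(kafkaProps);

// Build a record for the topic and send it without waiting for the result.
ProducerRecord<String, String> record =
    new ProducerRecord<>("topicName", "Key", "Value");
try {
    producer.send(record);
} catch (Exception e) {
    e.printStackTrace(); // report the failure instead of swallowing it
}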
Durability Guarantees
Producer can configure acknowledgements

Value       | Description                                                                                        | Throughput | Latency | Durability
0           | Producer doesn't wait for leader                                                                   | high       | low     | low (no guarantee)
1 (default) | Producer waits for leader; leader sends ack when message is written to log; no wait for followers  | medium     | medium  | medium (leader)
all (-1)    | Producer waits for leader; leader sends ack when all in-sync replicas (ISR) have acknowledged      | low        | high    | high (ISR)
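As a rough sketch, the acknowledgement level is just one more entry in the producer's Properties, under the key "acks". The helper below uses only java.util.Properties so it runs without a Kafka client on the classpath; the class name AcksConfigSketch and the method producerProps are hypothetical, and the broker addresses are the placeholder hosts from the producer example.

```java
import java.util.Properties;

/** Builds producer settings for a chosen acknowledgement level. */
public class AcksConfigSketch {

    /** Accepts "0", "1" or "all"; "-1" is treated as an alias for "all". */
    public static Properties producerProps(String acks) {
        String normalized = acks.equals("-1") ? "all" : acks;
        if (!normalized.equals("0") && !normalized.equals("1") && !normalized.equals("all")) {
            throw new IllegalArgumentException("acks must be 0, 1 or all: " + acks);
        }
        Properties props = new Properties();
        // Placeholder hosts, not real machines.
        props.put("bootstrap.servers", "broker1:9092,broker2:9092");
        props.put("acks", normalized);
        return props;
    }

    public static void main(String[] args) {
        // Favour durability over latency: wait for all in-sync replicas.
        Properties props = producerProps("all");
        System.out.println(props.getProperty("acks")); // all
    }
}
```

The resulting Properties object could then be extended with serializers and handed to a producer; which level to choose follows the trade-offs in the table.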
Apache Kafka - Partition offsets
Offset: messages in the partitions are each assigned a unique (per partition) and
sequential id called the offset
• Consumers track their pointers via (offset, partition, topic) tuples
[Figure: Consumer Group A and Consumer Group B reading from a partition. Source: Apache Kafka]
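The offset bookkeeping can be sketched in plain Java as follows. This is a simplified model with a single partition held in memory; the class OffsetSketch and its methods poll and position are illustrative names, not the Kafka consumer API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of one partition read by several consumer groups. */
public class OffsetSketch {
    private final List<String> partition;                             // list index = offset
    private final Map<String, Integer> nextOffset = new HashMap<>();  // one pointer per group

    public OffsetSketch(List<String> messages) {
        this.partition = new ArrayList<>(messages);
    }

    /** Returns up to max messages for the group and advances only that group's offset. */
    public List<String> poll(String group, int max) {
        int from = nextOffset.getOrDefault(group, 0);
        int to = Math.min(from + max, partition.size());
        nextOffset.put(group, to);
        return new ArrayList<>(partition.subList(from, to));
    }

    /** Offset of the next message the group will read. */
    public int position(String group) {
        return nextOffset.getOrDefault(group, 0);
    }

    public static void main(String[] args) {
        OffsetSketch p = new OffsetSketch(Arrays.asList("m0", "m1", "m2", "m3", "m4"));
        System.out.println("A reads " + p.poll("A", 3)); // A reads [m0, m1, m2]
        System.out.println("B reads " + p.poll("B", 1)); // B reads [m0]
        System.out.println("A at " + p.position("A") + ", B at " + p.position("B")); // A at 3, B at 1
    }
}
```

Reading does not remove anything from the partition, which is why both groups see message m0.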
Data Retention – 4 options
1. Never (data is kept forever)
2. Time based (TTL)
log.retention.{ms | minutes | hours}
3. Size based
log.retention.bytes
4. Log compaction based (older entries with the same key are removed)
kafka-topics.sh --zookeeper localhost:2181 \
    --create --topic customers \
    --replication-factor 1 --partitions 1 \
    --config cleanup.policy=compact
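The effect of log compaction can be sketched in plain Java: for every key, only the most recent entry survives, and the survivors keep their relative order. This is a simplified in-memory model, not the broker's cleaner; the class CompactionSketch and the customer keys and values are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy log compaction over entries of the form {key, value}. */
public class CompactionSketch {

    /** Keeps, for every key, only the most recent entry; survivors stay in log order. */
    public static List<String[]> compact(List<String[]> log) {
        Map<String, Integer> lastIndex = new HashMap<>();
        for (int i = 0; i < log.size(); i++) {
            lastIndex.put(log.get(i)[0], i); // later entries overwrite earlier ones
        }
        List<String[]> compacted = new ArrayList<>();
        for (int i = 0; i < log.size(); i++) {
            if (lastIndex.get(log.get(i)[0]) == i) {
                compacted.add(log.get(i));
            }
        }
        return compacted;
    }

    public static void main(String[] args) {
        List<String[]> log = Arrays.asList(
            new String[] {"c1", "Basel"},
            new String[] {"c2", "Bern"},
            new String[] {"c1", "Zurich"});
        for (String[] entry : compact(log)) {
            System.out.println(entry[0] + "=" + entry[1]);
        }
        // prints c2=Bern, then c1=Zurich
    }
}
```

A compacted topic therefore behaves like a changelog from which the latest state per key can always be rebuilt.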
Apache Kafka – Some numbers
Kafka at LinkedIn => 1800+ broker machines / 79K+ topics
• 1.3 trillion messages per day
• 330 terabytes in/day
• 1.2 petabytes out/day
• Peak load for a single cluster: 2 million messages/sec, 4.7 gigabits/sec inbound, 15 gigabits/sec outbound
Kafka performance on our own infrastructure => 6 brokers (VM) / 1 cluster
• 445'622 messages/second
• 31 MB/second
• 3.0405 ms average latency between producer and consumer
http://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
https://engineering.linkedin.com/kafka/running-kafka-scale
Kafka Connect
Kafka Connect Architecture
[Figure: Kafka Connect architecture. Source: Confluent]
Kafka Connector Hub – Certified Connectors
[Figure: certified connectors. Source: http://www.confluent.io/product/connectors]
Kafka Connector Hub – Additional Connectors
[Figure: additional connectors. Source: http://www.confluent.io/product/connectors]
Kafka Connect – Twitter example
./connect-standalone.sh ../demo-config/connect-simple-source-standalone.properties \
    ../demo-config/twitter-source.properties

twitter-source.properties (connector configuration):
name=twitter-source
connector.class=com.eneco.trading.kafka.connect.twitter.TwitterSourceConnector
tasks.max=1
topic=tweets
twitter.consumerkey=<consumer-key>
twitter.consumersecret=<consumer-secret>
twitter.token=<token>
twitter.secret=<token-secret>
track.terms=bigdata

connect-simple-source-standalone.properties (worker configuration):
bootstrap.servers=localhost:9095,localhost:9096,localhost:9097
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.storage.StringConverter
...
Kafka Streams
• Designed as a simple and lightweight library in Apache Kafka
• No external dependencies on systems other than Apache Kafka
• Part of open source Apache Kafka, introduced in version 0.10
• Leverages Kafka as its internal messaging layer
• Agnostic to resource management and configuration tools
• Supports fault-tolerant local state
• Event-at-a-time processing (not microbatch) with millisecond latency
• Windowing with out-of-order data using a Google Dataflow-like model
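To make event-at-a-time processing and windowing on out-of-order data more concrete, here is a small plain-Java sketch. It does not use the Kafka Streams API: the class WindowSketch is a made-up stand-in that counts events per fixed-size (tumbling) window and picks the window from each event's own timestamp rather than from its arrival order. It assumes non-negative timestamps in milliseconds and ignores topics, state stores and late-data limits.

```java
import java.util.Map;
import java.util.TreeMap;

/** Counts events per fixed-size (tumbling) window, keyed by event time. */
public class WindowSketch {
    private final long windowSizeMs;
    private final Map<Long, Integer> countsByWindowStart = new TreeMap<>();

    public WindowSketch(long windowSizeMs) {
        this.windowSizeMs = windowSizeMs;
    }

    /** Processes a single event; its timestamp alone decides which window it falls into. */
    public void onEvent(long eventTimeMs) {
        long windowStart = eventTimeMs - (eventTimeMs % windowSizeMs);
        countsByWindowStart.merge(windowStart, 1, Integer::sum);
    }

    /** Event counts keyed by window start time, in ascending order. */
    public Map<Long, Integer> counts() {
        return countsByWindowStart;
    }

    public static void main(String[] args) {
        WindowSketch windows = new WindowSketch(1000);
        long[] arrivals = {100, 1200, 900, 1999, 2000}; // 900 arrives after 1200
        for (long eventTime : arrivals) {
            windows.onEvent(eventTime); // one event at a time, no batching
        }
        System.out.println(windows.counts()); // {0=2, 1000=2, 2000=1}
    }
}
```

The late event with timestamp 900 is still counted in the first window, which is the behaviour the windowing bullet refers to.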
Streams API in the context of Kafka
[Figure: Streams API in the context of Kafka. Source: Confluent]
Kafka and "Big Data" / "Fast Data" Ecosystem
Kafka and the Big Data / Fast Data ecosystem
Kafka integrates with many popular products / frameworks
• Apache Spark Streaming
• Apache Flink
• Apache Storm
• Apache NiFi
• StreamSets
• Apache Flume
• Oracle Stream Analytics
• Oracle Service Bus
• Oracle GoldenGate
• Spring Integration Kafka Support
• …
Storm: built-in Kafka Spout to consume events from Kafka
Kafka in “Enterprise Architecture”
Traditional Big Data Architecture
[Figure: architecture diagram. Components shown: Billing & Ordering, CRM / Profile and Marketing Campaigns; File Import / SQL Import; a Big Data Cluster (Hadoop Cluster) with Distributed Filesystem, Parallel Batch Processing (Machine Learning, Graph Algorithms, Natural Language Processing), NoSQL, SQL and Search; BI Tools, Enterprise Data Warehouse, Search, Online & Mobile Apps]
Event Hub – handle event stream data
[Figure: architecture diagram. Components shown: Location, Social, Click stream, Sensor Data, Weather Data, Call Center and Mobile Apps next to Billing & Ordering, CRM / Profile and Marketing Campaigns; Event Hubs and a Data Flow; a Big Data Cluster (Hadoop Cluster) with Distributed Filesystem, Parallel Batch Processing (Machine Learning, Graph Algorithms, Natural Language Processing), NoSQL, SQL and Search; BI Tools, Enterprise Data Warehouse, Search, Online & Mobile Apps]
Event Hub – taking Velocity into account
[Figure: architecture diagram. Components shown: Location, Social, Click stream, Sensor Data, Weather Data, Call Center, Mobile Apps, Billing & Ordering, CRM / Profile and Marketing Campaigns; Event Hubs; File Import / SQL Import; Batch Analytics in a Big Data Cluster (Hadoop Cluster) with Distributed Filesystem, Parallel Batch Processing, NoSQL, SQL and Search; Streaming Analytics with Stream Analytics, NoSQL and Reference / Models; Dashboard, BI Tools, Enterprise Data Warehouse, Search, Online & Mobile Apps]
Event Hub – Asynchronous Microservice Architecture
[Figure: architecture diagram. Components shown: Location, Social, Click stream, Sensor Data, Weather Data, Call Center, Mobile Apps, Billing & Ordering, CRM / Profile and Marketing Campaigns; Event Hubs; File Import / SQL Import; a Big Data Cluster (Hadoop Cluster) with Distributed Filesystem, Parallel Batch Processing, SQL and Search; a Container running a Microservice with NoSQL, RDBMS and an API; BI Tools, Enterprise Data Warehouse, Search, Online & Mobile Apps]
Confluent Platform
Confluent Data Platform 3.2
[Two figures: Confluent Data Platform 3.2. Source: Confluent]
Confluent Enterprise – Control Center
[Figure: Confluent Enterprise Control Center. Source: Confluent]
Summary
• Kafka can scale to millions of messages per second, and more
• Easy to start with a proof of concept (PoC), but setting up a production environment takes more investment
• Monitoring is key
• Vibrant community and ecosystem
• Fast-paced technology
• Confluent provides distribution and support for Apache Kafka
• Oracle Event Hub Service offers a managed Kafka service
Customer Event Hub – mapping of technologies
[Figure: the Event Hub architecture with Batch Analytics and Streaming Analytics, annotated with the technologies used for each component. Components shown: Location, Social, Click stream, Sensor Data, Weather Data, Call Center, Mobile Apps, Billing & Ordering, CRM / Profile and Marketing Campaigns; Event Hubs; SQL Import; Hadoop Cluster with Distributed Filesystem, Parallel Processing, NoSQL, SQL and Search; Stream Analytics, NoSQL and Reference / Models; Dashboard, BI Tools, Enterprise Data Warehouse, Search, Online & Mobile Apps]
Guido Schmutz
Technology Manager
guido.schmutz@trivadis.com
@gschmutz guidoschmutz.wordpress.com

Apache Kafka - Scalable Message-Processing and more !

  • 1. BASEL BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENF HAMBURG KOPENHAGEN LAUSANNE MÜNCHEN STUTTGART WIEN ZÜRICH Apache Kafka Scalable Message Processing and more! Guido Schmutz - 24.4.2017 @gschmutz guidoschmutz.wordpress.com
  • 2. Guido Schmutz Working at Trivadis for more than 20 years Oracle ACE Director for Fusion Middleware and SOA Consultant, Trainer Software Architect for Java, Oracle, SOA and Big Data / Fast Data Member of Trivadis Architecture Board Technology Manager @ Trivadis More than 30 years of software development experience Contact: [email protected] Blog: http://guidoschmutz.wordpress.com Slideshare: http://www.slideshare.net/gschmutz Twitter: gschmutz Apache Kafka - Scalable Message Processing and more!
  • 3. Agenda 1. Introduction & Motivation 2. Kafka Core 3. Kafka Connect 4. Kafka Streams 5. Kafka and "Big Data" / "Fast Data" Ecosystem 6. Kafka in Enterprise Architecture 7. Confluent Data Platform 8. Summary Apache Kafka - Scalable Message Processing and more!
  • 4. Introduction & Motivation Apache Kafka - Scalable Message Processing and more!
  • 5. Apache Kafka - Overview Distributed publish-subscribe messaging system Designed for processing of real time activity stream data (logs, metrics collections, social media streams, …) Initially developed at LinkedIn, now part of Apache Does not use JMS API and standards Kafka maintains feeds of messages in topics Apache Kafka - Scalable Message Processing and more!
  • 6. Apache Kafka - Motivation LinkedIn’s motivation for Kafka was: • "A unified platform for handling all the real-time data feeds a large company might have." Must haves • High throughput to support high volume event feeds • Support real-time processing of these feeds to create new, derived feeds. • Support large data backlogs to handle periodic ingestion from offline systems • Support low-latency delivery to handle more traditional messaging use cases • Guarantee fault-tolerance in the presence of machine failures Apache Kafka - Scalable Message Processing and more!
  • 7. Apache Kafka History Apache Kafka - Scalable Message Processing and more! Source: Confluent
  • 8. Apache Kafka - Unix Analogy Apache Kafka - Scalable Message Processing and more! $ cat < in.txt | grep "kafka" | tr a-z A-Z > out.txt Kafka Connect API Kafka Connect APIKafka Streams API Kafka Core (Cluster) Source: Confluent
  • 9. Kafka Core Apache Kafka - Scalable Message Processing and more!
  • 10. Kafka High Level Architecture The who is who • Producers write data to brokers. • Consumers read data from brokers. • All this is distributed. The data • Data is stored in topics. • Topics are split into partitions, which are replicated. Kafka Cluster Consumer Consumer Consumer Producer Producer Producer Broker 1 Broker 2 Broker 3 Zookeeper Ensemble Apache Kafka - Scalable Message Processing and more!
  • 11. Apache Kafka - Architecture Kafka Broker Movement Processor Movement Topic Engine-Metrics Topic 1 2 3 4 5 6 Engine Processor1 2 3 4 5 6 Truck Apache Kafka - Scalable Message Processing and more!
  • 12. Apache Kafka - Architecture Kafka Broker Movement Processor Movement Topic Engine-Metrics Topic 1 2 3 4 5 6 Engine Processor Partition 0 1 2 3 4 5 6 Partition 0 1 2 3 4 5 6 Partition 1 Movement Processor Truck Apache Kafka - Scalable Message Processing and more!
  • 13. Apache Kafka Kafka Broker 1 Movement Processor Truck Movement Topic P 0 Movement Processor 1 2 3 4 5 P 2 1 2 3 4 5 Kafka Broker 2 Movement Topic P 2 1 2 3 4 5 P 1 1 2 3 4 5 Kafka Broker 3 Movement Topic P 0 1 2 3 4 5 P 1 1 2 3 4 5 Movement Processor
  • 14. Apache Kafka - Architecture • Write Ahead Log / Commit Log • Producers always append to tail • think append to file Kafka Broker Movement Topic 1 2 3 4 5 Truck 6 6 Apache Kafka - Scalable Message Processing and more!
  • 15. Kafka Topics Creating a topic • Command line interface • Using AdminUtils.createTopic method • Auto-create via auto.create.topics.enable = true Modifying a topic https://kafka.apache.org/documentation.html#basic_ops_modify_topic Deleting a topic • Command Line interface $ kafka-topics.sh –zookeeper zk1:2181 --create --topic my.topic –-partitions 3 –-replication-factor 2 --config x=y Apache Kafka - Scalable Message Processing and more!
  • 16. Kafka Producer Apache Kafka - Scalable Message Processing and more! private Properties kafkaProps = new Properties(); kafkaProps.put("bootstrap.servers","broker1:9092,broker2:9092"); kafkaProps.put("key.serializer", "...StringSerializer"); kafkaProps.put("value.serializer", "...StringSerializer"); producer = new KafkaProducer<String, String>(kafkaProps); ProducerRecord<String, String> record = new ProducerRecord<>(”topicName", ”Key", ”Value"); try { producer.send(record); } catch (Exception e) {}
  • 17. Durability Guarantees Producer can configure acknowledgements Apache Kafka - Scalable Message Processing and more! Value Description Throughput Latency Durability 0 • Producer doesn’t wait for leader high low low (no guarantee) 1 (default) • Producer waits for leader • Leader sends ack when message written to log • No wait for followers medium medium medium (leader) all (-1) • Producer waits for leader • Leader sends ack when all In-Sync Replica have acknowledged low high high (ISR)
  • 18. Apache Kafka - Partition offsets Offset: messages in the partitions are each assigned a unique (per partition) and sequential id called the offset • Consumers track their pointers via (offset, partition, topic) tuples Consumer Group A Consumer Group B Apache Kafka - Scalable Message Processing and more! Source: Apache Kafka
  • 19. Data Retention – 3 options 1. Never 2. Time based (TTL) log.retention.{ms | minutes | hours} 3. Size based log.retention.bytes 4. Log compaction based (entries with same key are removed) kafka-topics.sh --zookeeper localhost:2181 --create --topic customers --replication-factor 1 --partitions 1 --config cleanup.policy=compact Apache Kafka - Scalable Message Processing and more!
  • 20. Apache Kafka – Some numbers Kafka at LinkedIn => over 1800+ broker machines / 79K+ Topics Kafka Performance at our own infrastructure => 6 brokers (VM) / 1 cluster • 445’622 messages/second • 31 MB / second • 3.0405 ms average latency between producer / consumer 1.3 Trillion messages per day 330 Terabytes in/day 1.2 Petabytes out/day Peak load for a single cluster 2 million messages/sec 4.7 Gigabits/sec inbound 15 Gigabits/sec outbound http://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines https://engineering.linkedin.com/kafka/running-kafka-scale Apache Kafka - Scalable Message Processing and more!
  • 21. Kafka Connect Apache Kafka - Scalable Message Processing and more!
  • 22. Kafka Connect Architecture Apache Kafka - Scalable Message Processing and more! Source: Confluent
  • 23. Kafka Connector Hub – Certified Connectors [Figure. Source: http://www.confluent.io/product/connectors]
  • 24. Kafka Connector Hub – Additional Connectors [Figure. Source: http://www.confluent.io/product/connectors]
  • 25. Kafka Connect – Twitter example
    Start a standalone worker with the worker configuration and the connector configuration:
      ./connect-standalone.sh ../demo-config/connect-simple-source-standalone.properties ../demo-config/twitter-source.properties
    Connector configuration (twitter-source.properties):
      name=twitter-source
      connector.class=com.eneco.trading.kafka.connect.twitter.TwitterSourceConnector
      tasks.max=1
      topic=tweets
      twitter.consumerkey=<consumer-key>
      twitter.consumersecret=<consumer-secret>
      twitter.token=<token>
      twitter.secret=<token-secret>
      track.terms=bigdata
    Worker configuration (connect-simple-source-standalone.properties):
      bootstrap.servers=localhost:9095,localhost:9096,localhost:9097
      key.converter=org.apache.kafka.connect.storage.StringConverter
      value.converter=org.apache.kafka.connect.storage.StringConverter
      ...
  • 26. Kafka Streams
  • 27. Kafka Streams
      • Designed as a simple and lightweight library in Apache Kafka
      • No external dependencies on systems other than Apache Kafka
      • Part of open source Apache Kafka, introduced in 0.10
      • Leverages Kafka as its internal messaging layer
      • Agnostic to resource management and configuration tools
      • Supports fault-tolerant local state
      • Event-at-a-time processing (not microbatch) with millisecond latency
      • Windowing with out-of-order data using a Google Dataflow-like model
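The last two points (event-at-a-time processing and windowing with out-of-order data) can be illustrated with a toy tumbling-window counter. This is a conceptual sketch in plain Python, not the Kafka Streams API; `TumblingWindowCounter`, `window_start`, the 60-second window size and the sample events are all assumptions made for the example.

```python
from collections import defaultdict


def window_start(event_time_ms, window_size_ms):
    """Start of the tumbling window that contains the event timestamp."""
    return event_time_ms - (event_time_ms % window_size_ms)


class TumblingWindowCounter:
    """Counts events per (key, window), one event at a time."""

    def __init__(self, window_size_ms):
        self.window_size_ms = window_size_ms
        self.counts = defaultdict(int)  # local state: (key, window start) -> count

    def process(self, key, event_time_ms):
        # The window is chosen by the event's own timestamp, not by arrival
        # order, so a late event still updates the window it belongs to.
        start = window_start(event_time_ms, self.window_size_ms)
        self.counts[(key, start)] += 1
        return start, self.counts[(key, start)]


counter = TumblingWindowCounter(window_size_ms=60_000)

# The third event arrives after the second but carries an earlier timestamp.
events = [("bigdata", 1_000), ("bigdata", 61_000), ("bigdata", 59_000)]
results = [counter.process(key, ts) for key, ts in events]

for result in results:
    print(result)
```

Each call to `process` handles exactly one event and updates the result immediately, which is the contrast with collecting events into micro-batches before processing them.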
  • 28. Streams API in the context of Kafka [Figure. Source: Confluent]
  • 29. Kafka and "Big Data" / "Fast Data" Ecosystem
  • 30. Kafka and the Big Data / Fast Data ecosystem
    Kafka integrates with many popular products / frameworks:
      • Apache Spark Streaming
      • Apache Flink
      • Apache Storm
      • Apache NiFi
      • StreamSets
      • Apache Flume
      • Oracle Stream Analytics
      • Oracle Service Bus
      • Oracle GoldenGate
      • Spring Integration Kafka Support
      • …
    Callout: Storm has a built-in Kafka Spout to consume events from Kafka
  • 31. Kafka in “Enterprise Architecture”
  • 32. Traditional Big Data Architecture [Diagram; labels: Big Data Cluster (Hadoop Cluster), Billing & Ordering, CRM / Profile, Marketing Campaigns, File Import / SQL Import, Parallel Batch Processing, Distributed Filesystem, NoSQL, SQL, Search, Machine Learning, Graph Algorithms, Natural Language Processing, BI Tools, Enterprise Data Warehouse, Online & Mobile Apps]
  • 33. Event Hub – handle event stream data [Diagram; labels: Event Hub, Data Flow, Big Data Cluster (Hadoop Cluster), Location, Social, Click stream, Sensor Data, Weather Data, Mobile Apps, Call Center, Billing & Ordering, CRM / Profile, Marketing Campaigns, Parallel Batch Processing, Distributed Filesystem, NoSQL, SQL, Search, Machine Learning, Graph Algorithms, Natural Language Processing, BI Tools, Enterprise Data Warehouse, Online & Mobile Apps]
  • 34. Event Hub – taking Velocity into account [Diagram; labels: Event Hub, Batch Analytics, Streaming Analytics, Big Data Cluster (Hadoop Cluster), Location, Social, Click stream, Sensor Data, Weather Data, Mobile Apps, Call Center, Billing & Ordering, CRM / Profile, Marketing Campaigns, File Import / SQL Import, Parallel Batch Processing, Distributed Filesystem, Stream Analytics, NoSQL, Reference / Models, SQL, Search, Dashboard, BI Tools, Enterprise Data Warehouse, Online & Mobile Apps]
  • 35. Event Hub – Asynchronous Microservice Architecture [Diagram; labels: Event Hub, Container, Microservice, { } API, NoSQL, RDBMS, Big Data Cluster (Hadoop Cluster), Location, Social, Click stream, Sensor Data, Weather Data, Mobile Apps, Call Center, Billing & Ordering, CRM / Profile, Marketing Campaigns, File Import / SQL Import, Parallel Batch Processing, Distributed Filesystem, SQL, Search, BI Tools, Enterprise Data Warehouse, Online & Mobile Apps]
  • 36. Confluent Platform
  • 37. Confluent Data Platform 3.2 [Figure. Source: Confluent]
  • 38. Confluent Data Platform 3.2 [Figure. Source: Confluent]
  • 39. Confluent Enterprise – Control Center [Figure. Source: Confluent]
  • 40. Summary
  • 41. Summary
      • Kafka can scale to millions of messages per second, and more
      • Easy to start with in a Proof of Concept (PoC), but setting up a production environment takes more investment
      • Monitoring is key
      • Vibrant community and ecosystem
      • Fast-paced technology
      • Confluent provides a distribution of and support for Apache Kafka
      • Oracle Event Hub Service offers Kafka as a managed service
  • 42. Customer Event Hub – mapping of technologies [Diagram; labels: Event Hub, Batch Analytics, Streaming Analytics, Hadoop Cluster, Location, Social, Click stream, Sensor Data, Weather Data, Mobile Apps, Call Center, Billing & Ordering, CRM / Profile, Marketing Campaigns, SQL Import, Parallel Processing, Distributed Filesystem, Stream Analytics, NoSQL, Reference / Models, SQL, Search, Dashboard, BI Tools, Enterprise Data Warehouse, Online & Mobile Apps]
  • 43. Guido Schmutz, Technology Manager, guido.schmutz@trivadis.com, @gschmutz, guidoschmutz.wordpress.com