BASEL BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENF
HAMBURG KOPENHAGEN LAUSANNE MÜNCHEN STUTTGART WIEN ZÜRICH
Kafka Connect & Kafka Streams/KSQL
The Ecosystem around Kafka
Guido Schmutz – 4.4.2018
@gschmutz guidoschmutz.wordpress.com
Guido Schmutz
Working at Trivadis for more than 21 years
Oracle ACE Director for Fusion Middleware and SOA
Consultant, Trainer, Software Architect for Java, Oracle, SOA and
Big Data / Fast Data
Head of Trivadis Architecture Board
Technology Manager @ Trivadis
More than 30 years of software development experience
Contact: guido.schmutz@trivadis.com
Blog: http://guidoschmutz.wordpress.com
Slideshare: http://www.slideshare.net/gschmutz
Twitter: gschmutz
Kafka Connect & Streams - the Ecosystem around Kafka
Our company.
Kafka Connect & Streams - the Ecosystem around Kafka
Trivadis is a market leader in IT consulting, system integration, solution engineering
and the provision of IT services focusing on and
technologies
in Switzerland, Germany, Austria and Denmark. We offer our services in the following
strategic business fields:
Trivadis Services takes over the interacting operation of your IT systems.
O P E R A T I O N
COPENHAGEN
MUNICH
LAUSANNE
BERN
ZURICH
BRUGG
GENEVA
HAMBURG
DÜSSELDORF
FRANKFURT
STUTTGART
FREIBURG
BASEL
VIENNA
With over 600 specialists and IT experts in your region.
Kafka Connect & Streams - the Ecosystem around Kafka
14 Trivadis branches and more than
600 employees
200 Service Level Agreements
Over 4,000 training participants
Research and development budget:
CHF 5.0 million
Financially self-supporting and
sustainably profitable
Experience from more than 1,900
projects per year at over 800
customers
Agenda
1. What is Apache Kafka?
2. Kafka Connect
3. Kafka Streams
4. KSQL
5. Kafka and "Big Data" / "Fast Data" Ecosystem
6. Kafka in Software Architecture
Kafka Connect & Streams - the Ecosystem around Kafka
What is Apache Kafka?
Kafka Connect & Streams - the Ecosystem around Kafka
Apache Kafka History
2012 2013 2014 2015 2016 2017
Cluster mirroring
data compression
Intra-cluster
replication
0.7
0.8
0.9
Data Processing
(Streams API)
0.10
Data Integration
(Connect API)
0.11
2018
Exactly Once
Semantics
Performance
Improvements
KSQL Developer
Preview
Kafka Connect & Streams - the Ecosystem around Kafka
1.0 JBOD Support
Support Java 9
Apache Kafka – A Streaming Platform
Kafka Connect & Kafka Streams/KSQL
High-Level Architecture
Distributed Log at the Core
Scale-Out Architecture
Logs do not (necessarily) forget
Strong Ordering Guarantees
most business systems need strong
ordering guarantees
messages that require relative
ordering need to be sent to the same
partition
supply same key for
all messages that
require a relative order
To maintain global ordering use a
single partition topic
(Diagram: Producer 1 writes records keyed Key-1 to Key-6 across Brokers 1-3; Consumers 1-3 read them, and records with the same key always end up in the same partition)
Kafka Connect & Streams - the Ecosystem around Kafka
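As a hedged illustration of the key-to-partition rule (assuming a KafkaProducer<String, String> like the one shown in the producer demo later in this deck):

// Records that share a key are hashed to the same partition by the default
// partitioner, so their relative order is preserved for consumers.
String driverId = "11";   // hypothetical key: keep ordering per driver
producer.send(new ProducerRecord<String, String>("truck_position", driverId, "position update 1"));
producer.send(new ProducerRecord<String, String>("truck_position", driverId, "position update 2"));
// A record without a key is spread across partitions and carries no ordering guarantee.
producer.send(new ProducerRecord<String, String>("truck_position", null, "unordered event"));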
Durable and Highly Available Messaging
(Diagram: Producer 1 writing to Brokers 1-3 while Consumers 1 and 2 read, shown in two variants to illustrate durable, highly available messaging)
Durable and Highly Available Messaging (II)
(Diagram: the same setup with Producer 1, Brokers 1-3 and Consumers 1 and 2, continuing the high-availability scenario)
Hold Data for Long-Term – Data Retention
Producer 1
Broker 1
Broker 2
Broker 3
1. Never
2. Time based (TTL)
log.retention.{ms | minutes | hours}
3. Size based
log.retention.bytes
4. Log compaction based
(entries with same key are removed):
kafka-topics.sh --zookeeper zk:2181 \
  --create --topic customers \
  --replication-factor 1 \
  --partitions 1 \
  --config cleanup.policy=compact
Kafka Connect & Streams - the Ecosystem around Kafka
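These retention settings can also be applied per topic at runtime; a minimal sketch using the Java AdminClient (available since Kafka 0.11, broker address assumed):

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

Properties props = new Properties();
props.put("bootstrap.servers", "broker-1:9092");          // assumed broker address
AdminClient admin = AdminClient.create(props);

// topic-level override: time-based retention of 7 days for the customers topic
ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "customers");
Config retentionConfig = new Config(Collections.singletonList(
    new ConfigEntry("retention.ms", "604800000")));
try {
  admin.alterConfigs(Collections.singletonMap(topic, retentionConfig)).all().get();
} catch (Exception e) {}
admin.close();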
Keep Topics in Compacted Form
0 1 2 3 4 5 6 7 8 9 10 11
K1 K2 K1 K1 K3 K2 K4 K5 K5 K2 K6 K2
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11
Offset
Key
Value
3 4 6 8 9 10
K1 K3 K4 K5 K2 K6
V4 V5 V7 V9 V10 V11
Offset
Key
Value
Compaction
Kafka Connect & Streams - the Ecosystem around Kafka
How to get a Kafka environment
Kafka Connect & Streams - the Ecosystem around Kafka
On Premises
• Bare Metal Installation
• Docker
• Mesos / Kubernetes
• Hadoop Distributions
Cloud
• Oracle Event Hub Cloud Service
• Azure HDInsight Kafka
• Confluent Cloud
• …
Demo (I)
Truck-2
truck position
Truck-1
Truck-3
console consumer
Testdata-Generator by Hortonworks
Kafka Connect & Streams - the Ecosystem around Kafka
1522846456703,101,31,1927624662,Normal,37.31,-94.31,-4802309397906690837
Demo (I) – Create Kafka Topic
$ kafka-topics --zookeeper zookeeper:2181 --create \
  --topic truck_position --partitions 8 --replication-factor 1
$ kafka-topics --zookeeper zookeeper:2181 --list
__consumer_offsets
_confluent-metrics
_schemas
docker-connect-configs
docker-connect-offsets
docker-connect-status
truck_position
Kafka Connect & Streams - the Ecosystem around Kafka
Demo (I) – Run Producer and Kafka-Console-Consumer
Kafka Connect & Streams - the Ecosystem around Kafka
Demo (I) – Java Producer to "truck_position"
Constructing a Kafka Producer
private Properties kafkaProps = new Properties();
kafkaProps.put("bootstrap.servers","broker-1:9092);
kafkaProps.put("key.serializer", "...StringSerializer");
kafkaProps.put("value.serializer", "...StringSerializer");
producer = new KafkaProducer<String, String>(kafkaProps);
ProducerRecord<String, String> record =
new ProducerRecord<>("truck_position", driverId, eventData);
try {
metadata = producer.send(record).get();
} catch (Exception e) {}
Kafka Connect & Streams - the Ecosystem around Kafka
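The console consumer used in the demo could equally be written in a few lines of Java; a minimal consumer sketch in the same style as the producer slide (hypothetical group id, same assumed broker and String deserializers):

private Properties consumerProps = new Properties();
consumerProps.put("bootstrap.servers", "broker-1:9092");
consumerProps.put("group.id", "truck-position-console");   // hypothetical consumer group
consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

consumer = new KafkaConsumer<String, String>(consumerProps);
consumer.subscribe(Collections.singletonList("truck_position"));

while (true) {
  ConsumerRecords<String, String> records = consumer.poll(100);   // poll(long) in pre-2.0 clients
  for (ConsumerRecord<String, String> record : records) {
    System.out.println(record.key() + " -> " + record.value());
  }
}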
Demo (II) – devices send to MQTT instead of Kafka
Truck-2
truck/nn/position
Truck-1
Truck-3
Kafka Connect & Streams - the Ecosystem around Kafka
1522846456703,101,31,1927624662,Normal,37.31,-94.31,-4802309397906690837
Demo (II) – devices send to MQTT instead of Kafka
Kafka Connect & Streams - the Ecosystem around Kafka
Demo (II) - devices send to MQTT instead of Kafka – how to get the data into Kafka?
Truck-2
truck/nn/position
Truck-1
Truck-3
truck position raw
?
Kafka Connect & Streams - the Ecosystem around Kafka
1522846456703,101,31,1927624662,Normal,37.31,-94.31,-4802309397906690837
Apache Kafka – wait there is more!
Microservices with Kafka Ecosystem
Source
Connector
trucking_
driver
Kafka Broker
Sink
Connector
Stream
Processing
Kafka Connect
Kafka Connect & Streams - the Ecosystem around Kafka
Kafka Connect - Overview
Source
Connector
Sink
Connector
Kafka Connect & Streams - the Ecosystem around Kafka
Kafka Connect – Single Message Transforms (SMT)
Simple Transformations for a single message
Defined as part of Kafka Connect
• some useful transforms provided out-of-the-box
• Easily implement your own
Optionally deploy 1+ transforms with each
connector
• Modify messages produced by source
connector
• Modify messages sent to sink connectors
Makes it much easier to mix and match connectors
Some of currently available
transforms:
• InsertField
• ReplaceField
• MaskField
• ValueToKey
• ExtractField
• TimestampRouter
• RegexRouter
• SetSchemaMetaData
• Flatten
• TimestampConverter
Kafka Connect & Streams - the Ecosystem around Kafka
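"Easily implement your own" means implementing the Transformation interface of the Connect API; a hedged sketch of a hypothetical SMT that prefixes the topic name of every record passing through:

import java.util.Map;
import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.ConnectRecord;
import org.apache.kafka.connect.transforms.Transformation;

public class TopicPrefixTransform<R extends ConnectRecord<R>> implements Transformation<R> {

  private String prefix = "raw_";

  @Override
  public void configure(Map<String, ?> configs) {
    Object p = configs.get("prefix");
    if (p != null) prefix = p.toString();
  }

  @Override
  public R apply(R record) {
    // keep key, value and schemas, only rename the target topic
    return record.newRecord(prefix + record.topic(), record.kafkaPartition(),
        record.keySchema(), record.key(),
        record.valueSchema(), record.value(),
        record.timestamp());
  }

  @Override
  public ConfigDef config() {
    return new ConfigDef().define("prefix", ConfigDef.Type.STRING, "raw_",
        ConfigDef.Importance.LOW, "prefix added to the topic name");
  }

  @Override
  public void close() {}
}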
Kafka Connect – Many Connectors
60+ since first release (0.9+)
20+ from Confluent and Partners
Source: http://www.confluent.io/product/connectors
Confluent supported Connectors
Certified Connectors Community Connectors
Kafka Connect & Streams - the Ecosystem around Kafka
Demo (III)
Truck-2
truck/nn/position
Truck-1
Truck-3
mqtt to kafka
truck_position
console consumer
Kafka Connect & Streams - the Ecosystem around Kafka
1522846456703,101,31,1927624662,Normal,37.31,-94.31,-4802309397906690837
Demo (III) – Create MQTT Connect through REST API
#!/bin/bash
curl -X "POST" "http://192.168.69.138:8083/connectors" \
  -H "Content-Type: application/json" \
  -d $'{
"name": "mqtt-source",
"config": {
"connector.class":
"com.datamountaineer.streamreactor.connect.mqtt.source.MqttSourceConnector",
"connect.mqtt.connection.timeout": "1000",
"tasks.max": "1",
"connect.mqtt.kcql":
"INSERT INTO truck_position SELECT * FROM truck/+/position",
"name": "MqttSourceConnector",
"connect.mqtt.service.quality": "0",
"connect.mqtt.client.id": "tm-mqtt-connect-01",
"connect.mqtt.converter.throw.on.error": "true",
"connect.mqtt.hosts": "tcp://mosquitto:1883"
}
}'
Kafka Connect & Streams - the Ecosystem around Kafka
Demo (III) – Call REST API and Kafka Console
Consumer
Kafka Connect & Streams - the Ecosystem around Kafka
Demo (III)
Truck-2
truck/nn/position
Truck-1
Truck-3
mqtt to kafka
truck_position
console consumer
what about some analytics?
Kafka Connect & Streams - the Ecosystem around Kafka
1522846456703,101,31,1927624662,Normal,37.31,-94.31,-4802309397906690837
Kafka Streams
Kafka Connect & Streams - the Ecosystem around Kafka
Kafka Streams - Overview
• Designed as a simple and lightweight library in Apache
Kafka
• no external dependencies on systems other than Apache
Kafka
• Part of open source Apache Kafka, introduced in 0.10+
• Leverages Kafka as its internal messaging layer
• Supports fault-tolerant local state
• Event-at-a-time processing (not microbatch) with millisecond
latency
• Windowing with out-of-order data using a Google DataFlow-like
model
Kafka Connect & Streams - the Ecosystem around Kafka
Kafka Stream DSL and Processor Topology
KStream<Integer, String> stream1 =
builder.stream("in-1");
KStream<Integer, String> stream2=
builder.stream("in-2");
KStream<Integer, String> joined =
stream1.leftJoin(stream2, …);
KTable<> aggregated =
joined.groupBy(…).count("store");
aggregated.to("out-1");
(Diagram: the resulting processor topology with the two source nodes, the leftJoin, the aggregation with its state store, and the sink writing to out-1)
Kafka Connect & Streams - the Ecosystem around Kafka
Kafka Streams Cluster
(Diagram: the processor topology runs inside the Kafka Streams application, while the Kafka cluster holds the input-1 and input-2 topics, the state store changelog and the output topic)
Kafka Connect & Streams - the Ecosystem around Kafka
(Diagram: topics input-1 and input-2, each with partitions 0-3, processed by two Kafka Streams instances that share the processor topology)
Kafka Connect & Streams - the Ecosystem around Kafka
(Diagram: the same topology scaled out to four Kafka Streams instances, each taking over a share of the partitions)
Kafka Connect & Streams - the Ecosystem around Kafka
Stream-Table Duality
We can view a table as a stream
We can view a stream as a table
A stream can be considered a
changelog of a table, where each
data record in the stream captures
a state change of the table
A table can be considered a
snapshot of the latest value for
each key in a stream
Source: Confluent
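In the Kafka Streams DSL the duality is directly visible; a minimal sketch (0.11/1.0-era API, default String serdes and a hypothetical store name assumed):

KStreamBuilder builder = new KStreamBuilder();

// stream -> table: keep only the latest value per key (the "snapshot" view)
KStream<String, String> positions = builder.stream("truck_position");
KTable<String, String> latestPositionPerTruck =
    positions.groupByKey().reduce((oldValue, newValue) -> newValue, "latest-position-store");

// table -> stream: every update to the table is emitted as a changelog record
latestPositionPerTruck.toStream().to("truck_position_latest");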
Stream vs. Table
Event Stream State Stream (Change Log Stream)
2017-10-02T20:18:46 11,Normal,41.87,-87.67
2017-10-02T20:18:55 11,Normal,40.38,-89.17
2017-10-02T20:18:59 21,Normal,42.23,-91.78
2017-10-02T20:19:01 21,Normal,41.71,-91.32
2017-10-02T20:19:02 11,Normal,38.65,-90.2
2017-10-02T20:19:23 21,Normal,41.71,-91.32
11 2017-10-02T20:18:46,11,Normal,41.87,-87.67
11 2017-10-02T20:18:55,11,Normal,40.38,-89.17
21 2017-10-02T20:18:59,21,Normal,42.23,-91.78
21 2017-10-02T20:19:01,21,Normal,41.71,-91.32
11 2017-10-02T20:19:02,11,Normal,38.65,-90.2
21 2017-10-02T20:19:23,21,Normal,41.71,-91.32
Kafka Connect & Streams - the Ecosystem around Kafka
KStream KTable
Kafka Streams: Key Features
Kafka Connect & Streams - the Ecosystem around Kafka
• Native, 100%-compatible Kafka integration
• Secure stream processing using Kafka's security features
• Elastic and highly scalable
• Fault-tolerant
• Stateful and stateless computations
• Interactive queries
• Time model
• Windowing
• Supports late-arriving and out-of-order data
• Millisecond processing latency, no micro-batching
• At-least-once and exactly-once processing guarantees
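The time model and windowing bullets map onto the DSL roughly like this; a hedged sketch counting dangerous-driving events per driver in one-minute tumbling windows (builder and String serdes as in the demo that follows):

KStream<String, String> dangerousDriving = builder.stream("dangerous_driving");

// count events per key (driverId) in 1-minute tumbling windows; late-arriving
// records are added to the window they belong to according to their event time
KTable<Windowed<String>, Long> alertsPerMinute =
    dangerousDriving.groupByKey()
                    .count(TimeWindows.of(60 * 1000L), "dangerous-driving-counts");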
Demo (IV)
Truck-2
truck/nn/position
Truck-1
Truck-3
mqtt to kafka
truck_position_s
detect_dangerous_driving
dangerous_driving
console consumer
Kafka Connect & Streams - the Ecosystem around Kafka
1522846456703,101,31,1927624662,Normal,37.31,-94.31,-4802309397906690837
Demo (IV) - Create Stream
final KStreamBuilder builder = new KStreamBuilder();
KStream<String, String> source =
builder.stream(stringSerde, stringSerde, "truck_position");
KStream<String, TruckPosition> positions =
source.map((key,value) ->
new KeyValue<>(key, TruckPosition.create(key,value)));
KStream<String, TruckPosition> filtered =
positions.filter(TruckPosition::filterNonNORMAL);
filtered.map((key,value) -> new KeyValue<>(key,value.toCSV()))
.to("dangerous_driving");
Kafka Connect & Streams - the Ecosystem around Kafka
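To actually run this topology, the builder still needs the usual streams configuration and a KafkaStreams instance; a minimal completion sketch (application id and broker address are assumptions):

Properties config = new Properties();
config.put(StreamsConfig.APPLICATION_ID_CONFIG, "detect-dangerous-driving");  // assumed app id
config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker-1:9092");
config.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
config.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());

KafkaStreams streams = new KafkaStreams(builder, config);
streams.start();

// close the application cleanly on shutdown
Runtime.getRuntime().addShutdownHook(new Thread(streams::close));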
KSQL
Kafka Connect & Streams - the Ecosystem around Kafka
KSQL: a Streaming SQL Engine for Apache Kafka
• Enables stream processing with zero coding required
• The simplest way to process streams of data in real-time
• Powered by Kafka and Kafka Streams: scalable, distributed, mature
• All you need is Kafka – no complex deployments
• available as Developer preview!
• STREAM and TABLE as first-class citizens
• STREAM = data in motion
• TABLE = collected state of a stream
• join STREAM and TABLE
Kafka Connect & Streams - the Ecosystem around Kafka
KSQL Deployment Models
Standalone Mode Cluster Mode
Source: Confluent
Kafka Connect & Streams - the Ecosystem around Kafka
Demo (IV)
Truck-2
truck/nn/position
Truck-1
Truck-3
mqtt to kafka
truck_position_s
detect_dangerous_driving
dangerous_driving
console consumer
Kafka Connect & Streams - the Ecosystem around Kafka
1522846456703,101,31,1927624662,Normal,37.31,-94.31,-4802309397906690837
Demo (IV) - Start Kafka KSQL
$ docker-compose exec ksql-cli ksql-cli local --bootstrap-server broker-1:9092
[KSQL ASCII-art banner: Streaming SQL Engine for Kafka]
Copyright 2017 Confluent Inc.
CLI v0.1, Server v0.1 located at http://localhost:9098
Having trouble? Type 'help' (case-insensitive) for a rundown of how things work!
ksql>
Kafka Connect & Streams - the Ecosystem around Kafka
Demo (IV) - Create Stream
ksql> CREATE STREAM truck_position_s 
(ts VARCHAR, 
truckId VARCHAR, 
driverId BIGINT, 
routeId BIGINT, 
eventType VARCHAR, 
latitude DOUBLE, 
longitude DOUBLE, 
correlationId VARCHAR) 
WITH (kafka_topic='truck_position', 
value_format='DELIMITED');
Message
----------------
Stream created
Kafka Connect & Streams - the Ecosystem around Kafka
Demo (IV) - Create Stream
ksql> SELECT * FROM truck_position_s;
1522847870317 | "truck/13/position0 | 1522847870310 | 44 | 13 | 1390372503 |
Normal | 41.71 | -91.32 | -2458274393837068406
1522847870376 | "truck/14/position0 | 1522847870370 | 35 | 14 | 1961634315 |
Normal | 37.66 | -94.3 | -2458274393837068406
1522847870418 | "truck/21/position0 | 1522847870410 | 58 | 21 | 137128276 |
Normal | 36.17 | -95.99 | -2458274393837068406
1522847870397 | "truck/29/position0 | 1522847870390 | 18 | 29 | 1090292248 |
Normal | 41.67 | -91.24 | -2458274393837068406
ksql> SELECT * FROM truck_position_s WHERE eventType != 'Normal';
1522847914246 | "truck/11/position0 | 1522847914240 | 54 | 11 | 1198242881 |
Lane Departure | 40.86 | -89.91 | -2458274393837068406
1522847915125 | "truck/10/position0 | 1522847915120 | 93 | 10 | 1384345811 |
Overspeed | 40.38 | -89.17 | -2458274393837068406
1522847919216 | "truck/12/position0 | 1522847919210 | 75 | 12 | 24929475 |
Overspeed | 42.23 | -91.78 | -2458274393837068406
Kafka Connect & Streams - the Ecosystem around Kafka
Demo (IV) - Create Stream
ksql> describe truck_position_s;
Field | Type
---------------------------------
ROWTIME | BIGINT
ROWKEY | VARCHAR(STRING)
TS | VARCHAR(STRING)
TRUCKID | VARCHAR(STRING)
DRIVERID | BIGINT
ROUTEID | BIGINT
EVENTTYPE | VARCHAR(STRING)
LATITUDE | DOUBLE
LONGITUDE | DOUBLE
CORRELATIONID | VARCHAR(STRING)
Kafka Connect & Streams - the Ecosystem around Kafka
Demo (IV) - Create Stream
ksql> CREATE STREAM dangerous_driving_s 
WITH (kafka_topic='dangerous_driving_s', 
value_format='JSON') 
AS SELECT * FROM truck_position_s 
WHERE eventtype != 'Normal';
Message
----------------------------
Stream created and running
ksql> select * from dangerous_driving_s;
1522848286143 | "truck/15/position0 | 1522848286125 | 98 | 15 | 987179512 |
Overspeed | 34.78 | -92.31 | -2458274393837068406
1522848295729 | "truck/11/position0 | 1522848295720 | 54 | 11 | 1198242881 |
Unsafe following distance | 38.43 | -90.35 | -2458274393837068406
1522848313018 | "truck/11/position0 | 1522848313000 | 54 | 11 | 1198242881 |
Overspeed | 41.87 | -87.67 | -2458274393837068406
Kafka Connect & Streams - the Ecosystem around Kafka
Demo (V)
Truck-2
truck/nn/position
Truck-1
Truck-3
mqtt-source
truck_position
detect_dangerous_driving
dangerous_driving
Truck
Driver
jdbc-source
trucking_driver
join_dangerous_driving_driver
dangerous_driving_driver
27, Walter, Ward, Y, 24-JUL-85, 2017-10-02 15:19:00
console consumer
{"id":27,"firstName":"Walter","lastName":"Ward","available":"Y","birthdate":"24-JUL-85","last_update":1506923052012}
Kafka Connect & Streams - the Ecosystem around Kafka
1522846456703,101,31,1927624662,Normal,37.31,-94.31,-4802309397906690837
Demo (V) – Create JDBC Connect through REST API
#!/bin/bash
curl -X "POST" "http://192.168.69.138:8083/connectors" \
  -H "Content-Type: application/json" \
  -d $'{
"name": "jdbc-driver-source",
"config": {
"connector.class": "JdbcSourceConnector",
"connection.url":"jdbc:postgresql://db/sample?user=sample&password=sample",
"mode": "timestamp",
"timestamp.column.name":"last_update",
"table.whitelist":"driver",
"validate.non.null":"false",
"topic.prefix":"trucking_",
"key.converter":"org.apache.kafka.connect.json.JsonConverter",
"key.converter.schemas.enable": "false",
"value.converter":"org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "false",
"name": "jdbc-driver-source",
"transforms":"createKey,extractInt",
"transforms.createKey.type":"org.apache.kafka.connect.transforms.ValueToKey",
"transforms.createKey.fields":"id",
"transforms.extractInt.type":"org.apache.kafka.connect.transforms.ExtractField$Key",
"transforms.extractInt.field":"id"
}
}'
Kafka Connect & Streams - the Ecosystem around Kafka
Demo (V) – Create JDBC Connect through REST API
Kafka Connect & Streams - the Ecosystem around Kafka
Demo (V) - Create Table with Driver State
Kafka Connect & Streams - the Ecosystem around Kafka
ksql> CREATE TABLE driver_t 
(id BIGINT, 
first_name VARCHAR, 
last_name VARCHAR, 
available VARCHAR) 
WITH (kafka_topic='trucking_driver', 
value_format='JSON', 
key='id');
Message
----------------
Table created
Demo (V) - Create Table with Driver State
ksql> CREATE STREAM dangerous_driving_and_driver_s 
WITH (kafka_topic='dangerous_driving_and_driver_s', 
value_format='JSON') 
AS SELECT driverId, first_name, last_name, truckId, routeId, eventtype 
FROM truck_position_s 
LEFT JOIN driver_t 
ON truck_position_s.driverId = driver_t.id;
Message
----------------------------
Stream created and running
ksql> select * from dangerous_driving_and_driver_s;
1511173352906 | 21 | 21 | Lila | Page | 58 | 1594289134 | Unsafe tail distance
1511173353669 | 12 | 12 | Laurence | Lindsey | 93 | 1384345811 | Lane Departure
1511173435385 | 11 | 11 | Micky | Isaacson | 22 | 1198242881 | Unsafe tail distance
Kafka Connect & Streams - the Ecosystem around Kafka
Kafka and "Big Data" / "Fast Data"
Ecosystem
Kafka Connect & Streams - the Ecosystem around Kafka
Kafka and the Big Data / Fast Data ecosystem
Kafka integrates with many popular products / frameworks
• Apache Spark Streaming
• Apache Flink
• Apache Storm
• Apache Apex
• Apache NiFi
• StreamSets
• Oracle Stream Analytics
• Oracle Service Bus
• Oracle GoldenGate
• Oracle Event Hub Cloud Service
• Debezium CDC
• …
Additional Info: https://cwiki.apache.org/confluence/display/KAFKA/Ecosystem
Kafka Connect & Streams - the Ecosystem around Kafka
Kafka in Software Architecture
Kafka Connect & Streams - the Ecosystem around Kafka
Hadoop Cluster
Hadoop Cluster
Big Data Cluster
Traditional Big Data Architecture
BI Tools
Enterprise Data
Warehouse
Billing &
Ordering
CRM /
Profile
Marketing
Campaigns
File Import / SQL Import
SQL
Search / Explore
Online & Mobile
Apps
Search
SQL
Export
Service
NoSQL
Parallel Batch
Processing
Distributed
Filesystem
• Machine Learning
• Graph Algorithms
• Natural Language Processing
Kafka Connect & Streams - the Ecosystem around Kafka
Event
Hub
Event
Hub
Hadoop Cluster
Hadoop Cluster
Big Data Cluster
Event Hub – handle event stream data
BI Tools
Enterprise Data
Warehouse
Location
Social
Click
stream
Sensor
Data
Billing &
Ordering
CRM /
Profile
Marketing
Campaigns
Event
Hub
Call
Center
Weather
Data
Mobile
Apps
File Import / SQL Import
SQL
Search / Explore
Online & Mobile
Apps
Search
SQL
Export
Service
Data Flow
Data
Flow
NoSQL
Parallel Batch
Processing
Distributed
Filesystem
• Machine Learning
• Graph Algorithms
• Natural Language Processing
Change Data Capture
Kafka Connect & Streams - the Ecosystem around Kafka
Hadoop Cluster
Hadoop Cluster
Big Data Cluster
Event Hub – taking Velocity into account
Location
Social
Click
stream
Sensor
Data
Billing &
Ordering
CRM /
Profile
Marketing
Campaigns
Call
Center
Mobile
Apps
Batch Analytics
Streaming Analytics
Results
Parallel Batch
Processing
Distributed
Filesystem
Stream Analytics
NoSQL
Reference /
Models
SQL
Search
SQL
Export
Service
Dashboard
BI Tools
Enterprise Data
Warehouse
Search / Explore
Online & Mobile
Apps
File Import / SQL Import
Weather
Data
Event
Hub
Event
Hub
Event
Hub
Data
Flow
Data
Flow
Data
Flow
Change
Data
Capture
Kafka Connect & Streams - the Ecosystem around Kafka
Container
Hadoop Cluster
Hadoop Cluster
Big Data Cluster
Event Hub – Asynchronous Microservice Architecture
Location
Social
Click
stream
Sensor
Data
Billing &
Ordering
CRM /
Profile
Marketing
Campaigns
Call
Center
Mobile
Apps
Parallel
Batch
Processing
Distributed
Filesystem
Microservice
NoSQL
RDBMS
SQL
Search
SQL
Export
Service
BI Tools
Enterprise Data
Warehouse
Search / Explore
Online & Mobile
Apps
File Import / SQL Import
Weather
Data
{ }
API
Event
Hub
Event
Hub
Event
Hub
Data
Flow
Data
Flow
Data
Flow
Change
Data
Capture
Kafka Connect & Streams - the Ecosystem around Kafka
Kafka Connect & Streams - the Ecosystem around Kafka
Technology on its own won't help you.
You need to know how to use it properly.