1
Processing IoT Data with
Apache Kafka
Matt Howlett
Confluent Inc.
2
Pub Sub
Messaging Protocol
Pub Sub
Messaging System
(rethought as a distributed commit log)
Distributed Streaming Platform
● Pub Sub Messaging
● Event Storage
● Processing Framework
3
OBD-II Adapters
4
Problem Statement
Let’s build a system to:
• Transport OBD-II data over unreliable links from cars to the data center
• Capable of handling millions of devices*
• Extract information from + respond to this data in (near) real time (at scale)
• Handle surges in usage
• Potential for ad-hoc historical processing
* also works for fewer devices
Architecture / technology / methods applicable to many scenarios.
5
Publish / subscribe messaging protocol:
• Built on top of TCP/IP
• Features that make it well suited to poor connectivity / high latency scenarios
• Lightweight
• Efficient client implementations, low network overhead
• MQTT-SN for non-IP networks ('virtual connections')
• Many (open source) broker implementations
• Mosquitto, RabbitMQ, HiveMQ, VerneMQ
• Many Client Libraries
• C, C++, Java, C#, Python, Javascript, websockets, Arduino …
• Widely used (incl. phone apps!)
• Oil pipeline sensor via satellite link
• Facebook Messenger
• AWS IoT
MQTT Introduction
6
• Simple API
• Hierarchical topics
• myhome/kitchen/door/front/battery/level
• wildcard subscription: myhome/+/door/+/battery/level ('+' matches one level, '#' matches all remaining levels)
• 3 qualities of service (on both produce and consume)
• At most once (QoS 0)
• At least once (QoS 1)
• Exactly once (QoS 2) [not universally supported]
• Persistent consumer sessions
• Important for QoS 1, QoS 2
• Last will and testament
• Retained messages (last known good value)
• Authorization, SSL/TLS
MQTT Features
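A minimal sketch of how hierarchical topic filters can be matched, assuming the standard MQTT wildcards: `+` for exactly one level and `#` for all remaining levels. The function name `topic_matches` is made up for illustration, and the sketch ignores special cases such as topics starting with `$`.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic matches a subscription filter.

    '+' matches exactly one level; '#' matches the parent level and
    everything below it, and is only valid as the last filter level.
    """
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: accept only if it closes the filter.
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            return False  # filter is longer than the topic
        if level != "+" and level != topic_levels[i]:
            return False  # literal level differs
    # No '#' seen: both must have the same number of levels.
    return len(filter_levels) == len(topic_levels)
```

With this, `myhome/+/door/+/battery/level` matches `myhome/kitchen/door/front/battery/level`, while `myhome/+` does not match a deeper topic such as `myhome/kitchen/door`.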
7
• Device Id
• GPS Location [lon, lat]
• Ignition on / off
• Speedometer reading
• Timestamp
• …plus a lot more
Assume: data sent via 3G wireless connection at ~30 second interval
OBD-II Data
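One possible shape for such a record, as a sketch only: the field names, units, and the choice of JSON as the wire format are assumptions, not something the slides specify. The helper `messages_per_second` just turns the ~30 second reporting interval into an aggregate message rate.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class ObdRecord:
    # Hypothetical field names and units, chosen for illustration.
    device_id: str
    lon: float
    lat: float
    ignition_on: bool
    speed_kmh: float
    timestamp_ms: int

    def to_payload(self) -> bytes:
        # Compact JSON keeps the payload small on a slow 3G link.
        return json.dumps(asdict(self), separators=(",", ":")).encode("utf-8")

    @staticmethod
    def from_payload(payload: bytes) -> "ObdRecord":
        return ObdRecord(**json.loads(payload.decode("utf-8")))


def messages_per_second(num_devices: int, interval_s: float = 30.0) -> float:
    """Aggregate message rate if every device reports once per interval."""
    return num_devices / interval_s
```

Under these assumptions, three million devices reporting every 30 seconds produce about 100,000 messages per second in aggregate.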
8
Deficiencies:
• A single MQTT server can handle maybe ~100K connections
• Can’t handle usage surges (no buffering)
• No storage of events or reprocess capability
MQTT
Server 1
Processor 1 Processor 2 ...
Ingest Architecture V1
topic: [deviceid]/obd
9
MQTT
Server
Coordinator
MQTT
Server 1
MQTT
Server 2
MQTT
Server 3
MQTT
Server 4
topic: [deviceid]/obd
http / REST
...
• Easily Shardable
• Treat MQTT server as
commodity service
Ingest Architecture V2
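A sketch of what the coordinator could do when a device asks (over http / REST) which MQTT server to connect to. The least-loaded assignment policy, the class name `Coordinator`, and the server names are assumptions; the slides only say that the servers are sharded and treated as a commodity service.

```python
class Coordinator:
    """Hands each device the address of an MQTT server with spare capacity."""

    def __init__(self, servers, max_connections=100_000):
        self.max_connections = max_connections
        self.connections = {server: 0 for server in servers}
        self.assignments = {}  # device_id -> server

    def assign(self, device_id):
        # A device that reconnects keeps its previous server.
        if device_id in self.assignments:
            return self.assignments[device_id]
        # Pick the server with the fewest assigned devices.
        server = min(self.connections, key=self.connections.get)
        if self.connections[server] >= self.max_connections:
            raise RuntimeError("all MQTT servers are at capacity")
        self.connections[server] += 1
        self.assignments[device_id] = server
        return server
```

Adding capacity then amounts to registering another server with the coordinator; devices never need to know how many servers exist.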
10
MQTT
Server
Coordinator
MQTT
Server 1
MQTT
Server 2
MQTT
Server 3
MQTT
Server 4
topic: [deviceid]/obd
Kafka Connect
OBD_Data
Stream
processing
kafka
OBD -> MQTT -> Kafka
11
Apache Kafka
Distributed Streaming Platform:
• Pub Sub Messaging
• (typically clients are within data-center)
• Data Store
• Messages not deleted after delivery
• Stream Processing
• Low or high level libraries
• Data re-processing
12
Apache Kafka adoption spans
companies across industries.
13
● Persisted
● Append only
● Immutable
● Delete earliest data based on time / size / never
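The log properties listed above can be illustrated with a toy in-memory log. This is a teaching sketch, not how Kafka stores data: the class `CommitLog` and its size-based retention of `max_records` are assumptions made for the example.

```python
class CommitLog:
    """Toy append-only log with offsets and size-based retention."""

    def __init__(self, max_records=None):
        self.max_records = max_records  # None means never delete
        self.base_offset = 0            # offset of the oldest retained record
        self.records = []

    def append(self, value):
        """Append a record and return its offset; existing records never change."""
        offset = self.base_offset + len(self.records)
        self.records.append(value)
        if self.max_records is not None and len(self.records) > self.max_records:
            # Retention: drop the earliest record only.
            self.records.pop(0)
            self.base_offset += 1
        return offset

    def read(self, offset):
        """Return all retained records from the given offset onward."""
        start = max(offset, self.base_offset) - self.base_offset
        return self.records[start:]
```

Reading does not remove anything, so two consumers (or the same consumer re-processing) can read the same offsets and get the same records.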
14
• Allows topics to scale past constraints
of single server
• Message → partition_id mapping is deterministic.
Choose a partition key that is relevant to the application.
• Ordering guarantees per partition but
not across partitions
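A sketch of deterministic key-based partitioning. Kafka's Java client hashes the key bytes with murmur2 by default; `zlib.crc32` stands in here purely for illustration, so the partition numbers will not match a real cluster. The function name `partition_for` is made up.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a message key to a partition; the same key always gives the same partition."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(messages, num_partitions):
    """Group (key, value) messages by partition, preserving arrival order within each."""
    partitions = {p: [] for p in range(num_partitions)}
    for key, value in messages:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions
```

Keying OBD messages by device id therefore keeps each device's readings in order on one partition, while readings from different devices may interleave arbitrarily across partitions.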
15
Apache Kafka Replication
• cheap durability!
• choose the number of acks required
to confirm a produced message
16
Apache Kafka Consumer Groups
partitions possibly across different brokers
17
Kafka Connect
• Use client library producers / consumers in custom applications.
• Often want to bulk transfer data between standard systems:
• Don’t re-invent the wheel – configure Kafka Connect
• Narrow scope: move data into & out of Kafka
• Off-the-shelf connectors
• Fault Tolerant
• Auto-balances load
• Pluggable Serialization
• Standalone and distributed modes of operation
• Configuration / management via REST API
18
19
MQTT Connector
https://github.com/evokly/kafka-connect-mqtt
• Single Task
• Single MQTT Broker
• Source only
Either:
• Start a bunch of these connectors (in one connect cluster), one per server, or:
• Implement a new multi-task connector, one task per MQTT broker.
• Communicate with MQTT Controller
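The first option (one single-task connector per MQTT server) could be scripted roughly as follows. The payload shape, a `name` plus a `config` map containing `connector.class` and `tasks.max`, is what the Kafka Connect REST API accepts when creating a connector via `POST /connectors`. Everything else is an assumption: the connector class is left as a placeholder, and the two `example.*` keys are hypothetical stand-ins for whatever settings the chosen MQTT connector actually documents.

```python
def connector_payloads(broker_uris, kafka_topic="OBD_Data"):
    """Build one Kafka Connect connector definition per MQTT broker."""
    payloads = []
    for i, uri in enumerate(broker_uris, start=1):
        payloads.append({
            "name": f"mqtt-source-{i}",
            "config": {
                # Placeholder: use the class name documented by the connector.
                "connector.class": "<MQTT source connector class>",
                # The connector is single-task, so one task per connector.
                "tasks.max": "1",
                # Hypothetical connector-specific keys, for illustration only.
                "example.mqtt.broker.uri": uri,
                "example.kafka.topic": kafka_topic,
            },
        })
    return payloads
```

Each payload would then be sent as JSON to the Connect cluster's `/connectors` endpoint; the coordinator's list of MQTT servers is the natural source for `broker_uris`.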
20
• user_id
• device_id
• name
• address
• phone_number
• speed_alert_level
• ...
SQL Db
User_Info
User Data
21
Example: Car Towed Alert
Detect movement of car when ignition off, send SMS alert
kafka
OBD_Data P1
OBD_Data P5
Consumer 1
Consumer 2
Broker 1
...
OBD_Data P3
OBD_Data P7
Broker 2
...
...
...
SMS Gateway
Last loc. in mem
KV store
Last loc. in mem
KV store
User Info
22
Consumer Implementation
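on_message(message m)
{
    var device_id = m.key;
    var obd_data = m.value;

    // Ignition on: the owner is driving, so forget the parked location.
    // Otherwise the next ignition-off reading would look like a tow.
    if (obd_data.ignition_on) {
        kv_store.remove(device_id);
        return;
    }

    // First ignition-off reading: remember where the car is parked.
    if (!kv_store.contains(device_id)) {
        kv_store.add(device_id, obd_data.lon_lat);
        return;
    }

    var prev_lon_lat = kv_store.get(device_id);
    var dist = calc_dist(obd_data.lon_lat, prev_lon_lat);
    kv_store.set(device_id, obd_data.lon_lat);

    if (dist > alert_max_dist) {
        // infrequent, so a direct SQL lookup is acceptable here
        send_alert(SQL.get_phone_number(device_id));
    }
}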
• Message can be from any partition
assigned to this consumer
• Ordering guaranteed per partition, but
not predictable across partitions
• All messages from a particular device
guaranteed to arrive at the same
consumer instance
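A runnable Python sketch of the same idea. Several details are assumptions rather than part of the slides: the haversine formula for `calc_dist`, the 100 m threshold, the dict-based message format, and the choice to forget the stored location whenever the ignition is on (so that a normal drive followed by parking does not look like a tow).

```python
import math

ALERT_MAX_DIST_M = 100.0   # hypothetical threshold
last_parked_location = {}  # in-memory KV store: device_id -> (lon, lat)


def haversine_m(a, b):
    """Great-circle distance in metres between two (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6_371_000 * math.asin(math.sqrt(h))


def on_message(device_id, obd, send_alert):
    if obd["ignition_on"]:
        # Car is being driven: drop the parked location (assumption).
        last_parked_location.pop(device_id, None)
        return
    prev = last_parked_location.get(device_id)
    last_parked_location[device_id] = obd["lon_lat"]
    if prev is not None and haversine_m(obd["lon_lat"], prev) > ALERT_MAX_DIST_M:
        send_alert(device_id)  # infrequent path: look up phone number, send SMS
```

Because all messages for a device arrive at the same consumer instance, the per-device state can live in that consumer's memory without any cross-instance coordination.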
23
Example: Speed Alert
• Scenario: Parent wants to monitor son/daughter driving and be alerted if they exceed a
specified speed.
• In the Tow Alert example User_Info only needs to be queried in the event of an alert.
• In this example, the table needs to be queried for every OBD data record in every partition.
OBD_data
[can update
at any time]
User Info
table
Not scalable! Cache?
...
High frequency
P1
24
Table can be represented as a stream of updates
Table columns: device_id, speed_limit
Time = 0: update {device_id=1, speed_limit=60} → table: (1, 60)
Time = 1: update {device_id=2, speed_limit=80} → table: (1, 60), (2, 80)
Time = 2: update {device_id=3, speed_limit=70} → table: (1, 60), (2, 80), (3, 70)
Time = 3: update {device_id=1, speed_limit=80} → table: (1, 80), (2, 80), (3, 70)
Time = 4: update {device_id=1, speed_limit=65} → table: (1, 65), (2, 80), (3, 70)
Log compaction!
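The table/stream duality can be sketched in a few lines. The function names `materialize` and `compact` are made up, and `compact` only mimics the effect of log compaction (keep the latest record per key); it is not Kafka's actual cleaner. A `None` value is treated as a delete marker, matching Kafka's tombstone convention.

```python
def materialize(changelog):
    """Replay (key, value) updates in order to rebuild the current table."""
    table = {}
    for key, value in changelog:
        if value is None:
            table.pop(key, None)  # tombstone: delete the row
        else:
            table[key] = value
    return table


def compact(changelog):
    """Keep only the most recent record for each key, preserving log order."""
    last_index = {key: i for i, (key, _) in enumerate(changelog)}
    return [rec for i, rec in enumerate(changelog) if last_index[rec[0]] == i]


# The five updates from the slide: (device_id, speed_limit).
updates = [(1, 60), (2, 80), (3, 70), (1, 80), (1, 65)]
```

Replaying `updates` yields device 1 → 65, device 2 → 80, device 3 → 70, and replaying the compacted log yields the same table from fewer records, which is why a compacted topic can serve as a durable copy of the table.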
25
Debezium
Kafka Connector that turns database tables into streams of update records.
debezium
Partition 1
Partition 2
Partition 3
Partition 4
Partition 5
Partition 6
...
MySQL
User Info
[key: userId]
User_Info
[changelog topic]
Partition by device_id
26
Stream / Table Join
Partition 1
Partition 2
Partition 3
Partition 4
Partition 5
Partition 6
Partition 7
...
Partition 1
Partition 2
Partition 3
Partition 4
Partition 5
...
Consumer 1
Relevant subset of
User_Info
device_id speed_limit
1 80
3 70
User_Info
[ChangeLog, compacted]
OBD_Data
[Record Stream]
...
debezium
key:device_id
key:device_id
27
Speed Alert: Message handler
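on_message(message m)
{
    var device_id = m.key;
    var obd_data = m.value;

    // Local lookup in the materialized User_Info table; no remote query.
    var user_info = user_info_local.get(device_id);
    if (user_info == null)
        return;  // no User_Info record seen yet for this device

    if (obd_data.speedometer > user_info.speed_limit) {
        alert_user(device_id, user_info);
    }
}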
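A Python sketch of the stream / table join behind this handler. It assumes two callbacks, one fed by the User_Info changelog topic and one by the OBD_Data stream, both keyed by device id; the callback names, the dict-based records, and the `speed_limit` field name (taken from the table slide) are illustrative choices.

```python
user_info_local = {}  # device_id -> user info, materialized from the changelog


def on_user_info_update(device_id, user_info):
    """Apply one changelog record to the local table (None means delete)."""
    if user_info is None:
        user_info_local.pop(device_id, None)
    else:
        user_info_local[device_id] = user_info


def on_obd_message(device_id, obd_data, alert_user):
    """Join each OBD record against the local table; no remote query needed."""
    user_info = user_info_local.get(device_id)
    if user_info is None:
        return  # no limit known yet for this device
    if obd_data["speedometer"] > user_info["speed_limit"]:
        alert_user(device_id, user_info)
```

Since both topics are partitioned by device id with the same partition count, a consumer only needs the subset of User_Info that corresponds to its assigned OBD_Data partitions.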
28
MQTT Phone Client Connectivity
MQTT
Server
Coordinator
MQTT
Server 1
MQTT
Server 2
[deviceid]/alert
...
Consumer 1 ...
MQTT
Server 3
...
[deviceid]/obd
29
Speed Limit Alert: Rate limiting
Partition 1
Partition 2
Partition 3
Partition 4
Partition 5
Partition 6
Partition 7
...
app_state kafka topic
• Prefer to rate limit on server to minimize network overhead.
• Create new Kafka topic app_state, partitioned on
device_id.
• When alert triggered, store alert time in this topic.
• [can use this topic as general store for other per device
state info too]
• Materialize this change-log stream on consumers as
necessary.
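The rate-limiting step could look like the sketch below. The ten-minute minimum interval, the `last_alert_time` field, and the `publish_state` callback (standing in for a produce to the app_state topic, keyed by device id) are all assumptions made for illustration.

```python
MIN_ALERT_INTERVAL_S = 600  # hypothetical: at most one alert per 10 minutes

app_state_local = {}  # device_id -> state dict, materialized from app_state


def should_alert(device_id, now_s, publish_state):
    """Return True if an alert may be sent now, recording the alert time if so."""
    state = app_state_local.get(device_id, {})
    last = state.get("last_alert_time")
    if last is not None and now_s - last < MIN_ALERT_INTERVAL_S:
        return False  # alerted too recently: suppress on the server side
    new_state = {**state, "last_alert_time": now_s}
    app_state_local[device_id] = new_state
    # Persist to the app_state topic so the state survives a consumer restart.
    publish_state(device_id, new_state)
    return True
```

Keeping other per-device fields in the same state dict is what lets the topic double as a general per-device state store.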
30
Partition 1
Partition 2
Partition 3
Partition 4
Partition 5
Partition 6
Partition 7
...
Partition 1
Partition 2
Partition 3
...
Consumer 1
Relevant
subset of
User_Info
...
OBD_Data
[Record Stream]
User_Info
[ChangeLog, compacted]
Partition 4
Partition 1
Partition 2
Partition 3
...
Partition 4
App_State
[compacted]
Relevant
subset of
App_State
31
Example: Location Based Special Offers
When a car enters a specific region, send available special offers to the user’s phone.
Require:
• User_Info
• Address – so we know whether they are local to their current location or not
• App_state
• Use to persist already sent offers
• Special_Offer_Info
• Table that stores the list of all special offers.
32
1 2 3 4 5 6 7
8 9 10 11 12 13 14
15 16 17 18 19 20 21
22 23 24 25 26 27 28
29 30 31 32 33 34 35
36 37 38 39 40 41 42
Regions
• Regions may be simple (as depicted
here) or complex
• F(lon, lat) -> locationId.
• Note: could also implement ride-share
surge pricing using similar partitioning.
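For the simple case, F(lon, lat) -> locationId can be a plain grid lookup. The sketch below assumes a rectangular bounding box split into 7 columns and 6 rows, numbered 1 to 42 from the top-left as in the figure; the default bounds are arbitrary placeholder values, and complex regions would need a real point-in-polygon test instead.

```python
def location_id(lon, lat, west=0.0, south=0.0, east=7.0, north=6.0,
                cols=7, rows=6):
    """Map a (lon, lat) point to a grid cell id in 1..cols*rows, or None if outside."""
    if not (west <= lon < east and south <= lat < north):
        return None
    col = int((lon - west) / (east - west) * cols)
    # Row 0 is the northernmost row, matching the numbering in the figure.
    row = int((north - lat) / (north - south) * rows)
    row = min(row, rows - 1)  # lat == south lands exactly on the bottom edge
    return row * cols + col + 1
```

With the placeholder bounds, the point (3.5, 4.5) falls in cell 11: second row from the top, fourth column.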
33
Special Offer Change-log Stream
debezium
Partition 1
Partition 2
Partition 3
Partition 4
Partition 5
Partition 6
...
MySQL
Special Offer
Info
Special_Offers
[changelog,
compacted]
Partition by location_id
34
Multi-stage Data Pipeline
OBD_Data App_State
[offers already sent]
User_Info
[address]
K: device_id
V: OBD record
consume enrich
K: device_id
V: OBD record
address
K: device_id
V: OBD record
Address
offers_sent
enrich
35
Multi-stage Data Pipeline (continued)
K: [device_id]
V: OBD record
Address
offers_sent
K: location_id
V: OBD record
Address
offers_sent
OBD_Data_By_Location
P1
……
…
Repartition by location_id
P2
P1
P3
Data from a given device will still all be on the same partition
(except when its region changes)
36
Multi-stage Data Pipeline (continued)
K: location_id
V: OBD record
Address
offers_sent
Special_Offers
K: location_id
V: OBD record
address
offers_sent
available_offers
re-partition
enrich
37
Multi-stage Data Pipeline (continued)
Special offer available in
location
Special offer not already
sent
User address near location?
MQTT
Server
filter
filter
filter
...
[deviceId]/alert
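The whole pipeline can be sketched as chained generator stages. This is a single-process illustration only: in the real system each stage would read from and write to Kafka topics. The record layout is an assumption, the user's address is assumed to be pre-resolved to a location id (`address_location`), and the third filter is read as "send only to users whose address is in the current region", which is one possible interpretation of the slide.

```python
def enrich(obd_records, user_info, app_state):
    """Stage 1: join OBD records (keyed by device_id) with User_Info and App_State."""
    for device_id, obd in obd_records:
        yield device_id, {
            "obd": obd,
            "address_location": user_info[device_id]["address_location"],
            "offers_sent": app_state.get(device_id, set()),
        }


def rekey_by_location(records, locate):
    """Stage 2: repartition by location_id, keeping device_id in the value."""
    for device_id, value in records:
        yield locate(value["obd"]["lon_lat"]), {**value, "device_id": device_id}


def offers_to_send(records, special_offers):
    """Stage 3: join with Special_Offers, then apply the three filters."""
    for loc_id, value in records:
        for offer in special_offers.get(loc_id, []):  # offer available in location
            if offer in value["offers_sent"]:         # offer not already sent
                continue
            if value["address_location"] != loc_id:   # user address near location
                continue
            yield f"{value['device_id']}/alert", offer
```

Each yielded pair is an MQTT topic and payload, ready to be published to the MQTT server that the user's phone is connected to.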
38
39
40
Discount code: kafcom17
Use the Apache Kafka community discount code to get $50 off
www.kafka-summit.org
Kafka Summit New York: May 8
Kafka Summit San Francisco: August 28
41
Thank You
@matt_howlett
@confluentinc
