Stream Processing Live Traffic Data with Kafka Streams
Tim Ysewyn
Solutions Architect
@ Pivotal
Spring & Spring Cloud
Contributor
@TYsewyn
Who are we
Tom Van den Bulck
Principal Java
Software Engineer
@ Ordina
Competence Leader
Fast & Big Data
@tomvdbulck
Setup Environment
http://bit.ly/docker-kafka
http://bit.ly/Spring-Cloud-Stream-Workshop
What
http://bit.ly/Spring-Cloud-Stream-Workshop
What: Event
● Data it owns
● Data it needs
● Referenced data
What: Streaming
● Reacts to events
● Continuously
Why
● Much shorter feedback loop
● More resource efficient
● Stream processing feels more natural
● Decentralize and decouple infrastructure
The Data
● Every minute an XML file is generated
○ So it is not the raw data
● Be aware:
○ It contains Dutch words
The Data
● XML with fixed sensor data
○ <meetpunt unieke_id="3640">
<beschrijvende_id>H291L10</beschrijvende_id>
<volledige_naam>Parking Kruibeke</volledige_naam>
<Ident_8>A0140002</Ident_8>
<lve_nr>437</lve_nr>
<Kmp_Rsys>94,695</Kmp_Rsys>
<Rijstrook>R10</Rijstrook>
<X_coord_EPSG_31370>144477,0917</X_coord_EPSG_31370>
<Y_coord_EPSG_31370>208290,6237</Y_coord_EPSG_31370>
<lengtegraad_EPSG_4326>4,289767347</lengtegraad_EPSG_4326>
<breedtegraad_EPSG_4326>51,18458196</breedtegraad_EPSG_4326>
</meetpunt>
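The fixed sensor XML above can be parsed with the plain JDK XML APIs; note the decimal commas in the coordinate fields. A minimal sketch (the MeetpuntParser class name and the trimmed-down sample are ours, not from the workshop code):

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class MeetpuntParser {

    // Trimmed-down sample of the fixed sensor XML shown above
    static final String SAMPLE =
        "<meetpunt unieke_id=\"3640\">"
      + "<beschrijvende_id>H291L10</beschrijvende_id>"
      + "<lengtegraad_EPSG_4326>4,289767347</lengtegraad_EPSG_4326>"
      + "</meetpunt>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("unparseable meetpunt XML", e);
        }
    }

    /** The unique id of the measurement point, taken from the root attribute. */
    static String uniekeId(String xml) {
        return parse(xml).getDocumentElement().getAttribute("unieke_id");
    }

    /** The longitude; the feed uses a decimal comma, so normalise before parsing. */
    static double longitude(String xml) {
        String raw = parse(xml).getElementsByTagName("lengtegraad_EPSG_4326")
            .item(0).getTextContent();
        return Double.parseDouble(raw.replace(',', '.'));
    }
}
```

The comma-to-dot conversion is the kind of locale pitfall the "Think about the language" lesson below refers to.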
The Data
● XML with dynamic traffic data
○ <meetpunt beschrijvende_id="H222L10" unieke_id="29">
<lve_nr>55</lve_nr>
<tijd_waarneming>2018-11-03T14:43:00+01:00</tijd_waarneming>
<tijd_laatst_gewijzigd>2018-11-03T14:44:24+01:00</tijd_laatst_gewijzigd>
<actueel_publicatie>1</actueel_publicatie>
<beschikbaar>1</beschikbaar>
The Data
● XML with dynamic traffic data
○ <meetdata klasse_id="4">
<verkeersintensiteit>2</verkeersintensiteit>
<voertuigsnelheid_rekenkundig>60</voertuigsnelheid_rekenkundig>
<voertuigsnelheid_harmonisch>59</voertuigsnelheid_harmonisch>
</meetdata>
The Data
● XML with dynamic traffic data
○ /*
Note: the vehicle class MOTO(1)
does not provide reliable data.
*/
MOTO(1),
CAR(2),
CAMIONET(3), // a VAN
RIGGID_LORRIES(4),
TRUCK_OR_BUS(5),
UNKNOWN(0);
The Data
● XML with dynamic traffic data
○ <meetdata klasse_id="3">
<verkeersintensiteit>0</verkeersintensiteit>
<voertuigsnelheid_rekenkundig>0</voertuigsnelheid_rekenkundig>
<voertuigsnelheid_harmonisch>252</voertuigsnelheid_harmonisch>
</meetdata>
The Data
● Do not worry
● We translated it to a simplified POJO
● TrafficEvent.java
The Data: Some Lessons
● Think about the language
● Think about the values you are going to output
○ 252 when there are no readings
○ 254 when an error occurred
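Those sentinel values are easy to guard against explicitly before any aggregation. A small sketch, assuming a decoder that treats 252 and 254 as "no value" (the class and method names are ours, not from the workshop code):

```java
import java.util.OptionalInt;

public class SpeedDecoder {

    // Sentinel values used by the feed (see above): these are not real speeds
    static final int NO_READING = 252;   // no readings for this lane/vehicle class
    static final int SENSOR_ERROR = 254; // an error occurred

    /** Returns the measured speed in km/h, or empty for a sentinel value. */
    public static OptionalInt decode(int raw) {
        if (raw == NO_READING || raw == SENSOR_ERROR) {
            return OptionalInt.empty();
        }
        return OptionalInt.of(raw);
    }
}
```

Dropping sentinels early keeps them from skewing averages such as the harmonic speed, where a single 252 would dominate the result.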
How
Lab 1: Send events to Kafka - Imperative
● Dependencies
○ spring-cloud-starter-stream-kafka
● Added @EnableBinding
● Properties:
○ spring.cloud.stream.bindings.output.destination=traffic-data
● Added @EnableScheduling and a @Scheduled method
Lab 1: Send events to Kafka
● Don’t use @Scheduled for use cases like this in production
○ Bad practice; use batch jobs, e.g. Spring Cloud Task or a K8s CronJob!
Lab 2: Intake of data from Kafka
● @EnableBinding
● @StreamListener(Sink.INPUT)
● Properties:
○ spring.cloud.stream.bindings.input.destination=traffic-data
Native streaming: KStream
Native streaming: KTable
Native streaming operations: toStream
Native streaming operations: Stateless
● No state store is needed for these operations
Native streaming operations: filter
Native streaming operations: map
Native streaming operations: flatMap
Native streaming operations: peek
Native streaming operations: forEach
Native streaming operations: Stateless
● selectKey
● filter
● map/mapValues
● flatMap/flatMapValues
● peek
● forEach
● groupByKey
● toStream
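The stateless operations above behave much like the `java.util.stream` pipeline below. This is only an analogy (a KStream is an unbounded record stream, not a finite collection), but the per-record semantics of filter, map and peek carry over:

```java
import java.util.List;
import java.util.stream.Collectors;

public class StatelessOpsAnalogy {

    /** Keeps speeds below 50 km/h and tags them, printing each record as it passes. */
    public static List<String> slowSpeeds(List<Integer> speeds) {
        return speeds.stream()
            .filter(s -> s < 50)          // like KStream#filter: drop non-matching records
            .map(s -> "slow:" + s)        // like KStream#mapValues: one-to-one transform
            .peek(System.out::println)    // like KStream#peek: side effect, record flows on
            .collect(Collectors.toList());
    }
}
```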
Lab 3: Stateless
● Dependencies
○ spring-cloud-stream-binder-kafka-streams
● Added custom interface: KStreamSink
● Methods used
○ .filter
○ .print
● Updated configuration:
○ spring.cloud.stream.default-binder=kafka
○ spring.cloud.stream.bindings.native-input.binder=kstream
Native streaming operations: stateful
● A state store is used
○ In-memory database
○ RocksDB
● Fault-tolerant: backed by a replicated changelog topic in Kafka
Native streaming operations: groupByKey
● Groups records into a KGroupedStream
● Required before aggregation operations
● May write data to a new internal topic (repartitioning)
Native streaming operations: count
Native streaming operations: aggregations
● Transforms a KGroupedStream into a KTable
● Needs an Initializer: aggValue = 0
● Operation “adder”: aggValue + newValue
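The initializer/adder pair can be sketched without Kafka at all; in the illustrative class below a HashMap plays the role of the state store backing the KTable. For each incoming record, the stored aggregate starts at the initializer value and the adder folds the new value in:

```java
import java.util.HashMap;
import java.util.Map;

public class AggregateSketch {

    private final Map<String, Integer> store = new HashMap<>(); // stand-in for the state store

    /** Applies one record and returns the updated aggregate for its key. */
    public int apply(String key, int newValue) {
        int aggValue = store.getOrDefault(key, 0); // Initializer: aggValue = 0
        int updated = aggValue + newValue;         // "adder": aggValue + newValue
        store.put(key, updated);
        return updated;
    }
}
```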
Native streaming operations: joining
Native streaming operations: stateful
● groupByKey (still stateless)
● count
● aggregations
● joining
● windowing
Lab 3: Stateful
Windows
● Tumbling
● Sliding
● Session
Tumbling
Sliding
Session windows
● Limited by an inactivity gap
● Be aware: the data you need to process might grow
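The inactivity-gap idea can be illustrated in plain Java: consecutive records belong to the same session as long as they are no further apart than the gap. A sketch under that assumption (input sorted by time; here records are just timestamps in milliseconds):

```java
import java.util.ArrayList;
import java.util.List;

public class SessionWindowSketch {

    /** Splits sorted event timestamps (ms) into sessions separated by an inactivity gap. */
    public static List<List<Long>> sessions(List<Long> sortedTimestamps, long gapMs) {
        List<List<Long>> result = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        for (long ts : sortedTimestamps) {
            boolean gapExceeded = !current.isEmpty()
                && ts - current.get(current.size() - 1) > gapMs;
            if (gapExceeded) {            // inactivity gap passed: close the session
                result.add(current);
                current = new ArrayList<>();
            }
            current.add(ts);
        }
        if (!current.isEmpty()) {
            result.add(current);          // close the final session
        }
        return result;
    }
}
```

As the warning above says, a session is unbounded in size: as long as events keep arriving within the gap, the session, and the state you hold for it, keeps growing.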
Lab 4: Windows & Stateful
● GroupByKey
○ Use of SerDe (StringSerde and JsonSerde)
Lab 4: Windows
● Methods used
○ .windowedBy
○ .aggregate
■ Use of aggregator class
■ Materialized with
○ .mapValues: convert records
○ .toStream: Convert KTable to KStream
Session windows: Traffic Congestion
● Merge results of all lanes
● If average speed < 50 km/h => slow traffic
● To: slow-traffic-topic
● @Input slow-traffic-topic => session window with gap of 5 minutes
● Aggregate results: vehicle count
● To: vehicles-involved-in-traffic-jam
● Because the session window also has a start and an end time
● => we can derive the duration of the traffic jam
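The steps above can be simulated end to end without a broker. The sketch below (record and class names are ours, not from the workshop code) filters merged readings down to slow traffic, assigns them to sessions with a 5-minute gap, and emits the vehicle count and duration per traffic jam:

```java
import java.util.ArrayList;
import java.util.List;

public class TrafficJamSketch {

    public record Reading(long timestampMs, double avgSpeedKmh, int vehicleCount) {}
    public record Jam(int vehiclesInvolved, long durationMs) {}

    static final long GAP_MS = 5 * 60 * 1000; // 5-minute session inactivity gap
    static final double SLOW_KMH = 50.0;      // below this we call it slow traffic

    /** Input must be sorted by timestamp: one merged reading per highway section. */
    public static List<Jam> detectJams(List<Reading> sortedReadings) {
        List<Jam> jams = new ArrayList<>();
        long sessionStart = -1;
        long sessionEnd = -1;
        int vehicles = 0;
        for (Reading r : sortedReadings) {
            if (r.avgSpeedKmh() >= SLOW_KMH) {
                continue;                                  // filter: keep slow traffic only
            }
            if (sessionStart >= 0 && r.timestampMs() - sessionEnd > GAP_MS) {
                jams.add(new Jam(vehicles, sessionEnd - sessionStart)); // gap passed: close session
                vehicles = 0;
                sessionStart = -1;
            }
            if (sessionStart < 0) {
                sessionStart = r.timestampMs();            // session window start
            }
            sessionEnd = r.timestampMs();                  // session window end so far
            vehicles += r.vehicleCount();                  // aggregate: vehicle count
        }
        if (sessionStart >= 0) {
            jams.add(new Jam(vehicles, sessionEnd - sessionStart));
        }
        return jams;
    }
}
```

In the real pipeline the same shape is expressed as a filter to the slow-traffic topic, a `windowedBy(SessionWindows...)` plus an aggregate, and the jam duration falls out of the session's start and end times.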
Thank you for attending!

More Related Content

What's hot (20)

PDF
Using Location Data to Showcase Keys, Windows, and Joins in Kafka Streams DSL...
confluent
 
PDF
Kafka Summit SF 2017 - Database Streaming at WePay
confluent
 
PDF
Self-service Events & Decentralised Governance with AsyncAPI: A Real World Ex...
HostedbyConfluent
 
PDF
Tale of two streaming frameworks (Karthik D - Walmart)
KafkaZone
 
PDF
Hadoop made fast - Why Virtual Reality Needed Stream Processing to Survive
confluent
 
PDF
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...
HostedbyConfluent
 
PDF
user Behavior Analysis with Session Windows and Apache Kafka's Streams API
confluent
 
PDF
Eventing Things - A Netflix Original! (Nitin Sharma, Netflix) Kafka Summit SF...
confluent
 
PDF
Death of the dumb pipes: Using Apache Kafka® for Integration projects
HostedbyConfluent
 
PDF
MongoDB .local London 2019: Streaming Data on the Shoulders of Giants
Lisa Roth, PMP
 
PDF
Maximize the Business Value of Machine Learning and Data Science with Kafka (...
confluent
 
PPTX
Real-World Pulsar Architectural Patterns
Devin Bost
 
PPTX
Neo4j Graph Streaming Services with Apache Kafka
jexp
 
PDF
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
HostedbyConfluent
 
PDF
Can Apache Kafka Replace a Database?
Kai Wähner
 
PPTX
Flink SQL in Action
Fabian Hueske
 
PDF
0-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 2019
confluent
 
PPTX
Confluent Kafka and KSQL: Streaming Data Pipelines Made Easy
Kairo Tavares
 
PDF
Event-Driven Stream Processing and Model Deployment with Apache Kafka, Kafka ...
Kai Wähner
 
PDF
Tackling Kafka, with a Small Team ( Jaren Glover, Robinhood) Kafka Summit SF ...
confluent
 
Using Location Data to Showcase Keys, Windows, and Joins in Kafka Streams DSL...
confluent
 
Kafka Summit SF 2017 - Database Streaming at WePay
confluent
 
Self-service Events & Decentralised Governance with AsyncAPI: A Real World Ex...
HostedbyConfluent
 
Tale of two streaming frameworks (Karthik D - Walmart)
KafkaZone
 
Hadoop made fast - Why Virtual Reality Needed Stream Processing to Survive
confluent
 
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...
HostedbyConfluent
 
user Behavior Analysis with Session Windows and Apache Kafka's Streams API
confluent
 
Eventing Things - A Netflix Original! (Nitin Sharma, Netflix) Kafka Summit SF...
confluent
 
Death of the dumb pipes: Using Apache Kafka® for Integration projects
HostedbyConfluent
 
MongoDB .local London 2019: Streaming Data on the Shoulders of Giants
Lisa Roth, PMP
 
Maximize the Business Value of Machine Learning and Data Science with Kafka (...
confluent
 
Real-World Pulsar Architectural Patterns
Devin Bost
 
Neo4j Graph Streaming Services with Apache Kafka
jexp
 
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
HostedbyConfluent
 
Can Apache Kafka Replace a Database?
Kai Wähner
 
Flink SQL in Action
Fabian Hueske
 
0-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 2019
confluent
 
Confluent Kafka and KSQL: Streaming Data Pipelines Made Easy
Kairo Tavares
 
Event-Driven Stream Processing and Model Deployment with Apache Kafka, Kafka ...
Kai Wähner
 
Tackling Kafka, with a Small Team ( Jaren Glover, Robinhood) Kafka Summit SF ...
confluent
 

Similar to Stream Processing Live Traffic Data with Kafka Streams (20)

PDF
Stream Processing Live Traffic Data with Kafka Streams
Tim Ysewyn
 
PPTX
Stream Processing Live Traffic Data with Kafka Streams
Tim Ysewyn
 
PPTX
Kafka streams distilled
Mikhail Grinfeld
 
PDF
Streaming with Spring Cloud Stream and Apache Kafka - Soby Chacko
VMware Tanzu
 
PDF
Data Streaming in Kafka
SilviuMarcu1
 
PDF
Build real time stream processing applications using Apache Kafka
Hotstar
 
PPTX
Real time data pipline with kafka streams
Yoni Farin
 
ODP
Stream processing using Kafka
Knoldus Inc.
 
PPTX
Kafka Streams for Java enthusiasts
Slim Baltagi
 
PPTX
Building streaming data applications using Kafka*[Connect + Core + Streams] b...
Data Con LA
 
PDF
Stream Processing made simple with Kafka
DataWorks Summit/Hadoop Summit
 
PDF
Kafka Streams: the easiest way to start with stream processing
Yaroslav Tkachenko
 
PDF
Building Event Driven Services with Apache Kafka and Kafka Streams - Devoxx B...
Ben Stopford
 
PDF
Introduction to Kafka Streams
Guozhang Wang
 
PDF
Building Streaming Data Applications Using Apache Kafka
Slim Baltagi
 
PPTX
Building Distributed Data Streaming System
Ashish Tadose
 
PDF
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
Helena Edelson
 
PPTX
Kafka streams decoupling with stores
Yoni Farin
 
PDF
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
confluent
 
PDF
Streaming Visualisation
Guido Schmutz
 
Stream Processing Live Traffic Data with Kafka Streams
Tim Ysewyn
 
Stream Processing Live Traffic Data with Kafka Streams
Tim Ysewyn
 
Kafka streams distilled
Mikhail Grinfeld
 
Streaming with Spring Cloud Stream and Apache Kafka - Soby Chacko
VMware Tanzu
 
Data Streaming in Kafka
SilviuMarcu1
 
Build real time stream processing applications using Apache Kafka
Hotstar
 
Real time data pipline with kafka streams
Yoni Farin
 
Stream processing using Kafka
Knoldus Inc.
 
Kafka Streams for Java enthusiasts
Slim Baltagi
 
Building streaming data applications using Kafka*[Connect + Core + Streams] b...
Data Con LA
 
Stream Processing made simple with Kafka
DataWorks Summit/Hadoop Summit
 
Kafka Streams: the easiest way to start with stream processing
Yaroslav Tkachenko
 
Building Event Driven Services with Apache Kafka and Kafka Streams - Devoxx B...
Ben Stopford
 
Introduction to Kafka Streams
Guozhang Wang
 
Building Streaming Data Applications Using Apache Kafka
Slim Baltagi
 
Building Distributed Data Streaming System
Ashish Tadose
 
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
Helena Edelson
 
Kafka streams decoupling with stores
Yoni Farin
 
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
confluent
 
Streaming Visualisation
Guido Schmutz
 
Ad

Recently uploaded (20)

PPTX
Homogeneity of Variance Test Options IBM SPSS Statistics Version 31.pptx
Version 1 Analytics
 
PDF
Online Queue Management System for Public Service Offices in Nepal [Focused i...
Rishab Acharya
 
PPTX
Empowering Asian Contributions: The Rise of Regional User Groups in Open Sour...
Shane Coughlan
 
PPTX
Change Common Properties in IBM SPSS Statistics Version 31.pptx
Version 1 Analytics
 
PDF
SAP Firmaya İade ABAB Kodları - ABAB ile yazılmıl hazır kod örneği
Salih Küçük
 
PPTX
Migrating Millions of Users with Debezium, Apache Kafka, and an Acyclic Synch...
MD Sayem Ahmed
 
PDF
Alarm in Android-Scheduling Timed Tasks Using AlarmManager in Android.pdf
Nabin Dhakal
 
PPTX
Tally_Basic_Operations_Presentation.pptx
AditiBansal54083
 
PPTX
Help for Correlations in IBM SPSS Statistics.pptx
Version 1 Analytics
 
PDF
Automate Cybersecurity Tasks with Python
VICTOR MAESTRE RAMIREZ
 
PDF
4K Video Downloader Plus Pro Crack for MacOS New Download 2025
bashirkhan333g
 
PDF
Alexander Marshalov - How to use AI Assistants with your Monitoring system Q2...
VictoriaMetrics
 
PDF
MiniTool Partition Wizard 12.8 Crack License Key LATEST
hashhshs786
 
PPTX
Hardware(Central Processing Unit ) CU and ALU
RizwanaKalsoom2
 
PDF
Linux Certificate of Completion - LabEx Certificate
VICTOR MAESTRE RAMIREZ
 
PDF
유니티에서 Burst Compiler+ThreadedJobs+SIMD 적용사례
Seongdae Kim
 
PDF
Why Businesses Are Switching to Open Source Alternatives to Crystal Reports.pdf
Varsha Nayak
 
PDF
AI + DevOps = Smart Automation with devseccops.ai.pdf
Devseccops.ai
 
PDF
SciPy 2025 - Packaging a Scientific Python Project
Henry Schreiner
 
PDF
Thread In Android-Mastering Concurrency for Responsive Apps.pdf
Nabin Dhakal
 
Homogeneity of Variance Test Options IBM SPSS Statistics Version 31.pptx
Version 1 Analytics
 
Online Queue Management System for Public Service Offices in Nepal [Focused i...
Rishab Acharya
 
Empowering Asian Contributions: The Rise of Regional User Groups in Open Sour...
Shane Coughlan
 
Change Common Properties in IBM SPSS Statistics Version 31.pptx
Version 1 Analytics
 
SAP Firmaya İade ABAB Kodları - ABAB ile yazılmıl hazır kod örneği
Salih Küçük
 
Migrating Millions of Users with Debezium, Apache Kafka, and an Acyclic Synch...
MD Sayem Ahmed
 
Alarm in Android-Scheduling Timed Tasks Using AlarmManager in Android.pdf
Nabin Dhakal
 
Tally_Basic_Operations_Presentation.pptx
AditiBansal54083
 
Help for Correlations in IBM SPSS Statistics.pptx
Version 1 Analytics
 
Automate Cybersecurity Tasks with Python
VICTOR MAESTRE RAMIREZ
 
4K Video Downloader Plus Pro Crack for MacOS New Download 2025
bashirkhan333g
 
Alexander Marshalov - How to use AI Assistants with your Monitoring system Q2...
VictoriaMetrics
 
MiniTool Partition Wizard 12.8 Crack License Key LATEST
hashhshs786
 
Hardware(Central Processing Unit ) CU and ALU
RizwanaKalsoom2
 
Linux Certificate of Completion - LabEx Certificate
VICTOR MAESTRE RAMIREZ
 
유니티에서 Burst Compiler+ThreadedJobs+SIMD 적용사례
Seongdae Kim
 
Why Businesses Are Switching to Open Source Alternatives to Crystal Reports.pdf
Varsha Nayak
 
AI + DevOps = Smart Automation with devseccops.ai.pdf
Devseccops.ai
 
SciPy 2025 - Packaging a Scientific Python Project
Henry Schreiner
 
Thread In Android-Mastering Concurrency for Responsive Apps.pdf
Nabin Dhakal
 
Ad

Stream Processing Live Traffic Data with Kafka Streams

Editor's Notes

  • #4: Shorter feedback loop: fraud detection, much nicer feedback to the customers, …
  • #5: The old days: query in order to retrieve the data you need; the glorious time of the batch jobs.
  • #6: Reacts to events. Events should be as complete as possible. A continuous stream.
  • #7: Data it owns: data owned by the publisher of the event. Data it needs: data which can originate from other services but which is necessary to handle the event. Referenced data: data which might be relevant for the event, for example, when booking a holiday, the reference temperatures of the location you want to travel to. Example, a contract update: owns: the new price / needs: contract data, old price, discounts, … / references: customer data to contact the customer.
  • #9: Much shorter feedback for your business users. Because you are processing smaller sets of data at a time, resources can be used more efficiently. Stream processing tends to feel more natural, as most data also enters your system as a stream. There is no longer a need for large and expensive databases; each stream-processing application maintains its own data and state, and each application also tends to decide for itself what it will consume.
  • #16: No traffic data for that lane and vehicle type, so the feed reports a vehicle speed of 252.
  • #31: Every record processed can result in 0, 1 or more new records
  • #36: These state stores can also be used by other processors.
  • #50: Interactive whiteboard session to show what you could do with a session window on the current dataset: merge the results into a single data point for the entire highway section (all lanes and all vehicle classes); if the average speed is below 50 km/h, mark it as a traffic jam and send it out to another topic; apply a session window with a gap of 5 minutes; aggregate the results (vehicle count). The resulting output gives you the number of vehicles involved in a traffic jam, and because you also know the length of every session, you can tell how long it lasted.