Stream Processing Live Traffic Data with Kafka Streams
Who are we
Tim Ysewyn
Principal Java Software Engineer
Spring & Spring Cloud Contributor
@TYsewyn
Tom Van den Bulck
Principal Java Software Engineer
Competence Leader Fast & Big Data
@tomvdbulck
Setup Environment
http://bit.ly/Hands-on-Labs-Devoxx-2018
What
What: Event
● Data it owns
● Data it needs
● References data
What: Streaming
● Reacts to events
● Continuously
Why
● Much shorter feedback loop
● More resource efficient
● Stream processing feels more natural
● Decentralize and decouple infrastructure
The Data
● Every minute XML is generated
○ So it is not the raw data
● Be aware:
○ Dutch words
The Data
● XML with fixed sensor data
○ <meetpunt unieke_id="3640">
    <beschrijvende_id>H291L10</beschrijvende_id>
    <volledige_naam>Parking Kruibeke</volledige_naam>
    <Ident_8>A0140002</Ident_8>
    <lve_nr>437</lve_nr>
    <Kmp_Rsys>94,695</Kmp_Rsys>
    <Rijstrook>R10</Rijstrook>
    <X_coord_EPSG_31370>144477,0917</X_coord_EPSG_31370>
    <Y_coord_EPSG_31370>208290,6237</Y_coord_EPSG_31370>
    <lengtegraad_EPSG_4326>4,289767347</lengtegraad_EPSG_4326>
    <breedtegraad_EPSG_4326>51,18458196</breedtegraad_EPSG_4326>
  </meetpunt>
The Data
● XML with dynamic traffic data
○ <meetpunt beschrijvende_id="H222L10" unieke_id="29">
    <lve_nr>55</lve_nr>
    <tijd_waarneming>2018-11-03T14:43:00+01:00</tijd_waarneming>
    <tijd_laatst_gewijzigd>2018-11-03T14:44:24+01:00</tijd_laatst_gewijzigd>
    <actueel_publicatie>1</actueel_publicatie>
    <beschikbaar>1</beschikbaar>
The Data
● XML with dynamic traffic data
○ <meetdata klasse_id="4">
    <verkeersintensiteit>2</verkeersintensiteit>
    <voertuigsnelheid_rekenkundig>60</voertuigsnelheid_rekenkundig>
    <voertuigsnelheid_harmonisch>59</voertuigsnelheid_harmonisch>
  </meetdata>
The Data
● XML with dynamic traffic data
○ /*
Note: the vehicle class MOTO(1)
does not provide reliable data.
*/
MOTO(1),
CAR(2),
CAMIONET(3), // a VAN
RIGGID_LORRIES(4),
TRUCK_OR_BUS(5),
UNKNOWN(0);
The Data
● XML with dynamic traffic data
○ <meetdata klasse_id="3">
    <verkeersintensiteit>0</verkeersintensiteit>
    <voertuigsnelheid_rekenkundig>0</voertuigsnelheid_rekenkundig>
    <voertuigsnelheid_harmonisch>252</voertuigsnelheid_harmonisch>
  </meetdata>
The Data
● Do not worry
● We translated it to a simplified POJO
● TrafficEvent.java
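To give an idea of what such a translation involves, here is a minimal sketch that turns one <meetdata> element into a plain Java object using only the JDK's XML parser. The class name MeetdataReading and its English field names are assumptions made for illustration; the lab's actual TrafficEvent.java may be shaped differently.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

/** Hypothetical simplified POJO; not the lab's real TrafficEvent. */
final class MeetdataReading {
    final int vehicleClassId;   // klasse_id attribute
    final int vehicleCount;     // verkeersintensiteit
    final int arithmeticSpeed;  // voertuigsnelheid_rekenkundig
    final int harmonicSpeed;    // voertuigsnelheid_harmonisch

    MeetdataReading(int vehicleClassId, int vehicleCount, int arithmeticSpeed, int harmonicSpeed) {
        this.vehicleClassId = vehicleClassId;
        this.vehicleCount = vehicleCount;
        this.arithmeticSpeed = arithmeticSpeed;
        this.harmonicSpeed = harmonicSpeed;
    }

    /** Parses a single <meetdata> element; any parsing problem becomes an IllegalArgumentException. */
    static MeetdataReading fromXml(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            return new MeetdataReading(
                    Integer.parseInt(root.getAttribute("klasse_id")),
                    intChild(root, "verkeersintensiteit"),
                    intChild(root, "voertuigsnelheid_rekenkundig"),
                    intChild(root, "voertuigsnelheid_harmonisch"));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a valid <meetdata> element", e);
        }
    }

    /** Reads the integer text content of the first child element with the given tag name. */
    private static int intChild(Element parent, String tag) {
        return Integer.parseInt(parent.getElementsByTagName(tag).item(0).getTextContent().trim());
    }
}
```

Giving the fields English names in one place keeps the Dutch element names out of the rest of the stream processing code.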
The Data: Some Lessons
● Think about the language
● Think about the values you are going to output
○ 252 when no readings
○ 254 when an error occurred
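One way to keep these magic numbers from leaking into averages is to translate them once, at the edge. The sketch below is an assumption about how that could look, not code from the labs; the class and method names are hypothetical.

```java
import java.util.OptionalInt;

/** Hypothetical helper that hides the sensor's sentinel speed values. */
final class SpeedReadings {
    static final int NO_READINGS = 252;   // no vehicles measured for this lane and class
    static final int SENSOR_ERROR = 254;  // an error occurred

    /** Returns the speed only when it is a real measurement. */
    static OptionalInt usableSpeed(int rawSpeed) {
        if (rawSpeed == NO_READINGS || rawSpeed == SENSOR_ERROR) {
            return OptionalInt.empty();
        }
        return OptionalInt.of(rawSpeed);
    }
}
```

With this in place, downstream code has to deal explicitly with a missing value instead of silently averaging in 252 km/h.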
How
Lab 1: Send events to Kafka
● Dependencies
○ spring-cloud-starter-stream-kafka
○ spring-cloud-stream-reactive
● Added scheduling (Spring's @EnableScheduling / @Scheduled)
● Added @EnableBinding
● Added @StreamEmitter (spring-cloud-stream-reactive)
● Added @SendTo
● Properties:
○ spring.cloud.stream.bindings.output.destination=traffic-data
Lab 2: Intake of data from Kafka
● @EnableBinding
● @StreamListener(Sink.INPUT)
● Properties:
○ spring.cloud.stream.bindings.input.destination=traffic-data
Native streaming: KStream
Native streaming: KTable
Native streaming operations: toStream
Native streaming operations: Stateless
● selectKey
● filter
● map/mapValues
● flatMap/flatMapValues
● peek
● foreach
● groupByKey
● toStream
Native streaming operations: filter
Native streaming operations: map
Native streaming operations: flatMap
Native streaming operations: peek
Native streaming operations: foreach
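The stateless operations share their names and intent with java.util.stream, so their per-record behaviour can be tried out without a broker. The sketch below is only an analogy on in-memory lists, not the Kafka Streams API; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Analogy only: java.util.stream on lists, not Kafka Streams. */
final class StatelessAnalogy {

    /** filter: each record is kept or dropped on its own (here the 252/254 sentinels are dropped). */
    static List<Integer> validSpeeds(List<Integer> speeds) {
        return speeds.stream()
                .filter(s -> s != 252 && s != 254)
                .collect(Collectors.toList());
    }

    /** map: exactly one output per input (km/h converted to m/s, rounded). */
    static List<Long> toMetersPerSecond(List<Integer> kmh) {
        return kmh.stream()
                .map(s -> Math.round(s / 3.6))
                .collect(Collectors.toList());
    }

    /** flatMap: zero, one or more outputs per input (a comma-separated lane list is split up). */
    static List<String> splitLanes(List<String> laneLists) {
        return laneLists.stream()
                .flatMap(l -> l.isEmpty() ? Stream.<String>empty() : Arrays.stream(l.split(",")))
                .collect(Collectors.toList());
    }
}
```

None of these methods needs to remember earlier records, which is what makes the operations stateless.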
Lab 3: Stateless
● Dependencies
○ spring-cloud-stream-binder-kafka-streams
● Added custom interface: KStreamSink
● Methods used
○ .filter
○ .print
● Update configuration:
○ spring.cloud.stream.default-binder=kafka
○ spring.cloud.stream.bindings.native-input.binder=kstream
Native streaming operations: stateful
● groupByKey (still stateless)
● count
● aggregations
● joining
● windowing
Native streaming operations: groupByKey
● Groups records into a KGroupedStream
● Required before aggregation operations
● Might write data to a new topic (repartitioning, when the key was changed upstream)
Native streaming operations: count
Native streaming operations: aggregations
● Transforms a KGroupedStream into a KTable
● Needs an Initializer: aggValue = 0
● Operation: “adder”: aggValue + newValue
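The initializer/adder contract can be sketched on an in-memory list of keyed records. This is plain Java rather than the Kafka Streams API, and the class name AggregateSketch is hypothetical; the returned map plays the role of the resulting table.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Supplier;

/** Plain-Java illustration of aggregating grouped records; not Kafka Streams. */
final class AggregateSketch {

    /**
     * For every key, starts from initializer.get() the first time the key is seen,
     * then folds each new value into the current aggregate with the adder.
     */
    static <K, V, A> Map<K, A> aggregate(List<Map.Entry<K, V>> records,
                                         Supplier<A> initializer,
                                         BiFunction<A, V, A> adder) {
        Map<K, A> table = new LinkedHashMap<>();
        for (Map.Entry<K, V> record : records) {
            K key = record.getKey();
            A current = table.containsKey(key) ? table.get(key) : initializer.get();
            table.put(key, adder.apply(current, record.getValue()));
        }
        return table;
    }
}
```

Summing vehicle counts per sensor would then be a call such as aggregate(records, () -> 0, (aggValue, newValue) -> aggValue + newValue).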
Native streaming operations: joining
Lab 3: Stateful
● groupByKey
○ Use of SerDe (StringSerde and JsonSerde)
● Methods used
○ .count
○ .toStream: Convert KTable to KStream
Windows
● Tumbling
● Sliding
● Session
Tumbling
Sliding
Session windows
● Limited by an inactivity gap
● Be aware: a session has no fixed length, so the data you need to process might grow
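The inactivity gap can be illustrated with a few lines of plain Java. This sketch assumes timestamps arrive in ascending order and treats an event exactly one gap away as still belonging to the session; both are simplifications chosen here, and the class names are hypothetical. It is not the Kafka Streams implementation, which also merges sessions when late records arrive.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java illustration of session windows; not Kafka Streams. */
final class SessionSketch {

    /** One session: first and last event time plus the number of events in it. */
    static final class Session {
        final long start;
        long end;
        int count;

        Session(long firstTimestamp) {
            this.start = firstTimestamp;
            this.end = firstTimestamp;
            this.count = 1;
        }

        long durationMs() {
            return end - start;
        }
    }

    /** Splits ascending timestamps (ms) into sessions separated by more than gapMs of inactivity. */
    static List<Session> sessions(long[] sortedTimestamps, long gapMs) {
        List<Session> result = new ArrayList<>();
        Session current = null;
        for (long ts : sortedTimestamps) {
            if (current == null || ts - current.end > gapMs) {
                current = new Session(ts);   // inactivity gap exceeded: open a new session
                result.add(current);
            } else {
                current.end = ts;            // still active: extend the running session
                current.count++;
            }
        }
        return result;
    }
}
```

A session keeps growing for as long as events keep arriving within the gap, which is why the amount of state is not bounded by a fixed window size.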
Lab 4: Windows
● Methods used
○ .windowedBy
○ .aggregate
■ Use of aggregator class
■ Materialized.with (sets the key and value Serdes)
○ .mapValues: convert records
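The shape of this lab (window, aggregate with an aggregator class, then convert the result) can be mimicked in plain Java. The sketch below uses tumbling windows and an average-speed aggregate as an example; it assumes non-negative millisecond timestamps, all names are hypothetical, and it is not the Kafka Streams API.

```java
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java illustration of a tumbling-window aggregation; not Kafka Streams. */
final class TumblingSketch {

    /** Running aggregate for one window; stands in for an aggregator class. */
    static final class SpeedAggregate {
        long speedSum;
        int readings;

        SpeedAggregate add(int speed) {
            speedSum += speed;
            readings++;
            return this;
        }

        double average() {
            return readings == 0 ? 0.0 : (double) speedSum / readings;
        }
    }

    /** Tumbling windows do not overlap: every timestamp belongs to exactly one window. */
    static long windowStart(long timestampMs, long sizeMs) {
        return timestampMs - (timestampMs % sizeMs);
    }

    /** Each event is {timestampMs, speed}; returns window start -> average speed. */
    static Map<Long, Double> averagePerWindow(long[][] events, long sizeMs) {
        Map<Long, SpeedAggregate> windows = new TreeMap<>();
        for (long[] event : events) {
            windows.computeIfAbsent(windowStart(event[0], sizeMs), k -> new SpeedAggregate())
                   .add((int) event[1]);
        }
        // Equivalent of a final mapValues step: convert each aggregate into the value we publish.
        Map<Long, Double> averages = new TreeMap<>();
        for (Map.Entry<Long, SpeedAggregate> window : windows.entrySet()) {
            averages.put(window.getKey(), window.getValue().average());
        }
        return averages;
    }
}
```

Keeping sum and count in the aggregate, and only dividing at the end, is what makes the average correct when records arrive one at a time.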
Session windows: Traffic Congestion
● Merge results of all lanes
● If average speed < 50 km/h => slow traffic
● To: slow-traffic-topic
● @Input slow-traffic-topic => session window with gap of 5 minutes
● Aggregate results: vehicle count
● To: vehicles-involved-in-traffic-jam
● Because the session window also has a start and end time
● => duration of the traffic jam
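The first two steps of this pipeline (merge the lanes, then flag slow traffic) could look roughly like the sketch below. Weighting each lane's speed by its vehicle count is an assumption made here, since the slides only say "average speed"; the class names and the way empty sections are handled are hypothetical as well.

```java
import java.util.List;

/** Plain-Java illustration of merging lanes and flagging slow traffic; not the lab code. */
final class CongestionSketch {
    static final double SLOW_TRAFFIC_KMH = 50.0;

    /** One lane reading: vehicles counted in the interval and their average speed in km/h. */
    static final class Lane {
        final int vehicleCount;
        final double averageSpeed;

        Lane(int vehicleCount, double averageSpeed) {
            this.vehicleCount = vehicleCount;
            this.averageSpeed = averageSpeed;
        }
    }

    /** Vehicle-weighted average over all lanes; NaN when no vehicles were counted at all. */
    static double sectionAverageSpeed(List<Lane> lanes) {
        long vehicles = 0;
        double weightedSpeed = 0.0;
        for (Lane lane : lanes) {
            vehicles += lane.vehicleCount;
            weightedSpeed += lane.vehicleCount * lane.averageSpeed;
        }
        return vehicles == 0 ? Double.NaN : weightedSpeed / vehicles;
    }

    /** A section without vehicles is not reported as slow traffic. */
    static boolean isSlowTraffic(List<Lane> lanes) {
        double average = sectionAverageSpeed(lanes);
        return !Double.isNaN(average) && average < SLOW_TRAFFIC_KMH;
    }
}
```

Only the sections for which isSlowTraffic returns true would be forwarded to slow-traffic-topic, where the session window with the 5 minute gap takes over.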
Thank you for attending!

Editor's Notes

  • #4: Shorter feedback loop: fraud detection, much nicer feedback to the customers, …
  • #5: The old days: you query in order to retrieve the data you need. The glorious time of the batch jobs.
  • #6: Reacts to events. Events should be as complete as possible. Continuous stream.
  • #7: Data it owns: data owned by the publisher of the event. Data it needs: data which can originate from other services but which is necessary to handle the event. Referenced data: data which might be relevant for the event, for example, when booking a holiday, the reference temperatures of the location you want to travel to. Example, a contract update: owns: new price / needs: contract data, old price, discounts, … / references: customer data to contact the customer.
  • #8: Shorter feedback loop: fraud detection, much nicer feedback to the customers, …
  • #9: Much shorter feedback for your business users. Because you are processing smaller sets of data at a time, resources can be used more efficiently. Stream processing tends to feel more natural, as most data also enters your system as a stream. There is no longer a need for large and expensive databases: each stream processing application maintains its own data and state, and each application also tends to decide for itself what it will consume.
  • #16: No traffic data for that lane and vehicle type … so we say that the vehicle speed is 252 …
  • #29: Every record processed can result in 0, 1 or more new records
  • #45 to #47: Interactive whiteboard session to show what you could do with a session window on the current dataset. => Merge results into a single data point for the entire highway section (all lanes and all vehicles). => If average speed < 50 km/h => traffic jam => send this out to another topic. => Apply a session window with a gap of 5 minutes. => Aggregate results: vehicle count. => The resulting output should give you the number of vehicles involved in a traffic jam. => Because you also know the length of every session, you should also be able to know how long the jam lasted.