Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek
Aljoscha Krettek
@aljoscha
Big Data Spain
November 17, 2016
What I’d Like to Talk About
§  Streaming Architecture and Flink
§  IoT and Event-Time based stream processing
§  Use-Case Examples
Original creators of Apache Flink®
Providers of the dA Platform, a supported Flink distribution
Intro: The Streaming Architecture
Rethinking Data Architecture
§  Better app isolation
§  Real-time reaction to events
§  Robust continuous applications
§  Process both real-time and historical data
[Figure: event log, applications with app state, query service]
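The architecture sketched here (an event log, applications that each keep their own state, and a query service) can be illustrated with a small in-memory model. This is plain Python, not Flink or Kafka code: `EventLog`, `App`, `count_per_key`, and `sum_per_key` are hypothetical names, and the log is assumed to be a single-process list rather than a durable, partitioned log. It shows the point of the last bullet: because each application tracks its own read offset, an application started later replays the historical records and then keeps up with new ones.

```python
from typing import Callable, Dict, List, Tuple

Event = Tuple[str, int]  # (key, value)
State = Dict[str, int]


class EventLog:
    """Append-only, in-memory stand-in for a durable event log."""

    def __init__(self) -> None:
        self._records: List[Event] = []

    def append(self, event: Event) -> int:
        self._records.append(event)
        return len(self._records) - 1  # offset of the new record

    def read_from(self, offset: int) -> List[Event]:
        return self._records[offset:]


class App:
    """An application that owns its state and remembers how far it has read."""

    def __init__(self, name: str, update: Callable[[State, Event], None]) -> None:
        self.name = name
        self.state: State = {}
        self.offset = 0
        self._update = update

    def poll(self, log: EventLog) -> None:
        # Apply every record not yet seen. A fresh app starts at offset 0,
        # so it first replays history and then continues with new records.
        for event in log.read_from(self.offset):
            self._update(self.state, event)
            self.offset += 1


def count_per_key(state: State, event: Event) -> None:
    key, _ = event
    state[key] = state.get(key, 0) + 1


def sum_per_key(state: State, event: Event) -> None:
    key, value = event
    state[key] = state.get(key, 0) + value


if __name__ == "__main__":
    log = EventLog()
    for e in [("a", 1), ("b", 2), ("a", 3)]:
        log.append(e)
    counter = App("counter", count_per_key)
    counter.poll(log)
    summer = App("summer", sum_per_key)  # started later: replays history
    summer.poll(log)
    log.append(("b", 5))
    counter.poll(log)
    summer.poll(log)
    print(counter.state, summer.state)
```

Running the script prints `{'a': 2, 'b': 2} {'a': 4, 'b': 7}`. A query service in this picture would simply read each `App.state`.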
What is (Distributed) Streaming
§  Streaming: computations on never-ending “streams” of data records (“events”)
§  Distributed: computation spread across many machines
[Figure: several parallel instances of “Your code”]
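Distributing the computation usually means partitioning the stream by key, so that all records for one key (for example, one device) are handled by the same parallel instance of your code. The sketch below assumes a simple `hash(key) mod parallelism` scheme with `crc32` as the hash. Flink's own assignment of keys to parallel instances differs in detail, and `partition_for` and `distribute` are hypothetical helpers.

```python
import zlib
from typing import Dict, List, Tuple

Event = Tuple[str, int]  # (key, value), e.g. (sensor id, reading)


def partition_for(key: str, parallelism: int) -> int:
    """Map a key to one of `parallelism` instances with a stable hash.

    crc32 is used instead of Python's built-in hash(), which is randomized
    per process for strings and would route keys differently on each run.
    """
    return zlib.crc32(key.encode("utf-8")) % parallelism


def distribute(events: List[Event], parallelism: int) -> Dict[int, List[Event]]:
    """Route every event to the instance that owns its key, keeping arrival order."""
    instances: Dict[int, List[Event]] = {i: [] for i in range(parallelism)}
    for key, value in events:
        instances[partition_for(key, parallelism)].append((key, value))
    return instances


if __name__ == "__main__":
    events = [("sensor-1", 10), ("sensor-2", 20), ("sensor-1", 30), ("sensor-3", 5)]
    for instance, assigned in distribute(events, parallelism=4).items():
        print(instance, assigned)
```

Which instance each sensor lands on depends on the crc32 values, but every event for `sensor-1` always reaches the same instance. That is what lets per-key state stay local to one machine.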
What is Stateful Streaming
§  Computation and state
•  E.g., counters, windows of past events, state machines, trained ML models
§  Result depends on history of stream
§  A stateful stream processor should give you the tools to manage state
•  Recover, roll back, version, upgrade, etc.
[Figure: “Your code” with local state]
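Managing state includes being able to recover and roll back. One common approach, and the idea behind Flink's checkpoints, is to snapshot the state together with the input position. After a failure, the snapshot is restored and the input is replayed from that position. The sketch below is a deliberately simplified, single-operator model of that idea, not Flink's API. `CountingOperator` and `run_with_failure` are hypothetical names.

```python
import copy
from typing import Dict, List, Optional, Tuple

Snapshot = Tuple[int, Dict[str, int]]  # (input position, per-key counts)


class CountingOperator:
    """Per-key counter whose state can be snapshotted and restored."""

    def __init__(self) -> None:
        self.counts: Dict[str, int] = {}
        self.position = 0  # number of input records processed so far

    def process(self, key: str) -> int:
        self.counts[key] = self.counts.get(key, 0) + 1
        self.position += 1
        return self.counts[key]

    def snapshot(self) -> Snapshot:
        # Deep copy so later updates cannot change the stored snapshot.
        return self.position, copy.deepcopy(self.counts)

    def restore(self, snap: Snapshot) -> None:
        self.position, counts = snap
        self.counts = copy.deepcopy(counts)


def run_with_failure(inputs: List[str], snapshot_at: int, fail_at: int) -> Dict[str, int]:
    """Process `inputs`, take a snapshot before record `snapshot_at`, crash before
    record `fail_at`, then recover from the snapshot and replay the rest."""
    assert 0 <= snapshot_at <= fail_at < len(inputs)
    op = CountingOperator()
    snap: Optional[Snapshot] = None
    for i, key in enumerate(inputs):
        if i == snapshot_at:
            snap = op.snapshot()
        if i == fail_at:
            break  # simulated crash: in-memory state is lost
        op.process(key)
    assert snap is not None
    recovered = CountingOperator()
    recovered.restore(snap)
    # Rewind the input to the snapshot position and replay from there.
    for key in inputs[recovered.position:]:
        recovered.process(key)
    return recovered.counts


if __name__ == "__main__":
    keys = ["a", "b", "a", "c", "a", "b"]
    print(run_with_failure(keys, snapshot_at=2, fail_at=5))
```

This prints `{'a': 3, 'b': 2, 'c': 1}`, the same counts as a run without any failure. Records processed between the snapshot and the crash are discarded with the lost state and counted again exactly once during replay.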
What is Event-Time Streaming
§  Data records associated with timestamps (time series data)
§  Processing depends on timestamps
§  An event-time stream processor should give you the tools to reason about time
•  Handle streams that are out of order
•  Core feature is watermarks: a clock to measure event time
[Figure: “Your code” with state, grouping out-of-order events t3, t1, t2, t4 into windows t1-t2 and t3-t4]
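Watermarks can be pictured with a small simulation. The sketch assumes the simplest watermark strategy: the watermark trails the largest timestamp seen so far by a fixed `max_delay`. In other words, events more than `max_delay` older than the newest event are no longer expected. Windows are tumbling windows of `size` time units, and a window is emitted once the watermark reaches its end. This is plain Python written to illustrate the concept, not Flink's implementation, and the function name and parameters are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Event = Tuple[int, str]  # (event timestamp, payload), listed in arrival order
Window = Tuple[int, int, List[str]]  # (start, end, payloads)


def event_time_windows(
    events: List[Event], size: int, max_delay: int
) -> Tuple[List[Window], List[Event]]:
    """Tumbling event-time windows [start, start + size) driven by a watermark.

    The watermark is the largest timestamp seen so far minus `max_delay`.
    A window is emitted once the watermark reaches its end; events that
    belong to an already emitted window are returned as late.
    """
    open_windows: Dict[int, List[str]] = defaultdict(list)
    emitted: List[Window] = []
    late: List[Event] = []
    watermark = float("-inf")
    for ts, payload in events:
        start = ts - ts % size
        if start + size <= watermark:
            late.append((ts, payload))  # its window has already been emitted
        else:
            open_windows[start].append(payload)
        watermark = max(watermark, ts - max_delay)
        # Emit, oldest first, every window whose end the watermark has reached.
        for s in sorted(open_windows):
            if s + size <= watermark:
                emitted.append((s, s + size, open_windows.pop(s)))
    # End of input acts like a final watermark at +infinity.
    for s in sorted(open_windows):
        emitted.append((s, s + size, open_windows.pop(s)))
    return emitted, late


if __name__ == "__main__":
    arrivals = [(3, "c"), (1, "a"), (2, "b"), (4, "d"), (9, "e"), (5, "f"), (2, "g")]
    windows, late_events = event_time_windows(arrivals, size=5, max_delay=2)
    print(windows)
    print(late_events)
```

With this out-of-order input, window [0, 5) is emitted with `["c", "a", "b", "d"]` when event 9 moves the watermark to 7. Event 5 still lands in [5, 10) because that window is open, and event 2, arriving last, is reported as late.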
Recap: What is Streaming?
§  Continuous processing on data that is continuously generated
§  I.e., pretty much all “big” data
§  It’s all about state and time
§  Flink does all of what we just saw
IoT and Event-Time Stream Processing
The 'Internet Of Everything' Will Generate $14.4 Trillion Of Value Over The Next Decade.[1]
[1] read.bi/1yDOQQ3
Example Event Sources
A Simple Definition
IoT use cases from the system’s perspective:
A large number of (distributed) things generating a large amount of data.
Important Properties
§  Data is continuously produced
→ Stream Processing
§  Events have a timestamp that has to be considered
→ Event-time based processing
§  Data/Events can arrive with huge delays
§  Most analyses happen on time windows
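Because data can arrive with huge delays, a watermark alone forces a choice between waiting a long time and dropping data. A common compromise, which Flink exposes as allowed lateness on windows, works in three steps. A result is emitted when the watermark passes the end of a window. The window's state is then kept for an extra period, and each late event arriving within that period produces an updated result. The sketch below models this for per-window counts. It is an illustration under those assumptions, not Flink code, and the names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def windowed_counts_with_lateness(
    timestamps: List[int], size: int, max_delay: int, allowed_lateness: int
) -> Tuple[List[Tuple[int, int]], List[int]]:
    """Count events per tumbling event-time window [start, start + size).

    A window fires when the watermark (max timestamp seen minus max_delay)
    reaches its end. Its state is kept until the watermark reaches
    end + allowed_lateness. A late event in that period updates the count
    and re-emits it. Events arriving after that go to `too_late` instead.
    """
    counts: Dict[int, int] = {}
    fired: Set[int] = set()
    results: List[Tuple[int, int]] = []  # (window start, count) for every firing
    too_late: List[int] = []
    watermark = float("-inf")
    for ts in timestamps:
        start = ts - ts % size
        if start + size + allowed_lateness <= watermark:
            too_late.append(ts)  # window state already discarded
            continue
        counts[start] = counts.get(start, 0) + 1
        if start in fired:
            results.append((start, counts[start]))  # updated (late) result
        watermark = max(watermark, ts - max_delay)
        for s in sorted(counts):
            if s not in fired and s + size <= watermark:
                fired.add(s)
                results.append((s, counts[s]))  # first, on-time result
            if s + size + allowed_lateness <= watermark:
                del counts[s]  # lateness period over: free the state
                fired.discard(s)
    # End of input: fire windows whose end was never reached.
    for s in sorted(counts):
        if s not in fired:
            results.append((s, counts[s]))
    return results, too_late


if __name__ == "__main__":
    print(windowed_counts_with_lateness([1, 4, 12, 7, 16, 3, 25], 10, 0, 5))
```

For these timestamps, the results are `[(0, 2), (0, 3), (10, 2), (20, 1)]`. Window 0 first fires with 2 events and is updated to 3 by the late event 7. Event 3 arrives after the lateness period and ends up in `too_late`.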
Remember: Streaming technology is enabling the obvious: continuous processing on data that is continuously produced
Hint: you already have streaming data
What Is Event-Time Processing
[Figure: events with out-of-order event timestamps arriving from a message queue, plotted against processing time 1-14]
What’s The Problem?
[Figure: events from the message queue grouped into processing-time windows vs. event-time windows]
Mismatch between event time and processing time.
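The mismatch can be reproduced with a few hypothetical sensor readings. Each arrival records when the reading reached the processor (processing time) and when the device took it (event time). Assigning the same readings to 3-unit tumbling windows by either clock gives different groupings. The readings and the `window_by` helper are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (processing_time, event_time): when a reading reached the processor vs.
# when the device recorded it.
Arrival = Tuple[int, int]


def window_by(arrivals: List[Arrival], size: int, use_event_time: bool) -> Dict[int, List[int]]:
    """Group readings (listed by event timestamp) into tumbling windows
    [start, start + size), using either clock to pick the window."""
    windows: Dict[int, List[int]] = defaultdict(list)
    for proc_t, event_t in arrivals:
        t = event_t if use_event_time else proc_t
        windows[t - t % size].append(event_t)
    return dict(windows)


if __name__ == "__main__":
    # Hypothetical: readings 4 and 5 were buffered during a short disconnect
    # and arrive together at processing time 6.
    arrivals = [(1, 1), (2, 2), (3, 3), (6, 4), (6, 5), (7, 6)]
    print("processing time:", window_by(arrivals, 3, use_event_time=False))
    print("event time:     ", window_by(arrivals, 3, use_event_time=True))
```

The processing-time grouping is `{0: [1, 2], 3: [3], 6: [4, 5, 6]}`, while the event-time grouping is `{0: [1, 2], 3: [3, 4, 5], 6: [6]}`. With processing time, the two delayed readings are counted in the wrong window.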
Sources of Time Mismatch
§  Big Mismatch
•  Network disconnects
•  Slow network
§  Small Mismatch
•  The nature of distributed systems
•  Differing system clock time
Big Event-Time Mismatch
[Figure: episodes plotted by processing time vs. event time: 1977 Episode IV, 1980 Episode V, 1983 Episode VI, 1999 Episode I, 2002 Episode II, 2005 Episode III, 2015 Episode VII]
Small Event-Time Mismatch
Robust Stream Processing with Apache Flink®: A Simple Walkthrough
http://data-artisans.com/robust-stream-processing-flink-walkthrough/
Recap: Event-Time
§  IoT use cases need event-time processing
§  Even a small mismatch between event time and processing time will lead to wrong results
Use-Case Examples
30 Flink applications in production for more than one year. 10 billion events (2TB) processed daily
Complex jobs of > 30 operators running 24/7, processing 30 billion events daily, maintaining state of 100s of GB with exactly-once guarantees
Largest job has > 20 operators, runs on > 5000 vCores in 1000-node cluster, processes millions of events per second
King
§  Challenges:
•  Many games (Candy Crush, Farm Heroes, Pet Rescue, and Bubble Witch…)
•  300 million monthly unique users
•  30 billion events received every day
§  Need event-time based statistics
https://techblog.king.com/rbea-scalable-real-time-analytics-king/
Solution: RBEA
§  Multiplexing of multiple data scientist
requests into a single Flink job
§  Groovy as language for analysis scripts
§  Event-time windowing
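The multiplexing idea, where many data scientists' analyses share one running Flink job, can be illustrated with a small dispatcher. Every event is fed to every deployed script, each script keeps private state, and outputs are tagged with the script that produced them. This is not RBEA's actual design or API. Python callables stand in for the Groovy scripts, and `MultiplexingJob`, the example scripts, and the event fields (`type`, `user`, `amount`) are hypothetical. Event-time windowing is left out to keep the sketch short.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Event = Dict[str, Any]
# A script sees its own private state and one event, and may return an output.
Script = Callable[[Dict[str, Any], Event], Optional[Any]]


class MultiplexingJob:
    """One long-running job that evaluates many user scripts on a shared stream."""

    def __init__(self) -> None:
        self.scripts: Dict[str, Script] = {}
        self.states: Dict[str, Dict[str, Any]] = {}

    def deploy(self, script_id: str, script: Script) -> None:
        # Adding a script does not restart the job or re-read the stream.
        self.scripts[script_id] = script
        self.states[script_id] = {}

    def undeploy(self, script_id: str) -> None:
        self.scripts.pop(script_id, None)
        self.states.pop(script_id, None)

    def process(self, event: Event) -> List[Tuple[str, Any]]:
        """Feed one event to every deployed script; tag outputs with the script id."""
        outputs: List[Tuple[str, Any]] = []
        for script_id, script in self.scripts.items():
            result = script(self.states[script_id], event)
            if result is not None:
                outputs.append((script_id, result))
        return outputs


def levels_completed(state: Dict[str, Any], event: Event) -> Optional[int]:
    # Running count of completed levels across all users.
    if event.get("type") != "level_complete":
        return None
    state["n"] = state.get("n", 0) + 1
    return state["n"]


def large_purchases(state: Dict[str, Any], event: Event) -> Optional[str]:
    # Report users who make a purchase of at least 10 units.
    if event.get("type") == "purchase" and event.get("amount", 0) >= 10:
        return event["user"]
    return None


if __name__ == "__main__":
    job = MultiplexingJob()
    job.deploy("levels", levels_completed)
    print(job.process({"type": "level_complete", "user": "u1"}))
    job.deploy("big-spenders", large_purchases)  # added while running
    print(job.process({"type": "purchase", "user": "u2", "amount": 20}))
```

Deploying a new script only adds an entry to the registry, so the job keeps running and the stream is read once regardless of how many analyses are active.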
Bouygues Telecom
http://flink-forward.org/kb_sessions/a-brief-history-of-time-with-apache-flink-real-time-monitoring-and-analysis-with-flink-kafka-hb/
2015: ~120 users*, 5 Flink production apps, 750 TB storage, 4 billion events/day
2016: ~300 users*, 30 Flink production apps, 2 PB storage, 10 billion events/day
* Users of the information system
Bouygues: Challenges
§  Low latency & streaming fashion counters
§  Massive amounts of data + bursty loads
§  Reliability
§  Multiple flow correlation
§  Time management:
•  Out of order & late events → our worst enemies
•  Flexible window management
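Correlating multiple flows often comes down to joining them on a key within event-time windows, for example matching two kinds of network events for the same cell in the same minute (timestamps in seconds). The sketch below assumes tumbling windows and works on finite lists, so it needs no watermark. A streaming version would emit a window's pairs once the watermark passes the window's end. The record layout, cell names, and `window_join` are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

Record = Tuple[str, int, str]  # (key, event_time, payload)
Joined = Tuple[str, int, str, str]  # (key, window_start, left payload, right payload)


def window_join(left: List[Record], right: List[Record], size: int) -> List[Joined]:
    """Pair every left and right record that share a key and fall into the same
    tumbling event-time window [start, start + size). Arrival order does not
    matter because records are grouped by their event timestamps."""
    buckets: DefaultDict[Tuple[str, int], Tuple[List[str], List[str]]] = defaultdict(
        lambda: ([], [])
    )
    for key, ts, payload in left:
        buckets[(key, ts - ts % size)][0].append(payload)
    for key, ts, payload in right:
        buckets[(key, ts - ts % size)][1].append(payload)
    joined: List[Joined] = []
    # Sorting by (key, window_start) makes the output order deterministic.
    for (key, start), (lefts, rights) in sorted(buckets.items()):
        for l_payload in lefts:
            for r_payload in rights:
                joined.append((key, start, l_payload, r_payload))
    return joined


if __name__ == "__main__":
    drops = [("cell-1", 12, "drop@12"), ("cell-2", 3, "drop@3"), ("cell-1", 65, "drop@65")]
    congestion = [
        ("cell-1", 61, "congestion@61"),
        ("cell-2", 70, "congestion@70"),
        ("cell-1", 30, "congestion@30"),  # arrives out of order
    ]
    for row in window_join(drops, congestion, size=60):
        print(row)
```

Each `cell-1` drop is paired with the congestion event from the same minute, even though the right-hand flow arrived out of order. The `cell-2` records fall into different windows and produce no pair.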
In Summary
§  If you need to ask: you already have a streaming use case!
§  IoT requires Proper Time Management
§  Apache Flink has done that for a long time now*
* Since version 0.10
Thank you!
@aljoscha
@ApacheFlink
@dataArtisans
One day of hands-on Flink training
One day of conference
Tickets are on sale
Call for Papers is already open
Please visit our website: http://sf.flink-forward.org
Follow us on Twitter: @FlinkForward
We are hiring! data-artisans.com/careers
Appendix