#DevoxxFR
Stream Processing with Apache Flink
Tugdual “Tug” Grall
Technical Evangelist @ MapR
tug@mapr.com
@tgrall
{“about” : “me”}
Tugdual “Tug” Grall
• MapR : Technical Evangelist
• MongoDB, Couchbase, eXo, Oracle
• NantesJUG co-founder

• @tgrall
• http://tgrall.github.io
• tug@mapr.com / tugdual@gmail.com
MapR Converged Data Platform
• Open Source Engines & Tools, Commercial Engines & Applications, Custom Apps
• Data Processing, Search and Others, Cloud and Managed Services
• Enterprise-Grade Platform Services: Web-Scale Storage (MapR-FS), Database (MapR-DB), Event Streaming (MapR Streams)
• Real Time, Unified Security, Multi-tenancy, Disaster Recovery, Global Namespace, High Availability
• Unified Management and Monitoring
• APIs: HDFS API, POSIX, NFS, HBase API, JSON API, Kafka API
Streaming technology is enabling the obvious:
continuous processing on data
that is continuously produced
Hint: you already have streaming data
Decoupling
• State managed centrally: App A, App B and App C share one state
• Applications build their own state: App A, App B and App C each keep their own
Event Stream = Data Pipelines
Streaming and Batch
[Figure: a timeline of hourly events, from 2016-3-1 12:00 am through 2016-3-12 3:00 am, divided into partitions]
• Stream (low latency)
• Stream (high latency)
• Batch (bounded stream)
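The "Batch (bounded stream)" label can be illustrated without Flink. The following is a minimal sketch in plain Scala, not Flink code: the names `BoundedVsUnbounded`, `Event`, `countEvents` and `batch` are hypothetical, and the one-event-per-hour stream is an assumption borrowed from the hourly timeline on the slide. It shows the same counting logic applied to a slice of an endless stream.

```scala
object BoundedVsUnbounded {
  // Hypothetical event: one reading per hour, identified by its hour index.
  final case class Event(hour: Int)

  // The processing logic only needs an Iterator of events;
  // it does not care whether the source is a batch or a stream.
  def countEvents(events: Iterator[Event]): Int = events.size

  // An unbounded stream: events keep coming, one per hour, with no end.
  def unbounded: Iterator[Event] = Iterator.from(0).map(h => Event(h))

  // A batch is the same stream cut at a known start and a known end.
  def batch(fromHour: Int, untilHour: Int): Iterator[Event] =
    unbounded.dropWhile(_.hour < fromHour).takeWhile(_.hour < untilHour)

  def main(args: Array[String]): Unit =
    println(countEvents(batch(0, 24))) // count one day of hourly events
}
```

Calling `countEvents(unbounded)` would never return, which is why unbounded streams are processed incrementally, for example in windows.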
Processing
• Request / Response
• Batch
• Stream Processing
• Real-time reaction to events
• Continuous applications
• Process both real-time and historical data
Flink Architecture
• Deployment: Local (Single JVM), Cluster (Standalone, YARN, Mesos), Cloud (AWS, Google)
• Core: Runtime (Distributed Streaming Dataflow)
• API & Libraries
• DataSet API (Batch Processing): FlinkML (Machine Learning), Gelly (Graph Processing), Table (Relational)
• DataStream API (Stream Processing): CEP (Event Processing), Table (Relational)
Demonstration
Flink Basics
Batch & Stream
case class Word(word: String, frequency: Int)
// DataSet API - Batch (env is an ExecutionEnvironment)
val lines: DataSet[String] = env.readTextFile(…)
lines.flatMap { line => line.split(" ").map(word => Word(word, 1)) }
  .groupBy("word").sum("frequency")
  .print()
// DataStream API - Streaming (env is a StreamExecutionEnvironment)
val lines: DataStream[String] = env.socketTextStream(...)
lines.flatMap { line => line.split(" ").map(word => Word(word, 1)) }
  .keyBy("word")
  // Flink 1.x form of window(...).every(...): 5-second window, sliding every second
  .timeWindow(Time.seconds(5), Time.seconds(1))
  .sum("frequency")
  .print()
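To make the streaming variant more concrete, here is a minimal sketch in plain Scala (no Flink dependency) of what a 5-second window sliding every second implies: each word is counted in every window that covers its timestamp. The names `SlidingWindowSketch`, `windowStarts` and `countPerWindow` are hypothetical, timestamps are assumed to be non-negative milliseconds, and windows starting before time 0 are dropped as a simplification. This is not how Flink is implemented.

```scala
object SlidingWindowSketch {
  final case class Word(word: String, frequency: Int)

  // Start times (ms) of every window [start, start + sizeMs) that contains timestampMs.
  // Simplification: windows that would start before time 0 are ignored.
  def windowStarts(timestampMs: Long, sizeMs: Long, slideMs: Long): List[Long] = {
    val lastStart = timestampMs - (timestampMs % slideMs)
    Iterator.iterate(lastStart)(_ - slideMs)
      .takeWhile(start => start > timestampMs - sizeMs && start >= 0)
      .toList
  }

  // Word counts per window start, for lines of text tagged with a timestamp.
  def countPerWindow(
      lines: Seq[(Long, String)],
      sizeMs: Long,
      slideMs: Long): Map[Long, Map[String, Int]] = {
    val assigned = for {
      (ts, line) <- lines
      word <- line.split(" ").toSeq if word.nonEmpty
      start <- windowStarts(ts, sizeMs, slideMs)
    } yield (start, Word(word, 1))
    assigned.groupBy(_._1).map { case (start, ws) =>
      start -> ws.map(_._2).groupBy(_.word).map { case (w, xs) => w -> xs.map(_.frequency).sum }
    }
  }
}
```

With a size of 5000 ms and a slide of 1000 ms, an event at 7200 ms falls into the five windows starting at 3000, 4000, 5000, 6000 and 7000 ms, so its word is added to five separate counts.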
Stream Processing
Source → Filter / Transform → Sink
Flink Ecosystem
• Sources: Apache Kafka, MapR Streams, AWS Kinesis, RabbitMQ, Twitter, Apache Bahir, …
• Sinks: Apache Kafka, MapR Streams, AWS Kinesis, RabbitMQ, Elasticsearch, HDFS/MapR-FS, …
Stateful Stream Processing
Source → Filter / Transform (State read/write) → Sink
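The difference between a stateless and a stateful transform can be sketched in plain Scala. This is only an illustration under assumptions, not Flink's state API: `StatefulSketch`, `step` and `run` are hypothetical names, and the state is an in-memory map holding a running count per key.

```scala
object StatefulSketch {
  // State: the running count seen so far for each key.
  type State = Map[String, Int]

  // Transform step: read the state for the key, update it, emit the new count.
  def step(state: State, key: String): (State, (String, Int)) = {
    val updated = state.getOrElse(key, 0) + 1
    (state.updated(key, updated), (key, updated))
  }

  // Source -> stateful transform -> sink (here the sink is simply the returned list).
  def run(source: Iterator[String]): List[(String, Int)] = {
    var state: State = Map.empty
    source.map { key =>
      val (next, out) = step(state, key)
      state = next // write the state back before the next event arrives
      out
    }.toList
  }
}
```

The output for a given event depends on the events that came before it, which is what separates this pipeline from a plain filter or map.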
Is Flink used?
Powered by Flink
10 billion events/day
2 TB of data/day
30 applications
2 PB of storage and growing
Source: Bouygues Telecom, http://berlin.flink-forward.org/wp-content/uploads/2016/07/Thomas-Lamirault_Mohamed-Amine-Abdessemed-A-brief-history-of-time-with-Apache-Flink.pdf
Stream Processing
Windowing
Stream Windows
Demonstration
Flink Windowing
Time
What about it?
Demonstration
• Multiple notions of “Time” in Flink
• Event Time
• Ingestion Time
• Processing Time
What Is Event-Time Processing
• Processing Time: 1977 Episode IV, 1980 Episode V, 1983 Episode VI, 1999 Episode I, 2002 Episode II, 2005 Episode III, 2015 Episode VII
• Event Time: Episode I, Episode II, Episode III, Episode IV, Episode V, Episode VI, Episode VII
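The episode analogy can be written down as a small plain-Scala sketch. The names `EventTimeSketch` and `Film` are hypothetical; the years and episode numbers are the ones on the slide, with the release year standing in for processing time and the episode number for event time.

```scala
object EventTimeSketch {
  // processingTime: year the film reached the audience.
  // eventTime: position of the episode in the story.
  final case class Film(processingTime: Int, eventTime: Int)

  // Events in the order they arrived.
  val arrivals: List[Film] = List(
    Film(1977, 4), Film(1980, 5), Film(1983, 6),
    Film(1999, 1), Film(2002, 2), Film(2005, 3), Film(2015, 7))

  // Episode numbers in the order the system saw them.
  def byProcessingTime(films: List[Film]): List[Int] =
    films.sortBy(_.processingTime).map(_.eventTime)

  // Episode numbers in the order the events actually happened.
  def byEventTime(films: List[Film]): List[Int] =
    films.sortBy(_.eventTime).map(_.eventTime)
}
```

A job that works on processing time sees the episodes as 4, 5, 6, 1, 2, 3, 7, while a job that works on event time can reorder them to 1 through 7, provided each event carries its own timestamp.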
Time in Flink
Complex Event Processing
• Analyzing a stream of events and drawing conclusions
• “if A and then B → infer event C”
• Demanding requirements on stream processor
• Low latency!
• Exactly-once semantics & event-time support
Order Events
Process is reflected in a stream of order events
Order(orderId, tStamp, “received”)
Shipment(orderId, tStamp, “shipped”)
Delivery(orderId, tStamp, “delivered”)
orderId: Identifies the order
tStamp: Time at which the event happened
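One possible way to model these events in Scala is sketched below. The field types (`Long` for both `orderId` and `tStamp`), the `status` field and the `timeline` helper are assumptions made for illustration; the slides only give the event names and their three components.

```scala
object OrderEvents {
  // Common shape of all order events: which order, when it happened, and what happened.
  sealed trait Event {
    def orderId: Long
    def tStamp: Long
    def status: String
  }
  final case class Order(orderId: Long, tStamp: Long, status: String = "received") extends Event
  final case class Shipment(orderId: Long, tStamp: Long, status: String = "shipped") extends Event
  final case class Delivery(orderId: Long, tStamp: Long, status: String = "delivered") extends Event

  // Group a mixed stream by order and list each order's statuses in event-time order.
  def timeline(events: Seq[Event]): Map[Long, Seq[String]] =
    events.groupBy(_.orderId).map { case (id, es) =>
      id -> es.sortBy(_.tStamp).map(_.status)
    }
}
```

Sorting by `tStamp` rather than by arrival order is what lets the process be reconstructed correctly even when events arrive late.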
Real-time Warnings
CEP to the Rescue
Define processing and delivery intervals (SLAs)
ProcessSucc(orderId, tStamp, duration)
ProcessWarn(orderId, tStamp)
DeliverySucc(orderId, tStamp, duration)
DeliveryWarn(orderId, tStamp)
orderId: Identifies the order
tStamp: Time when the event happened
duration: Duration of the processing/delivery
CEP Example
Processing: Order → Shipment
// Pattern: an Order ("received") followed by a "shipped" event within one hour
val processingPattern = Pattern
  .begin[Event]("received").subtype(classOf[Order])
  .followedBy("shipped").where(_.status == "shipped")
  .within(Time.hours(1))
// Apply the pattern separately to each order
val processingPatternStream = CEP.pattern(
  input.keyBy("orderId"),
  processingPattern)
val procResult: DataStream[Either[ProcessWarn, ProcessSucc]] =
  processingPatternStream.select {
    (pP, timestamp) => // Timeout handler: no shipment within the hour
      ProcessWarn(pP("received").orderId, timestamp)
  } {
    fP => // Select function: the full pattern matched
      ProcessSucc(
        fP("received").orderId, fP("shipped").tStamp,
        fP("shipped").tStamp - fP("received").tStamp)
  }
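The logic of this pattern can be approximated in plain Scala over a finite, already collected list of events. This is a sketch under assumptions and does not use the Flink CEP library: `ProcessingSlaSketch` and `check` are hypothetical names, timestamps are assumed to be milliseconds, a shipment exactly on the deadline is treated as on time, and the warning carries the deadline as its timestamp.

```scala
object ProcessingSlaSketch {
  final case class Event(orderId: Long, tStamp: Long, status: String)
  final case class ProcessSucc(orderId: Long, tStamp: Long, duration: Long)
  final case class ProcessWarn(orderId: Long, tStamp: Long)

  val OneHourMs: Long = 60L * 60L * 1000L

  // Per order: "received" followed by "shipped" within one hour gives a success,
  // a "received" with no timely "shipped" gives a warning.
  def check(events: Seq[Event]): Seq[Either[ProcessWarn, ProcessSucc]] =
    events.groupBy(_.orderId).toSeq.sortBy(_._1).flatMap { case (orderId, es) =>
      val sorted = es.sortBy(_.tStamp)
      sorted.find(_.status == "received").map { received =>
        val deadline = received.tStamp + OneHourMs
        val shipped = sorted.find { e =>
          e.status == "shipped" && e.tStamp >= received.tStamp && e.tStamp <= deadline
        }
        val result: Either[ProcessWarn, ProcessSucc] = shipped match {
          case Some(s) => Right(ProcessSucc(orderId, s.tStamp, s.tStamp - received.tStamp))
          case None    => Left(ProcessWarn(orderId, deadline))
        }
        result
      }.toList
    }
}
```

Counting the `Left` values of the result gives the number of delayed shipments, and averaging `duration` over the `Right` values gives the average processing time.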
Count Delayed Shipments
Compute Avg Processing Time
The End
• Process events in real time and/or batch
• Complex Event Processing (CEP)
• Many other things to discover
• Deployment
• High Availability
• Table/Relational API
• … https://mapr.com/ebooks/
Flink Community
&
Thanks to
Kostas Tzoumas
Stephan Ewen
Fabian Hueske
Till Rohrmann
Jamie Grier

Introduction to Streaming with Apache Flink