The Power of Snapshots
Stateful Stream Processing
with Apache Flink
Stephan Ewen
QCon San Francisco, 2017
Watch the video with slide synchronization on InfoQ.com!
https://www.infoq.com/presentations/distributed-stream-processing-flink
Presented at QCon San Francisco
www.qconsf.com
Original creators of
Apache Flink®
dA Platform 2
Open Source Apache Flink +
dA Application Manager
Stream Processing
What changes faster? Data or Query?
Batch Processing Use Case: Data changes slowly compared to fast changing queries
  ad-hoc queries, data exploration, ML training and (hyper) parameter tuning
Stream Processing Use Case: Data changes fast, application logic is long-lived
  continuous applications, data pipelines, standing queries, anomaly detection, ML evaluation, …
Batch Processing
Stream Processing
Stateful
Stream Processing
Moving State into the Processors
Stateless Stream Processor: Application, with state in an External DB
Stateful Stream Processor: Application, with state inside the processor
Apache Flink
Apache Flink in a Nutshell
Queries
Applications
Devices
etc.
Database
Stream
File / Object
Storage
Stateful computations over streams
real-time and historic
fast, scalable, fault tolerant, in-memory,
event time, large state, exactly-once
Historic
Data
Streams
Application
The Core Building Blocks
• Event Streams: real-time and hindsight
• State: complex business logic
• (Event) Time: consistency with out-of-order data and late data
• Snapshots: forking / versioning / time-travel
Stateful Event & Stream Processing
// Source
val lines: DataStream[String] = env.addSource(new FlinkKafkaConsumer09(…))
// Transformation
val events: DataStream[Event] = lines.map((line) => parse(line))
// Transformation: keyed 5-second window over the parsed events
// (aggregate expects MyAggregationFunction to implement AggregateFunction)
val stats: DataStream[Statistic] = events
  .keyBy("sensor")
  .timeWindow(Time.seconds(5))
  .aggregate(new MyAggregationFunction())
// Sink
stats.addSink(new RollingSink(path))
Streaming
Dataflow
Source Transform Window
(state read/write)
Sink
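The keyed window step above can be illustrated without Flink. The following is a minimal plain-Scala sketch, not Flink code: it recomputes in batch style what a keyBy + 5-second tumbling window + sum would emit. The names Event, Statistic and windowedSums are hypothetical, and timestamps are assumed to be non-negative milliseconds.

```scala
// Plain-Scala sketch (no Flink dependency); all names are illustrative assumptions.
object WindowSketch {
  final case class Event(sensor: String, timestampMs: Long, value: Double)
  final case class Statistic(sensor: String, windowStartMs: Long, sum: Double)

  val WindowSizeMs = 5000L

  // Assign each event to the tumbling window that contains its timestamp,
  // group by (sensor, window start), and sum the values of each group.
  def windowedSums(events: Seq[Event]): Seq[Statistic] =
    events
      .groupBy(e => (e.sensor, e.timestampMs - (e.timestampMs % WindowSizeMs)))
      .map { case ((sensor, start), es) => Statistic(sensor, start, es.map(_.value).sum) }
      .toSeq
      .sortBy(s => (s.sensor, s.windowStartMs))

  def main(args: Array[String]): Unit = {
    val events = Seq(
      Event("a", 1000L, 1.0), Event("a", 4000L, 2.0),
      Event("a", 6000L, 5.0), Event("b", 2000L, 7.0))
    windowedSums(events).foreach(println)
  }
}
```

In a real streaming job the same grouping happens incrementally, with the per-window sum held as operator state.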
Stateful Event & Stream Processing
Scalable embedded state
Access at memory speed &
scales with parallel operators
Event time and Processing Time
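Event Producer -> Message Queue (partition 1, partition 2) -> Flink Data Source -> Flink Window Operator
• Event Time: at the Event Producer
• Broker Time: at the Message Queue
• Ingestion Time: at the Flink Data Source
• Processing Time: at the Flink Window Operator
Event time, Watermarks, as in the Dataflow model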
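How a watermark decides when an event-time window may fire can be sketched in plain Scala. This is a simplified illustration, assuming a bounded out-of-orderness strategy (watermark = highest timestamp seen minus a fixed delay). BoundedOutOfOrderness and windowFiresAfter are hypothetical names, not Flink classes.

```scala
// Plain-Scala sketch of event-time progress; names and strategy are assumptions.
object WatermarkSketch {
  final case class Event(timestampMs: Long)

  // Tracks event-time progress, tolerating events that arrive up to maxDelayMs late.
  final class BoundedOutOfOrderness(maxDelayMs: Long) {
    private var maxSeenMs = Long.MinValue
    def observe(e: Event): Unit = maxSeenMs = math.max(maxSeenMs, e.timestampMs)
    // Watermark: no more events with a timestamp at or below this value are expected.
    def currentWatermark: Long =
      if (maxSeenMs == Long.MinValue) Long.MinValue else maxSeenMs - maxDelayMs
  }

  // For each arriving event, reports whether a window ending at windowEndMs
  // (exclusive) is complete once that event has been observed.
  def windowFiresAfter(arrivals: Seq[Event], windowEndMs: Long, maxDelayMs: Long): Seq[Boolean] = {
    val wm = new BoundedOutOfOrderness(maxDelayMs)
    arrivals.map { e =>
      wm.observe(e)
      wm.currentWatermark >= windowEndMs - 1 // last timestamp that belongs to the window
    }
  }

  def main(args: Array[String]): Unit = {
    val arrivals = Seq(1000L, 4000L, 3000L, 5500L, 6000L).map(Event(_))
    println(windowFiresAfter(arrivals, windowEndMs = 5000L, maxDelayMs = 1000L))
  }
}
```

The out-of-order event (3000 after 4000) does not move the watermark backwards; the window only fires once the watermark passes its end.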
Powerful Abstractions
• Process Function (events, state, time): Stateful Event-Driven Applications
• DataStream API (streams, windows): Stream- & Batch Data Processing
• Stream SQL / Tables (dynamic tables): High-level Analytics API
val stats = stream
  .keyBy("sensor")
  .timeWindow(Time.seconds(5))
  .reduce((a, b) => a.add(b))

def processElement(event: MyEvent, ctx: Context, out: Collector[Result]) = {
  // work with event and state
  (event, state.value) match { … }
  out.collect(…) // emit events
  state.update(…) // modify state
  // schedule a timer callback
  ctx.timerService.registerEventTimeTimer(event.timestamp + 500)
}
Layered abstractions to
navigate simple to complex use cases
Distributed Snapshots
Event Sourcing + Memory Image
event log
persists events
(temporarily)
event /
command
Process
main memory
update local
variables/structures
periodically snapshot
the memory
Event Sourcing + Memory Image
Recovery: Restore snapshot and replay events
since snapshot
event log
persists events
(temporarily)
Process
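The event sourcing + memory image pattern from the two slides above can be sketched in a few lines of plain Scala. This is an illustration under assumptions, not how Flink implements it: Deposit, Process and recover are hypothetical names, and the event log is modeled as an in-memory Vector.

```scala
// Plain-Scala sketch of "snapshot the memory, replay events since the snapshot".
object MemoryImageSketch {
  sealed trait Event
  final case class Deposit(account: String, amount: Long) extends Event

  // A snapshot is a copy of the in-memory state plus the log offset it reflects.
  final case class Snapshot(state: Map[String, Long], offset: Int)

  final class Process {
    private var state = Map.empty[String, Long]
    private var offset = 0 // number of log entries applied so far

    // Update local structures for one event / command.
    def handle(e: Event): Unit = e match {
      case Deposit(acc, amt) =>
        state = state.updated(acc, state.getOrElse(acc, 0L) + amt)
        offset += 1
    }
    def snapshot(): Snapshot = Snapshot(state, offset) // immutable Map, so no deep copy needed
    def restore(s: Snapshot): Unit = { state = s.state; offset = s.offset }
    def view: Map[String, Long] = state
    def position: Int = offset
  }

  // Recovery: restore the snapshot, then replay only the events logged after it.
  def recover(log: Vector[Event], snap: Snapshot): Process = {
    val p = new Process
    p.restore(snap)
    log.drop(snap.offset).foreach(p.handle)
    p
  }

  def main(args: Array[String]): Unit = {
    val log = Vector[Event](Deposit("a", 10), Deposit("b", 5), Deposit("a", 7))
    val p = new Process
    log.take(2).foreach(p.handle)
    val snap = p.snapshot()          // periodic snapshot after two events
    println(recover(log, snap).view) // state after restore + replay
  }
}
```

Because the log only has to cover events since the latest snapshot, it can be truncated, which is why the slides say events are persisted temporarily.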
Consistent Distributed Snapshots
Scalable embedded state
Access at memory speed &
scales with parallel operators
Checkpoint Barriers
Consistent Distributed Snapshots
Trigger checkpoint Inject checkpoint barrier
Consistent Distributed Snapshots
Take state snapshot Trigger state
copy-on-write
Consistent Distributed Snapshots
Persist state snapshots Persist
snapshots
asynchronously
Processing pipeline continues
Consistent Distributed Snapshots
Re-load state
Reset positions
in input streams
Rolling back computation
Re-processing
Consistent Distributed Snapshots
Restore to different
programs
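The checkpoint sequence on the preceding slides (inject barrier, snapshot state at the barrier, later re-load state and reset input positions) can be sketched as a toy simulation. It assumes a single source feeding a single running-sum operator, so barrier alignment across multiple inputs and asynchronous persistence are left out. All names are hypothetical.

```scala
// Plain-Scala sketch of barrier-driven snapshots on a one-channel pipeline.
object BarrierSketch {
  sealed trait Element
  final case class Record(value: Int) extends Element
  // The barrier carries the source position that belongs to its checkpoint.
  final case class Barrier(checkpointId: Long, sourceOffset: Int) extends Element

  // A completed checkpoint: source read position plus operator state at the barrier.
  final case class Checkpoint(id: Long, sourceOffset: Int, operatorSum: Long)

  // Source: emits records starting at offset `from` and injects a barrier
  // after every `barrierEvery` records (counted from the start of the input).
  def sourceStream(input: Vector[Int], from: Int, barrierEvery: Int): Vector[Element] =
    (from until input.length).toVector.flatMap { i =>
      val rec = Vector[Element](Record(input(i)))
      if ((i + 1) % barrierEvery == 0) rec :+ Barrier(((i + 1) / barrierEvery).toLong, i + 1)
      else rec
    }

  // Operator: keeps a running sum; on a barrier it snapshots its state and continues.
  def run(stream: Vector[Element], initialSum: Long): (Long, Vector[Checkpoint]) =
    stream.foldLeft((initialSum, Vector.empty[Checkpoint])) {
      case ((sum, cps), Record(v))           => (sum + v, cps)
      case ((sum, cps), Barrier(id, offset)) => (sum, cps :+ Checkpoint(id, offset, sum))
    }

  // Recovery: re-load operator state, reset the input position, re-process the rest.
  def recoverAndFinish(input: Vector[Int], cp: Checkpoint, barrierEvery: Int): Long =
    run(sourceStream(input, cp.sourceOffset, barrierEvery), cp.operatorSum)._1

  def main(args: Array[String]): Unit = {
    val input = (1 to 10).toVector
    val (total, checkpoints) = run(sourceStream(input, 0, 4), 0L)
    println(total)
    println(checkpoints)
    println(recoverAndFinish(input, checkpoints.head, 4))
  }
}
```

Restoring from any checkpoint and re-processing from its recorded offset yields the same final sum as the uninterrupted run, which is the consistency property the slides rely on.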
Checkpoints and Savepoints
in Apache Flink
Speed or Operability?
Fast snapshots
Checkpoint
Flexible
Operations on
Snapshots
Savepoint
What to optimize for?
Savepoints: Opt. for Operability
• Self-contained: No references to other checkpoints
• Canonical format: Switch between state structures
• Efficiently re-scalable: Indexed by key group
• Future: More self-describing serialization format for archiving / versioning (like Avro, Thrift, etc.)
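Why indexing by key group makes re-scaling cheap can be shown with a small sketch. It is a simplification under assumptions: keys are hashed with plain hashCode (Flink applies an additional hash), and key groups are mapped to operator instances in contiguous ranges. The function names are hypothetical.

```scala
// Plain-Scala sketch of key-group based state assignment; names are assumptions.
object KeyGroupSketch {
  // A key always maps to the same key group, independent of the current parallelism.
  def keyGroupOf(key: String, maxParallelism: Int): Int =
    Math.floorMod(key.hashCode, maxParallelism)

  // Key groups are assigned to parallel operator instances in contiguous ranges.
  def operatorIndexOf(keyGroup: Int, maxParallelism: Int, parallelism: Int): Int =
    keyGroup * parallelism / maxParallelism

  // With state stored per key group, re-scaling only re-assigns whole groups.
  def assignment(maxParallelism: Int, parallelism: Int): Map[Int, Seq[Int]] =
    (0 until maxParallelism).groupBy(kg => operatorIndexOf(kg, maxParallelism, parallelism))

  def main(args: Array[String]): Unit = {
    println(keyGroupOf("sensor-42", 8))
    println(assignment(8, 2).toSeq.sortBy(_._1)) // two instances
    println(assignment(8, 4).toSeq.sortBy(_._1)) // re-scaled to four instances
  }
}
```

Going from 2 to 4 instances moves complete key groups to new owners; no individual key has to be re-hashed or re-sorted.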
Checkpoints: Opt. for Efficiency
• Often incremental:
  - Snapshot only diff from last snapshot
  - Reference older snapshots, compaction over time
• Format specific to state backend:
  - No extra copies or re-encoding
  - Not possible to switch to another state backend between checkpoints
• Compact serialization: Optimized for speed/space, not long-term archival and evolution
• Key groups not indexed: Re-distribution may be more expensive
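The idea of an incremental checkpoint (store only the diff and reference older snapshots) can be sketched as follows. This is a toy model, not the format of any Flink state backend: Delta and Backend are hypothetical names, and compaction over time is not modeled.

```scala
// Plain-Scala sketch of incremental snapshots with parent references.
object IncrementalSketch {
  // One incremental checkpoint: entries changed or removed since the previous
  // checkpoint, plus a reference to that checkpoint (None for the first one).
  final case class Delta(id: Int, changed: Map[String, Long], removed: Set[String], parent: Option[Int])

  final class Backend {
    private var state = Map.empty[String, Long]
    private var dirty = Set.empty[String] // keys touched since the last checkpoint
    private var lastId: Option[Int] = None

    def put(k: String, v: Long): Unit = { state += (k -> v); dirty += k }
    def remove(k: String): Unit = { state -= k; dirty += k }

    // Snapshot only the diff from the last snapshot.
    def checkpoint(id: Int): Delta = {
      val changed = state.filter { case (k, _) => dirty.contains(k) }
      val removed = dirty.filterNot(state.contains)
      val delta = Delta(id, changed, removed, lastId)
      dirty = Set.empty
      lastId = Some(id)
      delta
    }
  }

  // Restore by following parent references back to the first checkpoint,
  // then applying the diffs oldest-first.
  def restore(store: Map[Int, Delta], id: Int): Map[String, Long] = {
    val d = store(id)
    val base = d.parent.map(restore(store, _)).getOrElse(Map.empty[String, Long])
    (base -- d.removed) ++ d.changed
  }

  def main(args: Array[String]): Unit = {
    val b = new Backend
    b.put("a", 1); b.put("b", 2)
    val c1 = b.checkpoint(1)
    b.put("a", 5); b.remove("b"); b.put("c", 9)
    val c2 = b.checkpoint(2)
    println(c2)
    println(restore(Map(1 -> c1, 2 -> c2), 2))
  }
}
```

The second checkpoint is small, but it is not self-contained: restoring it needs the first one as well, which is the trade-off against savepoints.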
What else are snapshots /
checkpoints good for?
What users built on checkpoints
• Upgrades and Rollbacks
• Cross Datacenter Failover
• State Archiving
• State Bootstrapping
• Application Migration
• Spot Instance Region Arbitrage
• A/B testing
• …
Distributed Snapshots
and side effects
Transaction coordination for side fx
One snapshot can transactionally move
data between different systems
Snapshots may include side effects
Transaction coordination for side fx
• Similar to a distributed 2-phase commit
• Coordinated by asynchronous checkpoints, no voting delays
• Basic algorithm:
  - Between checkpoints: Produce into transaction or Write Ahead Log
  - On operator snapshot: Flush local transaction (vote-to-commit)
  - On checkpoint complete: Commit transactions (commit)
  - On recovery: Check and commit any pending transactions
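The four steps of the basic algorithm can be sketched with a toy transactional sink. This is an illustration only: ExternalSystem, TransactionalSink and recover are hypothetical, the external system is an in-memory stand-in, and its commit is assumed to be idempotent so that it can safely be repeated on recovery.

```scala
// Plain-Scala sketch of checkpoint-coordinated two-phase commit; names are assumptions.
object TwoPhaseCommitSketch {
  // Stand-in for an external system: flushed data becomes visible only on commit.
  final class ExternalSystem {
    private var staged = Map.empty[Long, Vector[String]]
    private var visible = Vector.empty[String]
    def flush(txn: Long, records: Vector[String]): Unit = staged += (txn -> records)
    // Idempotent: committing an unknown or already committed transaction is a no-op.
    def commit(txn: Long): Unit =
      staged.get(txn).foreach { rs => visible ++= rs; staged -= txn }
    def committed: Vector[String] = visible
  }

  final class TransactionalSink(system: ExternalSystem) {
    private var currentTxn = 1L
    private var buffer = Vector.empty[String]
    private var pending = Vector.empty[Long] // flushed, waiting for checkpoint completion

    // Between checkpoints: produce into the current transaction.
    def write(record: String): Unit = buffer :+= record

    // On operator snapshot: flush the local transaction (vote-to-commit) and
    // return the pending transaction ids, which would be stored in the checkpoint.
    def snapshot(): Vector[Long] = {
      system.flush(currentTxn, buffer)
      pending :+= currentTxn
      buffer = Vector.empty
      currentTxn += 1
      pending
    }

    // On checkpoint complete: commit the pending transactions.
    def checkpointComplete(): Unit = {
      pending.foreach(system.commit)
      pending = Vector.empty
    }
  }

  // On recovery: check and commit any transaction listed as pending in the restored checkpoint.
  def recover(system: ExternalSystem, pendingFromCheckpoint: Vector[Long]): Unit =
    pendingFromCheckpoint.foreach(system.commit)

  def main(args: Array[String]): Unit = {
    val system = new ExternalSystem
    val sink = new TransactionalSink(system)
    sink.write("a"); sink.write("b")
    sink.snapshot()
    sink.checkpointComplete()
    sink.write("c")
    val pending = sink.snapshot() // failure before the commit is simulated here
    recover(system, pending)
    println(system.committed)
  }
}
```

Records written between checkpoints stay invisible until the checkpoint that covers them completes, so a rollback to an earlier checkpoint never leaves duplicate side effects behind.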
Distributed Snapshots
and Application Architectures
(A Philosophical Monologue)
Good old centralized architecture
The big mean
central database
$$$
The grumpy
DBA
Application Application Application Application
Stateful Stream Proc. & Applications
Application Application Application
Application Application
decentralized infrastructure
DevOps
decentralized responsibilities
still involves
managing databases
Stateless Application Containers
State management
is nasty, let's pretend we don't
have to do it
Stateless Application Containers
Kudos to Kiki Carter
for the Broccoli
Metaphor
Broccoli (state management)
is nasty, let's pretend we don't
have to eat (do) it
Stateful Stream Proc. to the rescue
Application
Sensor
APIs
Application
Application
Application
very simple: state is just part
of the application
Compute, State, and Storage
Classic tiered architecture: compute layer; database layer (application state + backup)
Streaming architecture: compute + application state; stream storage and snapshot storage (backup)
Performance
Classic tiered architecture: synchronous reads/writes across tier boundary
Streaming architecture: all modifications are local; asynchronous writes of large blobs
Consistency
Classic tiered architecture: distributed transactions; at scale typically at-most / at-least once
Streaming architecture: exactly once per state
Scaling a Service
Classic tiered architecture: provision compute; separately provision additional database capacity
Streaming architecture: provision compute and state together
Rolling out a new Service
Classic tiered architecture: provision a new database (or add capacity to an existing one)
Streaming architecture: provision compute and state together; simply occupies some additional backup space
Time, Completeness, Out-of-order
Classic tiered architecture: ?
Streaming architecture: event time clocks define data completeness; event time timers handle actions for out-of-order data
Stateful Stream Processing
Application
Sensor
APIs
Application
Application
Application
very simple: state is just part
of the application
The Challenges with that:
• Upgrades are stateful, need consistency
  - application evolution and bug fixes
• Migration of application state
  - cluster migration, A/B testing
• Re-processing and reinstatement
  - fix corrupt results, bootstrap new applications
• State evolution (schema evolution)
Consistent Distributed
Snapshots
The answer
(my personal and obviously biased take)
Payments Dashboard
Demo Time!
Thank you very much
(shameless plug)
We are hiring!
data-artisans.com/careers
Appendix
Details about Snapshots
and Transactional
Side Effects
Exactly-once via Transactions
[Figure: transactions TXN-1, TXN-2, TXN-3 against checkpoints chk-1 and chk-2, with markers ✔chk-1, ✔chk-2, ✘, ✔ global (twice), and the label "Side effect"]
Transaction fails after local snapshot
[Figure: transactions TXN-1, TXN-2, TXN-3 against checkpoints chk-1 and chk-2, with markers ✔chk-1, ✘, ✔ global, and the label "Side effect"]
Transaction fails before commit…
[Figure: transactions TXN-1, TXN-2, TXN-3 against checkpoints chk-1 and chk-2, with markers ✔chk-1, ✘, ✔ global (twice), and the label "Side effect"]
… commit on recovery
[Figure: transactions TXN-2, TXN-3 against checkpoints chk-2 and chk-3, with markers ✔ global and "recover TXN handle", and the label "Side effect"]