The Past, Present, and Future of Apache Flink

Aljoscha Krettek, engineering manager & co-founder
Till Rohrmann, engineering manager & co-founder

© 2018 data Artisans
It all started in 2014
2009 - 2014 since 2014
● Batch processor on top of streaming runtime
● First Apache Flink release (0.6.0) in August 2014

August 2014
Batch processing

Flink learns to stream in real time
DataStream API
Stream Processing
DataSet API
Batch Processing
Runtime
Distributed Streaming Data Flow

● Continuous & real-time
November 2014
Batch processing Stream processing

Flink learns to remember

Remember where we left off

● Stateful & exactly once
June 2015

Latency vs. Throughput?
[Figure: latency (low to high) against throughput (low to high), contrasting the prevailing belief (≠) with Flink's numbers]
● 10s of millions of events/s
● Latency down to 1 ms

Flink becomes event-time aware
[Figure: episodes ordered by processing time vs. event time]
Processing time order: Episode IV (1977), V (1980), VI (1983), I (1999), II (2002), III (2005), VII (2015), VIII (2017)
Event time order: Episode I (1999), II (2002), III (2005), IV (1977), V (1980), VI (1983), VII (2015), VIII (2017)
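The difference between the two orderings can be sketched in a few lines of plain Scala. This is only an illustration of the idea, not Flink code: the `Film` case class and the `EpisodeOrder` object are hypothetical names, and the episode number stands in for the event timestamp while the release year stands in for the processing timestamp.

```scala
// Each film is an event: the episode number plays the role of event time
// (when it happened in the story), the release year plays the role of
// processing time (when it arrived). Names here are illustrative only.
final case class Film(episode: Int, releaseYear: Int)

object EpisodeOrder {
  val films: List[Film] = List(
    Film(4, 1977), Film(5, 1980), Film(6, 1983),
    Film(1, 1999), Film(2, 2002), Film(3, 2005),
    Film(7, 2015), Film(8, 2017)
  )

  // Processing-time order: the order in which the events arrived.
  def byProcessingTime: List[Int] = films.sortBy(_.releaseYear).map(_.episode)

  // Event-time order: the order in which the events actually happened.
  def byEventTime: List[Int] = films.sortBy(_.episode).map(_.episode)

  def main(args: Array[String]): Unit = {
    println(s"processing time: ${byProcessingTime.mkString(" ")}")
    println(s"event time:      ${byEventTime.mkString(" ")}")
  }
}
```

An event-time aware system produces results according to the second ordering even though the events arrive in the first.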

● High throughput & low latency
● Event time
November 2015

More than just analytics: ProcessFunction
class MyFunction extends ProcessFunction[MyEvent, Result] {

  // declare state to use in the program
  lazy val state: ValueState[CountWithTimestamp] = getRuntimeContext().getState(…)

  def processElement(event: MyEvent, ctx: Context, out: Collector[Result]): Unit = {
    // work with event and state and schedule timers
  }

  def onTimer(timestamp: Long, ctx: OnTimerContext, out: Collector[Result]): Unit = {
    // handle callback when event-/processing- time instant is reached
  }
}
● ProcessFunction gives access to state, time and events
● Low level API
● Enables data-driven applications
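How state, events and timers interact can be sketched without Flink at all. The following is a minimal plain-Scala model, not the ProcessFunction API: `SessionCounter`, `advanceTo`, the `timeout` parameter and the fields of `MyEvent`, `Result` and `CountWithTimestamp` are all assumptions made for illustration. It counts events per key and emits the count once a key has been quiet for `timeout` time units.

```scala
import scala.collection.mutable

// Hypothetical shapes for the types named on the slide.
final case class MyEvent(key: String, timestamp: Long)
final case class Result(key: String, count: Long)
final case class CountWithTimestamp(count: Long, lastSeen: Long)

// Plain-Scala stand-in for keyed state plus a timer service.
class SessionCounter(timeout: Long) {
  private val state = mutable.Map.empty[String, CountWithTimestamp]
  private val timers = mutable.Map.empty[String, Long] // key -> firing time

  // Counterpart of processElement: update state and (re)schedule a timer.
  def processElement(event: MyEvent): Unit = {
    val current = state.getOrElse(event.key, CountWithTimestamp(0L, 0L))
    state(event.key) = CountWithTimestamp(current.count + 1, event.timestamp)
    timers(event.key) = event.timestamp + timeout
  }

  // Counterpart of onTimer: fire every timer that is due at `now`,
  // emit the count for its key and clear that key's state.
  def advanceTo(now: Long): List[Result] = {
    val due = timers.filter { case (_, t) => t <= now }.keys.toList.sorted
    due.map { key =>
      timers.remove(key)
      val s = state.remove(key).get
      Result(key, s.count)
    }
  }
}
```

With a timeout of 60, two events for key "a" at times 0 and 10 produce `Result("a", 2)` once time reaches 70, because the second event rescheduled the timer.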

● High throughput & low latency
● Event time
● Data-driven applications
February 2017

Present & Future

Present in a nutshell
● Hardening
‒ Faster network stack
‒ Application level flow control
‒ Resolving dependency hell
● Scaling
‒ Incremental snapshots
‒ Local recovery
‒ Scalable timers
● Interoperability
‒ Resource elasticity
‒ REST client-server interface
‒ Container entrypoint
● Stream SQL
‒ SQL client
‒ User-defined functions
‒ More powerful joins
● Misc
‒ State TTL
‒ Broadcast state
‒ Kafka exactly-once producer

Large, larger, Flink
[Figure: state size over time, with incremental snapshots]
● Snapshot only state diff
● Incremental snapshots make it possible to handle very large state
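The idea of snapshotting only the state diff can be illustrated with a small plain-Scala sketch. It models state as an in-memory map and is not meant to reflect how Flink stores snapshots internally; `IncrementalSnapshot`, `Delta`, `diff` and `restore` are hypothetical names.

```scala
object IncrementalSnapshot {
  // What changed since the previous snapshot: new or updated entries,
  // plus the keys that were removed.
  final case class Delta(upserts: Map[String, Long], deletes: Set[String])

  // Compute only the difference between two versions of the state.
  def diff(previous: Map[String, Long], current: Map[String, Long]): Delta = {
    val upserts = current.filter { case (k, v) => !previous.get(k).contains(v) }
    val deletes = previous.keySet -- current.keySet
    Delta(upserts, deletes)
  }

  // Rebuild the latest state from a full base snapshot and the deltas
  // taken after it, applied in order.
  def restore(base: Map[String, Long], deltas: Seq[Delta]): Map[String, Long] =
    deltas.foldLeft(base) { (state, d) => (state -- d.deletes) ++ d.upserts }
}
```

The cost of each snapshot then depends on how much changed, not on the total state size, which is what makes very large state manageable.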

Faster failover is always better

Varying workloads
• Violating SLAs vs. wasting money
• Varying workloads require adapting resources

Revamped distributed architecture
Components: Client, Dispatcher, JobManager, ResourceManager, TaskManager, ClusterManager
1. Submit job
2. Start job
3. Request slots
4. Allocate resources
5. Start TaskManager
6. Execute job
● Support for full resource elasticity
● Application parallelism can be dynamically changed

● High throughput & low latency
● Event time
● Data-driven applications
● Applications as first class citizens
Present & Future

Flink as a library (and still as a framework)
• Deploying Flink applications should be as easy as starting a process
• Bundle application code and Flink into a single image
• Process connects to other application processes and figures out its role
• Removing the cluster from the equation
[Figure: application processes P1, P2, P3 and P4, with a new process joining]

How much control do I need?
[Figure: spectrum from batch processing through continuous processing to real-time & data-driven applications]
Batch processing
● Multiple short-lived stages
● Different resource requirements per stage
● Efficient execution requires control over resources
● Flink actively allocates resources
Continuous processing and real-time & data-driven applications
● Continuously processing operators
● Constrained by external systems, SLAs and application logic
● External system can assign resources
● Flink reacts to available resources

Active vs. reactive mode
• Active mode
‒ Flink is aware of underlying cluster framework
‒ Flink allocates resources
‒ E.g. existing YARN and Mesos integration
• Reactive mode
‒ Flink is oblivious to its runtime environment
‒ External system allocates and releases resources
‒ Flink scales with respect to available resources
‒ Relevant for environments such as Kubernetes, Docker, and Flink as a library
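The contrast between the two modes can be reduced to who decides the resource count. The sketch below is plain Scala under simplifying assumptions, not Flink's scheduler: `ReactiveScaling`, `slotsToRequest`, `reactiveParallelism` and the notion of a single `maxParallelism` cap are hypothetical and chosen only for illustration.

```scala
object ReactiveScaling {
  // Active mode: the job knows what it wants and asks the cluster
  // framework for whatever is still missing.
  def slotsToRequest(desiredParallelism: Int, availableSlots: Int): Int =
    math.max(0, desiredParallelism - availableSlots)

  // Reactive mode: an external system decides how many slots exist;
  // the job simply runs with what is there, up to an assumed cap.
  def reactiveParallelism(availableSlots: Int, maxParallelism: Int): Int =
    math.max(0, math.min(availableSlots, maxParallelism))
}
```

In reactive mode, adding or removing processes from the outside is the only scaling action; the job follows.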

Scaling automatically
• Latency
• Throughput
• Resource utilization
• Connector signals

How we create Flink Jobs
[Figure: Java/Scala programs use the Flink APIs, which sit on top of stream/batch processing and the runtime]

Flink SQL
[Figure: Java/Scala and SQL both sit on the Flink APIs, on top of stream/batch processing and the runtime]
“NO CODING REQUIRED”
● Source/Sink definition in YAML
● Configuration in YAML
● SQL command line
● User-defined functions
● Streaming and Batch
● Event time and processing time
*since Flink 0.9.0 (June 2015)

“Join” me for some trading
[Figure: a stream of buy and sell orders joined with currency rates; values shown: $ 17, £ 42, 12.5, ₪]

Introducing Time-versioned Table Joins
[Figure: buy and sell orders joined, by event time, against a versioned rates table]
curr | rate | time
£    | 42   | 3
£    | 12   | 17
Other values shown: 1453, 31753, 14
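The semantics of a time-versioned join can be sketched in plain Scala: each order is matched with the rate version that was valid at the order's event time. This is an illustration of the idea only, not Flink SQL or the Table API. `VersionedJoin`, `Rate`, `Order`, `rateAt` and `join` are hypothetical names; the two £ rate versions come from the slide, while the order amounts and times are made up.

```scala
final case class Rate(currency: String, rate: Double, time: Long)
final case class Order(currency: String, amount: Double, time: Long)

object VersionedJoin {
  // The valid rate for (currency, time) is the latest version of that
  // currency's rate with a timestamp at or before the given time.
  def rateAt(rates: Seq[Rate], currency: String, time: Long): Option[Rate] = {
    val candidates = rates.filter(r => r.currency == currency && r.time <= time)
    if (candidates.isEmpty) None else Some(candidates.maxBy(_.time))
  }

  // Inner join: orders without a valid rate version are dropped.
  def join(orders: Seq[Order], rates: Seq[Rate]): Seq[(Order, Double)] =
    orders.flatMap { o =>
      rateAt(rates, o.currency, o.time).map(r => (o, o.amount * r.rate))
    }
}
```

With rates £ 42 at time 3 and £ 12 at time 17, an order at time 5 is converted at 42 and an order at time 20 at 12, regardless of when the rows are processed.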

SQL for pattern analysis?
SELECT * from ?

Introducing MATCH_RECOGNIZE
SELECT *
FROM TaxiRides
MATCH_RECOGNIZE (
  PARTITION BY driverId
  ORDER BY rideTime
  MEASURES
    S.rideId AS sRideId
  AFTER MATCH SKIP PAST LAST ROW
  PATTERN (S M{2,} E)
  DEFINE
    S AS S.isStart = true,
    M AS M.rideId <> S.rideId,
    E AS E.isStart = false
      AND E.rideId = S.rideId
)

Today's processing landscape
Streaming Batch

Batch/streaming unification

Into the Future
Big state
SQL

Thanks!
