SlideShare a Scribd company logo
Till Rohrmann
trohrmann@apache.org
@stsffap
Fabian Hueske
fhueske@apache.org
@fhueske
Streaming Analytics & CEP
Two sides of the same coin?
Streams are Everywhere
 Most data is continuously produced as stream
 Processing data as it arrives
is becoming very popular
 Many diverse applications
and use cases
2
Complex Event Processing
 Analyzing a stream of events and drawing conclusions
• Detect patterns and assemble new events
 Applications
• Network intrusion
• Process monitoring
• Algorithmic trading
 Demanding requirements on stream processor
• Low latency!
• Exactly-once semantics & event-time support
3
Batch Analytics
4
 The batch approach to data analytics
Streaming Analytics
 Online aggregation of streams
• No delay – Continuous results
 Stream analytics subsumes batch analytics
• Batch is a finite stream
 Demanding requirements on stream processor
• High throughput
• Exactly-once semantics
• Event-time & advanced window support
5
Apache Flink®
 Platform for scalable stream processing
 Meets requirements of CEP and stream analytics
• Low latency and high throughput
• Exactly-once semantics
• Event-time & advanced windowing
 Core DataStream API available for Java & Scala
6
This Talk is About
 Flink’s new APIs for CEP and Stream Analytics
• DSL to define CEP patterns and actions
• Stream SQL to define queries on streams
 Integration of CEP and Stream SQL
 Early stage  Work in progress
7
Tracking an Order Process
Use Case
8
Order Fulfillment Scenario
9
Order Events
 Process is reflected in a stream of order events
 Order(orderId, tStamp, “received”)
 Shipment(orderId, tStamp, “shipped”)
 Delivery(orderId, tStamp, “delivered”)
 orderId: Identifies the order
 tStamp: Time at which the event happened
10
Aggregating Massive Streams
Stream Analytics
11
Stream Analytics
 Traditional batch analytics
• Repeated queries on finite and changing data sets
• Queries join and aggregate large data sets
 Stream analytics
• “Standing” query produces continuous results
from infinite input stream
• Query computes aggregates on high-volume streams
 How to compute aggregates on infinite streams?
12
Compute Aggregates on Streams
 Split infinite stream into finite “windows”
• Split usually by time
 Tumbling windows
• Fixed size & consecutive
 Sliding windows
• Fixed size & may overlap
 Event time mandatory for correct & consistent results!
13
Example: Count Orders by Hour
14
Example: Count Orders by Hour
15
SELECT STREAM
TUMBLE_START(tStamp, INTERVAL ‘1’ HOUR) AS hour,
COUNT(*) AS cnt
FROM events
WHERE
status = ‘received’
GROUP BY
TUMBLE(tStamp, INTERVAL ‘1’ HOUR)
Stream SQL Architecture
 Flink features SQL on static
and streaming tables
 Parsing and optimization by
Apache Calcite
 SQL queries are translated
into native Flink programs
16
Pattern Matching on Streams
Complex Event Processing
17
Real-time Warnings
18
CEP to the Rescue
 Define processing and delivery intervals (SLAs)
 ProcessSucc(orderId, tStamp, duration)
 ProcessWarn(orderId, tStamp)
 DeliverySucc(orderId, tStamp, duration)
 DeliveryWarn(orderId, tStamp)
 orderId: Identifies the order
 tStamp: Time when the event happened
 duration: Duration of the processing/delivery
19
CEP Example
20
Processing: Order  Shipment
21
val processingPattern = Pattern
.begin[Event]("received").subtype(classOf[Order])
.followedBy("shipped").where(_.status == "shipped")
.within(Time.hours(1))
val processingPatternStream = CEP.pattern(
input.keyBy("orderId"),
processingPattern)
val procResult: DataStream[Either[ProcessWarn, ProcessSucc]] =
processingPatternStream.select {
(pP, timestamp) => // Timeout handler
ProcessWarn(pP("received").orderId, timestamp)
} {
fP => // Select function
ProcessSucc(
fP("received").orderId, fP("shipped").tStamp,
fP("shipped").tStamp – fP("received").tStamp)
}
… and both at the same time!
Integrated Stream Analytics with CEP
22
Count Delayed Shipments
23
Compute Avg Processing Time
24
CEP + Stream SQL
25
// complex event processing result
val delResult: DataStream[Either[DeliveryWarn, DeliverySucc]] = …
val delWarn: DataStream[DeliveryWarn] = delResult.flatMap(_.left.toOption)
val deliveryWarningTable: Table = delWarn.toTable(tableEnv)
tableEnv.registerTable(”deliveryWarnings”, deliveryWarningTable)
// calculate the delayed deliveries per day
val delayedDeliveriesPerDay = tableEnv.sql(
"""SELECT STREAM
| TUMBLE_START(tStamp, INTERVAL ‘1’ DAY) AS day,
| COUNT(*) AS cnt
|FROM deliveryWarnings
|GROUP BY TUMBLE(tStamp, INTERVAL ‘1’ DAY)""".stripMargin)
CEP-enriched Stream SQL
26
SELECT
TUMBLE_START(tStamp, INTERVAL '1' DAY) as day,
AVG(duration) as avgDuration
FROM (
// CEP pattern
SELECT (b.tStamp - a.tStamp) as duration, b.tStamp as tStamp
FROM inputs
PATTERN
a FOLLOW BY b PARTITION BY orderId ORDER BY tStamp
WITHIN INTERVAL '1’ HOUR
WHERE
a.status = ‘received’ AND b.status = ‘shipped’
)
GROUP BY
TUMBLE(tStamp, INTERVAL '1’ DAY)
Conclusion
 Apache Flink handles CEP and analytical
workloads
 Apache Flink offers intuitive APIs
 New class of applications by CEP and
Stream SQL integration 
27

More Related Content

What's hot (20)

PDF
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache Flink
Ververica
 
PPTX
Continuous Processing with Apache Flink - Strata London 2016
Stephan Ewen
 
PPTX
Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apac...
Ververica
 
PPTX
Kostas Tzoumas - Apache Flink®: State of the Union and What's Next
Ververica
 
PPTX
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward
 
PPTX
Flink Streaming @BudapestData
Gyula Fóra
 
PDF
Big Data Warsaw
Maximilian Michels
 
PPTX
Fabian Hueske - Stream Analytics with SQL on Apache Flink
Ververica
 
PPTX
2018-04 Kafka Summit London: Stephan Ewen - "Apache Flink and Apache Kafka fo...
Ververica
 
PPTX
Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink'...
Ververica
 
PDF
Stateful Distributed Stream Processing
Gyula Fóra
 
PPTX
Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015
Robert Metzger
 
PDF
Stream Loops on Flink - Reinventing the wheel for the streaming era
Paris Carbone
 
PDF
A look at Flink 1.2
Stefan Richter
 
PPTX
Kostas Kloudas - Complex Event Processing with Flink: the state of FlinkCEP
Ververica
 
PDF
Flink Forward Berlin 2017: Zohar Mizrahi - Python Streaming API
Flink Forward
 
PPTX
Flink Streaming Hadoop Summit San Jose
Kostas Tzoumas
 
PDF
Flink Forward Berlin 2017: Jörg Schad, Till Rohrmann - Apache Flink meets Apa...
Flink Forward
 
PPTX
Stephan Ewen - Experiences running Flink at Very Large Scale
Ververica
 
PPTX
Debunking Six Common Myths in Stream Processing
Kostas Tzoumas
 
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache Flink
Ververica
 
Continuous Processing with Apache Flink - Strata London 2016
Stephan Ewen
 
Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apac...
Ververica
 
Kostas Tzoumas - Apache Flink®: State of the Union and What's Next
Ververica
 
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward
 
Flink Streaming @BudapestData
Gyula Fóra
 
Big Data Warsaw
Maximilian Michels
 
Fabian Hueske - Stream Analytics with SQL on Apache Flink
Ververica
 
2018-04 Kafka Summit London: Stephan Ewen - "Apache Flink and Apache Kafka fo...
Ververica
 
Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink'...
Ververica
 
Stateful Distributed Stream Processing
Gyula Fóra
 
Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015
Robert Metzger
 
Stream Loops on Flink - Reinventing the wheel for the streaming era
Paris Carbone
 
A look at Flink 1.2
Stefan Richter
 
Kostas Kloudas - Complex Event Processing with Flink: the state of FlinkCEP
Ververica
 
Flink Forward Berlin 2017: Zohar Mizrahi - Python Streaming API
Flink Forward
 
Flink Streaming Hadoop Summit San Jose
Kostas Tzoumas
 
Flink Forward Berlin 2017: Jörg Schad, Till Rohrmann - Apache Flink meets Apa...
Flink Forward
 
Stephan Ewen - Experiences running Flink at Very Large Scale
Ververica
 
Debunking Six Common Myths in Stream Processing
Kostas Tzoumas
 

Viewers also liked (20)

PDF
Jamie Grier - Robust Stream Processing with Apache Flink
Flink Forward
 
PPTX
Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...
Flink Forward
 
PPTX
Aljoscha Krettek - The Future of Apache Flink
Flink Forward
 
PPTX
Robert Metzger - Connecting Apache Flink to the World - Reviewing the streami...
Flink Forward
 
PPTX
Apache Flink Community Updates November 2016 @ Berlin Meetup
Robert Metzger
 
PPTX
Ted Dunning-Faster and Furiouser- Flink Drift
Flink Forward
 
PDF
Julian Hyde - Streaming SQL
Flink Forward
 
PPTX
Stephan Ewen - Running Flink Everywhere
Flink Forward
 
PDF
Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink
Flink Forward
 
PDF
Till Rohrmann - Dynamic Scaling - How Apache Flink adapts to changing workloads
Flink Forward
 
PPTX
Stephan Ewen - Scaling to large State
Flink Forward
 
PDF
Márton Balassi Streaming ML with Flink-
Flink Forward
 
PPTX
Kostas Tzoumas - Stream Processing with Apache Flink®
Ververica
 
PDF
Trevor Grant - Apache Zeppelin - A friendlier way to Flink
Flink Forward
 
PDF
Automatic Detection of Web Trackers by Vasia Kalavri
Flink Forward
 
PDF
Alexander Kolb - Flinkspector – Taming the squirrel
Flink Forward
 
PDF
Ana M Martinez - AMIDST Toolbox- Scalable probabilistic machine learning with...
Flink Forward
 
PDF
Maxim Fateev - Beyond the Watermark- On-Demand Backfilling in Flink
Flink Forward
 
PPTX
Eron Wright - Introducing Flink on Mesos
Flink Forward
 
PPTX
Ted Dunning - Keynote: How Can We Take Flink Forward?
Flink Forward
 
Jamie Grier - Robust Stream Processing with Apache Flink
Flink Forward
 
Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...
Flink Forward
 
Aljoscha Krettek - The Future of Apache Flink
Flink Forward
 
Robert Metzger - Connecting Apache Flink to the World - Reviewing the streami...
Flink Forward
 
Apache Flink Community Updates November 2016 @ Berlin Meetup
Robert Metzger
 
Ted Dunning-Faster and Furiouser- Flink Drift
Flink Forward
 
Julian Hyde - Streaming SQL
Flink Forward
 
Stephan Ewen - Running Flink Everywhere
Flink Forward
 
Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink
Flink Forward
 
Till Rohrmann - Dynamic Scaling - How Apache Flink adapts to changing workloads
Flink Forward
 
Stephan Ewen - Scaling to large State
Flink Forward
 
Márton Balassi Streaming ML with Flink-
Flink Forward
 
Kostas Tzoumas - Stream Processing with Apache Flink®
Ververica
 
Trevor Grant - Apache Zeppelin - A friendlier way to Flink
Flink Forward
 
Automatic Detection of Web Trackers by Vasia Kalavri
Flink Forward
 
Alexander Kolb - Flinkspector – Taming the squirrel
Flink Forward
 
Ana M Martinez - AMIDST Toolbox- Scalable probabilistic machine learning with...
Flink Forward
 
Maxim Fateev - Beyond the Watermark- On-Demand Backfilling in Flink
Flink Forward
 
Eron Wright - Introducing Flink on Mesos
Flink Forward
 
Ted Dunning - Keynote: How Can We Take Flink Forward?
Flink Forward
 
Ad

Similar to Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL and CEP (20)

PPTX
Unifying Stream, SWL and CEP for Declarative Stream Processing with Apache Flink
DataWorks Summit/Hadoop Summit
 
PPTX
Flink 0.10 @ Bay Area Meetup (October 2015)
Stephan Ewen
 
PDF
Apache Flink @ Tel Aviv / Herzliya Meetup
Robert Metzger
 
PDF
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
confluent
 
PDF
K. Tzoumas & S. Ewen – Flink Forward Keynote
Flink Forward
 
PDF
Streaming SQL
Julian Hyde
 
PDF
Streaming SQL
Julian Hyde
 
PPTX
Flink Forward SF 2017: Konstantinos Kloudas - Extending Flink’s Streaming APIs
Flink Forward
 
PDF
WSO2 Complex Event Processor
Sriskandarajah Suhothayan
 
PDF
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)
Julian Hyde
 
PDF
Arbitrary Stateful Aggregations using Structured Streaming in Apache Spark
Databricks
 
PDF
Stream Processing with Ballerina
Ballerina
 
PDF
Kafka Streams: the easiest way to start with stream processing
Yaroslav Tkachenko
 
PPTX
Cloud Dataflow - A Unified Model for Batch and Streaming Data Processing
DoiT International
 
PDF
Introduction to Data streaming - 05/12/2014
Raja Chiky
 
PDF
Scalable Event Processing with WSO2CEP @ WSO2Con2015eu
Sriskandarajah Suhothayan
 
PDF
Strtio Spark Streaming + Siddhi CEP Engine
Myung Ho Yun
 
PPTX
Microsoft SQL Server - StreamInsight Overview Presentation
Microsoft Private Cloud
 
PDF
Anton Moldovan "Load testing which you always wanted"
Fwdays
 
Unifying Stream, SWL and CEP for Declarative Stream Processing with Apache Flink
DataWorks Summit/Hadoop Summit
 
Flink 0.10 @ Bay Area Meetup (October 2015)
Stephan Ewen
 
Apache Flink @ Tel Aviv / Herzliya Meetup
Robert Metzger
 
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
confluent
 
K. Tzoumas & S. Ewen – Flink Forward Keynote
Flink Forward
 
Streaming SQL
Julian Hyde
 
Streaming SQL
Julian Hyde
 
Flink Forward SF 2017: Konstantinos Kloudas - Extending Flink’s Streaming APIs
Flink Forward
 
WSO2 Complex Event Processor
Sriskandarajah Suhothayan
 
Streaming SQL (at FlinkForward, Berlin, 2016/09/12)
Julian Hyde
 
Arbitrary Stateful Aggregations using Structured Streaming in Apache Spark
Databricks
 
Stream Processing with Ballerina
Ballerina
 
Kafka Streams: the easiest way to start with stream processing
Yaroslav Tkachenko
 
Cloud Dataflow - A Unified Model for Batch and Streaming Data Processing
DoiT International
 
Introduction to Data streaming - 05/12/2014
Raja Chiky
 
Scalable Event Processing with WSO2CEP @ WSO2Con2015eu
Sriskandarajah Suhothayan
 
Strtio Spark Streaming + Siddhi CEP Engine
Myung Ho Yun
 
Microsoft SQL Server - StreamInsight Overview Presentation
Microsoft Private Cloud
 
Anton Moldovan "Load testing which you always wanted"
Fwdays
 
Ad

More from Flink Forward (20)

PDF
Building a fully managed stream processing platform on Flink at scale for Lin...
Flink Forward
 
PPTX
Evening out the uneven: dealing with skew in Flink
Flink Forward
 
PPTX
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
Flink Forward
 
PDF
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Flink Forward
 
PDF
Introducing the Apache Flink Kubernetes Operator
Flink Forward
 
PPTX
Autoscaling Flink with Reactive Mode
Flink Forward
 
PDF
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Flink Forward
 
PPTX
One sink to rule them all: Introducing the new Async Sink
Flink Forward
 
PPTX
Tuning Apache Kafka Connectors for Flink.pptx
Flink Forward
 
PDF
Flink powered stream processing platform at Pinterest
Flink Forward
 
PPTX
Apache Flink in the Cloud-Native Era
Flink Forward
 
PPTX
Where is my bottleneck? Performance troubleshooting in Flink
Flink Forward
 
PPTX
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Flink Forward
 
PPTX
The Current State of Table API in 2022
Flink Forward
 
PDF
Flink SQL on Pulsar made easy
Flink Forward
 
PPTX
Dynamic Rule-based Real-time Market Data Alerts
Flink Forward
 
PPTX
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Flink Forward
 
PPTX
Processing Semantically-Ordered Streams in Financial Services
Flink Forward
 
PDF
Tame the small files problem and optimize data layout for streaming ingestion...
Flink Forward
 
PDF
Batch Processing at Scale with Flink & Iceberg
Flink Forward
 
Building a fully managed stream processing platform on Flink at scale for Lin...
Flink Forward
 
Evening out the uneven: dealing with skew in Flink
Flink Forward
 
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
Flink Forward
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Flink Forward
 
Introducing the Apache Flink Kubernetes Operator
Flink Forward
 
Autoscaling Flink with Reactive Mode
Flink Forward
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Flink Forward
 
One sink to rule them all: Introducing the new Async Sink
Flink Forward
 
Tuning Apache Kafka Connectors for Flink.pptx
Flink Forward
 
Flink powered stream processing platform at Pinterest
Flink Forward
 
Apache Flink in the Cloud-Native Era
Flink Forward
 
Where is my bottleneck? Performance troubleshooting in Flink
Flink Forward
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Flink Forward
 
The Current State of Table API in 2022
Flink Forward
 
Flink SQL on Pulsar made easy
Flink Forward
 
Dynamic Rule-based Real-time Market Data Alerts
Flink Forward
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Flink Forward
 
Processing Semantically-Ordered Streams in Financial Services
Flink Forward
 
Tame the small files problem and optimize data layout for streaming ingestion...
Flink Forward
 
Batch Processing at Scale with Flink & Iceberg
Flink Forward
 

Recently uploaded (20)

PPTX
Numbers of a nation: how we estimate population statistics | Accessible slides
Office for National Statistics
 
PPT
Growth of Public Expendituuure_55423.ppt
NavyaDeora
 
PDF
Avatar for apidays apidays PRO June 07, 2025 0 5 apidays Helsinki & North 2...
apidays
 
PPTX
apidays Helsinki & North 2025 - API access control strategies beyond JWT bear...
apidays
 
PDF
apidays Helsinki & North 2025 - Monetizing AI APIs: The New API Economy, Alla...
apidays
 
PPTX
GenAI-Introduction-to-Copilot-for-Bing-March-2025-FOR-HUB.pptx
cleydsonborges1
 
PPTX
The _Operations_on_Functions_Addition subtruction Multiplication and Division...
mdregaspi24
 
PDF
Merits and Demerits of DBMS over File System & 3-Tier Architecture in DBMS
MD RIZWAN MOLLA
 
PDF
apidays Helsinki & North 2025 - How (not) to run a Graphql Stewardship Group,...
apidays
 
PPTX
Advanced_NLP_with_Transformers_PPT_final 50.pptx
Shiwani Gupta
 
PPTX
apidays Helsinki & North 2025 - Running a Successful API Program: Best Practi...
apidays
 
PDF
How to Connect Your On-Premises Site to AWS Using Site-to-Site VPN.pdf
Tamanna
 
PDF
Data Chunking Strategies for RAG in 2025.pdf
Tamanna
 
PPTX
Exploring Multilingual Embeddings for Italian Semantic Search: A Pretrained a...
Sease
 
PDF
Context Engineering for AI Agents, approaches, memories.pdf
Tamanna
 
PDF
What does good look like - CRAP Brighton 8 July 2025
Jan Kierzyk
 
PPTX
apidays Helsinki & North 2025 - APIs at Scale: Designing for Alignment, Trust...
apidays
 
PPT
deep dive data management sharepoint apps.ppt
novaprofk
 
PPTX
ER_Model_Relationship_in_DBMS_Presentation.pptx
dharaadhvaryu1992
 
PDF
apidays Helsinki & North 2025 - API-Powered Journeys: Mobility in an API-Driv...
apidays
 
Numbers of a nation: how we estimate population statistics | Accessible slides
Office for National Statistics
 
Growth of Public Expendituuure_55423.ppt
NavyaDeora
 
Avatar for apidays apidays PRO June 07, 2025 0 5 apidays Helsinki & North 2...
apidays
 
apidays Helsinki & North 2025 - API access control strategies beyond JWT bear...
apidays
 
apidays Helsinki & North 2025 - Monetizing AI APIs: The New API Economy, Alla...
apidays
 
GenAI-Introduction-to-Copilot-for-Bing-March-2025-FOR-HUB.pptx
cleydsonborges1
 
The _Operations_on_Functions_Addition subtruction Multiplication and Division...
mdregaspi24
 
Merits and Demerits of DBMS over File System & 3-Tier Architecture in DBMS
MD RIZWAN MOLLA
 
apidays Helsinki & North 2025 - How (not) to run a Graphql Stewardship Group,...
apidays
 
Advanced_NLP_with_Transformers_PPT_final 50.pptx
Shiwani Gupta
 
apidays Helsinki & North 2025 - Running a Successful API Program: Best Practi...
apidays
 
How to Connect Your On-Premises Site to AWS Using Site-to-Site VPN.pdf
Tamanna
 
Data Chunking Strategies for RAG in 2025.pdf
Tamanna
 
Exploring Multilingual Embeddings for Italian Semantic Search: A Pretrained a...
Sease
 
Context Engineering for AI Agents, approaches, memories.pdf
Tamanna
 
What does good look like - CRAP Brighton 8 July 2025
Jan Kierzyk
 
apidays Helsinki & North 2025 - APIs at Scale: Designing for Alignment, Trust...
apidays
 
deep dive data management sharepoint apps.ppt
novaprofk
 
ER_Model_Relationship_in_DBMS_Presentation.pptx
dharaadhvaryu1992
 
apidays Helsinki & North 2025 - API-Powered Journeys: Mobility in an API-Driv...
apidays
 

Fabian Hueske_Till Rohrmann - Declarative stream processing with StreamSQL and CEP

  • 2. Streams are Everywhere  Most data is continuously produced as stream  Processing data as it arrives is becoming very popular  Many diverse applications and use cases 2
  • 3. Complex Event Processing  Analyzing a stream of events and drawing conclusions • Detect patterns and assemble new events  Applications • Network intrusion • Process monitoring • Algorithmic trading  Demanding requirements on stream processor • Low latency! • Exactly-once semantics & event-time support 3
  • 4. Batch Analytics 4  The batch approach to data analytics
  • 5. Streaming Analytics  Online aggregation of streams • No delay – Continuous results  Stream analytics subsumes batch analytics • Batch is a finite stream  Demanding requirements on stream processor • High throughput • Exactly-once semantics • Event-time & advanced window support 5
  • 6. Apache Flink®  Platform for scalable stream processing  Meets requirements of CEP and stream analytics • Low latency and high throughput • Exactly-once semantics • Event-time & advanced windowing  Core DataStream API available for Java & Scala 6
  • 7. This Talk is About  Flink’s new APIs for CEP and Stream Analytics • DSL to define CEP patterns and actions • Stream SQL to define queries on streams  Integration of CEP and Stream SQL  Early stage  Work in progress 7
  • 8. Tracking an Order Process Use Case 8
  • 10. Order Events  Process is reflected in a stream of order events  Order(orderId, tStamp, “received”)  Shipment(orderId, tStamp, “shipped”)  Delivery(orderId, tStamp, “delivered”)  orderId: Identifies the order  tStamp: Time at which the event happened 10
  • 12. Stream Analytics  Traditional batch analytics • Repeated queries on finite and changing data sets • Queries join and aggregate large data sets  Stream analytics • “Standing” query produces continuous results from infinite input stream • Query computes aggregates on high-volume streams  How to compute aggregates on infinite streams? 12
  • 13. Compute Aggregates on Streams  Split infinite stream into finite “windows” • Split usually by time  Tumbling windows • Fixed size & consecutive  Sliding windows • Fixed size & may overlap  Event time mandatory for correct & consistent results! 13
  • 14. Example: Count Orders by Hour 14
  • 15. Example: Count Orders by Hour 15 SELECT STREAM TUMBLE_START(tStamp, INTERVAL ‘1’ HOUR) AS hour, COUNT(*) AS cnt FROM events WHERE status = ‘received’ GROUP BY TUMBLE(tStamp, INTERVAL ‘1’ HOUR)
  • 16. Stream SQL Architecture  Flink features SQL on static and streaming tables  Parsing and optimization by Apache Calcite  SQL queries are translated into native Flink programs 16
  • 17. Pattern Matching on Streams Complex Event Processing 17
  • 19. CEP to the Rescue  Define processing and delivery intervals (SLAs)  ProcessSucc(orderId, tStamp, duration)  ProcessWarn(orderId, tStamp)  DeliverySucc(orderId, tStamp, duration)  DeliveryWarn(orderId, tStamp)  orderId: Identifies the order  tStamp: Time when the event happened  duration: Duration of the processing/delivery 19
  • 21. Processing: Order  Shipment 21 val processingPattern = Pattern .begin[Event]("received").subtype(classOf[Order]) .followedBy("shipped").where(_.status == "shipped") .within(Time.hours(1)) val processingPatternStream = CEP.pattern( input.keyBy("orderId"), processingPattern) val procResult: DataStream[Either[ProcessWarn, ProcessSucc]] = processingPatternStream.select { (pP, timestamp) => // Timeout handler ProcessWarn(pP("received").orderId, timestamp) } { fP => // Select function ProcessSucc( fP("received").orderId, fP("shipped").tStamp, fP("shipped").tStamp – fP("received").tStamp) }
  • 22. … and both at the same time! Integrated Stream Analytics with CEP 22
  • 25. CEP + Stream SQL 25 // complex event processing result val delResult: DataStream[Either[DeliveryWarn, DeliverySucc]] = … val delWarn: DataStream[DeliveryWarn] = delResult.flatMap(_.left.toOption) val deliveryWarningTable: Table = delWarn.toTable(tableEnv) tableEnv.registerTable(”deliveryWarnings”, deliveryWarningTable) // calculate the delayed deliveries per day val delayedDeliveriesPerDay = tableEnv.sql( """SELECT STREAM | TUMBLE_START(tStamp, INTERVAL ‘1’ DAY) AS day, | COUNT(*) AS cnt |FROM deliveryWarnings |GROUP BY TUMBLE(tStamp, INTERVAL ‘1’ DAY)""".stripMargin)
  • 26. CEP-enriched Stream SQL 26 SELECT TUMBLE_START(tStamp, INTERVAL '1' DAY) as day, AVG(duration) as avgDuration FROM ( // CEP pattern SELECT (b.tStamp - a.tStamp) as duration, b.tStamp as tStamp FROM inputs PATTERN a FOLLOW BY b PARTITION BY orderId ORDER BY tStamp WITHIN INTERVAL '1’ HOUR WHERE a.status = ‘received’ AND b.status = ‘shipped’ ) GROUP BY TUMBLE(tStamp, INTERVAL '1’ DAY)
  • 27. Conclusion  Apache Flink handles CEP and analytical workloads  Apache Flink offers intuitive APIs  New class of applications by CEP and Stream SQL integration  27