Scalable Realtime Analytics with Declarative, SQL-like,
Complex Event Processing Scripts
Srinath Perera
Director, Research WSO2
Apache Member
(@srinath_perera)
srinath@wso2.com
(Batch) Analytics
Scientists have been doing this for 25 years with
MPI (1991) on special hardware.
It took off with Google's MapReduce
paper (2004); Apache Hadoop, Hive, and a
whole ecosystem followed.
It was successful, so we are here!
But processing takes time.
The Value of Some Insights Degrades Fast!
For some use cases (e.g. stock markets, traffic, surveillance, patient
monitoring) the value of insights degrades very quickly with time.
- E.g. stock markets and the speed of light
We need technology that can produce
outputs fast
- Static queries that need very fast output
(alerts, realtime control)
- Dynamic and interactive queries (data
exploration)
History
Realtime analytics is not new either!
- Active databases (2000+)
- Stream processing (Aurora, Borealis (2005+)
and later Storm)
- Distributed streaming operators (a
database research topic around 2005)
- CEP vendor roadmap (from
http://www.complexevents.com/2014/12/03/cep-tooling-market-survey-2014/)
Realtime Analytics Tools
I. Stream Processing
Program a set of processors and wire them up; data flows through
the graph.
A middleware framework handles data flow, distribution, and fault
tolerance (e.g. Apache Storm, Samza)
Processors may be in the same machine or multiple machines
II. Complex Event Processing
III. Micro Batch
Process data in small batches, and
then combine results for final results
(e.g. Spark)
Works for simple aggregates, but
tricky to do this for complex
operations (e.g. Event Sequences)
Can do it with MapReduce as well if
the deadlines are not too tight.
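The micro-batch idea can be sketched in plain Python. This is an illustration, not Spark code: each small batch is reduced to a partial (sum, count), and the partials are merged into the final average. The names `summarize_batch` and `combine` are made up for this sketch.

```python
from typing import Iterable, List, Tuple

Partial = Tuple[float, int]  # (sum, count) for one micro batch


def summarize_batch(values: Iterable[float]) -> Partial:
    """Reduce one micro batch to a partial aggregate."""
    vals = list(values)
    return (sum(vals), len(vals))


def combine(partials: Iterable[Partial]) -> float:
    """Merge partial aggregates from all batches into one average."""
    total, count = 0.0, 0
    for batch_sum, batch_count in partials:
        total += batch_sum
        count += batch_count
    if count == 0:
        raise ValueError("no events to aggregate")
    return total / count


# Three micro batches of temperature readings (illustrative data).
batches: List[List[float]] = [[20.0, 22.0], [30.0], [28.0, 25.0, 25.0]]
overall_avg = combine(summarize_batch(b) for b in batches)
```

This works because an average decomposes into per-batch sums and counts. An event sequence that starts in one batch and ends in the next has no such per-batch summary, which is why complex operations are tricky in this model.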
IV. OLAP Style In Memory Computing
Usually done to support interactive
queries
Index data to make it
readily accessible so you can respond
to queries fast. (e.g. Apache Drill)
Tools like Druid, VoltDB and SAP
Hana can do this with all data in
memory to make things really fast.
Realtime Analytics Patterns
Simple counting (e.g. failure count)
Counting with Windows ( e.g. failure count every hour)
Preprocessing: filtering, transformations (e.g. data cleanup)
Alerts , thresholds (e.g. Alarm on high temperature)
Data Correlation, Detect missing events, detecting erroneous data
(e.g. detecting failed sensors)
Joining event streams (e.g. detect a hit on soccer ball)
Merge with data in a database, collect, update data conditionally
Realtime Analytics Patterns (contd.)
Detecting Event Sequence Patterns (e.g. small transaction followed
by large transaction)
Tracking: follow some related entity's state in space, time etc. (e.g.
location of airline baggage, vehicles, tracking wildlife)
Detect trends: rise, turn, fall, outliers, complex trends like triple
bottom etc. (e.g. algorithmic trading, SLA, load balancing)
Learning a Model (e.g. Predictive maintenance)
Predicting next value and corrective actions (e.g. automated car)
Apache Hive
A SQL-like data processing language
Since many understand SQL, Hive
made large-scale (Big Data) processing
accessible to many
Expressive, short, and sweet.
Defines core operations that cover 90%
of problems
Lets experts dig in when they like!
(Batch Processing, Hive)
(Realtime Analytics, X)
What is X?
CEP = SQL for Realtime Analytics
Easy to follow from SQL
Expressive, short, and sweet.
Defines core operations that cover 90% of
problems
Lets experts dig in when they like!
Let's look at the core operations.
Operators: Filters
Assume a temperature stream
Here weather:convertFtoC() is a
user-defined function. Such functions
are used to extend the language.
define stream TempStream (ts long, roomNo int, temp double);
from TempStream[weather:convertFtoC(temp) > 30.0
and roomNo != 2043]
select roomNo, temp
insert into HotRoomsStream;
Usecases:
- Alerts , thresholds (e.g. Alarm on
high temperature)
- Preprocessing: filtering,
transformations (e.g. data cleanup)
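To make the filter semantics concrete, here is a minimal Python sketch of what the query above does to each event. It is not how a CEP engine is implemented; the dictionary event layout and the `convert_f_to_c` helper (standing in for the user-defined function) are assumptions for illustration.

```python
from typing import Dict, Iterable, Iterator


def convert_f_to_c(temp_f: float) -> float:
    """Stand-in for the user-defined function weather:convertFtoC()."""
    return (temp_f - 32.0) * 5.0 / 9.0


def hot_rooms(events: Iterable[Dict]) -> Iterator[Dict]:
    """Pass through events hotter than 30 C, skipping room 2043."""
    for e in events:
        if convert_f_to_c(e["temp"]) > 30.0 and e["roomNo"] != 2043:
            # Projection: keep only the selected attributes.
            yield {"roomNo": e["roomNo"], "temp": e["temp"]}


# Illustrative events; temperatures are in Fahrenheit.
sample = [
    {"ts": 1, "roomNo": 101, "temp": 95.0},
    {"ts": 2, "roomNo": 2043, "temp": 100.0},
    {"ts": 3, "roomNo": 102, "temp": 70.0},
]
alerts = list(hot_rooms(sample))
```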
Operators: Windows and Aggregation
Support many window types
- Batch Windows, Sliding windows, Custom windows
Usecases
- Simple counting (e.g. failure count)
- Counting with Windows ( e.g. failure count every hour)
from TempStream#window.time(1 min)
select roomNo, avg(temp) as avgTemp
insert into HotRoomsStream;
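A sliding time window with an average can be sketched in a few lines of Python. This is a simplified illustration that assumes millisecond timestamps arriving in order; the class name `SlidingTimeAverage` is hypothetical and does not come from any CEP product.

```python
from collections import deque
from typing import Deque, Tuple


class SlidingTimeAverage:
    """Average of the values seen within the last `window_ms` milliseconds."""

    def __init__(self, window_ms: int) -> None:
        self.window_ms = window_ms
        self.events: Deque[Tuple[int, float]] = deque()  # (ts, temp)
        self.total = 0.0

    def add(self, ts: int, temp: float) -> float:
        """Add an event, expire old ones, and return the current average."""
        self.events.append((ts, temp))
        self.total += temp
        # Drop events that have fallen out of the window.
        while self.events[0][0] <= ts - self.window_ms:
            _, old_temp = self.events.popleft()
            self.total -= old_temp
        return self.total / len(self.events)
```

Keeping a running total means each event is added once and removed once, so the cost per event does not grow with the window size.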
Operators: Patterns
Models a followed-by relation: e.g.
event A followed by event B
Very powerful tool for tracking
and detecting patterns
from every (a1 = TempStream)
-> a2 = TempStream[temp > a1.temp + 5]
within 1 day
select a2.ts as ts, a2.temp - a1.temp as diff
insert into HotDayAlertStream;
Usecases
- Detecting Event Sequence Patterns
- Tracking
- Detect trends
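The followed-by pattern above can be approximated in Python as follows. This is a rough sketch of the idea, not the engine's actual matching algorithm: every event opens a pending match, and a pending match completes on the first later event that is more than 5 degrees warmer within one day. The function name `detect_rises` and the tuple event layout are assumptions.

```python
from typing import Dict, Iterable, List, Tuple

DAY_MS = 24 * 60 * 60 * 1000


def detect_rises(
    events: Iterable[Tuple[int, float]],
    jump: float = 5.0,
    within_ms: int = DAY_MS,
) -> List[Dict]:
    """events are (ts, temp) pairs in timestamp order."""
    pending: List[Tuple[int, float]] = []  # a1 candidates waiting for an a2
    alerts: List[Dict] = []
    for ts, temp in events:
        still_pending = []
        for a1_ts, a1_temp in pending:
            if ts - a1_ts > within_ms:
                continue  # the "within 1 day" bound has passed
            if temp > a1_temp + jump:
                alerts.append({"ts": ts, "diff": temp - a1_temp})
            else:
                still_pending.append((a1_ts, a1_temp))
        # "every": each incoming event also starts a new pending match.
        still_pending.append((ts, temp))
        pending = still_pending
    return alerts
```

For example, readings of 20.0, 22.0 and 28.0 produce two alerts on the third event, one for each earlier reading it exceeds by more than 5.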
Operators: Joins
Join two data streams based on a condition and windows
Usecases
- Data Correlation, Detect missing events, detecting erroneous data
- Joining event streams
from TempStream[temp > 30.0]#window.time(1 min) as T
join RegulatorStream[isOn == false]#window.length(1) as R
on T.roomNo == R.roomNo
select T.roomNo, R.deviceID, 'start' as action
insert into RegulatorActionStream;
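A windowed stream join can be sketched in Python as two pieces of state, one per side, that are probed whenever an event arrives on the other side. The sketch below is a simplification under stated assumptions: timestamps are in milliseconds and in order, and the class and method names (`TempRegulatorJoin`, `on_temp`, `on_regulator`) are invented for illustration.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple


class TempRegulatorJoin:
    def __init__(self, window_ms: int = 60_000) -> None:
        self.window_ms = window_ms
        self.hot: Deque[Tuple[int, int]] = deque()  # (ts, roomNo), temp > 30.0
        self.last_off: Optional[Dict] = None        # length(1) window of "off" events

    def _expire(self, now: int) -> None:
        while self.hot and self.hot[0][0] <= now - self.window_ms:
            self.hot.popleft()

    def on_temp(self, ts: int, room_no: int, temp: float) -> List[Dict]:
        self._expire(ts)
        if temp <= 30.0:
            return []  # filtered out before reaching the window
        self.hot.append((ts, room_no))
        r = self.last_off
        if r is not None and r["roomNo"] == room_no:
            return [{"roomNo": room_no, "deviceID": r["deviceID"], "action": "start"}]
        return []

    def on_regulator(self, ts: int, room_no: int, device_id: str, is_on: bool) -> List[Dict]:
        self._expire(ts)
        if is_on:
            return []  # only isOn == false events enter the window
        self.last_off = {"roomNo": room_no, "deviceID": device_id}
        return [
            {"roomNo": room_no, "deviceID": device_id, "action": "start"}
            for _, hot_room in self.hot
            if hot_room == room_no
        ]
```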
Operators: Access Data from the Disk
Event tables allow users to map a database to a window and join a
data stream with the window
Usecases
- Merge with data in a database, collect, update data conditionally
define stream TempStream (ts long, temp double);
define table HistTempTable (day long, avgT double);
from TempStream#window.length(1) join HistTempTable
on getDayOfYear(ts) == HistTempTable.day and temp > avgT
select ts, temp
insert into PurchaseUserStream;
Revisit Patterns
Predictive Analytics
- Build models and use them with
WSO2 CEP, BAM and ESB using the
upcoming WSO2 Machine Learner
product (2015 Q2)
- Build models using R, export them as
PMML, and use them within WSO2 CEP
- Call R scripts from CEP queries
- Regression and anomaly detection
operators in CEP
Case Study: Realtime Soccer Analysis
Watch at: https://www.youtube.com/watch?v=nRI6buQ0NOM
TFL Traffic Analysis
Built using TFL (Transport for London) open data feeds.
http://goo.gl/04tX6k
http://goo.gl/9xNiCm
Great, Does it Scale?
Idea I: Network of CEP Nodes
For scaling, we arrange CEP
processing nodes in a graph, as in
stream processing.
The graph can be implemented
using a stream processing engine
like Apache Storm
Idea II: Compile SQL like Queries to a
Network of CEP Nodes
from TempStream[temp > 33]
insert into HighTempStream;
from HighTempStream#window.time(1 hour)
select max(temp) as max
insert into HourlyMaxTempStream;
How do we partition the data to scale
up the analysis?
Let's follow MapReduce
MapReduce does not scale by itself; it asks users to break
the problem into many small independent problems.
Idea III: Let the Users specify Parallelism
The language includes parallel constructs:
partitions, pipelines, distributed
operators
Assign each partition to a different
node, and partition the data accordingly
define partition on TempStream.region {
from TempStream[temp > 33]
insert into HighTempStream;
}
from HighTempStream#window.time(1 hour)
select max(temp) as max
insert into HourlyMaxTempStream;
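Partitioning by a key such as region can be sketched in Python as hash routing: every event with the same region lands on the same partition, so each partition can run the filter independently. This is only an illustration of the idea; the slides do not say how WSO2 CEP assigns partitions, and the helper names and the choice of CRC32 are assumptions.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List


def partition_for(region: str, num_partitions: int) -> int:
    """Stable hash, so a region always maps to the same partition."""
    return zlib.crc32(region.encode("utf-8")) % num_partitions


def route(events: Iterable[Dict], num_partitions: int) -> Dict[int, List[Dict]]:
    """Group events by the partition that owns their region."""
    partitions: Dict[int, List[Dict]] = defaultdict(list)
    for e in events:
        partitions[partition_for(e["region"], num_partitions)].append(e)
    return partitions


def high_temp(events: Iterable[Dict]) -> List[Dict]:
    """The per-partition query: keep events with temp > 33."""
    return [e for e in events if e["temp"] > 33]


events = [
    {"region": "north", "temp": 35},
    {"region": "south", "temp": 30},
    {"region": "north", "temp": 40},
    {"region": "east", "temp": 20},
]
parts = route(events, num_partitions=4)
# Each partition filters on its own; the results feed the shared window query.
merged = [e for p in parts.values() for e in high_temp(p)]
```

Python's built-in `hash()` is avoided here because string hashes vary between processes by default, which would break routing across nodes.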
Handling Ordering
When data is processed in
parallel, output might be generated
out of order.
Due to the lack of a global time, we
cannot trigger windows and other
time-sensitive constructs
Solution: the current time needs to
be propagated through the graph
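One common way to propagate time through a processing graph is for each node to track the latest timestamp seen from every upstream node and treat the minimum as its own current time. The slides do not describe the exact mechanism used, so the Python sketch below is an assumption, and `TimeTracker` is a hypothetical name.

```python
from typing import Dict, Iterable, Optional


class TimeTracker:
    """Derives a node's current time from the times reported upstream."""

    def __init__(self, upstream_ids: Iterable[str]) -> None:
        self.latest: Dict[str, Optional[int]] = {u: None for u in upstream_ids}

    def observe(self, upstream_id: str, ts: int) -> Optional[int]:
        """Record a timestamp from one upstream node; return the current time."""
        prev = self.latest[upstream_id]
        if prev is None or ts > prev:
            self.latest[upstream_id] = ts  # time never moves backwards
        return self.current_time()

    def current_time(self) -> Optional[int]:
        """Minimum over all upstreams, or None until every upstream has reported."""
        times = list(self.latest.values())
        if any(t is None for t in times):
            return None
        return min(times)
```

A window at this node can safely close once `current_time()` passes the window's end, because no upstream node can still deliver an earlier event.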
Putting Everything Together
WSO2 CEP & Big Data Platform
CEP = SQL for Realtime Analytics
Easy to follow from SQL
Expressive, short, sweet and fast!!
Defines core operations that cover 90% of
problems
Lets experts dig in when they like!
And it Scales!!
Questions?
Visit us at Booth 1025
http://wso2.com/landing/strata-hadoop-world-ca-2015/

