RISE Research Institutes of Sweden
knowledge
Computer Systems Laboratory
Paris Carbone - parisc@kth.se
PhD Candidate @ KTH
Committer @ Apache Foundation
CONTINUOUS DEEP ANALYTICS
RISE Open House 2018
!2
Data -> PROCESSING -> knowledge
water/light sensor events -> it is raining
temperature sensor events -> it is cold
app-user likes -> those who listen to A also like B
vehicle camera feed -> someone is crossing the street
user data and transactions -> credit fraud detected
web user clicks -> users hate the website redesign
climate data and simulations -> a storm is approaching
!3
knowledge -> REASONING -> Decision Making
it is raining -> bring an umbrella
it is cold -> turn heating on
those who listen to A also like B -> recommend B to fans of A
someone is crossing the street -> stop the vehicle
credit fraud detected -> cancel transactions
users hate the website redesign -> switch to the old version
a storm is approaching -> alert boats!
!5
[Chart axes: complexity (> data dependencies) vs. time criticality (< response time)]
Examples placed on the chart:
it is raining
it is cold
those who listen to A will love B
someone is crossing the street
credit fraud
people hate the new menu
a storm approaches
!7
[Chart axes: complexity (> data dependencies) vs. time criticality (< response time)]
Workloads: CEP; relational algebra; online ML; bulk synchronous iterative processing; deep neural networks; graph analysis; derivative approximation; simulations
Platforms:
Data Stream/DBMS Platforms (mem/CPU)
Batch Processing Platforms (mem/CPU)
Tensor Programming Platforms (GPUs/TPUs)
High Performance Computing Platforms (petaflops)
Continuous Deep Analytics
▪ Modern Data Pipelines need to combine diverse workloads!
(ML Training & Serving, Relational Algebra, Streams, Tensors, Graphs)
!8
[Figure: a data pipeline combining Relational Data Streams (operators ⋈, σθ, π), Feature Learning, Tensor Programming, and Dynamic Graphs]
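As a rough illustration of the relational operators named in the figure (σθ selection, ⋈ join, π projection) applied to a stream of records, here is a minimal Python sketch. The record fields, the sample data, and the helper names `select`, `join`, and `project` are all hypothetical and are not taken from the talk.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, object]

def select(pred: Callable[[Record], bool], stream: Iterable[Record]) -> Iterator[Record]:
    """Selection (σθ): keep only the records that satisfy the predicate."""
    return (r for r in stream if pred(r))

def join(key: str, stream: Iterable[Record], table: List[Record]) -> Iterator[Record]:
    """Join (⋈): enrich each streamed record with the matching row of a static table."""
    index = {row[key]: row for row in table}
    for r in stream:
        match = index.get(r[key])
        if match is not None:
            yield {**match, **r}

def project(fields: List[str], stream: Iterable[Record]) -> Iterator[Record]:
    """Projection (π): keep only the listed fields of each record."""
    return ({f: r[f] for f in fields} for r in stream)

# Hypothetical sample data: a click stream and a small user table.
clicks = [
    {"user": 1, "page": "home", "ms": 120},
    {"user": 2, "page": "menu", "ms": 900},
    {"user": 3, "page": "menu", "ms": 40},
]
users = [{"user": 1, "plan": "free"}, {"user": 2, "plan": "pro"}]

slow = select(lambda r: r["ms"] > 100, clicks)
enriched = join("user", slow, users)
features = list(project(["user", "plan", "ms"], enriched))
```

Because every operator is a generator, records flow through the three stages one at a time and only the final `list` call materialises the result.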
!9
Hardware Acceleration: Important but Not Enough
[Chart: time until decision for Pipeline (CPU) and Pipeline (GPU/TPU); "critical decision making" marked]
!10
Cross-Platform Computation is Inefficient
Stream Tasks | Tensor Tasks | Graph Tasks (each with its own computation)
- Expensive data exchange through disk
- No computation sharing optimisations
!11
Computation Sharing
Data -> … -> f -> f’ -> … -> knowledge
f + f’ (loop fusion, vectorization, …)
+ fewer data passes
+ no additional overhead
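To make the "f + f’" idea concrete, here is a minimal Python sketch of loop fusion. The two functions `f` and `f_prime` are arbitrary placeholders chosen for illustration, not operators from the talk.

```python
from typing import List

def f(x: float) -> float:
    # Placeholder for the first pipeline stage.
    return x * 2.0

def f_prime(x: float) -> float:
    # Placeholder for the second pipeline stage.
    return x + 1.0

def unfused(data: List[float]) -> List[float]:
    # Two passes over the data, with an intermediate list in between.
    intermediate = [f(x) for x in data]
    return [f_prime(y) for y in intermediate]

def fused(data: List[float]) -> List[float]:
    # One pass: f and f_prime are applied back to back per element,
    # so no intermediate collection is materialised.
    return [f_prime(f(x)) for x in data]

data = [1.0, 2.0, 3.0]
same_result = unfused(data) == fused(data)
```

Both versions compute the same values; the fused one reads the input once and skips the intermediate list, which corresponds to the "fewer data passes" point on the slide.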
!12
Critical Decision Making demands Efficiency
[Chart: time until decision for Pipeline (CPU) and Pipeline (GPU/TPU), each shown with and without optimisation; "critical decision making" marked]
!13
The Problem
Data -> … -> f ? f’ -> … -> knowledge
!15
Using an Intermediate Representation (IR)
Data -> … -> f -> f’ -> … -> knowledge
f -> IR, f’ -> IR
IR + IR -> IR (f+f’)
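One way to picture a shared IR is sketched below: two stages are each lowered to the same hypothetical IR node (`Map`), and a small pass then merges adjacent nodes into one. The node type, the `fuse_maps` pass, and the `run` interpreter are illustrative assumptions, not the IR used by the CDA platform.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Map:
    """Hypothetical IR node: apply `fn` to every element of the input."""
    fn: Callable[[float], float]
    label: str

def fuse_maps(plan: List[Map]) -> List[Map]:
    """Merge every pair of adjacent Map nodes into a single composed Map."""
    fused: List[Map] = []
    for op in plan:
        if fused:
            prev = fused.pop()
            # Bind prev.fn and op.fn as defaults so each closure keeps its own pair.
            composed = lambda x, g=prev.fn, h=op.fn: h(g(x))
            fused.append(Map(composed, f"{prev.label}+{op.label}"))
        else:
            fused.append(op)
    return fused

def run(plan: List[Map], data: List[float]) -> List[float]:
    """Naive interpreter: one full pass over the data per IR node."""
    for op in plan:
        data = [op.fn(x) for x in data]
    return data

# f and f' arrive from different frontends but share one representation.
plan = [Map(lambda x: x * 2.0, "f"), Map(lambda x: x + 1.0, "f'")]
optimised = fuse_maps(plan)
```

Once both stages are expressed as the same kind of node, the optimiser can treat them uniformly: here `optimised` holds a single node labelled `f+f'`, and `run` makes one pass instead of two.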
!16
Related IR Projects
1) Weld IR (Stanford DAWN Project), Matei Zaharia (Spark architect) et al.
+ supports a large number of existing libraries
- currently limited to short-lived local task execution
!17
Related IR Projects
2) Tensor Comprehensions (Facebook)
+ JIT compiler, multi-GPU autotuner
- overspecialized to tensor operations in CUDA
!18
Our Goals: Model and Language Independence
Data Pipeline: Relational Streams (⋈, σθ, π), Tensors, Dynamic Graphs, …
!19
A Distributed Runtime for Heterogeneous Hardware
CDA Platform
Worker Worker Worker
LLVM code
Rust-Based
!21
Compilation Steps
Intermediate Representation (IR) -> Logically Optimised IR -> Physically Optimised IR
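The difference between the two optimisation stages can be sketched as follows. This is only an assumed illustration: the `Op` type, the filter-merging rule used for the logical step, and the device-assignment rule used for the physical step are hypothetical and do not describe the platform's actual passes.

```python
from dataclasses import dataclass, replace
from typing import List, Optional

@dataclass(frozen=True)
class Op:
    kind: str                      # "filter" or "map" (hypothetical operator kinds)
    name: str
    device: Optional[str] = None   # filled in by the physical step

def logical_optimise(plan: List[Op]) -> List[Op]:
    """Hardware-independent rewrite: merge runs of adjacent filters into one."""
    out: List[Op] = []
    for op in plan:
        if out and op.kind == "filter" and out[-1].kind == "filter":
            out[-1] = Op("filter", f"{out[-1].name}&{op.name}")
        else:
            out.append(op)
    return out

def physical_optimise(plan: List[Op], has_gpu: bool) -> List[Op]:
    """Hardware-aware step: place maps on the GPU if present, everything else on the CPU."""
    return [
        replace(op, device="gpu" if has_gpu and op.kind == "map" else "cpu")
        for op in plan
    ]

ir = [Op("filter", "p"), Op("filter", "q"), Op("map", "g")]
logical = logical_optimise(ir)
physical = physical_optimise(logical, has_gpu=True)
```

The logical step changes the shape of the plan without knowing anything about the hardware, while the physical step leaves the shape alone and only decides where each operator runs.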
!22
Distributed JIT Compilation and Reconfiguration
Physically Optimised IR -> Rust AST -> LLVM
!23
The Current CDA Team (RISE SICS + KTH)
Computer Systems
Machine Learning
Lars Kroll
Paris Carbone
Christian Schulte
Seif Haridi
Theodore Vasiloudis
Daniel Gillblad
MSc Students
• Klas Segeljakt
• Oscar Bjuhr
• Johan Mickos
!24
Thank You!
Don’t miss our poster :)
