Full Human-Level Artificial Intelligence
and … Data Stream Processing
Heh, I got this!
Paris Carbone
▪ There is one known runtime for human-level intelligence
▪ What is so special about the human brain structure?
Neurobiological Foundations of Action Planning and Execution - Human Action Control — B.Hommel et al.
▪ Diverse functionality/workloads
▪ Common runtime (neurons)
▪ The Brain Neural Network Runtime
▪ Distributed
▪ Organised in Logical Units
▪ Embedded State with Computation
▪ Shared Network
▪ Configured Data Dependencies
▪ Messages (signals)
▪ Supports Low latency Serving
▪ Supports Incremental Updates
▪ Supports Iterative Tasks
▪ Asynchronous Processing
▪ 100% Organic
▪ The Data Stream Processing Runtime
▪ Distributed
▪ Organised in Logical Units
▪ Embedded State with Computation
▪ Shared Network
▪ Configured Data Dependencies
▪ Messages (signals)
▪ Supports Low latency Serving
▪ Supports Incremental Updates
▪ Supports Iterative Tasks
▪ Asynchronous Processing
▪ 100% Organic
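A few of the runtime properties listed on this slide (embedded state with computation, messages, incremental updates) can be illustrated with a small sketch. Everything here is hypothetical: `Reading`, `RunningMean`, and `RunningMeanDemo` are illustrative names, not part of any stream processor's API, and a real runtime would distribute the state and deliver messages asynchronously.

```scala
// Hypothetical sketch: an operator that keeps its state next to its logic.
final case class Reading(sensor: String, value: Double)

final class RunningMean {
  // Embedded state: per-key (count, mean), owned by this operator only.
  private var state = Map.empty[String, (Long, Double)]

  // Handle one message and return the updated mean for its key.
  def onMessage(msg: Reading): Double = {
    val (n, mean) = state.getOrElse(msg.sensor, (0L, 0.0))
    val n1 = n + 1
    // Incremental update: the new mean is derived from the old one,
    // without replaying earlier messages.
    val mean1 = mean + (msg.value - mean) / n1
    state = state.updated(msg.sensor, (n1, mean1))
    mean1
  }
}

object RunningMeanDemo {
  // Feed three messages through one operator instance, in order.
  def run(): List[Double] = {
    val op = new RunningMean
    List(Reading("a", 1.0), Reading("a", 3.0), Reading("b", 10.0))
      .map(m => op.onMessage(m))
  }
}
```

Calling `RunningMeanDemo.run()` returns one updated mean per message, so a result is available as soon as each message arrives rather than after a batch completes.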
▪ Compilers - Our first and best “super-human” invention
▪ Instead, compilers can understand instructions…
▪ explained by humans in a high-level declarative language
▪ and then optimise them
▪ and translate them for stupid machines to execute reliably
“A revolutionary technology
that does NOT require you to throw tons of data
at your problem to be able to solve it”
▪ Our ‘Continuous Deep Analytics’ Project
Compilers
+
Data Streams
▪ Modern Data Pipelines need to combine diverse workloads!
(ML Training & Serving, Relational Algebra, Streams, Tensors, Graphs)
[Figure: one pipeline combining Relational Data Streams (operators ⋈, σθ, π), Feature Learning, Tensor Programming, and Dynamic Graphs.]
Arc Compiler
▪ diverse workloads
▪ common runtime
Intelligence: Smart Choice / Response Time
[Figure: time until decision for four pipelines: Pipeline (CPU), Pipeline (GPU/TPU), Pipeline (CPU) - Optimised, Pipeline (GPU/TPU) - Optimised; annotated "critical decision making".]
▪ It will be able to solve complex Climate Science problems, fast
// Read every model's time series as a labelled tensor stream over (time, lat, lon).
val rawStreams = streams("models/*/ts*.nc").
  withType[LabelledTensor[Inf x Int x Int -> Double,
                          Float x (Float, Float) x (Float, Float)]].
  dimensionLabels('time x 'lat x 'lon);

// Per model: tile each time slice, average every tile, then average over four window types.
val averageStreams = rawStreams.map { raw =>
  val timeSliced = raw.sliceBy('time);
  val aligned = timeSliced.tile(360 x 720).
    map(grid => average(grid));
  val gridSlices = aligned.sliceBy('lat, 'lon);
  // Daytime window: 06:00 to 18:00.
  val agg12h = gridSlices.window('time, t => t.between(TimeOfDay(6.h), TimeOfDay(18.h))).
    average;
  val agg1d = gridSlices.window('time, t => Day(t)).average;
  val agg1month = gridSlices.window('time, t => Month(t)).average;
  // Seasonal window: months grouped into four seasons.
  val agg1season = gridSlices.window('time, t => Month(t).in(
    Set(Dec, Jan, Feb),
    Set(Mar, Apr, May),
    Set(Jun, Jul, Aug),
    Set(Sep, Oct, Nov))).average;
  (agg12h, agg1d, agg1month, agg1season)
}.unzip4;

// Per window type: join the models on (time, lat, lon), then emit each model's
// difference from the cross-model average.
val diffs = averageStreams.map { inv =>
  val merged = inv.mergeOn('time, 'lat, 'lon);
  val averageModels = merged.map(models => (models, average(models)));
  averageModels.map {
    case (models, avg) => models.map(t => t - avg)
  }
}
▪ And generate an optimised stream processing graph (program)
[Figure: optimised dataflow graph. Sources (src1 … src20) are tiled and averaged (tile, map: average), then feed four windows (12h, 1d, month, season) that aggregate with a shared tree of partials: average. For each window, an equi-join on time slices followed by map: average then diff feeds the matching sink (12h, 1d, month, season).]
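The "shared tree of partials" in the graph can be approximated by a much simpler sketch: keep one partial aggregate per small slice and let every coarser window combine those partials instead of rereading raw values. This is a flat simplification, not a tree, and `Partial` and `SharedPartials` are hypothetical names used only for illustration.

```scala
// Hypothetical sketch of sharing partial aggregates between windows.
final case class Partial(sum: Double, count: Long) {
  // Two partials combine without access to the raw values behind them.
  def merge(other: Partial): Partial =
    Partial(sum + other.sum, count + other.count)
  def average: Double = if (count == 0) Double.NaN else sum / count
}

object SharedPartials {
  // One partial per day, built once from that day's raw values.
  def daily(valuesByDay: Map[Int, Seq[Double]]): Map[Int, Partial] =
    valuesByDay.map { case (day, vs) => day -> Partial(vs.sum, vs.size.toLong) }

  // A coarser window (several days) reuses the daily partials.
  def windowAverage(partials: Map[Int, Partial], days: Range): Double =
    days
      .flatMap(d => partials.get(d))
      .foldLeft(Partial(0.0, 0L))((a, b) => a.merge(b))
      .average
}
```

With daily partials in place, a monthly or seasonal average only merges a handful of `(sum, count)` pairs, which is why the same partials can serve the 1d, month, and season windows at once.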
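Using an Intermediate Representation (IR)
[Figure: functions f, f’, … and data knowledge are each translated to IR; the separate IRs are merged into a single IR for the combined program f+f’.]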
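To make the idea of combining f and f’ through a common IR concrete, here is a minimal sketch of map fusion over a toy IR. It is not the Weld or Arc IR: `Expr`, `Input`, `MapOp`, and `Fusion` are hypothetical names, and the only rewrite shown is merging two consecutive maps into one pass.

```scala
// Hypothetical mini-IR, for illustration only.
sealed trait Expr
case object Input extends Expr // the source collection
final case class MapOp(in: Expr, f: Double => Double) extends Expr

object Fusion {
  // Rewrite MapOp(MapOp(x, f), g) into MapOp(x, f andThen g), bottom-up.
  def fuse(e: Expr): Expr = e match {
    case MapOp(in, g) =>
      fuse(in) match {
        case MapOp(inner, f) => MapOp(inner, f.andThen(g))
        case other           => MapOp(other, g)
      }
    case Input => Input
  }

  // Number of passes over the data that the plan needs.
  def passes(e: Expr): Int = e match {
    case MapOp(in, _) => 1 + passes(in)
    case Input        => 0
  }

  // Reference interpreter: runs the plan one operator at a time.
  def eval(e: Expr, data: Vector[Double]): Vector[Double] = e match {
    case MapOp(in, f) => eval(in, data).map(f)
    case Input        => data
  }
}
```

For a plan such as `MapOp(MapOp(Input, _ + 1.0), _ * 2.0)`, `Fusion.fuse` yields a plan that produces the same values in one pass instead of two. Because both functions are visible in the same representation, the rewrite does not need to know which library each came from.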
Weld IR (Stanford DAWN Project)
+ supports a large number of existing libraries
- currently limited to short-lived local task execution
Matei Zaharia (Spark architect) et al.
The Arc Compilation Stack
Arc: Weld for Streams
[Figure: compilation pipeline: Frontends → Intermediate Representation (IR) → Logically Optimised IR → Physically Optimised IR → Binaries. Additional inputs: Stream Metadata, Available Resources.]
JIT - Live Rewiring of Continuous Programs
[Figure: Physically Optimised IR → Binaries, with Monitoring of the running program. Triggers for re-optimisation: Change in Resources, Change in Load Distribution, Discovered better Plan.]
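The rewiring loop can be sketched as a plan selection that is repeated whenever monitoring reports a change. This is only an illustration under stated assumptions: `Resources`, `Plan`, `Rewiring`, the two plan names, and the cost formulas are all hypothetical and do not come from the Arc project.

```scala
// Hypothetical sketch: names and cost formulas are illustrative only.
final case class Resources(cpuCores: Int, gpus: Int)
final case class Plan(name: String, cost: Resources => Double)

object Rewiring {
  val plans: List[Plan] = List(
    Plan("cpu-pipeline", r => 100.0 / math.max(r.cpuCores, 1)),
    Plan("gpu-pipeline", r =>
      if (r.gpus > 0) 10.0 / r.gpus else Double.PositiveInfinity)
  )

  // Cheapest known plan for the given resources.
  def best(r: Resources): Plan =
    plans.reduce((a, b) => if (b.cost(r) < a.cost(r)) b else a)

  // Called on each monitoring report: switch only if a strictly
  // cheaper plan exists under the newly observed resources.
  def onReport(current: Plan, r: Resources): Plan = {
    val candidate = best(r)
    if (candidate.cost(r) < current.cost(r)) candidate else current
  }
}
```

A continuous program would keep running its current plan and call `Rewiring.onReport` when resources or load change; the decision to rewire is then a comparison of estimated costs, not a restart of the job.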
The Current CDA Team (RISE SICS + KTH)
Areas: Computer Systems, Machine Learning
Members: Lars Kroll, Paris Carbone, Christian Schulte, Seif Haridi, Theodore Vasiloudis, Daniel Gillblad
MSc Students
• Klas Segeljakt
• Oscar Bjuhr
• Johan Mickos
▪ The Brain Neural Network Runtime
▪ Distributed
▪ Organised in Logical Units
▪ Embedded State with Computation
▪ Shared Network
▪ Configured Data Dependencies
▪ Messages (signals)
▪ Supports Low latency Serving
▪ Supports Incremental Updates
▪ Supports Iterative Tasks
▪ Asynchronous Processing
▪ 100% Organic
▪ Just in Time Reconfiguration
▪ Executes Declarative Instructions Reliably

A Future Look of Data Stream Processing as an Architecture for AI
