Tiark Rompf
Purdue University
FLARE Scale up Spark SQL
with Native Compilation
and set your Data on Fire!
Flare: Scale Up Spark SQL with Native Compilation and Set Your Data on Fire! with Tiark Rompf
Scala: Martin Odersky + the Scala team
Rompf, Iulian Dragos, Adriaan Moors, Gilles Dubochet, Philipp Haller, Lukas Rytz, Ingo Maier, Antonio Cunei, Donna Malayeri, Miguel Garcia, Hubert Plociniczak, Aleksandar Pro
st: Geoffrey Washburn, Stéphane Micheloud, Lex Spoon, Sean McDirmid, Burak Emir, Nikolay Mihaylov, Philippe Altherr, Vincent Cremet, Michel Schinz, Erik Stenman, Matthias
external/visiting contributors: Paul Phillips, Miles Sabin, Stepan Koltsov and others
[Diagram: Spark SQL stack. Inputs: user programs (Java, Scala, Python, R) and SQL (JDBC, console). Spark SQL layer: DataFrame API, Catalyst optimizer, code generation. Execution layer: Spark, Resilient Distributed Dataset.]
How Fast Is Spark?
Demo
Spark Architecture
[Diagram: Spark SQL stack. Inputs: user programs (Java, Scala, Python, R) and SQL (JDBC, console). Spark SQL layer: DataFrame API, Catalyst optimizer, code generation. Execution layer: Spark, Resilient Distributed Dataset.]
Flare: a New Back-end for Spark
[Diagram, four panels: (a) Spark SQL, (b) Flare Level 1, (c) Flare Level 2, (d) Flare Level 3. Panel (a) repeats the Spark SQL stack: user programs (Java, Scala, Python, R), SQL (JDBC, console), DataFrame API, Catalyst optimizer, code generation, Spark, Resilient Distributed Dataset. Labels in the Flare panels: front-end, export query plan, JNI, Flare's code generation, Flare's runtime, native code, LMS code generation, DMLL, Delite's back-end, Delite's runtime, optimized Scala and C, OptiQL, OptiML, OptiGraph.]
Results
Single-Core Running Time: TPC-H
Absolute running time in milliseconds (ms) for Postgres, Spark, HyPer and Flare in SF10
[Chart: running time (ms, log scale from 1 to 1x10^6) for TPC-H queries Q1 to Q22; series: PostgreSQL, Spark, HyPer, Flare.]
Apache Parquet Format
[Chart: speedup per TPC-H query (log scale from 1 to 1000); series: Spark CSV, Spark Parquet, Flare CSV, Flare Parquet.]

Query   Spark CSV   Spark Parquet   Flare CSV   Flare Parquet
Q1          16762            3728         641             187
Q2          12244           13520         168              17
Q3          21730            9099         757             125
Q4          19836            6083         698             127
Q5          19316            8706         758             151
Q6          12278             535         568              99
Q7          24484           13555         788             183
Q8          17726            5512         875             160
Q9          30050           19413        1417             698
Q10         29533           21822         854             309
Q11          5224            3926         128               9
Q12         21688            5570         701             133
Q13          8554            7034         388             246
Q14         12962             719         573              86
Q15         26721            4506         551              88
Q16         12941           21834         150              66
Q17         24690            5176        1426             264
Q18         27012            6757        1229             181
Q19         12409            2681         605             178
Q20         19369            8562         792             165
Q21         57330           25089        1868             324
Q22          7050            5295         178              22
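The Spark and Flare rows above can be turned into per-query speedups by dividing one by the other. The sketch below does this for three queries, assuming the table values are running times (the slide does not label the unit, and a ratio does not depend on it). The helper name `speedup` and the choice of queries are illustrative, not part of the talk.

```scala
// Hypothetical helper: how many times faster the candidate is than the baseline.
def speedup(baseline: Double, candidate: Double): Double = baseline / candidate

// Values copied from the slide's table, in the order
// (Spark CSV, Spark Parquet, Flare CSV, Flare Parquet).
val rows: Map[String, (Double, Double, Double, Double)] = Map(
  "Q1"  -> (16762.0, 3728.0, 641.0, 187.0),
  "Q6"  -> (12278.0, 535.0, 568.0, 99.0),
  "Q22" -> (7050.0, 5295.0, 178.0, 22.0)
)

// Compare like with like: CSV against CSV, Parquet against Parquet.
rows.toSeq.sortBy(_._1).foreach { case (q, (sparkCsv, sparkPq, flareCsv, flarePq)) =>
  println(f"$q%-4s Flare vs Spark on CSV: ${speedup(sparkCsv, flareCsv)}%.1fx, on Parquet: ${speedup(sparkPq, flarePq)}%.1fx")
}
```

Comparing within a format keeps the file-parsing cost out of the ratio; comparing Spark CSV against Flare Parquet would mix the engine gain with the format gain.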
What about parallelism?
Parallel Scaling Experiment
Scaling-up Flare and Spark SQL in SF20
[Figure: two speedup panels (Spark and Flare; y-axis 2 to 20) and four running-time panels (ms) for Q6 (aggregate), Q13 (outer join), Q14 (join) and Q22 (semi/anti-join), each over 1, 2, 4, 8, 16 and 32 cores; series: Spark SQL, Flare Level 2.]
Hardware: Single NUMA machine with 4 sockets, 12 Xeon E5-4657L cores per socket, and
256GB RAM per socket (1 TB total).
NUMA Optimization
[Diagram: sixteen cores, CORE 0 to CORE 15, and a memory block labeled "Columnar data".]
NUMA Optimization
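[Figure: running time (ms) vs. number of cores (1, 18, 36, 72) for Q1 and Q6, with threads pinned to one socket, two sockets and four sockets. The y-axis is broken: the upper ticks read 5300 to 5500 ms for Q1 and 3500 to 3700 ms for Q6. Bar labels: 12, 12, 23, 12, 24, 46 (Q1) and 14, 22, 29, 23, 44, 58 (Q6).]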
Scaling-up Flare for SF100 with NUMA optimizations on different configurations: threads pinned to one, two and four sockets
• Q6 performs better when threads are dispatched on different sockets.
• Q1 is computation-bound, so spreading threads across sockets has little effect.
• Scaling Q1 and Q6 up to 72 cores (the capacity of the machine) gives
maximum speedups of 46x and 58x, respectively.
Hardware: Single NUMA machine with 4 sockets, 12 Xeon E5-4657L cores per socket, and
256GB RAM per socket (1 TB total).
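One way to read the 46x and 58x figures is as parallel efficiency, that is, speedup divided by the number of cores. The sketch below works this out for the 72-core numbers quoted on the slide; the helper name `efficiency` is an illustrative assumption. It comes to about 64% of linear scaling for Q1 and about 81% for Q6.

```scala
// Parallel efficiency = speedup / number of cores (1.0 would be perfect linear scaling).
def efficiency(speedup: Double, cores: Int): Double = speedup / cores

val cores = 72                                     // core count quoted on the slide
val maxSpeedup = Seq("Q1" -> 46.0, "Q6" -> 58.0)   // maximum speedups quoted on the slide

maxSpeedup.foreach { case (query, s) =>
  println(f"$query: ${efficiency(s, cores) * 100}%.0f%% of linear scaling")
}
```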
Heterogeneous Workloads:
UDFs and ML Kernels
Example: k-Means Clustering
untilconverged(mu, tol) { mu =>
  // calculate distances to current centroids
  val c = (0::m) { i =>
    val allDistances = mu mapRows { centroid =>
      dist(x(i), centroid)
    }
    allDistances.minIndex
  }
  // move each cluster centroid to the
  // mean of the points assigned to it
  val newMu = (0::k,*) { i =>
    val (weightedpoints, points) = sum(0,m) { j =>
      // point j counts toward cluster i only if it was assigned to i
      if (c(j) == i) (x(j),1)
    }
    if (points == 0) Vector.zeros(n)
    else weightedpoints / points
  }
  newMu
}
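The slide's code relies on DSL constructs (`untilconverged`, `mapRows`, `sum`, the `(0::m)` ranges) that plain Scala does not provide. As a rough illustration of the same two steps, here is a minimal sketch in standard-library Scala. The names `Point`, `step` and `kMeans`, the use of squared Euclidean distance for `dist`, and the sample data are all assumptions made for this example, not the talk's implementation.

```scala
type Point = Vector[Double]

// Assumed distance: squared Euclidean distance between two points of equal dimension.
def dist(a: Point, b: Point): Double =
  a.zip(b).map { case (p, q) => (p - q) * (p - q) }.sum

// One iteration: assign every point to its nearest centroid, then move each
// centroid to the mean of its points (an empty cluster becomes the zero vector,
// as on the slide).
def step(x: Vector[Point], mu: Vector[Point]): Vector[Point] = {
  val n = x.head.length
  val c = x.map(p => mu.indices.minBy(i => dist(p, mu(i))))
  mu.indices.toVector.map { i =>
    val members = x.indices.collect { case j if c(j) == i => x(j) }
    if (members.isEmpty) Vector.fill(n)(0.0)
    else members.transpose.map(_.sum / members.size).toVector
  }
}

// Repeat until no centroid moves by more than tol (measured with dist).
def kMeans(x: Vector[Point], mu: Vector[Point], tol: Double): Vector[Point] = {
  val next = step(x, mu)
  val moved = mu.zip(next).map { case (a, b) => dist(a, b) }.max
  if (moved <= tol) next else kMeans(x, next, tol)
}

// Two well-separated groups of two points each.
val points = Vector(Vector(0.0, 0.0), Vector(0.0, 2.0), Vector(10.0, 0.0), Vector(10.0, 2.0))
val centroids = kMeans(points, Vector(Vector(1.0, 1.0), Vector(9.0, 1.0)), tol = 1e-9)
println(centroids)
```

Unlike the DSL version, this sketch runs sequentially on immutable collections and makes no attempt at parallel or native execution.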
Flare Level 3: ML Performance
[Figure: speedup (y-axis 5 to 50) vs. number of cores (1, 12, 24, 48) in four panels: GDA, Gene, k-means and LogReg; series: Spark, C++ (nopin), C++ (pin), C++ (numa).]
Level 3: Machine learning kernels, scaling on shared memory NUMA
with thread pinning and data partitioning
Flare Level 3: ML Performance
[Figure: speedup over Spark (y-axis 0 to 8) in three panels: LogReg on 3.4 GB and 17 GB inputs, k-means on 1.7 GB and 17 GB inputs, and k-means and LogReg on the GPU cluster; series: Spark, Delite-CPU, Delite-GPU.]
Level 3: Machine learning kernels run on a 20-node Amazon cluster (left, center)
and on a 4-node GPU cluster connected within a single rack (right).
TensorFlow -> TensorFlare
Relational + ML
/* TensorFlow inference as UDF */
val q = spark.sql("""select ... from data
                     where class = findNearestCluster(...)
                     group by class""")
flare(q).show
flaredata.github.io
FLARE TEAM
Grégory Essertel, Ruby Tahboub, James Decker
Thank You.
Web: flaredata.github.io
Twitter: @flaredata
FLARE

