Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal
Josh Rosen (@jshrsn)
June 16, 2015
About Databricks
Offers a hosted service:
•  Spark on EC2
•  Notebooks
•  Plot visualizations
•  Cluster management
•  Scheduled jobs
Founded by the creators of Spark and remains its largest contributor
Goals of Project Tungsten
Substantially improve the memory and CPU efficiency of Spark applications.
Push performance closer to the limits of modern hardware.
In this talk
• Motivation: why we’re focusing on compute instead of IO
• How Tungsten optimizes memory + CPU
• Case study: aggregation
• Case study: record sorting
• Performance results
• Roadmap + next steps
Many big data workloads are now compute bound
NSDI’15:
•  “Network optimizations can only reduce job completion time by
a median of at most 2%.”
•  “Optimizing or eliminating disk accesses can only reduce job
completion time by a median of at most 19%.”
•  We’ve observed similar characteristics in many Databricks Cloud
customer workloads.
Why is CPU the new bottleneck?
•  Hardware has improved:
–  Increasingly large aggregate IO bandwidth, such as 10Gbps links in
networks
–  High-bandwidth SSDs or striped HDD arrays for storage
•  Spark’s IO has been optimized:
–  Many workloads now avoid significant disk IO by pruning input data
that is not needed in a given job
–  New shuffle and network layer implementations
•  Data formats have improved:
–  Parquet, binary data formats
•  Serialization and hashing are CPU-bound bottlenecks
How Tungsten improves CPU & memory efficiency
•  Memory Management and Binary Processing: leverage
application semantics to manage memory explicitly and
eliminate the overhead of the JVM object model and garbage
collection
•  Cache-aware computation: algorithms and data structures to
exploit memory hierarchy
•  Code generation: exploit modern compilers and CPUs; allow
efficient operation directly on binary data
The overheads of Java objects
“abcd”
•  Native: 4 bytes with UTF-8 encoding
•  Java: 48 bytes
java.lang.String object internals:
OFFSET  SIZE  TYPE    DESCRIPTION      VALUE
     0     4          (object header)  ...
     4     4          (object header)  ...
     8     4          (object header)  ...
    12     4  char[]  String.value     []
    16     4  int     String.hash      0
    20     4  int     String.hash32    0
Instance size: 24 bytes (reported by Instrumentation API)
12 byte object header
8 byte hashcode
20 bytes of overhead + 8 bytes for chars
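The 48-byte figure can be reproduced with a little arithmetic. The sketch below assumes a 64-bit HotSpot JVM with compressed object pointers, where a char[] carries a 16-byte header (12-byte object header plus a 4-byte length) and objects are padded to 8 bytes; those array numbers are assumptions and do not come from the slide.

```python
# Rough heap size of the Java String "abcd".
# Assumed JVM: 64-bit HotSpot with compressed object pointers.
STRING_INSTANCE = 24   # from the layout above: 12 header + 4 ref + 4 hash + 4 hash32
ARRAY_HEADER = 16      # assumption: 12-byte object header + 4-byte array length


def java_string_bytes(text: str) -> int:
    """Estimate heap bytes for a String object plus its backing char[]."""
    chars = len(text.encode("utf-16-le"))   # Java chars are 2-byte UTF-16 units
    array = ARRAY_HEADER + chars
    array += -array % 8                     # pad the array object to 8 bytes
    return STRING_INSTANCE + array


native = len("abcd".encode("utf-8"))
print(native, java_string_bytes("abcd"))    # 4 48
```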
Garbage collection challenges
•  Many big data workloads create objects in ways that are
unfriendly to regular Java GC.
•  Guest blog on GC tuning: tinyurl.com/db-gc-tuning
[Figure: JVM heap generations. Young generation (eden, survivor spaces S0 and S1), old generation (tenured), permanent generation]
sun.misc.Unsafe
•  JVM internal API for directly manipulating memory without
safety checks (hence “unsafe”)
•  We use this API to build data structures in both on- and off-heap
memory
[Figure: data structures with pointers; flat data structures; complex examples]
  
Java object-based row representation
3 fields of type (int, string, string) with value (123, “data”, “bricks”)
[Figure: GenericMutableRow -> Array -> BoxedInteger(123), String(“data”), String(“bricks”)]
5+ objects; high space overhead; expensive hashCode()
  
Tungsten’s UnsafeRow format
•  Bit set for tracking null values
•  Every column appears in the fixed-length values region:
–  Small values are inlined
–  For variable-length values (strings), we store a relative offset into the
variable-length data section
•  Rows are always 8-byte word aligned (size is a multiple of 8 bytes)
•  Equality comparison and hashing can be performed on raw bytes without
requiring additional interpretation
[Figure: row layout, left to right: null bit set (1 bit/field) | values (8 bytes/field) | variable-length data. The value slot of a variable-length field holds the offset to its data.]
Example of an UnsafeRow
(123, “data”, “bricks”)
| 0x0 | 123 | 32L | 48L | 4 | “data” | 6 | “bricks” |
•  0x0: null tracking bitmap
•  123: inlined int value
•  32L, 48L: offsets to var. length data
•  4, 6: field lengths
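The row layout can be sketched in a few lines of Python. This is a minimal illustration and not Spark's implementation: the 8-byte length word in front of each string is inferred from the offsets 32 and 48 in the example, and the function name encode_row is hypothetical.

```python
import struct

WORD = 8  # every region is a multiple of 8 bytes


def pad8(n: int) -> int:
    """Round n up to the next multiple of 8."""
    return (n + WORD - 1) // WORD * WORD


def encode_row(values):
    """Encode ints and strings using the layout sketched on the slide.

    Layout: [null bitmap][one 8-byte slot per field][variable-length data].
    An int is inlined in its slot; a string's slot holds the byte offset
    (from the start of the row) of a length word followed by padded bytes.
    Assumes at most 64 fields so the bitmap fits in one word.
    """
    bitmap = 0
    fixed_size = WORD + WORD * len(values)
    slots, var_data = [], b""
    for i, value in enumerate(values):
        if value is None:
            bitmap |= 1 << i          # mark the field as null
            slots.append(0)
        elif isinstance(value, int):
            slots.append(value)       # small value, stored inline
        else:
            raw = value.encode("utf-8")
            slots.append(fixed_size + len(var_data))
            var_data += struct.pack("<q", len(raw)) + raw.ljust(pad8(len(raw)), b"\0")
    return struct.pack(f"<Q{len(slots)}q", bitmap, *slots) + var_data


row = encode_row([123, "data", "bricks"])
print(len(row))                             # 64
print(struct.unpack_from("<Q3q", row, 0))   # (0, 123, 32, 48)
```

Because equal rows produce identical bytes, equality checks and hashing can work on the encoded form directly, for example `encode_row([1, "a"]) == encode_row([1, "a"])`.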
How we encode memory addresses
•  Off heap: addresses are raw memory pointers.
•  On heap: addresses are base object + offset pairs.
•  We use our own “page table” abstraction to enable more
compact encoding of on-heap addresses:
[Figure: a page table with entries 0, 1, …, N – 1, each referencing a data page (a Java object). An encoded address is a (page, offset in page) pair.]
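A page-table encoding of this kind can be sketched as follows. The slide does not give field widths, so the 13-bit page number and 51-bit offset used here are assumptions for illustration, and the PageTable class is hypothetical; bytearrays stand in for the Java objects that back each page.

```python
PAGE_BITS = 13                    # assumption for this sketch
OFFSET_BITS = 64 - PAGE_BITS      # remaining 51 bits address bytes within a page
OFFSET_MASK = (1 << OFFSET_BITS) - 1


class PageTable:
    """Maps small page numbers to data pages (bytearrays stand in for Java objects)."""

    def __init__(self):
        self.pages = []

    def allocate(self, size: int) -> int:
        """Create a new page and return its page number."""
        if len(self.pages) >= 1 << PAGE_BITS:
            raise MemoryError("page table is full")
        self.pages.append(bytearray(size))
        return len(self.pages) - 1

    @staticmethod
    def encode(page: int, offset: int) -> int:
        """Pack (page number, offset in page) into one 64-bit word."""
        assert 0 <= page < (1 << PAGE_BITS) and 0 <= offset <= OFFSET_MASK
        return (page << OFFSET_BITS) | offset

    @staticmethod
    def decode(address: int):
        """Split a 64-bit word back into (page number, offset in page)."""
        return address >> OFFSET_BITS, address & OFFSET_MASK

    def read(self, address: int, length: int) -> bytes:
        page, offset = self.decode(address)
        return bytes(self.pages[page][offset:offset + length])


table = PageTable()
table.allocate(64)
page = table.allocate(64)
table.pages[page][16:20] = b"data"
addr = PageTable.encode(page, 16)
print(PageTable.decode(addr), table.read(addr, 4))   # (1, 16) b'data'
```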
  
java.util.HashMap
[Figure: an array of entries, each holding a key ptr, a value ptr, and a next pointer; keys and values are separate objects]
•  Huge object overheads
•  Poor memory locality
•  Size estimation is hard
Tungsten’s BytesToBytesMap
[Figure: an array of (hc, ptr) entries, each pointing into a memory page that stores key and value pairs back to back]
•  Low space overheads
•  Good memory locality, especially for scans
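The idea behind such a map can be sketched as an open-addressing table whose records live in one contiguous buffer. This is a teaching sketch under stated assumptions, not Spark's BytesToBytesMap: the class name, the record format (two 4-byte lengths followed by key and value bytes), and the fixed capacity are all hypothetical.

```python
import struct


class BytesToBytesSketch:
    """Open-addressing hash map whose records live in one contiguous page.

    The pointer array holds (hash code, record offset) pairs; each record in
    the page is [key length][value length][key bytes][value bytes].
    Fixed capacity, no deletion or resizing.
    """

    def __init__(self, capacity: int = 16):
        self.capacity = capacity              # always kept above the entry count
        self.slots = [None] * capacity        # the "array" of (hc, ptr)
        self.page = bytearray()               # the "memory page"
        self.size = 0

    def _key_at(self, ptr: int) -> bytes:
        klen, _ = struct.unpack_from("<ii", self.page, ptr)
        return bytes(self.page[ptr + 8:ptr + 8 + klen])

    def _find(self, key: bytes):
        hc = hash(key)
        i = hc % self.capacity
        while self.slots[i] is not None:
            slot_hc, ptr = self.slots[i]
            # Compare cached hash codes first; touch the page only on a match.
            if slot_hc == hc and self._key_at(ptr) == key:
                return i, ptr
            i = (i + 1) % self.capacity       # linear probing
        return i, None

    def put(self, key: bytes, value: bytes) -> None:
        i, ptr = self._find(key)
        if ptr is not None:
            raise KeyError("sketch does not support overwriting keys")
        if self.size + 1 >= self.capacity:
            raise MemoryError("sketch map is full")
        self.slots[i] = (hash(key), len(self.page))
        self.page += struct.pack("<ii", len(key), len(value)) + key + value
        self.size += 1

    def get(self, key: bytes):
        _, ptr = self._find(key)
        if ptr is None:
            return None
        klen, vlen = struct.unpack_from("<ii", self.page, ptr)
        start = ptr + 8 + klen
        return bytes(self.page[start:start + vlen])

    def scan(self):
        """Walk the page front to back: sequential, cache-friendly access."""
        ptr = 0
        while ptr < len(self.page):
            klen, vlen = struct.unpack_from("<ii", self.page, ptr)
            body = ptr + 8
            yield (bytes(self.page[body:body + klen]),
                   bytes(self.page[body + klen:body + klen + vlen]))
            ptr = body + klen + vlen


m = BytesToBytesSketch()
m.put(b"eng", b"\x01")
m.put(b"ops", b"\x02")
print(m.get(b"ops"), list(m.scan()))
```

Size estimation is straightforward in this design because the payload is a single buffer: `len(m.page)` is the number of bytes used by all records.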
  
Code generation
•  Generic evaluation of expression logic
is very expensive on the JVM
–  Virtual function calls
–  Branches based on expression type
–  Object creation due to primitive boxing
–  Memory consumption by boxed
primitive objects
•  Generating custom bytecode can
eliminate these overheads
Evaluating “SELECT a + a + a” (query time in seconds)
Hand written              9.33
Code gen                  9.36
Interpreted Projection   36.65
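The gap between interpreted and generated evaluation can be illustrated with a small analogy. Spark generates Java source and compiles it to bytecode; the sketch below only mirrors the idea in Python, and the Expr, Column, Add, and compile_expr names are hypothetical.

```python
class Expr:
    """Base class for a tiny expression tree."""

    def eval(self, row):
        raise NotImplementedError

    def gen(self) -> str:
        raise NotImplementedError


class Column(Expr):
    def __init__(self, name):
        self.name = name

    def eval(self, row):
        return row[self.name]

    def gen(self):
        return f"row[{self.name!r}]"


class Add(Expr):
    def __init__(self, left, right):
        self.left, self.right = left, right

    def eval(self, row):
        # Interpreted path: two dynamic dispatches per Add node, per row.
        return self.left.eval(row) + self.right.eval(row)

    def gen(self):
        return f"({self.left.gen()} + {self.right.gen()})"


def compile_expr(expr: Expr):
    """Turn the whole tree into one function, removing per-node dispatch."""
    source = f"lambda row: {expr.gen()}"
    return eval(compile(source, "<generated>", "eval"))


a = Column("a")
tree = Add(Add(a, a), a)                 # SELECT a + a + a
generated = compile_expr(tree)

row = {"a": 7}
print(tree.gen())                        # ((row['a'] + row['a']) + row['a'])
print(tree.eval(row), generated(row))    # 21 21
```

The tree is walked once, at compile time, to produce source text; after that each row is evaluated by a single flat function instead of a chain of virtual calls.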
Code generation
•  Project Tungsten uses the Janino compiler to reduce code generation time.
•  Spark 1.5 will greatly expand the number of expressions that support code
generation:
–  SPARK-8159
Example: aggregation optimizations in DataFrames and Spark SQL
df.groupBy("department").agg(max("age"), sum("expense"))
[Figure: aggregation data flow (SPARK-7080). Each input row is projected to a grouping key and converted to an UnsafeRow, which is used to probe a BytesToBytesMap; the matching aggregates are updated in place, and a scan of the map produces the aggregation result.]
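The probe and update-in-place steps can be sketched for the max("age"), sum("expense") query above. This is a simplified illustration under stated assumptions: a Python dict stands in for the BytesToBytesMap index, integer columns are assumed, and the aggregate function is hypothetical.

```python
import struct

AGG = struct.Struct("<qq")            # buffer layout: max(age), sum(expense)
INT64_MIN = -(1 << 63)


def aggregate(rows):
    """rows: iterable of (department, age, expense) tuples with integer values."""
    index = {}                        # stands in for the map: key bytes -> buffer offset
    buffers = bytearray()             # all aggregation buffers, back to back
    for department, age, expense in rows:
        key = department.encode("utf-8")        # project + convert the grouping key
        offset = index.get(key)                 # probe
        if offset is None:
            offset = len(buffers)
            index[key] = offset
            buffers += AGG.pack(INT64_MIN, 0)   # initial max and sum
        cur_max, cur_sum = AGG.unpack_from(buffers, offset)
        # Update in place: overwrite the same 16 bytes, no new row object.
        AGG.pack_into(buffers, offset, max(cur_max, age), cur_sum + expense)
    # Scan the map to emit one result per group.
    return {k.decode("utf-8"): AGG.unpack_from(buffers, off) for k, off in index.items()}


rows = [("eng", 31, 100), ("ops", 45, 20), ("eng", 28, 50)]
print(aggregate(rows))   # {'eng': (31, 150), 'ops': (45, 20)}
```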
Optimized record sorting in Spark SQL + DataFrames (SPARK-7082)
•  AlphaSort-style prefix sort:
–  Store prefixes of sort keys inside the sort pointer array
–  During sort, compare prefixes to short-circuit and avoid full record comparisons
•  Use this to build external sort-merge join to support joins larger than memory
[Figure: naïve layout is an array of pointers, each referencing a record; cache-friendly layout is an array of (key prefix, pointer) pairs, each referencing a record]
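The prefix comparison can be sketched as below. This is a minimal illustration, not Spark's sorter: the 4-byte prefix length is an assumption, list indices stand in for record pointers, and the prefix_sort function is hypothetical. The counter shows how rarely the full records need to be read.

```python
from functools import cmp_to_key

PREFIX_LEN = 4   # assumption: a 4-byte key prefix stored next to each pointer


def prefix_sort(records):
    """Sort byte-string records; returns (sorted records, full comparisons made)."""
    full_compares = 0
    # The sort array holds (key prefix, pointer); a pointer here is a list index.
    entries = [(rec[:PREFIX_LEN], i) for i, rec in enumerate(records)]

    def compare(x, y):
        nonlocal full_compares
        (px, ix), (py, iy) = x, y
        if px != py:                       # decided without touching the records
            return -1 if px < py else 1
        full_compares += 1                 # tie: dereference and compare in full
        rx, ry = records[ix], records[iy]
        return (rx > ry) - (rx < ry)

    entries.sort(key=cmp_to_key(compare))
    return [records[i] for _, i in entries], full_compares


data = [b"spark-sql", b"tungsten", b"spark-core", b"janino", b"parquet"]
ordered, full = prefix_sort(data)
print(ordered)
print("full record comparisons:", full)
```

Only the two records that share the prefix "spar" ever trigger a full comparison; every other decision is made from the prefix stored inside the sort array.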
Initial performance results for agg. query
[Figure: run time (seconds, axis 0 to 1200) versus relative data set size (1x, 2x, 4x, 8x, 16x) for Default, Code Gen, Tungsten onheap, and Tungsten offheap]
[Figure: average GC time per node (seconds, axis 0 to 200) versus relative data set size (1x, 2x, 4x, 8x, 16x) for Default, Code Gen, Tungsten onheap, and Tungsten offheap]
Project Tungsten Roadmap
Spark 1.4
•  Binary processing for aggregation in Spark SQL / DataFrames
•  New Tungsten shuffle manager
•  Compression & serialization optimizations
Spark 1.5
•  Optimized code generation
•  Optimized sorting in Spark SQL / DataFrames
•  End-to-end processing using binary data representations
•  External aggregation
Spark 1.6
•  Vectorized / batched processing
•  ???
Which Spark jobs can benefit from Tungsten?
•  DataFrames
–  Java
–  Scala
–  Python
–  R
•  Spark SQL queries
•  Some Spark RDD API programs, via general serialization + compression
optimizations
logs.join(
    users,
    logs.userId == users.userId,
    "left_outer") \
  .groupBy("userId").agg({"*": "count"})
How to enable all of Spark 1.4’s Tungsten optimizations
spark.sql.codegen = true
spark.sql.unsafe.enabled = true
spark.shuffle.manager = tungsten-sort
Warning! These features are experimental in 1.4!
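One way to pass these settings is on the spark-submit command line with its --conf key=value flag. The helper below only builds the argument list; the to_submit_args function and the my_job.py file name are hypothetical, and whether each property is honored when supplied this way should be checked against the documentation for the Spark version in use.

```python
# The three Spark 1.4 settings listed above, as a plain mapping.
TUNGSTEN_SETTINGS = {
    "spark.sql.codegen": "true",
    "spark.sql.unsafe.enabled": "true",
    "spark.shuffle.manager": "tungsten-sort",
}


def to_submit_args(settings):
    """Render settings as spark-submit `--conf key=value` arguments."""
    args = []
    for key, value in settings.items():
        args += ["--conf", f"{key}={value}"]
    return args


# "my_job.py" is a hypothetical application file used only for illustration.
command = ["spark-submit", *to_submit_args(TUNGSTEN_SETTINGS), "my_job.py"]
print(" ".join(command))
```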
  
Thank you.
Follow our progress on JIRA: SPARK-7075
