TeraCache: Efficient Caching over Fast Storage Devices
Iacovos G. Kolokasis (1,2), Anastasios Papagiannis (1,2), Foivos Zakkak (3), Shoaib Akram (4),
Christos Kozanitis (2), Polyvios Pratikakis (1,2), and Angelos Bilas (1,2)
(1) University of Crete
(2) Foundation for Research and Technology Hellas (FORTH), Greece
(3) Red Hat, Inc.
(4) Australian National University
Spark Caching Mechanism
▪ Stores the result of an RDD
▪ Essential when an RDD is used across multiple Spark jobs
▪ Caching avoids recomputation and reduces execution time
▪ Effective for iterative workloads (e.g., ML, graph processing)
▪ How much data do we need to cache?
Storage levels: MEMORY_ONLY, MEMORY_AND_DISK, DISK_ONLY, OFF_HEAP
Source: https://spark.apache.org/docs/latest/rdd-programming-guide.html
Increasing Memory Demands!
▪ Analytics datasets grow at a high rate
  ▪ Today ~50 ZB
  ▪ By 2025 ~175 ZB (3.5x)
▪ Typical deployments use roughly as much DRAM as the input dataset
▪ Cached data is typically even larger than the input dataset
Source: Seagate, The Digitization of the World
Cached Data Size Matters
▪ In-memory caching needs a lot of DRAM
▪ DRAM density is difficult to increase
▪ Fast storage (NVMe) scales to TBs per device
▪ Spark already uses fast storage for cached data, but at high cost
Workload                  Input Dataset (GB)   Cached RDDs (GB)
Linear Regression (LR)    64                   182
Log. Regression (LgR)                          160
SVM                                            188
Cached RDDs vs. input dataset: ~3x
Dilemma: On-heap vs Off-heap NVMe Caching
                 Pros               Cons
On-heap cache    No serialization   High GC
Off-heap cache   Low GC             High serialization
[Figure: Executor Memory (Execution Memory, Storage Memory) for the on-heap and off-heap cases; labels: GC, Serialization / Deserialization]
Can we avoid serialization and reduce GC?
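The trade-off in the table above can be illustrated with a toy sketch. This is not Spark code: `OnHeapCache` and `OffHeapCache` are hypothetical classes, and Python's `pickle` merely stands in for whatever serializer an off-heap cache would use. The on-heap variant hands back the very same object (no copy, but the object graph stays visible to the collector), while the off-heap variant pays a serialize/deserialize round trip on every access.

```python
import pickle


class OnHeapCache:
    """Keeps live object references: no serialization, but every cached
    object remains part of the heap the collector has to scan."""

    def __init__(self):
        self._blocks = {}

    def put(self, key, value):
        self._blocks[key] = value

    def get(self, key):
        return self._blocks[key]


class OffHeapCache:
    """Keeps opaque byte strings: little for a collector to trace, but
    every put/get pays for serialization or deserialization."""

    def __init__(self):
        self._blocks = {}

    def put(self, key, value):
        self._blocks[key] = pickle.dumps(value)

    def get(self, key):
        return pickle.loads(self._blocks[key])


# A made-up "partition" of 1000 two-element rows.
partition = [[float(i), float(i) * 2.0] for i in range(1000)]

on_heap, off_heap = OnHeapCache(), OffHeapCache()
on_heap.put("rdd_0_0", partition)
off_heap.put("rdd_0_0", partition)

same_object = on_heap.get("rdd_0_0") is partition        # reference, no copy
copied_object = off_heap.get("rdd_0_0") is not partition  # rebuilt from bytes
```

The key name `rdd_0_0` and the partition contents are illustrative only.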
Cached Objects Behave Differently
[Figure: Spark app timeline (Dataset, Create RDDs, Persist, Operations, Unpersist) shown against the Java heap, with GC cycles and cached RDDs]
▪ GC between persist and unpersist is extremely wasteful
  ▪ GC scans all objects in the heap
▪ GC reclaims cached RDDs after unpersist
Our Approach: Treat Cached Objects Differently
▪ Objects in Java follow the generational hypothesis (most objects die young)
▪ Opportunity: we observe a nomadic hypothesis for cached objects
▪ Spark cached objects are
  ▪ Long-lived: used across multiple Spark jobs (cache)
  ▪ Intermittently accessed: long intervals without access (NVMe)
  ▪ Grouped lifetimes: objects of an RDD leave the cache at the same time (unpersist)
▪ Place cached objects in a special heap
TeraCache: Introduce a Second JVM Heap on NVMe
▪ The execution heap remains a garbage-collected heap
  ▪ Maintains the JVM heap for execution purposes
▪ The second heap, TeraCache, has two significant advantages
  ▪ No GC: uses persist/unpersist semantics to avoid GC
  ▪ No serialization/deserialization: uses memory-mapped I/O
TeraCache Design Overview
[Figure: a Spark executor JVM with two heaps. The JVM heap (Execution Memory) uses DRAM region DR1; TeraCache (Storage Memory) uses DRAM region DR2 and is mapped to the NVMe SSD via mmap()]
Spark Knocks on the JVM Door
[Figure: Spark application, Spark runtime, JVM. On rdd.persist(), the runtime stores the RDD to Storage Memory and notifies the JVM to mark the RDD object]
▪ Spark notifies the JVM about RDD caching
  ▪ At persist/unpersist operations
▪ Add a new TeraFlag word to JVM objects
▪ JVM creates the new object and sets its TeraFlag
▪ Flagged objects move to TeraCache during the next full GC
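The marking protocol on this slide can be sketched as a toy model. Everything here is an assumption made for illustration: `ToyJvm`, `notify_persist`, and `full_gc` are hypothetical names, the flag is a plain boolean rather than a header word, and a real collector would also trace liveness instead of just partitioning a list.

```python
from dataclasses import dataclass


@dataclass
class Obj:
    name: str
    tera_flag: bool = False  # stands in for the extra TeraFlag header word


class ToyJvm:
    def __init__(self):
        self.heap = []        # ordinary garbage-collected heap
        self.tera_cache = []  # second heap, never scanned by this toy GC

    def allocate(self, name):
        obj = Obj(name)
        self.heap.append(obj)
        return obj

    def notify_persist(self, obj):
        """Hypothetical hook the Spark runtime would call on rdd.persist()."""
        obj.tera_flag = True

    def full_gc(self):
        """Move flagged objects to TeraCache; everything else stays put."""
        self.tera_cache.extend(o for o in self.heap if o.tera_flag)
        self.heap = [o for o in self.heap if not o.tera_flag]


jvm = ToyJvm()
rdd = jvm.allocate("cached_rdd")
tmp = jvm.allocate("shuffle_buffer")

jvm.notify_persist(rdd)
in_heap_before_gc = rdd in jvm.heap  # flagged, but not moved yet
jvm.full_gc()                        # the move happens here
```

Note that the object stays in the regular heap between the notification and the next full GC, which matches the deferred move described above.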
TeraCache Design: Avoid GC
How to Avoid GC in TeraCache?
▪ Disallow backward pointers (from TeraCache to the JVM heap)
  ▪ Move the transitive closure of a cached object into TeraCache
▪ Allow forward pointers (from the JVM heap to TeraCache)
  ▪ Objects in TeraCache do not move
  ▪ Fence the GC from following forward pointers
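A minimal sketch of the two rules above, using a dictionary as a stand-in object graph. The object names and the helper functions `transitive_closure` and `mark` are hypothetical. Moving the full closure guarantees that nothing in TeraCache points back into the heap, and the marking loop stops at the TeraCache boundary instead of tracing into it.

```python
def transitive_closure(root, refs):
    """All objects reachable from root through the reference graph."""
    seen, stack = set(), [root]
    while stack:
        obj = stack.pop()
        if obj not in seen:
            seen.add(obj)
            stack.extend(refs.get(obj, ()))
    return seen


def mark(roots, refs, tera_cache):
    """Mark live heap objects, stopping at the TeraCache boundary (the fence)."""
    live, stack = set(), list(roots)
    while stack:
        obj = stack.pop()
        if obj in tera_cache or obj in live:
            continue  # forward pointer or already visited: do not follow
        live.add(obj)
        stack.extend(refs.get(obj, ()))
    return live


# Toy reference graph: a task holds a cached RDD and some scratch data.
refs = {
    "task": ["rdd", "scratch"],
    "rdd": ["partition"],
    "partition": ["row0", "row1"],
}

tera_cache = transitive_closure("rdd", refs)   # everything the RDD reaches
live_heap = mark(["task"], refs, tera_cache)   # GC never enters TeraCache

# With the whole closure moved, no TeraCache object points back to the heap.
no_backward_pointers = all(
    target in tera_cache for obj in tera_cache for target in refs.get(obj, ())
)
```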
Organize TeraCache in Regions
▪ Objects that belong to the same RDD have similar lifetimes
▪ Organize TeraCache in regions
  ▪ Place objects in regions based on lifetime
  ▪ Dynamic region sizes
▪ Bulk free
  ▪ Reclaim an entire region at once
Bulk Free Regions
▪ To provide correct bulk free
  ▪ Allow pointers only within a region
  ▪ Merge regions with crossing pointers when objects move to TeraCache
▪ Keep a bitmap of live regions
  ▪ Track the regions reachable from the JVM heap in every GC
▪ During the GC marking phase, identify active regions
  ▪ Set a region's bit if there is a pointer from the JVM heap to that TeraCache region
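The region bookkeeping above can be sketched as follows. The slides say only that regions with crossing pointers are merged and that a bitmap records live regions; the union-find structure used here for merging is an assumption chosen for brevity, and the class and method names are hypothetical.

```python
class Regions:
    def __init__(self, count):
        self.parent = list(range(count))  # union-find over region ids
        self.live = [False] * count       # one "bit" per region

    def find(self, r):
        """Representative region of the merged group that r belongs to."""
        while self.parent[r] != r:
            self.parent[r] = self.parent[self.parent[r]]  # path halving
            r = self.parent[r]
        return r

    def merge(self, a, b):
        """Regions joined by a crossing pointer must live and die together."""
        self.parent[self.find(b)] = self.find(a)

    def begin_gc(self):
        """Clear the bitmap at the start of each GC cycle."""
        self.live = [False] * len(self.live)

    def mark_from_heap(self, region):
        """Called when marking finds a JVM-heap pointer into `region`."""
        self.live[self.find(region)] = True

    def bulk_free(self):
        """Ids of regions whose merged group was not marked in this GC."""
        return [r for r in range(len(self.parent))
                if not self.live[self.find(r)]]


regions = Regions(4)
regions.merge(0, 1)        # an object in region 0 points into region 1
regions.begin_gc()
regions.mark_from_heap(1)  # the heap still references something in region 1
regions.mark_from_heap(3)
freed = regions.bulk_free()  # only region 2 is unreachable
```

Region 0 survives even though the heap points only into region 1, because the merge makes the two regions share one liveness bit.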
TeraCache Design: Avoid Serialization
No Serialization → Memory-Mapped I/O
▪ MMIO keeps the same data format in memory and on the device
▪ No explicit device I/O: data is accessed only with loads/stores
▪ The Linux kernel already supports the mechanisms required for MMIO
▪ We use FastMap [ATC'20], which optimizes the scalability of Linux MMIO
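The load/store access pattern can be shown with Python's standard `mmap` module. This is only an analogy under stated assumptions: a temporary file stands in for the NVMe device, and because Python cannot place its own objects in a mapping, `struct` with a fixed float64 layout stands in for the raw in-memory format. The point is that writes go through the mapping at byte offsets, with no explicit `read()` or `write()` calls, and the bytes on the device have the same layout as the bytes in memory.

```python
import mmap
import os
import struct
import tempfile

RECORD = struct.Struct("<d")  # one little-endian float64 per slot
SLOTS = 1024

fd, path = tempfile.mkstemp()
try:
    os.ftruncate(fd, RECORD.size * SLOTS)  # size the backing file
    with mmap.mmap(fd, RECORD.size * SLOTS) as region:
        # "Stores": write values directly into the mapped bytes.
        for i in range(SLOTS):
            RECORD.pack_into(region, i * RECORD.size, i * 0.5)
        region.flush()  # push dirty pages to the backing file
        # "Load": read slot 10 straight out of the mapping.
        value_via_mapping = RECORD.unpack_from(region, 10 * RECORD.size)[0]

    # The file holds the same layout, so a plain read sees the same value.
    with open(path, "rb") as f:
        f.seek(10 * RECORD.size)
        value_via_file = RECORD.unpack(f.read(RECORD.size))[0]
finally:
    os.close(fd)
    os.remove(path)
```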
Competition for DRAM Resources
▪ Execution Memory must reside in DRAM
  ▪ A lot of short-lived data
  ▪ We need a large DR1
▪ Cached objects are accessed as well
  ▪ E.g., iterative jobs reuse cached data
  ▪ We need a large DR2
▪ Can we statically divide DRAM between the heaps?
[Figure: executor JVM with the JVM heap (Execution Memory, DR1) and TeraCache (Storage Memory, DR2) sharing DRAM; TeraCache is mapped to the NVMe SSD via mmap()]
Dividing DRAM between Heaps
▪ KMeans (KM) jobs produce more short-lived data
  ▪ More minor GCs
  ▪ More space for DR1
▪ Linear Regression (LR) jobs reuse more cached data
  ▪ More page faults/s
  ▪ More space for DR2
▪ Dynamic resizing of DR1 and DR2
  ▪ Based on the page-fault rate in MMIO
  ▪ Based on minor GCs
[Figure: x-axis DR1 size (GB), with DRAM = 32 GB; annotations 3x and 2x]
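The slides name the two signals for dynamic resizing (minor GCs and MMIO page-fault rate) but do not give a policy, so the following is a purely hypothetical heuristic, not the authors' design. The thresholds, step size, and sample metric values are invented for illustration; only the 32 GB DRAM total comes from the slide.

```python
def resize(dr1_gb, total_gb, minor_gcs_per_s, page_faults_per_s,
           gc_threshold=1.0, fault_threshold=1000.0, step_gb=1, min_gb=1):
    """Return the new (DR1, DR2) split in GB for one adjustment interval.

    Hypothetical policy: grow whichever side shows the higher pressure
    relative to its (made-up) threshold, one step at a time.
    """
    gc_pressure = minor_gcs_per_s / gc_threshold
    fault_pressure = page_faults_per_s / fault_threshold
    if gc_pressure > max(1.0, fault_pressure):
        dr1_gb += step_gb  # execution heap is starved: grow DR1
    elif fault_pressure > max(1.0, gc_pressure):
        dr1_gb -= step_gb  # cached data is thrashing: grow DR2
    # Keep both areas at least min_gb large.
    dr1_gb = max(min_gb, min(total_gb - min_gb, dr1_gb))
    return dr1_gb, total_gb - dr1_gb


# Invented metric values for three situations, starting from a 16/16 split.
kmeans_like = resize(16, 32, minor_gcs_per_s=4.0, page_faults_per_s=200.0)
lr_like = resize(16, 32, minor_gcs_per_s=0.5, page_faults_per_s=5000.0)
steady = resize(16, 32, minor_gcs_per_s=0.5, page_faults_per_s=200.0)
```

A workload dominated by minor GCs shifts DRAM toward DR1, one dominated by page faults shifts it toward DR2, and the split is left alone when neither signal exceeds its threshold.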
Preliminary Evaluation
Early Prototype Implementation
▪ We implement a prototype of TeraCache based on ParallelGC
  ▪ Place the New Generation in DRAM
  ▪ Place the Old Generation on a fast storage device
  ▪ Explicitly disable GC on the Old Generation
▪ Remaining to be implemented
  ▪ Reclamation of cached RDDs
  ▪ Dynamic DR1/DR2 resizing
▪ Evaluation
  ▪ GC overhead
  ▪ Serialization overhead
TeraCache Improves Performance by 25%
▪ Compared to serialization: TC is up to 37% better (25% on average)
▪ Compared to GC + Linux swap: TC is up to 2x better
Legend: SW = Linux kernel swap, HY = MEMORY_AND_DISK, TC = TeraCache
TeraCache Reduces GC Time by up to 50%
Legend: HY = MEMORY_AND_DISK, TC = TeraCache
Conclusions
TeraCache: Efficient Caching over Fast Storage
▪ Spark incurs high overhead for caching RDDs
▪ We observe that Spark cached data follows a nomadic hypothesis
▪ We introduce TeraCache, which reduces GC and eliminates serialization by using two heaps (generational, nomadic)
▪ We improve the performance of Spark ML workloads by 25% on average
▪ We are currently working on the full prototype
Iacovos G. Kolokasis
kolokasis@ics.forth.gr
www.csd.uoc.gr/~kolokasis
Thank you for your attention
This work is supported by the EU Horizon 2020 Evolve project (#825061)
Anastasios Papagiannis is supported by a Facebook Graduate Fellowship
