Alluxio Community Day
Accelerating analytics workloads with Alluxio data orchestration and Intel® Optane™ persistent memory
Ginger Gilsdorf
Software Engineer
June 24, 2021
Agenda
▪ The hardware
• Xeon, PMem, Optane SSD
▪ The software
• Alluxio, TPC-DS, Spark
▪ The results
▪ The analysis
"Hardware without software just generates heat… Software without hardware is just 1s and 0s…" -Doug Fisher
Intel® Xeon® Scalable Processors
Ice Lake is here!
2nd Gen (Cascade Lake, 2019) vs. 3rd Gen (Ice Lake, April 2021):
• Max cores per socket: 28 vs. 40
• Memory: up to 6 channels @ 2933 MT/s vs. up to 8 channels @ 3200 MT/s
• Intel Optane PMem support: 100 series (Apache Pass) vs. 200 series (Barlow Pass)
• PCIe: Gen 3 vs. Gen 4
• Relative performance¹: 1.00 vs. 1.46
¹ Please visit www.intel.com/3gen-xeon-config and use the corresponding performance number 125 to access full system configuration and performance detail.
Cloud availability coming…
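The memory-channel figures above translate into a theoretical peak DRAM bandwidth per socket. A minimal sketch, assuming the standard 64-bit (8-byte) data bus per DDR4 channel and ignoring real-world efficiency losses:

```python
# Theoretical peak DRAM bandwidth per socket, from the slide's channel counts
# and transfer rates. Assumption: a 64-bit (8-byte) data bus per DDR4 channel.
# Sustained bandwidth in practice is lower than this peak.
BYTES_PER_TRANSFER = 8


def peak_bandwidth_gbs(channels: int, mega_transfers_per_s: int) -> float:
    """Return peak bandwidth in GB/s (decimal gigabytes)."""
    return channels * mega_transfers_per_s * BYTES_PER_TRANSFER / 1000


cascade_lake = peak_bandwidth_gbs(channels=6, mega_transfers_per_s=2933)
ice_lake = peak_bandwidth_gbs(channels=8, mega_transfers_per_s=3200)

print(f"Cascade Lake: {cascade_lake:.1f} GB/s per socket")  # 140.8
print(f"Ice Lake:     {ice_lake:.1f} GB/s per socket")      # 204.8
print(f"Ratio: {ice_lake / cascade_lake:.2f}x")             # 1.45x
```

This bandwidth ratio is a separate quantity from the 1.46 relative-performance figure above, which refers to Intel's published system benchmark.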
Intel Optane family
▪ PMem 200 series
• 32% higher bandwidth
• Up to 16% lower power
• Up to 4 TB per socket¹
• Operating modes: Memory, App Direct, Storage over App Direct, Mixed
[Figure: Intel® Optane™ Technology: Intel Optane Memory Media; Intel Memory and Storage Controllers; Intel Interconnect IP; Intel Software]
*Images may not reflect all current offerings. See https://www.intel.com/content/www/us/en/architecture-and-technology/intel-optane-technology.html for details.
Intel Optane persistent memory (PMem)
• DIMM form factor
• 128, 256, 512 GB modules
• Persistent: a form of non-volatile memory (NVM)
• 200 series launched with 3rd Gen Xeon® Scalable Processors
¹ 4 TB PMem per socket using 8x 512 GB DIMMs (leaves 8 slots for DRAM).
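The footnote's 4 TB figure is simply module size times module count. A small sketch that tabulates capacity for each listed module size, assuming 8 PMem DIMMs per socket (as in the footnote) and a hypothetical two-socket node:

```python
# Per-socket and per-node PMem capacity for the module sizes listed above.
# Assumptions: 8 PMem DIMMs per socket (one per memory channel, as in the
# footnote) and a two-socket node; both are illustrative.
MODULE_SIZES_GB = (128, 256, 512)
PMEM_DIMMS_PER_SOCKET = 8
SOCKETS_PER_NODE = 2


def pmem_capacity_gb(module_gb: int, dimms: int = PMEM_DIMMS_PER_SOCKET) -> int:
    """Total PMem capacity for one socket populated with `dimms` modules."""
    return module_gb * dimms


for size in MODULE_SIZES_GB:
    per_socket = pmem_capacity_gb(size)
    per_node = per_socket * SOCKETS_PER_NODE
    print(f"{size} GB modules: {per_socket} GB/socket, {per_node} GB/node")
```

With 512 GB modules this gives 4096 GB, the 4 TB per socket quoted above.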
Intel Optane family
▪ Intel Optane DC SSDs: P4800X and P4801X Series, P5800X Series
▪ Intel Optane Memory: M10, H10, H20 Series
Alluxio & Intel
Multi-year collaboration, including 2nd & 3rd Gen Xeon, PMem, and Optane DC SSDs
*Source: https://www.alluxio.io/intel/
TPC-DS
▪ Enterprise-class decision support benchmark
▪ Measures performance of SQL-based big data systems
▪ Models typical queries and data maintenance tasks
▪ Performance metrics: query response time and throughput
• Scale factor options: 1, 3, 10, 30, 100 TB
Examples:
• For a given year, month, and store manager, calculate the total store sales of any combination of all brands.
• Select the top revenue-generating products bought by out-of-zip-code customers for a given year, month, and manager.
Source: http://www.tpc.org/tpcds/
TPC-DS with Spark SQL
▪ Apache Spark provides the distributed data processing engine
▪ Spark SQL 2.0 supports all 99 TPC-DS queries
▪ Convenient repo to set up the TPC-DS toolkit, generate data, and run the benchmark:
• https://github.com/IBM/spark-tpc-ds-performance-test
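A benchmark run boils down to timing each of the 99 queries and recording which ones completed. The sketch below is a hypothetical harness, not the IBM repo's scripts: each query is a stand-in callable returning a row count (a real setup would submit the SQL text to Spark), and a run counts as completed only if it returned rows within the time limit, mirroring the results slide's treatment of 0-row runs as timeouts.

```python
# Hypothetical timing harness for a TPC-DS run. Each entry in `queries` is a
# stand-in callable returning a row count; in a real setup it would submit the
# SQL text to Spark. The time limit is checked after the call returns, so this
# sketch does not interrupt a long-running query.
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class QueryResult:
    name: str
    seconds: Optional[float]  # None when the query did not complete


def run_benchmark(queries: Dict[str, Callable[[], int]], timeout_s: float) -> List[QueryResult]:
    results = []
    for name, run_query in queries.items():
        start = time.perf_counter()
        rows = run_query()  # number of rows the query produced
        elapsed = time.perf_counter() - start
        # A 0-row result or an over-limit run is treated as "timed out".
        completed = rows > 0 and elapsed <= timeout_s
        results.append(QueryResult(name, elapsed if completed else None))
    return results


def completed_count(results: List[QueryResult]) -> int:
    return sum(r.seconds is not None for r in results)


# Usage with trivial stand-in queries (hypothetical names):
demo = {"q1": lambda: 10, "q2": lambda: 0}
demo_results = run_benchmark(demo, timeout_s=60.0)
print(completed_count(demo_results))  # 1
```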
Test setup
▪ All on-premises testing
▪ Disaggregated compute & storage
• Compute running on Ice Lake
• Storage on Cascade Lake
Compute (2 nodes, each):
• 2x Xeon Platinum 8368 CPU
• 38 cores each @ 2.40 GHz
• Spark 2.4.7
• Alluxio 2.4.1
Storage:
• 2x Xeon Platinum 8280 CPU
• 28 cores each @ 2.70 GHz
• MinIO
▪ Alluxio caching layers tested (capacity per compute node, GB; similar cost):
• DRAM: 768
• PMem (SoAD*): 1024 PMem + 192 DRAM
• Optane SSD: 800
▪ Scale factor 3 TB; compressed data ~2.1 TB
*Storage over App Direct
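Each caching layer corresponds to a different Alluxio worker storage configuration. A minimal sketch that renders a single-tier `alluxio-site.properties` fragment per medium; the property names follow Alluxio 2.x tiered-storage settings, while the mount paths, alias choices, and quota sizes are hypothetical examples, not the values used in these tests:

```python
# Render a single-tier Alluxio worker cache configuration for each medium.
# Property names follow Alluxio 2.x tiered-storage settings; the mount paths,
# aliases, and quotas below are hypothetical, not the tested configuration.
from typing import Dict


def tier_properties(alias: str, path: str, quota: str) -> Dict[str, str]:
    return {
        "alluxio.worker.tieredstore.levels": "1",
        "alluxio.worker.tieredstore.level0.alias": alias,
        "alluxio.worker.tieredstore.level0.dirs.path": path,
        "alluxio.worker.tieredstore.level0.dirs.quota": quota,
    }


# Hypothetical per-node settings; a DRAM cache must leave room for Spark.
CONFIGS = {
    "dram": tier_properties("MEM", "/mnt/ramdisk", "300GB"),
    "pmem_soad": tier_properties("SSD", "/mnt/pmem0", "1000GB"),  # block device on PMem
    "optane_ssd": tier_properties("SSD", "/mnt/optane", "750GB"),
}


def to_properties(props: Dict[str, str]) -> str:
    """Format as key=value lines for alluxio-site.properties."""
    return "\n".join(f"{key}={value}" for key, value in props.items())


print(to_properties(CONFIGS["pmem_soad"]))
```

Because PMem in Storage over App Direct mode appears as a block device, it can be mounted and configured like any other directory-backed tier.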
Test results: Queries completed
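▪ 99 TPC-DS queries total
▪ Some timed out (returned 0 rows)
Columns: DRAM | PMem (SoAD) | Optane SSD
• Total cache capacity, both compute nodes (GB): 1536 | 2048 | 1600
• Queries completed: 58 | 80 | 80
• 445 | 51 | 53
• N/A | 88% | 88%
• 926.81 | 316.44 | 322.20
• Query 15: N/A | 211.20 | 822.41
• 95.23 | 39.03 | 35.04
Samples →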
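Comparing layers on a single query is a ratio of elapsed times, with "N/A" (a query that did not complete) propagated rather than treated as zero. A small sketch using the Query 15 row, assuming lower elapsed time is better:

```python
# Relative speedup between caching layers for one query, using the Query 15
# row above. None marks a query that did not complete (shown as N/A).
# Assumption: the values are elapsed times, so lower is better.
from typing import Optional


def speedup(baseline: Optional[float], candidate: Optional[float]) -> Optional[float]:
    """How many times faster `candidate` is than `baseline`; None if either did not complete."""
    if baseline is None or candidate is None:
        return None
    return baseline / candidate


query15 = {"DRAM": None, "PMem (SoAD)": 211.20, "Optane SSD": 822.41}

print(round(speedup(query15["Optane SSD"], query15["PMem (SoAD)"]), 2))  # 3.89
print(speedup(query15["DRAM"], query15["PMem (SoAD)"]))                  # None
```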
Caching layer differences
▪ Alluxio DRAM layer:
• Shares DRAM with the workload
• Less capacity for caching
• More remote data access needed
▪ Alluxio PMem & Optane SSD layers:
• Cached data kept separate from DRAM
• More capacity; less remote access
▪ Clear impact on performance
Capacity per compute node (GB): DRAM 768 | PMem (SoAD*) 1024 PMem + 192 DRAM | Optane SSD 800
*Storage over App Direct
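The capacity argument can be made concrete with a toy model: how much of the ~2.1 TB compressed dataset could stay cached across both compute nodes. This is an illustrative sketch, not a measurement; it assumes 2.1 TB = 2100 GB, that the full PMem or SSD capacity is usable by Alluxio, and a hypothetical per-node DRAM reservation for Spark:

```python
# Toy model of the capacity argument above. Assumptions (not measured):
# 2.1 TB = 2100 GB, two compute nodes, full PMem/SSD capacity usable by
# Alluxio, and a hypothetical DRAM reservation for the Spark workload.
DATASET_GB = 2100
NODES = 2


def cacheable_fraction(cache_gb_per_node: float, reserved_gb_per_node: float = 0.0) -> float:
    """Fraction of the dataset that fits in the cluster-wide cache."""
    usable = max(0.0, cache_gb_per_node - reserved_gb_per_node) * NODES
    return min(1.0, usable / DATASET_GB)


HYPOTHETICAL_SPARK_RESERVATION_GB = 400  # illustrative only

layers = {
    "DRAM": cacheable_fraction(768, HYPOTHETICAL_SPARK_RESERVATION_GB),
    "PMem (SoAD)": cacheable_fraction(1024),
    "Optane SSD": cacheable_fraction(800),
}
for name, frac in layers.items():
    print(f"{name}: {frac:.0%} of the dataset can be cached")
```

Whatever share of the data does not fit must be read remotely from storage, which is the mechanism the slide points to.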
PMem (SoAD) vs. Optane SSD
▪ Both used as block storage
▪ PMem sits on the memory bus; Optane SSD on PCIe
▪ Many similarities in CPU usage & hardware metrics
Average CPU utilization / operating frequency (GHz):
Columns: PMem (SoAD) | Optane SSD
• 14% / 3.22 | 13% / 3.22
• 20% / 3.25 | 21% / 3.23
• 22% / 3.21 | 19% / 3.21
• 26% / 3.21 | 20% / 3.21
▪ Both measured DDR rate at 3200 MT/s
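The "many similarities" claim can be checked by summarizing the four samples above. A short sketch, assuming the frequency values are in GHz:

```python
# Summarize the four CPU-utilization / frequency samples above.
# Assumption: frequencies are in GHz.
from statistics import mean

pmem = [(14, 3.22), (20, 3.25), (22, 3.21), (26, 3.21)]
optane = [(13, 3.22), (21, 3.23), (19, 3.21), (20, 3.21)]


def summarize(samples):
    """Return (mean utilization %, mean frequency GHz)."""
    utils = [u for u, _ in samples]
    freqs = [f for _, f in samples]
    return mean(utils), mean(freqs)


pmem_util, pmem_freq = summarize(pmem)
optane_util, optane_freq = summarize(optane)
max_freq_gap = max(abs(a[1] - b[1]) for a, b in zip(pmem, optane))

print(f"Mean CPU util: PMem {pmem_util:.2f}%, Optane SSD {optane_util:.2f}%")
print(f"Largest per-sample frequency gap: {max_freq_gap:.2f} GHz")
```

Mean utilization differs by about two percentage points (20.5% vs. 18.25%), and operating frequency never differs by more than 0.02 GHz.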
PMem (SoAD) vs. Optane SSD
▪ A few differences…
▪ System-level metrics capture both the TPC-DS workload & Alluxio data orchestration
Query 15, columns: PMem (SoAD) | Optane SSD
• 211.20 | 822.41
• 0.82 | 1.10
• 59,317 | 50,827
• Ranges from 0 to 1,552 | 0 (N/A)
• 142 | 165
Conclusions & next steps
▪ Capacity matters!
• Impact on performance varies
▪ PMem offers higher capacity than similarly priced DRAM
▪ Optane SSD is also a good option in many cases
▪ Follow-up work:
• More Alluxio-specific hardware metric analysis
• More testing, especially in the cloud
Resources
▪ Alluxio & Intel Partnership: alluxio.io/intel/
▪ Intel Optane Technology: intel.com/optane
▪ Intel Developer Zone: software.intel.com/persistent-memory
▪ Persistent Memory Programming: pmem.io
Questions?