© 2018 Cray Inc. 1
Solving the I/O Slowdown:
The "Noisy Neighbor" Problem
Rice Oil and Gas Conference 2019
John Fragalla
Principal Engineer
Cray, Inc.
Agenda
• Today’s I/O challenge with shared I/O
• New Lustre features enabling flash to improve shared application performance
• Performance results isolating “Noisy Neighbor” applications
• Summary
Today’s I/O Challenge
• When multiple users share a high-speed parallel filesystem, “bad applications” will affect “good application” performance
• Bad applications: Lots of Small Files, Random Small I/O, Unaligned I/O
• Good applications: Stream large I/O, Sequential performance, Aligned I/O
• Recent features available in Lustre help automate I/O isolation and placement
with transparent use of Flash and HDD Devices in a Single Namespace
• Progressive File Layout with Lustre Storage Pools
• Data on Metadata
• Distributed Namespace (DNE) 2 – Clustered Metadata
Hybrid File System Architecture
[Diagram: a compute cluster of compute nodes (CN) attached to a parallel file system that presents a single namespace across SSD and HDD storage tiers]
Flash Tier:
• Optimize for throughput and IOPS - $/GB/sec
• Improved performance for intermediate results
• Small/random I/O performance improved
High Performance HDD Tier:
• Optimize throughput/capacity - $/GB/sec
• Optional flash/cache to accelerate small-block I/O within the HDD tier
Capacity HDD Tier:
• Optimize cost - $/GB
• Lower performance, longer term data retention
Scalable MDS Flash Tier:
• Support for a large number of inodes per file system
• Improved Metadata Operations
• Improved Small I/O Latency
Lustre | Flexibility and Usability
• Progressive File Layouts (PFL)
• Optimized striping based on file size
• Layout changes at specific thresholds
• Can locate components on specific pools
• A fixed amount of each file lands on flash, the rest on disk
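As a rough illustration of the threshold idea on this slide, the Python sketch below models how a progressive layout splits a file between tiers. The pool names "flash" and "disk", the 64 MiB threshold, and the names Component and bytes_per_pool are assumptions made for this example; none of them come from the slides or from Lustre itself.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

MIB = 1024 * 1024


@dataclass(frozen=True)
class Component:
    """One layout component covering file offsets up to `end`."""
    end: Optional[int]  # exclusive end offset in bytes; None means "to end of file"
    pool: str           # hypothetical pool name this component is placed on


def bytes_per_pool(file_size: int, layout: list[Component]) -> dict[str, int]:
    """Return how many bytes of a file of `file_size` fall into each pool."""
    placed: dict[str, int] = {}
    start = 0
    for comp in layout:
        end = file_size if comp.end is None else min(comp.end, file_size)
        if end > start:
            placed[comp.pool] = placed.get(comp.pool, 0) + (end - start)
        start = max(start, end)
        if start >= file_size:
            break  # the file ends inside this component; later ones stay unused
    return placed


# Illustrative layout (assumed values): first 64 MiB on flash, the rest on disk.
LAYOUT = [Component(end=64 * MIB, pool="flash"), Component(end=None, pool="disk")]

small = bytes_per_pool(4 * MIB, LAYOUT)     # small file: stays entirely on flash
large = bytes_per_pool(1024 * MIB, LAYOUT)  # large file: 64 MiB flash, rest on disk
print(small)
print(large)
```

In this model a small file never touches the disk pool, while a large streaming file places only its first fixed-size portion on flash.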
Lustre Storage Pools Are More Relevant Now
• Lustre pools were historically used for debugging, to isolate performance issues to, for example, a subset of OSTs or OSS nodes.
• Now, with PFL, Lustre storage pools can be a powerful tool for automatic data placement on different storage media
• Flash pool
• High-performing disk pool
• Slow-performing disk tier (e.g. focus on capacity)
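A minimal sketch of how such pools might be defined: the Python helper below only builds the lctl pool_new and lctl pool_add command lines and does not run them. The file system name "testfs", the pool names, and the OST index assignment (two flash OSTs, four disk OSTs) are assumptions chosen for illustration.

```python
def pool_commands(fsname, pools):
    """Build `lctl` command lines that create pools and add OSTs to them.

    `pools` maps a pool name to a list of OST indices. Nothing is executed;
    the commands are returned as argument lists for review.
    """
    cmds = []
    for pool, ost_indices in pools.items():
        cmds.append(["lctl", "pool_new", f"{fsname}.{pool}"])
        for idx in ost_indices:
            # OST names carry a four-digit hexadecimal index, e.g. testfs-OST000a
            cmds.append(
                ["lctl", "pool_add", f"{fsname}.{pool}", f"{fsname}-OST{idx:04x}"]
            )
    return cmds


# Assumed layout: OSTs 0-1 are flash, OSTs 2-5 are disk.
cmds = pool_commands("testfs", {"flash": [0, 1], "disk": [2, 3, 4, 5]})
for cmd in cmds:
    print(" ".join(cmd))
```

Pool commands are typically issued on the management server, so a helper like this is best treated as a way to review the planned commands before running them by hand.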
Lustre | Improved Small File Performance
• Data on Metadata
• Ideal for small file workloads
• File data stored directly on metadata storage
• Lower communication overhead for data access
• Scales with Distributed Namespace (DNE)
• Avoids contention by not placing small files on OSTs
Lustre | Data on Metadata and PFL
• Leverage DoM and PFL for a more flexible solution
• Small files land on MDT Component
• Medium files land on flash with larger files growing to disk (for example)
• Compatible with Progressive File Layouts
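One way to express a combined layout like this is a single lfs setstripe call with three components. The Python sketch below only assembles that command line. The 1M and 64M thresholds, the pool names "flash" and "disk", and the directory path are assumptions for illustration, not values from the slides.

```python
import shlex


def dom_pfl_setstripe(directory, dom_size="1M", flash_end="64M",
                      flash_pool="flash", disk_pool="disk"):
    """Build an `lfs setstripe` command for a DoM + PFL default layout.

    Component 1: up to `dom_size`, stored on the MDT (-L mdt).
    Component 2: up to `flash_end`, placed in the flash pool (-p).
    Component 3: the remainder of the file (-E -1), placed in the disk pool.
    """
    return [
        "lfs", "setstripe",
        "-E", dom_size, "-L", "mdt",
        "-E", flash_end, "-p", flash_pool,
        "-E", "-1", "-p", disk_pool,
        directory,
    ]


cmd = dom_pfl_setstripe("/mnt/lustre/project")  # hypothetical directory
print(shlex.join(cmd))
```

Applied to a directory, a layout like this becomes the default for new files created there, so applications need no changes to benefit from the placement.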
DNE Phase 2
• Allows a user to spread a single large directory across multiple MDTs using the DNE striped directory feature
• Note: because striping adds some overhead, this should only be done for very large directories, with file counts in the 50K+ range.
[Figure: DNE phase 1 compared with DNE phase 2]
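The 50K guidance can be turned into a simple planning rule. The Python sketch below picks between a plain mkdir and lfs setdirstripe when a new directory is created. The helper name, the exact cutoff of 50,000, and the example paths are assumptions for illustration, and the command is built but not executed.

```python
STRIPE_THRESHOLD = 50_000  # assumed cutoff, taken from the "50K+" guidance


def mkdir_command(path, expected_files, mdt_count):
    """Choose how to create a directory given its expected file count.

    Directories expected to hold at least STRIPE_THRESHOLD files are striped
    across `mdt_count` MDTs; smaller ones are created as ordinary directories.
    """
    if expected_files >= STRIPE_THRESHOLD and mdt_count > 1:
        return ["lfs", "setdirstripe", "-c", str(mdt_count), path]
    return ["mkdir", path]


# Hypothetical paths, with four MDTs available.
big = mkdir_command("/mnt/lustre/results", expected_files=2_000_000, mdt_count=4)
small_dir = mkdir_command("/mnt/lustre/scripts", expected_files=300, mdt_count=4)
print(" ".join(big))
print(" ".join(small_dir))
```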
System Setup for Benchmarks
Hardware:
• 4 Flash MDTs with RAID-10
• 2 Flash IOPS Optimized OSTs with RAID-10
• 4 GridRAID OSTs (Parity De-Clustered RAID-6 equivalent data protection)
• Up to 64 Client nodes (FDR Connectivity)
• EDR InfiniBand Non-Blocking Fabric
Software:
• Lustre 2.11.0 clients and server
• CentOS Linux release 7.5 (server and client)
• Spectre/Meltdown-enabled kernels on Clients, S/M disabled on Server
• Client: 3.10.0-862.el7.x86_64
• Server: 3.10.0-693.21.1.x3.1.9.x86_64
LUSTRE PFL STREAMING PERFORMANCE
Flash MDTs -> HDD OST Tier (DoM)
[Chart: mean write and read performance (MB/sec, axis from 0 to 30,000) versus the Progressive File Layout small component size: No DoM, DoM=64K, DoM=256K, DoM=1024K, DoM=4096K]
Progressive File Layout maintains peak performance for streaming workloads
We want no change in performance across various sizes
LUSTRE PFL NOISY NEIGHBOR ISOLATION
[Diagram, two panels. “Two Competing Workloads on Same Resources”: a small file workload (File 1 to File 4) and a streaming workload (a single file) share the same HDD OST. “Two Workloads Separated using PFL”: the same two workloads, with a flash OST or MDT added alongside the HDD OST.]
LUSTRE PFL NOISY NEIGHBOR ISOLATION
Flash Tier (OST or DoM with MDTs) -> HDD OST Tier
PFL ISOLATION OF IOPS FROM STREAMING IMPROVES PERFORMANCE
[Chart, 1 MB FILES: write mean and read mean (MB/s, axis from 12,000 to 24,000) for None, None (1MB), and 1024K (1MB); bars annotated Baseline, Interfered, and Isolated]
[Chart, 4 MB FILES: write mean and read mean (MB/s, axis from 12,000 to 24,000) for None, None (4MB), 1024K (4MB), and 4096K (4MB); bars annotated Baseline, Interfered, and Isolated]
X-axis legend: PFL size on flash (noisy neighbor file size)
Summary
• New Lustre features such as PFL, DoM, and DNE2 help improve mixed I/O performance on a high-speed shared parallel filesystem
• Transparent data placement on flash MDTs and/or OSTs and HDDs for various I/O sizes to optimize throughput and IOPS
• Isolate small files or small I/Os from streaming I/O to solve the “Noisy Neighbor” slowdown in sequential performance
THANK YOU
QUESTIONS?
cray.com
