Proactive Data Containers (PDC): An Object-centric Data
Store for Large-scale Computing Systems
Suren Byna
Lawrence Berkeley National Lab (LBNL), Berkeley
Co-authors
Quincey Koziol (LBNL), Venkat Vishwanath (ANL), Jerome Soumagne (THG), Houjun Tang (LBNL),
Kimmy Mu (THG), Bin Dong (LBNL), Richard Warren (THG), François Tessier (ANL, now @ CSCS),
Teng Wang (LBNL), and Jialin Liu (LBNL)
Scalable data management – Three disrupting trends
▪  Extreme parallelism
▪  Massive data
▪  Hierarchical storage
Extreme parallelism
Systems pictured: Summit (ORNL), Sierra (LLNL), Sunway TaihuLight (NSC Wuxi, China), Trinity (LANL), Cori (LBNL)
Summit
- ~2.4M cores
- ~143 PFlops
- 9.7 MW
Sierra
- ~1.5M cores
- ~94 PFlops
- 7.4 MW
TaihuLight
- ~10.6M cores
- ~93 PFlops
- 15 MW
Massive scientific data
▪  Simulations
–  Multi-physics (FLASH) – 10 PB
–  Cosmology (NyX) – 10 PB
–  Plasma physics (VPIC) – 1 PB
▪  Experimental and observational data (EOD)
–  LHC (100 PB)
–  LSST (60 PB)
–  Genomics (100 TB to 1 PB)
Hierarchical and heterogeneous storage
[Figure: three storage hierarchies, each listed from the fastest tier down]
Conventional: Memory, Parallel file system (Lustre, GPFS), Archival storage (HPSS tape). The figure marks an IO gap.
Current (e.g., Cori @ NERSC): Memory, Shared burst buffer, Parallel file system (Lustre, GPFS), Archival storage (HPSS tape). The figure marks an IO gap.
Upcoming: Memory, Node-local storage, Shared burst buffer, Parallel file system, Campaign storage, Archival storage (HPSS tape).
Reading and writing data on scalable systems
▪  Types of parallel I/O
•  1 writer/reader, 1 file
•  N writers/readers, N files (File-per-process)
•  N writers/readers, 1 file
•  M writers/readers, 1 file
–  Aggregators
–  Two-phase I/O
•  M aggregators, M files (file-per-aggregator)
–  Variations of this mode
[Figure: five panels, each showing processes P0 … Pn and the files they access: 1 writer/reader, 1 file; n writers/readers, n files; n writers/readers, 1 file; M writers/readers, M files; M writers/readers, 1 file]
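The I/O modes listed on this slide differ mainly in how process ranks map to files and to aggregators. The sketch below illustrates that mapping in plain Python, without MPI. The function names and the contiguous grouping of ranks are assumptions made for illustration; real middleware may choose aggregators differently.

```python
"""Map process ranks to files for the parallel I/O modes on this slide.

All names here are hypothetical; no MPI library is involved.
"""


def file_per_process(rank):
    # N writers/readers, N files: every rank owns its own file.
    return f"file.{rank}"


def shared_file(rank):
    # N writers/readers, 1 file: every rank targets the same file.
    return "file.0"


def group_size(n_ranks, n_aggregators):
    # Ceiling division, so there are at most n_aggregators groups.
    return -(-n_ranks // n_aggregators)


def aggregator_of(rank, n_ranks, n_aggregators):
    # Assumption: ranks form contiguous groups and the first rank
    # of each group acts as the aggregator for that group.
    size = group_size(n_ranks, n_aggregators)
    return (rank // size) * size


def file_per_aggregator(rank, n_ranks, n_aggregators):
    # M aggregators, M files: one file per group of ranks.
    return f"file.{rank // group_size(n_ranks, n_aggregators)}"


def two_phase_plan(n_ranks, n_aggregators):
    # Phase 1: each rank forwards its data to its aggregator.
    # Phase 2: only the aggregators (the dict keys) touch storage.
    plan = {}
    for rank in range(n_ranks):
        agg = aggregator_of(rank, n_ranks, n_aggregators)
        plan.setdefault(agg, []).append(rank)
    return plan
```

With 8 ranks and 2 aggregators, `two_phase_plan(8, 2)` groups ranks 0 to 3 under aggregator 0 and ranks 4 to 7 under aggregator 4, so only two processes issue file system requests.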
Scalable Storage Systems: Challenges
[Figure: hardware and software layers between applications and storage]
Hardware: Memory, Node-local storage, Shared burst buffer, Disk-based storage, Campaign storage, Archival storage (HPSS tape)
Software: Applications, High-level I/O lib (netCDF, HDF5, etc.), IO middleware (POSIX, MPI-IO), IO forwarding, Parallel file systems
Usage: Data (in memory) … IO software … Files in file system. Labels: Tune middleware, Tune file systems
•  Challenges
–  POSIX-IO semantics hinder scalability and performance of file systems and IO software
–  Multi-level hierarchy complicates data movement, especially if the user has to be involved
Scalable data management requirements
Use case | Domain | Sim/EOD/analysis | Data size | I/O requirements
FLASH | High-energy density physics | Simulation | ~1 PB | Data transformations, scalable I/O interfaces, correlation among simulation and experimental data
CMB / Planck | Cosmology | Simulation, EOD/Analysis | 10 PB | Automatic data movement optimizations
DECam & LSST | Cosmology | EOD/Analysis | ~10 TB | Easy interfaces, data transformations
E3SM | Climate | Simulation | ~10 PB | Async I/O, derived variables, automatic data movement
TECA | Climate | Analysis | ~10 PB | Data organization and efficient data movement
HipMer | Genomics | EOD/Analysis | ~100 TB | Scalable I/O interfaces, efficient and automatic data movement
▪  Easy interfaces and superior performance
▪  Transparent data management
▪  Information capture and management
Next Gen Storage – Proactive Data Containers (PDC)
[Figure: the PDC stack]
Hardware: Memory, Node-local storage, Shared burst buffer, Disk-based storage, Campaign storage, Archival storage (HPSS tape)
Software: Applications, High-level API
Usage: Data (in memory) …
PDC System – High-level Architecture
▪  Object-centric data access interface
–  Simple put, get interface
–  Array-based variable access
▪  Transparent data management
–  Data placement in storage hierarchy
–  Automatic data movement
▪  Information capture and management
–  Rich metadata
–  Connection of results and raw data with relationships
[Figure: Persistent Storage API over BB FS, Lustre, DAOS, …]
Object-centric PDC Interface
▪  Object-level interface
–  Create – containers and objects
–  Add attributes
–  Put object
–  Get object
–  Delete object
▪  Array-specific interface
–  Create regions
–  Map regions in PDC objects
–  Lock
–  Release
J. Mu, J. Soumagne, et al., “A Transparent Server-managed Object Storage System for HPC”, IEEE Cluster 2018
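The object-level operations on this slide (create, add attributes, put, get, delete) can be pictured with a toy in-memory store. This is a minimal sketch, not the PDC API: `ToyContainer`, `ToyObject`, and every method name are hypothetical, and data lives in a Python dict rather than on PDC servers.

```python
"""Toy object store mirroring the object-level operations on this slide.

Every class and method name is hypothetical; this is not the PDC API.
"""
from dataclasses import dataclass, field


@dataclass
class ToyObject:
    name: str
    data: bytes = b""
    attributes: dict = field(default_factory=dict)


class ToyContainer:
    def __init__(self, name):
        self.name = name
        self._objects = {}  # flat mapping: object name -> ToyObject

    def create_object(self, name, **attributes):
        if name in self._objects:
            raise KeyError(f"object {name!r} already exists")
        obj = ToyObject(name, attributes=dict(attributes))
        self._objects[name] = obj
        return obj

    def add_attribute(self, name, key, value):
        self._objects[name].attributes[key] = value

    def put(self, name, data):
        # Store a private copy of the caller's bytes.
        self._objects[name].data = bytes(data)

    def get(self, name):
        return self._objects[name].data

    def delete(self, name):
        del self._objects[name]

    def query(self, **wanted):
        # Return names of objects whose attributes match every pair given.
        return sorted(
            obj.name
            for obj in self._objects.values()
            if all(obj.attributes.get(k) == v for k, v in wanted.items())
        )
```

A caller would create a container, create an object with a few attributes, `put` its bytes, and later find it again with `query(step=10)` instead of remembering a file path.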
[Figure: PDC data model, with panels titled Proactive Data Container, PDC Collection, and PDC Locus. Each container has a <root> holding objects A to F; the panels show Containers W, X, Y, and Z and Collections P and Q. Key: Dataset, KV-Store, Group, Container, Collection, Locus.]
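The array-specific operations of the PDC interface (create regions, map regions in PDC objects, lock, release) are named on the slides without their exact semantics. The sketch below assumes one plausible reading: an application maps a local buffer to a region of an object, takes a lock on that region, and the data is copied into the object when the lock is released. The classes and this copy-on-release behavior are assumptions for illustration, not the PDC implementation.

```python
"""Toy region mapping with lock and release.

Assumption: data moves from the mapped buffer into the object on release.
All names are hypothetical; this is not the PDC API.
"""


class ToyArrayObject:
    """A 1-D object whose regions can be mapped to application buffers."""

    def __init__(self, length):
        self.storage = [0] * length
        self.locked = []  # (start, stop) ranges that are currently locked

    def map_region(self, buffer, offset):
        # Associate the whole buffer with storage[offset:offset + len(buffer)].
        stop = offset + len(buffer)
        if offset < 0 or stop > len(self.storage):
            raise ValueError("region falls outside the object")
        return ToyMapping(self, buffer, offset, stop)


class ToyMapping:
    def __init__(self, obj, buffer, start, stop):
        self.obj = obj
        self.buffer = buffer
        self.start = start
        self.stop = stop
        self.held = False

    def lock(self):
        # Refuse the lock if any already-locked region overlaps this one.
        for start, stop in self.obj.locked:
            if self.start < stop and start < self.stop:
                raise RuntimeError("region overlaps a locked region")
        self.obj.locked.append((self.start, self.stop))
        self.held = True

    def release(self):
        if not self.held:
            raise RuntimeError("release called without a lock")
        # Copy the buffer contents into the mapped region, then unlock.
        self.obj.storage[self.start:self.stop] = self.buffer
        self.obj.locked.remove((self.start, self.stop))
        self.held = False
```

In this reading the application keeps updating its own buffer while holding the lock, and the store only sees the final contents at release time.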
Transparent data movement in storage hierarchy
▪  Usage of compute resources for I/O
–  Shared mode – Compute nodes are shared between applications and I/O services
–  Dedicated mode – I/O services on separate nodes
▪  Transparent data movement by PDC servers
–  Apps map data buffers to objects and PDC servers place and manage data
–  Apps query for data objects using attributes
▪  Superior I/O performance
H. Tang, S. Byna, et al., “Toward Scalable and Asynchronous Object-centric Data Management for HPC”, IEEE/ACM CCGrid 2018
[Figure: read time in seconds vs. number of processes (124 to 15872) for HDF5 read (Lustre), PLFS read (Lustre), PDC read (Lustre), HDF5 read (BB), and PDC read (BB)]
[Figure: write time in seconds vs. number of processes (124 to 15872) for HDF5 write (Lustre), PLFS write (Lustre), PDC write (Lustre), HDF5 write (BB), and PDC write (BB)]
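The idea that servers, not applications, decide where data lives can be illustrated with a small placement sketch. It assumes a simple policy that the slides do not specify: place each object in the fastest tier with room, and demote objects one tier down when asked. The tier names, capacities, and the `ToyPlacer` class are hypothetical.

```python
"""Toy placement of objects across a storage hierarchy.

Assumed policy: the fastest tier with free capacity wins. Not PDC's algorithm.
"""
from dataclasses import dataclass, field


@dataclass
class Tier:
    name: str
    capacity: int  # bytes
    used: int = 0
    objects: dict = field(default_factory=dict)  # object name -> size


class ToyPlacer:
    def __init__(self, tiers):
        self.tiers = list(tiers)  # ordered fastest first

    def place(self, name, size):
        # Pick the first (fastest) tier that can hold the object.
        for tier in self.tiers:
            if tier.used + size <= tier.capacity:
                tier.objects[name] = size
                tier.used += size
                return tier.name
        raise RuntimeError(f"no tier has room for {name!r}")

    def locate(self, name):
        for tier in self.tiers:
            if name in tier.objects:
                return tier.name
        return None

    def demote(self, name):
        # Move an object one tier down, for example to free a faster tier.
        for index, tier in enumerate(self.tiers):
            if name not in tier.objects:
                continue
            if index == len(self.tiers) - 1:
                raise RuntimeError(f"{name!r} is already in the slowest tier")
            lower = self.tiers[index + 1]
            size = tier.objects[name]
            if lower.used + size > lower.capacity:
                raise RuntimeError(f"{lower.name} has no room for {name!r}")
            del tier.objects[name]
            tier.used -= size
            lower.objects[name] = size
            lower.used += size
            return lower.name
        raise KeyError(name)
```

The application only ever calls `place` and `locate` by object name; which tier holds the bytes is an internal decision that can change over time.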
Metadata management
▪  Flat name space
▪  Rich metadata
–  Pre-defined tags that include provenance
–  User-defined tags for capturing relationships between data objects
▪  Distributed in-memory metadata management
–  Distributed hash table and bloom filters used for faster access
H. Tang, S. Byna, et al., “SoMeta: Scalable Object-centric Metadata Management for High Performance Computing”, to be presented at IEEE Cluster 2017
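A distributed hash table combined with bloom filters can be sketched as follows. Each object name in the flat name space is hashed to pick the server that owns its metadata, and each server keeps a bloom filter so that lookups for names it has never seen can be rejected without touching the table. This is an illustrative sketch under those assumptions; the class names, filter size, and hash choice are hypothetical and are not taken from SoMeta.

```python
"""Toy distributed metadata service: hash partitioning plus bloom filters.

Hypothetical names and parameters; not the SoMeta implementation.
"""
import hashlib


def stable_hash(value, salt):
    # Deterministic 64-bit hash derived from SHA-256.
    digest = hashlib.sha256(f"{salt}:{value}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


class BloomFilter:
    def __init__(self, n_bits=1024, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = 0  # an int used as a bit array

    def _positions(self, item):
        return [stable_hash(item, f"bloom{i}") % self.n_bits
                for i in range(self.n_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # False means definitely absent; True means possibly present.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


class ToyMetadataServer:
    def __init__(self):
        self.table = {}  # object name -> dict of tags
        self.bloom = BloomFilter()

    def insert(self, name, tags):
        self.bloom.add(name)
        self.table[name] = dict(tags)

    def lookup(self, name):
        if not self.bloom.might_contain(name):
            return None  # definite miss: skip the table entirely
        return self.table.get(name)


class ToyMetadataService:
    def __init__(self, n_servers):
        self.servers = [ToyMetadataServer() for _ in range(n_servers)]

    def _owner(self, name):
        # Flat name space: the name alone decides which server owns it.
        index = stable_hash(name, "owner") % len(self.servers)
        return self.servers[index]

    def insert(self, name, **tags):
        self._owner(name).insert(name, tags)

    def lookup(self, name):
        return self._owner(name).lookup(name)
```

Because the owner is computed from the name, a client can go straight to one server for any object, and no directory tree has to be traversed.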
Conclusions
▪  Take home message
–  Scalable storage systems impacted by:
•  Extreme level of parallelism
•  Massive amounts of scientific data
•  Transforming storage architectures
–  Proactive data containers
•  Object-centric interfaces
•  Transparent data movement in storage hierarchies
•  Scalable management of extensive metadata
Thanks
https://sdm.lbl.gov/pdc
Contact: Suren Byna (SByna@lbl.gov)
