Proprietary and Confidential. Copyright 2018, The HDF Group.
HDF for the Cloud: New HDF Server Features
John Readey
Overview
• HDF storage schema for the cloud
• HDF Server features
• What’s new
• What’s next
• Demo
What is HDF5?
Depends on your point of view:
• a C API
• a data model
• a file format
Let’s imagine keeping the API and data model, but with a different (cloud-friendly) storage format.
HDF Sharded Schema
Big idea: map individual HDF5 objects (datasets, groups, chunks) to object storage objects.
Why a sharded data format?
• Limit the maximum size of any object
• Support parallelism for read/write
• Only data that is modified needs to be updated
• Multiple clients can be reading/updating the same “file”
• No need to manage free space
Legend:
• A dataset is partitioned into chunks
• Each chunk is stored as an object (file)
• Dataset metadata (type, shape, attributes, etc.) is stored in a separate object (as JSON text)
Figure: a chunked dataset; each chunk (heavy outlines) is persisted as a separate object.
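The chunk-to-object mapping is what lets a client touch only the objects a selection needs. The Python sketch below is a minimal illustration, assuming a regular chunk grid and assuming chunk objects are named by their grid indices joined with underscores (as in the `0_0`, `0_1` names of the example listing); `chunk_keys_for_selection` is a hypothetical helper, not an HSDS function.

```python
from itertools import product


def chunk_keys_for_selection(chunk_shape, start, stop):
    """Return the object keys of every chunk touched by the hyperslab [start, stop).

    Hypothetical helper: assumes a regular chunk grid and keys formed by
    joining chunk grid indices with underscores (e.g. "0_1").
    """
    if not (len(chunk_shape) == len(start) == len(stop)):
        raise ValueError("rank mismatch between chunk_shape, start and stop")
    ranges = []
    for c, lo, hi in zip(chunk_shape, start, stop):
        if hi <= lo:
            return []  # an empty selection touches no chunks
        # First and last chunk index along this dimension (stop is exclusive).
        ranges.append(range(lo // c, (hi - 1) // c + 1))
    # Cartesian product of per-dimension chunk indices, in row-major order.
    return ["_".join(str(i) for i in idx) for idx in product(*ranges)]
```

For example, with 500×500 chunks, selecting rows 0 to 9 and columns 400 to 599 touches only `0_0` and `0_1`; only those two objects need to be fetched, or rewritten on an update.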
Sharded format example
root_obj_id/
    group.json
    obj1_id/
        group.json
    obj2_id/
        dataset.json
        0_0
        0_1
    obj3_id/
        dataset.json
        0_0_2
        0_0_3
Observations:
• Metadata is stored as JSON
• Chunk data is stored as binary blobs
• Self-explanatory
• One HDF5 file can translate to many objects
• Flat hierarchy: supports HDF5 multilinking
• Can limit the maximum size of an object
• Can be used with POSIX or object storage
Schema is documented here:
https://github.com/HDFGroup/hsds/blob/master/docs/design/obj_store_schema/obj_store_schema_v2.md
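To make the layout concrete, here is a toy writer that produces a dataset object in this style on a POSIX filesystem using only the standard library. It is an illustration, not the documented schema: it assumes a 2-D float64 dataset whose shape is an exact multiple of the chunk shape, and the JSON fields (`shape`, `type`, `chunks`) and the name `write_sharded_dataset` are hypothetical.

```python
import json
import os
from array import array


def write_sharded_dataset(root, obj_id, data, chunk_shape):
    """Write a 2-D list of floats as dataset.json plus one binary blob per chunk.

    Hypothetical layout: <root>/<obj_id>/dataset.json and <root>/<obj_id>/<i>_<j>.
    Assumes the dataset shape is an exact multiple of chunk_shape.
    """
    rows, cols = len(data), len(data[0])
    crows, ccols = chunk_shape
    if rows % crows or cols % ccols:
        raise ValueError("toy writer requires an exact chunk grid")
    obj_dir = os.path.join(root, obj_id)
    os.makedirs(obj_dir, exist_ok=True)
    # Metadata lives in a small JSON object, separate from the chunk data.
    meta = {"shape": [rows, cols], "type": "float64", "chunks": [crows, ccols]}
    with open(os.path.join(obj_dir, "dataset.json"), "w") as f:
        json.dump(meta, f)
    keys = []
    for i in range(rows // crows):
        for j in range(cols // ccols):
            # Gather this chunk's elements in row-major order.
            blob = array("d")
            for r in range(i * crows, (i + 1) * crows):
                blob.extend(data[r][j * ccols:(j + 1) * ccols])
            key = f"{i}_{j}"
            with open(os.path.join(obj_dir, key), "wb") as f:
                blob.tofile(f)  # raw binary in machine byte order
            keys.append(key)
    return keys
```

Because every chunk is its own file, rewriting one region of the dataset only rewrites the matching blob, and no free-space bookkeeping is needed.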
Implementations of the sharded schema
A storage format specification is nice, but it would be more useful to have software that can actually read and write the format…
As it happens, we’ve created a software service that uses the schema: HSDS (Highly Scalable Data Service).
Software is available at:
https://github.com/HDFGroup/hsds
Note: HSDS was originally developed as a NASA ACCESS 2015 project:
https://earthdata.nasa.gov/esds/competitive-programs/access/hsds
Server Features
• Simple, familiar API
  • Clients can interact with the service using a REST API
  • SDKs provide language-specific interfaces (e.g. h5pyd for Python)
  • Clients can read/write just the data they need (as opposed to transferring entire files)
• Support for compression
• Container based
  • Runs in Docker, Kubernetes, or DC/OS
• Scalable performance:
  • Can cache recently accessed data in RAM
  • Can parallelize requests across multiple nodes
  • More nodes → better performance
• Cluster based: any number of machines can be used to constitute the server
• Multiple clients can read/write the same data source
• No limit to the amount of data that can be stored by the service
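The "read just the data they need" point comes down to asking the REST API for one hyperslab instead of a whole file. The sketch below only builds such a request URL. It assumes the HDF REST API form `GET /datasets/<id>/value` with `select` and `domain` query parameters; the endpoint, domain path, and dataset id are placeholders, and `value_request_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def value_request_url(endpoint, domain, dset_id, start, stop):
    """Build a URL requesting only the hyperslab [start, stop) of a dataset.

    Assumes the HDF REST API form GET /datasets/<id>/value?select=[...]&domain=...;
    endpoint, domain and dset_id are placeholders supplied by the caller.
    """
    # One "start:stop" term per dimension, e.g. [0:10,20:30]
    select = "[" + ",".join(f"{a}:{b}" for a, b in zip(start, stop)) + "]"
    query = urlencode({"domain": domain, "select": select})
    return f"{endpoint.rstrip('/')}/datasets/{dset_id}/value?{query}"
```

Sending a GET on this URL (for instance with the requests package, which h5pyd itself uses, plus whatever authentication the server requires) should return only the selected elements rather than the whole file.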
Architecture
Legend:
• Client: any user of the service
• Load balancer: distributes requests to service nodes
• Service nodes: process requests from clients (with help from data nodes)
• Data nodes: each is responsible for a partition of the object store
• Object store: base storage service (e.g. AWS S3)
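Since each data node owns a partition of the object store, a service node needs a deterministic way to route a request for a given object or chunk to its owner. The sketch below uses plain hash partitioning; this is an assumed scheme for illustration, not a description of HSDS's routing code, and `data_node_for` is a hypothetical name.

```python
import hashlib


def data_node_for(obj_key, node_count):
    """Pick the data node that owns obj_key by hashing the key.

    Illustrative only: a stable digest (MD5 of the key) modulo the number of
    data nodes, so every service node routes a given key to the same data node.
    """
    if node_count < 1:
        raise ValueError("need at least one data node")
    digest = hashlib.md5(obj_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count
```

A stable digest is used instead of Python's built-in `hash()`, whose string hashing is randomized per process and would send the same key to different nodes from different service nodes. Note that plain modulo hashing remaps most keys when the node count changes.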
HDF API Compatibility
The sharded storage schema captures the HDF data model, and the REST service interface is nice, but it would be great if existing HDF-based applications and libraries could use the new storage format without requiring a bunch of code changes…
Two related projects provide a solution:
• h5pyd: h5py-compatible package for Python
• REST VOL: HDF5 library plugin for C/C++
h5pyd: Python client
• h5py is a popular Python package that provides a Pythonic interface to the HDF5 library
• h5pyd (for “h5py distributed”) provides an h5py-compatible package for accessing the server
• Pure Python: uses the requests package to make HTTP calls to the server
• Includes several extensions to h5py:
  • List content in folders
  • Get/set ACLs (access control lists)
  • PyTables-like query interface
• The h5netcdf and xarray packages will use h5pyd when http:// is prepended to the file path
• Installable from PyPI: $ pip install h5pyd
• Source code: https://github.com/HDFGroup/h5pyd
Supporting the Python Analytics Stack
Many Python users don’t use h5py directly, but rather tools higher up the stack: h5netcdf, xarray, pandas, etc.
Figure: the local stack, Xarray → H5NETCDF → H5PY → HDF5Lib.
Since h5pyd is compatible with h5py, we should be able to support the same stack for HDF Cloud.
Figure: the same stack with H5PYD → HDFServer alongside H5PY → HDF5Lib → Disk.
Applications can switch between local and cloud access just by changing the file path.
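Switching by file path can be sketched as a small dispatcher. It assumes, for illustration, that paths beginning with `http://` or `https://` go to the server (mirroring the h5netcdf/xarray behavior described for h5pyd) and that everything else is a local file; whether a given h5pyd version accepts a full URL, or instead wants a domain path plus an `endpoint=` argument, is an assumption to check. `backend_for` and `open_hdf` are hypothetical helpers.

```python
import importlib


def backend_for(path):
    """Choose h5pyd for service URLs and h5py for local files.

    Assumption for illustration: http:// or https:// paths go to the server;
    anything else is treated as a local HDF5 file.
    """
    if path.startswith(("http://", "https://")):
        return "h5pyd"
    return "h5py"


def open_hdf(path, mode="r"):
    """Open path with whichever package backend_for() selects.

    Both packages expose a File class with an h5py-style (name, mode)
    signature, so the calling code stays the same.
    """
    module = importlib.import_module(backend_for(path))
    return module.File(path, mode)
```

Importing lazily with `importlib` means a purely local user does not need h5pyd installed, and vice versa.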
REST VOL Plugin
• The HDF5 VOL (Virtual Object Layer) architecture is a plugin layer for HDF5
• The public API stays the same, but different back ends can be implemented
• REST VOL substitutes REST API requests for file I/O actions
• C/Fortran applications should be able to run as is
• Some features are not implemented yet:
  • VLEN support
  • Large read/write support (selections >100 MB)
• Downloadable from: https://github.com/HDFGroup/vol-rest
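Until large selections are supported, a client can stay under the limit by issuing several smaller reads. The sketch below shows the batching arithmetic in Python (the same arithmetic carries over to C). `split_rows` is a hypothetical helper, and the 100 MiB default is an assumption based on the slide's ">100 MB" note, not a documented server constant.

```python
def split_rows(row_start, row_stop, row_nbytes, limit_bytes=100 * 1024 * 1024):
    """Split rows [row_start, row_stop) into consecutive (start, stop) batches.

    Each batch covers whole rows and stays at or under limit_bytes, provided a
    single row fits under the limit. The 100 MiB default is an assumed threshold.
    """
    if row_nbytes <= 0 or row_nbytes > limit_bytes:
        raise ValueError("a single row must fit under the limit")
    rows_per_batch = limit_bytes // row_nbytes
    return [(s, min(s + rows_per_batch, row_stop))
            for s in range(row_start, row_stop, rows_per_batch)]
```

Each `(start, stop)` pair then becomes one ordinary hyperslab read or write, and the pieces are concatenated on the client.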
Command Line Interface (CLI)
• Accessing HDF via a service means one can’t use the usual shell commands: ls, rm, chmod, etc.
• The command line tools are a set of simple apps to use instead:
  • hsinfo: display server version and connection info
  • hsls: list content of a folder or file
  • hstouch: create a folder or file
  • hsdel: delete a file
  • hsload: upload an HDF5 file
  • hsget: download content from the server to an HDF5 file
  • hsacl: create/list/update ACLs (access control lists)
  • hsdiff: compare an HDF5 file with its sharded representation
• Implemented in Python; uses h5pyd
• Note: data is round-trippable:
  • HDF5 file → hsload → HSDS store → hsget → HDF5 file
Supporting traditional HDF5 files
We’ve discussed three aspects of HDF: the data model, the API, and the file format. With HSDS we’ve kept the data model and API, but the file format is radically different. But maybe you have a PB or two of HDF5 files you’d like to use…
• If you have HDF5 files already stored in the cloud, they can be accessed by HDF Server
• Rather than converting the entire file to the HDF schema, only the metadata needs to be imported (typically <1% of the file)
• Dataset reads are converted to S3 range GETs on the stored file
• The hsload CLI tool has an option (--link) for loading just the file metadata
• It is also possible to construct a server file that aggregates multiple stored files (similar to how the HDF5 library VDS feature works)
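The idea behind the range GETs: importing the metadata yields, for each chunk, its byte offset and size inside the stored HDF5 file, so a chunk read becomes an HTTP range request on that one object. A minimal sketch, assuming such an offset table is available; the table layout and the function names are hypothetical, and this is not HSDS's code.

```python
def range_header(offset, length):
    """HTTP Range header value for `length` bytes starting at byte `offset`.

    HTTP byte ranges are inclusive at both ends, hence the -1.
    """
    if offset < 0 or length <= 0:
        raise ValueError("invalid byte range")
    return f"bytes={offset}-{offset + length - 1}"


def chunk_ranges(chunk_table, keys):
    """Map chunk keys to Range header values.

    chunk_table is a hypothetical dict {chunk_key: (file_offset, nbytes)}
    standing in for the chunk locations recovered from the imported metadata.
    """
    return {k: range_header(*chunk_table[k]) for k in keys}
```

With boto3, for instance, the header value would be passed as the `Range` argument of `get_object`; the bytes returned are the stored (possibly compressed) chunk, which still has to be decoded.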
New HSDS features
HSDS version 0.6 is coming soon…
What’s new:
• POSIX support: store content on regular disk drives
• Azure
  • Azure Blob support: support for Azure’s object storage service
  • AKS (Azure Kubernetes Service): run in Azure’s managed Kubernetes
  • Active Directory authentication: authenticate via AD
• AWS
  • Added support for AWS Lambda
• DC/OS: support for the DC/OS (Apache Mesos) distributed system
• Domain checksums: detect when any content changes
• Role-based access control (RBAC): manage ACLs for user groups
Complete list is here: https://github.com/HDFGroup/hsds/issues/47
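Role-based access control means a permission can be granted to a group rather than to each user. The check below sketches that idea. The permission names (`read`, `update`, …) are modeled on HSDS-style ACL flags, and the `"g:"` key prefix for group entries, the data shapes, and the function name `allowed` are hypothetical choices for illustration.

```python
def allowed(acls, groups, user, action):
    """Return True if `user` may perform `action` under the given ACLs.

    Hypothetical model: acls maps a user name, or "g:" + group name, to a dict of
    permission flags; groups maps a group name to a set of member user names.
    """
    # Start with the user's own ACL entry, if any.
    entries = [acls.get(user, {})]
    # Add the ACL entry of every group the user belongs to.
    for group, members in groups.items():
        if user in members:
            entries.append(acls.get("g:" + group, {}))
    # Grant the action if any applicable entry enables it.
    return any(entry.get(action, False) for entry in entries)
```

With group entries, adding a user to a group grants access everywhere that group already has an ACL, instead of editing each domain's ACL list user by user.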
HSDS Platforms
HSDS can be run on most container management systems, using different supported storage systems, including a POSIX filesystem.
Figure: logos of the supported container platforms and storage systems.
AWS Lambda Functions
• HSDS can parallelize requests across all the available back-end (“DN”) nodes on the server
• AWS Lambda is a new service that enables you to run requests “serverless”
• You pay for just the CPU-seconds the function runs
• By incorporating Lambda, some HDF Server requests can be parallelized across 1,000 Lambda functions (equivalent to a 1,000-container server)
• This will dramatically speed up time-series selections
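A time-series selection typically touches many chunks, and each chunk can be read independently, which is why fanning out pays off. The pattern is sketched below with a local thread pool as a stand-in: `read_chunk` is any Python callable representing one Lambda invocation (or one DN request), and no AWS calls are made. `fan_out` is a hypothetical helper, not HSDS code.

```python
from concurrent.futures import ThreadPoolExecutor


def fan_out(chunk_keys, read_chunk, max_workers=1000):
    """Apply read_chunk to every chunk key concurrently; return results in key order.

    read_chunk stands in for one Lambda invocation; max_workers caps concurrency.
    """
    if not chunk_keys:
        return []
    workers = min(max_workers, len(chunk_keys))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so results line up with chunk_keys.
        return list(pool.map(read_chunk, chunk_keys))
```

The caller then assembles the per-chunk results into the requested selection; with real Lambda functions, wall-clock time is bounded by the slowest chunk rather than the sum over chunks.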
Kita Lab
• Kita Lab is a JupyterLab and HDF Server environment hosted by The HDF Group on AWS
• Kita Lab users can create Python notebooks that use h5pyd to connect to HDF Server
• Each user gets the equivalent of a 2-core Xeon server and 10 GB of local storage
• Users can use up to 100 GB of data on HDF Server
• Sign up here: https://www.hdfgroup.org/hdfkitalab/
Figure: the user logs into JupyterHub, which spawns a new container (the user’s container and EBS volume) at login; HSDS runs on Kubernetes, backed by an S3 bucket.
Futures
• Sometimes you’d rather do without a server and talk to the storage system directly:
  • You don’t want to deal with setting up a service
  • You don’t want to worry about scaling the service up and down with client load
  • You don’t need the synchronization (e.g. managing multiple clients writing to the same dataset) that a service provides
• HS Direct Access will be a new VOL connector that enables this for the HDF5 library
  • It will take advantage of multiple cores
  • It uses the same schema as HSDS (and can be used in conjunction with HSDS)
Design doc is here:
https://github.com/HDFGroup/hsds/blob/master/docs/design/direct_access/direct_access.md
Questions?


Editor's Notes

  • #9: Each node is implemented as a Docker container