2024 ESIP Summer Meeting
Accessing HDF Data in the Cloud via OPeNDAP Web Service
Kent Yang
Software Engineer/NASA EED-3 contractor
myang6@hdfgroup.org
GOVERNMENT RIGHTS NOTICE
This work was authored by employees of The HDF Group under Contract No. 80GSFC21CA001 with the National Aeronautics and Space
Administration. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United
States Government retains a non-exclusive, paid-up, irrevocable, worldwide license to reproduce, prepare derivative works, distribute copies to
the public, and perform publicly and display publicly, or allow others to do so, for United States Government purposes. All other rights are
reserved by the copyright owner.
©2024 Raytheon Company. All rights reserved.
Topics Overview
● Accessing HDF* Data in the Cloud via dmrpp**
● Direct IO*** Performance Improvement
● Work in progress to access NASA HDF4 and HDF-EOS****2 files
*Hierarchical Data Format
** Dataset Metadata Response Plus Plus
*** Input Output
**** Earth Observing System
Direct IO Performance Improvement Concept
[Diagram: two data-flow pipelines through the Hyrax core]
General Approach: HDF5 File + dmrpp File -> Hyrax Core (Decompress, then Compress) -> NetCDF* File
Approach with Direct IO: HDF5 File + dmrpp File -> Hyrax Core (Pass through the data) -> NetCDF File
* Network Common Data Form
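The difference between the two approaches can be sketched with ordinary zlib-compressed byte chunks. This is only an illustration of the concept, not Hyrax code: the function names `general_approach` and `direct_io_approach` are hypothetical, and real HDF5 chunks may use other filters in addition to deflate.

```python
import zlib


def general_approach(compressed_chunks, level=6):
    """Decompress every stored chunk, then compress it again for the output."""
    output = []
    for chunk in compressed_chunks:
        raw = zlib.decompress(chunk)              # costs CPU time and holds raw data in memory
        output.append(zlib.compress(raw, level))  # costs CPU time a second time
    return output


def direct_io_approach(compressed_chunks):
    """Pass the already-compressed chunk bytes through unchanged."""
    return list(compressed_chunks)


# Four hypothetical 4 KiB chunks, compressed as they might be stored in a file.
raw_chunks = [bytes([i]) * 4096 for i in range(4)]
stored = [zlib.compress(c, 6) for c in raw_chunks]

via_general = general_approach(stored)
via_direct = direct_io_approach(stored)

# Both outputs decode to the same data; only the work done on the server differs.
decoded_general = [zlib.decompress(c) for c in via_general]
decoded_direct = [zlib.decompress(c) for c in via_direct]
same_data = decoded_general == decoded_direct == raw_chunks
```

In the pass-through version the server never holds the uncompressed array, which is consistent with the reductions in computation time and memory usage reported in the later slides.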
Hyrax Server Response Time Speed-up With Direct IO
Response times are in seconds.

Product Sample File   File Size (MB)   Response Time without Direct IO   Response Time with Direct IO   Speed-up by using Direct IO
GHRSST*               9                2.8                               0.2                            14X
TROPOMI**             292              26.6                              1.8                            15X
SSMI***               1.4              0.5                               0.3                            1.7X

*: Group for High Resolution Sea Surface Temperature
**: TROPOspheric Monitoring Instrument
***: Special Sensor Microwave Imager
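The speed-up column is the response time without Direct IO divided by the response time with Direct IO. A short check of that arithmetic, using only the response times from the table (the variable names are illustrative); the slide's figures appear to be rounded:

```python
# (seconds without Direct IO, seconds with Direct IO), taken from the table above
measurements = {
    "GHRSST": (2.8, 0.2),
    "TROPOMI": (26.6, 1.8),
    "SSMI": (0.5, 0.3),
}


def speed_up(without_s, with_s):
    """Ratio of the response time without Direct IO to the time with it."""
    return without_s / with_s


speed_ups = {name: speed_up(*times) for name, times in measurements.items()}

for name, ratio in speed_ups.items():
    print(f"{name}: {ratio:.1f}X")
```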
Big Files With Direct IO

Product Sample File   File Size (GB)   Response Time with Direct IO (Seconds)   Server Response Message without Direct IO
Daymet                3.5              45                                       The maximum response time limit (165 seconds) is exceeded.
MODIS* Derived        4                65                                       Insufficient memory
CH4 Level 4           11               Direct IO not used (see note)            The maximum response time limit (165 seconds) is exceeded.

Note: The Direct IO feature is not used for the CH4 Level 4 file because it doesn't contain any compressed variable.
* Moderate Resolution Imaging Spectroradiometer
Facts for the Direct IO Feature
● Hyrax uses Direct IO automatically when end users request the whole
array of the selected variable(s) and those variable(s) are
compressed.
● This process is entirely transparent to the end users.
● Direct IO doesn’t work for some old dmrpp files if they don’t contain
the key information needed for using the Direct IO feature. These
dmrpp files need to be regenerated to take advantage of the Direct IO
feature.
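The conditions in the bullets above can be written as a small decision function. This is a hedged sketch, not the actual Hyrax logic: the `Variable` class, the `has_direct_io_info` flag (standing in for the key information that some old dmrpp files lack), and the `(start, stop, step)` subset representation are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Variable:
    name: str
    shape: Tuple[int, ...]
    compressed: bool
    has_direct_io_info: bool = True  # hypothetical: False for an old dmrpp file


def is_whole_array(shape: Tuple[int, ...],
                   subset: Optional[List[Tuple[int, int, int]]]) -> bool:
    """True if the request covers every element of the variable.

    subset is None for an unconstrained request, otherwise one
    (start, stop, step) triple per dimension with an exclusive stop.
    """
    if subset is None:
        return True
    if len(subset) != len(shape):
        return False
    return all(start == 0 and stop == dim and step == 1
               for dim, (start, stop, step) in zip(shape, subset))


def can_use_direct_io(var: Variable,
                      subset: Optional[List[Tuple[int, int, int]]] = None) -> bool:
    """Whole-array request, compressed variable, and a dmrpp file with the needed information."""
    return (var.compressed
            and var.has_direct_io_info
            and is_whole_array(var.shape, subset))


sst = Variable("sst", (720, 1440), compressed=True)
print(can_use_direct_io(sst))                              # whole array requested
print(can_use_direct_io(sst, [(0, 360, 1), (0, 1440, 1)])) # only half the rows requested
```

Because the check depends only on the request and the variable's metadata, the server can make this choice without the end user doing anything, which matches the transparency noted above.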
Direct IO Performance Improvement Summary
● Can greatly reduce server computation time
● Can greatly reduce server memory usage
● Uses the HDF5 direct chunk IO APIs*
● The feature is in the current Hyrax release
* Application Programming Interface
Accessing HDF4 and HDF-EOS2 via dmrpp
● Map HDF4 to DMR*
● Access HDF4 via dmrpp
○ Handle data stored in chunked and contiguous layouts
○ Also handle data stored in linked blocks
○ Handle HDF-EOS2/HDF4 geolocation data
■ Not stored as HDF4 variables
■ Must be calculated from the metadata information
■ Must be saved in a proper way
* Dataset Metadata Response
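As an illustration of calculating geolocation data from metadata, the sketch below derives cell-center latitudes and longitudes for a regular geographic grid from its corner coordinates and dimensions. It is a simplified assumption, not the Hyrax implementation: the function name is hypothetical, the corners are taken to be in decimal degrees, and real HDF-EOS2 grids may use other projections and store corner values in projection-specific units.

```python
def grid_cell_centers(upper_left, lower_right, xdim, ydim):
    """Return (lats, lons) of cell centers for a regular geographic grid.

    upper_left and lower_right are (lon, lat) pairs in decimal degrees
    (an assumption for this sketch); xdim and ydim are the number of
    cells along longitude and latitude.
    """
    ul_lon, ul_lat = upper_left
    lr_lon, lr_lat = lower_right
    dx = (lr_lon - ul_lon) / xdim  # cell width in degrees
    dy = (ul_lat - lr_lat) / ydim  # cell height in degrees
    lons = [ul_lon + (i + 0.5) * dx for i in range(xdim)]
    lats = [ul_lat - (j + 0.5) * dy for j in range(ydim)]
    return lats, lons


# A hypothetical 1-degree global grid.
lats, lons = grid_cell_centers((-180.0, 90.0), (180.0, -90.0), xdim=360, ydim=180)
print(lats[0], lats[-1], lons[0], lons[-1])
```

Arrays computed this way do not exist in the original file, which is why the slide lists storing them properly as a separate task.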
Current Status
● We can successfully map and access the sample NASA HDF4 and
HDF-EOS2 products via dmrpp.
● We are still working on a better way to store HDF-EOS2/HDF4
geolocation data.
● Panoply screenshots of variable Topography
○ The identical plots show the dmrpp module can successfully access this HDF4 file
[Figure: two Panoply plots of the AIRS* variable Topography, one from the local HDF4 file and one from the netCDF-4 file via dmrpp]
* Atmospheric Infrared Sounder
Thank you!
This work was supported by NASA/GSFC under
Raytheon Company contract number
80GSFC21CA001
