Data Sharing via Globus in the NIH Intramural Program
Susan Chacko
High Performance Computing
National Institutes of Health
hpc.nih.gov
Biowulf: the NIH Intramural Program HPC system
The NIH intramural program’s large-scale high-performance computing resource, completely dedicated to biomedical computing
• High availability and high data durability
• Designed for general-purpose scientific computing (not dedicated to any single application type)
• Dedicated staff with expertise in high-performance computing and computational biology
Biowulf in 2019
• 95,000 compute cores
• 560 GPUs
• 35 PB of storage
• 640 Principal Investigators (labs)
• 2,200 users
• 650+ scientific applications
Globus -- 2014
Globus Transfers since 2014
[Chart: outbound and inbound data transferred per year (TB, 0-250), 2015-2019, with the Globus deployment growing from a single host to 8 DTNs.]
Globus Transfers in the last year
• ~3 PB of biomedical data transferred
• 450 unique users
• 2,000 unique hosts
High points:
• 24 million files in Oct 2018
• 300 TB in March 2019
NIH site license
~20 endpoints at NIH
Data Sharing via Globus
• Many NIH researchers have outside and international collaborators, and share data with them via Globus shares.
• Oct 2018: used the Globus SDK to get a list of user shares (see the sketch below).
• 1,900+ user shares via Globus on the NIH HPC systems!
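The deck only notes that the Globus SDK was used to pull the share list; a minimal sketch of how that might look with the Python globus_sdk (v3), assuming a native-app client ID and an admin/manager role on the host endpoint (both values below are placeholders, and this is not necessarily the script used at NIH):

import globus_sdk

CLIENT_ID = "YOUR-NATIVE-APP-CLIENT-ID"        # placeholder
HOST_ENDPOINT_ID = "YOUR-HOST-ENDPOINT-UUID"   # placeholder for the HPC host endpoint

# Interactive native-app login for the Transfer API
auth = globus_sdk.NativeAppAuthClient(CLIENT_ID)
auth.oauth2_start_flow(requested_scopes=globus_sdk.scopes.TransferScopes.all)
print("Log in at:", auth.oauth2_get_authorize_url())
code = input("Paste the authorization code: ").strip()
tokens = auth.oauth2_exchange_code_for_tokens(code).by_resource_server
transfer_token = tokens["transfer.api.globus.org"]["access_token"]

tc = globus_sdk.TransferClient(
    authorizer=globus_sdk.AccessTokenAuthorizer(transfer_token)
)

# List shared endpoints hosted on the managed endpoint
# (requires an admin or manager role on the host endpoint)
for share in tc.endpoint_manager_hosted_endpoint_list(HOST_ENDPOINT_ID):
    print(share["id"], share["owner_string"], share["display_name"])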
Globus Shares on NIH HPC
• ~50 new shares per week
• 35% defunct shares
Shares per User
• ~100 users with 1 share each
• 8 users with >100 shares each
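The per-user distribution above could be tallied from the same SDK listing; a minimal continuation of the earlier sketch (same tc and HOST_ENDPOINT_ID assumptions):

from collections import Counter

# Count hosted shares per owner
shares_per_owner = Counter(
    share["owner_string"]
    for share in tc.endpoint_manager_hosted_endpoint_list(HOST_ENDPOINT_ID)
)
one_share_users = sum(1 for n in shares_per_owner.values() if n == 1)
heavy_users = sum(1 for n in shares_per_owner.values() if n > 100)
print(f"{one_share_users} users with exactly 1 share; {heavy_users} users with >100 shares")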
Data Sharing via Globus
NCI Sequencing Core Facility
- Serves 150 labs and collaborators
NICHD Sequencing Facility
- Serves 11 labs
- 10,000 samples sequenced and shared since 2014
- 150 TB of data shared off the NIH HPC systems in 2018
- Additional data shared off their own Globus endpoint (~15 TB/year)
Wishlist
• Admin ability to delete endpoints
• Admin ability to prohibit ‘world-write’ shared endpoints (and maybe ‘world-read’ as well)
• Admin ability to get the ‘create date’ for a share
• Users who set up a shared endpoint would like to know when data has been downloaded
