“NRP Application Drivers”
Presentation
4th National Research Platform (4NRP) Workshop
February 9, 2023
Dr. Larry Smarr
Founding Director Emeritus, California Institute for Telecommunications and Information Technology;
Distinguished Professor Emeritus, Dept. of Computer Science and Engineering
Jacobs School of Engineering, UCSD
http://lsmarr.calit2.net
Rotating Storage
4000 TB
2023: NRP’s Nautilus is a Multi-Institution National to Global Scale Hypercluster
Connected by Optical Networks
~200 FIONAs on 25 Partner Campuses
Networked Together at 10-100Gbps
Feb 9, 2023
Grafana Graphs Nautilus Namespaces Usage
Calendar 2022 GPUs
900
Grafana Graphs Nautilus Namespaces Usage
Calendar 2022 CPU Cores
7,000
2022 Nautilus Namespace Users:
Largest User is One Million Times Smallest!
osg-opportunistic
ucsd-haosulab
osg-icecube
ucsd-ravigroup
cms-ml
braingeneers
Nautilus Namespaces
Using >10 GPU-hrs/year
Or >10 CPU-hrs/year
wifire-quicfire
I Will Look in Detail at the
Namespaces in Red
digits
The New Pacific Research Platform Video
Highlights 3 Different Applications Out of 800 Nautilus Namespace Projects
Pacific Research Platform Video:
https://nationalresearchplatform.org/media/pacific-research-platform-video/
2015 PRP Grant Was Science-Driven:
Connecting Multi-Campus Application Teams and Devices
Earth
Sciences
UC San Diego, UC Berkeley, UC Merced
What Are
The Largest 2022
PRP Users
in Each Area?
The Open Science Grid (OSG)
Has Been Integrated With the PRP
In aggregate ~200,000 Intel x86 cores
used by ~400 projects
Source: Frank Würthwein, OSG Exec Director; PRP co-PI; UCSD/SDSC
OSG Federates ~100 Clusters Worldwide
All OSG User
Communities
Use HTCondor for
Resource Orchestration
SDSC
U.Chicago
FNAL
Caltech
Distributed
OSG Petabyte
Storage Caches
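Since the slide notes that every OSG community orchestrates its work through HTCondor, it may help to see what a job looks like from the user's side: work is described in a small submit file and fanned out across the federated clusters. A minimal sketch (the executable name, resource requests, and job count are placeholders, not an actual OSG workload):

```
# Illustrative HTCondor submit description (all names are placeholders)
executable     = analyze.sh        # the user's payload script
arguments      = $(Process)        # each job gets its index as an argument
request_cpus   = 1
request_memory = 2GB
output         = job.$(Process).out
error          = job.$(Process).err
log            = job.log
queue 100                          # submit 100 independent jobs
```

Submitting with `condor_submit` hands the 100 jobs to the scheduler, which matches them to whichever federated resources (including Nautilus, per the next slides) have free slots.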
The Open Science Grid (OSG) Delivers to Over 50 Fields of Science
2.6 Billion Core-Hours Per Year of Distributed High Throughput Computing
NCSA Delivered
~35,000 Core-Hours
Per Year in 1990
https://gracc.opensciencegrid.org/dashboard/db/gracc-home
CMS
ATLAS
PRP’s Nautilus Appears
as Just Another OSG Resource
Nautilus Namespace osg-opportunistic Supported a Wide Set of Applications
As the Largest Consumer of CPU Core-Hours in 2022
Source: Igor Sfiligoi, SDSC
3.7 Million CPU Core-Hours
Peaking at 3500 CPU Cores
osg-opportunistic runs fully in low-priority mode,
using only PRP CPU cycles
that would otherwise be unused.
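Low-priority backfill of this kind maps naturally onto Kubernetes scheduling priorities, which Nautilus uses as its resource orchestrator. A sketch of how such a preemptible class could be declared (the class name, priority value, and container image are illustrative assumptions, not the actual Nautilus configuration):

```yaml
# Illustrative only -- not the actual Nautilus configuration.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: opportunistic          # hypothetical class name
value: -100                    # below default: evicted first under pressure
preemptionPolicy: Never        # never evicts other pods to schedule itself
globalDefault: false
description: "Backfill jobs that yield to all regular workloads."
---
apiVersion: v1
kind: Pod
metadata:
  name: osg-backfill-worker    # hypothetical pod
spec:
  priorityClassName: opportunistic
  containers:
  - name: worker
    image: example.org/osg-worker:latest   # placeholder image
```

Pods in such a class only run when capacity would otherwise sit idle, which is exactly the behavior the slide describes for osg-opportunistic.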
Particle Physics
Bringing Machine Learning to Particle Physics
A new particle was
discovered in 2012
The “holy grail” of the LHC program today is measurement of di-Higgs
production to infer the trilinear hhh coupling λ that determines the Higgs potential
Source: Frank Wuerthwein, SDSC
ML Inference as a Service on NRP
Raghav Kansal (graduate student, UCSD) runs ~1,000 CPU jobs calling out to
~10 GPUs on NRP for inference with his ML model for the hh search.
80M events inferenced, sending 1.3 TB of data from CPUs to GPUs in 3 hours
The ML model is too large to fit into the DRAM of the CPUs.
Fastest way to get the job done is “ML Inference as a service” on NRP
~4MB/s output from GPUs
~200MB/s input to GPUs
See Talk by
Shih-Chieh Hsu
4NRP Friday
Source: Frank Wuerthwein, SDSC
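The pattern on this slide — many CPU jobs batching events and shipping each batch to a shared pool of GPU inference servers, because the model is too large for the CPU nodes — can be sketched in a few lines. The remote GPU call is simulated by a local function here; in production it would be an RPC to a GPU-backed service, and all names below are illustrative, not the actual CMS setup:

```python
"""Sketch of "ML inference as a service": CPU-side jobs batch their
events and send each batch to a shared GPU inference service, rather
than loading the (too-large) model on every CPU node."""

from concurrent.futures import ThreadPoolExecutor

def batch_events(events, batch_size):
    """Split an event stream into fixed-size batches, one RPC each."""
    return [events[i:i + batch_size] for i in range(0, len(events), batch_size)]

def gpu_infer(batch):
    """Stand-in for the remote GPU model; returns one score per event."""
    return [hash(e) % 2 for e in batch]   # placeholder "classifier"

def run_cpu_job(events, batch_size=1024):
    """One CPU job: batch events, call the GPU service, collect scores."""
    scores = []
    for batch in batch_events(events, batch_size):
        scores.extend(gpu_infer(batch))    # would be a network call
    return scores

# Many CPU jobs share the same GPU service concurrently.
jobs = [[f"event-{j}-{i}" for i in range(5000)] for j in range(8)]
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(run_cpu_job, jobs))
```

The design choice matches the slide's numbers: the heavy traffic (~200 MB/s) flows CPU→GPU as batched inputs, while only compact scores (~4 MB/s) flow back.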
Namespace cms-ml Was the 4th Largest Consumer of Nautilus GPU-Hours in 2022
157,571 GPU-Hours
Peaking at 130 GPUs
PI Frank Wuerthwein, UCSD
Telescopes
Co-Existence of Interactive and
Non-Interactive Computing on PRP
GPU Simulations Needed to Improve Ice Model.
=> Results in Significant Improvement
in Pointing Resolution for Multi-Messenger Astrophysics
NSF Large-Scale Observatories Are Using PRP and OSG
as a Cohesive, Federated, National-Scale Research Data Infrastructure
IceCube Peaked at
560 GPUs in 2022!
Namespace osg-icecube
Was the Largest Consumer of Nautilus GPU-Hours in 2022
0.8 Million GPU-Hours
Peaking at 560 GPUs
osg-icecube also runs fully in low-priority mode,
using only PRP GPU cycles
that would otherwise be unused.
OSG GPU
Consumers
OSG GPU
Providers
In 2022 IceCube Was the Largest Consumer of OSG GPU-Hours
and PRP Was the Largest Supplier of GPU-Hours to OSG
https://gracc.opensciencegrid.org/d/ujFlp3vVz/gpu-payload-jobs
Laser Interferometer Gravitational-Wave Observatory (LIGO)
Uses Nautilus/OSG Data Cyberinfrastructure
• LIGO Runs Their Production Rucio Data Management System on Nautilus
– Rucio is the De-Facto Data Management System for Many Large Instruments, LIGO, LHC, …
– LIGO Continues to be One of the Major Users of the OSG Caching Infrastructure (A.K.A.
Stashcache), Which is Deployed Mostly as PRP-Managed Kubernetes Pods.
• LIGO Does Not Use Much PRP Compute Given Their Dedicated Infrastructure
PRP Supports Radio Telescopes Through Partnering with
CASPER: the Collaboration for Astronomy Signal Processing and Electronics Research
PRP Access Has Allowed CASPER
to Expand in Several Aspects:
• PRP Portal to CASPER Tools/Libraries
Was Developed by PRP’s John Graham
• The PRP Team Added FPGAs to Nautilus
FIONAs with the CASPER Software Stack
• Nautilus JupyterHub Used for FPGA Training
• Optical Fiber Connected Data Storage
Source: Dan Werthimer
SETI Chief Scientist, UC Berkeley
SETI.berkeley.edu, CASPER.berkeley.edu
Xilinx, Intel, Fujitsu, HP, Nvidia, NSF, NASA, NRAO, NAIC
The CASPER Collaboration Has ~1,000 Members
and 50 Radio-Astronomy Instruments Worldwide,
Developing Open-Source
Signal Processing and Instrumentation Pipelines,
Primarily Using FPGAs and GPUs.
Radio Telescopes include:
• Event Horizon Telescope
• Square Kilometre Array
• Very Large Array
https://casper.berkeley.edu/
PRP Portal to CASPER Tools/Libraries
Developed by PRP’s John Graham, UCSD
See John Graham’s CASPER 2021 Workshop Talk and Tutorial:
https://casper.berkeley.edu/index.php/casper-workshop-2021/agenda/
CASPER designs, compiles, tests, and evaluates instrumentation on the PRP,
then deploys dedicated FPGA and GPU clusters at the observatories.
Discoveries Made with CASPER-Enabled Instrumentation
• Radio Image of a Black Hole
• Fast Radio Bursts
• Weighing the Universe
• Pulsar Timing
• Gravitational Waves
• Diamond Planet
• Prostheses Control
• Neutron Imaging
Source: Dan Werthimer, UC Berkeley
Biomedical
OpenForceField Uses OPEN Software, OPEN Data, OPEN Science
and PRP to Generate Quantum Chemistry Datasets for Druglike Molecules
www.openforcefield.
OFF Open-Source Models are Used in Drug Discovery,
Including COVID-19 Computing on Folding@Home.
OFF Runs Quantum Mechanical Computations on Many Molecules
to Determine Their Optimized Force Fields
50% of OFF compute is run on Nautilus.
PRP is Capable of Running Millions of Quantum Chemistry Workloads
www.openforcefield.org
OpenFF-1.0.0 released OpenFF-2.0.0 released
OpenFF begins using Nautilus
We run "workers" that pull down QC jobs for computation from a central project queue. These jobs require between minutes and hours, and results are uploaded to the central, public QCArchive server.
Workers are deployed from Docker images and scheduled on PRP's Kubernetes system. Due to the short job duration, these deployments can still be effective if interrupted every few hours.
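The pull-based worker pattern described above can be sketched compactly: each worker claims a task from a central queue, computes it, and uploads the result immediately, so killing a worker loses at most the task in flight. The queue and the "computation" below are local stand-ins for the central QCArchive services, and the names are illustrative:

```python
"""Sketch of the pull-based worker loop OpenFF describes: claim a
quantum-chemistry task, compute it, upload the result right away.
Short tasks + immediate upload make interruption cheap."""

import queue

def run_worker(task_queue, results, compute):
    """Claim tasks until the queue is empty; upload each result immediately."""
    while True:
        try:
            task = task_queue.get_nowait()   # claim one task from the queue
        except queue.Empty:
            return                            # no work left: worker exits
        results[task] = compute(task)         # "upload" to the central store

# Stand-in for a QC computation (the real ones run minutes to hours).
fake_qc = lambda molecule: f"energy({molecule})"

tasks = queue.Queue()
for mol in ["water", "ethanol", "benzene"]:
    tasks.put(mol)

results = {}
run_worker(tasks, results, fake_qc)
```

In production many such workers run concurrently as Kubernetes pods; because a preempted task is simply re-queued, the fleet tolerates the every-few-hours interruptions the slide mentions.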
OFF Was the Top Nautilus CPU Core Consumer
in 2020 & 2021, 4th Highest in 2022
7.6 Million CPU Core-Hours
(2020-2022)
Peaking at 1300 CPU Cores
OFF Datasets Consist of Hundreds to Millions of Jobs,
Each Requiring Tens to Thousands of CPU-Hours and 8-32 GB of RAM
Dataset listing: https://qcarchive.molssi.org/apps/ml_datasets/
Python example notebooks for data access: https://qcarchive.molssi.org/examples/
OpenFF’s dataset lifecycle: https://github.com/openforcefield/qca-dataset-submission/projects/1
The OFF Datasets on QCArchive
are Fully Open!
Nautilus Namespace tempredict Utilized PRP to Compute
COVID-19 and Vaccine Responses in ~65K Participants
Purawat et al., IEEE Big Data, 2021
Mason et al., Sci Rep, 2021
Mason et al., Vaccines, 2022
Source: Prof. Benjamin Smarr, UCSD
Nautilus Namespace braingeneers: One of the Most Advanced PRP projects -
Uses Optical Fiber Connected Shared Storage, CPUs & GPUs
https://cenic.org/blog/prp-boosts-inter-campus-collaboration-on-brain-research
UCSC/Hengenlab Data Analysis Pipeline Using PRP
Hengenlab
WUSTL
PRP/S3
Results
PRP
Compute
CNN
Source: David Parks, UCSC; braingeneers PI David Haussler
Multiple Worker Processes
Circulate Data
in a 50GB Cache
Sampling Strategy
for braingeneers TB+ data
PRP/S3
PRP
Compute
Jobs Local
NVMe
Model Training
Operates
on the Local Cache
Results
are Returned
to S3
Source: David Parks, UCSC; braingeneers PI David Haussler
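The caching scheme on this slide — training workers sampling TB-scale data from S3 through a fixed-size cache on local NVMe, so hot samples are read from disk instead of refetched over the network — can be sketched as a byte-budgeted LRU cache. The fetch function and the tiny budget below are illustrative stand-ins, not the braingeneers implementation:

```python
"""Sketch of a byte-budgeted LRU cache standing in for a local NVMe
staging area in front of S3: misses fetch over the network, hits are
served locally, and the least-recently-used objects are evicted."""

from collections import OrderedDict

class LocalCache:
    def __init__(self, capacity_bytes, fetch):
        self.capacity = capacity_bytes
        self.fetch = fetch                  # e.g. an S3 GET in production
        self.entries = OrderedDict()        # key -> bytes, kept in LRU order
        self.used = 0

    def get(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)   # hit: mark as recently used
            return self.entries[key]
        data = self.fetch(key)              # miss: pull from remote storage
        while self.used + len(data) > self.capacity and self.entries:
            _, old = self.entries.popitem(last=False)   # evict LRU entry
            self.used -= len(old)
        self.entries[key] = data
        self.used += len(data)
        return data

# Toy fetch: each "object" is 40 bytes; the cache holds 100 bytes.
cache = LocalCache(100, fetch=lambda k: bytes(40))
for key in ["a", "b", "a", "c"]:            # the second "a" is a cache hit
    cache.get(key)
```

Scaled up to a 50 GB budget per worker, the same logic lets model training operate on the local cache while only cold samples touch S3.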
UCSC, UCSF & WUSTL Are Collaborating
To Grow Human Cerebral Organoids and Measure Their Neural Activity
Tetrodes
Multi-Electrode Arrays
Silicon Probes
Source: David Parks, UCSC; braingeneers PI David Haussler
Goal: For Every Human Brain Slice, Grow 1000 Organoids,
And For Every Organoid, Compute 1000 Simulated Organoids
From Neural Activity in Living Mouse Brain
To Neural Activity in Human Brain Organoids
Source: David Parks, UCSC; braingeneers PI David Haussler
Nautilus Namespace braingeneers Was the 3rd Largest Consumer of CPU Core-Hours in 2022
57,000 GPU-Hours
Peaking at 110 GPUs
950,000 CPU Core-Hours
Peaking at 2000 CPU Cores
https://braingeneers.ucsc.edu/
NeuroKube: An Automated Neuroscience Reconstruction Framework
Uses Nautilus for Large-Scale Processing & Labeling of Neuroimage Volumes
Figures 2, 4, & 5 in “NeuroKube:
An Automated and Autoscaling Neuroimaging Reconstruction Framework
Using Cloud Native Computing and A.I.,”
Matthew Madany, et al. (IEEE Big Data ’20, pp. 320-330)
Computer Vision-Based Approach
Provides the Potential to Automatically Generate Labels Using ML
Subset of Neurites from
Cerebellum Neuropil
Extracted & Rendered
in 3D with Structures
of Interest Labeled
Figures 1 & 14 in “NeuroKube: An Automated and Autoscaling Neuroimaging Reconstruction Framework Using Cloud Native Computing and A.I.,” Matthew Madany, et al. (IEEE Big Data ’20, pp. 320-330)
Volumetric Electron Microscopy (VEM)
Data with Colorized Labels
Earth Sciences
NSF-Funded WIFIRE Uses PRP/CENIC to Couple Wireless Edge Sensors
With Supercomputers, Enabling Fire Modeling Workflows
Landscape data
WIFIRE Firemap
Fire Perimeter
Source: Ilkay Altintas, SDSC
Real-Time
Meteorological Sensors
Weather Forecasts
Workflow
PRP
WIFIRE’s Firemap Provides Public Website
Combining Satellite Fire Detections with GIS
SoCal Wildfires, Sept 6, 2022
PRP is Building on NSF-Funded SAGE Technology
to Bring ML/AI to the Edge For Smoke Plume Detection
Source: Charlie Catlett, Pete Beckman, Argonne National Lab
Source: Ilkay Altintas, SDSC, HDSI
Training Data: Archive of
25,000 Labeled Wireless Camera Images
of Wildland Fires
www.mdpi.com/2072-4292/14/4/1007
PRP namespace digits
Nautilus Namespace wifire-quicfire Was the 25th Largest 2022 Consumer of CPU Core-Hours;
digits Was the 14th Largest GPU Consumer
wifire-quicfire
108,000 CPU Core-Hours
Peaking at 360 CPU Cores
digits
40,700 GPU-Hours
Peaking at 18 GPUs
Visualization and Virtual Reality
2017: PRP 20Gbps Connection of UCSD SunCAVE and UCM WAVE Over CENIC
2018-2019: Added Their 90 GPUs to PRP for Machine Learning Computations
Leveraging UCM Campus Funds and NSF CNS-1456638 & CNS-1730158 at UCSD
UC Merced WAVE (20 Screens, 20 GPUs) UCSD SunCAVE (70 Screens, 70 GPUs)
See These VR Facilities in Action in the PRP Video
PRP Has Been Bringing Machine Learning to Building Virtual Worlds,
Including Robotics and Autonomous Vehicles
• Goal: Train Robots That Can Manipulate Arbitrary Objects
o Open Drawer, Turn Faucet, Stack Cube, Pull Chair,
Pour Water, Pick And Place, Hang Ropes, Make Dough, …
(video)
UCSD’s Hao Su Lab
Uses NRP GPUs for
3D Deep Learning and
Embodied AI
(Robotics and
Self-Driving Cars)
Namespace ucsd-haosulab Consumed the 2nd Most Nautilus GPU-Hours in 2022 (1st is IceCube)
585,170 GPU-Hours
Peaking at 150 GPUs
A Major Project in UCSD’s Hao Su Lab
is Large-Scale Robot Learning
• We Build A Digital Twin of The Real World in Virtual Reality (VR) For Object
Manipulation
• Agents Evolve In VR
o Specialists (Neural Nets) Learn Specific Skills
by Trial and Error
o Generalists (Neural Nets) Distill Knowledge
to Solve Arbitrary Tasks
• On Nautilus:
o Hundreds of Specialists Have Been Trained
o Each Specialist is Trained in Millions of Environment Variants
o ~10,000 GPU-Hours per Run
Source: Prof. Hao Su, UCSD
UCSD’s Ravi Group: How to Create Visually Realistic
3D Objects or Dynamic Scenes in VR or the Metaverse
Source: Prof. Ravi Ramamoorthi, UCSD
Machine Learning Using NRP GPUs
Transforms a Series of 2D Images
Into a 3D View Synthesis
Machine Learning-Based
Neural Radiance Fields for View Synthesis (NeRFs) Are Transformational!
By Jared Lindzon, November 10, 2022
A neural radiance field (NeRF) is
a fully-connected neural network
that can generate
novel views of complex 3D scenes,
based on a partial set of 2D images.
https://datagen.tech/guides/synthetic-data/neural-radiance-field-nerf/ Source: Prof. Ravi Ramamoorthi, UCSD
https://youtu.be/hvfV-iGwYX8
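The definition above — a fully-connected network that maps 3D positions (plus view directions) to color and density — relies on first lifting the raw coordinates with a sinusoidal positional encoding so the MLP can represent high-frequency scene detail. A minimal sketch of that encoding (the number of frequency bands L follows the original NeRF paper's convention; this is not the Ravi group's code):

```python
"""Sketch of NeRF's sinusoidal positional encoding: each input
coordinate x is mapped to (sin(2^k * pi * x), cos(2^k * pi * x))
for k = 0..L-1, producing the high-dimensional MLP input."""

import numpy as np

def positional_encoding(p, num_bands=10):
    """Lift each coordinate into 2 * num_bands sinusoidal features."""
    p = np.asarray(p, dtype=float)
    freqs = 2.0 ** np.arange(num_bands) * np.pi       # 2^k * pi
    angles = p[..., None] * freqs                     # shape (..., dims, L)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return enc.reshape(*p.shape[:-1], -1)             # (..., dims * 2L)

# One 3D sample point -> a 3 * 2 * 10 = 60-dimensional MLP input.
x = positional_encoding([0.1, -0.4, 0.7])
```

Training then fits the MLP so that rays rendered through the encoded field reproduce the input 2D images — the computationally expensive step the next slide quantifies in GPU-hours.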
Namespace ucsd-ravigroup Consumed the 3rd Most Nautilus GPU-Hours in 2022
200,000 GPU-Hours
Peaking at 122 GPUs
• Much of the compute involves training computationally expensive NeRFs.
• Training time to learn a representation of a single scene on a GPU can vary from seconds to a day.
• NeRFs that can see behind occlusions may require a week of training on 8 GPUs simultaneously.
Source: Alexander Trevithick, UCSD Ravi Group
2022-2026 NRP Future: PRP Federates with
NSF-Funded Prototype National Research Platform
NSF Award OAC #2112167 (June 2021) [$5M Over 5 Years]
PI Frank Wuerthwein (UCSD, SDSC)
Co-PIs Tajana Rosing (UCSD), Thomas DeFanti (UCSD),
Mahidhar Tatineni (SDSC), Derek Weitzel (UNL)
https://nationalresearchplatform.org/
More Related Content

PPTX
Toward a National Research Platform to Enable Data-Intensive Open-Source Sci...
PPTX
The National Research Platform Enables a Growing Diversity of Users and Appl...
PPTX
The Pacific Research Platform- a High-Bandwidth Distributed Supercomputer
PPTX
PRP, CHASE-CI, TNRP and OSG
PPTX
Looking Back, Looking Forward NSF CI Funding 1985-2025
PPTX
Toward a National Research Platform to Enable Data-Intensive Computing
PPTX
The PRP and Its Applications
PPTX
Rise of AI/ML applications on the National Research Platform
Toward a National Research Platform to Enable Data-Intensive Open-Source Sci...
The National Research Platform Enables a Growing Diversity of Users and Appl...
The Pacific Research Platform- a High-Bandwidth Distributed Supercomputer
PRP, CHASE-CI, TNRP and OSG
Looking Back, Looking Forward NSF CI Funding 1985-2025
Toward a National Research Platform to Enable Data-Intensive Computing
The PRP and Its Applications
Rise of AI/ML applications on the National Research Platform

Similar to National Research Platform: Application Drivers (20)

PPTX
The Pacific Research Platform Enables Distributed Big-Data Machine-Learning
PPTX
The Pacific Research Platform Connects to CSU San Bernardino
PPTX
The Pacific Research Platform: The First Six Years
PPTX
The Pacific Research Platform
 Two Years In
PPTX
Creating a Science-Driven Big Data Superhighway
PPTX
The Pacific Research Platform
PPTX
National Federated Compute Platforms: The Pacific Research Platform
PDF
GRP 19 - Nautilus, IceCube and LIGO
PPTX
Toward a National Research Platform
PPTX
PRP, NRP, GRP & the Path Forward
PPTX
The Pacific Research Platform: Building a Distributed Big Data Machine Learni...
PPTX
The PRP and Its Applications - Nautilus and the National Research Platform
PPTX
Integrated Optical Fiber/Wireless Systems for Environmental Monitoring
PPTX
Berkeley cloud computing meetup may 2020
PPTX
Security Challenges and the Pacific Research Platform
PPTX
The Pacific Research Platform: A Regional-Scale Big Data Analytics Cyberinfra...
PPTX
The Pacific Research Platform: Leading Up to the National Research Platform
PPTX
The Pacific Research Platform: Building a Distributed Big-Data Machine-Learni...
PPTX
Global Research Platforms: Past, Present, Future
PPTX
Advanced Global-Scale Networking Supporting Data-Intensive Artificial Intelli...
The Pacific Research Platform Enables Distributed Big-Data Machine-Learning
The Pacific Research Platform Connects to CSU San Bernardino
The Pacific Research Platform: The First Six Years
The Pacific Research Platform
 Two Years In
Creating a Science-Driven Big Data Superhighway
The Pacific Research Platform
National Federated Compute Platforms: The Pacific Research Platform
GRP 19 - Nautilus, IceCube and LIGO
Toward a National Research Platform
PRP, NRP, GRP & the Path Forward
The Pacific Research Platform: Building a Distributed Big Data Machine Learni...
The PRP and Its Applications - Nautilus and the National Research Platform
Integrated Optical Fiber/Wireless Systems for Environmental Monitoring
Berkeley cloud computing meetup may 2020
Security Challenges and the Pacific Research Platform
The Pacific Research Platform: A Regional-Scale Big Data Analytics Cyberinfra...
The Pacific Research Platform: Leading Up to the National Research Platform
The Pacific Research Platform: Building a Distributed Big-Data Machine-Learni...
Global Research Platforms: Past, Present, Future
Advanced Global-Scale Networking Supporting Data-Intensive Artificial Intelli...
Ad

More from Larry Smarr (20)

PPTX
Revealing the Dynamics of an Individual’s Gut Microbiome Dynamics
PPTX
Smart Patients, Big Data, NextGen Primary Care
PPTX
Internet2 and QUILT Initiatives with Regional Networks -6NRP Larry Smarr and ...
PPTX
Internet2 and QUILT Initiatives with Regional Networks -6NRP Larry Smarr and ...
PPT
From Supercomputing to the Grid - Larry Smarr
PPTX
The CENIC-AI Resource - Los Angeles Community College District (LACCD)
PPT
Redefining Collaboration through Groupware - From Groupware to Societyware
PPT
The Coming of the Grid - September 8-10,1997
PPT
Supercomputers: Directions in Technology, Architecture, and Applications
PPT
High Performance Geographic Information Systems
PPT
Data Intensive Applications at UCSD: Driving a Campus Research Cyberinfrastru...
PPT
Enhanced Telepresence and Green IT — The Next Evolution in the Internet
PPTX
The CENIC AI Resource CENIC AIR - CENIC Retreat 2024
PPTX
The CENIC-AI Resource: The Right Connection
PPTX
The NSF Grants Leading Up to CHASE-CI ENS
PPTX
Digital Twins of Physical Reality - Future in Review
PPTX
Larry Smarr’s Prostate Cancer Early Detection and Focal Therapy
PPTX
The Increasing Use of the National Research Platform by the CSU Campuses
PPTX
The CENIC-Connected Cyberinfrastructure Commons: Enabling AI for Research and...
PPTX
The Rise of Supernetwork Data Intensive Computing
Revealing the Dynamics of an Individual’s Gut Microbiome Dynamics
Smart Patients, Big Data, NextGen Primary Care
Internet2 and QUILT Initiatives with Regional Networks -6NRP Larry Smarr and ...
Internet2 and QUILT Initiatives with Regional Networks -6NRP Larry Smarr and ...
From Supercomputing to the Grid - Larry Smarr
The CENIC-AI Resource - Los Angeles Community College District (LACCD)
Redefining Collaboration through Groupware - From Groupware to Societyware
The Coming of the Grid - September 8-10,1997
Supercomputers: Directions in Technology, Architecture, and Applications
High Performance Geographic Information Systems
Data Intensive Applications at UCSD: Driving a Campus Research Cyberinfrastru...
Enhanced Telepresence and Green IT — The Next Evolution in the Internet
The CENIC AI Resource CENIC AIR - CENIC Retreat 2024
The CENIC-AI Resource: The Right Connection
The NSF Grants Leading Up to CHASE-CI ENS
Digital Twins of Physical Reality - Future in Review
Larry Smarr’s Prostate Cancer Early Detection and Focal Therapy
The Increasing Use of the National Research Platform by the CSU Campuses
The CENIC-Connected Cyberinfrastructure Commons: Enabling AI for Research and...
The Rise of Supernetwork Data Intensive Computing
Ad

Recently uploaded (20)

PPTX
AI-Reporting for Emerging Technologies(BS Computer Engineering)
PPTX
Management Information system : MIS-e-Business Systems.pptx
PDF
UEFA_Carbon_Footprint_Calculator_Methology_2.0.pdf
PDF
distributed database system" (DDBS) is often used to refer to both the distri...
PPTX
Wireless sensor networks (WSN) SRM unit 2
PDF
Design of Material Handling Equipment Lecture Note
PDF
Unit I -OPERATING SYSTEMS_SRM_KATTANKULATHUR.pptx.pdf
PDF
AIGA 012_04 Cleaning of equipment for oxygen service_reformat Jan 12.pdf
PDF
August 2025 - Top 10 Read Articles in Network Security & Its Applications
PDF
MLpara ingenieira CIVIL, meca Y AMBIENTAL
PDF
Beginners-Guide-to-Artificial-Intelligence.pdf
PPTX
Software Engineering and software moduleing
PPTX
A Brief Introduction to IoT- Smart Objects: The "Things" in IoT
PPTX
CT Generations and Image Reconstruction methods
PDF
Computer System Architecture 3rd Edition-M Morris Mano.pdf
PPTX
Principal presentation for NAAC (1).pptx
PDF
Present and Future of Systems Engineering: Air Combat Systems
DOCX
ENVIRONMENTAL PROTECTION AND MANAGEMENT (18CVL756)
PPTX
Unit_1_introduction to surveying for diploma.pptx
PDF
LOW POWER CLASS AB SI POWER AMPLIFIER FOR WIRELESS MEDICAL SENSOR NETWORK
AI-Reporting for Emerging Technologies(BS Computer Engineering)
Management Information system : MIS-e-Business Systems.pptx
UEFA_Carbon_Footprint_Calculator_Methology_2.0.pdf
distributed database system" (DDBS) is often used to refer to both the distri...
Wireless sensor networks (WSN) SRM unit 2
Design of Material Handling Equipment Lecture Note
Unit I -OPERATING SYSTEMS_SRM_KATTANKULATHUR.pptx.pdf
AIGA 012_04 Cleaning of equipment for oxygen service_reformat Jan 12.pdf
August 2025 - Top 10 Read Articles in Network Security & Its Applications
MLpara ingenieira CIVIL, meca Y AMBIENTAL
Beginners-Guide-to-Artificial-Intelligence.pdf
Software Engineering and software moduleing
A Brief Introduction to IoT- Smart Objects: The "Things" in IoT
CT Generations and Image Reconstruction methods
Computer System Architecture 3rd Edition-M Morris Mano.pdf
Principal presentation for NAAC (1).pptx
Present and Future of Systems Engineering: Air Combat Systems
ENVIRONMENTAL PROTECTION AND MANAGEMENT (18CVL756)
Unit_1_introduction to surveying for diploma.pptx
LOW POWER CLASS AB SI POWER AMPLIFIER FOR WIRELESS MEDICAL SENSOR NETWORK

National Research Platform: Application Drivers

  • 1. “NRP Application Drivers” Presentation 4th National Research Platform (4NRP) Workshop February 9, 2023 1 Dr. Larry Smarr Founding Director Emeritus, California Institute for Telecommunications and Information Technology; Distinguished Professor Emeritus, Dept. of Computer Science and Engineering Jacobs School of Engineering, UCSD http://lsmarr.calit2.net
  • 2. Rotating Storage 4000 TB 2023: NRP’s Nautilus is a Multi-Institution National to Global Scale Hypercluster Connected by Optical Networks ~200 FIONAs on 25 Partner Campuses Networked Together at 10-100Gbps Feb 9, 2023
  • 3. Grafana Graphs Nautilus Namespaces Usage Calendar 2022 GPUs 900
  • 4. Grafana Graphs Nautilus Namespaces Usage Calendar 2022 CPU Cores 7,000
  • 5. 2022 Nautilus Namespace Users: Largest User is One Million Times Smallest! osg-opportunistic ucsd-haosulab osg-icecube ucsd-ravigroup cms-ml braingeneers Nautilus Namespaces Using >10 GPU-hrs/year Or >10 CPU-hrs/year wifire-quicfire I Will Look in Detail at the Namespaces in Red digits
  • 6. The New Pacific Research Platform Video Highlights 3 Different Applications Out of 800 Nautilus Namespace Projects Pacific Research Platform Video: https://nationalresearchplatform.org/media/pacific-research-platform-video/
  • 7. 2015 PRP Grant Was Science-Driven: Connecting Multi-Campus Application Teams and Devices Earth Sciences UC San Diego UCBerkeley UC Merced What Are The Largest 2022 PRP Users in Each Area?
  • 8. The Open Science Grid (OSG) Has Been Integrated With the PRP In aggregate ~ 200,000 Intel x86 cores used by ~400 projects Source: Frank Würthwein, OSG Exec Director; PRP co-PI; UCSD/SDSC OSG Federates ~100 Clusters Worldwide All OSG User Communities Use HTCondor for Resource Orchestration SDSC U.Chicago FNAL Caltech Distributed OSG Petabyte Storage Caches
  • 9. The Open Science Grid (OSG) Delivers to Over 50 Fields of Science 2.6 Billion Core-Hours Per Year of Distributed High Throughput Computing NCSA Delivered ~35,000 Core-Hours Per Year in 1990 https://gracc.opensciencegrid.org/dashboard/db/gracc-home CMS ATLAS PRP’s Nautilus Appears as Just Another OSG Resource
  • 10. Nautilus Namespace osg-opportunistic Supported a Wide Set of Applications As the Largest Consumer of CPU Core-Hours in 2022 3,500 Source: Igor Sfiligoi, SDSC 3.7 Million CPU Core-Hours Peaking at 3500 CPU Cores osg-opportunistic runs fully in low-priority mode, using only PRP CPU cycles that would otherwise be unused.
  • 12. Bringing Machine Learning to Particle Physics A new particle was discovered in 2012 The “holy grail” of the LHC program today is measurement of di-higgs production to infer the hhh coupling that determines the higgs potential 𝛌 Source: Frank Wuerthwein, SDSC
  • 13. ML Inference as a Service on NRP 13 Raghav Kansal (grad. Stud. UCSD) runs ~1,000 CPU jobs calling out to ~10 GPUs on NRP for inference for his ML model for hh search. 80M events inferenced, sending 1.3TB of data from CPUs to GPUs in 3h The ML model is too large to fit into the DRAM of the CPUs. Fastest way to get the job done is “ML Inference as a service” on NRP ~4MB/s output from GPUs ~200MB/s input to GPUs See Talk by Shih-Chieh Hsu 4NRP Friday Source: Frank Wuerthwein, SDSC
  • 14. Namespace cms-ml Was the 4th Largest Consumer of Nautilus GPU-Hours in 2022 157,571 GPU-Hours Peaking at 130 GPU PI Frank Wuerthwein, UCSD
  • 16. Co-Existence of Interactive and Non-Interactive Computing on PRP GPU Simulations Needed to Improve Ice Model. => Results in Significant Improvement in Pointing Resolution for Multi-Messenger Astrophysics NSF Large-Scale Observatories Are Using PRP and OSG as a Cohesive, Federated, National-Scale Research Data Infrastructure IceCube Peaked at 560 GPUs in 2022!
  • 17. Namespace osg-icecube Was the Largest Consumer of Nautilus GPU-Hours in 2022 0.8 Million GPU-Hours Peaking at 560 GPUs osg-icecube also runs fully in low-priority mode, using only PRP GPU cycles that would otherwise be unused. OSG GPU Consumers OSG GPU Providers In 2022 Icecube was the Largest consumer of OSG GPU-Hours and PRP was the Largest Supplier of GPU-Hours to OSG https://gracc.opensciencegrid.org/d/ujFlp3vVz/gpu-payload-jobs
  • 18. Laser Interferometer Gravitational-Wave Observatory (LIGO) Uses Nautilus/OSG Data Cyberinfrastructure • LIGO Runs Their Production Rucio Data Management System on Nautilus – Rucio is the De-Facto Data Management System for Many Large Instruments, LIGO, LHC, … – LIGO Continues to be One of the Major Users of the OSG Caching Infrastructure (A.K.A. Stashcache), Which is Deployed Mostly as PRP-Managed Kubernetes Pods. • LIGO Does Not Use Much PRP Compute Given Their Dedicated Infrastructure
  • 19. PRP Supports Radio Telescope Through Partnering with CASPER: the Collaboration for Astronomy Signal Processing and Electronics Research PRP Access Has Allowed CASPER to Expand in Several Aspects: • PRP Portal to CASPER Tools/Libraries Was Developed by PRP’s John Graham • The PRP Team Added FPGAs to Nautilus FIONAs with the CASPER Software Stack • Nautilus JupyterHub Used for FPGA Training • Optical Fiber Connected Data Storage Source: Dan Werthimer SETI Chief Scientist, UC Berkeley SETI.berkeley.edu, CASPER.berkeley.edu Xilinx, Intel, Fujitsu, HP, Nvidia, NSF, NASA, NRAO, NAIC The CASPER Collaboration of ~1000 Members and 50 Radio-Astronomy Instruments Worldwide to Develop Open-Source Signal Processing and Instrumentation Pipelines, Primarily using FPGAs and GPUs. Radio Telescopes include: • Event Horizon Telescope • Square Kilometer Array • Very Large Array https://casper.berkeley.edu/
  • 20. PRP Portal to CASPER Tools/Libraries Developed by PRP’s John Graham, UCSD See John Graham’s CASPER 2021 Workshop Talk and Tutorial: https://casper.berkeley.edu/index.php/casper-workshop-2021/agenda/ CASPER designs, compiles, tests and evaluates instrumentation on the PRP, then deploys dedicated FPGA and GPU clusters at the observatories
  • 21. Discoveries Made with CASPER-Enabled Instrumentation Radio Image of a Black Hole Fast Radio Bursts Weighing the Universe Pulsar Timing Gravitational Waves Diamond Planet Protheses Control Neutron Imaging Source: Dan Werthimer, UC Berkeley
  • 23. OpenForceField Uses OPEN Software, OPEN Data, OPEN Science and PRP to Generate Quantum Chemistry Datasets for Druglike Molecules www.openforcefield. OFF Open-Source Models are Used in Drug Discovery, Including in the COVID-19 Computing on Folding@Home.
  • 24. OFF Runs Quantum Mechanical Computations on Many Molecules to Determine Their Optimized Force Fields
  • 25. 50% of OFF compute is run on Nautilus. PRP is Capable of Running Millions of Quantum Chemistry Workloads www.openforcefield.org OpenFF-1.0.0 released OpenFF-2.0.0 released OpenFF begins using Nautilus We run "workers" that pull down QC jobs for computation from a central project queue. These jobs require between minutes and hours, and results are uploaded to the central, public QCArchive server. Workers are deployed from Docker images and scheduled on PRP's Kubernetes system. Due to the short job duration, these deployments can still be effective if interrupted every few hours.
  • 26. OFF Was the Top Nautilus CPU Core Consumer in 2020 & 2021, 4th Highest in 2022 7.6 Million CPU Core-Hours (2020-2022) Peaking at 1300 CPU Cores OFF Datasets Consist of Hundreds to Millions of Jobs, Each Requiring Tens to Thousands of CPU-Hours and 8-32 GB of RAM
  • 27. Dataset listing: https://qcarchive.molssi.org/apps/ml_datasets/ Python example notebooks for data access: https://qcarchive.molssi.org/examples/ OpenFF’s dataset lifecycle: https://github.com/openforcefield/qca-dataset-submission/projects/1 The OFF Datasets on QCArchive are Fully Open!
  • 28. Nautilus Namespace tempredict Utilized PRP to Compute COVID-19 and Vaccine Responses ~65K Participants Purawat et al., IEEE Big Data, 2021 Mason et al., Sci Rep, 2021 Mason et al., Vaccines, 2022 Source: Prof. Benjamin Smarr, UCSD
  • 29. Nautilus Namespace braingeneers: One of the Most Advanced PRP projects - Uses Optical Fiber Connected Shared Storage, CPUs & GPUs https://cenic.org/blog/prp-boosts-inter-campus-collaboration-on-brain-research
  • 30. UCSC/Hengenlab Data Analysis Pipeline Using PRP Hengenlab UWSL PRP/S3 Results PRP Compute CNN Source: David Parks, UCSC; braingeneers PI David Haussler
  • 31. Multiple Worker Processes Circulate Data in a 50GB Cache Sampling Strategy for braingeneers TB+ data PRP/S3 PRP Compute Jobs Local NVMe Model Training Operates on the Local Cache Results are Returned to S3 Source: David Parks, UCSC; braingeneers PI David Haussler
  • 32. UCSC, UCSF & WUSL Are Collaborating To Grow Human Cerebral Organoids and Measure Their Neural Activity Tetrodes Multi Electrode Array Silicon Probes Source: David Parks, UCSC; braingeneers PI David Haussler
  • 33. Goal: For Every Human Brain Slice, Grow 1000 Organoids, And For Every Organoid, Compute 1000 Simulated Organoids From Neural Activity in Living Mouse Brain Human To Neural Activity in Human Brain Organoids Source: David Parks, UCSC; braingeneers PI David Haussler
  • 34. Nautilus Namespace braingeneers Was the 3rd Largest Consumer of CPU Core-Hours in 2022 950,000 CPU Core-Hours Peaking at 2000 CPU Cores 57,000 GPU-Hours Peaking at 110 GPUs https://braingeneers.ucsc.edu/
  • 35. NeuroKube: An Automated Neuroscience Reconstruction Framework Uses Nautilus for Large-Scale Processing & Labeling of Neuroimage Volumes Figures 2, 4, & 5 in “NeuroKube: An Automated and Autoscaling Neuroimaging Reconstruction Framework Using Cloud Native Computing and A.I.,” Matthew Madany, et al. (IEEE Big Data ’20, pp. 320-330)
  • 36. Computer Vision-Based Approach Provides the Potential to Automatically Generate Labels Using ML Subset of Neurites from Cerebellum Neuropil Extracted & Rendered in 3D with Structures of Interest Labeled Figures 1 & 14 in “NeuroKube: An Automated and Autoscaling Neuroimaging Reconstruction Framework Using Cloud Native Computing and A.I.,” Matthew Madany, et al. (IEEE Big Data ’20, pp. 320-330) Volumetric Electron Microscopy (VEM) Data with Colorized Labels
  • 38. NSF-Funded WIFIRE Uses PRP/CENIC to Couple Wireless Edge Sensors With Supercomputers, Enabling Fire Modeling Workflows: The Workflow Combines Landscape Data, Real-Time Meteorological Sensors, and Weather Forecasts on PRP to Produce Fire Perimeters in the WIFIRE Firemap Source: Ilkay Altintas, SDSC
  • 39. WIFIRE’s Firemap Provides Public Website Combining Satellite Fire Detections with GIS SoCal Wildfires Sept 6, 2022
  • 40. PRP is Building on NSF-Funded SAGE Technology to Bring ML/AI to the Edge for Smoke Plume Detection Source: Charlie Catlett, Pete Beckman, Argonne National Lab Source: Ilkay Altintas, SDSC, HDSI Training Data: Archive of 25,000 Labeled Wireless Camera Images of Wildland Fires www.mdpi.com/2072-4292/14/4/1007 PRP namespace digits
  • 41. Nautilus Namespace wifire-quicfire was the 25th Largest 2022 Consumer of CPU Core-Hours; digits was the 14th Largest GPU Consumer wifire-quicfire 108,000 CPU Core-Hours Peaking at 360 CPU Cores digits 40,700 GPU-Hours Peaking at 18 GPUs
  • 43. 2017: PRP 20Gbps Connection of UCSD SunCAVE and UCM WAVE Over CENIC 2018-2019: Added Their 90 GPUs to PRP for Machine Learning Computations Leveraging UCM Campus Funds and NSF CNS-1456638 & CNS-1730158 at UCSD UC Merced WAVE (20 Screens, 20 GPUs) UCSD SunCAVE (70 Screens, 70 GPUs) See These VR Facilities in Action in the PRP Video
  • 44. PRP Has Been Bringing Machine Learning to Building Virtual Worlds, Including Robotics and Autonomous Vehicles • Goal: Train Robots That Can Manipulate Arbitrary Objects o Open Drawer, Turn Faucet, Stack Cube, Pull Chair, Pour Water, Pick And Place, Hang Ropes, Make Dough, … (video) UCSD’s Hao Su Lab Uses NRP GPUs for 3D Deep Learning and Embodied AI (Robotics and Self-Driving Cars)
  • 45. Namespace ucsd-haosulab Consumed the 2nd Most Nautilus GPU-Hours in 2022 (the 1st is IceCube) 585,170 GPU-Hours Peaking at 150 GPUs
  • 46. A Major Project in UCSD’s Hao Su Lab is Large-Scale Robot Learning • We Build A Digital Twin of The Real World in Virtual Reality (VR) For Object Manipulation • Agents Evolve In VR o Specialists (Neural Nets) Learn Specific Skills by Trial and Error o Generalists (Neural Nets) Distill Knowledge to Solve Arbitrary Tasks • On Nautilus: o Hundreds of specialists have been trained o Each specialist is trained in millions of environment variants o ~10,000 GPU hours per run Source: Prof. Hao Su, UCSD
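The specialist-to-generalist distillation idea above can be illustrated with a deliberately tiny sketch: rule-based "specialists" stand in for trained per-task experts, and a single logistic model plays the "generalist" that is fit to imitate their decisions. Every name and model here is invented for illustration and is not the Su Lab's actual code.

```python
import math
import random

random.seed(0)

def specialist_action(task_id, obs):
    """Stand-in for a trained per-task expert: a fixed threshold rule per task."""
    return 1 if obs * (task_id + 1) > 0.5 else 0

# Generalist: one logistic model over (obs, task_id) shared across all tasks.
w = [0.0, 0.0, 0.0]  # weights for [obs, task_id, bias]

def predict(obs, task_id):
    z = w[0] * obs + w[1] * task_id + w[2]
    return 1.0 / (1.0 + math.exp(-z))

# Distill: regress the generalist onto specialist-labeled actions across tasks.
data = [(random.random(), t) for t in range(3) for _ in range(200)]
for epoch in range(50):
    for obs, t in data:
        y = specialist_action(t, obs)      # specialist provides the label
        g = predict(obs, t) - y            # gradient of the log-loss
        lr = 0.1
        w[0] -= lr * g * obs
        w[1] -= lr * g * t
        w[2] -= lr * g

# After distillation the generalist should mostly agree with the specialists.
agree = sum(
    (predict(obs, t) > 0.5) == bool(specialist_action(t, obs))
    for obs, t in data
) / len(data)
print(round(agree, 2))
```

The same shape scales up in the real setting: many expensive specialist runs (the ~10,000 GPU-hour trainings) produce behavior that a single generalist network is trained to reproduce across task variants.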
  • 47. UCSD’s Ravi Group: How to Create Visually Realistic 3D Objects or Dynamic Scenes in VR or the Metaverse Source: Prof. Ravi Ramamoorthi, UCSD Machine Learning Using NRP GPUs Transforms a Series of 2D Images Into a 3D View Synthesis
  • 48. Machine Learning-Based Neural Radiance Fields for View Synthesis (NeRFs) Are Transformational! BY JARED LINDZON NOVEMBER 10, 2022 A neural radiance field (NeRF) is a fully-connected neural network that can generate novel views of complex 3D scenes, based on a partial set of 2D images. https://datagen.tech/guides/synthetic-data/neural-radiance-field-nerf/ Source: Prof. Ravi Ramamoorthi, UCSD https://youtu.be/hvfV-iGwYX8
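The quoted definition can be made concrete with a toy, untrained stand-in: a fully-connected network that maps a 5D input (3D position plus a 2D viewing direction) through the sin/cos positional encoding used in the NeRF paper to a color-plus-density output. The layer sizes and random weights here are arbitrary; this shows only the data flow, not a trained NeRF.

```python
import numpy as np

rng = np.random.default_rng(0)

def positional_encoding(x, n_freqs=4):
    """Sin/cos features at octave frequencies, as in the NeRF formulation."""
    feats = [x]
    for k in range(n_freqs):
        feats.append(np.sin((2.0 ** k) * np.pi * x))
        feats.append(np.cos((2.0 ** k) * np.pi * x))
    return np.concatenate(feats, axis=-1)

def mlp(x, sizes=(64, 64, 4)):
    """Tiny fully-connected net with random (untrained) weights."""
    h = x
    for i, out in enumerate(sizes):
        W = rng.standard_normal((h.shape[-1], out)) * 0.1
        h = h @ W
        if i < len(sizes) - 1:
            h = np.maximum(h, 0.0)  # ReLU
    return h

# Query one batch of samples: 3D points plus 2D view directions (5D total).
xyz = rng.uniform(-1, 1, size=(8, 3))
view = rng.uniform(-1, 1, size=(8, 2))
inp = positional_encoding(np.concatenate([xyz, view], axis=-1))
out = mlp(inp)                       # per-sample RGB (3) + volume density (1)
rgb, sigma = out[:, :3], out[:, 3]
print(inp.shape, out.shape)
```

A real NeRF trains such a network per scene by volume-rendering these outputs along camera rays and matching the rendered pixels to the captured 2D images, which is what makes the training costs on the following slide so large.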
  • 49. Namespace ucsd-ravigroup Consumed the 3rd Most Nautilus GPU-Hours in 2022 200,000 GPU-Hours Peaking at 122 GPUs • Much of the compute involves training computationally expensive NeRFs. • Training time to learn a representation of a single scene on a GPU can vary from seconds to a day. • NeRFs that can see behind occlusions may require a week of training on 8 GPUs simultaneously. Source: Alexander Trevithick, UCSD Ravi Group
  • 50. 2022-2026 NRP Future: PRP Federates with NSF-Funded Prototype National Research Platform NSF Award OAC #2112167 (June 2021) [$5M Over 5 Years] PI Frank Wuerthwein (UCSD, SDSC) Co-PIs Tajana Rosing (UCSD), Thomas DeFanti (UCSD), Mahidhar Tatineni (SDSC), Derek Weitzel (UNL)

Editor's Notes

  • #24: PRP is used heavily in this force field creation workflow during the QC data generation stage.
  • #25: This is all a pretty tall task, but fortunately it’s mostly a solved problem. We use QCArchive, a project out of the Molecular Sciences Software Institute (MolSSI) at Virginia Tech. It’s basically a public archive of QM calculations, but it also includes a lot of infrastructure for generating new data, including talking to different QM engines via a unified interface (QCEngine). Two of the key contributors to the project (Daniel Smith and Lori Burns) gave a SciPy talk about it a few years ago; the scale of the project has grown and the backend has been partially rewritten since then, but the talk holds up today. QCArchive handles most of the hard stuff (storing results in a database, running QM calculations with Psi4), and we built a tool, QCSubmit, that makes our communication with it a little easier, since there’s some pre-processing we need to do before sending jobs off to QM. QCSubmit elegantly handles the tasks of “I have a bunch of molecules, please run QM calculations on these” and, later, “please fetch the QM calculations on these molecules.” Some of our calculations are run by grad students and postdocs running “QC managers” as cluster jobs at their respective universities, but roughly half of our total compute runs on Nautilus. It’s a uniquely suitable compute backend to pair with the NSF-funded MolSSI QCArchive project, which was designed to take advantage of preemptible compute. Our datasets consist of hundreds to millions of jobs, each requiring tens to thousands of CPU-hours and 8-32 GB of RAM.
  • #33: The current approach in neuroscience will reach diminishing returns because it will not be economically feasible to scale it, or to test enough genetic manipulations to figure out how things actually work. It will remain a cottage industry. Organoids are modular, scalable and experimentally tractable. Human cortical slice image from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3564513/ should be replaced with one of our own if possible, or at least something better