Duncan Poole, NVIDIA
ISC 2018

NVIDIA POWERS WORLD'S FASTEST SUPERCOMPUTER
Summit Becomes First System To Scale The 100 Petaflops Milestone
• 27,648 Volta Tensor Core GPUs
• HPC: 122 PF
• AI: 3 EF

NVIDIA POWERS FASTEST SUPERCOMPUTERS IN US, EUROPE, JAPAN, INDUSTRY
17 of World’s 20 Most Energy-efficient Supercomputers
• Piz Daint: Europe’s Fastest | 5,320 GPUs | 20 PF
• ORNL Summit: World’s Fastest | 27,648 GPUs | 122 PF
• ABCI: Japan’s Fastest | 4,352 GPUs | 20 PF
• ENI HPC4: Fastest Industrial | 3,200 GPUs | 12 PF
• LLNL Sierra: US 2nd Fastest | 17,280 GPUs | 72 PF
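
As a rough way to compare the five systems listed above, the sketch below divides each quoted petaflops figure by its GPU count. The ratio is only an illustration: the quoted figures cover the whole machine (CPUs included), so this is not a per-GPU measurement, and the helper names are hypothetical.

```python
# Figures copied from the slide: (system, GPU count, quoted petaflops).
SYSTEMS = [
    ("Piz Daint", 5_320, 20),
    ("ORNL Summit", 27_648, 122),
    ("ABCI", 4_352, 20),
    ("ENI HPC4", 3_200, 12),
    ("LLNL Sierra", 17_280, 72),
]


def teraflops_per_gpu(gpus, petaflops):
    """Quoted system petaflops divided by GPU count, expressed in teraflops."""
    return petaflops * 1000.0 / gpus


def summarize(systems=SYSTEMS):
    """Return (name, TF per GPU) pairs, highest ratio first."""
    rows = [(name, teraflops_per_gpu(gpus, pf)) for name, gpus, pf in systems]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for name, tf in summarize():
        print(f"{name:12s} {tf:5.2f} TF per GPU")
```

All five systems land between roughly 3.7 and 4.6 TF per GPU under this simple ratio.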

ALL TOP 15 APPLICATIONS ACCELERATED
• 550+ Applications Accelerated
8X CUDA DOWNLOADS
• 2012: 1M
• 2018: 8M
DEFINING THE NEXT GIANT WAVE IN HPC
• Oak Ridge Summit: World’s fastest supercomputer | 120+ Petaflop HPC; 3+ Exaflop of AI
• ABCI Supercomputer (AIST): Japan’s fastest AI supercomputer
• Piz Daint: Europe’s fastest supercomputer
MOST ADOPTED PLATFORM FOR ACCELERATING HPC
Number of GPU-Accelerated Apps
• 2014: 259
• 2015: 319
• 2016: 400
• 2017: 470
• 2018: 554
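
The growth figures on this slide can be turned into rates. The sketch below computes year-over-year growth and a compound annual growth rate from the quoted numbers; the function names are hypothetical, and the download counts are the rounded 1M and 8M values from the slide.

```python
# Values copied from the slide.
APPS_BY_YEAR = {2014: 259, 2015: 319, 2016: 400, 2017: 470, 2018: 554}
CUDA_DOWNLOADS = {2012: 1_000_000, 2018: 8_000_000}


def yearly_growth(series):
    """Fractional growth for each year relative to the previous listed year."""
    years = sorted(series)
    return {
        year: series[year] / series[prev] - 1.0
        for prev, year in zip(years, years[1:])
    }


def cagr(series):
    """Compound annual growth rate between the first and last listed year."""
    years = sorted(series)
    first, last = years[0], years[-1]
    return (series[last] / series[first]) ** (1.0 / (last - first)) - 1.0


if __name__ == "__main__":
    for year, growth in yearly_growth(APPS_BY_YEAR).items():
        print(f"{year}: {growth:+.1%} accelerated apps")
    print(f"Apps CAGR 2014-2018:      {cagr(APPS_BY_YEAR):.1%}")
    print(f"Downloads CAGR 2012-2018: {cagr(CUDA_DOWNLOADS):.1%}")
```

An 8X increase over six years works out to about 41% per year, since 8^(1/6) is the square root of 2.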

NVIDIA TESLA PLATFORM
World’s Leading Data Center Platform for Accelerating HPC and AI
CUSTOMER USECASES
• Consumer Internet: Speech, Translate, Recommender
• Enterprise Applications: Manufacturing, Healthcare, Engineering
• Supercomputing: Molecular Simulations, Weather Forecasting, Seismic Mapping
INDUSTRY FRAMEWORKS & APPLICATIONS
• Amber, NAMD, LAMMPS, CHROMA
• +550 Applications
NVIDIA SDK & LIBRARIES
• cuBLAS, cuDNN, cuFFT, cuRAND, cuSPARSE, DeepStream, NCCL, TensorRT
• CUDA
TESLA GPUs & SYSTEMS
• Tesla GPU, NVIDIA DGX Family, NVIDIA HGX, System OEM, Cloud

END-TO-END PRODUCT FAMILY
HPC/TRAINING
• Desktop: TITAN, Quadro, TITAN V
• Workstation: DGX Station
• Data Center: Tesla P100, Tesla V100, V100 PCIE
• Fully Integrated AI Systems: DGX-1, DGX-2
INFERENCE
• Embedded: Jetson TX1
• Automotive: Drive PX2
• Data Center: Tesla P4

GPU-ACCELERATED SERVER PLATFORM
V100 32GB SERVERS AVAILABLE FROM OEMS
SCX-E4 › 4x V100 NVLINK
• Dell EMC: PowerEdge C4140
• Fujitsu: Primergy CX400 M4
• IBM: Power Systems AC922*
• Supermicro: SYS-1028GQ-TVRT
SCX-E3 › 8x V100 PCIE
• HPE: Apollo 6500 (XL270d Gen10)
SCX-E2 › 4x V100 PCIE
• Dell EMC: PowerEdge C4140, PowerEdge T640, PowerEdge R940xa
• Fujitsu: Primergy CX400 M4
• Lenovo: SD530/D2
• Supermicro: SYS-1029GQ
SCX-E1 › 2x V100 PCIE
• Dell EMC: PowerEdge R840, PowerEdge R740xd*, PowerEdge R740*
• Fujitsu: Primergy RX2540 M4
• Lenovo: SD650
HGX-T1 › 8x V100 NVLINK
• HPE: Apollo 6500 (XL270d Gen10)
• Supermicro: SYS-4029GP-TVRT
*reduced GPU configuration
Server Catalog

TESLA V100
Available From All Major OEMs
Form Factor | SXM2 | PCIe
Performance | 7.8TF DP, 15.7TF SP, 125TF FP16 | 7TF DP, 14TF SP, 112TF FP16
Memory Size | 16GB/32GB HBM2 | 16GB/32GB HBM2
Memory Bandwidth | 900GB/s | 900GB/s
GPU Peer to Peer | NVLink (up to 300 GB/s) + PCIe Gen3 (up to 32 GB/s) | PCIe Gen3 (up to 32 GB/s)
Power | 300W | 250W
Part numbers: SXM2 32GB P/N = 900-2G503-0010-000, PCIE 32GB P/N = 900-2G500-0010-000, SXM2 16GB P/N = 900-2G503-0000-000, PCIE 16GB P/N = 900-2G500-0000-000
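
To make the trade-off between the two form factors concrete, the sketch below derives two illustrative ratios from the quoted specifications: peak double-precision GFLOPS per watt, and the ideal time to move a buffer between GPUs at the quoted peer-to-peer peak. Both are best-case figures built from "up to" numbers, and the dictionary layout and function names are assumptions made for this example.

```python
# Peak figures copied from the slide; p2p_gb_s is the fastest quoted
# GPU peer-to-peer path for each form factor.
V100 = {
    "SXM2": {"dp_tf": 7.8, "sp_tf": 15.7, "fp16_tf": 125.0,
             "power_w": 300, "p2p_gb_s": 300.0},
    "PCIe": {"dp_tf": 7.0, "sp_tf": 14.0, "fp16_tf": 112.0,
             "power_w": 250, "p2p_gb_s": 32.0},
}


def gflops_per_watt(form_factor, key="dp_tf"):
    """Peak GFLOPS per watt of board power for the chosen precision."""
    spec = V100[form_factor]
    return spec[key] * 1000.0 / spec["power_w"]


def ideal_transfer_seconds(gigabytes, form_factor):
    """Lower bound on GPU-to-GPU copy time at the quoted peak bandwidth."""
    return gigabytes / V100[form_factor]["p2p_gb_s"]


if __name__ == "__main__":
    for form_factor in V100:
        print(form_factor,
              f"{gflops_per_watt(form_factor):.1f} DP GFLOPS/W,",
              f"{ideal_transfer_seconds(16, form_factor):.3f} s to move 16 GB")
```

On these numbers the PCIe card has slightly higher peak efficiency per watt, while the SXM2 module has roughly nine times the peer-to-peer bandwidth.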

NVIDIA DGX-2 AND HGX-2

NVSWITCH
WORLD’S HIGHEST BANDWIDTH ON-NODE SWITCH
• 7.2 Terabits/sec or 900 GB/sec
• 18 NVLINK ports | 50GB/s per port bi-directional
• Fully-connected crossbar
• 2 billion transistors | 47.5mm x 47.5mm package
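
The two headline bandwidth figures follow from the port count and per-port rate. A minimal sketch of that arithmetic, assuming decimal units (1 TB = 1000 GB) and using hypothetical function names:

```python
# Values copied from the slide.
NVLINK_PORTS = 18
GB_PER_S_PER_PORT = 50  # bi-directional, per port


def aggregate_gb_per_s(ports=NVLINK_PORTS, per_port=GB_PER_S_PER_PORT):
    """Total switch bandwidth in gigabytes per second."""
    return ports * per_port


def gigabytes_to_terabits(gb_per_s):
    """Convert GB/s to Tb/s: 8 bits per byte, 1000 gigabits per terabit."""
    return gb_per_s * 8 / 1000


if __name__ == "__main__":
    total = aggregate_gb_per_s()
    print(f"{total} GB/s = {gigabytes_to_terabits(total):.1f} Tb/s")
```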

NVSWITCH ENABLES THE WORLD’S LARGEST GPU
• 16 Tesla V100 32GB Connected by New NVSwitch
• 2 petaFLOPS of DL Compute
• Unified 512GB HBM2 GPU Memory Space
• 300GB/sec Every GPU-to-GPU
• 2.4TB/sec of Total Cross-section Bandwidth

INTRODUCING NVIDIA DGX-2
THE WORLD’S MOST POWERFUL HPC SUPERNODE
THE LARGEST, FASTEST SHARED MEMORY SUPERNODE FOR THE MOST DIFFICULT HPC CHALLENGES
• 125 TFLOPS DP, 250 TF SP
• 512 GB shared memory
• 14.4 TB/s aggregate HBM BW
• 2.4 TB/s bisection BW
• 8x EDR network
• 30 TB SSD
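
The DGX-2 totals can be reproduced from the per-GPU Tesla V100 SXM2 figures quoted earlier in the deck, multiplied by 16 GPUs. The sketch below is one reading of how the numbers compose. In particular, treating bisection bandwidth as 8 GPUs times 300 GB/s is an assumption rather than something the slides state, and the slide rounds 124.8 and 251.2 to 125 and 250.

```python
GPUS = 16

# Per-GPU Tesla V100 SXM2 32GB figures quoted on earlier slides.
PER_GPU = {
    "dp_tf": 7.8,
    "sp_tf": 15.7,
    "fp16_tf": 125.0,
    "hbm_gb": 32,
    "hbm_bw_gb_s": 900,
    "nvlink_gb_s": 300,
}


def dgx2_totals(gpus=GPUS, per_gpu=PER_GPU):
    """Aggregate node-level figures from per-GPU peaks."""
    return {
        "dp_tf": gpus * per_gpu["dp_tf"],
        "sp_tf": gpus * per_gpu["sp_tf"],
        "dl_pf": gpus * per_gpu["fp16_tf"] / 1000.0,
        "hbm_gb": gpus * per_gpu["hbm_gb"],
        "hbm_bw_tb_s": gpus * per_gpu["hbm_bw_gb_s"] / 1000.0,
        # Assumption: half the GPUs talk to the other half at full NVLink rate.
        "bisection_tb_s": (gpus // 2) * per_gpu["nvlink_gb_s"] / 1000.0,
    }


if __name__ == "__main__":
    for name, value in dgx2_totals().items():
        print(f"{name:15s} {value:g}")
```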

INSIDE DGX-2: “WORLD’S LARGEST GPU”
• NVIDIA Tesla V100 32GB
• Two GPU Boards: 8 V100 32GB GPUs per board, 6 NVSwitches per board, 512GB Total HBM2 Memory, interconnected by Plane Card
• Twelve NVSwitches: 2.4 TB/sec bi-section bandwidth
• Eight EDR Infiniband/100 GigE: 1600 Gb/sec Total Bi-directional Bandwidth
• PCIe Switch Complex
• Two Intel Xeon Platinum CPUs
• 1.5 TB System Memory
• 30 TB NVME SSDs Internal Storage
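
The component counts on this slide are internally consistent, which a small model makes easy to check. The sketch below uses hypothetical class and field names and assumes each EDR InfiniBand/100 GigE port carries 100 Gb/s in each direction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuBoard:
    """One of the two GPU boards described on the slide."""
    gpus: int = 8
    nvswitches: int = 6
    hbm2_gb_per_gpu: int = 32


@dataclass(frozen=True)
class Dgx2:
    boards: tuple = (GpuBoard(), GpuBoard())
    nic_ports: int = 8
    nic_gbit_per_port: int = 100  # assumed per-direction rate per port

    @property
    def gpu_count(self):
        return sum(board.gpus for board in self.boards)

    @property
    def nvswitch_count(self):
        return sum(board.nvswitches for board in self.boards)

    @property
    def hbm2_gb(self):
        return sum(board.gpus * board.hbm2_gb_per_gpu for board in self.boards)

    @property
    def network_gbit_bidirectional(self):
        # Two directions per port.
        return self.nic_ports * self.nic_gbit_per_port * 2


if __name__ == "__main__":
    node = Dgx2()
    print(node.gpu_count, "GPUs,", node.nvswitch_count, "NVSwitches,",
          node.hbm2_gb, "GB HBM2,", node.network_gbit_bidirectional, "Gb/s network")
```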

OVER 2X HIGHER PERFORMANCE WITH NVSWITCH
Two DGX-1 (Volta) Compared to DGX-2 with NVSwitch
HPC
• Physics (MILC benchmark), 4D Grid: 2X FASTER
• Weather (ECMWF benchmark), All-to-all: 2.4X FASTER
AI Training
• Recommender (Sparse Embedding), Reduce & Broadcast: 2X FASTER
• Language Model (Transformer with MoE), All-to-all: 2.7X FASTER
Configuration: 2 HGX-1V servers have dual socket Xeon E5 2698v4 Processor, 8x V100 GPUs. Servers connected via 4X 100Gb IB ports (run on DGX-1) | HGX-2 server has dual-socket Xeon Platinum 8168 Processor, 16 V100 GPUs (run on DGX-2)
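
One way to summarize the four speedups in a single number is a geometric mean, which is the usual choice for averaging ratios. The sketch below assumes the four factors pair with the four benchmarks in the order listed on the slide; the dictionary keys are shortened labels chosen for this example.

```python
import math

# Speedup of DGX-2 over two DGX-1 systems, paired with benchmarks in slide order.
SPEEDUPS = {
    "Physics (MILC)": 2.0,
    "Weather (ECMWF)": 2.4,
    "Recommender (Sparse Embedding)": 2.0,
    "Language Model (Transformer with MoE)": 2.7,
}


def geometric_mean(values):
    """Geometric mean of positive ratios, computed in log space."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


if __name__ == "__main__":
    print(f"Geometric mean speedup: {geometric_mean(SPEEDUPS.values()):.2f}x")
    print(f"Range: {min(SPEEDUPS.values())}x to {max(SPEEDUPS.values())}x")
```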

TESLA HGX-2
FUSING HPC AND AI INTO ONE UNIFIED COMPUTING ARCHITECTURE
• Multi-precision Computing: 2 PFLOPS AI | 250 TFLOPS FP32 | 125 TFLOPS FP64
• 16 Tesla V100 GPUs | 0.5TB Memory | 2.4 TB/s
• Building Block for Partner Systems & DGX-2

NVIDIA SOFTWARE PLATFORM UPDATES

NVIDIA GPU CLOUD (NGC)
Simple Access to GPU-Accelerated Software
• Discover 35 Optimized Containers
• Deploy Applications In Minutes, Not Days
• Run Anywhere with Maximum Performance (GPU-Powered Cloud Servers, Workstations)
• Accelerate Time to Market

CONTAINERS SIMPLIFY APPLICATION DEPLOYMENTS
SHARED CLUSTER
• Base layers: DRIVERS + OPERATING SYSTEM, CONTAINER RUNTIME
• Containers, each with its own CUDA libraries: NAMD 2.12, VMD, GROMACS, NAMD 2.13
Benefits
• Environment modules simplified/eliminated
• Performance equivalent to bare metal
• Deploy applications in minutes
• Higher productivity for sys admins & users
• Portable on various systems
• Reproducible results

NGC CONTAINER REGISTRY
10 Containers at Launch, 35 Containers Today
• HPC: bigdft, candle, chroma, gamess, gromacs, lammps, lattice-microbes, MILC, namd, pgi, picongpu, relion, vmd
• Deep Learning: caffe, caffe2, cntk, cuda, digits, inferenceserver, mxnet, pytorch, tensorflow, tensorrt, theano, torch
• HPC Visualization: index, paraview-holodeck, paraview-index, paraview-optix
• Partners: chainer, h2oai-driverless, kinetica, mapd, paddlepaddle
• NVIDIA/K8s: Kubernetes on NVIDIA GPUs
*New Containers since SC 17
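
The "35 Containers Today" figure can be checked against the names listed on the slide. The grouping below is a reading of the slide's five category labels, not an official registry listing, and it counts "Kubernetes on NVIDIA GPUs" as a single entry.

```python
# Container names as listed on the slide, grouped by an assumed category mapping.
NGC_CONTAINERS = {
    "HPC": [
        "bigdft", "candle", "chroma", "gamess", "gromacs", "lammps",
        "lattice-microbes", "MILC", "namd", "pgi", "picongpu", "relion", "vmd",
    ],
    "Deep Learning": [
        "caffe", "caffe2", "cntk", "cuda", "digits", "inferenceserver",
        "mxnet", "pytorch", "tensorflow", "tensorrt", "theano", "torch",
    ],
    "HPC Visualization": [
        "index", "paraview-holodeck", "paraview-index", "paraview-optix",
    ],
    "Partners": [
        "chainer", "h2oai-driverless", "kinetica", "mapd", "paddlepaddle",
    ],
    "NVIDIA/K8s": ["Kubernetes on NVIDIA GPUs"],
}


def count_by_category(registry=NGC_CONTAINERS):
    """Number of listed containers in each category."""
    return {category: len(names) for category, names in registry.items()}


def total_containers(registry=NGC_CONTAINERS):
    """Total number of listed containers across all categories."""
    return sum(count_by_category(registry).values())


if __name__ == "__main__":
    for category, count in count_by_category().items():
        print(f"{category:18s} {count}")
    print(f"{'Total':18s} {total_containers()}")
```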

CUDA TOOLKIT 9.2
UNLEASHES POWER OF VOLTA
Optimized for Volta:
• Tensor Cores
• Second-Generation NVLink
• HBM2 Stacked Memory
COOPERATIVE THREAD GROUPS
• Flexible Thread Groups
• Efficient Parallel Algorithms
• Synchronize Across Thread Blocks in a Single GPU or Multi-GPUs
FASTER LIBRARIES
• RNN and CNN Optimizations (cuBLAS)
• >20x Faster Image Processing (NPP)
• Speed up FFT of prime size matrices (cuFFT)
DEVELOPER TOOLS & PLATFORM UPDATES
• CUTLASS 1.0 accelerates custom linear algebra algorithms
• 2x faster CUDA kernel launch
• New OS & Compiler Support
• Unified Memory Profiling
• NVLink Visualization

CUDA 9.2 PLATFORM SUPPORT
New OS and Host Compilers
PLATFORM | OS VERSION | COMPILERS
Windows | Windows Server 2016, 2012 R2 | Microsoft Visual Studio 2017 (15.6)
Linux | 16.04.4 LTS; 17.10 non; 7.5; 7.5 POWER LE; SLES 12 SP3; 27; Leap 42.3 | GCC 7.x, PGI 18.x, Clang 5.0.x, ICC 17, XLC 13.1.6 (POWER)
Mac | macOS 10.13.4 | Xcode 9.2

OPENACC GROWING MOMENTUM
Wide Adoption Across Key HPC Codes
Over 100 Apps* Using OpenACC
• ANSYS Fluent, Gaussian, VASP, LSDalton, MPAS, GAMERA, GTC, XGC, ACME, FLASH, COSMO, Numeca
VASP: Top Quantum Chemistry and Material Science Code
[Chart: silica IFPEN, RMM-DIIS on P100]
“For VASP, OpenACC is the way forward for GPU acceleration. Performance is similar to CUDA, and OpenACC dramatically decreases GPU development and maintenance efforts. We’re excited to collaborate with NVIDIA and PGI as an early adopter of Unified Memory.”
Prof. Georg Kresse, Computational Materials Physics, University of Vienna
* Applications in production and development

DCGM
NVIDIA Data Center GPU Manager
Active Health Monitoring
• Run-time health checks
• Prologue check: Quick health check of the GPU
• Epilogue check: online GPU diagnostic tests to determine root cause issues
Diagnostics & System Validation
• GPU Compute Performance
• Interconnect BW & Latency
• Power & Thermals
Policy Framework
• Assists in Recovery Action Automation
• Group Control over Power & Clock Policy
• Dynamic Page Retirement Policy
developer.nvidia.com/cuda-toolkit
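
The prologue and epilogue checks could be wired into a job scheduler with a small wrapper around the `dcgmi diag` command-line tool, as sketched below. The mapping of job phase to diagnostic run level (1 for a quick check, 3 for a longer one) and the pass/fail rule based on exit status are assumptions made for this example, not something the slide specifies.

```python
import shutil
import subprocess
from typing import List, Optional

# Assumed mapping from job phase to dcgmi diagnostic run level.
DIAG_LEVEL = {"prologue": 1, "epilogue": 3}


def diag_command(phase: str) -> List[str]:
    """Build the dcgmi command line for the given job phase."""
    return ["dcgmi", "diag", "-r", str(DIAG_LEVEL[phase])]


def run_check(phase: str) -> Optional[bool]:
    """Run the diagnostic for a phase.

    Returns None when dcgmi is not installed; otherwise True if the command
    exited with status 0 (assumed here to mean the check passed).
    """
    if shutil.which("dcgmi") is None:
        return None
    result = subprocess.run(diag_command(phase), capture_output=True, text=True)
    return result.returncode == 0


if __name__ == "__main__":
    print("prologue:", run_check("prologue"))
```

A scheduler hook would call `run_check("prologue")` before a job starts and `run_check("epilogue")` after it ends, draining the node when the result is False.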
Latest HPC News from NVIDIA
