HPC Environments for
Leading Edge Simulations
Greg Clifford
Manufacturing Segment Manager
clifford@cray.com
Topics:
● Cray, Inc. and some customer
examples
● Application scaling examples
● The Manufacturing segment is set
for a significant step forward in
performance
Cray Industry Solutions: Manufacturing, Earth Sciences, Energy, Life Sciences, Financial Services
Cray Inc. – 2013
Anything That Can Be Simulated Needs a Cray: Computation, Analysis, Storage/Data
Supercomputing Solutions to Match the Needs of the Application
● Capacity Focus (Highly Configurable Solutions) – Cray CS300 Series: Flexible Performance
● Capability Focus (Tightly Integrated Solutions) – Cray XC30 Series: Scalable Performance
Cray Specializes in Large Systems…
Over 45 PFs
in XE6 and XK7
Systems
Cray Higher-Ed Roundtable, July 22, 2013
New Clothes: NERSC - Edison
10/3/13
Running Large Jobs…
NERSC “Now Computing” Snapshot (taken Sept. 4th 2013)
WRF Hurricane Sandy Simulation on Blue
Waters
Cray Confidential
●  Initial analysis of the WRF output shows some very
striking features of Hurricane Sandy. The difference in level of detail
between the 3 km WRF simulation and the Blue Waters 500 m run is
apparent in these radar reflectivity results
3 km WRF results | Blue Waters 500 m WRF results
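A back-of-the-envelope sketch (my numbers and assumptions, not Cray's published figures) of why the 500 m run costs so much more than the 3 km run: refining the horizontal grid multiplies the point count by the square of the refinement ratio, and the CFL timestep constraint adds roughly another factor of the ratio.

```python
# Rough cost model for horizontal grid refinement in a weather code
# like WRF. Assumptions (mine, not from the slides): 2-D horizontal
# refinement only, and a timestep shrinking linearly with grid
# spacing per the CFL condition.

def refinement_cost_factor(coarse_dx_m: float, fine_dx_m: float) -> float:
    """Relative compute cost of the fine run vs. the coarse run."""
    ratio = coarse_dx_m / fine_dx_m
    # ratio^2 more grid points, times ratio more timesteps
    return ratio ** 2 * ratio

print(refinement_cost_factor(3000.0, 500.0))  # 216.0
```

By this estimate the 500 m simulation is on the order of 200 times the cost of the 3 km one, before any vertical refinement, which is why it needed a machine like Blue Waters.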
Cavity Flow Studies using HECToR (Cray XE6)
S. Lawson et al., University of Liverpool
●  1.1 billion grid-point model
●  Scaling to 24,000 cores
●  Good agreement between experiments and CFD
* Ref: http://www.hector.ac.uk/casestudies/ucav.php
CTH Shock Physics
CTH is a multi-material, large
deformation, strong shock wave, solid
mechanics code and is one of the most
heavily used computational structural
mechanics codes on DoD HPC
platforms.
“For large models, CTH will show linear scaling to over 10,000 cores.
We have not seen a limit to the scalability of the CTH application”
“A single parametric study can easily consume all of the ORNL
Jaguar resources”
CTH developer
Seismic Processing Compute Requirements

[Chart: compute demand in petaFLOPS (0.1 to 1000, log scale) vs. year (1995–2020), with seismic algorithm complexity increasing from asymptotic approximation imaging and paraxial isotropic/anisotropic imaging, through isotropic/anisotropic modeling, isotropic/anisotropic RTM, elastic modeling/RTM, isotropic/anisotropic FWI, visco-elastic modeling, and elastic FWI, to petro-elastic inversion and visco-elastic FWI; the one-petaflop line is marked.]

A petaflop-scale system is required to deliver the capability to
move to a new level of seismic imaging.
Compute Requirements in CAE

[Chart: simulation fidelity vs. compute environment, from a single run on a 16-core desktop or 100-core departmental cluster, through multiple runs on a 1000-core central compute cluster, up to design exploration, design optimization, and robust design on a >2000-core supercomputing environment.]

“Simulation allows engineers to know, not
guess – but only if IT can deliver dramatically
scaled-up infrastructure for mega
simulations….
1000s of cores per mega simulation”
CAE developer
CAE Application Workload
[Chart: CAE workload mix by application area – Impact/Crash (40%), CFD (30%), Structures (20%).]

The vast majority of large simulations are MPI parallel, and basically the same codes are used across all industries.
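The MPI-parallel codes mentioned above all start from domain decomposition: the model's elements are split across ranks as evenly as possible. A toy sketch (a generic illustration, not any ISV's actual partitioner):

```python
def partition(n_elements: int, n_ranks: int) -> list[range]:
    """Split n_elements as evenly as possible across n_ranks,
    the basic load-balancing step behind MPI-parallel CAE solvers."""
    base, extra = divmod(n_elements, n_ranks)
    parts, start = [], 0
    for rank in range(n_ranks):
        # the first `extra` ranks each take one leftover element
        count = base + (1 if rank < extra else 0)
        parts.append(range(start, start + count))
        start += count
    return parts

# A 70M-element model on 1000 ranks -> 70,000 elements per rank:
parts = partition(70_000_000, 1000)
print(len(parts[0]))  # 70000
```

Real solvers partition for minimal interface area (e.g. with graph partitioners), not just equal counts, but the load-balance goal is the same.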
CAE Workload status
●  ISV codes dominate the commercial CAE workload
●  Many large manufacturing companies have HPC systems with
well over 10,000 cores
●  Even in large organizations, very few jobs use more
than 256 MPI ranks
●  There is a huge discrepancy between the scalability in
production at large HPC centers and in the commercial
CAE environment
Why aren’t commercial CAE
environments leveraging scaling
for better performance?
Often the full power available is
not being leveraged
Innovations in the field of Combustion
Propagation of HPC to Commercial CAE (from early adoption to common in industry)

●  c. 1978: Cray-1, vector processing, serial
●  c. 1983: Cray X-MP, SMP, 2–4 cores
●  c. 1983: Cray X-MP, Convex – MSC/NASTRAN
●  c. 1988: Cray Y-MP, SGI – Crash
●  c. 1998: MPI parallel, “Linux cluster”, low density, slow interconnect, ~100 MPI ranks
●  c. 2003: high density, fast interconnect – Crash & CFD
●  c. 2007: extreme scalability, proprietary interconnect, 1000s of cores, requires “end-to-end parallel”
●  c. 2013: Cray XE6, driving apps: CFD, CEM, ???
Obstacles to extreme scalability using ISV CAE codes
1.  Most CAE environments are configured for capacity computing
—  Difficult to schedule 1000s of cores for one simulation
—  Simulation size and complexity are driven by the available compute resources
—  This will change as compute environments evolve
2.  Application license fees are an issue
—  Application costs can be 2–5 times the hardware costs
—  ISVs are encouraging scalable computing and are adjusting their
licensing models
3.  Applications must deliver “end-to-end” scalability
—  Amdahl’s Law requires the vast majority of the code to be parallel
—  This includes all of the features in a general-purpose ISV code
—  This is an active area of development for CAE ISVs
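The Amdahl's Law point can be made concrete with a few lines of arithmetic (a generic illustration, not measurements of any particular ISV code):

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup when only parallel_fraction of the
    runtime parallelizes (Amdahl's Law)."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

# Even a 99%-parallel code tops out below 100x regardless of core
# count, which is why "end-to-end" parallelism matters at 1000s of cores:
for p in (0.95, 0.99, 0.999):
    print(f"{p:.1%} parallel: {amdahl_speedup(p, 10_000):.0f}x on 10,000 cores")
```

A 95%-parallel solver saturates around 20x; getting useful speedup on thousands of cores requires parallelizing essentially every feature path in the code.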
Copyright © 2013 Altair Engineering, Inc. Proprietary and Confidential. All rights reserved.
Collaboration with Cray
•  Fine-tuned AcuSolve for maximum efficiency on Cray hardware
•  Using Cray MPI libraries
•  Efficient core placement
•  AcuSolve package built specifically for Cray’s Extreme Scalability
Mode (ESM), shipped for the first time in V12.0
•  Extensively tested the code for various Cray systems (XE6, XC30)
Version 12.0: More Scalable
•  Optimized domain decomposition for hybrid MPI/OpenMP
•  Added an MPI performance optimizer
•  Nearly perfect scalability seen down to ~4k nodes per subdomain
Parallel performance
on a Linux Cluster with
IB interconnect
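Scalability claims like "nearly perfect" are usually quantified as parallel efficiency: the measured speedup divided by the ideal speedup from the added cores. A small sketch with made-up timings (the webinar's actual data lives in the charts and is not reproduced here):

```python
def parallel_efficiency(base_cores: int, base_time: float,
                        cores: int, time: float) -> float:
    """Speedup relative to the baseline run, normalized by the
    increase in core count (1.0 = perfect strong scaling)."""
    speedup = base_time / time
    ideal = cores / base_cores
    return speedup / ideal

# Illustrative (invented) strong-scaling wall-clock times for one model:
runs = [(256, 1000.0), (512, 520.0), (1024, 280.0)]
base_cores, base_time = runs[0]
for cores, t in runs[1:]:
    eff = parallel_efficiency(base_cores, base_time, cores, t)
    print(f"{cores:>5} cores: {eff:.1%} efficiency")
```

Efficiency near 100% at each doubling is what "nearly perfect scalability" means in practice; it typically falls off once subdomains get too small for the interconnect latency to hide.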
Version 12.0: Larger Problems
•  In V12.0, the capacity of AcuSolve is increased to efficiently solve
problem sizes exceeding 1 billion elements
•  Example: a transient DDES simulation of F1 drafting on ~1 billion
elements
Case Studies

Case studies are real-life engineering problems; all tests were performed in-house by Cray.
•  Case Study #1: Aerodynamics of a car model (ASMO), referred to as 70M
•  70 million elements
•  Transient incompressible flow (implicit solve)
•  Case Study #2: Cabin comfort model, referred to as 140M
•  140 million elements
•  Steady incompressible flow + heat transfer (implicit solve) + radiation
Performance Results (combined)

[Chart: parallel scaling vs. the ideal curve, with differences most visible at small core counts:
•  140M – Cray XC30 (Sandy Bridge + Aries)
•  70M – Cray XC30 (Sandy Bridge + Aries)
•  70M – Cray XE6 (AMD + Gemini)
•  70M – Linux cluster (Sandy Bridge + IB)]
Webinar Conclusions
•  Cray’s XC30 demonstrated the best performance in terms of both
scalability and throughput
•  Parallel performance of the XE6 and XC30 interconnects was superior
to IB
•  For small core counts (fewer than roughly 750), AcuSolve parallel
performance is satisfactory across multiple platforms
•  Throughput is mostly affected by core type (e.g. Sandy Bridge vs.
Westmere)
Summary of Cray Value
1. Extreme-fidelity simulations require HPC performance, and extreme scalability is the only option to achieve this performance.
2. Cray systems are designed for large production HPC environments, whether that is a single simulation using 10,000 cores or 100 simulations each using 100 cores.
3. The technology is in place for CAE environments to leverage many thousands of cores per simulation, and we are overdue to see extreme scaling leveraged in commercial environments.
Questions?