Programming Trends in
High Performance Computing
2016
Juris Vencels
* Tianhe-2 (China): fastest supercomputer in the world since June 2013
2
About Me
B.Sc - Physics @ University of Latvia
* Modeling of plasma processes in magnetron sputtering systems (Sidrabe, Inc.)
M.Sc - Electrophysics @ KTH (Sweden)
* Research engineer at EPiGRAM project (PDC Center for High Performance Computing)
Intern @ Los Alamos National Lab (USA)
* Development of spectral codes for plasma physics problems (instabilities, turbulence)
3
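* LINPACK benchmark - solves the dense linear system Ax = b
* FLOPS - floating-point operations per second
* Kilo -> Mega -> Giga -> Tera -> Peta -> Exa (each step is a factor of 1000)
Exascale (1 EXAFLOPS = 10^18 FLOPS) - a very common term in HPC
[Figure labels: EXAFLOPS, Tianhe-2]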
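To make the two definitions above concrete, here is a minimal sketch of what a LINPACK-style measurement does: solve a dense Ax = b and divide a nominal operation count by the elapsed time. The function names are hypothetical, and the count 2/3 n^3 + 2 n^2 is the figure commonly quoted for the classic benchmark (an assumption here; variants differ in the lower-order term). Real benchmark runs use tuned, distributed solvers rather than this textbook elimination.

```cpp
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Solve A x = b by Gaussian elimination with partial pivoting.
// A is a dense n x n matrix in row-major order. A and b are passed by value
// because the elimination overwrites them. A singular A is not handled.
std::vector<double> solve_dense(std::vector<double> A, std::vector<double> b) {
    const std::size_t n = b.size();
    for (std::size_t k = 0; k < n; ++k) {
        // Pick the row with the largest entry in column k as the pivot row.
        std::size_t p = k;
        for (std::size_t i = k + 1; i < n; ++i)
            if (std::fabs(A[i * n + k]) > std::fabs(A[p * n + k])) p = i;
        if (p != k) {
            for (std::size_t j = 0; j < n; ++j) std::swap(A[k * n + j], A[p * n + j]);
            std::swap(b[k], b[p]);
        }
        // Eliminate column k from every row below the pivot.
        for (std::size_t i = k + 1; i < n; ++i) {
            const double f = A[i * n + k] / A[k * n + k];
            for (std::size_t j = k; j < n; ++j) A[i * n + j] -= f * A[k * n + j];
            b[i] -= f * b[k];
        }
    }
    // Back substitution on the resulting upper-triangular system.
    std::vector<double> x(n);
    for (std::size_t i = n; i-- > 0;) {
        double s = b[i];
        for (std::size_t j = i + 1; j < n; ++j) s -= A[i * n + j] * x[j];
        x[i] = s / A[i * n + i];
    }
    return x;
}

// Nominal operation count for an n x n solve (assumed formula, see text above).
double nominal_flop_count(double n) {
    return (2.0 / 3.0) * n * n * n + 2.0 * n * n;
}

// Rate in FLOPS: nominal operations divided by wall-clock seconds.
double flops_rate(double n, double seconds) {
    return nominal_flop_count(n) / seconds;
}
```

Timing a call to `solve_dense` and passing the elapsed seconds to `flops_rate` gives a rate in FLOPS; dividing by 10^15 expresses it in PFLOPS.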
4
CPUs: 32'000 x Intel Xeon E5-2692v2 (12 cores, 2.2 GHz)
Accelerators: 48'000 x Intel Xeon Phi 31S1P (57 cores, 1.1 GHz)
Total: 3'120'000 cores
Linpack: 34 PFLOPS
Theoretical peak: 55 PFLOPS
Power: 17.6 MW (24 MW with cooling)
Cost: US$390 million
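The figures above can be cross-checked with a little arithmetic. The sketch below assumes the 32'000 and 48'000 counts refer to CPU and accelerator chips respectively, and that 12C and 57C are cores per chip; all type and constant names are hypothetical.

```cpp
#include <cstdint>

// Chip count and cores per chip for one class of processor.
struct ChipGroup {
    std::int64_t chips;
    std::int64_t cores_per_chip;
};

// Total cores across CPUs and accelerators.
constexpr std::int64_t total_cores(ChipGroup cpus, ChipGroup accelerators) {
    return cpus.chips * cpus.cores_per_chip +
           accelerators.chips * accelerators.cores_per_chip;
}

// Fraction of the theoretical peak that the Linpack run achieved.
constexpr double linpack_efficiency(double measured_pflops, double peak_pflops) {
    return measured_pflops / peak_pflops;
}

// Figures taken from the slide.
constexpr ChipGroup kXeon{32000, 12};
constexpr ChipGroup kXeonPhi{48000, 57};
constexpr std::int64_t kTotalCores = total_cores(kXeon, kXeonPhi);
constexpr double kLinpackEfficiency = linpack_efficiency(34.0, 55.0);
```

Under these assumptions the core count comes out at the 3'120'000 quoted on the slide, with roughly 88% of the cores sitting in the accelerators, and Linpack reaches about 62% of the theoretical peak.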
- Tianhe-2 is heterogeneous
* CPUs - few fast cores
* Accelerators - many slow cores
- Accelerators outperform CPUs in
* FLOPS/$
* FLOPS/W
Future of HPC - heterogeneous hardware
5
EPiGRAM - Exascale ProGRAmming Models
www.epigram-project.eu
Extrapolation of current technology to
Exascale would result in
* Codes that do not scale efficiently
* Large risk of hardware failure
* High power consumption
* Expensive hardware
The project mostly aims to solve the first problem: codes that do not scale efficiently
6
EPiGRAM project
- Test experimental programming models in practice
- iPIC3D for plasma physics
* implicit Particle-In-Cell, fully electromagnetic
* magnetic reconnection, magnetosphere, instabilities
* C++, MPI+OpenMP
- Nek5000 for incompressible fluids
* spectral elements
* fluid dynamics in nuclear fission reactors
* Fortran, MPI
7
Existing parallel programming APIs
Widely used
* MPI - Message Passing Interface
* OpenMP - Open Multi-Processing
* CUDA - programming interface for NVIDIA GPUs
Application dependent or experimental
* GPI-2 - implementation of GASPI, a Partitioned Global Address Space (PGAS) API
* OpenACC - Open Accelerators
* OpenCL, Coarray Fortran, Chapel, Cilk, TBB, ...
8
MPI - Message Passing Interface
- Distributed memory model
- MPI 3.x provides some shared-memory mechanisms
Implementations
* free: MPICH, Open MPI, ...
* prop: Intel, Cray, ...
9
OpenMP - Open Multi-Processing
- Shared memory model
- Often combined with MPI (MPI + OpenMP)
- Prone to race conditions when threads update shared data
- Compilers supporting OpenMP
* free: GNU, ...
* prop: Intel, Cray, ...
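The race condition mentioned above typically appears when several threads update the same variable. A minimal sketch, with a hypothetical function name: summing a vector in a parallel loop would race on `sum` if the loop were parallelized without further care, and the `reduction` clause avoids this by giving each thread a private partial sum that OpenMP combines at the end. Compilers that are not asked to enable OpenMP (for GNU, the `-fopenmp` flag) ignore the pragma and run the loop serially.

```cpp
#include <cstddef>
#include <vector>

// Sum the elements of v with an OpenMP parallel loop.
// reduction(+ : sum) gives every thread its own copy of sum and adds the
// copies together after the loop, so no two threads write the same variable.
double parallel_sum(const std::vector<double>& v) {
    double sum = 0.0;
    const long n = static_cast<long>(v.size());
    #pragma omp parallel for reduction(+ : sum)
    for (long i = 0; i < n; ++i) {
        sum += v[static_cast<std::size_t>(i)];
    }
    return sum;
}
```

Dropping the `reduction` clause while keeping `parallel for` is the classic way to produce the race: the code still compiles, but the result can change from run to run.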
Intel Xeon Phi 7120P: 61 cores, 1.24 GHz, 16 GB, ~$2000
10
CUDA
- Programming interface for NVIDIA GPUs
- MPI + CUDA
- Hard to code & debug
- Little memory per core
- Slow CPU ↔ GPU data transfer
NVIDIA Tesla K80: 4992 CUDA cores, 573-875 MHz, 24 GB, ~$4000
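The "memory per core" point can be illustrated with the hardware figures quoted in this deck: 24 GB over 4992 CUDA cores for the Tesla K80, against 16 GB over 61 cores for the Xeon Phi 7120P from the OpenMP slide. This is only a rough sketch: it takes 1 GB as 10^9 bytes, treats a CUDA core and a Xeon Phi core as comparable units (which they are not, strictly speaking), and uses hypothetical names throughout.

```cpp
// Bytes of on-board memory per core, taking 1 GB as 10^9 bytes (assumption).
constexpr double bytes_per_core(double memory_gb, double cores) {
    return memory_gb * 1.0e9 / cores;
}

// Figures quoted on the slides.
constexpr double kK80BytesPerCore = bytes_per_core(24.0, 4992.0);  // about 4.8 MB
constexpr double kPhiBytesPerCore = bytes_per_core(16.0, 61.0);    // about 262 MB

// How many times more memory per core the Xeon Phi has (roughly 55x).
constexpr double kPhiToK80Ratio = kPhiBytesPerCore / kK80BytesPerCore;
```

A few megabytes per core is why GPU kernels have to be written around small working sets and explicit data movement.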
11
PGAS - Partitioned Global Address Space
- Abstract shared address space
- Standards: GASPI, Coarray Fortran, Chapel, …
EPiGRAM focused on GPI-2, an implementation of GASPI:
* scalable, asynchronous, fault tolerant
* proprietary €
[Figure: iPIC3D particle communication time (s) versus # of cores (6, 24, 96, 384, 1536), GPI2 compared with MPI]
12
One-sided communication
MPI vs GASPI
13
OpenACC - Open Accelerators
- Compiler directives (pragmas) for CPU+GPU systems
- Higher level than CUDA, easier to use
- Similar to OpenMP
- Compilers:
* free: OpenUH
* prop: PGI, Cray, CAPS
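To show how close the directive style is to OpenMP, here is a minimal sketch of a SAXPY loop (y = a*x + y) offloaded with a single OpenACC pragma. The function name is hypothetical. The `copyin` and `copy` clauses state which arrays travel to the accelerator and which also travel back; a compiler without OpenACC support ignores the pragma and runs the loop on the CPU.

```cpp
#include <vector>

// y = a * x + y, element by element. Assumes x and y have the same length.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    const long n = static_cast<long>(x.size());
    const float* xp = x.data();
    float* yp = y.data();
    // copyin: xp is only read on the device; copy: yp is read and written back.
    #pragma acc parallel loop copyin(xp[0:n]) copy(yp[0:n])
    for (long i = 0; i < n; ++i) {
        yp[i] = a * xp[i] + yp[i];
    }
}
```

The equivalent CUDA version would need an explicit kernel, device allocations and memory copies, which is the sense in which OpenACC is higher level.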
14
Debugging parallel applications
* free: Valgrind
* prop: TotalView, Allinea DDT, Intel Inspector
My choice
* Allinea DDT - critical bugs
* Intel Inspector - memory leaks
15
Profiling parallel applications
* free: Valgrind
* prop: Allinea MAP, Intel VTune, Vampir
My choice: Allinea MAP. Simply compile the code with the '-g' option and run it.
16
Conclusions
- HPC is moving towards heterogeneous hardware
- Future codes will need to exploit a high degree of parallelism
- First petascale computer in 2008, exascale expected around 2020
- MPI will most likely still be present at exascale (MPI+X)
- Codes must tolerate hardware failures
- Power consumption must stay below 20 MW
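A rough worked estimate of what the 20 MW limit implies, using only figures from this deck (34 PFLOPS Linpack at 17.6 MW for Tianhe-2, cooling excluded) and taking exascale as 10^18 FLOPS:

```latex
\[
\frac{10^{18}\ \text{FLOPS}}{20 \times 10^{6}\ \text{W}} = 50\ \text{GFLOPS/W},
\qquad
\frac{34 \times 10^{15}\ \text{FLOPS}}{17.6 \times 10^{6}\ \text{W}} \approx 1.93\ \text{GFLOPS/W},
\qquad
\frac{50}{1.93} \approx 26 .
\]
```

Under these assumptions an exascale machine within the power budget would need to be about 26 times more energy-efficient than Tianhe-2.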
Thank you!
Questions?
