Programming Trends in High Performance Computing

The document discusses programming trends in high performance computing (HPC) as of 2016, highlighting the Tianhe-2 supercomputer's architecture and benchmarks. It reviews existing parallel programming APIs and the challenges of scaling to exascale, and introduces the EPiGRAM project's focus on developing efficient programming models. The future of HPC is predicted to rely on heterogeneous hardware and on programming solutions that cope with power consumption limits and hardware failure risks.

Programming Trends in
High Performance Computing
2016
Juris Vencels
* Tianhe-2 (China)
fastest in the world since June 2013

2
About Me
B.Sc - Physics @ University of Latvia
* Modeling of plasma processes in magnetron sputtering systems (Sidrabe, Inc.)
M.Sc - Electrophysics @ KTH (Sweden)
* Research engineer at EPiGRAM project (PDC Center for High Performance Computing)
Intern @ Los Alamos National Lab (USA)
* Development of spectral codes for plasma physics problems (instabilities, turbulence)

3
* LINPACK Benchmark – solves Ax = b
* FLOPS - floating-point operations per second
* Kilo -> Mega -> Giga -> Tera -> Peta -> Exa
Exascale – very common name in HPC
EXAFLOPS
Tianhe-2
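As a rough illustration of what the LINPACK benchmark measures, the sketch below solves a small dense system Ax = b by Gaussian elimination with partial pivoting and converts a run time into a FLOPS figure. It is not the benchmark code itself: the function names are made up for this example, and the operation count uses only the leading-order term (2/3) n^3, which is an approximation.

```python
def solve_linear_system(a, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on copies so the caller's data is left untouched.
    a = [row[:] for row in a]
    b = b[:]
    for k in range(n):
        # Partial pivoting: move the largest entry of column k onto the diagonal.
        pivot = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[pivot][k] == 0.0:
            raise ValueError("matrix is singular")
        a[k], a[pivot] = a[pivot], a[k]
        b[k], b[pivot] = b[pivot], b[k]
        # Eliminate column k from all rows below the pivot row.
        for i in range(k + 1, n):
            factor = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= factor * a[k][j]
            b[i] -= factor * b[k]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x


def approx_flop_count(n):
    """Leading-order operation count of a dense n x n solve (assumption: 2/3 n^3)."""
    return 2.0 * n**3 / 3.0


def flops_rate(n, seconds):
    """Approximate floating-point operations per second for an n x n solve."""
    return approx_flop_count(n) / seconds


# Small example: 2x + y = 3 and x + 3y = 5.
x = solve_linear_system([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0])
```

A real benchmark run uses a very large n and a tuned, distributed solver; the pure-Python loops here only show the arithmetic being counted.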

4
CPUs: 32’000 x Intel Xeon E5-2692v2 12C 2.2GHz
Accelerators: 48’000 x Intel Xeon Phi 31S1P 57C 1.1GHz
3’120’000 cores
Linpack: 34 PFLOPS
Theoretical: 55 PFLOPS
17.6 MW (24 MW with cooling)
US$390 million
Tianhe-2
- heterogeneous
* CPUs - few fast cores
* Accelerators – many slow cores
- Accelerators outperform CPUs in
* FLOPS/$
* FLOPS/Watt
Future of HPC – heterogeneous hardware
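The figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below recomputes the total core count, the share of cores that sit on accelerators, the Linpack efficiency and the energy efficiency. All input numbers are taken from the slide; the variable names are only illustrative.

```python
# Tianhe-2 figures as quoted on the slide.
CPU_COUNT = 32_000
CORES_PER_CPU = 12
ACCEL_COUNT = 48_000
CORES_PER_ACCEL = 57
LINPACK_PFLOPS = 34.0
PEAK_PFLOPS = 55.0
POWER_MW = 17.6  # without cooling

cpu_cores = CPU_COUNT * CORES_PER_CPU
accel_cores = ACCEL_COUNT * CORES_PER_ACCEL
total_cores = cpu_cores + accel_cores

# Fraction of all cores that belong to the accelerators.
accel_core_share = accel_cores / total_cores

# Fraction of the theoretical peak reached in the Linpack run.
linpack_efficiency = LINPACK_PFLOPS / PEAK_PFLOPS

# Energy efficiency in GFLOPS per watt (1 PFLOPS = 1e15 FLOPS, 1 MW = 1e6 W).
gflops_per_watt = (LINPACK_PFLOPS * 1e15) / (POWER_MW * 1e6) / 1e9

print(f"total cores:        {total_cores}")
print(f"accelerator share:  {accel_core_share:.1%}")
print(f"Linpack efficiency: {linpack_efficiency:.1%}")
print(f"energy efficiency:  {gflops_per_watt:.2f} GFLOPS/W")
```

The core count reproduces the 3’120’000 quoted above, and close to nine out of ten cores are accelerator cores, which is the sense in which the machine is heterogeneous.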

5
Exascale ProGRAmming Models
www.epigram-project.eu
Extrapolation of current technology to Exascale would result in
* Codes that do not scale efficiently
* Large risk of hardware failure
* High power consumption
* Expensive hardware
The project mostly aims to solve the 1st problem
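One common way to see why codes stop scaling efficiently is Amdahl's law. The slides do not mention it, so it appears here only as an illustration, and the 99% parallel fraction is an assumed value, not a measurement from the project.

```python
def amdahl_speedup(parallel_fraction, num_cores):
    """Upper bound on speedup when only part of the run time can be parallelized."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / num_cores)


def parallel_efficiency(parallel_fraction, num_cores):
    """Speedup divided by the number of cores used."""
    return amdahl_speedup(parallel_fraction, num_cores) / num_cores


# Assumed example: a code whose run time is 99% parallelizable.
for cores in (6, 24, 96, 384, 1536):
    s = amdahl_speedup(0.99, cores)
    e = parallel_efficiency(0.99, cores)
    print(f"{cores:5d} cores: speedup {s:7.2f}, efficiency {e:6.1%}")
```

Under this assumption the speedup can never exceed 100, however many cores are added, which is why even a small serial or communication-bound part of a code matters at Exascale.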

6
EPiGRAM project
- Test experimental programming models in practice
- iPIC3D for plasma physics
* implicit Particle-In-Cell, fully electromagnetic
* magnetic reconnection, magnetosphere, instabilities
* C++, MPI+OpenMP
- Nek5000 for incompressible fluids
* spectral elements
* fluid dynamics in nuclear fission reactors
* Fortran, MPI

7
Existing parallel programming APIs
Widely used
* MPI - Message Passing Interface
* OpenMP - Open Multi-Processing
* CUDA - programming interface for NVIDIA GPUs
Application dependent or experimental
* GPI-2 - Partitioned Global Address Space
* OpenACC - Open Accelerators
* OpenCL, Coarray Fortran, Chapel, Cilk, TBB, ...

8
MPI - Message Passing Interface
- Distributed memory model
- MPI 3.x provides some shared memory mechanisms
Implementations
* free: MPICH, Open MPI, ...
* prop: Intel, Cray, ...
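The distributed memory model can be mimicked in a few lines. The sketch below is not MPI and uses no MPI library: threads stand in for ranks and queues stand in for messages. Both are assumptions made so that the example runs with the standard library alone. What it keeps from the model is that each rank works only on its own data and that results move between ranks through explicit send and receive steps.

```python
import queue
import threading


def run_ranks(num_ranks, local_values):
    """Each 'rank' sums its own data and sends the partial sum to rank 0."""
    # One mailbox per rank; the only way ranks exchange data in this sketch.
    inbox = [queue.Queue() for _ in range(num_ranks)]
    result = {}

    def rank_main(rank):
        partial = sum(local_values[rank])  # a rank touches only its own data
        if rank == 0:
            total = partial
            for _ in range(num_ranks - 1):
                _source, value = inbox[0].get()  # blocking receive
                total += value
            result["total"] = total
        else:
            inbox[0].put((rank, partial))  # explicit send to rank 0

    threads = [threading.Thread(target=rank_main, args=(r,)) for r in range(num_ranks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["total"]


# Four ranks, each owning a different slice of the data.
total = run_ranks(4, [[1, 2], [3], [4, 5, 6], []])
```

In a real MPI program the ranks are separate processes, possibly on different nodes, and a collective reduction would normally replace the hand-written gather on rank 0.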

9
OpenMP - Open Multi-Processing
- Shared memory model
- MPI + OpenMP
- Risk of race conditions
- Compilers supporting OpenMP
* free: GNU, ...
* prop: Intel, Cray, ...
Intel Xeon Phi 7120P
61 cores
1.24 GHz
16GB
~$2000
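The race condition mentioned on this slide arises when several threads update the same shared variable without coordination. A usual remedy is a reduction: each thread accumulates into a private variable and the partial results are combined once at the end. The sketch below shows that pattern with Python threads. It is only an analogy for what an OpenMP reduction does, not OpenMP code, and the function name is made up for this example.

```python
import threading


def parallel_sum(values, num_threads):
    """Sum a list using private per-thread accumulators (reduction pattern)."""
    # One slot per thread, so no two threads ever write to the same location.
    partial = [0] * num_threads

    def worker(tid):
        local = 0
        # Strided distribution of the loop iterations across threads.
        for i in range(tid, len(values), num_threads):
            local += values[i]
        partial[tid] = local

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Combine the partial results only after every thread has finished.
    return sum(partial)


result = parallel_sum(list(range(1000)), 4)
```

Had every thread added directly into one shared total, the outcome would depend on how the updates interleave; keeping the accumulators private removes that dependency.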

10
CUDA
- Programming interface for NVIDIA GPUs
- MPI + CUDA
- Hard to code & debug
- Small memory per core
- Slow CPU ↔ GPU data transfer
NVIDIA Tesla K80
4992 CUDA cores
573-875 MHz
24GB
~$4000

11
PGAS - Partitioned Global Address Space
- Abstract shared address space
- Standards: GASPI, Coarray Fortran, Chapel, …
EPiGRAM focused on GASPI implementation GPI-2 from
* scalable, asynchronous, fault tolerant
* proprietary €
[Figure: iPIC3D particle communication time (s) vs. # of cores (6, 24, 96, 384, 1536), GPI2 compared with MPI]

12
One sided communication
MPI vs GASPI

13
OpenACC - Open Accelerators
- Compiler directives (pragmas) for CPU+GPU systems
- Higher level than CUDA, easier to use
- Similar to OpenMP
- Compilers:
* free: OpenUH
* prop: PGI, Cray, CAPS

14
Debugging parallel applications
* free: Valgrind
* prop: TotalView, Allinea DDT, Intel Inspector
My choice
* DDT - critical bugs
* Intel Inspector - memory leaks

15
Profiling parallel applications
* free: Valgrind
* prop: Allinea MAP, Intel VTune, Vampir
My choice: Allinea MAP – simply compile the code with ‘-g’ option and run

16
Conclusions
- HPC is moving towards heterogeneous hardware
- Future codes will exploit a high degree of parallelism
- Petascale computer in 2008, Exascale in ~2020
- Most likely MPI will be present in Exascale (MPI+x)
- Tolerance to hardware failures
- Power consumption must be below 20 MW
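Two of the conclusions can be turned into numbers. The sketch below works out the energy efficiency an Exascale machine needs to stay within 20 MW, compares it with Tianhe-2's Linpack figures from the hardware slide (34 PFLOPS at 17.6 MW), and estimates the performance doubling time implied by going from Petascale in 2008 to Exascale around 2020. The variable names are illustrative, and 2020 is the approximate date given on the slide.

```python
import math

EXAFLOPS = 1e18          # target performance in FLOPS
POWER_LIMIT_W = 20e6     # 20 MW power budget

# Energy efficiency needed to reach 1 EFLOPS within the power budget.
required_gflops_per_watt = EXAFLOPS / POWER_LIMIT_W / 1e9

# Tianhe-2 for comparison: 34 PFLOPS Linpack at 17.6 MW (without cooling).
tianhe2_gflops_per_watt = 34e15 / 17.6e6 / 1e9
efficiency_gap = required_gflops_per_watt / tianhe2_gflops_per_watt

# Petascale (2008) to Exascale (~2020) is a factor of 1000 in about 12 years.
years = 2020 - 2008
doublings = math.log2(1000)
doubling_time_years = years / doublings

print(f"required:      {required_gflops_per_watt:.0f} GFLOPS/W")
print(f"Tianhe-2:      {tianhe2_gflops_per_watt:.2f} GFLOPS/W")
print(f"gap:           {efficiency_gap:.0f}x")
print(f"doubling time: {doubling_time_years:.2f} years")
```

The gap of roughly 26x in energy efficiency is the quantitative reason the power limit appears among the conclusions.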
Thank you!
Questions?
