Porting application to Intel Xeon Phi: some experiences

    RIKEN Advanced Center for Computing and Communication
    2012/11 Super Computing 2012 @ Intel Booth, Salt Lake City, US

    maho@riken.jp

    My other roles
    maho@FreeBSD.org (FreeBSD committer)
    maho@apache.org (Apache OpenOffice committer)




Thursday, November 15, 2012
Aims of my talk

    • Proof of concept:
       - Intel says, “One source base, tuned to many targets.”
       - Is it true or not?
          - My answer is TRUE.
    • Only the native model is considered
       - Just compile with Intel Composer XE 2013 :-)
       - The offload model is extremely demanding for modern, complicated programs
          - CUDA experts say: to get performance, do everything on the GPU and do not
            transfer data between CPU and GPU.
          - Modern applications use a lot of external open source / free software
            packages. Very complex structure!
          - Not realistic!
    • Porting tips are provided
       - Gaussian09, Povray, SDPA...

What is Intel Xeon Phi?
    • Intel Xeon Phi is a coprocessor connected via a PCI Express slot.
    • Peak performance is 1 TFlops in double precision
       - many cores: 64 cores, 4 threads each, 512-bit SIMD, 8 GB of GDDR5 RAM...
    • It looks like another cluster of computers inside a Linux box.
       - A Linux micro OS is provided
    • Better programmability
       - x86 based (64-bit)
       - Development tool: Intel Composer XE 2013
          - C, C++, Fortran
          - compile and run the same source code as on the CPU
          - familiar parallelism: OpenMP, MPI, OpenCL
       - Various programming models
          - MIC centric
          - CPU centric
       - CAUTION: BINARIES ARE INCOMPATIBLE!
       - Recompilation is needed for Xeon Phi!

How to build your program on Xeon Phi
    • Very easy.
    • Just pass the -mmic flag to the compilers
       - icc -mmic
       - icpc -mmic
       - ifort -mmic
    • How to link against optimized BLAS and LAPACK?
       - just add -mkl
       - the same as in the CPU case.
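
As a rough illustration of the flags above, the following sketch prints the compile command for either target. Only icc, -mmic and -mkl come from the slide; the function name build_cmd, the target labels host and mic, and the file names are hypothetical.

```shell
#!/bin/sh
# build_cmd TARGET ARGS...: print the icc command line for the chosen target.
# TARGET "host" builds for the CPU; "mic" adds -mmic for a Phi-native binary.
# (build_cmd and the target labels are hypothetical names for this sketch.)
build_cmd() {
    target=$1
    shift
    case "$target" in
        host) echo "icc -mkl $*" ;;
        mic)  echo "icc -mmic -mkl $*" ;;
        *)    echo "unknown target: $target" >&2
              return 1 ;;
    esac
}

# Same source file, two targets; nothing is compiled here, the commands are only printed.
build_cmd host -o dgemm.host dgemm.c
build_cmd mic  -o dgemm.mic  dgemm.c
```

A binary built with -mmic runs only on the coprocessor, in line with the incompatibility caution earlier in the talk.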




DGEMM benchmark: sorry, no free lunch; tuning is needed.
    • DGEMM is a matrix-matrix multiplication routine. When tuned, it uses almost 100%
      of the CPU's performance, so it is used for benchmarking.
       - it is compute-bound, so it does not measure memory bandwidth
    • Intel Xeon Phi’s theoretical peak performance is 1 TFlops.
    • Do we need some tuning for Intel Xeon Phi?
       - YES. Otherwise only about 40% of peak is attained: ~400 GFlops
       - If tuned, we attain ~816 GFlops (about 82% of peak).
       - What to tune: memory allocation and thread affinity
    • How is the matrix data prepared?
       - just malloc and fill with random values
       - no alignment is specified
       - on the CPU this is sufficient, but
       - it is not sufficient for Xeon Phi.
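
Thread affinity is typically controlled through environment variables of the Intel OpenMP runtime rather than in the source. The following is a minimal sketch, assuming the 64 cores and 4 threads per core quoted earlier; the helper name phi_thread_env and the affinity value compact are assumptions for illustration, not the settings behind the ~816 GFlops figure. The alignment fix belongs in the C source and is not shown here.

```shell
#!/bin/sh
# phi_thread_env CORES THREADS_PER_CORE: export OpenMP settings for a native run.
# OMP_NUM_THREADS and KMP_AFFINITY are real Intel OpenMP runtime variables;
# the helper name and the chosen affinity value are assumptions of this sketch.
phi_thread_env() {
    cores=$1
    threads_per_core=$2
    OMP_NUM_THREADS=$((cores * threads_per_core))   # one software thread per hardware thread
    KMP_AFFINITY=compact                            # pack consecutive threads onto the same core
    export OMP_NUM_THREADS KMP_AFFINITY
}

# Core and thread counts as quoted on the "What is Intel Xeon Phi?" slide.
phi_thread_env 64 4
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS KMP_AFFINITY=$KMP_AFFINITY"
```

Trying other affinity values and thread counts against the untuned ~400 GFlops baseline is the kind of experiment the slide has in mind.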




SDPA: How to cheat “configure”, part I
    • SDPA is a highly efficient semidefinite programming solver.
       - distributed at http://sdpa.sourceforge.net/ under the GPL.
    • On the CPU: ./configure ; make
    • But Intel Composer XE 2013 for Xeon Phi is a cross-compiler... how do we do this?
       - the host and Phi environments are almost the same...
       - Two-pass strategy: in the first pass, give configure the dummy flag “-DMMIC”;
         then replace it with “-mmic” in the generated Makefiles and compile.
                           #!/bin/sh

                           # First pass: configure on the host with a harmless placeholder define.
                           CC="icc"; export CC
                           CXX="icpc"; export CXX
                           FC="ifort"; export FC

                           CFLAGS="-DMMIC" ; export CFLAGS
                           CXXFLAGS="-DMMIC" ; export CXXFLAGS
                           FFLAGS="-DMMIC" ; export FFLAGS

                           ./configure --with-blas="-mkl" --with-lapack="-mkl"

                           # Second pass: turn the placeholder into the real Phi flag in every Makefile.
                           files=$(find ./* -name Makefile)
                           perl -p -i -e 's/-DMMIC/-mmic/g' $files
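
The second pass can be tried without any compiler installed. The sketch below runs the substitution on scratch Makefiles and checks that no placeholder is left behind. The helper name swap_mic_flag and the scratch file contents are hypothetical, and sed with a temporary file stands in for the perl one-liner.

```shell
#!/bin/sh
# swap_mic_flag DIR: replace the placeholder -DMMIC with -mmic in every Makefile
# below DIR. Hypothetical helper; sed stands in for the perl one-liner.
swap_mic_flag() {
    dir=$1
    find "$dir" -name Makefile | while IFS= read -r mf; do
        sed 's/-DMMIC/-mmic/g' "$mf" > "$mf.tmp" && mv "$mf.tmp" "$mf"
    done
    # Succeed only if no placeholder survives anywhere under DIR.
    ! grep -rq -- '-DMMIC' "$dir"
}

# Scratch tree that mimics what configure might generate (contents are made up).
demo=$(mktemp -d)
mkdir -p "$demo/src"
printf 'CFLAGS = -O2 -DMMIC\n' > "$demo/Makefile"
printf 'CXXFLAGS = -DMMIC -g\n' > "$demo/src/Makefile"

swap_mic_flag "$demo" && swapped=$(cat "$demo/Makefile" "$demo/src/Makefile")
echo "$swapped"
rm -rf "$demo"
```

Checking for leftovers matters because a Makefile that still carries -DMMIC would silently build a host object instead of a Phi one.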
Povray: how to cheat configure, part II
    • The Persistence of Vision Raytracer is a high-quality, totally free tool for
      creating stunning three-dimensional graphics; a famous ray tracing program.
    • This part covers how to build Povray 3.7 RC
       - This version is the first Povray parallelized with pthreads.
    • It requires some external libraries beyond those provided for Intel Xeon Phi.




Povray: how to cheat configure, part II
    • Prerequisites
       - boost, zlib, jpeg, tiff and libpng.
       - all libraries must be built for Phi :-( :-( :-(
    • How to build boost and zlib: we took the same strategy as for Povray.
       - First, build and install the host version of boost to /home/maho/HOST, then the
         Phi version to /home/maho/MIC
       - Next, build and install the host version of zlib to /home/maho/HOST
       - Then build the Phi version as follows:
          - back up /home/maho/MIC to /home/maho/MIC.org
          - copy /home/maho/HOST to /home/maho/MIC
          - run configure for the host, passing the -DMMIC flag in CFLAGS and CXXFLAGS.
              - be sure LD_LIBRARY_PATH points to /home/maho/MIC!
          - remove /home/maho/MIC
          - rename /home/maho/MIC.org to /home/maho/MIC
          - replace -DMMIC with -mmic
          - run make to produce the Xeon Phi binary.
          - Done.
    • Building tiff and png for Phi is similar to the above procedure.
Povray: how to cheat configure, part II
    • Prerequisites
       - boost, zlib, jpeg, tiff and libpng.
       - all libraries must be built for Phi :-( :-( :-(
    • Strategy: build twice, first for the host and then for Xeon Phi
       - build and install the host version of the libraries to /home/maho/HOST
       - build and install the Phi version of the libraries to /home/maho/MIC
          - actually,
    • The final configure for Povray should be done as follows:
       - back up /home/maho/MIC to /home/maho/MIC.org
       - copy /home/maho/HOST to /home/maho/MIC
       - run configure for the host, passing the -DMMIC flag in CFLAGS and CXXFLAGS.
          - be sure LD_LIBRARY_PATH points to /home/maho/MIC!
       - remove /home/maho/MIC
       - rename /home/maho/MIC.org to /home/maho/MIC
       - replace -DMMIC with -mmic
       - run make to produce the Xeon Phi binary.
       - Done.
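
The directory swap in the steps above can be sketched as a small wrapper that restores the Phi libraries even when configure fails. The helper name with_host_libs_as_mic and the scratch files are hypothetical; in the real procedure the wrapped command would be ./configure with -DMMIC in CFLAGS and CXXFLAGS, run against /home/maho.

```shell
#!/bin/sh
# with_host_libs_as_mic ROOT CMD...: run CMD while ROOT/MIC temporarily holds a
# copy of the host libraries, then put the real Phi libraries back.
# (Hypothetical helper; ROOT corresponds to /home/maho on the slides.)
with_host_libs_as_mic() {
    root=$1
    shift
    mv "$root/MIC" "$root/MIC.org" || return 1   # back up the Phi build
    cp -R "$root/HOST" "$root/MIC"               # host libraries under the Phi path
    "$@"                                         # e.g. ./configure with -DMMIC
    status=$?
    rm -rf "$root/MIC"                           # drop the host copy
    mv "$root/MIC.org" "$root/MIC"               # restore the Phi build
    return $status
}

# Scratch demonstration with marker files instead of real libraries.
demo=$(mktemp -d)
mkdir "$demo/HOST" "$demo/MIC"
echo host > "$demo/HOST/libz.txt"
echo phi  > "$demo/MIC/libz.txt"

seen=$(with_host_libs_as_mic "$demo" cat "$demo/MIC/libz.txt")   # what configure would see
after=$(cat "$demo/MIC/libz.txt")                                # what make will link against
echo "during configure: $seen, afterwards: $after"
rm -rf "$demo"
```

The point of the swap is that configure's test programs link and run against host libraries, while the later make, with -mmic substituted in, links against the Phi builds found under the same path.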
Gaussian09 Partially Runs on Intel Xeon Phi!
    • Gaussian09 is a famous quantum chemistry program package that provides
      state-of-the-art capabilities for electronic structure modeling.
    • Very large source code: 1.7 million lines
       - $ cat *F | wc -l
       - 1714217
    • Intel Composer XE is not an officially supported compiler
       - Gaussian, Inc. only supports the PGI compiler.
       - Patches were made by M.N. (sorry, we cannot make the patches public)
       - A small set of patches enables us to build
         -   -rw-r--r--. 1 maho users   463 1 30 10:53 2012 patch-bsd+buldg09
         -   -rw-r--r--. 1 maho users   692 1 30 10:53 2012 patch-bsd+fsplit.c
         -   -rw-r--r-- 1 maho users    5674 10 18 16:41 2012 patch-bsd+i386.make
         -   -rw-r--r--. 1 maho users   643 1 30 10:53 2012 patch-bsd+mdutil.F
         -   -rw-r--r--. 1 maho users   240 1 30 10:53 2012 patch-bsd+mygau
         -   -rw-r--r--. 1 maho users   486 1 30 10:53 2012 patch-bsd+set-mflags

       - the patches are almost the same as those for the host build.
          - mostly just adding -mmic
       - somehow shared libs don’t work??
          - utils.a should be a static library.
          - Intel MKL should also be linked statically.
          - should the MKL shared libs be located in /lib64? Is LD_LIBRARY_PATH not parsed?
          - The resulting binaries occupy approximately 2 GB

Gaussian09 Partially Runs on Intel Xeon Phi!
    • Just run it
    • Still very unstable with -O3
       - l303.exe (good luck with this one)
       - l401.exe (should be built with -O0)
       - Passed (only test000.com to test200.com were run):
         test001, 023, 024, 025, 026, 027, 028, 029, 030, 031, 032, 033, 034, 035, 036,
         037, 038, 039, 040, 042, 056, 076, 077, 078, 079, 081, 091, 092, 093, 099, 101,
         102, 104, 108, 115, 116, 119, 120, 130, 131, 140, 142, 144, 145, 149, 150, 151,
         153, 162, 163, 165, 168, 169, 170, 172, 177, 184, 188, 195




A packaging system (pkgsrc) porting effort on Intel Xeon Phi!!!

    • What is pkgsrc?
         - pkgsrc is a framework for building third-party software on NetBSD and other UNIX-like systems, currently containing over 12000
           packages. It is used to enable freely available software to be configured and built easily on supported platforms;
           http://www.pkgsrc.org/

    • NAKATA, Maho has over ten years of experience as a FreeBSD ports committer.
    • Why pkgsrc?
      - We need MORE software packages on Intel Xeon Phi!
         - Current HPC program packages depend on other free software packages.
      - RPM and deb are too complex (to me).
      - A native tool chain for Intel Xeon Phi is really important
         - ./configure (autotools) is a good tool, but cross building is rarely supported.
         - ./configure inspects some parameters of the host machine.
         - Intel Composer can be used as if it were a native toolchain, with a small trick.
      - a highly portable packaging system: works on *BSD (Net, DragonFly, Free),
        various Linux variants, AIX, Mac OS X
    • Status:
      - ./bootstrap: done
    • How to get it?
      - I’ll provide it ASAP on sourceforge.net or somewhere...
Summary and outlook
    • We tested Intel Xeon Phi, especially how to build Phi-native binaries.
       - “One source base, tuned to many targets” is TRUE!
    • We regard Intel Xeon Phi as a small Linux cluster.
       - but there is no binary compatibility between host and Phi.
    • We provided porting tips: how to build Gaussian, Povray and SDPA.
    • For packages using autotools (./configure) or similar, our approach
      requires a two-pass configure to cheat
       - if configure checks for Phi-specific features such as the availability of FMA,
         this strategy doesn’t work.
       - Yoshikazu Kamoshida’s strategy handles configure or build systems that
         need to run small programs on the target machine (SWoPP 2012; Development of
         middleware which facilitate tuning while installation under cross compile
         environment).
    • More packages are needed!
       - Porting NetBSD’s pkgsrc might be a good idea for cross-compiling environments
         like Intel Xeon Phi.
               - pkgsrc is a framework for building third-party software on NetBSD and other UNIX-like systems, currently containing over
                 12000 packages. It is used to enable freely available software to be configured and built easily on supported platforms;
                 http://www.pkgsrc.org/