OPENMP ANALYSIS IN VTUNE AMPLIFIER XE
Copyright © 2017, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
Agenda
• VTune Amplifier XE OpenMP* Analysis: answering customers’ performance
questions in the same language the program was written in
• Concepts, metrics and technology inside
• VTune Amplifier XE OpenMP Analysis Workflow
• OpenMP analysis for hybrid MPI + OpenMP applications
• Summary
Typical customer questions on the parallelization efficiency of
OpenMP* applications
• “I added the pragmas, so why is my
speedup far from linear?”
• Parallelization inefficiency
• “I ran my app on a system with
more cores, but it does not
run as efficiently as it did on
fewer cores.”
• Scalability issues
Decomposing the questions:
• Is the serial time of my application significant
enough to prevent scaling?
• How efficient is my OpenMP
parallelization?
• If it is not efficient, how much can I gain by
investing in fixing the inefficiencies?
• Which OpenMP regions/loops/barriers are
worth tuning?
• What are the particular problems with them?
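The serial-time question can be bounded with Amdahl's law, which is standard background and not part of the slides. The sketch below is only an illustration: the function name and the 10% serial fraction are assumptions, not VTune output.

```python
def amdahl_speedup(serial_fraction, threads):
    """Upper bound on speedup when a fixed fraction of the run stays serial."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / threads)


if __name__ == "__main__":
    # Hypothetical application that spends 10% of its time in serial code.
    for threads in (1, 4, 16, 64, 288):
        bound = amdahl_speedup(0.10, threads)
        print(f"{threads:4d} threads: speedup is at most {bound:.2f}x")
```

With a 10% serial fraction the bound stays below 10x no matter how many threads are added, which is why serial time is worth checking before tuning individual regions.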
If Performance Information Is OpenMP-“Unaware”...
OpenMP-“unaware” views of VTune Amplifier XE
Problems are difficult to detect. Customers might even “blame” the
runtime when they see CPU time consumed there, not understanding
that it is a result of parallelization inefficiency.
The questions are tied to the OpenMP program structure (#pragmas).
Answers should be given in the same terms to be understandable and
actionable.
VTune Amplifier XE OpenMP* Analysis: answering customers’ performance
questions in the same language the program was written in
Overview on the Summary pane:
• Is the serial time of my application significant enough to prevent scaling?
• How efficient is my parallelization compared with ideal parallel execution?
• How much theoretical gain can I get if I invest in tuning?
• Which regions are the most promising to invest in?
• Links to the grid view for more details on inefficiencies
Key to OpenMP awareness in VTune: region-based views
and metrics
Definition of Region Potential Gain (elapsed time metric)
Actual Parallel Region Elapsed Time, from Fork to Join, consists of:
• Effective time (sampling)
• Imbalance (tracing)
• Lock spinning (sampling)
• Scheduling (sampling)
• Work forking (sampling)
• Atomics (sampling)
Estimated Ideal Time = Effective time / Number of Threads
Potential Gain = sum of the inefficiencies, normalized by the number of threads
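The metric definitions above can be sketched as simple arithmetic. This is a minimal illustration that assumes all times are CPU seconds summed over the threads of one region. The class and field names are hypothetical, and the exact computation inside VTune may differ.

```python
from dataclasses import dataclass


@dataclass
class RegionTimes:
    """CPU time in seconds, summed over all threads of one parallel region."""
    effective: float      # useful work in user code
    imbalance: float      # waiting on barriers
    lock_spinning: float
    scheduling: float
    work_forking: float
    atomics: float

    def inefficiency(self):
        """Sum of all non-effective time categories."""
        return (self.imbalance + self.lock_spinning + self.scheduling
                + self.work_forking + self.atomics)


def estimated_ideal_time(region, threads):
    """Estimated Ideal Time = Effective time / Number of Threads."""
    return region.effective / threads


def potential_gain(region, threads):
    """Potential Gain = sum of inefficiencies / Number of Threads."""
    return region.inefficiency() / threads


if __name__ == "__main__":
    # Hypothetical region measured on 4 threads.
    region = RegionTimes(effective=8.0, imbalance=2.0, lock_spinning=0.5,
                         scheduling=1.0, work_forking=0.25, atomics=0.25)
    print("ideal time:", estimated_ideal_time(region, 4))
    print("potential gain:", potential_gain(region, 4))
```

Under this model, the estimated ideal time plus the potential gain approximates the actual elapsed time of the region.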
Technology under VTune Amplifier XE OpenMP Analysis
Tracing of OpenMP constructs to provide region/work-sharing context and
precise imbalance on barriers
• Provided to VTune by the Intel OpenMP runtime when running under profiling
• Fork-join points of parallel regions with the number of working threads (Intel Compiler 14 and later)
• OpenMP construct barrier points with imbalance info and OpenMP loop metadata
• -parallel-source-info=2 compiler option to embed the source file name in a region name
• Looking at a transition to OMPT; working with John M.-C. on interface enrichments for low-overhead
analysis
Sampling to define and classify CPU time: user code and OpenMP RTL work
• Classification: Locking, Scheduling, Work Forking
VTune Amplifier XE OpenMP Analysis Workflow
Start with the HPC Performance Characterization analysis
Explore the CPU Utilization metrics related to OpenMP in the summary,
grid, and source views
Command line: > amplxe-cl -collect hpc-performance <my_app>
Per-region details in the grid view: inefficiencies in wall time,
with classification and issue highlighting
• Imbalance on a loop barrier
• Dynamic scheduling overhead on a parallel loop
Details in Grid View: Serial Time Hotspots
• Serial hotspots under the Master Thread
• Time filter to exclude the initialization phase
Details on Scalable Timeline
• Super-tiny timeline display mode: a bird's-eye view that shows all data without scrolling
• Intel® Xeon Phi™ profiling result with 288 threads
• Region frames on the ruler
• The more green, the more efficient the multithreaded execution
Details for a Region at source file level
Summary
• VTune Amplifier XE OpenMP analysis answers customers’ performance
questions in the language of OpenMP constructs
• The analysis scales well on many-core systems thanks to a good balance of
tracing and sampling collection technologies
• The OpenMP analysis is “MPI-aware”, which helps with intra-node tuning of hybrid
MPI + OpenMP applications
• The full feature set is available in VTune Amplifier XE 2018 with the Intel
OpenMP and Intel MPI runtimes as part of Intel® Parallel Studio XE 2018
Back-up
A Use Case: NPB CG imbalance improvement
• Step 1: Profiling the original application, NPB CG (Class B)
• There is a region with promising potential gain; go to the Grid View for
more details
• There are barriers in the region; use the experimental “/OpenMP Region/OpenMP Barrier..” grouping
• Imbalance on an omp loop in cg.f, lines 572-580; the schedule is static
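How a static schedule produces barrier imbalance can be sketched with a small model. It assumes a static schedule with no chunk size, which hands each thread one contiguous, near-equal block of iterations. The per-iteration costs and function names are hypothetical, and this is not the actual loop from cg.f.

```python
def static_blocks(n_iters, threads):
    """Split iterations into contiguous, near-equal blocks, one per thread."""
    base, extra = divmod(n_iters, threads)
    blocks, start = [], 0
    for t in range(threads):
        size = base + (1 if t < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


def barrier_imbalance(costs, threads):
    """Return per-thread busy time and per-thread wait time at the loop barrier."""
    busy = [sum(costs[i] for i in block)
            for block in static_blocks(len(costs), threads)]
    elapsed = max(busy)               # the slowest thread releases the barrier
    waits = [elapsed - b for b in busy]
    return busy, waits


if __name__ == "__main__":
    # Hypothetical loop whose iterations get more expensive: costs 1, 2, ..., 8.
    costs = [i + 1 for i in range(8)]
    busy, waits = barrier_imbalance(costs, threads=4)
    print("busy per thread:", busy)
    print("wait per thread:", waits)
    print("imbalance normalized by threads:", sum(waits) / 4)
```

In this model the first thread finishes early and waits at the barrier while the last thread is still working. That waiting is what the Imbalance metric reports.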
• Step 2: Trying dynamic scheduling, “omp do schedule (dynamic)”
• Elapsed time increased, so there is no improvement
• Go to the Grid View for details
• The default chunk size is 1, which led to scheduling overhead
• Let’s try a bigger chunk size
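The effect of chunk size on scheduling overhead can be sketched with a greedy model of dynamic scheduling. The fixed per-dispatch cost, the uniform iteration costs, and the function name are assumptions made for illustration. A real OpenMP runtime behaves differently in detail.

```python
import heapq


def simulate_dynamic(costs, threads, chunk, dispatch_cost):
    """Greedy model of schedule(dynamic, chunk).

    The thread that becomes free first takes the next chunk and pays
    a fixed dispatch_cost for every chunk it requests.
    Returns (elapsed time, number of dispatches).
    """
    free_at = [0.0] * threads      # min-heap of times when each thread is free
    heapq.heapify(free_at)
    dispatches = 0
    for start in range(0, len(costs), chunk):
        t = heapq.heappop(free_at)
        work = sum(costs[start:start + chunk])
        heapq.heappush(free_at, t + dispatch_cost + work)
        dispatches += 1
    return max(free_at), dispatches


if __name__ == "__main__":
    costs = [1.0] * 400            # hypothetical loop with 400 equal iterations
    for chunk in (1, 20):
        elapsed, n = simulate_dynamic(costs, threads=4, chunk=chunk,
                                      dispatch_cost=0.5)
        print(f"chunk={chunk:2d}: {n} dispatches, elapsed {elapsed}")
```

With a chunk size of 1, every iteration pays the dispatch cost. A larger chunk amortizes that cost over many iterations, at the price of coarser load balancing.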
• Step 3: Trying the dynamic schedule with a chunk size of 20
• Improved the original elapsed time by ~15% and eliminated the imbalance
Legal Disclaimer & Optimization Notice
INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO
ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND
INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR
WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT,
COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors.
Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software,
operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information
and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product
when combined with other products.
Copyright © 2017, Intel Corporation. All rights reserved. Intel, Pentium, Xeon, Xeon Phi, Core, VTune, Cilk, and the Intel logo are
trademarks of Intel Corporation in the U.S. and other countries.
Optimization Notice
Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel
microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the
availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent
optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture
are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the
specific instruction sets covered by this notice.
Notice revision #20110804