Title | Automated Data Analysis for Defining Performance Metrics from Raw Hardware Events |
Publication Type | Conference Paper |
Year of Publication | 2024 |
Authors | Barry, D., A. Danalis, and J. Dongarra |
Conference Name | 2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) |
Date Published | 2024-05 |
Publisher | IEEE |
Conference Location | San Francisco, CA, USA |
Abstract | Hardware performance events are at the center of application performance analysis. However, the sheer volume of low-level hardware events in modern HPC systems is overwhelming, making them difficult for users to comprehend. Understanding which concepts are monitored by performance events can be achieved using a two-step process. The first step is the execution of benchmarks designed to stress different hardware attributes in isolation. For every hardware event we wish to understand, we execute the benchmarks while measuring the event. In the second step, the data produced by executing the benchmarks is analyzed to identify what each event actually measures. In this paper, we present the methodology for analyzing the data from four previously developed benchmarks that stress key hardware attributes (CPU and GPU floating-point units, branching units, and data caches) to map low-level hardware events to high-level programming concepts. We present an automated methodology to express the event data in a well-understood, conceptual basis. We implement a specialized pivoting scheme for QR factorization to identify events that provide distinct information from each other, and techniques for addressing noise in event measurements. Lastly, we utilize least-squares regression to combine the chosen events to define particular metrics of interest. |
URL | https://ieeexplore.ieee.org/document/10596509/ |
DOI | 10.1109/IPDPSW63119.2024.00134 |
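The last two steps named in the abstract (QR factorization with pivoting to keep only events that carry distinct information, then least-squares regression to combine the kept events into a metric) can be illustrated with a small sketch. It uses plain greedy column pivoting, not the specialized pivoting scheme the paper describes. A single relative tolerance `rtol` serves as a crude stand-in for the paper's noise-handling techniques, which the abstract does not detail. The event names, the counts, and the helpers `select_events` and `fit_metric` are all hypothetical. Each row of `X` is one benchmark run and each column is one measured event.

```python
import numpy as np


def select_events(X, rtol=1e-8):
    """Greedy column-pivoted Gram-Schmidt (the selection rule of QR with column pivoting).

    Returns indices of event columns that add linearly independent
    information, in the order they were picked.
    """
    R = np.array(X, dtype=float)              # working copy; columns become residuals
    threshold = rtol * np.linalg.norm(R, axis=0).max()
    selected = []
    for _ in range(min(R.shape)):
        norms = np.linalg.norm(R, axis=0)
        if selected:
            norms[selected] = 0.0             # never pick the same column twice
        j = int(np.argmax(norms))
        if norms[j] <= threshold:
            break                             # remaining columns are dependent or noise-level
        q = R[:, j] / norms[j]                # new orthonormal direction
        R -= np.outer(q, q @ R)               # remove that direction from every column
        selected.append(j)
    return selected


def fit_metric(X, y, selected):
    """Least-squares coefficients expressing metric y in terms of the selected events."""
    A = np.asarray(X, dtype=float)[:, selected]
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


# Hypothetical counts: scalar_dp, packed128_dp, packed256_dp, flops_sse.
# flops_sse is deliberately redundant: flops_sse = scalar_dp + 2 * packed128_dp.
X = np.array([
    [100.0,   0.0,   0.0, 100.0],
    [  0.0, 100.0,   0.0, 200.0],
    [  0.0,   0.0, 100.0,   0.0],
    [ 40.0,  30.0,  10.0, 100.0],
])
# Hypothetical target metric: double-precision FLOPs per run.
y = X[:, 0] + 2 * X[:, 1] + 4 * X[:, 2]

selected = select_events(X)
coef = fit_metric(X, y, selected)
print(selected, coef)
```

Because `flops_sse` has the largest norm it is picked first, `packed256_dp` second, and `scalar_dp` third; the redundant `packed128_dp` column is then left with a residual below the tolerance and is dropped. The regression then expresses the metric as roughly `1 * flops_sse + 4 * packed256_dp + 0 * scalar_dp`. Greedy pivoting favors whichever column has the most remaining energy, which is one reason a specialized pivoting rule, such as the one the paper proposes, may be preferable when particular events should be kept.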