Automated Data Analysis for Defining Performance Metrics from Raw Hardware Events

Title: Automated Data Analysis for Defining Performance Metrics from Raw Hardware Events
Publication Type: Conference Paper
Year of Publication: 2024
Authors: Barry, D., A. Danalis, and J. Dongarra
Conference Name: 2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
Date Published: 2024-05
Publisher: IEEE
Conference Location: San Francisco, CA, USA
Abstract

Hardware performance events are at the center of application performance analysis. However, the sheer volume of low-level hardware events in modern HPC systems is overwhelming, making them difficult for users to comprehend. Understanding which concepts are monitored by performance events can be achieved using a two-step process. The first step is the execution of benchmarks designed to stress different hardware attributes in isolation; for every hardware event we wish to understand, we execute the benchmarks while measuring the event. In the second step, the data produced by executing the benchmarks is analyzed to identify what each event actually measures. In this paper, we present the methodology for analyzing the data from four previously developed benchmarks that stress key hardware attributes (CPU and GPU floating-point units, branching units, and data caches) in order to map low-level hardware events to high-level programming concepts. The methodology is automated and expresses the event data in terms of a well-understood conceptual basis. We implement a specialized pivoting scheme for QR factorization to identify events that provide distinct information from each other, along with techniques for addressing noise in event measurements. Lastly, we use least-squares regression to combine the chosen events into definitions of particular metrics of interest.
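The abstract names two numerical building blocks: QR factorization with pivoting, used to keep only events that carry distinct information, and least-squares regression, used to combine the kept events into a metric. The following pure-Python sketch illustrates both under stated assumptions. It uses textbook largest-residual-norm column pivoting (in the style of Businger and Golub), not the paper's specialized pivoting scheme; a single relative tolerance (`rel_tol`, a hypothetical parameter) stands in for the paper's noise-handling techniques; and the functions `select_events` and `fit_metric`, the event names, and the counts are all hypothetical illustrations, not measured data.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def select_events(columns, rel_tol=1e-6):
    """Greedy column selection in the style of QR with column pivoting.

    columns: one list per hardware event, holding that event's counts over
    the same sequence of benchmark runs. At each step the remaining event
    with the largest residual norm is chosen and its direction is projected
    out of the others (modified Gram-Schmidt). Events whose residual falls
    below rel_tol times the largest original norm are treated as redundant.
    This is an assumed stand-in, not the paper's specialized pivoting.
    Returns chosen event indices (pivot order) and orthonormal Q columns.
    """
    residuals = [list(c) for c in columns]
    scale = max((math.sqrt(dot(c, c)) for c in columns), default=0.0)
    threshold = rel_tol * scale
    selected, q_cols = [], []
    while len(selected) < len(columns):
        best, best_norm = None, threshold
        for j, r in enumerate(residuals):
            if j in selected:
                continue
            norm = math.sqrt(dot(r, r))
            if norm > best_norm:
                best, best_norm = j, norm
        if best is None:
            break  # every remaining event is (numerically) redundant
        q = [x / best_norm for x in residuals[best]]
        selected.append(best)
        q_cols.append(q)
        # Remove the new direction from every event not yet selected.
        for j in range(len(residuals)):
            if j not in selected:
                d = dot(q, residuals[j])
                residuals[j] = [a - d * b for a, b in zip(residuals[j], q)]
    return selected, q_cols


def fit_metric(columns, selected, q_cols, target):
    """Least-squares coefficients expressing `target` as a linear
    combination of the selected events, solving R x = Q^T y."""
    k = len(selected)
    # R[i][j] = q_i . a_selected[j] for i <= j (upper triangular).
    r = [[dot(q_cols[i], columns[selected[j]]) if i <= j else 0.0
          for j in range(k)] for i in range(k)]
    qty = [dot(q, target) for q in q_cols]
    x = [0.0] * k
    for i in reversed(range(k)):  # back-substitution
        s = qty[i] - sum(r[i][j] * x[j] for j in range(i + 1, k))
        x[i] = s / r[i][i]
    return {selected[i]: x[i] for i in range(k)}


# Hypothetical data: four benchmark runs, four events.
scalar = [4.0, 0.0, 8.0, 0.0]  # scalar FP instructions
packed = [0.0, 2.0, 0.0, 6.0]  # 2-wide packed FP instructions
total = [4.0, 2.0, 8.0, 6.0]   # scalar + packed, hence redundant
idle = [0.0, 0.0, 0.0, 0.0]    # an event that never fires
events = [scalar, packed, total, idle]

selected, q = select_events(events)
flops = [s + 2 * p for s, p in zip(scalar, packed)]  # hypothetical metric
coeffs = fit_metric(events, selected, q, flops)
```

Here `total` has the largest norm and is pivoted first. Once it is projected out, `scalar` and `packed` leave residuals that are exact negatives of each other, so only one of them is kept, and `idle` is never chosen. Whichever pair survives, the fitted coefficients reproduce `flops` exactly, which is the sense in which a metric is "defined" from a small, non-redundant set of raw events.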

URL: https://ieeexplore.ieee.org/document/10596509/
DOI: 10.1109/IPDPSW63119.2024.00134