Linux performance analysis tool: a one-page perf primer

磨刀砍柴Debug

Last modified on 2023-04-11 10:35:48

Copyright: CC 4.0 BY-SA

Column: Debugging and performance tools | Tags: linux, operations, server, c++

First published on 2022-07-13 21:30:00

Article link: https://blog.csdn.net/weixin_44531336/article/details/125767551

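perf is the performance analysis tool built into Linux. It is used for CPU profiling, performance event statistics, dynamic tracing, and more. This article covers:

- perf's common subcommands, such as stat, record, report, and script;
- the sampling principle behind it;
- how flame graphs are generated.

Examples show how to use perf to track down performance bottlenecks. You sample with perf record, analyze the samples with perf report, and then combine perf script with FlameGraph to produce flame graphs that show performance hotspots visually.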

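In software development, two kinds of problems that show up after a program has been running for a while are the most troublesome:

1.Sudden crashes (debugging problems)

Windows: analyze the dump file with WinDbg, and use Application Verifier (appverif.exe) to run comprehensive checks on the program

Linux: analyze the core dump with gdb, and use the valgrind tools to run comprehensive checks on the program (the most commonly used is Memcheck, for finding memory leaks)

2.Slowdowns, or CPU (or other resource) usage that stays persistently high (performance problems)

Static code analysis: too inefficient, unless the problem was introduced by merging a specific git branch; for a small, contained change like that, static analysis is still workable

Sampling analysis: on Windows, use Xperf; on Linux, use perf plus flame graphs

Note: the notes below summarize perf for my own later review, so they list many common commands, which makes them somewhat long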

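Important enough to say three times:

These notes have moved!
These notes have moved!
These notes have moved!

New location: a new series I started writing only a few days ago; later articles will mainly be updated there, and your support is welcome. Writing takes effort, so please read and appreciate it.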

1.perf-Overview

Introduction

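perf is the official Linux profiler (available since Linux 2.6.31). It is a lightweight, kernel-level sampling and analysis tool. Its source code lives in the Linux kernel tree under tools/perf, and it is built on the kernel's perf_events subsystem. It is a multi-tool that combines profiling, tracing, and scripting capabilities.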

Installation

Run lsb_release -a to identify your distribution and version, then install perf with the matching command below:

CentOS/RHEL: yum install perf
Fedora: dnf install perf
SUSE: zypper install perf
Ubuntu: apt install linux-tools-common linux-tools-$(uname -r)
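The "identify the distribution, then pick the command" step can be automated. The Python below is a minimal sketch of that. It reads the ID field from /etc/os-release, which most modern distributions ship and which works as an alternative to lsb_release -a. It then maps that ID to an install command.

The helper names and the mapping table are illustrative assumptions, not part of perf. The Ubuntu entry also installs the kernel-specific linux-tools-$(uname -r) package, because linux-tools-common alone does not provide a perf build for the running kernel. The $(uname -r) part is expanded by the shell, so run the printed command in a shell.

```python
import shlex
from typing import Dict, Optional

# Hypothetical mapping from the os-release ID field to perf install commands.
INSTALL_COMMANDS: Dict[str, str] = {
    "centos": "yum install perf",
    "rhel": "yum install perf",
    "fedora": "dnf install perf",
    "sles": "zypper install perf",
    "opensuse-leap": "zypper install perf",
    "ubuntu": "apt install linux-tools-common linux-tools-$(uname -r)",
}


def parse_os_release(text: str) -> Dict[str, str]:
    """Parse KEY=value lines in /etc/os-release format into a dict."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments and malformed lines
        key, _, value = line.partition("=")
        parts = shlex.split(value)  # removes surrounding quotes, if any
        fields[key] = parts[0] if parts else ""
    return fields


def perf_install_command(os_release_text: str) -> Optional[str]:
    """Return the install command for the distribution, or None if unknown."""
    distro_id = parse_os_release(os_release_text).get("ID", "").lower()
    return INSTALL_COMMANDS.get(distro_id)


if __name__ == "__main__":
    try:
        with open("/etc/os-release", encoding="utf-8") as f:
            print(perf_install_command(f.read()) or "unknown distribution")
    except FileNotFoundError:
        print("/etc/os-release not found")
```

Derivative distributions (for example Rocky or Linux Mint) report their own ID. A fuller version could fall back to the ID_LIKE field.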

Common use cases

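perf is especially well suited to CPU analysis, since it can profile CPU call paths. Typical uses include:

- sampling CPU stack traces;
- tracing CPU scheduler behavior;
- examining disk I/O.

A few sampling runs against a program are often enough to find clues about what is hurting performance.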
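Why are a few sampling runs usually enough? At each sample, perf records which function is on-CPU, plus the whole call stack when recording with -g. A function's share of the samples therefore approximates its share of CPU time.

The Python below is a minimal sketch of that arithmetic on hypothetical sampled stacks, each ordered root to leaf. It only mirrors the idea behind the Self and Children columns that perf report shows for call-graph data; it is not perf's implementation.

```python
from collections import Counter
from typing import Dict, List, Sequence


def self_overhead(stacks: Sequence[Sequence[str]]) -> Dict[str, float]:
    """Percent of samples in which each function was the leaf (on-CPU) frame.

    Each stack is one sample's call chain, ordered root -> leaf.
    """
    leaves = Counter(stack[-1] for stack in stacks if stack)
    total = sum(leaves.values())
    return {func: 100.0 * n / total for func, n in leaves.most_common()}


def children_overhead(stacks: Sequence[Sequence[str]]) -> Dict[str, float]:
    """Percent of samples in which each function appears anywhere on the stack,
    i.e. the function itself or one of its callees was running."""
    counts: Counter = Counter()
    for stack in stacks:
        counts.update(set(stack))  # count a function once per sample, even if recursive
    total = len(stacks)
    return {func: 100.0 * n / total for func, n in counts.most_common()}


if __name__ == "__main__":
    # Hypothetical samples; with perf these would come from `perf record -g`.
    samples: List[List[str]] = [
        ["main", "parse", "strlen"],
        ["main", "compute"],
        ["main", "compute"],
        ["main", "parse"],
    ]
    print("self:", self_overhead(samples))
    print("children:", children_overhead(samples))
```

In this toy data, compute is the leaf in half the samples, so it is the first place to look. main appears in every stack (100% children overhead) but never burns CPU itself.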

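Tip: perf's subcommands work much like git's, so learning perf means learning how to use its subcommands. The most commonly used are stat, record, report, and script. First, let's get a general overview of the common subcommands perf supports...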

2.Subcommand-Overview

part 1.Subcommand framework

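The figure below shows the most commonly used perf subcommands, with each subcommand's input source and output format. It also shows the workflow for generating a flame graph with stackcollapse-perf.pl and flamegraph.pl.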

Tip: once you understand every detail of this figure, you have basically learned perf.

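[Figure: common perf subcommands with their inputs and outputs, and the stackcollapse-perf.pl / flamegraph.pl flame graph workflow]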
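In the flame graph workflow, stackcollapse-perf.pl folds each sampled call stack from perf script output into one line: semicolon-separated frames followed by a count. flamegraph.pl then draws those folded lines. The Python below is a simplified sketch of that folding step, not a port of the real script.

It assumes output recorded with -g, laid out as follows:

- each sample starts with a header line in column 0, beginning with the process name;
- the header is followed by indented frame lines of the form "address symbol+offset (dso)", leaf first;
- samples are separated by blank lines.

Process names containing spaces are not handled.

```python
import re
import sys
from collections import Counter
from typing import Iterable, List, Optional

# Assumed frame-line format: "<indent><hex address> <symbol>[+0x<offset>] (<dso>)"
FRAME_RE = re.compile(r"^\s+[0-9a-fA-F]+\s+(?P<sym>.+?)(?:\+0x[0-9a-fA-F]+)?\s+\(")


def collapse(lines: Iterable[str]) -> List[str]:
    """Fold `perf script` call-graph output into sorted "comm;root;...;leaf count" lines."""
    folded: Counter = Counter()
    comm: Optional[str] = None
    frames: List[str] = []

    def flush() -> None:
        nonlocal comm, frames
        if comm is not None and frames:
            # perf script prints the leaf frame first; reverse to root -> leaf.
            folded[";".join([comm] + frames[::-1])] += 1
        comm, frames = None, []

    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            flush()                    # a blank line ends the current sample
        elif not line[0].isspace():
            flush()                    # a new header line starts a new sample
            comm = line.split()[0]     # simplification: process name has no spaces
        else:
            match = FRAME_RE.match(line)
            frames.append(match.group("sym") if match else "[unknown]")
    flush()
    return [f"{stack} {count}" for stack, count in sorted(folded.items())]


if __name__ == "__main__":
    for folded_line in collapse(sys.stdin):
        print(folded_line)
```

A possible pipeline, where collapse.py and ./myprog are hypothetical names: run perf record -F 99 -g ./myprog, then perf script | python3 collapse.py > out.folded, then flamegraph.pl out.folded > flame.svg. For real work, prefer stackcollapse-perf.pl itself, which handles far more output variants.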

part 2.Supported command list

Below are the common subcommands perf supports, with brief descriptions. Note: perf list lists the supported events, not the subcommands.

# perf
 usage: perf [--version] [--help] [OPTIONS] COMMAND [ARGS]

 The most commonly used perf commands are:
   annotate        Read perf.data (created by perf record) and display annotated code	
   archive         Create archive with object files with build-ids found in perf.data file
   bench           General framework for benchmark suites
   buildid-cache   Manage build-id cache.
   buildid-list    List the buildids in a perf.data file
   c2c             Shared Data C2C/HITM Analyzer.
   config          Get and set variables in a configuration file.
   data            Data file related processing
   diff            Read perf.data files and display the differential profile
   evlist          List the event names in a perf.data file
   ftrace          simple wrapper for kernel's ftrace functionality
   inject          Filter to augment the events stream with additional information
   kallsyms        Searches running kernel for symbols
   kmem            Tool to trace/measure kernel memory properties
   kvm             Tool to trace/measure kvm guest os
   list            List all symbolic event types
   lock            Analyze lock events
   mem             Profile memory accesses
   record          Run a command and record its profile into perf.data
   report          Read perf.data (created by perf record) and display the profile
   sched           Tool to trace/measure scheduler properties (latencies)
   script          Read perf.data (created by perf record) and display trace output
   stat            Run a command and gather performance counter statistics
   test            Runs sanity tests.
   timechart       Tool to visualize total system behavior during a workload
   top             System profiling tool.
   version         display the version of perf binary
   probe           Define new dynamic tracepoints
   trace           strace inspired tool

 See 'perf help COMMAND' for more information on a specific command.

part 3.Options

Below, -h lists the options of the perf stat subcommand; other subcommands work the same way.

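Tip: there's no need to memorize them all, and it isn't realistic anyway. Remember the common ones and look up the rest when in doubt.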

# perf stat -h

 Usage: perf stat [<options>] [<command>]

    -a, --all-cpus        system-wide collection from all CPUs
    -A, --no-aggr         disable CPU count aggregation
    -B, --big-num         print large numbers with thousands' separators
    -C, --cpu <cpu>       list of cpus to monitor in system-wide
    -c, --scale           scale/normalize counters
    -D, --delay <n>       ms to wait before starting measurement after program start
    -d, --detailed        detailed run - start a lot of events
    -e, --event <event>   event selector. use 'perf list' to list available events
    -G, --cgroup <name>   monitor event in cgroup name only
    -g, --group           put the counters into a counter group
    -I, --interval-print <n>
                          print counts at regular interval in ms (overhead is possible for values <= 100ms)
    -i, --no-inherit      child tasks do not inherit counters
    -M, --metrics <metric/metric group list>
                          monitor specified metrics or metric groups (separated by ,)
    -n, --null            null run - dont start any counters
    -o, --output <file>   output file name
    -p, --pid <pid>       stat events on existing process id
    -r, --repeat <n>      repeat command and print average + stddev (max: 100, forever: 0)
    -S, --sync            call sync() before starting a run
    -t, --tid <tid>       stat events on existing thread id
    -T, --transaction     hardware transaction statistics
    -v, --verbose         be more verbose (show counter open errors, etc)
    -x, --field-separator <separator>
                          print counts with custom separator
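The -x option switches perf stat to separator-delimited output that is easy to post-process. The Python below is a minimal sketch for reading it.

It assumes the plain case, with no -I, -A, or per-socket/per-core aggregation. In that case the perf-stat man page lists the fields in this order:

- counter value;
- unit;
- event name;
- counter run time;
- percentage of measurement time the counter was running;
- optional metric value and unit.

Counters that could not be measured show <not counted> or <not supported> in the value field; the sketch maps those to None. The function name and the file name in the usage comment are hypothetical.

```python
import csv
import sys
from typing import Dict, Iterable, Optional


def parse_perf_stat_csv(lines: Iterable[str]) -> Dict[str, Optional[float]]:
    """Map event name -> counter value from `perf stat -x,` output.

    Assumes no -I, -A or --per-* options, so each row starts with:
    counter value, unit, event name, run time, percentage running, ...
    Values such as "<not counted>" or "<not supported>" become None.
    """
    results: Dict[str, Optional[float]] = {}
    for row in csv.reader(lines):
        if len(row) < 3 or row[0].startswith("#"):
            continue  # skip blank lines and "#" comment lines
        value, event = row[0], row[2]
        try:
            results[event] = float(value)
        except ValueError:
            results[event] = None
    return results


if __name__ == "__main__":
    # Usage (hypothetical names): perf stat -x, -o stat.csv ./myprog
    #                             python3 parse_stat.py stat.csv
    with open(sys.argv[1], encoding="utf-8", newline="") as f:
        for event, value in parse_perf_stat_csv(f).items():
            print(f"{event}: {value}")
```

Writing to a file with -o keeps perf stat's output separate from the workload's own stderr.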

part 4.Supported versions

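The figure below lists the kernel version in which each common subcommand was introduced. It also shows that the set of subcommands perf supports depends on the Linux kernel version. You can check the kernel version with cat /proc/version (or uname -r).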

[Figure: kernel version in which each common perf subcommand was introduced]
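Subcommand availability depends on the kernel version, so a script that wraps perf may want to check that version before calling a newer subcommand. The Python below is a minimal sketch of such a check.

It assumes the version is the first dotted number in /proc/version (as in "Linux version 5.15.0-91-generic") or in uname -r output. The (4, 1, 0) threshold in the demo is a placeholder, not a value taken from the figure. Look up the real version for the subcommand you need.

```python
import re
from typing import Tuple


def parse_kernel_version(text: str) -> Tuple[int, int, int]:
    """Return (major, minor, patch) from /proc/version or `uname -r` text.

    Takes the first "X.Y" or "X.Y.Z" number found; a missing patch level counts as 0.
    """
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"no kernel version found in {text!r}")
    return int(match.group(1)), int(match.group(2)), int(match.group(3) or 0)


def kernel_at_least(text: str, minimum: Tuple[int, int, int]) -> bool:
    """True if the kernel version in `text` is >= `minimum` (tuple comparison)."""
    return parse_kernel_version(text) >= minimum


if __name__ == "__main__":
    with open("/proc/version", encoding="utf-8") as f:
        version_text = f.read()
    # Placeholder threshold: replace with the version the subcommand actually requires.
    required = (4, 1, 0)
    print(parse_kernel_version(version_text), kernel_at_least(version_text, required))
```

Comparing tuples avoids the classic string-comparison bug where "4.9" would sort after "4.18".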

part 5.Quick preview of subcommands

Below is a quick preview of the most common commands; together they cover roughly 80% of perf use cases.

#1.Listing Events
#List events whose names contain the string "block":
perf list block

#2.Counting Events
#Count system calls by type for the specified PID, until Ctrl-C:
perf stat -e 'syscalls:sys_enter_*' -p PID
#Count block device I/O events for the entire system, for 10 seconds:
perf stat -e 'block:*' -a sleep 10

#3.Profiling
#Sample on-CPU functions for the specified command, at 99 Hertz:
perf record -F 99 command
#Sample CPU stack traces (via frame pointers) system-wide, at 99 Hertz, for 10 seconds:
perf record -F 99 -a -g sleep 10
#Sample CPU stack traces for the PID, using dwarf (debuginfo) to unwind stacks:
perf record -F 99 -p PID --call-graph dwarf sleep 10
#Record new process events via exec:
perf record -e sched:sched_process_exec -a

#4.Static Tracing
#Trace all context switches with stack traces for 1 second:
perf record -e sched:sched_switch -a -g sleep 1
#Trace all block requests, of size at least 64 Kbytes, until Ctrl-C:
perf record -e block:block_rq_issue --filter 'bytes >= 65536'

#5.Dynamic Tracing
#Add a probe for the kernel tcp_sendmsg() function entry (--add optional):
perf probe --add tcp_sendmsg
#Remove the tcp_sendmsg() tracepoint (or -d):
perf probe --del tcp_sendmsg

#6.Reporting
#Show perf.data as a text report, with data coalesced and counts and percentages:
perf report -n --stdio
#List all perf.data events, with data header (recommended):
perf script --header
#List all perf.data events, w