Linux Performance Tools
Brendan Gregg
Senior Performance Architect
Performance Engineering Team
bgregg@netflix.com
@brendangregg
Oct, 2014
A quick tour of many tools…
• Massive AWS EC2 Linux cloud
– Tens of thousands of instances
– Autoscale by ~3k each day
– CentOS and Ubuntu
• FreeBSD for content delivery
– Approx 33% of US Internet traffic at night
• Performance is critical
– Customer satisfaction: >50M subscribers
– $$$ price/performance
– Develop tools for cloud-wide analysis; use server tools as needed
• Just launched in Europe!
Brendan Gregg
• Senior Performance Architect, Netflix
– Linux and FreeBSD performance
– Performance Engineering team (@coburnw)
• Recent work:
– Linux perf-tools, using ftrace & perf_events
– Systems Performance, Prentice Hall
• Previous work includes:
– USE Method, flame graphs, utilization & latency heat maps, DTrace tools, ZFS L2ARC
• Twitter @brendangregg (these slides)
Agenda
• Methodologies & Tools
• Tool Types:
– Observability
• Basic
• Intermediate
• Advanced
– Benchmarking
– Tuning
– Static
• Tracing
Aim: to show what can be done.
Knowing that something can be done is more important than knowing how to do it.
Methodologies & Tools
Methodologies & Tools
• There are dozens of performance tools for Linux
– Packages: sysstat, procps, coreutils, …
– Commercial products
• Methodologies can provide guidance for choosing and using tools effectively
Anti-Methodologies
• The lack of a deliberate methodology…
• Street Light Anti-Method:
– 1. Pick observability tools that are
• Familiar
• Found on the Internet, or at random
– 2. Run tools
– 3. Look for obvious issues
• Drunk Man Anti-Method:
– Tune things at random until the problem goes away
Methodologies
• For example, the USE Method:
– For every resource, check:
• Utilization
• Saturation
• Errors
• 5 Whys:
– Ask “why?” 5 times
• Other methods include:
– Workload characterization, drill-down analysis, event tracing, baseline stats, static performance tuning, …
• Start with the questions, then find the tools
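For instance, a first USE-method pass over the main resources might look like the following. This is a sketch using the tools covered in this talk; exact columns and options vary by distro:

$ uptime              # CPU saturation: load averages vs. CPU count
$ mpstat -P ALL 1 5   # CPU utilization, per CPU
$ vmstat 1 5          # run queue ("r"), memory, swap-ins/outs (si/so)
$ free -m             # memory utilization
$ iostat -xz 1 5      # disk utilization (%util) and saturation (avgqu-sz)
$ sar -n DEV 1 5      # network interface throughput (utilization)
$ sar -n EDEV 1 5     # network interface errors
$ dmesg | tail        # recent kernel errors of any kind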
Command Line Tools
• Useful to study even if you never use them: GUIs and commercial products often use the same interfaces
Kernel: /proc, /sys, …
$ vmstat 1
procs -----------memory---------- ---swap-- …
 r b swpd free buff cache si so …
 9 0 0 29549320 29252 9299060 0 …
 2 0 0 29547876 29252 9299332 0 …
 4 0 0 29548124 29252 9299460 0 …
 5 0 0 29548840 29252 9299592 0 …
Tool Types
Type           Characteristic
Observability  Watch activity. Safe, usually, depending on resource overhead.
Benchmarking   Load test. Caution: production tests can cause issues due to contention.
Tuning         Change. Danger: changes could hurt performance, now or later with load.
Static         Check configuration. Should be safe.
Observability Tools
How do you measure these?
Observability Tools: Basic
• uptime
• top (or htop)
• ps
• vmstat
• iostat
• mpstat
• free
uptime
• One way to print load averages:
$ uptime
 07:42:06 up 8:16, 1 user, load average: 2.27, 2.84, 2.91
• A measure of resource demand: CPUs + disks
– Other OSes only show CPUs: easier to interpret
• Exponentially-damped moving averages with time constants of 1, 5, and 15 minutes
– Historic trend without the line graph
• Load > # of CPUs may mean CPU saturation
– Don’t spend more than 5 seconds studying these
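As background, the kernel maintains these as exponentially-damped moving averages, recomputed on a roughly 5-second tick; a sketch of the 1-minute recurrence (the 5- and 15-minute averages use the same form with different decay constants):

load(t) = load(t-1) * e^(-5/60) + n_active * (1 - e^(-5/60))

where n_active on Linux counts tasks that are runnable or in uninterruptible sleep (hence "CPUs + disks").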
top (or htop)
• System and per-process interval summary:
$ top - 18:50:26 up 7:43, 1 user, load average: 4.11, 4.91, 5.22
Tasks: 209 total, 1 running, 206 sleeping, 0 stopped, 2 zombie
Cpu(s): 47.1%us, 4.0%sy, 0.0%ni, 48.4%id, 0.0%wa, 0.0%hi, 0.3%si, 0.2%st
Mem: 70197156k total, 44831072k used, 25366084k free, 36360k buffers
Swap: 0k total, 0k used, 0k free, 11873356k cached

 PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
5738 apiprod 20 0 62.6g 29g 352m S 417 44.2 2144:15 java
1386 apiprod 20 0 17452 1388 964 R 0 0.0 0:00.02 top
1 root 20 0 24340 2272 1340 S 0 0.0 0:01.51 init
2 root 20 0 0 0 0 S 0 0.0 0:00.00 kthreadd
[…]
• %CPU is summed across all CPUs
• Can miss short-lived processes (atop won’t)
• Can consume noticeable CPU to read /proc
htop
ps
• Process status listing (eg, “ASCII art forest”):
$ ps -ef f
UID PID PPID C STIME TTY STAT TIME CMD
[…]
root 4546 1 0 11:08 ? Ss 0:00 /usr/sbin/sshd -D
root 28261 4546 0 17:24 ? Ss 0:00 _ sshd: prod [priv]
prod 28287 28261 0 17:24 ? S 0:00 _ sshd: prod@pts/0
prod 28288 28287 0 17:24 pts/0 Ss 0:00 _ -bash
prod 3156 28288 0 19:15 pts/0 R+ 0:00 _ ps -ef f
root 4965 1 0 11:08 ? Ss 0:00 /bin/sh /usr/bin/svscanboot
root 4969 4965 0 11:08 ? S 0:00 _ svscan /etc/service
[…]
• Custom fields:
$ ps -eo user,sz,rss,minflt,majflt,pcpu,args
USER SZ RSS MINFLT MAJFLT %CPU COMMAND
root 6085 2272 11928 24 0.0 /sbin/init
[…]
vmstat
• Virtual memory statistics and more:
$ vmstat -Sm 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r b swpd free buff cache si so bi bo in cs us sy id wa
 8 0 0 1620 149 552 0 0 1 179 77 12 25 34 0 0
 7 0 0 1598 149 552 0 0 0 0 205 186 46 13 0 0
 8 0 0 1617 149 552 0 0 0 8 210 435 39 21 0 0
 8 0 0 1589 149 552 0 0 0 0 218 219 42 17 0 0
[…]
• USAGE: vmstat [interval [count]]
• First output line has some summary since boot values (should be all; partial is confusing)
• High level CPU summary. “r” is runnable tasks.
iostat
• Block I/O (disk) stats. 1st output is since boot.
$ iostat -xmdz 1

Linux 3.13.0-29 (db001-eb883efa) 08/18/2014 _x86_64_ (16 CPU)

Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s ...
xvda 0.00 0.00 0.00 0.00 0.00 0.00 ...
xvdb 213.00 0.00 15299.00 0.00 338.17 0.00 ...
xvdc 129.00 0.00 15271.00 3.00 336.65 0.01 ...
md0 0.00 0.00 31082.00 3.00 678.45 0.01 ...
(left-hand columns: the workload applied)
... avgqu-sz await r_await w_await svctm %util
... 0.00 0.00 0.00 0.00 0.00 0.00
... 126.09 8.22 8.22 0.00 0.06 86.40
... 99.31 6.47 6.47 0.00 0.06 86.00
... 0.00 0.00 0.00 0.00 0.00 0.00
(right-hand columns: the resulting performance)
• Very useful set of stats
mpstat
• Multi-processor statistics, per-CPU:
$ mpstat -P ALL 1
[…]
08:06:43 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
08:06:44 PM all 53.45 0.00 3.77 0.00 0.00 0.39 0.13 0.00 42.26
08:06:44 PM 0 49.49 0.00 3.03 0.00 0.00 1.01 1.01 0.00 45.45
08:06:44 PM 1 51.61 0.00 4.30 0.00 0.00 2.15 0.00 0.00 41.94
08:06:44 PM 2 58.16 0.00 7.14 0.00 0.00 0.00 1.02 0.00 33.67
08:06:44 PM 3 54.55 0.00 5.05 0.00 0.00 0.00 0.00 0.00 40.40
08:06:44 PM 4 47.42 0.00 3.09 0.00 0.00 0.00 0.00 0.00 49.48
08:06:44 PM 5 65.66 0.00 3.03 0.00 0.00 0.00 0.00 0.00 31.31
08:06:44 PM 6 50.00 0.00 2.08 0.00 0.00 0.00 0.00 0.00 47.92
[…]
• Look for unbalanced workloads, hot CPUs.
free
• Main memory usage:
$ free -m
 total used free shared buffers cached
Mem: 3750 1111 2639 0 147 527
-/+ buffers/cache: 436 3313
Swap: 0 0 0
• buffers: block device I/O cache
• cached: virtual page cache
Observability Tools: Basic
Observability Tools: Intermediate
• strace
• tcpdump
• netstat
• nicstat
• pidstat
• swapon
• lsof
• sar (and collectl, dstat, etc.)
strace
• System call tracer:
$ strace -tttT -p 313
1408393285.779746 getgroups(0, NULL) = 1 <0.000016>
1408393285.779873 getgroups(1, [0]) = 1 <0.000015>
1408393285.780797 close(3) = 0 <0.000016>
1408393285.781338 write(1, "LinuxCon 2014!\n", 15LinuxCon 2014!
) = 15 <0.000048>
• Eg, -ttt: time (us) since epoch; -T: syscall time (s)
• Translates syscall args
– Very helpful for solving system usage issues
• Currently has massive overhead (ptrace based)
– Can slow the target by > 100x. Use extreme caution.
tcpdump
• Sniff network packets for post analysis:
$ tcpdump -i eth0 -w /tmp/out.tcpdump
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
^C7985 packets captured
8996 packets received by filter
1010 packets dropped by kernel
# tcpdump -nr /tmp/out.tcpdump | head
reading from file /tmp/out.tcpdump, link-type EN10MB (Ethernet)
20:41:05.038437 IP 10.44.107.151.22 > 10.53.237.72.46425: Flags [P.], seq 18...
20:41:05.038533 IP 10.44.107.151.22 > 10.53.237.72.46425: Flags [P.], seq 48...
20:41:05.038584 IP 10.44.107.151.22 > 10.53.237.72.46425: Flags [P.], seq 96...
[…]
• Study packet sequences with timestamps (us)
• CPU overhead optimized (socket ring buffers), but can still be significant. Use caution.
netstat
• Various network protocol statistics using -s:
• A multi-tool:
-i: interface stats
-r: route table
default: list conns
• netstat -p: shows process details!
• Per-second interval with -c
$ netstat -s
[…]
Tcp:
 736455 active connections openings
 176887 passive connection openings
 33 failed connection attempts
 1466 connection resets received
 3311 connections established
 91975192 segments received
 180415763 segments send out
 223685 segments retransmited
 2 bad segments received.
 39481 resets sent
[…]
TcpExt:
 12377 invalid SYN cookies received
 2982 delayed acks sent
[…]
nicstat
• Network interface stats, iostat-like output:
$ ./nicstat 1
Time Int rKB/s wKB/s rPk/s wPk/s rAvs wAvs %Util Sat
21:21:43 lo 823.0 823.0 171.5 171.5 4915.4 4915.4 0.00 0.00
21:21:43 eth0 5.53 1.74 15.11 12.72 374.5 139.8 0.00 0.00
Time Int rKB/s wKB/s rPk/s wPk/s rAvs wAvs %Util Sat
21:21:44 lo 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
21:21:44 eth0 20.42 3394.1 355.8 85.94 58.76 40441.3 0.00 0.00
Time Int rKB/s wKB/s rPk/s wPk/s rAvs wAvs %Util Sat
21:21:45 lo 1409.1 1409.1 327.9 327.9 4400.8 4400.8 0.00 0.00
21:21:45 eth0 75.12 4402.3 1398.9 1513.2 54.99 2979.1 0.00 0.00
[…]
• Check network throughput and interface %util
• I wrote this years ago; Tim Cook ported to Linux
pidstat
• Very useful process stats. eg, by-thread, disk I/O:
$ pidstat -t 1
Linux 3.2.0-54 (db002-91befe03) 08/18/2014 _x86_64_ (8 CPU)

08:57:52 PM TGID TID %usr %system %guest %CPU CPU Command
08:57:54 PM 5738 - 484.75 39.83 0.00 524.58 1 java
08:57:54 PM - 5817 0.85 0.00 0.00 0.85 2 |__java
08:57:54 PM - 5931 1.69 1.69 0.00 3.39 4 |__java
08:57:54 PM - 5981 0.85 0.00 0.00 0.85 7 |__java
08:57:54 PM - 5990 0.85 0.00 0.00 0.85 4 |__java
[…]
$ pidstat -d 1
[…]
08:58:27 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command
08:58:28 PM 5738 0.00 815.69 0.00 java
[…]
• I usually prefer this over top(1)
swapon
• Show swap device usage:
$ swapon -s
Filename Type Size Used Priority
/dev/sda3 partition 5245212 284 -1
• If you have swap enabled…
lsof
• More a debug tool, lsof(8) shows file descriptor usage, which for some apps equals current active network connections:
# lsof -iTCP -sTCP:ESTABLISHED
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
sshd 755 root 3r IPv4 13576887 0t0 TCP bgregg-test-i-f106:ssh->prod100.netflix.com:15241 (ESTABLISHED)
platforms 2614 app1 8u IPv4 14618 0t0 TCP localhost:33868->localhost:5433 (ESTABLISHED)
postgres 2648 app1 7u IPv4 14619 0t0 TCP localhost:5433->localhost:33868 (ESTABLISHED)
epic_plug 2857 app1 7u IPv4 15678 0t0 TCP localhost:33885->localhost:5433 (ESTABLISHED)
postgres 2892 app1 7u IPv4 15679 0t0 TCP localhost:5433->localhost:33885 (ESTABLISHED)
[…]
• I’d prefer to: ls /proc/PID/fd | wc -l
sar
• System Activity Reporter. Many stats, eg:
$ sar -n TCP,ETCP,DEV 1
Linux 3.2.55 (test-e4f1a80b) 08/18/2014 _x86_64_ (8 CPU)

09:10:43 PM IFACE rxpck/s txpck/s rxkB/s txkB/s rxcmp/s txcmp/s rxmcst/s
09:10:44 PM lo 14.00 14.00 1.34 1.34 0.00 0.00 0.00
09:10:44 PM eth0 4114.00 4186.00 4537.46 28513.24 0.00 0.00 0.00

09:10:43 PM active/s passive/s iseg/s oseg/s
09:10:44 PM 21.00 4.00 4107.00 22511.00

09:10:43 PM atmptf/s estres/s retrans/s isegerr/s orsts/s
09:10:44 PM 0.00 0.00 36.00 0.00 1.00
[…]
• Archive or live mode: (interval [count])
• Well designed. Header naming convention, logical groups: TCP, ETCP, DEV, EDEV, …
Observability: sar
Other Tools
• You may also use collectl, atop, dstat, or another measure-all tool
• The tool isn’t important
• It’s important to have a way to measure everything you want
• In cloud environments, you are probably using a monitoring product, developed in-house or commercial. Same method applies…
How does your monitoring tool measure these?
Observability Tools: Intermediate
Advanced Observability Tools
• Misc:
– ltrace, ss, iptraf, ethtool, snmpget, lldptool, iotop, blktrace, slabtop, /proc, pcstat
• CPU Performance Counters:
– perf_events, tiptop, rdmsr
• Advanced Tracers:
– perf_events, ftrace, eBPF, SystemTap, ktap, LTTng, dtrace4linux, sysdig
• Some selected demos…
ss
• More socket statistics:
$ ss -mop
State Recv-Q Send-Q Local Address:Port Peer Address:Port
CLOSE-WAIT 1 0 127.0.0.1:42295 127.0.0.1:28527 users:(("apacheLogParser",2702,3))
 mem:(r1280,w0,f2816,t0)
ESTAB 0 0 127.0.0.1:5433 127.0.0.1:41312 timer:(keepalive,36min,0) users:(("postgres",2333,7))
 mem:(r0,w0,f0,t0)
[…]
$ ss -i
State Recv-Q Send-Q Local Address:Port Peer Address:Port
CLOSE-WAIT 1 0 127.0.0.1:42295 127.0.0.1:28527
 cubic wscale:6,6 rto:208 rtt:9/6 ato:40 cwnd:10 send 145.6Mbps rcv_space:32792
ESTAB 0 0 10.144.107.101:ssh 10.53.237.72:4532
 cubic wscale:4,6 rto:268 rtt:71.5/3 ato:40 cwnd:10 send 1.5Mbps rcv_rtt:72 rcv_space:14480
[…]
iptraf
iotop
• Block device I/O (disk) by process:
$ iotop
Total DISK READ: 50.47 M/s | Total DISK WRITE: 59.21 M/s
 TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
 959 be/4 root 0.00 B/s 0.00 B/s 0.00 % 99.99 % [flush-202:1]
 6641 be/4 root 50.47 M/s 82.60 M/s 0.00 % 32.51 % java -Dnop -X
 1 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % init
 2 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kthreadd]
 3 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [ksoftirqd/0]
 4 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kworker/0:0]
 5 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kworker/u:0]
 6 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/0]
[…]
• Needs kernel support enabled
– CONFIG_TASK_IO_ACCOUNTING
slabtop
• Kernel slab allocator memory usage:
$ slabtop
Active / Total Objects (% used) : 4692768 / 4751161 (98.8%)
Active / Total Slabs (% used) : 129083 / 129083 (100.0%)
Active / Total Caches (% used) : 71 / 109 (65.1%)
Active / Total Size (% used) : 729966.22K / 738277.47K (98.9%)
Minimum / Average / Maximum Object : 0.01K / 0.16K / 8.00K

 OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
 3565575 3565575 100% 0.10K 91425 39 365700K buffer_head
 314916 314066 99% 0.19K 14996 21 59984K dentry
 184192 183751 99% 0.06K 2878 64 11512K kmalloc-64
 138618 138618 100% 0.94K 4077 34 130464K xfs_inode
 138602 138602 100% 0.21K 3746 37 29968K xfs_ili
 102116 99012 96% 0.55K 3647 28 58352K radix_tree_node
 97482 49093 50% 0.09K 2321 42 9284K kmalloc-96
 22695 20777 91% 0.05K 267 85 1068K shared_policy_node
 21312 21312 100% 0.86K 576 37 18432K ext4_inode_cache
 16288 14601 89% 0.25K 509 32 4072K kmalloc-256
[…]
pcstat
• Show page cache residency by file:
# ./pcstat data0*
|----------+----------------+------------+-----------+---------|
| Name     | Size           | Pages      | Cached    | Percent |
|----------+----------------+------------+-----------+---------|
| data00   | 104857600      | 25600      | 25600     | 100.000 |
| data01   | 104857600      | 25600      | 25600     | 100.000 |
| data02   | 104857600      | 25600      | 4080      | 015.938 |
| data03   | 104857600      | 25600      | 25600     | 100.000 |
| data04   | 104857600      | 25600      | 16010     | 062.539 |
| data05   | 104857600      | 25600      | 0         | 000.000 |
|----------+----------------+------------+-----------+---------|
• Uses the mincore(2) syscall. Useful for database performance analysis.
perf_events (counters)
• Performance Monitoring Counters (PMCs):
$ perf list | grep -i hardware
 cpu-cycles OR cycles [Hardware event]
 stalled-cycles-frontend OR idle-cycles-frontend [Hardware event]
 stalled-cycles-backend OR idle-cycles-backend [Hardware event]
 instructions [Hardware event]
[…]
 branch-misses [Hardware event]
 bus-cycles [Hardware event]
 L1-dcache-loads [Hardware cache event]
 L1-dcache-load-misses [Hardware cache event]
[…]
 rNNN (see 'perf list --help' on how to encode it) [Raw hardware event …
 mem:<addr>[:access] [Hardware breakpoint]
• Identify CPU cycle breakdowns, esp. stall types
– PMCs not enabled by default in clouds (yet)
– Can be time-consuming to use (CPU manuals)
• Use flame graphs to visualize sampled stack traces
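For a quick first look at the counters, something like the following works (a sketch; it assumes PMCs are available, which they usually aren’t in cloud guests today; "command" is a placeholder for your workload):

$ perf stat -a -- sleep 10     # system-wide cycles, instructions, IPC, stalls for 10s
$ perf stat -d command         # add cache and TLB detail while running a command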
perf_events CPU Flame Graph
(figure: a CPU flame graph of sampled stack traces; annotations point out broken Java stacks due to a missing frame pointer, kernel TCP/IP paths, GC, the idle thread, locks, and epoll)
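A sketch of how such a flame graph can be generated, assuming the FlameGraph scripts from https://github.com/brendangregg/FlameGraph are available in the current directory:

# perf record -F 99 -a -g -- sleep 30      # sample stacks on all CPUs at 99 Hertz
# perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > cpu.svg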
tiptop
• IPC by process, %MISS, %BUS
• Needs some love. perfmon2 library integration?
• Still can’t use it in clouds yet (needs PMCs enabled)
rdmsr
• Model Specific Registers (MSRs), unlike PMCs, can be read by default in Xen guests
– Timestamp clock, temp, power, …
– Use rdmsr(1) from the msr-tools package to read them
– Uses include (https://github.com/brendangregg/msr-cloud-tools):
ec2-guest# ./showboost            (real CPU MHz)
[...]
TIME C0_MCYC C0_ACYC UTIL RATIO MHz
06:11:35 6428553166 7457384521 51% 116% 2900
06:11:40 6349881107 7365764152 50% 115% 2899
06:11:45 6240610655 7239046277 49% 115% 2899
[...]
ec2-guest# ./cputemp 1            (CPU temperature)
CPU1 CPU2 CPU3 CPU4
61 61 60 59
60 61 60 60
[...]
More Advanced Tools…
• Some others worth mentioning:
Tool       Description
ltrace     Library call tracer
ethtool    Mostly interface tuning; some stats
snmpget    SNMP network host statistics
lldptool   Can get LLDP broadcast stats
blktrace   Block I/O event tracer
/proc      Many raw kernel counters
pmu-tools  On- and off-core CPU counter tools
Advanced Tracers
• Many options on Linux:
– perf_events, ftrace, eBPF, SystemTap, ktap, LTTng, dtrace4linux, sysdig
• Most can do static and dynamic tracing
– Static: pre-defined events (tracepoints)
– Dynamic: instrument any software (kprobes, uprobes). Custom metrics on-demand. Catch all.
• Many are in-development.
– I’ll summarize their state later…
Linux Observability Tools
Benchmarking Tools
Benchmarking Tools
• Multi:
– UnixBench, lmbench, sysbench, perf bench
• FS/disk:
– dd, hdparm, fio
• App/lib:
– ab, wrk, jmeter, openssl
• Networking:
– ping, hping3, iperf, ttcp, traceroute, mtr, pchar
Active Benchmarking
• Most benchmarks are misleading or wrong
– You benchmark A, but actually measure B, and conclude that you measured C
• Active Benchmarking:
1. Run the benchmark for hours
2. While running, analyze and confirm the performance limiter using observability tools
• We just covered those tools – use them!
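As a sketch of what that looks like in practice (a hypothetical fio invocation; any load generator works), drive the load in one terminal and watch the suspected limiter in others:

$ fio --name=randread --rw=randread --bs=4k --size=1g --runtime=600 --time_based
$ iostat -xz 1       # are the disks at ~100% %util, or queueing (avgqu-sz)?
$ mpstat -P ALL 1    # or is a CPU the real limiter?
$ pidstat -d 1       # which process is actually issuing the I/O?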
lmbench
• CPU, memory, and kernel micro-benchmarks
• Eg, memory latency by stride size:
$ lat_mem_rd 100m 128 > out.latencies
some R processing…
(plot of latency vs. array size, with steps visible for the L1 cache, L2 cache, L3 cache, and main memory)
fio
• FS or disk I/O micro-benchmarks
$ fio --name=seqwrite --rw=write --bs=128k --size=122374m
[…]
seqwrite: (groupid=0, jobs=1): err= 0: pid=22321
 write: io=122374MB, bw=840951KB/s, iops=6569 , runt=149011msec
 clat (usec): min=41 , max=133186 , avg=148.26, stdev=1287.17
 lat (usec): min=44 , max=133188 , avg=151.11, stdev=1287.21
 bw (KB/s) : min=10746, max=1983488, per=100.18%, avg=842503.94, stdev=262774.35
 cpu : usr=2.67%, sys=43.46%, ctx=14284, majf=1, minf=24
 IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
 submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
 issued r/w/d: total=0/978992/0, short=0/0/0
 lat (usec): 50=0.02%, 100=98.30%, 250=1.06%, 500=0.01%, 750=0.01%
 lat (usec): 1000=0.01%
 lat (msec): 2=0.01%, 4=0.01%, 10=0.25%, 20=0.29%, 50=0.06%
 lat (msec): 100=0.01%, 250=0.01%
• Results include basic latency distribution
pchar
• Traceroute with bandwidth per hop!
$ pchar 10.71.83.1
[…]
 4: 10.110.80.1 (10.110.80.1)
 Partial loss: 0 / 5 (0%)
 Partial char: rtt = 9.351109 ms, (b = 0.004961 ms/B), r2 = 0.184105
 stddev rtt = 4.967992, stddev b = 0.006029
 Partial queueing: avg = 0.000000 ms (0 bytes)
 Hop char: rtt = --.--- ms, bw = 1268.975773 Kbps
 Hop queueing: avg = 0.000000 ms (0 bytes)
 5: 10.193.43.181 (10.193.43.181)
 Partial loss: 0 / 5 (0%)
 Partial char: rtt = 25.461597 ms, (b = 0.011934 ms/B), r2 = 0.228707
 stddev rtt = 10.426112, stddev b = 0.012653
 Partial queueing: avg = 0.000000 ms (0 bytes)
 Hop char: rtt = 16.110487 ms, bw = 1147.210397 Kbps
 Hop queueing: avg = 0.000000 ms (0 bytes)
[…]
• Needs love. Based on pathchar (Linux 2.0.30).
Benchmarking Tools
Tuning Tools
Tuning Tools
• Generic interfaces:
– sysctl, /sys
• Many areas have custom tuning tools:
– Applications: their own config
– CPU/scheduler: nice, renice, taskset, ulimit, chcpu
– Storage I/O: tune2fs, ionice, hdparm, blockdev, …
– Network: ethtool, tc, ip, route
– Dynamic patching: stap, kpatch
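For the generic interfaces, a quick sketch of reading and (carefully) changing a tunable; the tunable and device names here are just examples:

$ sysctl net.ipv4.tcp_window_scaling             # read a kernel tunable
# sysctl -w net.ipv4.tcp_window_scaling=1        # set it (not persistent across reboot)
# echo 1 > /proc/sys/net/ipv4/tcp_window_scaling # the same tunable via /proc
$ cat /sys/block/xvda/queue/scheduler            # per-device tunables live under /sys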
Tuning Methods
• Scientific Method:
1. Question
2. Hypothesis
3. Prediction
4. Test
5. Analysis
• Any observational or benchmarking tests you can try before tuning?
• Consider risks, and see previous tools
Tuning Tools
Static Tools
Static Tools
• Static Performance Tuning: check the static state and configuration of the system
– CPU types
– Storage devices
– File system capacity
– File system and volume configuration
– Route table
– State of hardware
• What can be checked on a system without load
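A sketch of what such a static check pass could include (read-only commands; the interface name is an example):

$ lscpu                    # CPU types and topology
$ df -h                    # file system capacity
$ mount                    # file system and volume configuration
$ ip route                 # route table
$ ethtool eth0             # interface speed, duplex, and negotiated settings
$ dmesg | grep -i error    # state of hardware: any logged errors?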
Static Tools
Tracing
Tracing Frameworks: Tracepoints
• Statically placed at logical places in the kernel
• Provides key event details as a “format” string
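Tracepoints and their format strings can be browsed directly, for example (paths assume debugfs is mounted at /sys/kernel/debug, the usual default):

$ ls /sys/kernel/debug/tracing/events/block/
$ cat /sys/kernel/debug/tracing/events/block/block_rq_issue/format
$ perf list 'block:*'      # the same tracepoints, as seen by perf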
Tracing Frameworks: + probes
• kprobes: dynamic kernel tracing
– function calls, returns, line numbers
• uprobes: dynamic user-level tracing
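One way to exercise both, as a sketch, is via perf probe (the bash/readline target is just a common example):

# perf probe --add do_sys_open                  # kprobe on a kernel function
# perf probe -x /bin/bash --add readline        # uprobe on a user-level function
# perf record -e probe:do_sys_open -a -- sleep 5
# perf probe --del do_sys_open                  # clean up when done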
Tracing Tools
• Options:
– ftrace
– perf_events
– eBPF
– SystemTap
– ktap
– LTTng
– dtrace4linux
– Oracle Linux DTrace
– sysdig
• Too many choices, and many still in-development
Imagine Linux with Tracing
• With a programmable tracer, high level tools can be written, such as:
– iosnoop
– iolatency
– opensnoop
– …
iosnoop
• Block I/O (disk) events with latency:
# ./iosnoop -ts
Tracing block I/O. Ctrl-C to end.
STARTs ENDs COMM PID TYPE DEV BLOCK BYTES LATms
5982800.302061 5982800.302679 supervise 1809 W 202,1 17039600 4096 0.62
5982800.302423 5982800.302842 supervise 1809 W 202,1 17039608 4096 0.42
5982800.304962 5982800.305446 supervise 1801 W 202,1 17039616 4096 0.48
5982800.305250 5982800.305676 supervise 1801 W 202,1 17039624 4096 0.43
[…]
# ./iosnoop -h
USAGE: iosnoop [-hQst] [-d device] [-i iotype] [-p PID] [-n name] [duration]
 -d device # device string (eg, "202,1)
 -i iotype # match type (eg, '*R*' for all reads)
 -n name # process name to match on I/O issue
 -p PID # PID to match on I/O issue
 -Q # include queueing time in LATms
 -s # include start time of I/O (s)
 -t # include completion time of I/O (s)
 -h # this usage message
 duration # duration seconds, and use buffers
[…]
iolatency
• Block I/O (disk) latency distributions:
# ./iolatency
Tracing block I/O. Output every 1 seconds. Ctrl-C to end.

 >=(ms) .. <(ms) : I/O |Distribution |
 0 -> 1 : 2104 |######################################|
 1 -> 2 : 280 |###### |
 2 -> 4 : 2 |# |
 4 -> 8 : 0 | |
 8 -> 16 : 202 |#### |

 >=(ms) .. <(ms) : I/O |Distribution |
 0 -> 1 : 1144 |######################################|
 1 -> 2 : 267 |######### |
 2 -> 4 : 10 |# |
 4 -> 8 : 5 |# |
 8 -> 16 : 248 |######### |
 16 -> 32 : 601 |#################### |
 32 -> 64 : 117 |#### |
[…]
opensnoop
• Trace open() syscalls showing filenames:
# ./opensnoop -t
Tracing open()s. Ctrl-C to end.
TIMEs COMM PID FD FILE
4345768.332626 postgres 23886 0x8 /proc/self/oom_adj
4345768.333923 postgres 23886 0x5 global/pg_filenode.map
4345768.333971 postgres 23886 0x5 global/pg_internal.init
4345768.334813 postgres 23886 0x5 base/16384/PG_VERSION
4345768.334877 postgres 23886 0x5 base/16384/pg_filenode.map
4345768.334891 postgres 23886 0x5 base/16384/pg_internal.init
4345768.335821 postgres 23886 0x5 base/16384/11725
4345768.347911 svstat 24649 0x4 supervise/ok
4345768.347921 svstat 24649 0x4 supervise/status
4345768.350340 stat 24651 0x3 /etc/ld.so.cache
4345768.350372 stat 24651 0x3 /lib/x86_64-linux-gnu/libselinux…
4345768.350460 stat 24651 0x3 /lib/x86_64-linux-gnu/libc.so.6
4345768.350526 stat 24651 0x3 /lib/x86_64-linux-gnu/libdl.so.2
4345768.350981 stat 24651 0x3 /proc/filesystems
4345768.351182 stat 24651 0x3 /etc/nsswitch.conf
[…]
funcgraph
• Trace a graph of kernel code flow:
# ./funcgraph -Htp 5363 vfs_read
Tracing "vfs_read" for PID 5363... Ctrl-C to end.
# tracer: function_graph
#
# TIME CPU DURATION FUNCTION CALLS
# | | | | | | | |
4346366.073832 | 0) | vfs_read() {
4346366.073834 | 0) | rw_verify_area() {
4346366.073834 | 0) | security_file_permission() {
4346366.073834 | 0) | apparmor_file_permission() {
4346366.073835 | 0) 0.153 us | common_file_perm();
4346366.073836 | 0) 0.947 us | }
4346366.073836 | 0) 0.066 us | __fsnotify_parent();
4346366.073836 | 0) 0.080 us | fsnotify();
4346366.073837 | 0) 2.174 us | }
4346366.073837 | 0) 2.656 us | }
4346366.073837 | 0) | tty_read() {
4346366.073837 | 0) 0.060 us | tty_paranoia_check();
[…]
kprobe
• Dynamically trace a kernel function call or return, with variables, and in-kernel filtering:
# ./kprobe 'p:open do_sys_open filename=+0(%si):string' 'filename ~ "*stat"'
Tracing kprobe myopen. Ctrl-C to end.
postgres-1172 [000] d... 6594028.787166: open: (do_sys_open+0x0/0x220) filename="pg_stat_tmp/pgstat.stat"
postgres-1172 [001] d... 6594028.797410: open: (do_sys_open+0x0/0x220) filename="pg_stat_tmp/pgstat.stat"
postgres-1172 [001] d... 6594028.797467: open: (do_sys_open+0x0/0x220) filename="pg_stat_tmp/pgstat.stat"
^C
Ending tracing...
• Add -s for stack traces; -p for PID filter in-kernel.
• Quickly confirm kernel behavior; eg: did a tunable take effect?
Imagine Linux with Tracing
• These tools aren’t using dtrace4linux, SystemTap, ktap, or any other add-on tracer
• These tools use existing Linux capabilities
– No extra kernel bits, not even kernel debuginfo
– Just Linux’s built-in ftrace profiler
– Demoed on Linux 3.2
• Solving real issues now
ftrace
• Added by Steven Rostedt and others since 2.6.27
• Already enabled on our servers (3.2+)
– CONFIG_FTRACE, CONFIG_FUNCTION_PROFILER, …
– Use directly via /sys/kernel/debug/tracing
• My front-end tools to aid usage
– https://github.com/brendangregg/perf-tools
– Unsupported hacks: see WARNINGs
– Also see the trace-cmd front-end, as well as perf
• lwn.net: “Ftrace: The Hidden Light Switch”
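As a sketch of the raw interface the front-ends build on (root required; reset the tracer when done):

# cd /sys/kernel/debug/tracing
# echo function_graph > current_tracer     # choose a tracer
# echo do_sys_open > set_graph_function    # limit tracing to one kernel function
# cat trace_pipe                           # stream events; Ctrl-C to stop
# echo nop > current_tracer                # reset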
My perf-tools (so far…)
Tracing Summary
• ftrace
• perf_events
• eBPF
• SystemTap
• ktap
• LTTng
• dtrace4linux
• sysdig
perf_events
• aka “perf” command
• In Linux. Add from linux-tools-common, …
• Powerful multi-tool and profiler
– interval sampling, CPU performance counter events
– user and kernel dynamic tracing
– kernel line tracing and local variables (debuginfo)
– kernel filtering, and in-kernel counts (perf stat)
• Not very programmable, yet
– limited kernel summaries. May improve with eBPF.
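A few representative invocations, as a sketch of the modes listed above:

$ perf top                                          # live sampling profiler
$ perf record -F 99 -a -g -- sleep 10               # timed sampling with stack traces
$ perf report                                       # summarize the recorded samples
$ perf stat -e block:block_rq_issue -a -- sleep 10  # in-kernel count of a tracepoint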
perf_events Example
# perf record -e skb:consume_skb -ag
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.065 MB perf.data (~2851 samples) ]
# perf report
[...]
 74.42% swapper [kernel.kallsyms] [k] consume_skb
 |
 --- consume_skb
     arp_process
     arp_rcv
     __netif_receive_skb_core
     __netif_receive_skb
     netif_receive_skb
     virtnet_poll
     net_rx_action
     __do_softirq
     irq_exit
     do_IRQ
     ret_from_intr
     default_idle
     cpu_idle
     start_secondary
[…]
Summarizing stack traces for a tracepoint. perf_events can do many things – hard to pick just one example.
eBPF
• Extended BPF: programs on tracepoints
– High performance filtering: JIT
– In-kernel summaries: maps
• Linux in 3.18? Enhance perf_events/ftrace/…?
• Eg, an in-kernel summary of I/O sizes:
# ./bitesize 1
writing bpf-5 -> /sys/kernel/debug/tracing/events/block/block_rq_complete/filter

I/O sizes:
 Kbytes : Count
 4 -> 7 : 131
 8 -> 15 : 32
 16 -> 31 : 1
 32 -> 63 : 46
 64 -> 127 : 0
 128 -> 255 : 15
[…]
SystemTap
• Fully programmable, fully featured
• Compiles tracing programs into kernel modules
– Needs a compiler, and takes time
• “Works great on Red Hat”
– I keep trying on other distros and have hit trouble in the past; make sure you are on the latest version.
– I’m liking it a bit more after finding ways to use it without kernel debuginfo (a difficult requirement in our environment). Work in progress.
• Ever be mainline?
ktap
• Sampling, static & dynamic tracing
• Lightweight, simple. Uses bytecode.
• Suited for embedded devices
• Development appears suspended after suggestions to integrate with eBPF (which itself is in development)
• ktap + eBPF would be awesome: easy, lightweight, fast. Likely?
sysdig
• sysdig: Innovative new tracer. Simple expressions:
sysdig fd.type=file and evt.failed=true
sysdig evt.type=open and fd.name contains /etc
sysdig -p"%proc.name %fd.name" "evt.type=accept and proc.name!=httpd"
• Replacement for strace? (or “perf trace” will)
• Programmable “chisels”. Eg, one of mine:
# sysdig -c fileslower 1
TIME PROCESS TYPE LAT(ms) FILE
2014-04-13 20:40:43.973 cksum read 2 /mnt/partial.0.0
2014-04-13 20:40:44.187 cksum read 1 /mnt/partial.0.0
2014-04-13 20:40:44.689 cksum read 2 /mnt/partial.0.0
[…]
• Currently syscalls and user-level processing only. It is optimized, but I’m not sure it can be enough for kernel tracing.
Present & Future
• Present:
– ftrace can serve many needs today
– perf_events some more, esp. with debuginfo
– ad hoc SystemTap, ktap, … as needed
• Future:
– ftrace/perf_events/ktap with eBPF, for a fully featured and mainline tracer?
– One of the other tracers going mainline?
The Tracing Landscape, Oct 2014
(figure: a chart of my opinion of each tracer, plotting stage of development, from alpha to mature, against scope & capability and ease of use, from brutal to less brutal; tools shown: sysdig, perf, ftrace, eBPF, ktap, stap, dtrace4L.)
In Summary
In Summary…
• Plus diagrams for benchmarking, tuning, tracing
• Try to start with the questions (methodology), to help guide your use of the tools
• I hopefully turned some unknown unknowns into known unknowns
References & Links
– Systems Performance: Enterprise and the Cloud, Prentice Hall, 2014
– http://www.brendangregg.com/linuxperf.html
– http://www.brendangregg.com/perf.html#FlameGraphs
– nicstat: http://sourceforge.net/projects/nicstat/
– tiptop: http://tiptop.gforge.inria.fr/
• Tiptop: Hardware Performance Counters for the Masses, Erven Rohou, Inria Research Report 7789, Nov 2011.
– ftrace & perf-tools
• https://github.com/brendangregg/perf-tools
• http://lwn.net/Articles/608497/
– MSR tools: https://github.com/brendangregg/msr-cloud-tools
– pcstat: https://github.com/tobert/pcstat
– eBPF: http://lwn.net/Articles/603983/
– ktap: http://www.ktap.org/
– SystemTap: https://sourceware.org/systemtap/
– sysdig: http://www.sysdig.org/
– http://www.slideshare.net/brendangregg/linux-performance-analysis-and-tools
– Tux by Larry Ewing; Linux® is the registered trademark of Linus Torvalds in the U.S. and other countries.
Thanks
• Questions?
• http://slideshare.net/brendangregg
• http://www.brendangregg.com
• bgregg@netflix.com
• @brendangregg

More Related Content

What's hot (20)

PPTX
Process management in linux
Mazenetsolution
 
PDF
twlkh-linux-vsyscall-and-vdso
Viller Hsiao
 
PDF
Linux BPF Superpowers
Brendan Gregg
 
PDF
Linux Performance Analysis and Tools
Brendan Gregg
 
PDF
Tp n 4 linux
Amir Souissi
 
PDF
Reverse Mapping (rmap) in Linux Kernel
Adrian Huang
 
PDF
Linux File System
Anil Kumar Pugalia
 
PDF
Linux Profiling at Netflix
Brendan Gregg
 
PDF
Velocity 2015 linux perf tools
Brendan Gregg
 
PDF
YOW2018 Cloud Performance Root Cause Analysis at Netflix
Brendan Gregg
 
PPTX
Slab Allocator in Linux Kernel
Adrian Huang
 
PDF
Process Address Space: The way to create virtual address (page table) of user...
Adrian Huang
 
PDF
DockerCon 2017 - Cilium - Network and Application Security with BPF and XDP
Thomas Graf
 
PDF
Manuel des TP : Atelier systèmes 2
Faycel Chaoua
 
PDF
Linux SMEP bypass techniques
Vitaly Nikolenko
 
PDF
Type abstrait de données
ECAM Brussels Engineering School
 
PPTX
Linux Boot Process
darshhingu
 
PDF
Linux Performance Analysis: New Tools and Old Secrets
Brendan Gregg
 
PDF
Kernel Recipes 2015: Linux Kernel IO subsystem - How it works and how can I s...
Anne Nicolas
 
PDF
Lesson 2 Understanding Linux File System
Sadia Bashir
 
Process management in linux
Mazenetsolution
 
twlkh-linux-vsyscall-and-vdso
Viller Hsiao
 
Linux BPF Superpowers
Brendan Gregg
 
Linux Performance Analysis and Tools
Brendan Gregg
 
Tp n 4 linux
Amir Souissi
 
Reverse Mapping (rmap) in Linux Kernel
Adrian Huang
 
Linux File System
Anil Kumar Pugalia
 
Linux Profiling at Netflix
Brendan Gregg
 
Velocity 2015 linux perf tools
Brendan Gregg
 
YOW2018 Cloud Performance Root Cause Analysis at Netflix
Brendan Gregg
 
Slab Allocator in Linux Kernel
Adrian Huang
 
Process Address Space: The way to create virtual address (page table) of user...
Adrian Huang
 
DockerCon 2017 - Cilium - Network and Application Security with BPF and XDP
Thomas Graf
 
Manuel des TP : Atelier systèmes 2
Faycel Chaoua
 
Linux SMEP bypass techniques
Vitaly Nikolenko
 
Type abstrait de données
ECAM Brussels Engineering School
 
Linux Boot Process
darshhingu
 
Linux Performance Analysis: New Tools and Old Secrets
Brendan Gregg
 
Kernel Recipes 2015: Linux Kernel IO subsystem - How it works and how can I s...
Anne Nicolas
 
Lesson 2 Understanding Linux File System
Sadia Bashir
 

Viewers also liked (20)

PDF
BPF: Tracing and more
Brendan Gregg
 
PDF
Linux Systems Performance 2016
Brendan Gregg
 
PPTX
Broken Linux Performance Tools 2016
Brendan Gregg
 
POTX
Performance Tuning EC2 Instances
Brendan Gregg
 
PDF
Linux 4.x Tracing Tools: Using BPF Superpowers
Brendan Gregg
 
PDF
Linux Performance Tools
Brendan Gregg
 
PDF
SREcon 2016 Performance Checklists for SREs
Brendan Gregg
 
PDF
Netflix: From Clouds to Roots
Brendan Gregg
 
PDF
What Linux can learn from Solaris performance and vice-versa
Brendan Gregg
 
PDF
Blazing Performance with Flame Graphs
Brendan Gregg
 
PDF
Monitorama 2015 Netflix Instance Analysis
Brendan Gregg
 
PDF
From DTrace to Linux
Brendan Gregg
 
PDF
FreeBSD 2014 Flame Graphs
Brendan Gregg
 
PDF
Designing Tracing Tools
Brendan Gregg
 
PDF
MeetBSD2014 Performance Analysis
Brendan Gregg
 
PDF
Stop the Guessing: Performance Methodologies for Production Systems
Brendan Gregg
 
PDF
ACM Applicative System Methodology 2016
Brendan Gregg
 
PDF
JavaOne 2015 Java Mixed-Mode Flame Graphs
Brendan Gregg
 
PDF
LISA2010 visualizations
Brendan Gregg
 
PDF
DTraceCloud2012
Brendan Gregg
 
BPF: Tracing and more
Brendan Gregg
 
Linux Systems Performance 2016
Brendan Gregg
 
Broken Linux Performance Tools 2016
Brendan Gregg
 
Performance Tuning EC2 Instances
Brendan Gregg
 
Linux 4.x Tracing Tools: Using BPF Superpowers
Brendan Gregg
 
Linux Performance Tools
Brendan Gregg
 
SREcon 2016 Performance Checklists for SREs
Brendan Gregg
 
Netflix: From Clouds to Roots
Brendan Gregg
 
What Linux can learn from Solaris performance and vice-versa
Brendan Gregg
 
Blazing Performance with Flame Graphs
Brendan Gregg
 
Monitorama 2015 Netflix Instance Analysis
Brendan Gregg
 
From DTrace to Linux
Brendan Gregg
 
FreeBSD 2014 Flame Graphs
Brendan Gregg
 
Designing Tracing Tools
Brendan Gregg
 
MeetBSD2014 Performance Analysis
Brendan Gregg
 
Stop the Guessing: Performance Methodologies for Production Systems
Brendan Gregg
 
ACM Applicative System Methodology 2016
Brendan Gregg
 
JavaOne 2015 Java Mixed-Mode Flame Graphs
Brendan Gregg
 
LISA2010 visualizations
Brendan Gregg
 
DTraceCloud2012
Brendan Gregg
 
Ad

Similar to Linux Performance Tools 2014 (20)

PDF
test
WentingLiu4
 
PDF
LISA2019 Linux Systems Performance
Brendan Gregg
 
PDF
YOW2020 Linux Systems Performance
Brendan Gregg
 
PDF
OSDC 2015: Georg Schönberger | Linux Performance Profiling and Monitoring
NETWAYS
 
PDF
Linux Performance Profiling and Monitoring
Georg Schönberger
 
PDF
OSMC 2015 | Linux Performance Profiling and Monitoring by Werner Fischer
NETWAYS
 
PDF
OSMC 2015: Linux Performance Profiling and Monitoring by Werner Fischer
NETWAYS
 
PDF
OSDC 2017 - Werner Fischer - Linux performance profiling and monitoring
NETWAYS
 
PDF
Essential Linux Commands for DBAs
Gokhan Atil
 
PDF
Kernel Recipes 2017 - Using Linux perf at Netflix - Brendan Gregg
Anne Nicolas
 
PDF
Kernel Recipes 2017: Using Linux perf at Netflix
Brendan Gregg
 
PPTX
QCon 2015 Broken Performance Tools
Brendan Gregg
 
PPT
Linux monitoring and Troubleshooting for DBA's
Mydbops
 
PPTX
Designing Tracing Tools
Sysdig
 
PPTX
Using the big guns: Advanced OS performance tools for troubleshooting databas...
Nikolay Savvinov
 
PDF
The New Systems Performance
Brendan Gregg
 
PDF
Linux Server Deep Dives (DrupalCon Amsterdam)
Amin Astaneh
 
PDF
bcc/BPF tools - Strategy, current tools, future challenges
IO Visor Project
 
PDF
BPF Tools 2017
Brendan Gregg
 
PDF
200.1,2-Capacity Planning
behrad eslamifar
 
LISA2019 Linux Systems Performance
Brendan Gregg
 
YOW2020 Linux Systems Performance
Brendan Gregg
 
OSDC 2015: Georg Schönberger | Linux Performance Profiling and Monitoring
NETWAYS
 
Linux Performance Profiling and Monitoring
Georg Schönberger
 
OSMC 2015 | Linux Performance Profiling and Monitoring by Werner Fischer
NETWAYS
 
OSMC 2015: Linux Performance Profiling and Monitoring by Werner Fischer
NETWAYS
 
OSDC 2017 - Werner Fischer - Linux performance profiling and monitoring
NETWAYS
 
Essential Linux Commands for DBAs
Gokhan Atil
 
Kernel Recipes 2017 - Using Linux perf at Netflix - Brendan Gregg
Anne Nicolas
 
Kernel Recipes 2017: Using Linux perf at Netflix
Brendan Gregg
 
QCon 2015 Broken Performance Tools
Brendan Gregg
 
Linux monitoring and Troubleshooting for DBA's
Mydbops
 
Designing Tracing Tools
Sysdig
 
Using the big guns: Advanced OS performance tools for troubleshooting databas...
Nikolay Savvinov
 
The New Systems Performance
Brendan Gregg
 
Linux Server Deep Dives (DrupalCon Amsterdam)
Amin Astaneh
 
bcc/BPF tools - Strategy, current tools, future challenges
IO Visor Project
 
BPF Tools 2017
Brendan Gregg
 
200.1,2-Capacity Planning
behrad eslamifar
 
Ad

More from Brendan Gregg (20)

PDF
YOW2021 Computing Performance
Brendan Gregg
 
PDF
IntelON 2021 Processor Benchmarking
Brendan Gregg
 
PDF
Performance Wins with eBPF: Getting Started (2021)
Brendan Gregg
 
PDF
Systems@Scale 2021 BPF Performance Getting Started
Brendan Gregg
 
PDF
Computing Performance: On the Horizon (2021)
Brendan Gregg
 
PDF
BPF Internals (eBPF)
Brendan Gregg
 
PDF
Performance Wins with BPF: Getting Started
Brendan Gregg
 
PDF
re:Invent 2019 BPF Performance Analysis at Netflix
Brendan Gregg
 
PDF
UM2019 Extended BPF: A New Type of Software
Brendan Gregg
 
PDF
LPC2019 BPF Tracing Tools
Brendan Gregg
 
PDF
LSFMM 2019 BPF Observability
Brendan Gregg
 
PDF
YOW2018 CTO Summit: Working at netflix
Brendan Gregg
 
PDF
eBPF Perf Tools 2019
Brendan Gregg
 
PDF
NetConf 2018 BPF Observability
Brendan Gregg
 
PDF
FlameScope 2018
Brendan Gregg
 
PDF
ATO Linux Performance 2018
Brendan Gregg
 
PDF
Linux Performance 2018 (PerconaLive keynote)
Brendan Gregg
 
PDF
How Netflix Tunes EC2 Instances for Performance
Brendan Gregg
 
PDF
LISA17 Container Performance Analysis
Brendan Gregg
 
PDF
Kernel Recipes 2017: Performance Analysis with BPF
Brendan Gregg
 
YOW2021 Computing Performance
Brendan Gregg
 
IntelON 2021 Processor Benchmarking
Brendan Gregg
 
Performance Wins with eBPF: Getting Started (2021)
Brendan Gregg
 
Systems@Scale 2021 BPF Performance Getting Started
Brendan Gregg
 
Computing Performance: On the Horizon (2021)
Brendan Gregg
 
BPF Internals (eBPF)
Brendan Gregg
 
Performance Wins with BPF: Getting Started
Brendan Gregg
 
re:Invent 2019 BPF Performance Analysis at Netflix
Brendan Gregg
 
UM2019 Extended BPF: A New Type of Software
Brendan Gregg
 
LPC2019 BPF Tracing Tools
Brendan Gregg
 
LSFMM 2019 BPF Observability
Brendan Gregg
 
YOW2018 CTO Summit: Working at netflix
Brendan Gregg
 
eBPF Perf Tools 2019
Brendan Gregg
 
NetConf 2018 BPF Observability
Brendan Gregg
 
FlameScope 2018
Brendan Gregg
 
ATO Linux Performance 2018
Brendan Gregg
 
Linux Performance 2018 (PerconaLive keynote)
Brendan Gregg
 
How Netflix Tunes EC2 Instances for Performance
Brendan Gregg
 
LISA17 Container Performance Analysis
Brendan Gregg
 
Kernel Recipes 2017: Performance Analysis with BPF
Brendan Gregg
 

Recently uploaded (20)

PDF
Using FME to Develop Self-Service CAD Applications for a Major UK Police Force
Safe Software
 
PDF
Impact of IEEE Computer Society in Advancing Emerging Technologies including ...
Hironori Washizaki
 
PDF
Complete JavaScript Notes: From Basics to Advanced Concepts.pdf
haydendavispro
 
PDF
Exolore The Essential AI Tools in 2025.pdf
Srinivasan M
 
PDF
Building Resilience with Digital Twins : Lessons from Korea
SANGHEE SHIN
 
PPTX
Building Search Using OpenSearch: Limitations and Workarounds
Sease
 
PDF
NewMind AI - Journal 100 Insights After The 100th Issue
NewMind AI
 
PPTX
✨Unleashing Collaboration: Salesforce Channels & Community Power in Patna!✨
SanjeetMishra29
 
PDF
Human-centred design in online workplace learning and relationship to engagem...
Tracy Tang
 
PDF
Smart Air Quality Monitoring with Serrax AQM190 LITE
SERRAX TECHNOLOGIES LLP
 
PPTX
UiPath Academic Alliance Educator Panels: Session 2 - Business Analyst Content
DianaGray10
 
PDF
LLMs.txt: Easily Control How AI Crawls Your Site
Keploy
 
PDF
Wojciech Ciemski for Top Cyber News MAGAZINE. June 2025
Dr. Ludmila Morozova-Buss
 
PDF
Predicting the unpredictable: re-engineering recommendation algorithms for fr...
Speck&Tech
 
PDF
SWEBOK Guide and Software Services Engineering Education
Hironori Washizaki
 
PDF
Windsurf Meetup Ottawa 2025-07-12 - Planning Mode at Reliza.pdf
Pavel Shukhman
 
PDF
July Patch Tuesday
Ivanti
 
PPTX
AUTOMATION AND ROBOTICS IN PHARMA INDUSTRY.pptx
sameeraaabegumm
 
PDF
Smart Trailers 2025 Update with History and Overview
Paul Menig
 
PDF
HCIP-Data Center Facility Deployment V2.0 Training Material (Without Remarks ...
mcastillo49
 
Using FME to Develop Self-Service CAD Applications for a Major UK Police Force
Safe Software
 
Impact of IEEE Computer Society in Advancing Emerging Technologies including ...
Hironori Washizaki
 
Complete JavaScript Notes: From Basics to Advanced Concepts.pdf
haydendavispro
 
Exolore The Essential AI Tools in 2025.pdf
Srinivasan M
 
Building Resilience with Digital Twins : Lessons from Korea
SANGHEE SHIN
 
Building Search Using OpenSearch: Limitations and Workarounds
Sease
 
NewMind AI - Journal 100 Insights After The 100th Issue
NewMind AI
 
✨Unleashing Collaboration: Salesforce Channels & Community Power in Patna!✨
SanjeetMishra29
 
Human-centred design in online workplace learning and relationship to engagem...
Tracy Tang
 
Smart Air Quality Monitoring with Serrax AQM190 LITE
SERRAX TECHNOLOGIES LLP
 
UiPath Academic Alliance Educator Panels: Session 2 - Business Analyst Content
DianaGray10
 
LLMs.txt: Easily Control How AI Crawls Your Site
Keploy
 
Wojciech Ciemski for Top Cyber News MAGAZINE. June 2025
Dr. Ludmila Morozova-Buss
 
Predicting the unpredictable: re-engineering recommendation algorithms for fr...
Speck&Tech
 
SWEBOK Guide and Software Services Engineering Education
Hironori Washizaki
 
Windsurf Meetup Ottawa 2025-07-12 - Planning Mode at Reliza.pdf
Pavel Shukhman
 
July Patch Tuesday
Ivanti
 
AUTOMATION AND ROBOTICS IN PHARMA INDUSTRY.pptx
sameeraaabegumm
 
Smart Trailers 2025 Update with History and Overview
Paul Menig
 
HCIP-Data Center Facility Deployment V2.0 Training Material (Without Remarks ...
mcastillo49
 

Linux Performance Tools 2014

  • 1. Linux Performance Tools Brendan Gregg Senior Performance Architect Performance Engineering Team [email protected] @brendangregg Oct, 2014
  • 2. A quick tour of many tools…
  • 3. • Massive AWS EC2 Linux cloud – Tens of thousands of instances – Autoscale by ~3k each day – CentOS and Ubuntu • FreeBSD for content delivery – Approx 33% of US Internet traffic at night • Performance is criRcal – Customer saRsfacRon: >50M subscribers – $$$ price/performance – Develop tools for cloud-­‐wide analysis; use server tools as needed • Just launched in Europe!
  • 4. Brendan Gregg • Senior Performance Architect, Ne8lix – Linux and FreeBSD performance – Performance Engineering team (@coburnw) • Recent work: – Linux perf-­‐tools, using crace & perf_events – Systems Performance, PrenRce Hall • Previous work includes: – USE Method, flame graphs, uRlizaRon & latency heat maps, DTrace tools, ZFS L2ARC • Twijer @brendangregg (these slides)
  • 5. Agenda • Methodologies & Tools • Tool Types: – Observability • Basic • Intermediate • Advanced – Benchmarking – Tuning – StaRc • Tracing Aim: to show what can be done Knowing that something can be done is more important than knowing how to do it.
  • 7. Methodologies & Tools • There are dozens of performance tools for Linux – Packages: sysstat, procps, coreuRls, … – Commercial products • Methodologies can provide guidance for choosing and using tools effecRvely
  • 8. An3-­‐Methodologies • The lack of a deliberate methodology… • Street Light AnR-­‐Method: – 1. Pick observability tools that are • Familiar • Found on the Internet, or at random – 2. Run tools – 3. Look for obvious issues • Drunk Man AnR-­‐Method: – Tune things at random unRl the problem goes away
  • 9. Methodologies • For example, the USE Method: – For every resource, check: • URlizaRon • SaturaRon • Errors • 5 Whys: – Ask “why?” 5 Rmes • Other methods include: – Workload characterizaRon, drill-­‐down analysis, event tracing, baseline stats, staRc performance tuning, … • Start with the quesRons, then find the tools
  • 10. Command Line Tools • Useful to study even if you never use them: GUIs and commercial products ocen use the same interfaces Kernel /proc, /sys, … $ vmstat 1! procs -----------memory---------- ---swap-- …! r b swpd free buff cache si so …! 9 0 0 29549320 29252 9299060 0 …! 2 0 0 29547876 29252 9299332 0 …! 4 0 0 29548124 29252 9299460 0 …! 5 0 0 29548840 29252 9299592 0 …!
  • 11. Tool Types Type Characteris.c Observability Watch acRvity. Safe, usually, depending on resource overhead. Benchmarking Load test. CauRon: producRon tests can cause issues due to contenRon. Tuning Change. Danger: changes could hurt performance, now or later with load. StaRc Check configuraRon. Should be safe.
  • 13. How do you measure these?
  • 14. Observability Tools: Basic • upRme • top (or htop) • ps • vmstat • iostat • mpstat • free
  • 15. upRme • One way to print load averages: $ uptime! 07:42:06 up 8:16, 1 user, load average: 2.27, 2.84, 2.91! • A measure of resource demand: CPUs + disks – Other OSes only show CPUs: easier to interpret • ExponenRally-­‐damped moving averages with Rme constants of 1, 5, and 15 minutes – Historic trend without the line graph • Load > # of CPUs, may mean CPU saturaRon – Don’t spend more than 5 seconds studying these
  • 16. top (or htop) • System and per-­‐process interval summary: $ top - 18:50:26 up 7:43, 1 user, load average: 4.11, 4.91, 5.22! Tasks: 209 total, 1 running, 206 sleeping, 0 stopped, 2 zombie! Cpu(s): 47.1%us, 4.0%sy, 0.0%ni, 48.4%id, 0.0%wa, 0.0%hi, 0.3%si, 0.2%st! Mem: 70197156k total, 44831072k used, 25366084k free, 36360k buffers! Swap: 0k total, 0k used, 0k free, 11873356k cached! ! PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 5738 apiprod 20 0 62.6g 29g 352m S 417 44.2 2144:15 java 1386 apiprod 20 0 17452 1388 964 R 0 0.0 0:00.02 top 1 root 20 0 24340 2272 1340 S 0 0.0 0:01.51 init 2 root 20 0 0 0 0 S 0 0.0 0:00.00 kthreadd […]! • %CPU is summed across all CPUs • Can miss short-­‐lived processes (atop won’t) • Can consume noRceable CPU to read /proc
  • 17. htop
  • 18. ps • Process status lisRng (eg, “ASCII art forest”): $ ps -ef f! UID PID PPID C STIME TTY STAT TIME CMD! […]! root 4546 1 0 11:08 ? Ss 0:00 /usr/sbin/sshd -D! root 28261 4546 0 17:24 ? Ss 0:00 _ sshd: prod [priv]! prod 28287 28261 0 17:24 ? S 0:00 _ sshd: prod@pts/0 ! prod 28288 28287 0 17:24 pts/0 Ss 0:00 _ -bash! prod 3156 28288 0 19:15 pts/0 R+ 0:00 _ ps -ef f! root 4965 1 0 11:08 ? Ss 0:00 /bin/sh /usr/bin/svscanboot! root 4969 4965 0 11:08 ? S 0:00 _ svscan /etc/service! […]! • Custom fields: $ ps -eo user,sz,rss,minflt,majflt,pcpu,args! USER SZ RSS MINFLT MAJFLT %CPU COMMAND! root 6085 2272 11928 24 0.0 /sbin/init! […]!
  • 19. vmstat • Virtual memory staRsRcs and more: $ vmstat –Sm 1! procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----! r b swpd free buff cache si so bi bo in cs us sy id wa! 8 0 0 1620 149 552 0 0 1 179 77 12 25 34 0 0! 7 0 0 1598 149 552 0 0 0 0 205 186 46 13 0 0! 8 0 0 1617 149 552 0 0 0 8 210 435 39 21 0 0! 8 0 0 1589 149 552 0 0 0 0 218 219 42 17 0 0! […]! • USAGE: vmstat [interval [count]] • First output line has some summary since boot values (should be all; parRal is confusing) • High level CPU summary. “r” is runnable tasks.
  • 20. iostat • Block I/O (disk) stats. 1st output is since boot. $ iostat -xmdz 1! ! Linux 3.13.0-29 (db001-eb883efa) 08/18/2014 _x86_64_ (16 CPU)! ! Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s ...! xvda 0.00 0.00 0.00 0.00 0.00 0.00 / ...! xvdb 213.00 0.00 15299.00 0.00 338.17 0.00 ...! xvdc 129.00 0.00 15271.00 3.00 336.65 0.01 / ...! md0 0.00 0.00 31082.00 3.00 678.45 0.01 ...! • Very useful set of stats ... avgqu-sz await r_await w_await svctm %util! ... / 0.00 0.00 0.00 0.00 0.00 0.00! ... 126.09 8.22 8.22 0.00 0.06 86.40! ... / 99.31 6.47 6.47 0.00 0.06 86.00! ... 0.00 0.00 0.00 0.00 0.00 0.00! Workload ResulRng Performance
  • 21. mpstat • MulR-­‐processor staRsRcs, per-­‐CPU: $ mpstat –P ALL 1! […]! 08:06:43 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle! 08:06:44 PM all 53.45 0.00 3.77 0.00 0.00 0.39 0.13 0.00 42.26! 08:06:44 PM 0 49.49 0.00 3.03 0.00 0.00 1.01 1.01 0.00 45.45! 08:06:44 PM 1 51.61 0.00 4.30 0.00 0.00 2.15 0.00 0.00 41.94! 08:06:44 PM 2 58.16 0.00 7.14 0.00 0.00 0.00 1.02 0.00 33.67! 08:06:44 PM 3 54.55 0.00 5.05 0.00 0.00 0.00 0.00 0.00 40.40! 08:06:44 PM 4 47.42 0.00 3.09 0.00 0.00 0.00 0.00 0.00 49.48! 08:06:44 PM 5 65.66 0.00 3.03 0.00 0.00 0.00 0.00 0.00 31.31! 08:06:44 PM 6 50.00 0.00 2.08 0.00 0.00 0.00 0.00 0.00 47.92! […]! • Look for unbalanced workloads, hot CPUs.
  • 22. free • Main memory usage: • buffers: block device I/O cache • cached: virtual page cache $ free -m! total used free shared buffers cached! Mem: 3750 1111 2639 0 147 527! -/+ buffers/cache: 436 3313! Swap: 0 0 0!
  • 24. Observability Tools: Intermediate • strace • tcpdump • netstat • nicstat • pidstat • swapon • lsof • sar (and collectl, dstat, etc.)
  • 25. strace • System call tracer: $ strace –tttT –p 313! 1408393285.779746 getgroups(0, NULL) = 1 <0.000016>! 1408393285.779873 getgroups(1, [0]) = 1 <0.000015>! 1408393285.780797 close(3) = 0 <0.000016>! 1408393285.781338 write(1, "LinuxCon 2014!n", 15LinuxCon 2014!! ) = 15 <0.000048>! • Eg, -­‐jt: Rme (us) since epoch; -­‐T: syscall Rme (s) • Translates syscall args – Very helpful for solving system usage issues • Currently has massive overhead (ptrace based) – Can slow the target by > 100x. Use extreme cauRon.
  • 26. tcpdump • Sniff network packets for post analysis: $ tcpdump -i eth0 -w /tmp/out.tcpdump! tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes! ^C7985 packets captured! 8996 packets received by filter! 1010 packets dropped by kernel! # tcpdump -nr /tmp/out.tcpdump | head ! reading from file /tmp/out.tcpdump, link-type EN10MB (Ethernet) ! 20:41:05.038437 IP 10.44.107.151.22 > 10.53.237.72.46425: Flags [P.], seq 18...! 20:41:05.038533 IP 10.44.107.151.22 > 10.53.237.72.46425: Flags [P.], seq 48...! 20:41:05.038584 IP 10.44.107.151.22 > 10.53.237.72.46425: Flags [P.], seq 96...! […]! • Study packet sequences with Rmestamps (us) • CPU overhead opRmized (socket ring buffers), but can sRll be significant. Use cauRon.
  • 27. netstat • Various network protocol staRsRcs using -­‐s: • A mulR-­‐tool: -­‐i: interface stats -­‐r: route table default: list conns • netstat -­‐p: shows process details! • Per-­‐second interval with -­‐c $ netstat –s! […]! Tcp:! 736455 active connections openings! 176887 passive connection openings! 33 failed connection attempts! 1466 connection resets received! 3311 connections established! 91975192 segments received! 180415763 segments send out! 223685 segments retransmited! 2 bad segments received.! 39481 resets sent! […]! TcpExt:! 12377 invalid SYN cookies received! 2982 delayed acks sent! […]!
  • 28. nicstat • Network interface stats, iostat-­‐like output: $ ./nicstat 1! Time Int rKB/s wKB/s rPk/s wPk/s rAvs wAvs %Util Sat! 21:21:43 lo 823.0 823.0 171.5 171.5 4915.4 4915.4 0.00 0.00! 21:21:43 eth0 5.53 1.74 15.11 12.72 374.5 139.8 0.00 0.00! Time Int rKB/s wKB/s rPk/s wPk/s rAvs wAvs %Util Sat! 21:21:44 lo 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00! 21:21:44 eth0 20.42 3394.1 355.8 85.94 58.76 40441.3 0.00 0.00! Time Int rKB/s wKB/s rPk/s wPk/s rAvs wAvs %Util Sat! 21:21:45 lo 1409.1 1409.1 327.9 327.9 4400.8 4400.8 0.00 0.00! 21:21:45 eth0 75.12 4402.3 1398.9 1513.2 54.99 2979.1 0.00 0.00! […]! • Check network throughput and interface %uRl • I wrote this years ago; Tim Cook ported to Linux
  • 29. pidstat • Very useful process stats. eg, by-­‐thread, disk I/O: $ pidstat -t 1! Linux 3.2.0-54 (db002-91befe03) !08/18/2014 !_x86_64_!(8 CPU)! ! 08:57:52 PM TGID TID %usr %system %guest %CPU CPU Command! 08:57:54 PM 5738 - 484.75 39.83 0.00 524.58 1 java! 08:57:54 PM - 5817 0.85 0.00 0.00 0.85 2 |__java! 08:57:54 PM - 5931 1.69 1.69 0.00 3.39 4 |__java! 08:57:54 PM - 5981 0.85 0.00 0.00 0.85 7 |__java! 08:57:54 PM - 5990 0.85 0.00 0.00 0.85 4 |__java! […]! $ pidstat -d 1! […]! 08:58:27 PM PID kB_rd/s kB_wr/s kB_ccwr/s Command! 08:58:28 PM 5738 0.00 815.69 0.00 java! […]! • I usually prefer this over top(1)
  • 30. swapon • Show swap device usage: $ swapon -s! Filename Type Size Used Priority! /dev/sda3 partition 5245212 284 -1! • If you have swap enabled…
  • 31. lsof • More a debug tool, lsof(8) shows file descriptor usage, which for some apps, equals current acRve network connecRons: # lsof -iTCP -sTCP:ESTABLISHED! COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME! sshd 755 root 3r IPv4 13576887 0t0 TCP bgregg-test-i-f106:ssh->prod100.netflix.com: 15241 (ESTABLISHED)! platforms 2614 app1 8u IPv4 14618 0t0 TCP localhost:33868->localhost:5433 (ESTABLISHED)! postgres 2648 app1 7u IPv4 14619 0t0 TCP localhost:5433->localhost:33868 (ESTABLISHED)! epic_plug 2857 app1 7u IPv4 15678 0t0 TCP localhost:33885->localhost:5433 (ESTABLISHED)! postgres 2892 app1 7u IPv4 15679 0t0 TCP localhost:5433->localhost:33885 (ESTABLISHED)! […]! • I’d prefer to: echo /proc/PID/fd | wc -l!
  • 32. sar • System AcRvity Reporter. Many stats, eg: $ sar -n TCP,ETCP,DEV 1! Linux 3.2.55 (test-e4f1a80b) !08/18/2014 !_x86_64_!(8 CPU)! ! 09:10:43 PM IFACE rxpck/s txpck/s rxkB/s txkB/s rxcmp/s txcmp/s rxmcst/s! 09:10:44 PM lo 14.00 14.00 1.34 1.34 0.00 0.00 0.00! 09:10:44 PM eth0 4114.00 4186.00 4537.46 28513.24 0.00 0.00 0.00! ! 09:10:43 PM active/s passive/s iseg/s oseg/s! 09:10:44 PM 21.00 4.00 4107.00 22511.00! ! 09:10:43 PM atmptf/s estres/s retrans/s isegerr/s orsts/s! 09:10:44 PM 0.00 0.00 36.00 0.00 1.00! […]! • Archive or live mode: (interval [count]) • Well designed. Header naming convenRon, logical groups: TCP, ETCP, DEV, EDEV, …
• 34. Other Tools • You may also use collectl, atop, dstat, or another measure-all tool • The tool isn’t important • It’s important to have a way to measure everything you want • In cloud environments, you are probably using a monitoring product, developed in-house or commercial. Same method applies…
  • 35. How does your monitoring tool measure these?
• 37. Advanced Observability Tools • Misc: – ltrace, ss, iptraf, ethtool, snmpget, lldptool, iotop, blktrace, slabtop, /proc, pcstat • CPU Performance Counters: – perf_events, tiptop, rdmsr • Advanced Tracers: – perf_events, ftrace, eBPF, SystemTap, ktap, LTTng, dtrace4linux, sysdig • Some selected demos…
• 38. ss • More socket statistics: $ ss -mop! State Recv-Q Send-Q Local Address:Port Peer Address:Port ! CLOSE-WAIT 1 0 127.0.0.1:42295 127.0.0.1:28527 users:(("apacheLogParser",2702,3))! ! mem:(r1280,w0,f2816,t0)! ESTAB 0 0 127.0.0.1:5433 127.0.0.1:41312 timer:(keepalive,36min,0) users:(("postgres",2333,7))! ! mem:(r0,w0,f0,t0)! […]! $ ss -i! State Recv-Q Send-Q Local Address:Port Peer Address:Port ! CLOSE-WAIT 1 0 127.0.0.1:42295 127.0.0.1:28527 ! cubic wscale:6,6 rto:208 rtt:9/6 ato:40 cwnd:10 send 145.6Mbps rcv_space:32792! ESTAB 0 0 10.144.107.101:ssh 10.53.237.72:4532 ! cubic wscale:4,6 rto:268 rtt:71.5/3 ato:40 cwnd:10 send 1.5Mbps rcv_rtt:72 rcv_space:14480! […]!
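Two more ss invocations, as a sketch (the state filter syntax is standard ss; the port is only an example):
$ ss -s                                          # summary counts by socket type and state
$ ss -tni state established '( dport = :443 )'   # established TCP to port 443, with TCP details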
  • 40. iotop • Block device I/O (disk) by process: $ iotop! Total DISK READ: 50.47 M/s | Total DISK WRITE: 59.21 M/s! TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND ! 959 be/4 root 0.00 B/s 0.00 B/s 0.00 % 99.99 % [flush-202:1]! 6641 be/4 root 50.47 M/s 82.60 M/s 0.00 % 32.51 % java –Dnop –X! 1 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % init! 2 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kthreadd]! 3 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [ksoftirqd/0]! 4 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kworker/0:0]! 5 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kworker/u:0]! 6 rt/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [migration/0]! • Needs kernel support enabled – CONFIG_TASK_IO_ACCOUNTING […]!
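For non-interactive logging, iotop has a batch mode (these are standard iotop options; the interval and count are examples):
# iotop -boP -d 5 -n 12    # batch output, only processes doing I/O, 5s intervals, 12 samples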
  • 41. slabtop • Kernel slab allocator memory usage: $ slabtop! Active / Total Objects (% used) : 4692768 / 4751161 (98.8%)! Active / Total Slabs (% used) : 129083 / 129083 (100.0%)! Active / Total Caches (% used) : 71 / 109 (65.1%)! Active / Total Size (% used) : 729966.22K / 738277.47K (98.9%)! Minimum / Average / Maximum Object : 0.01K / 0.16K / 8.00K! ! OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME ! 3565575 3565575 100% 0.10K 91425 39 365700K buffer_head! 314916 314066 99% 0.19K 14996 21 59984K dentry! 184192 183751 99% 0.06K 2878 64 11512K kmalloc-64! 138618 138618 100% 0.94K 4077 34 130464K xfs_inode! 138602 138602 100% 0.21K 3746 37 29968K xfs_ili! 102116 99012 96% 0.55K 3647 28 58352K radix_tree_node! 97482 49093 50% 0.09K 2321 42 9284K kmalloc-96! 22695 20777 91% 0.05K 267 85 1068K shared_policy_node! 21312 21312 100% 0.86K 576 37 18432K ext4_inode_cache! 16288 14601 89% 0.25K 509 32 4072K kmalloc-256! […]!
  • 42. pcstat • Show page cache residency by file: # ./pcstat data0*! |----------+----------------+------------+-----------+---------|! | Name | Size | Pages | Cached | Percent |! |----------+----------------+------------+-----------+---------|! | data00 | 104857600 | 25600 | 25600 | 100.000 |! | data01 | 104857600 | 25600 | 25600 | 100.000 |! | data02 | 104857600 | 25600 | 4080 | 015.938 |! | data03 | 104857600 | 25600 | 25600 | 100.000 |! | data04 | 104857600 | 25600 | 16010 | 062.539 |! | data05 | 104857600 | 25600 | 0 | 000.000 |! |----------+----------------+------------+-----------+---------|! • Uses the mincore(2) syscall. Useful for database performance analysis.
• 43. perf_events (counters) • Performance Monitoring Counters (PMCs): $ perf list | grep -i hardware! cpu-cycles OR cycles [Hardware event]! stalled-cycles-frontend OR idle-cycles-frontend [Hardware event]! stalled-cycles-backend OR idle-cycles-backend [Hardware event]! instructions [Hardware event]! • Identify CPU cycle breakdowns, esp. stall types – PMCs not enabled by default in clouds (yet) – Can be time-consuming to use (CPU manuals) • Use flame graphs to visualize sampled stack traces […]! branch-misses [Hardware event]! bus-cycles [Hardware event]! L1-dcache-loads [Hardware cache event]! L1-dcache-load-misses [Hardware cache event]! […]! rNNN (see 'perf list --help' on how to encode it) [Raw hardware event … ! mem:<addr>[:access] [Hardware breakpoint]!
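A minimal sketch of counting events with perf stat (requires PMC access, which many cloud instances lack, as noted above; the event list and duration are just examples):
$ perf stat -e cycles,instructions,cache-misses,branch-misses -a sleep 10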
• 44. perf_events CPU Flame Graph • [CPU flame graph figure; annotations include: broken Java stacks (missing frame pointer), kernel TCP/IP, GC, idle thread, time, locks, epoll]
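The usual CPU flame graph workflow, sketched with typical values (99 Hz for 30 seconds), using the FlameGraph scripts from https://github.com/brendangregg/FlameGraph:
# perf record -F 99 -a -g -- sleep 30
# perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > cpu.svg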
• 45. tiptop • IPC by process, %MISS, %BUS • Needs some love. perfmon2 library integration? • Still can’t use it in clouds yet (needs PMCs enabled)
• 46. rdmsr • Model Specific Registers (MSRs), unlike PMCs, can be read by default in Xen guests – Timestamp clock, temp, power, … – Use rdmsr(1) from the msr-tools package to read them – Uses include (https://github.com/brendangregg/msr-cloud-tools): ec2-guest# ./showboost! [...]! TIME C0_MCYC C0_ACYC UTIL RATIO MHz! 06:11:35 6428553166 7457384521 51% 116% 2900! 06:11:40 6349881107 7365764152 50% 115% 2899! 06:11:45 6240610655 7239046277 49% 115% 2899! [...]! ec2-guest# ./cputemp 1! CPU1 CPU2 CPU3 CPU4! 61 61 60 59! 60 61 60 60! [...]! (showboost: real CPU MHz; cputemp: CPU temperature)
• 47. More Advanced Tools… • Some others worth mentioning (tool: description):
ltrace: Library call tracer
ethtool: Mostly interface tuning; some stats
snmpget: SNMP network host statistics
lldptool: Can get LLDP broadcast stats
blktrace: Block I/O event tracer
/proc: Many raw kernel counters
pmu-tools: On- and off-core CPU counter tools
• 48. Advanced Tracers • Many options on Linux: – perf_events, ftrace, eBPF, SystemTap, ktap, LTTng, dtrace4linux, sysdig • Most can do static and dynamic tracing – Static: pre-defined events (tracepoints) – Dynamic: instrument any software (kprobes, uprobes). Custom metrics on-demand. Catch all. • Many are in development. – I’ll summarize their state later…
• 52. Benchmarking Tools • Multi: – UnixBench, lmbench, sysbench, perf bench • FS/disk: – dd, hdparm, fio • App/lib: – ab, wrk, jmeter, openssl • Networking: – ping, hping3, iperf, ttcp, traceroute, mtr, pchar
• 53. Active Benchmarking • Most benchmarks are misleading or wrong – You benchmark A, but actually measure B, and conclude that you measured C • Active Benchmarking: 1. Run the benchmark for hours 2. While running, analyze and confirm the performance limiter using observability tools • We just covered those tools – use them!
• 54. lmbench • CPU, memory, and kernel micro-benchmarks • Eg, memory latency by stride size: $ lat_mem_rd 100m 128 > out.latencies! some R processing…! [Chart: memory access latency vs. stride size, with steps at the L1, L2, and L3 caches and main memory]
• 55. fio • FS or disk I/O micro-benchmarks $ fio --name=seqwrite --rw=write --bs=128k --size=122374m! […]! seqwrite: (groupid=0, jobs=1): err= 0: pid=22321! write: io=122374MB, bw=840951KB/s, iops=6569 , runt=149011msec! clat (usec): min=41 , max=133186 , avg=148.26, stdev=1287.17! lat (usec): min=44 , max=133188 , avg=151.11, stdev=1287.21! bw (KB/s) : min=10746, max=1983488, per=100.18%, avg=842503.94, stdev=262774.35! cpu : usr=2.67%, sys=43.46%, ctx=14284, majf=1, minf=24! IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%! submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%! complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%! issued r/w/d: total=0/978992/0, short=0/0/0! lat (usec): 50=0.02%, 100=98.30%, 250=1.06%, 500=0.01%, 750=0.01%! lat (usec): 1000=0.01%! lat (msec): 2=0.01%, 4=0.01%, 10=0.25%, 20=0.29%, 50=0.06%! lat (msec): 100=0.01%, 250=0.01%! • Results include a basic latency distribution
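Another common fio invocation, sketched with illustrative parameters (random reads with direct I/O to bypass the page cache; sizes and durations are examples):
$ fio --name=randread --rw=randread --bs=4k --direct=1 --size=1g --runtime=60 --time_based --numjobs=4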
  • 56. pchar • Traceroute with bandwidth per hop! $ pchar 10.71.83.1! […]! 4: 10.110.80.1 (10.110.80.1)! Partial loss: 0 / 5 (0%)! Partial char: rtt = 9.351109 ms, (b = 0.004961 ms/B), r2 = 0.184105! stddev rtt = 4.967992, stddev b = 0.006029! Partial queueing: avg = 0.000000 ms (0 bytes)! Hop char: rtt = --.--- ms, bw = 1268.975773 Kbps! Hop queueing: avg = 0.000000 ms (0 bytes)! 5: 10.193.43.181 (10.193.43.181)! Partial loss: 0 / 5 (0%)! Partial char: rtt = 25.461597 ms, (b = 0.011934 ms/B), r2 = 0.228707! stddev rtt = 10.426112, stddev b = 0.012653! Partial queueing: avg = 0.000000 ms (0 bytes)! Hop char: rtt = 16.110487 ms, bw = 1147.210397 Kbps! Hop queueing: avg = 0.000000 ms (0 bytes)! […]! • Needs love. Based on pathchar (Linux 2.0.30).
• 59. Tuning Tools • Generic interfaces: – sysctl, /sys • Many areas have custom tuning tools: – Applications: their own config – CPU/scheduler: nice, renice, taskset, ulimit, chcpu – Storage I/O: tune2fs, ionice, hdparm, blockdev, … – Network: ethtool, tc, ip, route – Dynamic patching: stap, kpatch
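For example, kernel tunables are read and set via sysctl (the tunables and values below are placeholders, not recommendations):
$ sysctl net.ipv4.tcp_congestion_control      # read one tunable
# sysctl -w net.core.somaxconn=1024           # set one (root); add to /etc/sysctl.conf to persist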
• 60. Tuning Methods • Scientific Method: 1. Question 2. Hypothesis 3. Prediction 4. Test 5. Analysis • Any observational or benchmarking tests you can try before tuning? • Consider risks, and see previous tools
• 63. Static Tools • Static Performance Tuning: check the static state and configuration of the system – CPU types – Storage devices – File system capacity – File system and volume configuration – Route table – State of hardware • What can be checked on a system without load
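A few read-only commands that cover these checks (the device and interface names are examples):
$ lscpu                   # CPU types and topology
$ df -h                   # file system capacity
$ ip route show           # route table
$ ethtool eth0            # negotiated interface speed/duplex
$ tune2fs -l /dev/sda1    # ext file system configuration (root)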
• 66. Tracing Frameworks: Tracepoints • Statically placed at logical places in the kernel • Provides key event details as a “format” string
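Tracepoints and their format strings can be listed directly through the ftrace interface (assuming debugfs is mounted at /sys/kernel/debug):
# cat /sys/kernel/debug/tracing/available_events | wc -l
# cat /sys/kernel/debug/tracing/events/block/block_rq_issue/format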
• 67. Tracing Frameworks: + probes • kprobes: dynamic kernel tracing – function calls, returns, line numbers • uprobes: dynamic user-level tracing
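A raw kprobe can be created through the same ftrace interface (a sketch; the kprobe front-end shown later wraps these steps):
# echo 'p:myopen do_sys_open' >> /sys/kernel/debug/tracing/kprobe_events
# echo 1 > /sys/kernel/debug/tracing/events/kprobes/myopen/enable
# cat /sys/kernel/debug/tracing/trace_pipe
# echo 0 > /sys/kernel/debug/tracing/events/kprobes/myopen/enable
# echo '-:myopen' >> /sys/kernel/debug/tracing/kprobe_events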
• 68. Tracing Tools • Options: – ftrace – perf_events – eBPF – SystemTap – ktap – LTTng – dtrace4linux – Oracle Linux DTrace – sysdig • Too many choices, and many still in development
• 69. Imagine Linux with Tracing • With a programmable tracer, high-level tools can be written, such as: – iosnoop – iolatency – opensnoop – …
• 70. iosnoop • Block I/O (disk) events with latency: # ./iosnoop -ts! Tracing block I/O. Ctrl-C to end.! STARTs ENDs COMM PID TYPE DEV BLOCK BYTES LATms! 5982800.302061 5982800.302679 supervise 1809 W 202,1 17039600 4096 0.62! 5982800.302423 5982800.302842 supervise 1809 W 202,1 17039608 4096 0.42! 5982800.304962 5982800.305446 supervise 1801 W 202,1 17039616 4096 0.48! 5982800.305250 5982800.305676 supervise 1801 W 202,1 17039624 4096 0.43! […]! # ./iosnoop -h! USAGE: iosnoop [-hQst] [-d device] [-i iotype] [-p PID] [-n name] [duration]! -d device # device string (eg, "202,1")! -i iotype # match type (eg, '*R*' for all reads)! -n name # process name to match on I/O issue! -p PID # PID to match on I/O issue! -Q # include queueing time in LATms! -s # include start time of I/O (s)! -t # include completion time of I/O (s)! -h # this usage message! duration # duration seconds, and use buffers! […]!
• 71. iolatency • Block I/O (disk) latency distributions: # ./iolatency ! Tracing block I/O. Output every 1 seconds. Ctrl-C to end.! ! >=(ms) .. <(ms) : I/O |Distribution |! 0 -> 1 : 2104 |######################################|! 1 -> 2 : 280 |###### |! 2 -> 4 : 2 |# |! 4 -> 8 : 0 | |! 8 -> 16 : 202 |#### |! ! >=(ms) .. <(ms) : I/O |Distribution |! 0 -> 1 : 1144 |######################################|! 1 -> 2 : 267 |######### |! 2 -> 4 : 10 |# |! 4 -> 8 : 5 |# |! 8 -> 16 : 248 |######### |! 16 -> 32 : 601 |#################### |! 32 -> 64 : 117 |#### |! […]!
  • 72. opensnoop • Trace open() syscalls showing filenames: # ./opensnoop -t! Tracing open()s. Ctrl-C to end.! TIMEs COMM PID FD FILE! 4345768.332626 postgres 23886 0x8 /proc/self/oom_adj! 4345768.333923 postgres 23886 0x5 global/pg_filenode.map! 4345768.333971 postgres 23886 0x5 global/pg_internal.init! 4345768.334813 postgres 23886 0x5 base/16384/PG_VERSION! 4345768.334877 postgres 23886 0x5 base/16384/pg_filenode.map! 4345768.334891 postgres 23886 0x5 base/16384/pg_internal.init! 4345768.335821 postgres 23886 0x5 base/16384/11725! 4345768.347911 svstat 24649 0x4 supervise/ok! 4345768.347921 svstat 24649 0x4 supervise/status! 4345768.350340 stat 24651 0x3 /etc/ld.so.cache! 4345768.350372 stat 24651 0x3 /lib/x86_64-linux-gnu/libselinux…! 4345768.350460 stat 24651 0x3 /lib/x86_64-linux-gnu/libc.so.6! 4345768.350526 stat 24651 0x3 /lib/x86_64-linux-gnu/libdl.so.2! 4345768.350981 stat 24651 0x3 /proc/filesystems! 4345768.351182 stat 24651 0x3 /etc/nsswitch.conf! […]!
  • 73. funcgraph • Trace a graph of kernel code flow: # ./funcgraph -Htp 5363 vfs_read! Tracing "vfs_read" for PID 5363... Ctrl-C to end.! # tracer: function_graph! #! # TIME CPU DURATION FUNCTION CALLS! # | | | | | | | |! 4346366.073832 | 0) | vfs_read() {! 4346366.073834 | 0) | rw_verify_area() {! 4346366.073834 | 0) | security_file_permission() {! 4346366.073834 | 0) | apparmor_file_permission() {! 4346366.073835 | 0) 0.153 us | common_file_perm();! 4346366.073836 | 0) 0.947 us | }! 4346366.073836 | 0) 0.066 us | __fsnotify_parent();! 4346366.073836 | 0) 0.080 us | fsnotify();! 4346366.073837 | 0) 2.174 us | }! 4346366.073837 | 0) 2.656 us | }! 4346366.073837 | 0) | tty_read() {! 4346366.073837 | 0) 0.060 us | tty_paranoia_check();! […]!
• 74. kprobe • Dynamically trace a kernel function call or return, with variables, and in-kernel filtering: # ./kprobe 'p:open do_sys_open filename=+0(%si):string' 'filename ~ "*stat"'! Tracing kprobe myopen. Ctrl-C to end.! postgres-1172 [000] d... 6594028.787166: open: (do_sys_open +0x0/0x220) filename="pg_stat_tmp/pgstat.stat"! postgres-1172 [001] d... 6594028.797410: open: (do_sys_open +0x0/0x220) filename="pg_stat_tmp/pgstat.stat"! postgres-1172 [001] d... 6594028.797467: open: (do_sys_open +0x0/0x220) filename="pg_stat_tmp/pgstat.stat"! ^C! Ending tracing...! • Add -s for stack traces; -p for PID filter in-kernel. • Quickly confirm kernel behavior; eg: did a tunable take effect?
• 75. Imagine Linux with Tracing • These tools aren’t using dtrace4linux, SystemTap, ktap, or any other add-on tracer • These tools use existing Linux capabilities – No extra kernel bits, not even kernel debuginfo – Just Linux’s built-in ftrace profiler – Demoed on Linux 3.2 • Solving real issues now
• 76. ftrace • Added by Steven Rostedt and others since 2.6.27 • Already enabled on our servers (3.2+) – CONFIG_FTRACE, CONFIG_FUNCTION_PROFILER, … – Use directly via /sys/kernel/debug/tracing • My front-end tools to aid usage – https://github.com/brendangregg/perf-tools – Unsupported hacks: see WARNINGs – Also see the trace-cmd front-end, as well as perf • lwn.net: “Ftrace: The Hidden Light Switch”
• 78. Tracing Summary • ftrace • perf_events • eBPF • SystemTap • ktap • LTTng • dtrace4linux • sysdig
• 79. perf_events • aka the “perf” command • In Linux. Add from linux-tools-common, … • Powerful multi-tool and profiler – interval sampling, CPU performance counter events – user and kernel dynamic tracing – kernel line tracing and local variables (debuginfo) – kernel filtering, and in-kernel counts (perf stat) • Not very programmable, yet – limited kernel summaries. May improve with eBPF.
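For example, dynamic kernel tracing with perf probe (a sketch; tcp_sendmsg is just an example function, and kernel debuginfo may be needed to trace arguments):
# perf probe --add tcp_sendmsg
# perf record -e probe:tcp_sendmsg -a -g -- sleep 10
# perf report
# perf probe --del tcp_sendmsg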
• 80. perf_events Example # perf record -e skb:consume_skb -ag! ^C[ perf record: Woken up 1 times to write data ]! [ perf record: Captured and wrote 0.065 MB perf.data (~2851 samples) ]! # perf report! [...]! 74.42% swapper [kernel.kallsyms] [k] consume_skb! |! --- consume_skb! arp_process! arp_rcv! __netif_receive_skb_core! __netif_receive_skb! netif_receive_skb! virtnet_poll! net_rx_action! __do_softirq! irq_exit! do_IRQ! ret_from_intr! default_idle! cpu_idle! start_secondary! […]! • Summarizing stack traces for a tracepoint • perf_events can do many things; it’s hard to pick just one example
• 81. eBPF • Extended BPF: programs on tracepoints – High-performance filtering: JIT – In-kernel summaries: maps • Linux in 3.18? Enhance perf_events/ftrace/…? # ./bitesize 1! writing bpf-5 -> /sys/kernel/debug/tracing/events/block/block_rq_complete/filter! ! I/O sizes:! Kbytes : Count! 4 -> 7 : 131! 8 -> 15 : 32! 16 -> 31 : 1! 32 -> 63 : 46! 64 -> 127 : 0! 128 -> 255 : 15! […]! (in-kernel summary)
• 82. SystemTap • Fully programmable, fully featured • Compiles tracing programs into kernel modules – Needs a compiler, and takes time • “Works great on Red Hat” – I keep trying on other distros and have hit trouble in the past; make sure you are on the latest version. – I’m liking it a bit more after finding ways to use it without kernel debuginfo (a difficult requirement in our environment). Work in progress. • Ever be mainline?
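A classic one-liner, sketched here per the SystemTap tapsets (on newer kernels syscall.openat may be needed instead of syscall.open):
# stap -e 'probe syscall.open { printf("%s(%d): %s\n", execname(), pid(), argstr) }'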
• 83. ktap • Sampling, static & dynamic tracing • Lightweight, simple. Uses bytecode. • Suited for embedded devices • Development appears suspended after suggestions to integrate with eBPF (which itself is in development) • ktap + eBPF would be awesome: easy, lightweight, fast. Likely?
• 84. sysdig • sysdig: Innovative new tracer. Simple expressions: sysdig fd.type=file and evt.failed=true! sysdig evt.type=open and fd.name contains /etc! sysdig -p"%proc.name %fd.name" "evt.type=accept and proc.name!=httpd"! • Replacement for strace? (or “perf trace” will) • Programmable “chisels”. Eg, one of mine: # sysdig -c fileslower 1! TIME PROCESS TYPE LAT(ms) FILE! 2014-04-13 20:40:43.973 cksum read 2 /mnt/partial.0.0! 2014-04-13 20:40:44.187 cksum read 1 /mnt/partial.0.0! 2014-04-13 20:40:44.689 cksum read 2 /mnt/partial.0.0! […]! • Currently syscalls and user-level processing only. It is optimized, but I’m not sure it can be enough for kernel tracing
• 85. Present & Future • Present: – ftrace can serve many needs today – perf_events some more, esp. with debuginfo – ad hoc SystemTap, ktap, … as needed • Future: – ftrace/perf_events/ktap with eBPF, for a fully featured and mainline tracer? – One of the other tracers going mainline?
• 86. The Tracing Landscape, Oct 2014 • [Chart: tracers plotted by stage of development (alpha to mature, my opinion) versus scope & capability and ease of use (brutal to less brutal); includes sysdig, perf, ftrace, eBPF, ktap, stap, dtrace4Linux]
• 88. In Summary… • Plus diagrams for benchmarking, tuning, tracing • Try to start with the questions (methodology), to help guide your use of the tools • I hopefully turned some unknown unknowns into known unknowns
• 89. References & Links – Systems Performance: Enterprise and the Cloud, Prentice Hall, 2014 – http://www.brendangregg.com/linuxperf.html – http://www.brendangregg.com/perf.html#FlameGraphs – nicstat: http://sourceforge.net/projects/nicstat/ – tiptop: http://tiptop.gforge.inria.fr/ • Tiptop: Hardware Performance Counters for the Masses, Erven Rohou, Inria Research Report 7789, Nov 2011. – ftrace & perf-tools • https://github.com/brendangregg/perf-tools • http://lwn.net/Articles/608497/ – MSR tools: https://github.com/brendangregg/msr-cloud-tools – pcstat: https://github.com/tobert/pcstat – eBPF: http://lwn.net/Articles/603983/ – ktap: http://www.ktap.org/ – SystemTap: https://sourceware.org/systemtap/ – sysdig: http://www.sysdig.org/ – http://www.slideshare.net/brendangregg/linux-performance-analysis-and-tools – Tux by Larry Ewing; Linux® is the registered trademark of Linus Torvalds in the U.S. and other countries.
• 90. Thanks • Questions? • http://slideshare.net/brendangregg • http://www.brendangregg.com • [email protected] • @brendangregg