Linux 4.x Performance
Using BPF Superpowers
Brendan Gregg
Senior Performance Architect
Feb 2016
Ten years ago, I gave a talk here about DTrace tools…
Linux BPF Superpowers
Superpowers are coming to Linux
Solve performance issues that were previously impossible
For example, full off-CPU analysis…
Ideal Thread States
A starting point for
deeper analysis
Linux Thread States
Based on:
TASK_RUNNING
TASK_INTERRUPTIBLE
TASK_UNINTERRUPTIBLE
Still a useful
starting point
Linux On-CPU Analysis
CPU Flame Graph
•  I'll start with on-CPU analysis:
•  Split into user/kernel states
using /proc, mpstat(1), ...
•  perf_events ("perf") to analyze further:
–  User & kernel stack sampling (as a CPU flame graph)
–  CPI
–  Should be easy, but…
Broken Stacks
Missing Java stacks
Missing Symbols
"[unknown]"
Java Mixed-Mode CPU Flame Graph
Java
JVM
Kernel
GC
•  Fixed!
–  Java -XX:+PreserveFramePointer
–  Java perf-map-agent
–  Linux perf_events
Stack depth
Samples
(alphabetical sort)
Also, CPI Flame Graph
Cycles Per Instruction
-  red == instruction heavy
-  blue == cycle heavy (likely mem stalls)
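The CPI metric behind this coloring is a simple ratio. The following is a minimal sketch of the arithmetic only; the function names and counter values are hypothetical and do not come from the talk or from any profiler output.

```python
def cycles_per_instruction(cycles: int, instructions: int) -> float:
    """Return cycles divided by instructions for one code path."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return cycles / instructions


# Hypothetical (cycles, instructions) counter readings for two code paths.
samples = {
    "compress_block": (1_000_000, 2_000_000),
    "hash_lookup": (6_000_000, 1_500_000),
}

cpi = {name: cycles_per_instruction(c, i) for name, (c, i) in samples.items()}

for name, value in cpi.items():
    print(f"{name}: CPI = {value:.2f}")
```

Under these made-up numbers, the second path spends far more cycles per instruction, which is the kind of path the slide describes as cycle heavy.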
zoomed:
Linux Off-CPU Analysis
On Linux, the state isn't helpful, but the code path is
Off-CPU analysis by measuring blocked time with stack traces
Off-CPU Time Flame Graph
From http://www.brendangregg.com/blog/2016-02-01/linux-wakeup-offwake-profiling.html
Stack depth
Off-CPU time
Off-CPU Time (zoomed): tar(1)
file read from disk
directory read from disk
Currently kernel stacks only; user stacks will add more context
pipe write
path read from disk
fstat from disk
Off-CPU Time: more states
lock
contention sleep
run queue
latency
Flame graph quantifies total time spent in states
CPU + Off-CPU == See Everything?
Off-CPU Time (zoomed): gzip(1)
Off-CPU doesn't always make sense:
what is gzip blocked on?
Wakeup Time Flame Graph
Wakeup Time (zoomed): gzip(1)
gzip(1) is blocked on tar(1)!
tar cf - * | gzip > out.tar.gz
Can't we associate off-CPU with wakeup stacks?
Off-Wake Time Flame Graph
Wakeup stacks are associated and merged in-kernel using BPF
We couldn't do this before
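The idea of associating a blocked stack with the stack of its waker can be sketched in plain Python. This is a user-space model only, not the in-kernel BPF program from the talk: the event tuples, thread IDs, timestamps, and frame names are all assumptions made for illustration.

```python
from collections import defaultdict


def offwake_totals(events):
    """Sum blocked time per (blocked stack, waker stack) pair.

    events: time-ordered tuples (kind, tid, timestamp_us, stack) where
    kind is "block", "wake" (tid is the thread being woken), or "resume".
    """
    blocked = {}   # tid -> (block timestamp, blocked stack)
    wakers = {}    # tid -> stack of the thread that woke it
    totals = defaultdict(int)
    for kind, tid, ts, stack in events:
        if kind == "block":
            blocked[tid] = (ts, stack)
        elif kind == "wake":
            wakers[tid] = stack
        elif kind == "resume" and tid in blocked:
            start, blocked_stack = blocked.pop(tid)
            waker_stack = wakers.pop(tid, ("unknown",))
            totals[(blocked_stack, waker_stack)] += ts - start
    return dict(totals)


# Hypothetical events: thread 101 (gzip) blocks twice and is woken by tar.
events = [
    ("block", 101, 1000, ("gzip", "read", "pipe_wait")),
    ("wake", 101, 1400, ("tar", "write", "pipe_write")),
    ("resume", 101, 1450, None),
    ("block", 101, 2000, ("gzip", "read", "pipe_wait")),
    ("wake", 101, 2300, ("tar", "write", "pipe_write")),
    ("resume", 101, 2350, None),
]
totals = offwake_totals(events)
```

Keying the totals on the pair of stacks is what lets a single flame graph show both why a thread blocked and who unblocked it.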
•  One wakeup stack is often not enough…
•  Who woke the waker?
Haven't Solved Everything Yet…
Chain Graphs
Merging multiple wakeup stacks in kernel using BPF
With enough stacks, all paths lead to metal
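Following "who woke the waker?" repeatedly amounts to walking a chain of wakeups. The sketch below models that walk in user space with a plain dictionary; the thread names and the depth limit are hypothetical, and the real prototype merges stacks in kernel rather than names in Python.

```python
def wakeup_chain(tid, woken_by, max_depth=8):
    """Follow waker links from tid until no waker is known or a cycle appears."""
    chain = [tid]
    seen = {tid}
    while len(chain) < max_depth:
        waker = woken_by.get(chain[-1])
        if waker is None or waker in seen:
            break
        chain.append(waker)
        seen.add(waker)
    return chain


# Hypothetical mapping of each thread to the thread (or interrupt) that woke it.
woken_by = {"gzip": "tar", "tar": "kworker", "kworker": "disk_irq"}
chain = wakeup_chain("gzip", woken_by)
print(" <- ".join(chain))
```

The depth limit reflects that each extra level adds measurement cost, so a bounded walk is a reasonable assumption for a sketch like this.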
Solve Everything
CPU + off-CPU analysis can solve most issues
Flame graph (profiling) types:
1.  CPU
2.  CPI
3.  Off-CPU time
4.  Wakeup time
5.  Off-wake time
6.  Chain
BPF makes this all more practical
different off-CPU analysis views, with more context and increasing measurement cost
2. BPF
"One of the more interesting features in this cycle is the ability to attach eBPF programs (user-defined, sandboxed bytecode executed by the kernel) to kprobes. This allows user-defined instrumentation on a live kernel image that can never crash, hang or interfere with the kernel negatively."
– Ingo Molnár (Linux developer)
Source: https://lkml.org/lkml/2015/4/14/232
2. BPF
"crazy stuff"
– Alexei Starovoitov (eBPF lead)
Source: http://www.slideshare.net/AlexeiStarovoitov/bpf-inkernel-virtual-machine
BPF	
  
•  eBPF == enhanced Berkeley Packet Filter; now just BPF
•  Integrated into Linux (in stages: 3.15, 3.19, 4.1, 4.5, …)
•  Uses
–  virtual networking
–  tracing
–  "crazy stuff"
•  Front-ends
–  samples/bpf (raw)
–  bcc: Python, C
–  Linux perf_events
BPF mascot
BPF for Tracing
•  Can do per-event output and in-kernel summary statistics (histograms, etc).
Diagram: a user program (1) generates BPF bytecode and (2) loads it into the kernel, where BPF attaches to kprobes, uprobes, and tracepoints; per-event data is returned via perf_output and statistics via maps, which the user program (3) reads asynchronously.
Old way: TCP Retransmits
•  tcpdump of all send & receive, dump to FS, post-process
•  Overhead adds up on 10GbE+
Diagram: in the kernel, send and receive packets pass through a buffer; tcpdump (1) reads the buffer and (2) dumps to the file system (disks); an analyzer then (1) reads the dump, (2) runs a state machine, and (3) prints.
New way: BPF TCP Retransmits
•  Just trace the retransmit functions
•  Negligible overhead
Diagram: tcpretrans (bcc) (1) configures BPF and a kprobe on tcp_retransmit_skb() in the kernel, then (2) reads and prints; send/receive continue as-is.
BPF: TCP Retransmits
# ./tcpretrans
TIME PID IP LADDR:LPORT T> RADDR:RPORT STATE
01:55:05 0 4 10.153.223.157:22 R> 69.53.245.40:34619 ESTABLISHED
01:55:05 0 4 10.153.223.157:22 R> 69.53.245.40:34619 ESTABLISHED
01:55:17 0 4 10.153.223.157:22 R> 69.53.245.40:22957 ESTABLISHED
[…]
includes kernel state
Old: Off-CPU Time Stack Profiling
•  perf_events tracing of sched events, post-process
•  Despite buffering, usually high cost (>1M events/sec)
Diagram: perf record (1) async-reads scheduler events from a kernel buffer and (2) dumps them to the file system (or pipe), backed by disks; perf inject (1) reads and (2) rewrites; perf report/script reads, processes, and prints.
New: BPF Off-CPU Time Stacks
•  Measure off-CPU time, add to map with key = stack, value = total time. Async read map.
Diagram: offcputime (bcc) (1) configures BPF and a kprobe on finish_task_switch() in the kernel scheduler, (2) async-reads stacks from BPF maps, (3) translates symbols, and (4) prints.
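The "key = stack, value = total time" aggregation can be modeled in user space as follows. This is only a sketch of the summarization idea, not the offcputime tool itself: the event tuples, timestamps, and frame names are hypothetical, and a Python Counter stands in for the BPF map. The output lines use the semicolon-separated "folded" stack format that flame graph tooling consumes.

```python
from collections import Counter


def aggregate_off_cpu(switch_events):
    """Sum off-CPU microseconds per stack.

    switch_events: iterable of (tid, off_ts_us, on_ts_us, stack), where
    stack is a tuple of frame names ordered root first.
    """
    totals = Counter()
    for _tid, off_ts, on_ts, stack in switch_events:
        totals[stack] += on_ts - off_ts
    return totals


def to_folded(totals):
    """Render totals as 'frame;frame;frame value' lines, sorted by stack."""
    return [f"{';'.join(stack)} {us}" for stack, us in sorted(totals.items())]


# Hypothetical context-switch records for two threads.
switch_events = [
    (1, 0, 500, ("tar", "sys_read", "io_schedule")),
    (1, 900, 1200, ("tar", "sys_read", "io_schedule")),
    (2, 100, 150, ("gzip", "sys_read", "pipe_wait")),
]
folded = to_folded(aggregate_off_cpu(switch_events))
print("\n".join(folded))
```

Because only one total per unique stack is kept, the amount of data read back is bounded by the number of distinct stacks rather than by the event rate, which is the contrast with the perf record approach described earlier.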
  
Stack Trace Hack
•  For my offcputime tool, I wrote a BPF stack walker:
"Crazy Stuff"
•  … using unrolled loops & goto:
BPF Stack Traces
•  Proper BPF stack support just landed in net-next:
•  Allows more than just chain graphs
Date Sat, 20 Feb 2016 00:25:05 -0500 (EST)
Subject Re: [PATCH net-next 0/3] bpf_get_stackid() and stack_trace map
From David Miller <>
From: Alexei Starovoitov <ast@fb.com>
Date: Wed, 17 Feb 2016 19:58:56 -0800
> This patch set introduces new map type to store stack traces and
> corresponding bpf_get_stackid() helper.
...
Series applied, thanks Alexei.
memleak	
  
•  Real-time memory growth and leak analysis:
•  Uses my stack hack, but will switch to BPF stacks soon
•  By Sasha Goldshtein. Another bcc tool.
# ./memleak.py -o 10 60 1
Attaching to kmalloc and kfree, Ctrl+C to quit.
[01:27:34] Top 10 stacks with outstanding allocations:
72 bytes in 1 allocations from stack
alloc_fdtable [kernel] (ffffffff8121960f)
expand_files [kernel] (ffffffff8121986b)
sys_dup2 [kernel] (ffffffff8121a68d)
[…]
2048 bytes in 1 allocations from stack
alloc_fdtable [kernel] (ffffffff812195da)
expand_files [kernel] (ffffffff8121986b)
sys_dup2 [kernel] (ffffffff8121a68d) ]
Trace for 60s
Show kernel allocations older than 10s that were not freed
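The bookkeeping described by these annotations (track allocations, drop them when freed, report survivors older than a threshold grouped by stack) can be sketched in plain Python. This is a model of the idea under assumed inputs, not memleak's actual implementation: the event tuple layout, addresses, and timestamps are hypothetical.

```python
def outstanding_allocations(events, now, min_age):
    """Group unfreed allocations older than min_age seconds by stack.

    events: time-ordered tuples (kind, timestamp_s, address, size, stack)
    with kind "alloc" or "free". Returns {stack: (total_bytes, count)}.
    """
    live = {}  # address -> (timestamp, size, stack)
    for kind, ts, addr, size, stack in events:
        if kind == "alloc":
            live[addr] = (ts, size, stack)
        elif kind == "free":
            live.pop(addr, None)  # frees of untracked addresses are ignored

    by_stack = {}
    for ts, size, stack in live.values():
        if now - ts > min_age:
            total, count = by_stack.get(stack, (0, 0))
            by_stack[stack] = (total + size, count + 1)
    return by_stack


fdtable = ("sys_dup2", "expand_files", "alloc_fdtable")
# Hypothetical allocation events over a 60 second trace.
events = [
    ("alloc", 0, 0x1000, 72, fdtable),
    ("alloc", 1, 0x2000, 2048, fdtable),
    ("free", 2, 0x2000, 0, ()),
    ("alloc", 55, 0x3000, 64, ("sys_open",)),
]
report = outstanding_allocations(events, now=60, min_age=10)
```

The age threshold filters out recent allocations that simply have not had time to be freed yet, so what remains is more likely to be growth or a leak.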
  
3. bcc
•  BPF Compiler Collection
–  https://github.com/iovisor/bcc
•  Python front-end, C instrumentation
•  Currently beta – in development!
•  Some example tracing tools…
execsnoop	
  
•  Trace new processes:
# ./execsnoop
PCOMM PID RET ARGS
bash 15887 0 /usr/bin/man ls
preconv 15894 0 /usr/bin/preconv -e UTF-8
man 15896 0 /usr/bin/tbl
man 15897 0 /usr/bin/nroff -mandoc -rLL=169n -rLT=169n -Tutf8
man 15898 0 /usr/bin/pager -s
nroff 15900 0 /usr/bin/locale charmap
nroff 15901 0 /usr/bin/groff -mtty-char -Tutf8 -mandoc -rLL=169n …
groff 15902 0 /usr/bin/troff -mtty-char -mandoc -rLL=169n -rLT=169 …
groff 15903 0 /usr/bin/grotty
biolatency	
  
•  Block device (disk) I/O latency distribution:
# ./biolatency -mT 1 5
Tracing block device I/O... Hit Ctrl-C to end.
06:20:16
msecs : count distribution
0 -> 1 : 36 |**************************************|
2 -> 3 : 1 |* |
4 -> 7 : 3 |*** |
8 -> 15 : 17 |***************** |
16 -> 31 : 33 |********************************** |
32 -> 63 : 7 |******* |
64 -> 127 : 6 |****** |
[…]
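The power-of-two bucket ranges in the output above (0 -> 1, 2 -> 3, 4 -> 7, ...) can be reproduced with a short sketch. This models only the bucketing arithmetic in user space; the function names and latency values are hypothetical and are not taken from biolatency's source.

```python
from collections import Counter


def log2_bucket(value: int) -> int:
    """Map a non-negative integer to its power-of-two bucket index."""
    return max(value.bit_length() - 1, 0)


def bucket_range(index: int):
    """Return the inclusive (low, high) range covered by a bucket index."""
    low = 0 if index == 0 else 1 << index
    high = (1 << (index + 1)) - 1
    return low, high


def log2_histogram(values):
    """Count values per power-of-two bucket."""
    return Counter(log2_bucket(v) for v in values)


# Hypothetical I/O latencies in milliseconds.
latencies_ms = [0, 1, 1, 3, 5, 9, 12, 40]
hist = log2_histogram(latencies_ms)
rows = [(bucket_range(i), hist[i]) for i in sorted(hist)]

for (low, high), count in rows:
    print(f"{low:>4} -> {high:<4} : {count:<3} |{'*' * count}")
```

Storing one counter per bucket keeps the summary small no matter how many I/O events occur, which is what makes in-kernel histograms cheap to read back.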
ext4slower	
  
•  ext4 file system I/O, slower than a threshold:
# ./ext4slower 1
Tracing ext4 operations slower than 1 ms
TIME COMM PID T BYTES OFF_KB LAT(ms) FILENAME
06:49:17 bash 3616 R 128 0 7.75 cksum
06:49:17 cksum 3616 R 39552 0 1.34 [
06:49:17 cksum 3616 R 96 0 5.36 2to3-2.7
06:49:17 cksum 3616 R 96 0 14.94 2to3-3.4
06:49:17 cksum 3616 R 10320 0 6.82 411toppm
06:49:17 cksum 3616 R 65536 0 4.01 a2p
06:49:17 cksum 3616 R 55400 0 8.77 ab
06:49:17 cksum 3616 R 36792 0 16.34 aclocal-1.14
06:49:17 cksum 3616 R 15008 0 19.31 acpi_listen
06:49:17 cksum 3616 R 6123 0 17.23 add-apt-repository
06:49:17 cksum 3616 R 6280 0 18.40 addpart
06:49:17 cksum 3616 R 27696 0 2.16 addr2line
06:49:17 cksum 3616 R 58080 0 10.11 ag
06:49:17 cksum 3616 R 906 0 6.30 ec2-meta-data
[…]
bashreadline	
  
•  Trace bash interactive commands system-wide:
# ./bashreadline
TIME PID COMMAND
05:28:25 21176 ls -l
05:28:28 21176 date
05:28:35 21176 echo hello world
05:28:43 21176 foo this command failed
05:28:45 21176 df -h
05:29:04 3059 echo another shell
05:29:13 21176 echo first shell again
gethostlatency	
  
•  Show latency for getaddrinfo/gethostbyname[2] calls:
# ./gethostlatency
TIME PID COMM LATms HOST
06:10:24 28011 wget 90.00 www.iovisor.org
06:10:28 28127 wget 0.00 www.iovisor.org
06:10:41 28404 wget 9.00 www.netflix.com
06:10:48 28544 curl 35.00 www.netflix.com.au
06:11:10 29054 curl 31.00 www.plumgrid.com
06:11:16 29195 curl 3.00 www.facebook.com
06:11:25 29404 curl 72.00 foo
06:11:28 29475 curl 1.00 foo
trace	
  
•  Trace custom events. Ad hoc analysis multitool:
# trace 'sys_read (arg3 > 20000) "read %d bytes", arg3'
TIME PID COMM FUNC -
05:18:23 4490 dd sys_read read 1048576 bytes
05:18:23 4490 dd sys_read read 1048576 bytes
05:18:23 4490 dd sys_read read 1048576 bytes
05:18:23 4490 dd sys_read read 1048576 bytes
^C
Linux bcc/BPF Tracing Tools
4. Future Work
•  All event sources
•  Language improvements
•  More tools: eg, TCP
•  GUI support
Linux Event Sources
Diagram: Linux event sources, with regions marked "done" or "XXX: todo"
BPF/bcc Language Improvements
More Tools
•  eg, netstat(8)…
$ netstat -s
Ip:
7962754 total packets received
8 with invalid addresses
0 forwarded
0 incoming packets discarded
7962746 incoming packets delivered
8019427 requests sent out
Icmp:
382 ICMP messages received
0 input ICMP message failed.
ICMP input histogram:
destination unreachable: 125
timeout in transit: 257
3410 ICMP messages sent
0 ICMP messages failed
ICMP output histogram:
destination unreachable: 3410
IcmpMsg:
InType3: 125
InType11: 257
OutType3: 3410
Tcp:
17337 active connections openings
395515 passive connection openings
8953 failed connection attempts
240214 connection resets received
3 connections established
7198375 segments received
7504939 segments send out
62696 segments retransmited
10 bad segments received.
1072 resets sent
InCsumErrors: 5
Udp:
759925 packets received
3412 packets to unknown port received.
0 packet receive errors
784370 packets sent
UdpLite:
TcpExt:
858 invalid SYN cookies received
8951 resets received for embryonic SYN_RECV sockets
14 packets pruned from receive queue because of socket buffer overrun
6177 TCP sockets finished time wait in fast timer
293 packets rejects in established connections because of timestamp
733028 delayed acks sent
89 delayed acks further delayed because of locked socket
Quick ack mode was activated 13214 times
336520 packets directly queued to recvmsg prequeue.
43964 packets directly received from backlog
11406012 packets directly received from prequeue
1039165 packets header predicted
7066 packets header predicted and directly queued to user
1428960 acknowledgments not containing data received
1004791 predicted acknowledgments
1 times recovered from packet loss due to fast retransmit
5044 times recovered from packet loss due to SACK data
2 bad SACKs received
Detected reordering 4 times using SACK
Detected reordering 11 times using time stamp
13 congestion windows fully recovered
11 congestion windows partially recovered using Hoe heuristic
TCPDSACKUndo: 39
2384 congestion windows recovered after partial ack
228 timeouts after SACK recovery
100 timeouts in loss state
5018 fast retransmits
39 forward retransmits
783 retransmits in slow start
32455 other TCP timeouts
TCPLossProbes: 30233
TCPLossProbeRecovery: 19070
992 sack retransmits failed
18 times receiver scheduled too late for direct processing
705 packets collapsed in receive queue due to low socket buffer
13658 DSACKs sent for old packets
8 DSACKs sent for out of order packets
13595 DSACKs received
33 DSACKs for out of order packets received
32 connections reset due to unexpected data
108 connections reset due to early user close
1608 connections aborted due to timeout
TCPSACKDiscard: 4
TCPDSACKIgnoredOld: 1
TCPDSACKIgnoredNoUndo: 8649
TCPSpuriousRTOs: 445
TCPSackShiftFallback: 8588
TCPRcvCoalesce: 95854
TCPOFOQueue: 24741
TCPOFOMerge: 8
TCPChallengeACK: 1441
TCPSYNChallenge: 5
TCPSpuriousRtxHostQueues: 1
TCPAutoCorking: 4823
IpExt:
InOctets: 1561561375
OutOctets: 1509416943
InNoECTPkts: 8201572
InECT1Pkts: 2
InECT0Pkts: 3844
InCEPkts: 306
Better TCP Tools
•  TCP retransmit by type and time
•  Congestion algorithm metrics
•  etc.
GUI Support
•  eg, Netflix Vector: open source instance analyzer:
Summary	
  
•  BPF in Linux 4.x makes many new things possible
–  Stack-based thread state analysis (solve all issues!)
–  Real-time memory growth/leak detection
–  Better TCP metrics
–  etc...
•  Get involved: see iovisor/bcc
•  So far just a preview of things to come
Links
•  iovisor bcc:
•  https://github.com/iovisor/bcc
•  http://www.brendangregg.com/blog/2015-09-22/bcc-linux-4.3-tracing.html
•  http://blogs.microsoft.co.il/sasha/2016/02/14/two-new-ebpf-tools-memleak-and-argdist/
•  BPF Off-CPU, Wakeup, Off-Wake & Chain Graphs:
•  http://www.brendangregg.com/blog/2016-01-20/ebpf-offcpu-flame-graph.html
•  http://www.brendangregg.com/blog/2016-02-01/linux-wakeup-offwake-profiling.html
•  http://www.brendangregg.com/blog/2016-02-05/ebpf-chaingraph-prototype.html
•  Linux Performance:
•  http://www.brendangregg.com/linuxperf.html
•  Linux perf_events:
•  https://perf.wiki.kernel.org/index.php/Main_Page
•  http://www.brendangregg.com/perf.html
•  Flame Graphs:
•  http://techblog.netflix.com/2015/07/java-in-flames.html
•  http://www.brendangregg.com/flamegraphs.html
•  Netflix Tech Blog on Vector:
•  http://techblog.netflix.com/2015/04/introducing-vector-netflixs-on-host.html
•  Wordcloud: https://www.jasondavies.com/wordcloud/
Feb 2016
•  Questions?
•  http://slideshare.net/brendangregg
•  http://www.brendangregg.com
•  bgregg@netflix.com
•  @brendangregg
Thanks to Alexei Starovoitov (Facebook), Brenden
Blanco (PLUMgrid), Daniel Borkmann (Cisco), Wang
Nan (Huawei), Sasha Goldshtein (Sela), and other
BPF and bcc contributors!


Linux BPF Superpowers

  • 1. Linux  4.x  Performance   Using  BPF  Superpowers   Brendan Gregg Senior Performance Architect Feb   2016  
  • 2. Ten  years  ago,   I  gave  a  talk  here     about  DTrace  tools…  
  • 4. Superpowers  are  coming  to  Linux     Solve performance issues that were previously impossible For example, full off-CPU analysis…
  • 7. Ideal  Thread  States   A starting point for deeper analysis
  • 8. Linux  Thread  States   Based on: TASK_RUNNING TASK_INTERRUPTIBLE TASK_UNINTERRUPTIBLE Still a useful starting point
  • 9. Linux  On-­‐CPU  Analysis   CPU  Flame  Graph   •  I'll start with on-CPU analysis: •  Split into user/kernel states using /proc, mpstat(1), ... •  perf_events ("perf") to analyze further: –  User & kernel stack sampling (as a CPU flame graph) –  CPI –  Should be easy, but…
  • 12. Java  Mixed-­‐Mode  CPU  Flame  Graph   Java JVM Kernel GC •  Fixed! –  Java –XX:+PreserveFramePointer –  Java perf-map-agent –  Linux perf_events
  • 14. Also,  CPI  Flame  Graph   Cycles Per Instruction -  red == instruction heavy -  blue == cycle heavy (likely mem stalls) zoomed:
  • 15. Linux  Off-­‐CPU  Analysis   On Linux, the state isn't helpful, but the code path is Off-CPU analysis by measuring blocked time with stack traces
  • 16. Off-­‐CPU  Time  Flame  Graph   From  hRp://www.brendangregg.com/blog/2016-­‐02-­‐01/linux-­‐wakeup-­‐offwake-­‐profiling.html   Stack depth Off-CPU time
  • 17. Off-­‐CPU  Time  (zoomed):  tar(1)   file read from disk directory read from disk Currently kernel stacks only; user stacks will add more context pipe write path read from disk fstat from disk
  • 18. Off-­‐CPU  Time:  more  states   lock contention sleep run queue latency Flame graph quantifies total time spent in states
  • 19. CPU  +  Off-­‐CPU  ==  See  Everything?  
  • 20. Off-­‐CPU  Time  (zoomed):  gzip(1)   Off-CPU doesn't always make sense: what is gzip blocked on?
  • 21. Wakeup  Time  Flame  Graph  
  • 22. Wakeup  Time  (zoomed):  gzip(1)   gzip(1) is blocked on tar(1)! tar cf - * | gzip > out.tar.gz Can't we associate off-CPU with wakeup stacks?
  • 24. Wakeup stacks are associated and merged in-kernel using BPF We couldn't do this before
  • 26. •  One wakeup stack is often not enough… •  Who woke the waker? Haven't  Solved  Everything  Yet…  
  • 28. Merging multiple wakeup stacks in kernel using BPF With enough stacks, all paths lead to metal
  • 29. Solve  Everything   CPU + off-CPU analysis can solve most issues Flame graph (profiling) types: 1.  CPU 2.  CPI 3.  Off-CPU time 4.  Wakeup time 5.  Off-wake time 6.  Chain BPF makes this all more practical different off-CPU analysis views, with more context and increasing measurement cost
  • 30. 2.  BPF   "One  of  the  more  interesbng  features  in  this   cycle  is  the  ability  to  aRach  eBPF  programs   (user-­‐defined,  sandboxed  bytecode  executed   by  the  kernel)  to  kprobes.  This  allows  user-­‐ defined  instrumentabon  on  a  live  kernel  image   that  can  never  crash,  hang  or  interfere  with  the   kernel  negabvely."   –  Ingo  Molnár  (Linux  developer)   Source:  hRps://lkml.org/lkml/2015/4/14/232  
  • 31. 2.  BPF   "crazy  stuff"   –  Alexei  Starovoitov  (eBPF  lead)   Source:  hRp://www.slideshare.net/AlexeiStarovoitov/bpf-­‐inkernel-­‐virtual-­‐machine  
  • 32. BPF • eBPF == enhanced Berkeley Packet Filter; now just BPF • Integrated into Linux (in stages: 3.15, 3.19, 4.1, 4.5, …) • Uses: virtual networking; tracing; "crazy stuff" • Front-ends: samples/bpf (raw); bcc: Python, C; Linux perf_events. (Image: BPF mascot)
  • 33. BPF for Tracing • Can do per-event output and in-kernel summary statistics (histograms, etc). Diagram: the user program (1) generates BPF bytecode and (2) loads it into the kernel, where BPF attaches to kprobes, uprobes, and tracepoints; the user program then (3) asynchronously reads per-event data (perf_output) and statistics (maps).
  • 34. Old way: TCP Retransmits • tcpdump of all send & receive, dump to FS, post-process • Overheads add up on 10GbE+. Diagram: tcpdump (1) reads every send and receive from a kernel buffer and (2) dumps them to the file system (disks); an analyzer then (1) reads the dump, (2) runs a state machine, and (3) prints.
  • 35. New way: BPF TCP Retransmits • Just trace the retransmit functions • Negligible overhead. Diagram: tcpretrans (bcc) (1) configures BPF and a kprobe on tcp_retransmit_skb(), then (2) reads and prints; send/recv are left as-is.
  • 36. BPF: TCP Retransmits (output includes kernel state)
      # ./tcpretrans
      TIME     PID  IP LADDR:LPORT        T> RADDR:RPORT         STATE
      01:55:05 0    4  10.153.223.157:22  R> 69.53.245.40:34619  ESTABLISHED
      01:55:05 0    4  10.153.223.157:22  R> 69.53.245.40:34619  ESTABLISHED
      01:55:17 0    4  10.153.223.157:22  R> 69.53.245.40:22957  ESTABLISHED
      […]
  • 37. Old: Off-CPU Time Stack Profiling • perf_events tracing of sched events, post-process • Despite buffering, usually high cost (>1M events/sec). Diagram: perf record (1) asynchronously reads scheduler events from a kernel buffer and (2) dumps them to the file system (or a pipe); perf inject (1) reads and (2) rewrites the dump; perf report/script reads, processes, and prints.
  • 38. New: BPF Off-CPU Time Stacks • Measure off-CPU time, add to map with key = stack, value = total time. Async read map. Diagram: offcputime (bcc) (1) configures BPF and a kprobe on the scheduler's finish_task_switch(), (2) asynchronously reads stacks from BPF maps, (3) translates symbols, and (4) prints.
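The "key = stack, value = total time" map can be sketched in user-space Python. This only simulates the bookkeeping; it is not the offcputime source, and the class name, method names, and sample stack are hypothetical. In the real tool the equivalent logic runs in kernel context on each context switch.

```python
from collections import defaultdict


class OffCpuProfile:
    """Accumulate off-CPU time per stack, mimicking an in-kernel summary map."""

    def __init__(self):
        self.switched_out_at = {}         # tid -> (timestamp_us, stack)
        self.total_us = defaultdict(int)  # stack -> total off-CPU microseconds

    def switch_out(self, tid, now_us, stack):
        # Thread leaves the CPU: remember when, and on which code path.
        self.switched_out_at[tid] = (now_us, tuple(stack))

    def switch_in(self, tid, now_us):
        # Thread returns to the CPU: add the elapsed time to its stack's total.
        start = self.switched_out_at.pop(tid, None)
        if start is None:
            return  # thread was already off-CPU when tracing began
        started_us, stack = start
        self.total_us[stack] += now_us - started_us


# Made-up events for one thread blocking twice on the same path.
profile = OffCpuProfile()
blocked_path = ["tar", "sys_read", "io_schedule"]
profile.switch_out(tid=101, now_us=1000, stack=blocked_path)
profile.switch_in(tid=101, now_us=4000)
profile.switch_out(tid=101, now_us=5000, stack=blocked_path)
profile.switch_in(tid=101, now_us=5500)
profile.switch_in(tid=202, now_us=6000)  # no matching switch_out: ignored

for stack, us in profile.total_us.items():
    print(";".join(stack), us)
```

Only one summary entry per distinct stack has to be read by the user program, rather than one record per scheduler event, which is where the cost saving over the old approach comes from.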
  • 39. Stack Trace Hack • For my offcputime tool, I wrote a BPF stack walker:
  • 40. "Crazy Stuff" • … using unrolled loops & goto:
  • 41. BPF Stack Traces • Proper BPF stack support just landed in net-next: • Allows more than just chain graphs
      Date    Sat, 20 Feb 2016 00:25:05 -0500 (EST)
      Subject Re: [PATCH net-next 0/3] bpf_get_stackid() and stack_trace map
      From    David Miller <>

      From: Alexei Starovoitov <[email protected]>
      Date: Wed, 17 Feb 2016 19:58:56 -0800

      > This patch set introduces new map type to store stack traces and
      > corresponding bpf_get_stackid() helper.
      ...

      Series applied, thanks Alexei.
  • 42. memleak • Real-time memory growth and leak analysis: • Uses my stack hack, but will switch to BPF stacks soon • By Sasha Goldshtein. Another bcc tool. Annotation: trace for 60s; show kernel allocations older than 10s that were not freed.
      # ./memleak.py -o 10 60 1
      Attaching to kmalloc and kfree, Ctrl+C to quit.
      [01:27:34] Top 10 stacks with outstanding allocations:
              72 bytes in 1 allocations from stack
                       alloc_fdtable [kernel] (ffffffff8121960f)
                       expand_files [kernel] (ffffffff8121986b)
                       sys_dup2 [kernel] (ffffffff8121a68d)
      […]
              2048 bytes in 1 allocations from stack
                       alloc_fdtable [kernel] (ffffffff812195da)
                       expand_files [kernel] (ffffffff8121986b)
                       sys_dup2 [kernel] (ffffffff8121a68d)
      […]
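The bookkeeping behind "outstanding allocations older than 10s" can be sketched as follows. This is a user-space simulation under assumed names (`AllocTracker`, `on_alloc`, `on_free`, `outstanding`) with made-up addresses and timestamps; it is not memleak's actual implementation.

```python
class AllocTracker:
    """Track live allocations and summarize old, unfreed ones by stack."""

    def __init__(self):
        self.live = {}  # address -> (size_bytes, timestamp_s, stack)

    def on_alloc(self, addr, size, now_s, stack):
        self.live[addr] = (size, now_s, tuple(stack))

    def on_free(self, addr):
        # A free of an address never seen (allocated before tracing) is ignored.
        self.live.pop(addr, None)

    def outstanding(self, now_s, min_age_s):
        """Return {stack: (total_bytes, count)} for allocations at least min_age_s old."""
        summary = {}
        for size, ts, stack in self.live.values():
            if now_s - ts >= min_age_s:
                total, count = summary.get(stack, (0, 0))
                summary[stack] = (total + size, count + 1)
        return summary


# Made-up trace: two unfreed allocations on one path, one freed, one too recent.
dup2_path = ("sys_dup2", "expand_files", "alloc_fdtable")
other_path = ("sys_open", "get_empty_filp")
tracker = AllocTracker()
tracker.on_alloc(0x1000, 72, now_s=0, stack=dup2_path)
tracker.on_alloc(0x2000, 2048, now_s=5, stack=dup2_path)
tracker.on_alloc(0x3000, 64, now_s=2, stack=other_path)
tracker.on_free(0x3000)
tracker.on_alloc(0x4000, 128, now_s=55, stack=other_path)

report = tracker.outstanding(now_s=60, min_age_s=10)
print(report)
```

The age threshold filters out short-lived allocations that simply have not been freed yet, so what remains is a better leak candidate list.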
  • 43. 3. bcc • BPF Compiler Collection – https://github.com/iovisor/bcc • Python front-end, C instrumentation • Currently beta – in development! • Some example tracing tools…
  • 44. execsnoop • Trace new processes:
      # ./execsnoop
      PCOMM    PID    RET ARGS
      bash     15887    0 /usr/bin/man ls
      preconv  15894    0 /usr/bin/preconv -e UTF-8
      man      15896    0 /usr/bin/tbl
      man      15897    0 /usr/bin/nroff -mandoc -rLL=169n -rLT=169n -Tutf8
      man      15898    0 /usr/bin/pager -s
      nroff    15900    0 /usr/bin/locale charmap
      nroff    15901    0 /usr/bin/groff -mtty-char -Tutf8 -mandoc -rLL=169n …
      groff    15902    0 /usr/bin/troff -mtty-char -mandoc -rLL=169n -rLT=169 …
      groff    15903    0 /usr/bin/grotty
  • 45. biolatency • Block device (disk) I/O latency distribution:
      # ./biolatency -mT 1 5
      Tracing block device I/O... Hit Ctrl-C to end.

      06:20:16
           msecs       : count   distribution
             0 -> 1    : 36     |**************************************|
             2 -> 3    : 1      |*                                     |
             4 -> 7    : 3      |***                                   |
             8 -> 15   : 17     |*****************                     |
            16 -> 31   : 33     |**********************************    |
            32 -> 63   : 7      |*******                               |
            64 -> 127  : 6      |******                                |
      […]
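The power-of-two buckets in this output can be reproduced with a few lines of Python. This is a sketch of the bucketing idea only, with hypothetical helper names and made-up latencies; the bar scaling here is an assumption and may not match how bcc draws its bars.

```python
def log2_bucket(value):
    """Bucket index for a non-negative integer: 0 -> [0,1], 1 -> [2,3], 2 -> [4,7], ..."""
    return max(value, 1).bit_length() - 1


def bucket_range(index):
    """Inclusive (low, high) bounds of a bucket."""
    low = 0 if index == 0 else 1 << index
    high = (1 << (index + 1)) - 1
    return low, high


def histogram(values):
    """Count values per power-of-two bucket."""
    counts = {}
    for v in values:
        i = log2_bucket(v)
        counts[i] = counts.get(i, 0) + 1
    return counts


def render(counts, width=38):
    """Format a non-empty histogram as text rows with proportional bars."""
    peak = max(counts.values())
    lines = []
    for i in range(max(counts) + 1):
        n = counts.get(i, 0)
        low, high = bucket_range(i)
        bar = "*" * (n * width // peak)
        lines.append(f"{low:>6} -> {high:<6} : {n:<6} |{bar:<{width}}|")
    return lines


# Made-up latencies in milliseconds.
sample_ms = [0, 1, 1, 2, 5, 9, 12, 40]
counts = histogram(sample_ms)
print("\n".join(render(counts)))
```

Storing only a small array of bucket counters is what makes this kind of summary cheap to keep in the kernel and cheap to read out each interval.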
  • 46. ext4slower • ext4 file system I/O, slower than a threshold:
      # ./ext4slower 1
      Tracing ext4 operations slower than 1 ms
      TIME     COMM   PID   T BYTES  OFF_KB  LAT(ms) FILENAME
      06:49:17 bash   3616  R 128    0          7.75 cksum
      06:49:17 cksum  3616  R 39552  0          1.34 [
      06:49:17 cksum  3616  R 96     0          5.36 2to3-2.7
      06:49:17 cksum  3616  R 96     0         14.94 2to3-3.4
      06:49:17 cksum  3616  R 10320  0          6.82 411toppm
      06:49:17 cksum  3616  R 65536  0          4.01 a2p
      06:49:17 cksum  3616  R 55400  0          8.77 ab
      06:49:17 cksum  3616  R 36792  0         16.34 aclocal-1.14
      06:49:17 cksum  3616  R 15008  0         19.31 acpi_listen
      06:49:17 cksum  3616  R 6123   0         17.23 add-apt-repository
      06:49:17 cksum  3616  R 6280   0         18.40 addpart
      06:49:17 cksum  3616  R 27696  0          2.16 addr2line
      06:49:17 cksum  3616  R 58080  0         10.11 ag
      06:49:17 cksum  3616  R 906    0          6.30 ec2-meta-data
      […]
  • 47. bashreadline • Trace bash interactive commands system-wide:
      # ./bashreadline
      TIME      PID    COMMAND
      05:28:25  21176  ls -l
      05:28:28  21176  date
      05:28:35  21176  echo hello world
      05:28:43  21176  foo this command failed
      05:28:45  21176  df -h
      05:29:04  3059   echo another shell
      05:29:13  21176  echo first shell again
  • 48. gethostlatency • Show latency for getaddrinfo/gethostbyname[2] calls:
      # ./gethostlatency
      TIME      PID    COMM  LATms HOST
      06:10:24  28011  wget  90.00 www.iovisor.org
      06:10:28  28127  wget   0.00 www.iovisor.org
      06:10:41  28404  wget   9.00 www.netflix.com
      06:10:48  28544  curl  35.00 www.netflix.com.au
      06:11:10  29054  curl  31.00 www.plumgrid.com
      06:11:16  29195  curl   3.00 www.facebook.com
      06:11:25  29404  curl  72.00 foo
      06:11:28  29475  curl   1.00 foo
  • 49. trace • Trace custom events. Ad hoc analysis multitool:
      # trace 'sys_read (arg3 > 20000) "read %d bytes", arg3'
      TIME     PID    COMM  FUNC      -
      05:18:23 4490   dd    sys_read  read 1048576 bytes
      05:18:23 4490   dd    sys_read  read 1048576 bytes
      05:18:23 4490   dd    sys_read  read 1048576 bytes
      05:18:23 4490   dd    sys_read  read 1048576 bytes
      ^C
  • 51. 4. Future Work • All event sources • Language improvements • More tools: eg, TCP • GUI support
  • 52. Linux Event Sources (diagram: each event source is labeled either "done" or "XXX: todo")
  • 54. More  Tools   •  eg, netstat(8)… $ netstat -s Ip: 7962754 total packets received 8 with invalid addresses 0 forwarded 0 incoming packets discarded 7962746 incoming packets delivered 8019427 requests sent out Icmp: 382 ICMP messages received 0 input ICMP message failed. ICMP input histogram: destination unreachable: 125 timeout in transit: 257 3410 ICMP messages sent 0 ICMP messages failed ICMP output histogram: destination unreachable: 3410 IcmpMsg: InType3: 125 InType11: 257 OutType3: 3410 Tcp: 17337 active connections openings 395515 passive connection openings 8953 failed connection attempts 240214 connection resets received 3 connections established 7198375 segments received 7504939 segments send out 62696 segments retransmited 10 bad segments received. 1072 resets sent InCsumErrors: 5 Udp: 759925 packets received 3412 packets to unknown port received. 0 packet receive errors 784370 packets sent UdpLite: TcpExt: 858 invalid SYN cookies received 8951 resets received for embryonic SYN_RECV sockets 14 packets pruned from receive queue because of socket buffer overrun 6177 TCP sockets finished time wait in fast timer 293 packets rejects in established connections because of timestamp 733028 delayed acks sent 89 delayed acks further delayed because of locked socket Quick ack mode was activated 13214 times 336520 packets directly queued to recvmsg prequeue. 
43964 packets directly received from backlog 11406012 packets directly received from prequeue 1039165 packets header predicted 7066 packets header predicted and directly queued to user 1428960 acknowledgments not containing data received 1004791 predicted acknowledgments 1 times recovered from packet loss due to fast retransmit 5044 times recovered from packet loss due to SACK data 2 bad SACKs received Detected reordering 4 times using SACK Detected reordering 11 times using time stamp 13 congestion windows fully recovered 11 congestion windows partially recovered using Hoe heuristic TCPDSACKUndo: 39 2384 congestion windows recovered after partial ack 228 timeouts after SACK recovery 100 timeouts in loss state 5018 fast retransmits 39 forward retransmits 783 retransmits in slow start 32455 other TCP timeouts TCPLossProbes: 30233 TCPLossProbeRecovery: 19070 992 sack retransmits failed 18 times receiver scheduled too late for direct processing 705 packets collapsed in receive queue due to low socket buffer 13658 DSACKs sent for old packets 8 DSACKs sent for out of order packets 13595 DSACKs received 33 DSACKs for out of order packets received 32 connections reset due to unexpected data 108 connections reset due to early user close 1608 connections aborted due to timeout TCPSACKDiscard: 4 TCPDSACKIgnoredOld: 1 TCPDSACKIgnoredNoUndo: 8649 TCPSpuriousRTOs: 445 TCPSackShiftFallback: 8588 TCPRcvCoalesce: 95854 TCPOFOQueue: 24741 TCPOFOMerge: 8 TCPChallengeACK: 1441 TCPSYNChallenge: 5 TCPSpuriousRtxHostQueues: 1 TCPAutoCorking: 4823 IpExt: InOctets: 1561561375 OutOctets: 1509416943 InNoECTPkts: 8201572 InECT1Pkts: 2 InECT0Pkts: 3844 InCEPkts: 306
  • 56. Better TCP Tools • TCP retransmit by type and time • Congestion algorithm metrics • etc.
  • 57. GUI Support • eg, Netflix Vector: open source instance analyzer:
  • 58. Summary • BPF in Linux 4.x makes many new things possible – Stack-based thread state analysis (solve all issues!) – Real-time memory growth/leak detection – Better TCP metrics – etc... • Get involved: see iovisor/bcc • So far just a preview of things to come
  • 59. Links
      • iovisor bcc:
          • https://github.com/iovisor/bcc
          • http://www.brendangregg.com/blog/2015-09-22/bcc-linux-4.3-tracing.html
          • http://blogs.microsoft.co.il/sasha/2016/02/14/two-new-ebpf-tools-memleak-and-argdist/
      • BPF Off-CPU, Wakeup, Off-Wake & Chain Graphs:
          • http://www.brendangregg.com/blog/2016-01-20/ebpf-offcpu-flame-graph.html
          • http://www.brendangregg.com/blog/2016-02-01/linux-wakeup-offwake-profiling.html
          • http://www.brendangregg.com/blog/2016-02-05/ebpf-chaingraph-prototype.html
      • Linux Performance:
          • http://www.brendangregg.com/linuxperf.html
      • Linux perf_events:
          • https://perf.wiki.kernel.org/index.php/Main_Page
          • http://www.brendangregg.com/perf.html
      • Flame Graphs:
          • http://techblog.netflix.com/2015/07/java-in-flames.html
          • http://www.brendangregg.com/flamegraphs.html
      • Netflix Tech Blog on Vector:
          • http://techblog.netflix.com/2015/04/introducing-vector-netflixs-on-host.html
      • Wordcloud: https://www.jasondavies.com/wordcloud/
  • 60. Feb 2016 • Questions? • http://slideshare.net/brendangregg • http://www.brendangregg.com • [email protected] • @brendangregg Thanks to Alexei Starovoitov (Facebook), Brenden Blanco (PLUMgrid), Daniel Borkmann (Cisco), Wang Nan (Huawei), Sasha Goldshtein (Sela), and other BPF and bcc contributors!