Shmulik Ladkani, 2018
Building Network Functions with eBPF & BCC
This work is licensed under a Creative Commons Attribution 4.0 International License.
Agenda
● Intro
● Theory
○ Classical BPF
○ eBPF
○ BCC
● Practice
○ Examples and demo
Berkeley Packet Filter
New Architecture for User-level Packet Capture
● McCanne/Jacobson 1993
● Standardized API
● Performant
Berkeley Packet Filter
● Allows user program to attach a filter onto a socket
● Available on most *nix systems
Design
● Abstract-machine architecture
○ Registers, memory, addressing modes…
○ Instruction set (load, store, branch, ALU…)
● In-kernel interpreter
Example program: assembly / machine instructions
(000) ldh [12] { 0x28, 0, 0, 0x0000000c },
(001) jeq #0x800 jt 2 jf 5 { 0x15, 0, 3, 0x00000800 },
(002) ldb [23] { 0x30, 0, 0, 0x00000017 },
(003) jeq #0x6 jt 4 jf 5 { 0x15, 0, 1, 0x00000006 },
(004) ret #262144 { 0x6, 0, 0, 0x00040000 },
(005) ret #0 { 0x6, 0, 0, 0x00000000 },
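The abstract machine above can be illustrated with a tiny user-space interpreter. This is only a sketch under stated assumptions: it handles just the four opcodes used by the example program, the names `struct insn`, `run_filter` and `tcp_over_ipv4` are made up for illustration, and it is not the kernel's interpreter. It runs the slide's six instructions against a raw Ethernet frame and returns the accept length (0 means reject).

```c
#include <stddef.h>
#include <stdint.h>

/* Same field order as the tuples on the slide: { code, jt, jf, k }. */
struct insn {
    uint16_t code;
    uint8_t  jt;
    uint8_t  jf;
    uint32_t k;
};

/* Only the opcodes that appear in the example program. */
enum {
    OP_LDH_ABS = 0x28, /* A = 16-bit big-endian value at pkt[k] */
    OP_LDB_ABS = 0x30, /* A = byte at pkt[k]                     */
    OP_JEQ_K   = 0x15, /* skip jt instructions if A == k, else jf */
    OP_RET_K   = 0x06, /* return k (bytes to accept, 0 = drop)    */
};

/* The slide's program: accept IPv4 frames whose protocol is TCP. */
static const struct insn tcp_over_ipv4[] = {
    { 0x28, 0, 0, 0x0000000c }, /* ldh [12]  : EtherType        */
    { 0x15, 0, 3, 0x00000800 }, /* jeq #0x800: IPv4?            */
    { 0x30, 0, 0, 0x00000017 }, /* ldb [23]  : IP protocol      */
    { 0x15, 0, 1, 0x00000006 }, /* jeq #0x6  : TCP?             */
    { 0x06, 0, 0, 0x00040000 }, /* ret #262144                  */
    { 0x06, 0, 0, 0x00000000 }, /* ret #0                       */
};

static uint32_t run_filter(const struct insn *prog, size_t n,
                           const uint8_t *pkt, size_t len)
{
    uint32_t a = 0; /* accumulator register */

    for (size_t pc = 0; pc < n; pc++) {
        const struct insn *i = &prog[pc];

        switch (i->code) {
        case OP_LDH_ABS:
            if ((uint64_t)i->k + 2 > len)
                return 0; /* out-of-bounds load rejects the packet */
            a = ((uint32_t)pkt[i->k] << 8) | pkt[i->k + 1];
            break;
        case OP_LDB_ABS:
            if ((uint64_t)i->k + 1 > len)
                return 0;
            a = pkt[i->k];
            break;
        case OP_JEQ_K:
            /* Offsets are relative to the next instruction. */
            pc += (a == i->k) ? i->jt : i->jf;
            break;
        case OP_RET_K:
            return i->k;
        default:
            return 0; /* unsupported opcode in this sketch */
        }
    }
    return 0;
}
```

Calling `run_filter(tcp_over_ipv4, 6, frame, frame_len)` on a frame with EtherType 0x0800 and protocol byte 6 follows the path 000, 001, 002, 003, 004 and returns 262144; any other frame ends at instruction 005 and returns 0.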
Modus Operandi
struct sock_filter code[] = {
/* ... machine instructions ... */
};
struct sock_fprog bpf = {
.filter = code,
.len = ARRAY_SIZE(code),
};
sock = socket(...);
setsockopt(sock, SOL_SOCKET, SO_ATTACH_FILTER, &bpf, sizeof(bpf));
Applications
● Libpcap
○ Tcpdump, Wireshark, Nmap...
● DHCP stacks
● WPA 802.1x stacks
● Android 464XLAT
● android.net.NetworkUtils
● Custom user-space protocol stacks
Linux Enhancements
Packet Metadata Access
Extension Description
len skb->len
proto skb->protocol
type skb->pkt_type
ifidx skb->dev->ifindex
hatype skb->dev->type
mark skb->mark
rxhash skb->hash
vlan_tci skb_vlan_tag_get(skb)
vlan_avail skb_vlan_tag_present(skb)
vlan_tpid skb->vlan_proto
nla Netlink attribute of type X with offset A
nlan Nested Netlink attribute of type X with offset A
Linux Enhancements
Just-In-Time Compiler
● Converts BPF instructions directly into native code
● As of v3.0 (x86_64)
○ SPARC, PowerPC, ARM, ARM64, MIPS, s390 followed
Linux Enhancements
Hooking Points
● IPTables xt_bpf
○ Competitive with traditional u32 match
○ As of v3.9
○ iptables -A OUTPUT -m bpf --bytecode '4,48 0 0 9,21 0 1 6,6 0 0 1,6 0 0 0' -j ACCEPT
● TC cls_bpf
○ Alternative to ematch / u32 classification
○ As of v3.13
○ tc filter add dev em1 parent 1: bpf bytecode '1,6 0 0 4294967295,' flowid 1:1
○ tc filter add dev em1 parent 1: bpf bytecode-file /var/bpf/tcp-syn flowid 1:1
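The bytecode strings in these commands are decimal renderings of the same { code, jt, jf, k } tuples: an instruction count, then one "code jt jf k" group per instruction, separated by commas. The sketch below shows that serialization. The helper name `format_bytecode` and the array `match_tcp` are hypothetical, and the comments on each instruction assume, as the iptables example suggests, that the xt_bpf filter sees the packet starting at the IP header.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct insn {
    uint16_t code;
    uint8_t  jt;
    uint8_t  jf;
    uint32_t k;
};

/* The four instructions behind '4,48 0 0 9,21 0 1 6,6 0 0 1,6 0 0 0'. */
static const struct insn match_tcp[] = {
    { 0x30, 0, 0, 9 }, /* ldb [9] : IPv4 protocol field          */
    { 0x15, 0, 1, 6 }, /* jeq #6  : TCP? else skip one instruction */
    { 0x06, 0, 0, 1 }, /* ret #1  : match                          */
    { 0x06, 0, 0, 0 }, /* ret #0  : no match                       */
};

/* Hypothetical helper: writes "<count>,<code> <jt> <jf> <k>,..." to out.
 * Returns the string length, or -1 if out is too small. */
static int format_bytecode(const struct insn *prog, size_t n,
                           char *out, size_t outlen)
{
    int w = snprintf(out, outlen, "%zu", n);
    if (w < 0 || (size_t)w >= outlen)
        return -1;

    size_t used = (size_t)w;
    for (size_t i = 0; i < n; i++) {
        w = snprintf(out + used, outlen - used, ",%u %u %u %lu",
                     (unsigned)prog[i].code, (unsigned)prog[i].jt,
                     (unsigned)prog[i].jf, (unsigned long)prog[i].k);
        if (w < 0 || (size_t)w >= outlen - used)
            return -1;
        used += (size_t)w;
    }
    return (int)used;
}
```

Formatting `match_tcp` with a sufficiently large buffer reproduces the string passed to `--bytecode` in the iptables example.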
Linux Enhancements
Seccomp BPF
● Filters system calls using a BPF filter
○ Operates on syscall number and syscall arguments
○ As of v3.5
● Used by Chrome, Firefox, OpenSSH, Android…
static struct sock_filter filter[] = {
    /* ... */
    // load syscall number
    BPF_STMT(BPF_LD+BPF_W+BPF_ABS, offsetof(struct seccomp_data, nr)),
    // only allow 'read': fall through to ALLOW on match, else skip to KILL
    BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, SYS_read, 0, 1),
    BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW),
    BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_KILL),
};
static struct sock_fprog filterprog = {
    .len = ARRAY_SIZE(filter),
    .filter = filter,
};
/* ... */
prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &filterprog);
Summary
● Fixed filter program
● Few injection points
● Two domains
○ Packet filtering
○ Syscall filtering
● Functional, stateless
● Kernel data is immutable
● No kernel interaction
User-program injected into kernel to control behavior
Extended BPF
eBPF
● Abstract-machine engine running injected user programs
● On steroids
○ New domain (tracing/profiling)
○ Numerous hooking points
○ LLVM backend
○ Actions (mutates data)
○ Data-structures (“maps”)
○ Kernel callable helper functions
Applications (network)
● Network Security (DDoS, IDS, IPS …)
● Load Balancers
● Custom Statistics
● Monitoring
● Container Networking
● Custom Forwarding Stacks
● Network Functions
Modus Operandi
● Write
○ Restricted C
● Compile
○ clang & llc
● Load
○ bpf(BPF_PROG_LOAD, ...)
● Attach
○ Subsystem dependent
struct bpf_map_def SEC("maps") my_map = {
.type = BPF_MAP_TYPE_ARRAY,
.key_size = sizeof(u32),
.value_size = sizeof(long),
.max_entries = 256,
};
SEC("socket1") int bpf_prog1(struct __sk_buff *skb)
{
int index = load_byte(skb, ETH_HLEN + offsetof(struct iphdr, protocol));
long *value;
if (skb->pkt_type != PACKET_OUTGOING)
return 0;
value = bpf_map_lookup_elem(&my_map, &index);
if (value)
__sync_fetch_and_add(value, skb->len);
return 0;
}
samples/bpf/sockex1_kern.c
load_bpf_file(filename); // assigns prog_fd, map_fd
sock = open_raw_sock("lo");
setsockopt(sock, SOL_SOCKET, SO_ATTACH_BPF, prog_fd, sizeof(prog_fd[0]));
f = popen("ping -c5 localhost", "r");
for (i = 0; i < 5; i++) {
long long tcp_cnt, udp_cnt, icmp_cnt;
key = IPPROTO_TCP;
bpf_map_lookup_elem(map_fd[0], &key, &tcp_cnt);
key = IPPROTO_UDP;
bpf_map_lookup_elem(map_fd[0], &key, &udp_cnt);
key = IPPROTO_ICMP;
bpf_map_lookup_elem(map_fd[0], &key, &icmp_cnt);
printf("TCP %lld UDP %lld ICMP %lld bytes\n", tcp_cnt, udp_cnt, icmp_cnt);
sleep(1);
}
samples/bpf/sockex1_user.c
eBPF Maps
● Key-value store
○ Keeps program state
○ Accessible from the eBPF program
○ Accessible from userspace
● Allows context aware behavior
● Numerous data structures
BPF_MAP_TYPE_HASH
BPF_MAP_TYPE_ARRAY
BPF_MAP_TYPE_LRU_HASH
BPF_MAP_TYPE_LPM_TRIE
more ...
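To make the map semantics concrete, here is a plain user-space model of an array map as it is used in the sockex1 sample: a lookup yields a pointer to the value slot (or NULL for an out-of-range key), and the program updates the slot in place. This is a sketch only. `struct array_map`, `map_lookup` and `account_packet` are hypothetical names, nothing here calls the kernel, and `__sync_fetch_and_add` assumes a GCC- or Clang-compatible compiler.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_ENTRIES 256 /* same size as my_map in the sample */

/* Stand-in for a BPF_MAP_TYPE_ARRAY with u32 keys and long values. */
struct array_map {
    long values[MAX_ENTRIES];
};

/* Modeled on how the sample uses bpf_map_lookup_elem(): returns a
 * pointer to the value, or NULL when the key is out of range. */
static long *map_lookup(struct array_map *map, const uint32_t *key)
{
    if (*key >= MAX_ENTRIES)
        return NULL;
    return &map->values[*key];
}

/* Per-packet logic of the sample: add the packet length to the
 * counter indexed by the IP protocol number. */
static void account_packet(struct array_map *map, uint8_t ip_proto,
                           uint32_t len)
{
    uint32_t key = ip_proto;
    long *value = map_lookup(map, &key);

    if (value)
        __sync_fetch_and_add(value, (long)len);
}
```

Because the state lives in the map rather than in the program, a reader on the other side (userspace, in the real system) can poll the same slots while packets keep arriving, which is what the sockex1 user program does once per second.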
Program Types
Determines: context, whence, access rights
BPF_PROG_TYPE_SOCKET_FILTER packet filter
BPF_PROG_TYPE_SCHED_CLS tc classifier
BPF_PROG_TYPE_SCHED_ACT tc action
BPF_PROG_TYPE_LWT_* lightweight tunnel filter
BPF_PROG_TYPE_KPROBE kprobe filter
BPF_PROG_TYPE_TRACEPOINT tracepoint filter
BPF_PROG_TYPE_PERF_EVENT perf event filter
BPF_PROG_TYPE_XDP packet filter from XDP
BPF_PROG_TYPE_CGROUP_SKB packet filter for control groups
BPF_PROG_TYPE_CGROUP_SOCK same, allowed to modify socket options
Helper Functions
● eBPF program may call a predefined set of functions
● Differs by program type
● Examples:
BPF_FUNC_skb_load_bytes
BPF_FUNC_csum_diff
BPF_FUNC_skb_get_tunnel_key
BPF_FUNC_get_hash_recalc
...
BPF_FUNC_skb_store_bytes
BPF_FUNC_skb_pull_data
BPF_FUNC_l3_csum_replace
BPF_FUNC_l4_csum_replace
BPF_FUNC_redirect
BPF_FUNC_clone_redirect
BPF_FUNC_skb_vlan_push
BPF_FUNC_skb_vlan_pop
BPF_FUNC_skb_change_proto
BPF_FUNC_skb_set_tunnel_key
...
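Several of the helpers listed above exist because rewriting a header field invalidates its checksum. The arithmetic behind an incremental fix-up can be sketched in plain C. This is not the kernel helpers' code: `csum_full` and `csum_replace16` are hypothetical names, and the sketch only demonstrates the RFC 1624 update rule HC' = ~(~HC + ~m + m') for one 16-bit word.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into 16 bits with end-around carry. */
static uint16_t fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Full Internet checksum (RFC 1071) over big-endian 16-bit words.
 * len is assumed to be even, as it is for an IPv4 header. */
static uint16_t csum_full(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    return (uint16_t)~fold(sum);
}

/* Incremental update (RFC 1624): given the old checksum and the old and
 * new value of one 16-bit word, return the new checksum without
 * touching the rest of the header. */
static uint16_t csum_replace16(uint16_t check, uint16_t old_word,
                               uint16_t new_word)
{
    uint32_t sum = (uint16_t)~check;

    sum += (uint16_t)~old_word;
    sum += new_word;
    return (uint16_t)~fold(sum);
}
```

For a header whose checksum was computed with `csum_full`, changing one word and applying `csum_replace16` gives the same result as recomputing over the whole header, at constant cost regardless of header length.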
BCC
BPF Compiler Collection
● Toolkit for creating and using eBPF
● Makes eBPF programs easier to write
○ Kernel instrumentation in C
○ Frontends in Python and Lua
● Numerous examples
● Documentation and tutorials
Example #1
Custom Statistics
Histogram of packets by their size
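The slide does not say how the packet sizes are bucketed, so the following is only one plausible sketch: power-of-two slots, where slot i counts lengths from 2^i up to 2^(i+1) - 1. The names `size_slot`, `struct size_hist` and `hist_add`, and the choice of 16 slots, are assumptions made for illustration.

```c
#include <stdint.h>

#define NUM_SLOTS 16

/* Assumed bucketing: slot = floor(log2(len)), with lengths 0 and 1 in
 * slot 0 and anything past the last slot clamped into it. */
static unsigned size_slot(uint32_t len)
{
    unsigned slot = 0;

    while (len > 1 && slot < NUM_SLOTS - 1) {
        len >>= 1;
        slot++;
    }
    return slot;
}

/* In an eBPF program this array would live in a map so that userspace
 * can read it; here it is an ordinary struct. */
struct size_hist {
    uint64_t slots[NUM_SLOTS];
};

static void hist_add(struct size_hist *h, uint32_t len)
{
    h->slots[size_slot(len)]++;
}
```

With this bucketing a 64-byte packet lands in slot 6 and a 1500-byte packet in slot 10, so small control packets and full-size data packets end up clearly separated.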
Example #2
Custom Filtering
Drop egress ARP Requests for specific Target Addresses
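The match logic for this kind of filter can be sketched in user-space C as a pure function over the frame bytes. Assumptions in this sketch: untagged Ethernet frames carrying ARP for IPv4, a small list of blocked target addresses passed in by the caller, and the hypothetical function name `drop_arp_request`. An actual eBPF version would read the same offsets from the packet and keep the address list in a map.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte offsets in an untagged Ethernet frame carrying ARP for IPv4. */
enum {
    OFF_ETHERTYPE = 12,
    OFF_ARP_HTYPE = 14,
    OFF_ARP_PTYPE = 16,
    OFF_ARP_HLEN  = 18,
    OFF_ARP_PLEN  = 19,
    OFF_ARP_OPER  = 20,
    OFF_ARP_TPA   = 38, /* target protocol (IPv4) address */
    ARP_FRAME_MIN = 42,
};

static uint16_t be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Returns true when the frame is an ARP request whose target address
 * is one of the blocked IPv4 addresses. */
static bool drop_arp_request(const uint8_t *frame, size_t len,
                             const uint8_t blocked[][4], size_t n_blocked)
{
    if (len < ARP_FRAME_MIN)
        return false;
    if (be16(frame + OFF_ETHERTYPE) != 0x0806) /* not ARP */
        return false;
    if (be16(frame + OFF_ARP_HTYPE) != 1 ||    /* not Ethernet */
        be16(frame + OFF_ARP_PTYPE) != 0x0800) /* not IPv4 */
        return false;
    if (frame[OFF_ARP_HLEN] != 6 || frame[OFF_ARP_PLEN] != 4)
        return false;
    if (be16(frame + OFF_ARP_OPER) != 1)       /* not a request */
        return false;

    for (size_t i = 0; i < n_blocked; i++)
        if (memcmp(frame + OFF_ARP_TPA, blocked[i], 4) == 0)
            return true;
    return false;
}
```

Validating the hardware and protocol types before reading the target address matters: the offset of the target address is only 38 when the address lengths are 6 and 4.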
Example #3
Custom Network Function
Network Load Balancer
Example #3 - Topology
[Diagram]
● Test Machine: 10.33.33.10, 10.33.33.11, 10.33.33.12, 10.33.33.13, 10.33.33.14
● Load Balancer: 192.0.2.50 dev multigre0
○ Set GRE tunnel destination by flow hash
● Server1: VIP 192.0.2.50, 10.50.1.9
● Server2: VIP 192.0.2.50, 10.50.2.9
● Packet from test machine: Src: 10.33.33.10, Dst: 192.0.2.50
● Encapsulated packet to Server1: outer Src: 10.50.1.1, Dst: 10.50.1.9; inner Src: 10.33.33.10, Dst: 192.0.2.50
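The selection step, "set GRE tunnel destination by flow hash", can be sketched as follows. This is a user-space illustration under assumptions: the flow hash here is FNV-1a over the 5-tuple, chosen only to keep the sketch self-contained (an eBPF program would more likely use a hash supplied by the kernel), and `struct flow`, `flow_hash` and `pick_tunnel_dst` are hypothetical names. The two tunnel destinations are the server addresses from the topology.

```c
#include <stddef.h>
#include <stdint.h>

#define IPV4(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
     ((uint32_t)(c) << 8) | (uint32_t)(d))

struct flow {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
    uint8_t  proto;
};

/* Tunnel endpoints of Server1 and Server2 from the topology. */
static const uint32_t tunnel_dsts[] = {
    IPV4(10, 50, 1, 9),
    IPV4(10, 50, 2, 9),
};
#define NUM_DSTS (sizeof tunnel_dsts / sizeof tunnel_dsts[0])

/* FNV-1a step over the low nbytes of value, least significant first. */
static uint32_t fnv1a_mix(uint32_t h, uint32_t value, unsigned nbytes)
{
    for (unsigned i = 0; i < nbytes; i++) {
        h ^= (value >> (8 * i)) & 0xff;
        h *= 16777619u;
    }
    return h;
}

/* Stand-in flow hash for this sketch. */
static uint32_t flow_hash(const struct flow *f)
{
    uint32_t h = 2166136261u;

    h = fnv1a_mix(h, f->saddr, 4);
    h = fnv1a_mix(h, f->daddr, 4);
    h = fnv1a_mix(h, f->sport, 2);
    h = fnv1a_mix(h, f->dport, 2);
    h = fnv1a_mix(h, f->proto, 1);
    return h;
}

/* Every packet of a flow hashes to the same value, so the whole flow
 * is tunneled to the same server. */
static uint32_t pick_tunnel_dst(const struct flow *f)
{
    return tunnel_dsts[flow_hash(f) % NUM_DSTS];
}
```

Hashing the flow instead of picking a server per packet keeps a TCP connection pinned to one server. The trade-off of a plain modulo is that changing the number of servers remaps most flows.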
Further Topics
● bpfilter
● Open vSwitch eBPF datapath
● XDP
● Hardware Offloads
● Tracing / Profiling
Thank You!
