SlideShare a Scribd company logo
13
Most read
16
Most read
17
Most read
Kernel Networking Walkthrough
LinuxCon 2015, Seattle
Thomas Graf
Kernel & Open vSwitch Team
Noiro Networks (Cisco)
Agenda
● Getting packets from/to the NIC
● NAPI, Busy Polling, RSS, RPS, XPS, GRO, TSO
● Packet processing
● RX Handler, IP Processing, TCP Processing, TCP Fast Open
● Queuing from/to userspace
● Socket Buffers, Flow Control, TCP Small Queues
● Q&A
Touring the Network Stack
Expectation Reality
How does a packet get in and out of
the Network Stack?
Receive & Transmit Process
Ring Buffer
DMA
Parse
L2 & IP
Parse
TCP/UDP
Socket Buffer
Task /
Container
read()
Ring Buffer
Construct
IP
Construct
TCP/UDP
Local?
Socket Buffer
Forward
Route?
write()
NIC Network Stack
(Kernel Space)
Process
(User Space)
The 3 ways into the Network Stack
Ring Buffer
Network
Stack
Interrupt Driven
A
Ring Buffer
Network
Stack
NAPI based Polling poll()
B
Ring Buffer Network
Stack
Busy Polling busy_poll()
Task
C
RSS – Receive Side Scaling
● NIC distributes packets across multiple RX queues
allowing for parallel processing.
● Separate IRQ per RX queue, thus selects CPU to run
hardware interrupt handler on.
RX-queue-1
RX-queue-2
RX-queue-3
RX-queue-4
CPU 1
CPU 2
CPU 1
CPU 2
filter
RPS – Receive Packet Steering
● Software filter to select CPU # for processing
● Use it to ...
RX-queue-1
RX-queue-2
RX-queue-3
RX-queue-4
CPU 1
CPU 2
CPU 3
CPU 1
CPU 2
CPU 3
... redo queue - CPU mapping ... distribute single queue to
multiple CPUs
Hardware Offload
● RX/TX Checksumming
● Perform CPU intensive checksumming in
hardware.
● Virtual LAN filtering and tag stripping
● Strip 802.1Q header and store VLAN ID
in network packet meta data.
● Filter out unsubscribed VLANs.
● Segmentation Offload
Generic Receive Offload
(ethtool -K eth0 gro on)
Ring Buffer
Network
Stack
poll()
NAPI based GRO
MTU
GRO
Up to 64K
It's more effective to process 1x64K bytes packet
instead of 40x1500 bytes packets.
Segmentation Offload
(ethtool -K eth0 tso on)
(ethtool -K eth0 gso on)
Ring Buffer
Network
Stack
Generic Segmentation Offload (GSO)
ethtool -K eth0 gso on
MTU
TCP Segmentation Offload (TSO)
ethtool -K eth0 tso on
MTU
Up to 64K
How does a packet get through the
Network Stack?
(c) Karen Sagovac
Packet Processing
Link Layer
Ingress QoS
Proto Handler
IPv4
IPv6
ARP
IPX
...
Drop
The Feast!
RX Handler
Open vSwitch
Team
Bonding
Bridge
macvlan
macvtap
Packet Socket
ETH_P_ALL
tcpdump
IP Processing
IP
Handler Route Lookup
PREROUTING
IPv4
Construction
Route Lookup
Local Output
OUTPUT
POSTROUTINGLink Layer
FORWARD
Forwarding
L4
(TCP, ...)
Local Delivery
INPUT
User
Space
TCP Processing
IP
Socket Filter
Receive TCP
Parse TCP
Lookup Socket
Backlog
socket locked
Receive Socket Buffer
Prequeue
task exists
process context ← softirq
Task
poll()read()
TCP Fast Open
(net.ipv4.tcp_fastopen)
2nd
Req SYN
SYN+ACK
ACK+HTTP GET
Data
2x RTT
SYN+Cookie+HTTP GET
SYN+ACK+Data
2nd
Req
1x RTT
Client Server
SYN
SYN+ACK
ACK+HTTP GET
1st
Req
Data
2x RTT2x RTT
Regular
Client Server
SYN
SYN+ACK+Cookie
ACK+HTTP GET
1st
Req
Data
2x RTT
Fast Open
Memory Accounting & Flow Control
Socket Buffers & Flow Control
(net.ipv4.tcp_{r|w}mem)
ssh
TX Ring Buffer
TCP/IP
Socket Buffer
wmem
overlimit?
Block or EWOULDBLOCK
wmem += packet-size
ssh
RX Ring Buffer
TCP/IP
Socket Buffer
rmem -= packet-size
rmem
overlimit?
Reduce TCP Window
rmem += packet-size
wmem -= packet-size
write()
TCP Small Queues
(net.ipv4.tcp_limit_output_bytes)
ssh
TX Ring Buffer
Driver
TCP/IP
Socket Buffer
write()
Queuing Discipline
torrent
Socket Buffer
write()
TSQ: max 128Kb in flight per socket
Q&A
Contact:
● E-Mail: tgraf@suug.ch
● Twitter: @tgraf__

More Related Content

What's hot (20)

PDF
Meet cute-between-ebpf-and-tracing
Viller Hsiao
 
ODP
eBPF maps 101
SUSE Labs Taipei
 
PDF
Fun with Network Interfaces
Kernel TLV
 
PDF
Deploying IPv6 on OpenStack
Vietnam Open Infrastructure User Group
 
PDF
Intel DPDK Step by Step instructions
Hisaki Ohara
 
PDF
DockerCon 2017 - Cilium - Network and Application Security with BPF and XDP
Thomas Graf
 
PDF
DevConf 2014 Kernel Networking Walkthrough
Thomas Graf
 
PPSX
FD.io Vector Packet Processing (VPP)
Kirill Tsym
 
PDF
Open vSwitch - Stateful Connection Tracking & Stateful NAT
Thomas Graf
 
PDF
Deploying CloudStack and Ceph with flexible VXLAN and BGP networking
ShapeBlue
 
PPTX
FD.io VPP事始め
tetsusat
 
PDF
Introduction to eBPF
RogerColl2
 
PDF
Kernel Recipes 2019 - XDP closer integration with network stack
Anne Nicolas
 
PDF
netfilter and iptables
Kernel TLV
 
PDF
eBPF - Rethinking the Linux Kernel
Thomas Graf
 
PPTX
Understanding eBPF in a Hurry!
Ray Jenkins
 
PPTX
OVN - Basics and deep dive
Trinath Somanchi
 
PPTX
eBPF Basics
Michael Kehoe
 
PDF
Faster packet processing in Linux: XDP
Daniel T. Lee
 
PDF
Introduction to eBPF and XDP
lcplcp1
 
Meet cute-between-ebpf-and-tracing
Viller Hsiao
 
eBPF maps 101
SUSE Labs Taipei
 
Fun with Network Interfaces
Kernel TLV
 
Deploying IPv6 on OpenStack
Vietnam Open Infrastructure User Group
 
Intel DPDK Step by Step instructions
Hisaki Ohara
 
DockerCon 2017 - Cilium - Network and Application Security with BPF and XDP
Thomas Graf
 
DevConf 2014 Kernel Networking Walkthrough
Thomas Graf
 
FD.io Vector Packet Processing (VPP)
Kirill Tsym
 
Open vSwitch - Stateful Connection Tracking & Stateful NAT
Thomas Graf
 
Deploying CloudStack and Ceph with flexible VXLAN and BGP networking
ShapeBlue
 
FD.io VPP事始め
tetsusat
 
Introduction to eBPF
RogerColl2
 
Kernel Recipes 2019 - XDP closer integration with network stack
Anne Nicolas
 
netfilter and iptables
Kernel TLV
 
eBPF - Rethinking the Linux Kernel
Thomas Graf
 
Understanding eBPF in a Hurry!
Ray Jenkins
 
OVN - Basics and deep dive
Trinath Somanchi
 
eBPF Basics
Michael Kehoe
 
Faster packet processing in Linux: XDP
Daniel T. Lee
 
Introduction to eBPF and XDP
lcplcp1
 

Similar to LinuxCon 2015 Linux Kernel Networking Walkthrough (20)

PDF
CCIE_RS_Quick_Review_Kit
Chris S Chen
 
PDF
CCIE
Lahcene Berkani
 
PDF
Cisco -Ccie rs quick_review_kit
Stoyan Stoyanov
 
PDF
Using Mikrotik Switch Features to Improve Your Network
GLC Networks
 
PPTX
Enterprise network design multi layer network and security.pptx
bipinbhattarai12
 
PDF
5 продвинутых технологий Cisco, которые нужно знать
SkillFactory
 
PDF
Best practices for catalyst 4500 4000, 5500-5000, and 6500-6000 series switch...
abdenour boussioud
 
PPTX
High performace network of Cloud Native Taiwan User Group
HungWei Chiu
 
PPTX
Switching techniques in networking and uses
lochanraj1
 
PPTX
linux networking laboratory presentation .pptx
AnuradhaJadiya1
 
PPT
Ccna3 mod9-vtp
Milton Bengui
 
PPTX
Packet Walk(s) In Kubernetes
Don Jayakody
 
PDF
packet traveling (pre cloud)
iman darabi
 
PDF
Geep networking stack-linuxkernel
Kiran Divekar
 
PDF
Hp a5500
Michel Hidalgo
 
PDF
Huawei_HCNA_Routing_and_Switching.pdf
PauloDiniz60
 
PPTX
12 ethernet-wifi
Olivier Bonaventure
 
PDF
Mikrotik Bridge Deep Dive
GLC Networks
 
PDF
L2/L3 für Fortgeschrittene - Helle und dunkle Magie im Linux-Netzwerkstack
Maximilan Wilhelm
 
PDF
introduction to linux kernel tcp/ip ptocotol stack
monad bobo
 
CCIE_RS_Quick_Review_Kit
Chris S Chen
 
Cisco -Ccie rs quick_review_kit
Stoyan Stoyanov
 
Using Mikrotik Switch Features to Improve Your Network
GLC Networks
 
Enterprise network design multi layer network and security.pptx
bipinbhattarai12
 
5 продвинутых технологий Cisco, которые нужно знать
SkillFactory
 
Best practices for catalyst 4500 4000, 5500-5000, and 6500-6000 series switch...
abdenour boussioud
 
High performace network of Cloud Native Taiwan User Group
HungWei Chiu
 
Switching techniques in networking and uses
lochanraj1
 
linux networking laboratory presentation .pptx
AnuradhaJadiya1
 
Ccna3 mod9-vtp
Milton Bengui
 
Packet Walk(s) In Kubernetes
Don Jayakody
 
packet traveling (pre cloud)
iman darabi
 
Geep networking stack-linuxkernel
Kiran Divekar
 
Hp a5500
Michel Hidalgo
 
Huawei_HCNA_Routing_and_Switching.pdf
PauloDiniz60
 
12 ethernet-wifi
Olivier Bonaventure
 
Mikrotik Bridge Deep Dive
GLC Networks
 
L2/L3 für Fortgeschrittene - Helle und dunkle Magie im Linux-Netzwerkstack
Maximilan Wilhelm
 
introduction to linux kernel tcp/ip ptocotol stack
monad bobo
 
Ad

More from Thomas Graf (15)

PDF
BPF & Cilium - Turning Linux into a Microservices-aware Operating System
Thomas Graf
 
PDF
Cilium - Bringing the BPF Revolution to Kubernetes Networking and Security
Thomas Graf
 
PDF
Accelerating Envoy and Istio with Cilium and the Linux Kernel
Thomas Graf
 
PDF
Cilium - API-aware Networking and Security for Containers based on BPF
Thomas Graf
 
PDF
Cilium - Network security for microservices
Thomas Graf
 
PDF
Linux Native, HTTP Aware Network Security
Thomas Graf
 
PDF
BPF: Next Generation of Programmable Datapath
Thomas Graf
 
PDF
Cilium - Container Networking with BPF & XDP
Thomas Graf
 
PDF
Cilium - BPF & XDP for containers
Thomas Graf
 
PDF
Cilium - Fast IPv6 Container Networking with BPF and XDP
Thomas Graf
 
PDF
LinuxCon 2015 Stateful NAT with OVS
Thomas Graf
 
PDF
Taking Security Groups to Ludicrous Speed with OVS (OpenStack Summit 2015)
Thomas Graf
 
PDF
2015 FOSDEM - OVS Stateful Services
Thomas Graf
 
PDF
The Next Generation Firewall for Red Hat Enterprise Linux 7 RC
Thomas Graf
 
PDF
SDN & NFV Introduction - Open Source Data Center Networking
Thomas Graf
 
BPF & Cilium - Turning Linux into a Microservices-aware Operating System
Thomas Graf
 
Cilium - Bringing the BPF Revolution to Kubernetes Networking and Security
Thomas Graf
 
Accelerating Envoy and Istio with Cilium and the Linux Kernel
Thomas Graf
 
Cilium - API-aware Networking and Security for Containers based on BPF
Thomas Graf
 
Cilium - Network security for microservices
Thomas Graf
 
Linux Native, HTTP Aware Network Security
Thomas Graf
 
BPF: Next Generation of Programmable Datapath
Thomas Graf
 
Cilium - Container Networking with BPF & XDP
Thomas Graf
 
Cilium - BPF & XDP for containers
Thomas Graf
 
Cilium - Fast IPv6 Container Networking with BPF and XDP
Thomas Graf
 
LinuxCon 2015 Stateful NAT with OVS
Thomas Graf
 
Taking Security Groups to Ludicrous Speed with OVS (OpenStack Summit 2015)
Thomas Graf
 
2015 FOSDEM - OVS Stateful Services
Thomas Graf
 
The Next Generation Firewall for Red Hat Enterprise Linux 7 RC
Thomas Graf
 
SDN & NFV Introduction - Open Source Data Center Networking
Thomas Graf
 
Ad

Recently uploaded (20)

PDF
POV_ Why Enterprises Need to Find Value in ZERO.pdf
darshakparmar
 
PPTX
Agentforce World Tour Toronto '25 - Supercharge MuleSoft Development with Mod...
Alexandra N. Martinez
 
PDF
Bitcoin for Millennials podcast with Bram, Power Laws of Bitcoin
Stephen Perrenod
 
PDF
What’s my job again? Slides from Mark Simos talk at 2025 Tampa BSides
Mark Simos
 
PDF
“Voice Interfaces on a Budget: Building Real-time Speech Recognition on Low-c...
Edge AI and Vision Alliance
 
PDF
Agentic AI lifecycle for Enterprise Hyper-Automation
Debmalya Biswas
 
PPTX
The Project Compass - GDG on Campus MSIT
dscmsitkol
 
PDF
Kit-Works Team Study_20250627_한달만에만든사내서비스키링(양다윗).pdf
Wonjun Hwang
 
PDF
🚀 Let’s Build Our First Slack Workflow! 🔧.pdf
SanjeetMishra29
 
PPTX
Future Tech Innovations 2025 – A TechLists Insight
TechLists
 
PPTX
Mastering ODC + Okta Configuration - Chennai OSUG
HathiMaryA
 
PDF
How do you fast track Agentic automation use cases discovery?
DianaGray10
 
DOCX
Cryptography Quiz: test your knowledge of this important security concept.
Rajni Bhardwaj Grover
 
PDF
LOOPS in C Programming Language - Technology
RishabhDwivedi43
 
PDF
CIFDAQ Market Wrap for the week of 4th July 2025
CIFDAQ
 
PDF
Go Concurrency Real-World Patterns, Pitfalls, and Playground Battles.pdf
Emily Achieng
 
PDF
Automating Feature Enrichment and Station Creation in Natural Gas Utility Net...
Safe Software
 
PDF
Peak of Data & AI Encore AI-Enhanced Workflows for the Real World
Safe Software
 
PPT
Ericsson LTE presentation SEMINAR 2010.ppt
npat3
 
PDF
UiPath DevConnect 2025: Agentic Automation Community User Group Meeting
DianaGray10
 
POV_ Why Enterprises Need to Find Value in ZERO.pdf
darshakparmar
 
Agentforce World Tour Toronto '25 - Supercharge MuleSoft Development with Mod...
Alexandra N. Martinez
 
Bitcoin for Millennials podcast with Bram, Power Laws of Bitcoin
Stephen Perrenod
 
What’s my job again? Slides from Mark Simos talk at 2025 Tampa BSides
Mark Simos
 
“Voice Interfaces on a Budget: Building Real-time Speech Recognition on Low-c...
Edge AI and Vision Alliance
 
Agentic AI lifecycle for Enterprise Hyper-Automation
Debmalya Biswas
 
The Project Compass - GDG on Campus MSIT
dscmsitkol
 
Kit-Works Team Study_20250627_한달만에만든사내서비스키링(양다윗).pdf
Wonjun Hwang
 
🚀 Let’s Build Our First Slack Workflow! 🔧.pdf
SanjeetMishra29
 
Future Tech Innovations 2025 – A TechLists Insight
TechLists
 
Mastering ODC + Okta Configuration - Chennai OSUG
HathiMaryA
 
How do you fast track Agentic automation use cases discovery?
DianaGray10
 
Cryptography Quiz: test your knowledge of this important security concept.
Rajni Bhardwaj Grover
 
LOOPS in C Programming Language - Technology
RishabhDwivedi43
 
CIFDAQ Market Wrap for the week of 4th July 2025
CIFDAQ
 
Go Concurrency Real-World Patterns, Pitfalls, and Playground Battles.pdf
Emily Achieng
 
Automating Feature Enrichment and Station Creation in Natural Gas Utility Net...
Safe Software
 
Peak of Data & AI Encore AI-Enhanced Workflows for the Real World
Safe Software
 
Ericsson LTE presentation SEMINAR 2010.ppt
npat3
 
UiPath DevConnect 2025: Agentic Automation Community User Group Meeting
DianaGray10
 

LinuxCon 2015 Linux Kernel Networking Walkthrough

  • 1. Kernel Networking Walkthrough LinuxCon 2015, Seattle Thomas Graf Kernel & Open vSwitch Team Noiro Networks (Cisco)
  • 2. Agenda ● Getting packets from/to the NIC ● NAPI, Busy Polling, RSS, RPS, XPS, GRO, TSO ● Packet processing ● RX Handler, IP Processing, TCP Processing, TCP Fast Open ● Queuing from/to userspace ● Socket Buffers, Flow Control, TCP Small Queues ● Q&A
  • 3. Touring the Network Stack Expectation Reality
  • 4. How does a packet get in and out of the Network Stack?
  • 5. Receive & Transmit Process Ring Buffer DMA Parse L2 & IP Parse TCP/UDP Socket Buffer Task / Container read() Ring Buffer Construct IP Construct TCP/UDP Local? Socket Buffer Forward Route? write() NIC Network Stack (Kernel Space) Process (User Space)
  • 6. The 3 ways into the Network Stack Ring Buffer Network Stack Interrupt Driven A Ring Buffer Network Stack NAPI based Polling poll() B Ring Buffer Network Stack Busy Polling busy_poll() Task C
  • 7. RSS – Receive Side Scaling ● NIC distributes packets across multiple RX queues allowing for parallel processing. ● Separate IRQ per RX queue, thus selects CPU to run hardware interrupt handler on. RX-queue-1 RX-queue-2 RX-queue-3 RX-queue-4 CPU 1 CPU 2 CPU 1 CPU 2 filter
  • 8. RPS – Receive Packet Steering ● Software filter to select CPU # for processing ● Use it to ... RX-queue-1 RX-queue-2 RX-queue-3 RX-queue-4 CPU 1 CPU 2 CPU 3 CPU 1 CPU 2 CPU 3 ... redo queue - CPU mapping ... distribute single queue to multiple CPUs
  • 9. Hardware Offload ● RX/TX Checksumming ● Perform CPU intensive checksumming in hardware. ● Virtual LAN filtering and tag stripping ● Strip 802.1Q header and store VLAN ID in network packet meta data. ● Filter out unsubscribed VLANs. ● Segmentation Offload
  • 10. Generic Receive Offload (ethtool -K eth0 gro on) Ring Buffer Network Stack poll() NAPI based GRO MTU GRO Up to 64K It's more effective to process 1x64K bytes packet instead of 40x1500 bytes packets.
  • 11. Segmentation Offload (ethtool -K eth0 tso on) (ethtool -K eth0 gso on) Ring Buffer Network Stack Generic Segmentation Offload (GSO) ethtool -K eth0 gso on MTU TCP Segmentation Offload (TSO) ethtool -K eth0 tso on MTU Up to 64K
  • 12. How does a packet get through the Network Stack? (c) Karen Sagovac
  • 13. Packet Processing Link Layer Ingress QoS Proto Handler IPv4 IPv6 ARP IPX ... Drop The Feast! RX Handler Open vSwitch Team Bonding Bridge macvlan macvtap Packet Socket ETH_P_ALL tcpdump
  • 14. IP Processing IP Handler Route Lookup PREROUTING IPv4 Construction Route Lookup Local Output OUTPUT POSTROUTINGLink Layer FORWARD Forwarding L4 (TCP, ...) Local Delivery INPUT User Space
  • 15. TCP Processing IP Socket Filter Receive TCP Parse TCP Lookup Socket Backlog socket locked Receive Socket Buffer Prequeue task exists process context ← softirq Task poll()read()
  • 16. TCP Fast Open (net.ipv4.tcp_fastopen) 2nd Req SYN SYN+ACK ACK+HTTP GET Data 2x RTT SYN+Cookie+HTTP GET SYN+ACK+Data 2nd Req 1x RTT Client Server SYN SYN+ACK ACK+HTTP GET 1st Req Data 2x RTT2x RTT Regular Client Server SYN SYN+ACK+Cookie ACK+HTTP GET 1st Req Data 2x RTT Fast Open
  • 17. Memory Accounting & Flow Control
  • 18. Socket Buffers & Flow Control (net.ipv4.tcp_{r|w}mem) ssh TX Ring Buffer TCP/IP Socket Buffer wmem overlimit? Block or EWOULDBLOCK wmem += packet-size ssh RX Ring Buffer TCP/IP Socket Buffer rmem -= packet-size rmem overlimit? Reduce TCP Window rmem += packet-size wmem -= packet-size write()
  • 19. TCP Small Queues (net.ipv4.tcp_limit_output_bytes) ssh TX Ring Buffer Driver TCP/IP Socket Buffer write() Queuing Discipline torrent Socket Buffer write() TSQ: max 128Kb in flight per socket