DPDK
Data Plane Development Kit
Ariel Waizel ariel.waizel@gmail.com
2Confidential
Who Am I?
• Programmer for 12 years
– Mostly C/C++ for cyber and network applications.
– Some higher-level languages for machine learning and generic DevOps (Python and Java).
• M.Sc. in Information System Engineering (a.k.a. machine learning).
• Currently I’m a solution architect at Contextream (HPE).
– The company specializes in software-defined networks for carrier-grade companies (like Bezek, Partner, etc.).
– Appeal: Everything is open-source!
– I work with customers on end-to-end solutions and create POCs.
• Gaming for fun.
Let’s Talk About DPDK
• What is DPDK?
• Why use it?
• How good is it?
• How does it work?
The Challenge
• Network application, done in software, supporting native network speeds.
• Network speeds:
– 10 Gb per NIC (Network Interface Card).
– 40 Gb per NIC.
– 100 Gb?
• Network applications:
– Network device implemented in software (router, switch, HTTP proxy, etc.).
– Network services for virtual machines (OpenStack, etc.).
– Security: protect your applications from DDoS.
What is DPDK?
• Set of UIO (User IO) drivers and user space libraries.
• Goal: forward network packets to/from the NIC (Network Interface Card) from/to the user application at native speed:
– 10 or 40 Gb NICs.
– Speed is the most (only?) important criterion.
– Only forwards the packets – not a network stack (but there are helpful libraries and examples to use).
• All traffic bypasses the kernel (We’ll get to why).
– When a NIC is controlled by a DPDK driver, it’s invisible to the kernel.
• Open source (BSD-3 for most, GPL for Linux Kernel related parts).
Why Should DPDK Interest Kernel Devs?
• Bypassing the kernel matters because of performance, which is intriguing by itself.
– At the very least, know the “competition”.
• DPDK is a very lightweight, low-level, performance-driven framework, which makes it a good learning ground for kernel developers to learn performance guidelines.
Why bypassing the kernel is a necessity.
Why Use DPDK?
10Gb – Crunching Numbers
• Minimum Ethernet packet: 64 bytes + 20 bytes of overhead on the wire (preamble and inter-frame gap).
• Maximum number of pps: 14,880,952
– 10^10 bps / (84 bytes * 8)
• Maximum time available to process a single packet: 67.2 ns
• CPU cycle budget per packet: about 201 cycles on a 3 GHz CPU (1 GHz -> 1 cycle per ns).
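The arithmetic on this slide can be reproduced in a few lines of C. This is only a sketch: the function names are made up for illustration, and the 20 bytes of per-frame overhead and the 3 GHz clock are simply the slide's own numbers.

```c
#include <assert.h>
#include <stdio.h>

/* Per-frame overhead on the wire, in bytes (the slide's figure). */
#define WIRE_OVERHEAD_BYTES 20

/* Packets per second a link can carry for a given frame size. */
static double max_pps(double link_bps, unsigned frame_bytes)
{
    assert(link_bps > 0 && frame_bytes > 0);
    return link_bps / ((frame_bytes + WIRE_OVERHEAD_BYTES) * 8.0);
}

/* Time budget per packet, in nanoseconds. */
static double ns_per_packet(double pps)
{
    return 1e9 / pps;
}

/* Cycle budget per packet on a CPU clocked at cpu_ghz (1 GHz = 1 cycle per ns). */
static double cycles_per_packet(double pps, double cpu_ghz)
{
    return ns_per_packet(pps) * cpu_ghz;
}

/* Hypothetical helper: print the budget for 64-byte frames on a 10 Gb link. */
static void print_budget_10g(void)
{
    double pps = max_pps(1e10, 64);
    printf("%.0f pps, %.1f ns/packet, %.1f cycles at 3 GHz\n",
           pps, ns_per_packet(pps), cycles_per_packet(pps, 3.0));
}
```

Calling max_pps(4e10, 64) gives the corresponding budget for a 40 Gb NIC, which is four times tighter.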
Time (ish) per Operation (July 2014)
Operation: Time (expected)
• Cache miss: 32 ns
• L2 cache access: 4.3 ns
• L3 cache access: 7.9 ns
• “LOCK” operation (like atomic): 8.25 ns (16.5 ns including unlock)
• Syscall: 41.85 ns (75.34 ns with audit-syscall)
• SLUB allocator (Linux kernel buffer allocator): 80 ns for alloc + free
• SKB (Linux kernel packet struct) memory manager: 77 ns
• Qdisc (TX queueing discipline): 48 ns minimum, 58 to 68 ns on average
• TLB miss: up to several cache misses
* Red Hat, “Challenge 10Gbit/s”, Jesper Dangaard Brouer et al., Netfilter Workshop, July 2014.
Conclusion: Must use batching (send/receive several packets together to amortize costs).
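To see why batching helps, here is a small sketch of the amortization. It assumes a fixed overhead (for example the 41.85 ns syscall cost above) is paid once per burst no matter how many packets the burst carries. That is a simplification, and the function names are hypothetical.

```c
#include <assert.h>

/* Per-packet budget for 64-byte frames at 10 Gb, in nanoseconds. */
#define BUDGET_NS 67.2

/* Per-packet cost when a fixed overhead is paid once per burst. */
static double amortized_ns(double fixed_ns, double per_packet_ns, unsigned burst)
{
    assert(burst > 0);
    return fixed_ns / burst + per_packet_ns;
}

/* Fraction of the per-packet budget that a given cost consumes. */
static double budget_fraction(double cost_ns)
{
    return cost_ns / BUDGET_NS;
}
```

With one packet per syscall, the syscall alone eats about 62% of the budget. With a burst of 32 it drops to roughly 1.3 ns per packet, about 2% of the budget.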
Time Per Operation – The Big No-Nos
Operation: Average time
• Context switch: microseconds (1 µs = 1000 ns)
• Page fault: microseconds
Conclusion: Must pin CPUs and pre-allocate resources.
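The "pre-allocate" half of that conclusion can be sketched as a fixed buffer pool that is allocated and touched once at startup, so the fast path never calls malloc or takes a page fault. This is an illustration only, not DPDK code: the struct and function names are hypothetical, the 4096-byte page size is an assumption, and DPDK itself uses huge pages and rte_mempool rather than plain malloc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define ASSUMED_PAGE_SIZE 4096u   /* assumption: regular 4 KB pages */

/* Hypothetical fixed-size buffer pool, fully set up before traffic starts. */
struct buf_pool {
    unsigned char *mem;   /* one contiguous slab holding all buffers */
    void **free_list;     /* LIFO stack of free buffers */
    size_t n_free;
    size_t n_bufs;
};

static int pool_init(struct buf_pool *p, size_t n_bufs, size_t buf_size)
{
    assert(n_bufs > 0 && buf_size > 0);
    size_t total = n_bufs * buf_size;
    p->mem = malloc(total);
    p->free_list = malloc(n_bufs * sizeof(*p->free_list));
    if (p->mem == NULL || p->free_list == NULL) {
        free(p->mem);
        free(p->free_list);
        return -1;
    }
    /* Touch one byte per page now, so page faults happen here and not per packet. */
    volatile unsigned char *touch = p->mem;
    for (size_t off = 0; off < total; off += ASSUMED_PAGE_SIZE)
        touch[off] = 0;
    for (size_t i = 0; i < n_bufs; i++)
        p->free_list[i] = p->mem + i * buf_size;
    p->n_free = p->n_bufs = n_bufs;
    return 0;
}

/* Fast path: a pointer pop, no allocator call. Returns NULL when the pool is empty. */
static void *pool_get(struct buf_pool *p)
{
    return p->n_free ? p->free_list[--p->n_free] : NULL;
}

/* Fast path: a pointer push. */
static void pool_put(struct buf_pool *p, void *buf)
{
    assert(p->n_free < p->n_bufs);
    p->free_list[p->n_free++] = buf;
}

static void pool_destroy(struct buf_pool *p)
{
    free(p->mem);
    free(p->free_list);
}
```

The pool is single-threaded by design: one pool per pinned thread avoids locks entirely, which is the same idea as the per-core caches mentioned later for rte_mempool.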
Benchmarks at the end.
As of today: DPDK can handle 11X the traffic the Linux kernel can!
How It Works.
DPDK Architecture
DPDK – What do you get?
• UIO drivers
• PMD per hardware NIC:
– PMD (Poll Mode Driver) support for RX/TX (Receive and Transmit).
– Mapping PCI memory and registers.
– Mapping user memory (for example: packet memory buffers) into the NIC.
– Configuration of specific HW accelerations in the NIC.
• User space libraries:
– Initialize PMDs
– Threading (builds on pthread).
– CPU Management
– Memory Management (Huge pages only!).
– Hashing, Scheduler, pipeline (packet steering), etc. – High performance support libraries for the application.
From NIC to Process – Pipeline Model
[Figure: NIC, RX queues, PMDs, rings, DPDK apps, TX queues, igb_uio]
* igb_uio is the standard DPDK kernel UIO driver for the device control plane.
From NIC to Process – Run To Completion Model
[Figure: NIC, RX queues, TX queues, DPDK app + PMD, igb_uio]
Software Configuration
• C Code
– GCC 4.5.X or later.
• Required:
– Kernel version >= 2.6.34
– hugetlbfs (for best performance use 1G pages, which require GRUB configuration).
• Recommended:
– isolcpus in GRUB configuration: isolates CPUs from the scheduler.
• Compilation
– DPDK applications are statically compiled/linked with the DPDK libraries, for best performance.
– Export RTE_SDK and RTE_TARGET to develop the application from an arbitrary directory.
• Setup:
– Best to use tools/dpdk-setup.sh to set up/build the environment.
– Use tools/dpdk-devbind.py
o --status lets you see the available NICs and their corresponding drivers.
o --bind lets you bind a NIC to a driver, igb_uio for example.
– Run the application with the appropriate arguments.
Software Architecture
• PMDs are just user space pthreads that call specific EAL functions.
– These EAL functions have concrete implementations per NIC, and this costs a couple of indirections.
– Access to RX/TX descriptors is direct.
– Uses the UIO driver for specific control changes (like configuring interrupts).
• Most DPDK libraries are not thread-safe.
– PMDs are non-preemptive: you can’t have 2 PMDs handle the same HW queue on the same NIC.
• All inter-thread communication is based on librte_ring.
– A multi-producer/multi-consumer (MP-MC), lockless, non-resizable ring queue implementation.
– Optimized for DPDK purposes.
• All resources like memory (malloc), threads, descriptor queues, etc. are initialized at the start.
[Figure: Software Architecture]
[Figure: Application Bring up]
Code Example – Basic Forwarding
• Main Function:
– DPDK init.
– Get all available NICs (bound to igb_uio).
– Initialize packet buffers.
– Initialize NICs.
Code Example – port_init
• NIC init: set the number of queues.
• RX queue init: set the packet buffer pool for the queue.
– Uses librte_ring to be thread safe.
• TX queue init: no need for a buffer pool.
• Start getting packets.
Code Example – lcore_main
PMD
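The code shown on this slide is not preserved in the text, so here is a stand-in sketch of a poll-mode forwarding loop. It does not use DPDK: fake_rx_burst and fake_tx_burst are hypothetical stubs standing in for DPDK's rte_eth_rx_burst and rte_eth_tx_burst, and struct pkt stands in for a packet buffer. What it keeps is the shape of the loop: poll for a burst, transmit it, and deal with whatever the TX queue did not accept.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BURST_SIZE 32

struct pkt { uint32_t id; };          /* stand-in for a packet buffer */

/* Hypothetical port model: an RX backlog and a limited amount of TX room. */
struct fake_port {
    struct pkt *rx_backlog;
    uint16_t rx_count;                /* packets in the backlog */
    uint16_t rx_pos;                  /* next packet to hand out */
    uint16_t tx_room;                 /* packets TX will still accept */
    uint32_t tx_sent;
};

/* Never blocks: returns 0..max packets, like a poll-mode RX call. */
static uint16_t fake_rx_burst(struct fake_port *port, struct pkt **pkts, uint16_t max)
{
    uint16_t n = 0;
    while (n < max && port->rx_pos < port->rx_count)
        pkts[n++] = &port->rx_backlog[port->rx_pos++];
    return n;
}

/* May accept fewer packets than offered; returns how many were taken. */
static uint16_t fake_tx_burst(struct fake_port *port, struct pkt **pkts, uint16_t n)
{
    uint16_t sent = n < port->tx_room ? n : port->tx_room;
    (void)pkts;                       /* the stub only counts packets */
    port->tx_room -= sent;
    port->tx_sent += sent;
    return sent;
}

/* One iteration of the forwarding loop; returns the number of packets received. */
static uint16_t forward_once(struct fake_port *in, struct fake_port *out,
                             uint32_t *dropped)
{
    struct pkt *bufs[BURST_SIZE];
    uint16_t nb_rx = fake_rx_burst(in, bufs, BURST_SIZE);
    if (nb_rx == 0)
        return 0;                     /* nothing arrived, poll again */
    uint16_t nb_tx = fake_tx_burst(out, bufs, nb_rx);
    /* Unsent packets stay owned by the caller; a real app would free them. */
    *dropped += (uint32_t)(nb_rx - nb_tx);
    return nb_rx;
}
```

A real lcore function would call forward_once in an endless loop on a pinned core, never sleeping, which is what makes the driver "poll mode".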
Igb_uio
• For Intel Gigabit NICs.
– Simple enough to work for most NICs, nonetheless.
• Basically:
– Calls pci_enable_device.
– Enables bus mastering on the device (pci_set_master).
– Requests all BARs and maps them using ioremap.
– Sets up I/O ports.
– Sets the DMA mask to 64-bit.
• Code to support SR-IOV and Xen.
rte_ring
• Fixed-size, “lockless” ring queue.
• Non-preemptive.
• Supports multiple/single producer/consumer, and bulk actions.
• Uses:
– Single array of pointers.
– Head/tail pointers for both producer and consumer (total 4 pointers).
• To enqueue (dequeue works the same way):
– Until successful:
o Save the current head_ptr in a local variable.
o head_next = head_ptr + num_objects
o CAS the head_ptr to head_next
– Insert objects.
– Until successful:
o Update tail_ptr = head_next once tail_ptr == the saved head_ptr
• Analysis:
– Lightweight.
– In theory, both loops are costly.
– In practice, as all threads are CPU bound, the amortized cost is low for the first loop, and waiting is very unlikely in the second loop.
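The enqueue/dequeue steps above can be sketched with C11 atomics. This is not DPDK's rte_ring code: the names are hypothetical, the ring size is fixed at compile time, indices run freely and are masked on access, and the default sequentially consistent atomics are used for simplicity rather than tuned memory barriers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 8u                      /* must be a power of two */
#define RING_MASK (RING_SIZE - 1u)

struct ring {
    _Atomic uint32_t prod_head, prod_tail;
    _Atomic uint32_t cons_head, cons_tail;
    void *slots[RING_SIZE];               /* single array of pointers */
};

static void ring_init(struct ring *r)
{
    atomic_init(&r->prod_head, 0);
    atomic_init(&r->prod_tail, 0);
    atomic_init(&r->cons_head, 0);
    atomic_init(&r->cons_tail, 0);
}

/* Enqueue exactly n objects, or nothing. Returns n on success, 0 if no room. */
static unsigned ring_enqueue_bulk(struct ring *r, void *const *objs, unsigned n)
{
    uint32_t head, next;
    do {                                  /* first loop: reserve slots */
        head = atomic_load(&r->prod_head);
        uint32_t free_slots = RING_SIZE - (head - atomic_load(&r->cons_tail));
        if (n > free_slots)
            return 0;
        next = head + n;
    } while (!atomic_compare_exchange_weak(&r->prod_head, &head, next));

    for (unsigned i = 0; i < n; i++)      /* insert objects */
        r->slots[(head + i) & RING_MASK] = objs[i];

    /* Second loop: wait for producers that reserved earlier slots to publish. */
    while (atomic_load(&r->prod_tail) != head)
        ;
    atomic_store(&r->prod_tail, next);
    return n;
}

/* Dequeue exactly n objects, or nothing. Mirrors the enqueue path. */
static unsigned ring_dequeue_bulk(struct ring *r, void **objs, unsigned n)
{
    uint32_t head, next;
    do {
        head = atomic_load(&r->cons_head);
        uint32_t avail = atomic_load(&r->prod_tail) - head;
        if (n > avail)
            return 0;
        next = head + n;
    } while (!atomic_compare_exchange_weak(&r->cons_head, &head, next));

    for (unsigned i = 0; i < n; i++)
        objs[i] = r->slots[(head + i) & RING_MASK];

    while (atomic_load(&r->cons_tail) != head)
        ;
    atomic_store(&r->cons_tail, next);
    return n;
}
```

The second loop is why the ring is "non-preemptive": if a thread is descheduled between reserving slots and publishing the tail, every later producer spins until it runs again.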
rte_mempool
• Smart memory allocation
• Allocate the start of each memory buffer at a different memory channel/rank:
– Most applications only want to see the first 64 bytes of the packet (Ethernet + IP headers).
– Requests for memory at different channels, same rank, are done concurrently by the memory controller.
– Requests for memory at different ranks can be managed effectively by the memory controller.
– Pads objects until gcd(obj_size,num_ranks*num_channels) == 1.
• Maintain a per-core cache, bulk requests to the mempool ring.
• Allocate memory based on:
– NUMA
– Contiguous virtual memory (Means also contiguous physical memory for huge pages).
• Non-preemptive.
– Same lcore must not context switch to another task using the same mempool.
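The padding rule above can be sketched as follows. The assumption here is that the object size is measured in 64-byte cache-line units before taking the gcd, so that consecutive objects start on different channels/ranks. The function names and the 64-byte unit are illustrative, not taken from the slides.

```c
#include <assert.h>

#define CACHE_LINE 64u   /* assumed alignment unit, in bytes */

static unsigned gcd(unsigned a, unsigned b)
{
    while (b != 0) {
        unsigned t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/*
 * Pad obj_size (bytes) until its size in cache lines is coprime with
 * n_channels * n_ranks. Returns the padded size in bytes.
 */
static unsigned padded_obj_size(unsigned obj_size, unsigned n_channels,
                                unsigned n_ranks)
{
    assert(n_channels > 0 && n_ranks > 0);
    unsigned lines = (obj_size + CACHE_LINE - 1) / CACHE_LINE;
    if (lines == 0)
        lines = 1;       /* an object occupies at least one cache line */
    while (gcd(lines, n_channels * n_ranks) != 1)
        lines++;
    return lines * CACHE_LINE;
}
```

For example, a 2048-byte object is 32 cache lines. With 4 channels and 2 ranks, gcd(32, 8) is 8, so every object would start on the same channel and rank. Padding to 33 lines (2112 bytes) makes successive objects rotate through all of them.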
Linux kernel Vs. DPDK
Benchmarks
Linux Kernel Benchmarks, single core
Benchmark: July 2014 / Feb. 2016
• TX: 4 Mpps / 14.8 Mpps
• RX (dump at driver): 6.4 Mpps / 12 Mpps (experimental)
• L3 forwarding (RX + filter + TX): 1 Mpps / 2 Mpps
• L3 forwarding (multi core): 6 Mpps / 12 Mpps
* Red Hat, “The 100Gbit/s Challenge”, Jesper Dangaard Brouer et al., DevConf, Feb. 2016.
DPDK Benchmarks (March 2016)
Benchmark: Single core / Multi core
• L3 forwarding (PHY-PHY): 22 Mpps / linear increase
• Switch forwarding (PHY-OVS-PHY): 11 Mpps / linear increase
• VM forwarding (PHY-VM-PHY): 3.4 Mpps / near-linear increase
• VM to VM: 2 Mpps / linear increase
* Intel Open Network Platform Release 2.1 Performance Test Report, March 2016.
• All tests with:
– 4 x 40 Gb ports
– E5-2695 v4 2.1 GHz processor
– 16 x 1 GB huge pages, 2048 x 2 MB huge pages
DPDK Benchmarks Figures
[Figures: PHY-PHY, PHY-OVS-PHY, VM-VM]
DPDK Pros and Cons
DPDK Advantages
• Best forwarding performance solution to/from PHY/process/VM to date.
– Best single core performance.
– Scales: linear performance increase per core.
• Active and longstanding community (from 2012).
– DPDK.org
– Full of tutorials, examples and complementary features.
• Active popular products
– OVS-DPDK is the main solution for high-speed networking in OpenStack.
– 6WIND.
– TRex.
• Great virtualization support.
– Deploy at the host or in the guest environment.
DPDK Disadvantages
• Security
• Isolated ecosystem:
– Hard to use Linux kernel infrastructure (though there are precedents).
• Requires modified code in applications to use:
– DPDK processes use a very specific API.
– A DPDK application can’t interact transparently with Linux processes (important for transparent networking applications like firewalls, DDoS mitigation, etc.).
o Solved for interaction with VMs by the vhost library.
• Requires Huge pages (XDP doesn’t).

Introduction to DPDK
