DPDK Summit - San Jose – 2017
BMAcc: Accelerating P4-Based Data Plane with DPDK
#DPDKSummit
Peilong Li *, Xiaoban Wu *, Yan Luo *, Liang-Min Wang +, Marc Pepin +, Atul Kwatra +, and John Morgan +
* University of Massachusetts Lowell
+ Intel Corporation
Agenda
• Background of P4 and BMv2
• Problems and design motivations
• Overview of BMAcc & performance optimizations
• Performance evaluation
• Future directions
• Conclusion
P4 Language and P4 Behavior Model v2
• Programming Protocol-Independent Packet Processors (P4)
  • Simple semantics, customizable headers & data plane functions
  • P4 program → P4 Frontend Compiler → Python IR → Backend Compiler → Target (software switch, NPU, etc.)
• P4 Behavior Model version 2 (BMv2)
  • BMv1 is deprecated: it requires re-compilation for every P4 program.
  • BMv2 is a static executable: the software switch consists of building blocks (parser, deparser, match-action tables, etc.)
  • Configured by JSON: P4 program → p4c-bm → JSON config → BMv2 (static)
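The idea of a fixed executable whose building blocks are wired up from a JSON file at start-up can be sketched as follows. This is a toy illustration only: the config keys (`parser`, `tables`, `deparser`), the field name `ipv4.dstAddr`, and the `build_switch` helper are hypothetical and far simpler than the real BMv2 JSON format.

```python
import json

# Hypothetical, heavily simplified config; NOT the real BMv2 JSON schema.
CONFIG_TEXT = """
{
  "parser":   {"headers": ["ethernet", "ipv4"]},
  "tables":   [{"name": "ipv4_lpm", "match": "lpm", "key": "ipv4.dstAddr"}],
  "deparser": {"order": ["ethernet", "ipv4"]}
}
"""

def build_switch(config_text):
    """Assemble a list of (block kind, detail) pairs from a JSON description."""
    cfg = json.loads(config_text)
    blocks = [("parser", cfg["parser"]["headers"])]
    for table in cfg["tables"]:
        blocks.append(("table", table["name"]))
    blocks.append(("deparser", cfg["deparser"]["order"]))
    return blocks

switch_blocks = build_switch(CONFIG_TEXT)
for kind, detail in switch_blocks:
    print(kind, detail)
```

Changing the P4 program then only changes the JSON text handed to `build_switch`; the code that interprets it stays the same, which is the property the slide attributes to BMv2.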
Problems of BMv2
• A great software switch to verify the “behavior” of a P4 program
• Poor performance as a software switch
  • 99.993% packet drop rate with 64-byte packets on a 10 Gbps link
  • Uses libpcap and the Linux NIC driver; single-threaded, single RX queue (no RSS), unnecessary memory copies, etc.
[Figure: BMv2 threading model, with four stages running on separate threads (labeled Thread 0 to Thread 3). Initialization: init_from_command_line_options, add port. Receiving: poll and receive packets, push packets into in_buffer. Processing: take packets from in_buffer, run the P4 pipeline, push packets to out_buffer. Transmitting: take packets from out_buffer and transmit.]
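To put the 99.993% drop rate in absolute terms, the short calculation below estimates how many 64-byte packets per second actually get through. It is a sketch under two assumptions that the slides do not state: the sender saturates the 10 Gbps link, and the 64-byte size is the full Ethernet frame, so each packet occupies 64 + 20 bytes on the wire.

```python
# Ethernet adds 20 bytes on the wire per frame: 7-byte preamble,
# 1-byte start-of-frame delimiter, 12-byte inter-frame gap.
WIRE_OVERHEAD_BYTES = 20

def line_rate_pps(link_bps, frame_bytes):
    """Maximum frames per second a link can carry for a given frame size."""
    return link_bps / ((frame_bytes + WIRE_OVERHEAD_BYTES) * 8)

offered_pps = line_rate_pps(10e9, 64)        # assumption: sender saturates the link
delivered_pps = offered_pps * (1 - 0.99993)  # 99.993% of packets are dropped

print(f"offered:   {offered_pps / 1e6:.2f} Mpps")
print(f"delivered: {delivered_pps:.0f} pps")
```

Under these assumptions the link offers about 14.88 million packets per second, of which only around a thousand per second are forwarded.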
Motivations
• Design an accelerated data plane, BMAcc, for P4:
  • Not a P4 compiler (e.g., PISCES [1], P4ELTE [2]) but a P4 target for multicore platforms.
  • A substitute for BMv2, but with line-rate performance.
• Performance acceleration:
  • Leverages DPDK libraries and poll mode drivers (PMDs) for faster packet I/O.
  • Applies multiple optimization techniques: reduced memory copying, multithreading, SSE instructions.
• Transparent to P4 programs:
  • Supports all P4 programs and DPDK-compatible platforms.
  • Does not require P4 source code, only JSON config files.
[1] Shahbaz et al. 2016. PISCES: A Programmable, Protocol-Independent Software Switch. In SIGCOMM '16.
[2] Laki et al. 2016. High speed packet forwarding compiled from protocol independent data plane specifications. In SIGCOMM '16.
Design Overview and Three Optimizations
• The Design Overview
• Three Optimizations
  • Opt 1: ① PCAP → DPDK; ② Linux driver → PMD; ③ Single thread → RSS multi-queue
  • Opt 2: ① remove the redundant memory copy in Receive; ② remove the mutex for each parsed header
  • Opt 3: ① P4 LPM → DPDK LPM; ② SSE instructions used by DPDK
[Figure: BMAcc design overview. Master (Thread 0): init_from_command_line_options, detect PMD devices, add port, notify the slave cores of their tasks. Slaves (Threads 1, 2, 3, ..., N): wait for tasks, then run the multithreaded tasks: receive packets from the RX queues, run the pipeline (table lookup with SSE), send packets to the TX ring. Other diagram labels: DPDK EAL, RX queues, TX ring.]
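The RSS multi-queue idea in Opt 1 is that the NIC hashes each packet's flow fields and uses the result to pick an RX queue, so every worker core polls its own queue and packets of one flow stay on one core. The sketch below mimics that mapping in software. It is illustrative only: `zlib.crc32` stands in for the hash (real NICs typically use a Toeplitz hash with an indirection table), and the queue count and flow tuples are made up.

```python
import zlib
from collections import Counter

def rx_queue_for(flow, num_queues):
    """Pick an RX queue from the flow tuple.

    zlib.crc32 is a stand-in here; it is not the hash a real NIC uses.
    """
    key = "|".join(str(field) for field in flow).encode()
    return zlib.crc32(key) % num_queues

NUM_QUEUES = 4  # hypothetical: one RX queue per worker core

# Hypothetical traffic: 1000 flows that differ only in destination IP.
flows = [("10.0.0.1", f"192.168.{i // 256}.{i % 256}", 1234, 80, "udp")
         for i in range(1000)]

load = Counter(rx_queue_for(flow, NUM_QUEUES) for flow in flows)
print(sorted(load.items()))
```

Because the queue depends only on the flow tuple, the mapping is deterministic per flow, while many distinct flows spread across the queues. Random destination IPs, as used in the evaluation traffic, give the hash plenty of distinct inputs to spread.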
Evaluation Setup
• 2 Intel Supermicro servers [1], each with 2 × 10 GbE NICs.
• SM1:
  • TX: pktgen sends 10 Gbps traffic with random destination IPs
  • RX: counts the received packets
• SM2:
  • BMv2 Simple Router target.
  • Table lookup, then forward or drop.
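[Figure: Test topology. SuperMicro Server 1 runs DPDK pktgen (TX and RX ports). SuperMicro Server 2 runs the BMv2 Simple Router (RX → check rule → forward → TX). The two servers are connected by 10 Gbps links.]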
[1] Intel Supermicro server 1U based on Intel® Xeon® processors D-1540 @ 2.0 GHz, Niantic 82599 10 GbE NIC.
Test 1: Hardware Performance Verification
• NIC TX Capability:
  • TX end always sends 64*1024*1024 packets with 256*1024 distinct flows
• NIC RX Capability:
  • RX end only receives packets and then drops them.
  • No transmission to the out-port.

[Chart: Packet Receiving Performance Test. RX rate by packet size: 1500 bytes: 99.85%; 750 bytes: 99.70%; 64 bytes: 98.66%.]

Packet Size (bytes)   Throughput   Framing Rate   Duration
1500                  9.8 Gbps     9.9 Gbps       79 s
750                   9.7 Gbps     9.9 Gbps       41 s
64                    7.7 Gbps     9.9 Gbps       4.7 s
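The gap between the throughput and framing-rate columns, which widens as packets shrink, follows from the fixed per-frame overhead on the wire. The sketch below computes the highest frame-only throughput a 10 Gbps link can carry at each size. It assumes the listed packet size is the full Ethernet frame and that each frame costs 20 extra bytes on the wire; the function name is illustrative.

```python
WIRE_OVERHEAD_BYTES = 20  # preamble (7) + start-of-frame delimiter (1) + inter-frame gap (12)

def max_throughput_gbps(link_gbps, frame_bytes):
    """Largest frame-only throughput a link can carry for a given frame size."""
    return link_gbps * frame_bytes / (frame_bytes + WIRE_OVERHEAD_BYTES)

for size in (1500, 750, 64):
    print(size, round(max_throughput_gbps(10.0, size), 2))
```

The computed ceilings (about 9.87, 9.74, and 7.62 Gbps) are roughly in line with the measured throughput column, which suggests the generator is sending at close to the link's capacity for all three sizes.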
Test 2: Performance on a Single Core with Different Optimizations
• Vanilla BMv2 only supports a single thread.
• Performance comparison: PCAP (vanilla) → Opt 1 → Opt 1,2 → Opt 1,2,3
1. Almost all DPDK versions outperform PCAP
2. More optimizations → higher performance
3. Opt 1 single-core version: DPDK has no evident performance gain
Test 3: Performance on 8 Cores with Different Optimizations
• PCAP is not on the chart
• Performance comparison: Opt 1 → Opt 1,2 → Opt 1,2,3
1. 3x, 5.5x, and 23x increase over single-core vanilla BMv2 for large, mid, and small packet sizes, respectively
2. Reaches line rate for large and mid-sized packets with all 3 optimizations
3. How about 64-byte packets?
Test 4: Find the Performance Killer for Small Packets
• Five major stages in P4 processing:
  • RX → Parser → LPM → Deparser → TX
• Gradually add stages to this pipeline to find the biggest performance drop
• In the experiment: 4 cores, 64-byte packets
1. Performance impact breakdown: TX: 20%; Parser: 58%; Deparser: 5%; LPM: 9%
2. TX+RX → similar to l3fwd (80% PRR as reported)
3. Parser: creates new objects for each packet → time-consuming
[Chart annotations: "Similar to l3fwd"; "58% Drop"]
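A common remedy for per-packet allocation cost is to create all packet objects once and recycle them. The sketch below shows the pattern in its simplest form. The `Packet` and `PacketPool` classes are hypothetical stand-ins rather than BMv2 or BMAcc code, and Python is used only to show the control flow, not to reproduce the performance effect.

```python
class Packet:
    """Stand-in for a per-packet object holding parsed header state."""
    def __init__(self):
        self.headers = {}
        self.payload = b""

    def reset(self):
        self.headers.clear()
        self.payload = b""

class PacketPool:
    """Fixed-size pool: all Packet objects are created once, up front."""
    def __init__(self, size):
        self._free = [Packet() for _ in range(size)]

    def acquire(self):
        # Returns None when exhausted; a real data plane would drop or back-pressure.
        return self._free.pop() if self._free else None

    def release(self, pkt):
        pkt.reset()  # wipe state so the next user starts clean
        self._free.append(pkt)

pool = PacketPool(size=2)
first = pool.acquire()
first.headers["ipv4.dst"] = "10.0.0.1"
pool.release(first)
again = pool.acquire()
print(again is first, again.headers)
```

The second `acquire` hands back the same object with its state cleared, so the steady-state fast path performs no allocation at all.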
Test 5: Performance with Various # of Cores
• Take the Opt 1,2,3 case (the most optimized)
1. Large packets reach line rate with 4 cores; mid-sized packets with 8 cores
2. Performance is almost proportional to the # of cores
3. Not shown here, but the results for Opt 1 and Opt 1,2 are consistent with these.
Test 6: The Performance of LPM Processing
• P4 LPM: leverages Judy for creating and accessing dynamic arrays
• DPDK LPM: SSE instructions and cache-friendly data structures
1. DPDK-LPM is slightly better in all cases
2. The DPDK-LPM performance benefit is more evident when the ruleset is smaller and there are fewer processing cores, because of the overhead of the Judy library.
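For readers unfamiliar with the operation being benchmarked, the sketch below shows what a longest-prefix-match (LPM) lookup returns. It illustrates the semantics only: the one-dict-per-prefix-length layout is a naive teaching structure and mirrors neither the DPDK LPM library nor BMv2's Judy-based tables. The `NaiveLpm` class, the routes, and the port names are made up, and only IPv4 is handled.

```python
import ipaddress

class NaiveLpm:
    """Longest-prefix match via one dict per prefix length (illustrative only)."""
    def __init__(self):
        self._by_len = {}  # prefix length -> {network address as int: next hop}

    def add(self, cidr, next_hop):
        net = ipaddress.ip_network(cidr)
        self._by_len.setdefault(net.prefixlen, {})[int(net.network_address)] = next_hop

    def lookup(self, addr):
        ip = int(ipaddress.ip_address(addr))
        for plen in sorted(self._by_len, reverse=True):  # longest prefix first
            mask = (0xFFFFFFFF << (32 - plen)) & 0xFFFFFFFF
            hop = self._by_len[plen].get(ip & mask)
            if hop is not None:
                return hop
        return None

table = NaiveLpm()
table.add("10.0.0.0/8", "port1")
table.add("10.1.0.0/16", "port2")
table.add("0.0.0.0/0", "default")
print(table.lookup("10.1.2.3"), table.lookup("10.9.9.9"), table.lookup("8.8.8.8"))
```

The first address matches both the /8 and the /16 route and resolves to the more specific one; the others fall back to the /8 route and the default route.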
Conclusion and Future Work
• The DPDK-accelerated BMv2 reaches 10 Gbps line rate for mid- and large-sized packets, and yields a 23x performance boost on small packets.
• To address the Parser impact on 64-byte packets, we need to pre-allocate memory space for Packet instances.
• We proposed multiple practical optimizations on BMv2 which are instrumental to all P4-based data plane designs on multicore platforms.
• We conducted an in-depth performance study of the proposed BMAcc system from architecture and software perspectives.
Questions?
Peilong Li, Ph.D.
Research Assistant Professor
https://peilong.github.io
UMass Lowell ACANETS Lab
http://acanets.uml.edu
