Porting IDS/IPS Applications to DPDK Platform
Agenda
• IDS/IPS Application Packet Pipeline
• Explore Bottlenecks and Solutions
  Bottleneck     Solution
  I/O            PCIe Slot-NUMA map
  CPU            Custom Libraries
  Application    Packet Filter, Lookup, Distribution and Modeling
  Ecosystem      VirtIO, Proc-Info, SIMD, Custom Lookup
Look into Suricata
Worker Thread: RX NIC -> Capture -> Decode -> Stream -> Detect -> Output -> TX NIC
Suricata is a free and open source, mature, fast and robust network threat detection engine. The Suricata engine is capable of real-time intrusion detection (IDS), inline intrusion prevention (IPS), network security monitoring (NSM) and offline pcap processing.
"Suricata's multi-threaded architecture can support high performance multi-core and multiprocessor systems, Jonkman said." -- (Computerworld)
Flow identification
Stream Identification
Stream Capture
Buffers & Flows limit
Copies
Exact match
Pattern match
IDS-IPS in Passive & Active mode
• Network I/O (multiple 10 Gbit/s interfaces)
• Control, configuration and stats (CLI and socket interface)
• High-speed user-space TCP and SSL stack configured in proxy mode
Diagram labels: Clear Text; Encrypted; Encrypted
Dive into Bottlenecks
Do we need to re-invent the Intrusion Detection, Intrusion Prevention or Network Security Monitoring utility?
DPDK PMD library to rescue the I/O bottleneck
SoC using PCIe virtual dev library
Diagram blocks: Network I/O; Config & Mgmt; TCP SSL Stack; DPDK PMD & MBUF Manager; User Space
• Keep packets in user space
• Reduce NIC-to-NIC latency
• Smart filter
[Chart: throughput in Mbit/s (axis 0 to 1000) by packet size, for 64-byte RX, 64-byte TX, 1500-byte RX and 1500-byte TX. Data labels in extraction order: 480, 150, 780, 220, 625, 273, 944, 473. Legend text: Packet; NIC to NIC PCIe 1 queue.]
The SoC allowed up to 32 bidirectional PCIe user-space queues.
Worker Threads Offloading
DPDK worker threads for running RX-TX with Suricata workers (CPU & Application Model)
Suricata using DPDK
RX NIC, TX NIC
Worker Threads, each running: Capture -> Decode -> Stream -> Detect -> Output
Diagram labels:
• RSS hash
• Parse for metadata
• Match for rule set
• Buffer & zero copy
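The RSS hash label above refers to spreading flows across the worker threads. The deck does not show how flows are mapped, so the following is only a minimal software sketch of the idea, assuming an IPv4 5-tuple. The names `flow_key`, `flow_hash` and `flow_to_worker` are hypothetical, and the mixing function is a stand-in (NIC RSS normally uses a Toeplitz hash in hardware). The hash is made symmetric on purpose, an assumption chosen so that both directions of a connection reach the same worker, which stream reassembly needs.

```c
#include <stdint.h>

/* Hypothetical IPv4 5-tuple; field names are illustrative only. */
struct flow_key {
    uint32_t src_ip;
    uint32_t dst_ip;
    uint16_t src_port;
    uint16_t dst_port;
    uint8_t  proto;
};

/* 32-bit integer mixer (the finalizer step of MurmurHash3). */
static uint32_t mix32(uint32_t h)
{
    h ^= h >> 16;
    h *= 0x85ebca6bu;
    h ^= h >> 13;
    h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}

/* Symmetric flow hash: XOR and addition are commutative, so swapping
 * source and destination yields the same value. */
static uint32_t flow_hash(const struct flow_key *k)
{
    uint32_t ips   = k->src_ip ^ k->dst_ip;
    uint32_t ports = (uint32_t)k->src_port + (uint32_t)k->dst_port;

    return mix32(ips ^ mix32(ports ^ ((uint32_t)k->proto << 24)));
}

/* Pick a worker index in [0, nb_workers). nb_workers must be non-zero. */
static unsigned flow_to_worker(const struct flow_key *k, unsigned nb_workers)
{
    return flow_hash(k) % nb_workers;
}
```

With four workers, a client-to-server packet and the matching server-to-client packet produce the same index, so one worker sees the whole stream.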
Improvement (No Pkt Process)
Throughput in Mbit/s:
            64-byte packets         1500-byte packets
            DPDK    AF-Workers      DPDK    AF-Workers
P1 RX       1000    499             1000    826
P1 TX       382     251             1000    416
P2 RX       1000    475             1000    825
P2 TX       382     213             1000    472
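To put the 64-byte figures above in perspective, it helps to convert Mbit/s into frames per second, since small frames stress the per-packet path far more than large ones. The sketch below assumes the Mbit/s values are wire rates and that "64 byte" and "1500 byte" are full Ethernet frame sizes; the deck states neither. `frames_per_sec` is a hypothetical helper name.

```c
#include <stdint.h>

/* Per-frame overhead on the wire: 7-byte preamble, 1-byte start-of-frame
 * delimiter and 12-byte inter-frame gap. */
#define ETH_WIRE_OVERHEAD_BYTES 20u

/* Frames per second needed to fill `mbits_per_sec` of wire bandwidth with
 * frames of `frame_bytes` each. Integer division rounds down. */
static uint64_t frames_per_sec(uint64_t mbits_per_sec, uint32_t frame_bytes)
{
    uint64_t bits_per_frame =
        (uint64_t)(frame_bytes + ETH_WIRE_OVERHEAD_BYTES) * 8u;

    return (mbits_per_sec * 1000000u) / bits_per_frame;
}
```

Under those assumptions, 1000 Mbit/s of 64-byte frames is about 1.49 million frames per second, against roughly 82 thousand for 1500-byte frames.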
Eco-System
Setup
Supermicro 4-core Xeon at 2.6 GHz with 2 x 1G onboard i350 (2x PCIe Gen2)
DPDK: 2 worker cores, 1 DPDK RX-TX core. AF-Workers: 3 worker cores.
Learnings
• Distributed lcore and NIC, i.e. a single socket interfaces with a single NIC (4 x 10G).
• Single machine for processing, filter, flow and Suricata.
• Reduced packet latency, since there is no inter-NIC transmission.
• Localized user DPDK and custom Suricata help with zero copy.
Reality Check - All Done?
Unexpected Follow on!!!
Feedback
1. Works Partially
2. Worse Throughput
EXPECTATION
PCIe address: NUMA Slot: Physical slot
./testSlot.py
+++ DPDK NIC to Physical slot Mapping +++
++++++++++++++++++++++++++++++++++
Bus: 04:00.2 Slot: 5 Node: 0 Driver: igb_uio
Bus: 08:00.3 Slot: 1 Node: 0 Driver: igb_uio
Bus: 08:00.0 Slot: 1 Node: 0 Driver: igb_uio
Bus: 83:00.1 Slot: 2 Node: 1 Driver: igb_uio
Bus: 85:00.1 Slot: 4 Node: 1 Driver: igb_uio
Bus: 85:00.2 Slot: 4 Node: 1 Driver: igb_uio
Bus: 85:00.3 Slot: 4 Node: 1 Driver: igb_uio
----------------------------------------------------
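The Node column in the output above can be obtained on Linux from the `numa_node` attribute that sysfs exposes for each PCI device. The source of testSlot.py is not shown, so this is not its implementation, only a minimal C sketch of the same lookup. `pci_numa_node` is a hypothetical helper, and the sysfs root is passed in as a parameter so the function can be pointed at a test directory.

```c
#include <stdio.h>

/* Read the NUMA node of a PCI device from sysfs.
 * `sysfs_root` is normally "/sys/bus/pci/devices"; `bdf` is the full
 * address, for example "0000:04:00.2".
 * Returns the node number, or -1 if the attribute cannot be read (the
 * kernel also reports -1 when it has no NUMA information for a device). */
static int pci_numa_node(const char *sysfs_root, const char *bdf)
{
    char path[256];
    int node = -1;
    int len;
    FILE *f;

    len = snprintf(path, sizeof(path), "%s/%s/numa_node", sysfs_root, bdf);
    if (len < 0 || len >= (int)sizeof(path))
        return -1;              /* formatting error or path truncated */

    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &node) != 1)
        node = -1;
    fclose(f);
    return node;
}
```

A caller would typically invoke `pci_numa_node("/sys/bus/pci/devices", "0000:04:00.2")` for each NIC and then pick lcores on the returned node.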
NUMA Wrapper - Coremask & PortMask per NUMA
• populateNodeInfo() & displayNodeInfo()
INFO: DPDK Ver: DPDK 16.11.0 rte_eal_process_type: Primary!!
NODE: 0 -- PORT --
^^ 1G ports: 0x0 count: 0 // nodePtr->port1G_map[0]
^^ 10G ports: 0x7 count: 3 // nodePtr->port10G_map[0]
^^ 40G ports: 0x0 count: 0 // nodePtr->port40G_map[0]
NODE: 1 -- PORT --
^^ 1G ports: 0x0 count: 0 // nodePtr->port1G_map[1]
^^ 10G ports: 0xf count: 4 // nodePtr->port10G_map[1]
^^ 40G ports: 0x0 count: 0 // nodePtr->port40G_map[1]
• port1G_init, port10G_init, port40G_init
• getCount1gPorts, getCount10gPorts, getCount40gPorts
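The output above suggests one port bitmap per NUMA node and link speed, with a count derived from the set bits. The wrapper's source is not part of the deck, so the sketch below is a guess at the idea rather than the author's code. The map field names come from the slide. `node_add_port`, `port_count`, `MAX_NUMA_NODES` and the per-node bit numbering are assumptions.

```c
#include <stdint.h>

#define MAX_NUMA_NODES 4    /* assumption for this sketch */

enum port_speed { SPEED_1G, SPEED_10G, SPEED_40G };

/* Field names follow the slide's output; the layout is a guess. */
struct node_info {
    uint32_t port1G_map[MAX_NUMA_NODES];
    uint32_t port10G_map[MAX_NUMA_NODES];
    uint32_t port40G_map[MAX_NUMA_NODES];
};

/* Set the bit for `port` (0..31) in the map for its node and speed.
 * Returns 0 on success, -1 on an out-of-range argument. */
static int node_add_port(struct node_info *ni, unsigned node,
                         enum port_speed speed, unsigned port)
{
    if (node >= MAX_NUMA_NODES || port >= 32)
        return -1;
    switch (speed) {
    case SPEED_1G:  ni->port1G_map[node]  |= UINT32_C(1) << port; break;
    case SPEED_10G: ni->port10G_map[node] |= UINT32_C(1) << port; break;
    case SPEED_40G: ni->port40G_map[node] |= UINT32_C(1) << port; break;
    default:        return -1;
    }
    return 0;
}

/* Count the set bits in a port map (clears the lowest set bit per step). */
static unsigned port_count(uint32_t map)
{
    unsigned n = 0;

    for (; map != 0; map &= map - 1)
        n++;
    return n;
}
```

Adding three 10G ports to node 0 and four to node 1 yields the masks 0x7 and 0xf with counts 3 and 4, matching the shape of the output shown.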
VirtIO Hurdles
1. Device start & stop do not work
2. Setting link state up & down fails
3. LSC (link status change) callback does not work
4. Application proc-info does not show stats for the right primary application
5. Application proc-info corrupts rte_dev_data when pcap is in use
Timer Hurdles
1. Timers used in
   1. Reassembly – IP
   2. Protocols – TCP, Path Monitoring, ARP
   3. Scheduling, Event expiry
   4. Stats
2. Latency
   1. Tick to software
   2. Expiry
   3. Application notification
   4. Starting
   5. Threshold tick values (demo)
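One of the latency sources listed above, the tick granularity, can be put into numbers. The deck gives no figures or code for this, so the following is only a small arithmetic sketch with hypothetical helper names: a timeout has to be rounded up to whole ticks so it never fires early, and that rounding alone can add almost one full tick of delay.

```c
#include <stdint.h>

/* Ticks to wait for a timeout, rounded up so the timer never fires early.
 * tick_us must be non-zero. */
static uint64_t timeout_to_ticks(uint64_t timeout_us, uint64_t tick_us)
{
    return (timeout_us + tick_us - 1) / tick_us;
}

/* Extra delay in microseconds introduced by rounding the timeout up to a
 * whole number of ticks. Always smaller than tick_us. */
static uint64_t rounding_delay_us(uint64_t timeout_us, uint64_t tick_us)
{
    return timeout_to_ticks(timeout_us, tick_us) * tick_us - timeout_us;
}
```

For example, a 2500 microsecond timeout on a 1000 microsecond tick waits 3 ticks and fires 500 microseconds late, before expiry processing and application notification add their own delay.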
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sum the buffer as 16-bit words in host byte order and fold the carries.
 * Returns the folded 16-bit one's-complement sum; it is not complemented.
 * Needs AVX2 (build with -mavx2). The 32-bit accumulator is sized for
 * packet-length inputs. */
static inline __attribute__((always_inline)) int
avxChecksumV2(const char * const target, size_t targetLength)
{
    unsigned int checksum = 0;
    size_t offset = 0;
    const __m256i zero = _mm256_setzero_si256();
    __m256i vec, lVec, hVec, sum;

    /* 32 bytes (sixteen 16-bit words) per iteration. */
    for (; offset + 32 <= targetLength; offset += 32) {
        vec = _mm256_loadu_si256((__m256i const *)(target + offset));
        /* Zero-extend the 16-bit words to 32 bits so carries are kept. */
        lVec = _mm256_unpacklo_epi16(vec, zero);
        hVec = _mm256_unpackhi_epi16(vec, zero);
        sum = _mm256_add_epi32(lVec, hVec);
        /* hadd works within each 128-bit lane: two rounds leave the lane
         * total in every 32-bit element of that lane. */
        sum = _mm256_hadd_epi32(sum, sum);
        sum = _mm256_hadd_epi32(sum, sum);
        /* Add the two lane totals as full 32-bit values. */
        checksum += (unsigned int)_mm256_extract_epi32(sum, 0)
                  + (unsigned int)_mm256_extract_epi32(sum, 4);
    }
    /* Remaining whole 16-bit words. */
    for (; targetLength - offset >= 2; offset += 2) {
        uint16_t word;
        memcpy(&word, target + offset, sizeof(word));
        checksum += word;
    }
    /* Odd trailing byte, if any. */
    if (targetLength - offset)
        checksum += *((const uint8_t *)target + offset);

    /* Fold the carries back into the low 16 bits, twice. */
    checksum = (checksum >> 16) + (checksum & 0xffff);
    checksum = (checksum >> 16) + (checksum & 0xffff);
    return (int)checksum;
}
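A scalar version is useful for cross-checking the AVX routine above on the same buffers. The sketch below is not from the deck. `checksum_scalar` is a hypothetical name, and it assembles each 16-bit word in little-endian order, which is what a plain 16-bit load yields on x86. Like the AVX routine, it returns the folded sum without complementing it and uses a 32-bit accumulator sized for packet-length inputs.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar reference: sum the buffer as little-endian 16-bit words, add a
 * trailing odd byte if present, then fold the carries into 16 bits. */
static uint16_t checksum_scalar(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i = 0;

    for (; i + 2 <= len; i += 2)
        sum += (uint32_t)data[i] | ((uint32_t)data[i + 1] << 8);
    if (i < len)
        sum += data[i];

    /* Two folds are enough to bring any 32-bit sum down to 16 bits. */
    sum = (sum >> 16) + (sum & 0xffffu);
    sum = (sum >> 16) + (sum & 0xffffu);
    return (uint16_t)sum;
}
```

Running both routines over random buffers of every length from 0 to a few hundred bytes is a cheap way to catch tail-handling and lane-reduction mistakes in the vector code.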
Comparison:

  0.468157 task-clock (msec)        # 0.461 CPUs utilized      ( +- 1.66% )
  0        context-switches         # 0.000 K/sec
  0        cpu-migrations           # 0.000 K/sec
  42       page-faults              # 0.091 M/sec              ( +- 0.58% )
  5,63,229 cycles                   # 1.203 GHz                ( +- 1.82% )
  <not supported> stalled-cycles-frontend
  <not supported> stalled-cycles-backend
  4,04,106 instructions             # 0.72 insns per cycle     ( +- 0.43% )
  78,541   branches                 # 167.765 M/sec            ( +- 0.50% )
  3,538    branch-misses            # 4.50% of all branches    ( +- 0.30% )

  0.446172 task-clock (msec)        # 0.471 CPUs utilized      ( +- 4.43% )
  0        context-switches         # 0.000 K/sec
  0        cpu-migrations           # 0.000 K/sec
  42       page-faults              # 0.095 M/sec              ( +- 0.47% )
  5,39,201 cycles                   # 1.209 GHz                ( +- 3.08% )
  <not supported> stalled-cycles-frontend
  <not supported> stalled-cycles-backend
  4,03,597 instructions             # 0.75 insns per cycle     ( +- 0.64% )
  78,162   branches                 # 175.184 M/sec            ( +- 0.53% )
  3,199    branch-misses            # 4.09% of all branches    ( +- 6.27% )
SIMD Checksum
Vectorization isn't something that will work for all programs, but if yours is data-intensive, perhaps running simulations, processing graphics, or repeated financial calculations, consider vectorization. It might only take a slight rewrite of your program's data structures and layout to have the compiler auto-vectorize it.
Guest OS
Final Solution
PROC-INFO
Enhancement for fetching Primary Application Port stats
Primary Port Details

• Intf: 0 Speed: 10000 Duplex: Full Status: up
• - driver:; - if_index: 0
• - driver: Pcap PMD; - if_index: 5
• - driver: net_virtio PMD; - if_index: 7
• -- ADDR - domain:bus:devid:function 0000:0000:06.0; == PCI ID - vendor:device:sub-vendor:sub-device 1af4:1000:1af4:0001
• - driver: net_virtio; - if_index: 0
• -- ADDR - domain:bus:devid:function 0000:0000:07.0; == PCI ID - vendor:device:sub-vendor:sub-device 1af4:1000:1af4:0001

• Intf: 0 Speed: 10000 Duplex: Full Status: up
• - driver: net_virtio; - if_index: 0
• - driver: Pcap PM; - if_index: 5
• - driver: Pcap PMD; - if_index: 7
• -- ADDR - domain:bus:devid:function 0000:0000:06.0; == PCI ID - vendor:device:sub-vendor:sub-device 1af4:1000:1af4:0001
• - driver: net_virtio; - if_index: 0
• -- ADDR - domain:bus:devid:function 0000:0000:07.0; == PCI ID - vendor:device:sub-vendor:sub-device 1af4:1000:1af4:0001
New proc-info stats
• ######################## NIC statistics for port 0 ########################
• - name: eth_pcap0; - DPDK Port id: 0
  5: veth_fp_adk1@veth_k_adk1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000; link/ether a2:dd:48:c2:65:33 brd ff:ff:ff:ff:ff:ff
• ######################## NIC statistics for port 1 ########################
• - name: eth_pcap1; - DPDK Port id: 1
  7: veth_fp_adk2@veth_k_adk2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000; link/ether 0e:42:c4:8b:bf:4c brd ff:ff:ff:ff:ff:ff
• ######################## NIC statistics for port 2 ########################
• - name: 0000:00:06.0; - DPDK Port id: 2; - numa node: -1 mtu: 1500 dev_started: 1 promiscuous: 1; - dev_link: speed: 10000 duplex: 1 autoneg: 0 status: 1; - kdrv: 1
• - mac_addrs: 52:54:00:c3:1d:a8; - min_rx_buf_size: 2176; - all_multicast: 0 dev_flags: 1; - nb_rx_queues: 1 nb_tx_queues: 1
• ######################## NIC statistics for port 3 ########################
• - name: 0000:00:07.0; - DPDK Port id: 3; - numa node: -1 mtu: 1500 dev_started: 1 promiscuous: 1; - dev_link: speed: 10000 duplex: 1 autoneg: 0 status: 1; - kdrv: 1
• - mac_addrs: 52:54:00:b5:95:1b; - min_rx_buf_size: 2176; - all_multicast: 0 dev_flags: 1; - nb_rx_queues: 1 nb_tx_queues: 1
Custom Lookup
Connections/sec by key size:
Key size    Linked List    Array    Hash Array
1024        1600           8000     30000
2048        950            2400     12000
4096        625            2970     7500
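The chart compares a linked list, an array and a "hash array" for lookups keyed by 1024, 2048 and 4096 sized keys. The deck does not show any of the three data structures, so the following is only a guess at what the third could look like: a fixed array indexed by a hash of the key, with linear probing on collisions. Every name here is hypothetical, FNV-1a is a stand-in hash, and reading the key sizes as bits (128 to 512 bytes) is an assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TABLE_SLOTS   64u    /* power of two; small for illustration */
#define MAX_KEY_BYTES 512u   /* room for a 4096-bit key (assumption) */

struct slot {
    uint8_t  key[MAX_KEY_BYTES];
    size_t   key_len;        /* 0 marks an empty slot */
    uint32_t value;
};

struct hash_array {
    struct slot slots[TABLE_SLOTS];
};

/* FNV-1a, 32-bit. */
static uint32_t fnv1a(const uint8_t *p, size_t n)
{
    uint32_t h = 2166136261u;

    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Insert or update a key. Returns 0 on success, -1 if the key is empty,
 * too long, or the table is full. */
static int ha_put(struct hash_array *t, const uint8_t *key, size_t len,
                  uint32_t value)
{
    if (len == 0 || len > MAX_KEY_BYTES)
        return -1;
    uint32_t start = fnv1a(key, len) & (TABLE_SLOTS - 1);
    for (uint32_t i = 0; i < TABLE_SLOTS; i++) {
        struct slot *s = &t->slots[(start + i) & (TABLE_SLOTS - 1)];
        if (s->key_len == 0 ||
            (s->key_len == len && memcmp(s->key, key, len) == 0)) {
            memcpy(s->key, key, len);
            s->key_len = len;
            s->value = value;
            return 0;
        }
    }
    return -1;
}

/* Look up a key. Returns 1 and stores the value on a hit, 0 on a miss. */
static int ha_get(const struct hash_array *t, const uint8_t *key, size_t len,
                  uint32_t *value)
{
    if (len == 0 || len > MAX_KEY_BYTES)
        return 0;
    uint32_t start = fnv1a(key, len) & (TABLE_SLOTS - 1);
    for (uint32_t i = 0; i < TABLE_SLOTS; i++) {
        const struct slot *s = &t->slots[(start + i) & (TABLE_SLOTS - 1)];
        if (s->key_len == 0)
            return 0;        /* an empty slot ends the probe sequence */
        if (s->key_len == len && memcmp(s->key, key, len) == 0) {
            *value = s->value;
            return 1;
        }
    }
    return 0;
}
```

The design point is that a hit usually costs one hash plus one key comparison, whereas a list or plain array scan compares against many stored keys, and each comparison gets more expensive as keys grow. Deletion is left out because linear probing would need tombstones for it.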
