FreeBSD/VPC
Introduction to Virtual Private Cloud
Virtualization Status
Compute Isolation Status
• bhyve(4) is a stable, performant hypervisor

• ZFS zvol passthrough to guests works well

TIP: use multiple zvols per guest and stripe IO across
devices using lvcreate(8)

• Good CPU isolation - CPU pinned for low-density
workloads

• Perfect memory isolation (guest memory is wired)
Compute Isolation
[Diagram: one host with em0, running Guest 1 (Customer A) and Guest 2 (Customer B)]
Network Isolation Status
• Network isolation is not core to bhyve(4) today

• Using VNET(9) to manipulate FIBs for tap(4) interfaces is possible, but limited and not performant
Guest Workloads
[Diagram: one host with em0, bridge0, and tap51/tap52, running Guest 1 (Customer A), Guest 2 (Customer B), and Guest 3 (Customer B)]
Incomplete Solution
• bhyve(4) guests run customer workloads

• Cloud providers need a single FIB for the underlay
network

• Guests run in isolated overlay networks

• How do you map guests to their respective overlay
network?
Multi-Host Network Isolation
[Diagram: two hosts, each with em0 and running Guest 1 (Customer A), Guest 2 (Customer B), and Guest 3 (Customer B); the connection between the hosts is marked "???"]
Possible Solution
• Plug a tap(4) device into a guest
• Plug tap(4) device into a bridge(4)
• Plug the physical or cloned interface into the underlay
bridge(4)
if_bridge(4) Approach
[Diagram: one host with em0, bridge0, bridge1, bridge2, and tap50/tap51/tap52, running Guest 1 (Customer A), Guest 2 (Customer B), and Guest 3 (Customer B)]
Possible Solution++
• Plug a tap(4) device into a guest
• Plug tap(4) device into a bridge(4)
• Plug a vxlan(4) interface into a per-subnet bridge(4)
• Plug the vxlan(4) into an underlay bridge(4) instance
• Plug the physical or cloned interface into the underlay
bridge(4)
Fuster Cluck Isolation
[Diagram: one host with em0, bridge0, bridge1, bridge2, vxlan0, vxlan1, and tap50/tap51/tap52, running Guest 1 (Customer A), Guest 2 (Customer B), and Guest 3 (Customer B)]
Sad Performance
• Performance was "uninteresting"
• 1-2Gbps?
Problems with tap(4)/bridge(4)/vxlan(4)/VNET(9)
• tap(4) is slow

• bridge(4) is slow

• vxlan(4) sends received packets through ip_input()
twice (i.e. "sub-optimal")

• VNET(9) virtualizes underlay networks, not overlay networks

• How do you ARP between VMs in the overlay network?

• How do you perform vxlan(4) encap?
uvxbridge(8) POC
uvxbridge(8) POC provides a high PPS interface and guest VXLAN encapsulation using netmap(9)/ptnetmap(4):

https://github.com/joyent/uvxbridge
Guest Workloads
[Diagram: one host with em0, uvxbridge, and tap50/tap51/tap52, running Guest 1 (Customer A), Guest 2 (Customer B), and Guest 3 (Customer B)]
Reasonable Performance
• 21Gbps guest-to-guest on the same host
• 15Gbps guest-to-guest across the wire
• Supports AES PSK encryption for cross-DC traffic
• Incomplete, but a successful POC
New Approach
• Build a network isolation subsystem aligned with the
capabilities of network hardware
• Reduce the number of times a packet is copied
• Reduce context switches
• Build a bottom-up abstraction rooted in the capabilities of
hardware, not a hardware implementation defined in terms
of an administrative policy
FreeBSD/VPC
[Diagram: one host with em0, ethlink0, switches vpcsw0 and vpcsw1, and vmnic0/vmnic1/vmnic2 serving Guest 1 (Customer A), Guest 2 (Customer B), and Guest 3 (Customer B)]
FreeBSD/VPC
[Diagram: the same host with the switch ports shown: em0, ethlink0, vpcsw0, vpcsw1, ports vpcp0, vpcp1, vpcp3, vpcp4, vpcp5, and vmnic0/vmnic1/vmnic2 serving Guest 1 (Customer A), Guest 2 (Customer B), and Guest 3 (Customer B)]
FreeBSD/VPC Multi-Host
[Diagram: two hosts, each with em0, vpclink0, vpcsw0, vpcsw1, and vmnic0/vmnic1/vmnic2 serving Guest 1 (Customer A), Guest 2 (Customer B), and Guest 3 (Customer B); the connection between the hosts is marked "???"]
VXLAN to the Rescue
• Encapsulates guest Ethernet frames in UDP datagrams

• Prepends a VXLAN header to the encapsulated frame

• Tags packets with a VXLAN ID, known as a VNI

• VXLAN is similar to VLAN tagging, but carries the tag in the VXLAN header inside the UDP payload, not in the L2 frame
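As a rough illustration of the tagging described above, the sketch below builds and parses the 8-byte VXLAN header defined in RFC 7348 (a flags byte with the "I" bit set, followed by a 24-bit VNI). The function and macro names are made up for this example and are not taken from the vpc(4) sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VXLAN_HDR_LEN  8          /* RFC 7348: fixed 8-byte header */
#define VXLAN_FLAG_VNI 0x08       /* "I" flag: the VNI field is valid */
#define VXLAN_VNI_MAX  0xFFFFFFu  /* the VNI is a 24-bit value */

/*
 * Write a VXLAN header carrying `vni` into `hdr`.
 * Returns 0 on success, -1 if the VNI does not fit in 24 bits.
 */
static int
vxlan_hdr_encode(uint8_t hdr[VXLAN_HDR_LEN], uint32_t vni)
{
	if (vni > VXLAN_VNI_MAX)
		return (-1);
	hdr[0] = VXLAN_FLAG_VNI;
	hdr[1] = hdr[2] = hdr[3] = 0;    /* reserved */
	hdr[4] = (uint8_t)(vni >> 16);   /* VNI, network byte order */
	hdr[5] = (uint8_t)(vni >> 8);
	hdr[6] = (uint8_t)vni;
	hdr[7] = 0;                      /* reserved */
	return (0);
}

/*
 * Read the VNI back out of a VXLAN header.
 * Returns 0 on success, -1 if the "I" flag is not set.
 */
static int
vxlan_hdr_decode(const uint8_t hdr[VXLAN_HDR_LEN], uint32_t *vni)
{
	assert(vni != NULL);
	if ((hdr[0] & VXLAN_FLAG_VNI) == 0)
		return (-1);
	*vni = ((uint32_t)hdr[4] << 16) | ((uint32_t)hdr[5] << 8) | hdr[6];
	return (0);
}
```

Encoding VNI 123 this way yields a header whose first byte is 0x08 and whose seventh byte is 123; decoding it returns 123 again.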
VXLAN
[Diagram: a Physical Ethernet Frame (labelled 1500B) carrying an Encapsulated Ethernet Packet. Fields in on-wire order: Outer Ethernet Header, Outer IP Header, Outer UDP Header, VXLAN Header, Inner Dest MAC (DMAC), Inner Source MAC (SMAC), 802.1Q Header, EtherType, Payload, Outer Frame Checksum]
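The per-packet cost of this layout can be worked out with a little arithmetic. The sketch below assumes an IPv4 underlay without IP options and treats the "1500B" label as the underlay IP MTU; both are assumptions, and the helper name is hypothetical.

```c
#include <assert.h>

enum {
	OUTER_IPV4_LEN = 20,  /* outer IPv4 header, no options */
	OUTER_UDP_LEN  = 8,   /* outer UDP header */
	VXLAN_LEN      = 8,   /* VXLAN header (RFC 7348) */
	INNER_ETH_LEN  = 14,  /* inner DMAC + SMAC + EtherType */
	VLAN_TAG_LEN   = 4    /* optional inner 802.1Q header */
};

/*
 * Largest inner IP packet that still fits in one underlay packet of
 * `underlay_mtu` bytes. `inner_tagged` is nonzero when the inner frame
 * carries an 802.1Q header.
 */
static int
vxlan_inner_mtu(int underlay_mtu, int inner_tagged)
{
	int overhead = OUTER_IPV4_LEN + OUTER_UDP_LEN + VXLAN_LEN +
	    INNER_ETH_LEN + (inner_tagged ? VLAN_TAG_LEN : 0);

	assert(underlay_mtu > overhead);
	return (underlay_mtu - overhead);
}
```

Under these assumptions a 1500-byte underlay MTU leaves 1450 bytes for an untagged inner packet and 1446 bytes for a tagged one, which is why overlay guests typically need either a reduced MTU or a jumbo-frame underlay.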
FreeBSD/VPC Multi-Host
[Diagram: two hosts, each with em0, vpclink0, vpcsw0, vpcsw1, and vmnic0/vmnic1/vmnic2 serving Guest 1 (Customer A), Guest 2 (Customer B), and Guest 3 (Customer B); the switches are labelled VNI 123 and VNI 987 on both hosts, and VXLAN packets flow between the hosts]
vpc(4) Interfaces
• vpcsw(4) - switches packets - one switch per customer per host,
multiple subnets supported in the same switch

• vmnic(4) - dedicated guest NIC, looks like a virtio network device
to guests

• vpcp(4) - plugs vmnic(4) ports into vpcsw(4) switches

• vpci(4) - non-bhyve(4) interface, usable in jails (jail(8))

• ethlink(4) - Performs unencapsulated packet forwarding, wraps
a cloned or physical ethernet interface

• vpclink(4) - Performs VXLAN encapsulation
Interface Design
ioctl(2)
• Initially used ioctl(2)
• Unable to completely secure via capsicum(4) - able to
specify flags, but not scope the target device
• ioctl(2) is the kernel equivalent of an HTTP PUT, the interface for everything without a design

"Hi ifconfig(8), I'm looking at you."
One Device Per Object?
• New device in /dev for every new interface:
/dev/vpcsw0
/dev/vpcsw1
/dev/vpcp0
• Query via reads to individual devices in /dev
• Control devices via writes
• Security via devd(8)?
• Using VFS ACLs to secure network primitives?



Da Vinci mashed up with Jackson Pollock.

Kernel API design hate crime.
New System Calls
• vpc_open(2) - Creates a new VPC descriptor

• vpc_ctl(2) - Manipulates VPC descriptors

• Capsicum-like, intended for privilege separation

• Intended for idempotent tooling

• Makes aggressive use of UUIDs as operator handles to
be compatible with Triton
vpc_open(2)
int
vpc_open(const vpc_id_t *vpc_id,
    vpc_type_t obj_type,
    vpc_flags_t flags);
• Creates a new "VPC descriptor"
• Similar to open(2)
• Manipulate descriptor via vpc_ctl(2)
• Responds to close(2)
• Priv-sep native security semantics
• Version aware
• System Call 580
vpc_id_t
int
vpc_open(const vpc_id_t *vpc_id,
    vpc_handle_type_t obj_type,
    vpc_flags_t flags);
16 bytes, UUID-like:

type ID struct {
    TimeLow    uint32
    TimeMid    uint16
    TimeHi     uint16
    ClockSeqHi uint8
    ObjType    ObjType
    Node       [6]byte // Default MAC address vpc(4)
}
vpc_id_t
int
vpc_open(const vpc_id_t *vpc_id,
    vpc_handle_type_t obj_type,
    vpc_flags_t flags);
• All VPC objects have a MAC address
• Reused the node component of the vpc_id to set the MAC address
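To make the ID layout concrete, here is a minimal C sketch that parses a textual UUID into the fields of the 16-byte ID shown earlier and exposes the object type and the node bytes reused as the MAC address. The struct and function names are hypothetical, and the sketch assumes the text form maps onto the fields in the order they are printed (big-endian), which the slides do not state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C mirror of the 16-byte, UUID-like ID. */
struct vpc_id_sketch {
	uint32_t time_low;
	uint16_t time_mid;
	uint16_t time_hi;
	uint8_t  clock_seq_hi;
	uint8_t  obj_type;   /* occupies the UUID's clock_seq_low byte */
	uint8_t  node[6];    /* reused as the object's default MAC */
};

static int
hexval(char c)
{
	if (c >= '0' && c <= '9')
		return (c - '0');
	if (c >= 'a' && c <= 'f')
		return (c - 'a' + 10);
	if (c >= 'A' && c <= 'F')
		return (c - 'A' + 10);
	return (-1);
}

/*
 * Parse "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" into `id`.
 * Returns 0 on success, -1 on malformed input.
 */
static int
vpc_id_parse(const char *s, struct vpc_id_sketch *id)
{
	uint8_t raw[16];
	size_t i = 0, n = 0;

	assert(s != NULL && id != NULL);
	if (strlen(s) != 36)
		return (-1);
	while (i < 36) {
		if (i == 8 || i == 13 || i == 18 || i == 23) {
			if (s[i] != '-')
				return (-1);
			i++;
			continue;
		}
		int hi = hexval(s[i]), lo = hexval(s[i + 1]);
		if (hi < 0 || lo < 0)
			return (-1);
		raw[n++] = (uint8_t)((hi << 4) | lo);
		i += 2;
	}
	id->time_low = ((uint32_t)raw[0] << 24) | ((uint32_t)raw[1] << 16) |
	    ((uint32_t)raw[2] << 8) | raw[3];
	id->time_mid = (uint16_t)((raw[4] << 8) | raw[5]);
	id->time_hi = (uint16_t)((raw[6] << 8) | raw[7]);
	id->clock_seq_hi = raw[8];
	id->obj_type = raw[9];
	memcpy(id->node, &raw[10], sizeof(id->node));
	return (0);
}
```

Under that byte-order assumption, the sample switch ID used in the walk-through later in the deck (da64c3f3-095d-91e5-df01-5aabcfc52468) decodes to object type 1 with node 5a:ab:cf:c5:24:68, and the first vmnic ID decodes to object type 6. Those values line up with VPC_OBJ_SWITCH and VPC_OBJ_VMNIC in the object type enum, which suggests, but does not prove, that this reading matches the real layout.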
vpc_handle_type_t
int
vpc_open(const vpc_id_t *vpc_id,
    vpc_type_t obj_type,
    vpc_flags_t flags);
typedef struct {
    uint64_t vht_version:4;
    uint64_t vht_pad1:4;
    uint64_t vht_obj_type:8;
    uint64_t vht_pad2:48;
} vpc_handle_type_t;
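Because C leaves bit-field placement to the implementation, tooling that must not depend on kernel headers could pack the same fields with explicit shifts. The sketch below assumes the fields are allocated starting from the least significant bit, as common little-endian ABIs do; that ordering and the helper names are assumptions, not something the slides specify.

```c
#include <assert.h>
#include <stdint.h>

#define VHT_VERSION_MASK   0xFull  /* 4 bits */
#define VHT_OBJ_TYPE_SHIFT 8       /* skips 4 version bits + 4 pad bits */
#define VHT_OBJ_TYPE_MASK  0xFFull /* 8 bits */

/* Pack a version and object type into a 64-bit handle type word. */
static uint64_t
vht_pack(unsigned version, unsigned obj_type)
{
	assert(version <= VHT_VERSION_MASK);
	assert(obj_type <= VHT_OBJ_TYPE_MASK);
	return ((uint64_t)version |
	    ((uint64_t)obj_type << VHT_OBJ_TYPE_SHIFT));
}

/* Extract the 4-bit version field. */
static unsigned
vht_version(uint64_t vht)
{
	return ((unsigned)(vht & VHT_VERSION_MASK));
}

/* Extract the 8-bit object type field. */
static unsigned
vht_obj_type(uint64_t vht)
{
	return ((unsigned)((vht >> VHT_OBJ_TYPE_SHIFT) & VHT_OBJ_TYPE_MASK));
}
```

With this layout, version 1 and object type 6 pack to 0x601, and both pad fields stay zero.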
vpc_obj_type
int
vpc_open(const vpc_id_t *vpc_id,
    vpc_type_t obj_type,
    vpc_flags_t flags);
enum vpc_obj_type {
    VPC_OBJ_INVALID = 0,
    VPC_OBJ_SWITCH = 1,
    VPC_OBJ_PORT = 2,
    VPC_OBJ_ROUTER = 3,
    VPC_OBJ_NAT = 4,
    VPC_OBJ_VPCLINK = 5,
    VPC_OBJ_VMNIC = 6,
    VPC_OBJ_MGMT = 7,
    VPC_OBJ_ETHLINK = 8,
    VPC_OBJ_META = 9,
    VPC_OBJ_TYPE_ANY = 10,
    VPC_OBJ_TYPE_MAX = 10,
};
vpc_id_t
int
vpc_open(const vpc_id_t *vpc_id,
    vpc_handle_type_t obj_type,
    vpc_flags_t flags);
• 16 Versions
• 255 Object Types
• 4080 Object Types ought to be enough for anybody.
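The three numbers above follow from the field widths of vpc_handle_type_t. The short sketch below spells out the arithmetic, assuming that object type 0 is reserved for VPC_OBJ_INVALID and therefore not counted; the constant names are illustrative only.

```c
#include <assert.h>

enum {
	VHT_VERSIONS  = 1 << 4,        /* 4-bit version field: 16 values */
	VHT_OBJ_TYPES = (1 << 8) - 1,  /* 8-bit type field minus INVALID: 255 */
	VHT_TOTAL     = VHT_VERSIONS * VHT_OBJ_TYPES
};

/* 16 versions x 255 object types = 4080 version/type combinations. */
_Static_assert(VHT_TOTAL == 4080, "16 * 255 should be 4080");
```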
vpc_flags_t
int
vpc_open(const vpc_id_t *vpc_id,
    vpc_type_t obj_type,
    vpc_flags_t flags);
#define VPC_F_CREATE (1ULL << 0)
#define VPC_F_OPEN (1ULL << 1)
#define VPC_F_READ (1ULL << 2)
#define VPC_F_WRITE (1ULL << 3)
NOTE: going to revisit flags to enable a bit field for capabilities per VPC Object Type
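A caller-side check of these flags might look like the sketch below. The flag values are copied from the slide; the 64-bit typedef is inferred from the 1ULL constants, and the validation policy (reject unknown bits, require at least one of CREATE or OPEN) is purely hypothetical and not taken from the kernel sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t vpc_flags_t;  /* assumed width, suggested by 1ULL */

#define VPC_F_CREATE (1ULL << 0)
#define VPC_F_OPEN   (1ULL << 1)
#define VPC_F_READ   (1ULL << 2)
#define VPC_F_WRITE  (1ULL << 3)
#define VPC_F_ALL    (VPC_F_CREATE | VPC_F_OPEN | VPC_F_READ | VPC_F_WRITE)

/*
 * Hypothetical policy: refuse any bit outside the four known flags and
 * require the caller to ask for CREATE, OPEN, or both.
 */
static bool
vpc_flags_valid(vpc_flags_t flags)
{
	if ((flags & ~VPC_F_ALL) != 0)
		return (false);
	return ((flags & (VPC_F_CREATE | VPC_F_OPEN)) != 0);
}
```

Rejecting unknown bits up front keeps room for the per-object capability bits mentioned in the note without silently accepting them from older callers.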
vpc_ctl(2)
int
vpc_ctl(int vpcd, vpc_op_t op,
    size_t keylen, const void *key,
    size_t *vallen, void *buf);
• Only interface for manipulating a VPC object
• Available operations differ per VPC object type
• Inspired by ioctl(2), capsicum(4), and HTTP
• System Call 581
vpc_ctl(2)
enum vpc_vpcsw_op_type {
VPC_VPCSW_INVALID = 0,
VPC_VPCSW_PORT_ADD = 1,
VPC_VPCSW_PORT_DEL = 2,
VPC_VPCSW_PORT_UPLINK_SET = 3,
VPC_VPCSW_PORT_UPLINK_GET = 4,
VPC_VPCSW_STATE_GET = 5,
VPC_VPCSW_STATE_SET = 6,
VPC_VPCSW_RESET = 7,
VPC_VPCSW_RESPONSE_NDV4 = 8,
VPC_VPCSW_RESPONSE_NDV6 = 9,
VPC_VPCSW_RESPONSE_DHCPV4 = 10,
VPC_VPCSW_RESPONSE_DHCPV6 = 11,
VPC_VPCSW_OP_TYPE_MAX = 11,
};
vpc_ctl(2)
enum vpc_vpcp_op_type {
VPC_VPCP_INVALID = 0,
VPC_VPCP_CONNECT = 1,
VPC_VPCP_DISCONNECT = 2,
VPC_VPCP_VNI_GET = 3,
VPC_VPCP_VNI_SET = 4,
VPC_VPCP_VTAG_GET = 5,
VPC_VPCP_VTAG_SET = 6,
VPC_VPCP_UNUSED7 = 7,
VPC_VPCP_UNUSED8 = 8,
VPC_VPCP_PEER_ID_GET = 9,
VPC_VPCP_MAX = 9,
};
vpc_ctl(2)
enum vpc_vmnic_op_type {
VPC_VMNIC_INVALID = 0,
VPC_VMNIC_NQUEUES_GET = 1,
VPC_VMNIC_NQUEUES_SET = 2,
VPC_VMNIC_UNUSED3 = 3,
VPC_VMNIC_UNUSED4 = 4,
VPC_VMNIC_UNUSED5 = 5,
VPC_VMNIC_UNUSED6 = 6,
VPC_VMNIC_ATTACH = 7,
VPC_VMNIC_MSIX = 8,
VPC_VMNIC_FREEZE = 9,
VPC_VMNIC_UNFREEZE = 10,
VPC_VMNIC_OP_TYPE_MAX = 10,
};
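Since each object type has its own operation numbering, a dispatcher has to validate the operation against the type of the descriptor it was issued on. The sketch below shows one way to do that using the *_MAX values from the three enums above; the helper functions are hypothetical, and only the switch, port, and vmnic types are covered because those are the only op tables shown.

```c
#include <assert.h>
#include <stdbool.h>

/* Subset of the object types and per-type maxima from the slides. */
enum {
	VPC_OBJ_SWITCH = 1,
	VPC_OBJ_PORT   = 2,
	VPC_OBJ_NAT    = 4,
	VPC_OBJ_VMNIC  = 6
};
enum {
	VPC_VPCSW_OP_TYPE_MAX = 11,
	VPC_VPCP_MAX          = 9,
	VPC_VMNIC_OP_TYPE_MAX = 10
};

/* Highest valid op for an object type, or -1 if no table is known. */
static int
vpc_op_max(int obj_type)
{
	switch (obj_type) {
	case VPC_OBJ_SWITCH:
		return (VPC_VPCSW_OP_TYPE_MAX);
	case VPC_OBJ_PORT:
		return (VPC_VPCP_MAX);
	case VPC_OBJ_VMNIC:
		return (VPC_VMNIC_OP_TYPE_MAX);
	default:
		return (-1);
	}
}

/* Op 0 is the *_INVALID sentinel in every table, so it never passes. */
static bool
vpc_op_valid(int obj_type, int op)
{
	int max = vpc_op_max(obj_type);

	return (max > 0 && op > 0 && op <= max);
}
```

A real implementation would also have to skip the UNUSED slots in the port and vmnic tables; this sketch only checks the numeric range.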
What's it all mean?
• Network interfaces are first-class objects
• Network interfaces are pluggable
• Configurations can be arbitrarily complex
• All vpc(4) interfaces are iflib(9) and can be
tcpdump(8)'ed
• Performance is nice
Tooling
Tooling
• Able to be cross-compiled
• Developer-friendly
• Stable ABI
• Idempotent Operations
• No a priori knowledge of the target
OS required (i.e. no headers required
to cross-compile tooling)
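"Idempotent operations" here means that repeating a request leaves the system in the same state as issuing it once. A minimal sketch of that property, using an in-memory stand-in for a switch and 16-byte IDs as handles, could look like this; the types and function are invented for illustration and do not model the real vpcsw(4) state.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SW_MAX_PORTS 8
#define VPC_ID_LEN   16

/* Toy stand-in for a switch: just the set of attached port IDs. */
struct sw_sketch {
	uint8_t ports[SW_MAX_PORTS][VPC_ID_LEN];
	int     nports;
};

/*
 * Ensure the port with `id` is attached to the switch.
 * Returns 0 if the port is present after the call (whether it was
 * added now or was already there), -1 if the switch is full.
 */
static int
sw_port_add(struct sw_sketch *sw, const uint8_t id[VPC_ID_LEN])
{
	assert(sw != NULL && id != NULL);
	for (int i = 0; i < sw->nports; i++) {
		if (memcmp(sw->ports[i], id, VPC_ID_LEN) == 0)
			return (0);   /* already attached: nothing to do */
	}
	if (sw->nports == SW_MAX_PORTS)
		return (-1);
	memcpy(sw->ports[sw->nports++], id, VPC_ID_LEN);
	return (0);
}
```

Because the caller supplies the ID instead of receiving one from the kernel, a provisioning tool can re-run the same step after a crash without first querying what already exists.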
ELI5: vpc(4) edition
ELI5: vpc(4) edition
[Diagram: two hosts, each with em0, vpclink0, vpcsw0, vpcsw1, and vmnic0/vmnic1/vmnic2 serving Guest 1 (Customer A), Guest 2 (Customer B), and Guest 3 (Customer B); the switches are labelled VNI 123 and VNI 987 on both hosts, and VXLAN packets flow between the hosts]
ELI5: vpc(4) Assumptions
• Guest is running Ubuntu or CentOS Linux

• Multi-Queue TX/RX in Host

• Multiple network queues available in the guest

• Underlay hosts can pass traffic

• Overlay hosts are in the same subnet
ELI5: vpc(4) edition
[Diagram: one host with em0, ethlink0, vpcsw0, vpcsw1, ports vpcp0, vpcp1, vpcp3, vpcp4, vpcp5, and vmnic0/vmnic1/vmnic2 serving Guest 1 (Customer A), Guest 2 (Customer B), and Guest 3 (Customer B)]
ELI5: vpc(4) edition
[Diagram: the Customer B portion of the host: em0, ethlink0, vpcsw1, ports vpcp3, vpcp4, vpcp5, and vmnic1/vmnic2 serving Guest 2 and Guest 3]
ELI5: vpcsw(4) Setup
VNI=123
VLAN=456
VPCSW0_ID=da64c3f3-095d-91e5-df01-5aabcfc52468
vpc switch create \
    --switch-id=${VPCSW0_ID} \
    --vni=${VNI} \
    --vlan=${VLAN}
ELI5: vpc(4) edition
[Diagram: build step: em0 and vpcsw1, with Guest 2 and Guest 3 (Customer B) not yet connected]
ELI5: vmnic(4) Setup
VMNIC0_ID=07f95a11-6788-2ae7-c306-ba95cff1db38
VPCP0_ID=fd436f9c-1f77-11e8-8002-0cc47a6c7d1e

vpc vmnic create \
    --vmnic-id=${VMNIC0_ID}

vpc switch port add \
    --switch-id=${VPCSW0_ID} \
    --port-id=${VPCP0_ID}

vpc switch port connect \
    --port-id=${VPCP0_ID} \
    --interface-id=${VMNIC0_ID}
ELI5: vpc(4) edition
[Diagram: build step: em0, vpcsw1, port vpcp3, and vmnic1, with Guest 2 and Guest 3 (Customer B)]
ELI5: vmnic(4) Setup
VMNIC1_ID=a774ba3a-1f77-11e8-8006-0cc47a6c7d1e
VPCP1_ID=0ebf50e1-1f79-11e8-8002-0cc47a6c7d1e

vpc vmnic create \
    --vmnic-id=${VMNIC1_ID}

vpc switch port add \
    --switch-id=${VPCSW0_ID} \
    --port-id=${VPCP1_ID}

vpc switch port connect \
    --port-id=${VPCP1_ID} \
    --interface-id=${VMNIC1_ID}
ELI5: vpc(4) edition
[Diagram: build step: em0, vpcsw1, ports vpcp3 and vpcp4, and vmnic1/vmnic2 serving Guest 2 and Guest 3 (Customer B)]
ELI5: ethlink(4) Setup
ETHLINK0_ID=5c4acd32-1b8d-11e8-b408-0cc47a6c7d1e
UPLINK_PORT_ID=ea58b648-203b-a707-cdf6-7a552c8d5295
UPLINK_IF=em0

vpc switch port add \
    --switch-id=${VPCSW0_ID} \
    --uplink \
    --port-id=${UPLINK_PORT_ID} \
    --l2-name=${UPLINK_IF} \
    --ethlink-id=${ETHLINK0_ID}
ELI5: vpc(4) edition
[Diagram: build step: em0, ethlink0, vpcsw1, ports vpcp3, vpcp4, vpcp5, and vmnic1/vmnic2 serving Guest 2 and Guest 3 (Customer B)]
ELI5: vpc(4) + VXLAN
ELI5: vpc(4) edition
[Diagram: two hosts with underlay addresses 10.65.5.161 and 10.65.5.162, each with em0, vpclink0, vpcsw0, vpcsw1, and vmnic0/vmnic1/vmnic2 serving Guest 1 (Customer A), Guest 2 (Customer B), and Guest 3 (Customer B); labels on the switches: VNI 123 with VLAN 456 / VTAG 456, and VNI 987; VXLAN packets flow between the hosts]
ELI5: vpclink(4) Setup
• How do you do ARP or IPv6 Neighbor Discovery?
• Broadcast?
• Multicast?
ELI5: vpc(4) edition
[Diagram: two hosts with underlay addresses 10.65.5.161 and 10.65.5.162, each with em0, vpclink0, vpcsw0, vpcsw1, and vmnic0/vmnic1/vmnic2 serving Guest 1 (Customer A), Guest 2 (Customer B), and Guest 3 (Customer B); labels on the switches: VNI 123 with VLAN 456 / VTAG 456, and VNI 987; the VXLAN path between the hosts is marked "???"]
VTEP: VXLAN Tunnel End Point
• vpcsw(4) traps broadcast packets
• If the packet is:
• a broadcast packet and
• either an IPv4 ARP or IPv6 ND packet

then it is attached to a knote (part of kqueue(2)) and delivered to every kqueue(2) subscriber registered for the EVFILT_VPC filter
• Packet payload is stored in a user-allocated buffer pointed to by ext[0]
• A userland utility parses the packet and performs the lookup
• Overlay and underlay forwarding information is written back to the kernel via vpc_ctl(2)
• vpcsw(4) caches forwarding information for the src/dst MAC tuple
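The trap condition can be sketched as a small classifier over a raw Ethernet frame. This is not the vpcsw(4) code: the function name is hypothetical, the frame is assumed to be untagged, and IPv6 extension headers are not handled. One detail worth noting is that IPv6 has no broadcast, so Neighbor Discovery arrives as multicast; the sketch therefore tests the group bit of the destination MAC, which covers both broadcast and multicast.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SK_ETHER_HDR_LEN  14
#define SK_ETHERTYPE_ARP  0x0806
#define SK_ETHERTYPE_IPV6 0x86DD
#define SK_IPV6_HDR_LEN   40
#define SK_IPPROTO_ICMPV6 58
#define SK_ND_TYPE_FIRST  133  /* Router Solicitation */
#define SK_ND_TYPE_LAST   137  /* Redirect */

/*
 * Return true if the frame is a broadcast/multicast ARP or IPv6
 * Neighbor Discovery packet, i.e. one that would need a userland
 * lookup in the scheme described above.
 */
static bool
frame_needs_lookup(const uint8_t *f, size_t len)
{
	assert(f != NULL);
	if (len < SK_ETHER_HDR_LEN)
		return (false);
	/* Group bit of the destination MAC: broadcast or multicast. */
	if ((f[0] & 0x01) == 0)
		return (false);

	uint16_t ethertype = (uint16_t)((f[12] << 8) | f[13]);
	if (ethertype == SK_ETHERTYPE_ARP)
		return (true);
	if (ethertype == SK_ETHERTYPE_IPV6) {
		const uint8_t *ip6 = f + SK_ETHER_HDR_LEN;

		if (len < SK_ETHER_HDR_LEN + SK_IPV6_HDR_LEN + 1)
			return (false);
		if (ip6[6] != SK_IPPROTO_ICMPV6)  /* Next Header field */
			return (false);
		uint8_t icmp_type = ip6[SK_IPV6_HDR_LEN];
		return (icmp_type >= SK_ND_TYPE_FIRST &&
		    icmp_type <= SK_ND_TYPE_LAST);
	}
	return (false);
}
```

Frames that fail this test would stay on the in-kernel fast path; only the comparatively rare ARP and ND packets would take the round trip through kqueue(2) and userland.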
Ongoing Work
• Firewalling - integrated at vpcp(4)

• Routing

• NAT

• Userland Control Plane (including setup and teardown of
bhyve(4) guests via something not a shell script)
vpc(8)
% doas vpc vm create
% doas vpc vm run
• bhyve(4) VM creation and launch tool
• Unifies network isolation with compute isolation
Desktop Software + vpc(4)
• vpc(4) + NAT == Desktop Software Hotness

• Ability to sandbox VMs with no prior knowledge of the
host's networking

• Uses:

% vagrant up
% packer build
No more dependency on pf(4) or dnsmasq(8)
Code
• Kernel:
https://github.com/joyent/freebsd/tree/projects/VPC

• Kernel Libraries:
https://github.com/joyent/freebsd/tree/projects/VPC/libexec/go/src/go.freebsd.org/sys/vpc

• Userland tooling:
https://github.com/sean-/vpc

Future home:
https://github.com/joyent-/vpc
Questions?

FreeBSD VPC Introduction

  • 3. Compute Isolation Status • bhyve(4) is a stable, performant hypervisor • ZFS zvol passthrough to guests works well
 TIP: use multiple zvols per guest and stripe IO across devices using lvcreate(8) • Good CPU isolation - CPU pinned for low-density workloads • Perfect memory isolation (guest memory is wired)
  • 5. Network Isolation Status • Network isolation is not core to bhyve(4) today • Use of VNET(9) for manipulating FIBS for tap(4) interfaces is possible, but limited and not performant
  • 6. Guest Workloads em0 Guest 1 Customer A Guest 3 Customer B Guest 2 Customer B bridge0 tap51 tap52
  • 7. Incomplete Solution • bhyve(4) guests run customer workloads • Cloud providers need a single FIB for the underlay network • Guests run in isolated overlay networks • How do you map guests to their respective overlay network?
  • 8. Multi-Host Network Isolation em0 Guest 1 Customer A Guest 3 Customer B Guest 2 Customer B em0 Guest 1 Customer A Guest 3 Customer B Guest 2 Customer B ???
  • 9. Possible Solution • Plug a tap(4) device into a guest • Plug tap(4) device into a bridge(4) • Plug the physical or cloned interface into the underlay bridge(4)
  • 10. if_bridge(4) Approach em0 Guest 1 Customer A Guest 3 Customer B Guest 2 Customer B tap51 bridge0 tap50 tap52 bridge2bridge1
  • 11. Possible Solution++ • Plug a tap(4) device into a guest • Plug tap(4) device into a bridge(4) • Plug a vxlan(4) interface into a per-subnet bridge(4) • Plug the vxlan(4) into an underlay bridge(4) instance • Plug the physical or cloned interface into the underlay bridge(4)
  • 12. Fuster Cluck Isolation em0 Guest 1 Customer A Guest 3 Customer B Guest 2 Customer B bridge1 tap51 tap52tap50 bridge0 vxlan1vxlan0 bridge2
  • 13. Sad Performance • Performance was "uninteresting" • 1-2Gbps?
  • 14. Problems with
 tap(4)/bridge(4)/vxlan(4)/VNET(9) • tap(4) is slow • bridge(4) is slow • vxlan(4) sends received packets through ip_input() twice (i.e. "sub-optimal") • VNET(9) virtualizes underlay networks, not overlay networks • How do you ARP between VMs in the overlay network? • How do you perform vxlan(4) encap?
  • 15. uvxbridge(8) POC uvxbridge(8) POC provides a high PPS interface and guest VXLAN encapsulation using netmap(9)/ ptnetmap(4):
 
 https://github.com/joyent/uvxbridge
  • 16. Guest Workloads em0 Guest 1 Customer A Guest 3 Customer B Guest 2 Customer B tap51 tap52tap50 uvxbridge
  • 17. Reasonable Performance • 21Gbps guest-to-guest on the same host • 15Gbps guest-to-guest across the wire • Supports AES PSK encryption for cross-DC traffic • Incomplete, but a successful POC
  • 18. New Approach • Build a network isolation subsystem aligned with the capabilities of network hardware • Reduce the number of times a packet is copied • Reduce context switches • Build a bottom-up abstraction rooted in the capabilities of hardware, not a hardware implementation defined in terms of an administrative policy
  • 19. FreeBSD/VPC em0 Guest 1 Customer A Guest 3 Customer B Guest 2 Customer B vmnic1 ethlink0 vmnic0 vmnic2 vpcsw1vpcsw0
  • 20. FreeBSD/VPC em0 Guest 1 Customer A Guest 3 Customer B Guest 2 Customer B vmnic1 ethlink0 vmnic0 vmnic2 vpcsw1 vpcsw0 vpcp0 vpcp1 vpcp3 vpcp4 vpcp5
  • 21. FreeBSD/VPC Multi-Host em0 Guest 1 Customer A Guest 3 Customer B Guest 2 Customer B vpclink0 vmnic0 vpcsw1vpcsw0 vmnic1 vmnic2 em0 Guest 1 Customer A Guest 3 Customer B Guest 2 Customer B vpclink0 vmnic0 vpcsw1vpcsw0 vmnic1 vmnic2 ???
  • 22. VXLAN to the Rescue • Encapsulates all IP packets as UDP • Adds a preamble to IP packet • Tags packets and with a VXLAN ID, known as a VNI • VXLAN is similar to VLAN tagging, but embeds tagging in the IP header, not in the L2 frame
  • 23. VXLAN Encapsulated Ethernet Packet Physical Ethernet Frame 1500B OuterFrameChecksum OuterEthernetHeader OuterIPHeader OuterUDPHeader VXLANHeader InnerSourceMAC(SMAC) InnerDestMAC(DMAC) 802.1QHeader Payload EtherType
  • 24. FreeBSD/VPC Multi-Host em0 Guest 1 Customer A Guest 3 Customer B Guest 2 Customer B vpclink0 vmnic0 vpcsw1vpcsw0 vmnic1 vmnic2 VNI 123 VNI 987 VNI 123 VNI 987 VXLAN Packets em0 Guest 1 Customer A Guest 3 Customer B Guest 2 Customer B vpclink0 vmnic0 vpcsw1vpcsw0 vmnic1 vmnic2
  • 25. vpc(4) Interfaces • vpcsw(4) - switches packets - one switch per customer per host, multiple subnets supported in the same switch • vmnic(4) - dedicated guest NIC, looks like a virtio network device to guests • vpcp(4) - plugs vmnic(4) ports into vpcsw(4) switches • vpci(4) - Non-bhyve(4) interface, usable in jails(2) • ethlink(4) - Performs unencapsulated packet forwarding, wraps a cloned or physical ethernet interface • vpclink(4) - Performs VXLAN encapsulation
  • 27. ioctl(2) • Initially used ioctl(2) • Unable to completely secure via capsicum(4) - able to specify flags, but not scope the target device • ioctl(2) is the kernel
 equivalent of an HTTP PUT,
 the interface for everything
 without a design
 
 "Hi ifconfig(8), I'm
 looking at you."
  • 28. One Device Per Object? • New device in /dev for every new interface: /dev/vpcsw0
 /dev/vpcsw1
 /dev/vpcp0 • Query via reads to individual devices in/dev • Control devices via writes • Security via devd(8)? • Using VFS ACLs to secure network primitives?
 

  • 29. One Device Per Object? • New device in /dev for every new interface: /dev/vpcsw0
 /dev/vpcsw1
 /dev/vpcp0 • Query via reads to individual devices in/dev • Control devices via writes • Security via devd(8)? • Using VFS ACLs to secure network primitives? Da Vinci mashed up with ...

  • 30. One Device Per Object? • New device in /dev for every new interface: /dev/vpcsw0
 /dev/vpcsw1
 /dev/vpcp0 • Query via reads to individual devices in/dev • Control devices via writes • Security via devd(8)? • Using VFS ACLs to secure network primitives? Da Vinci mashed up with Jackson Pollock.

  • 31. One Device Per Object? • New device in /dev for every new interface: /dev/vpcsw0
 /dev/vpcsw1
 /dev/vpcp0 • Query via reads to individual devices in/dev • Control devices via writes • Security via devd(8)? • Using VFS ACLs to secure network primitives? Da Vinci mashed up with Jackson Pollock.
 Kernel API design hate crime.
  • 32. New System Calls • vpc_open(2) - Creates a new VPC descriptor • vpc_ctl(2) - Manipulates VPC descriptors • Capsicum-like, intended for privilege separation • Intended for idempotent tooling • Makes aggressive use of UUIDs as operator handles to be compatible with Triton
  • 33. vpc_open(2) int
 vpc_open(const vpc_id_t *vpc_id,
 vpc_type_t obj_type,
 vpc_flags_t flags); • Creates a new "VPC descriptor" • Similar to open(2) • Manipulate descriptor via vcp_ctl(2) • Responds to close(2) • Priv-sep native security semantics • Version aware • System Call 580
  • 34. vpc_id_t int
 vpc_open(const vpc_id_t *vpc_id,
 vpc_handle_type_t obj_type,
 vpc_flags_t flags); 16 bytes, UUID-like:
 
 type ID struct {
 TimeLow uint32
 TimeMid uint16
 TimeHi uint16
 ClockSeqHi uint8
 ObjType ObjType
 Node [6]byte // Default MAC address vpc(4)
 }
  • 35. vpc_id_t int
 vpc_open(const vpc_id_t *vpc_id,
 vpc_handle_type_t obj_type,
 vpc_flags_t flags); • All VPC objects have a MAC address • Reused the node component of the vpc_id to set the MAC address
  • 36. vpc_handle_type_t int
 vpc_open(const vpc_id_t *vpc_id,
 vpc_type_t obj_type,
 vpc_flags_t flags); typedef struct {
 uint64_t vht_version:4;
 uint64_t vht_pad1:4;
 uint64_t vht_obj_type:8;
 uint64_t vht_pad2:48;
 } vpc_handle_type_t;
  • 37. vpc_obj_type int
 vpc_open(const vpc_id_t *vpc_id,
 vpc_type_t obj_type,
 vpc_flags_t flags); enum vpc_obj_type {
 VPC_OBJ_INVALID = 0,
 VPC_OBJ_SWITCH = 1,
 VPC_OBJ_PORT = 2,
 VPC_OBJ_ROUTER = 3,
 VPC_OBJ_NAT = 4,
 VPC_OBJ_VPCLINK = 5,
 VPC_OBJ_VMNIC = 6,
 VPC_OBJ_MGMT = 7,
 VPC_OBJ_ETHLINK = 8,
 VPC_OBJ_META = 9,
 VPC_OBJ_TYPE_ANY = 10,
 VPC_OBJ_TYPE_MAX = 10,
 };
  • 38. vpc_id_t int
 vpc_open(const vpc_id_t *vpc_id,
 vpc_handle_type_t obj_type,
 vpc_flags_t flags); • 16 Versions • 255 Object Types • 4080 Object Types ought to be enough for anybody.
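The counts above follow from the bit widths in vpc_handle_type_t: 4 version bits give 16 versions, 8 object-type bits give 255 usable types once 0 is reserved as invalid, and 16 × 255 = 4080. A minimal sketch of filling in such a handle type is shown below. The typedef is copied from the earlier slide; `example_handle_type` is a hypothetical helper, not part of the presented API.

```c
#include <assert.h>
#include <stdint.h>

/* Layout as shown on the vpc_handle_type_t slide. */
typedef struct {
	uint64_t vht_version:4;
	uint64_t vht_pad1:4;
	uint64_t vht_obj_type:8;
	uint64_t vht_pad2:48;
} vpc_handle_type_t;

/*
 * Hypothetical helper: build a handle type from a version and an
 * object type, rejecting values that do not fit the bit fields.
 */
static vpc_handle_type_t
example_handle_type(unsigned version, unsigned obj_type)
{
	vpc_handle_type_t t = {0};

	assert(version < 16);	/* 4 bits */
	assert(obj_type < 256);	/* 8 bits */
	t.vht_version = version;
	t.vht_obj_type = obj_type;
	return t;
}
```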
  • 39. vpc_flags_t int
 vpc_open(const vpc_id_t *vpc_id,
 vpc_type_t obj_type,
 vpc_flags_t flags); #define VPC_F_CREATE (1ULL << 0)
 #define VPC_F_OPEN (1ULL << 1)
 #define VPC_F_READ (1ULL << 2)
 #define VPC_F_WRITE (1ULL << 3) NOTE: going to revisit flags to enable a bit field for capabilities per VPC Object Type
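As a small illustration of how these flags compose, the sketch below ORs them together and tests for a single bit. The flag values are the ones from the slide; the `vpc_flags_t` width is an assumption (the `1ULL` constants suggest 64 bits) and `example_flags_has` is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t vpc_flags_t;	/* assumed width */

#define VPC_F_CREATE	(1ULL << 0)
#define VPC_F_OPEN	(1ULL << 1)
#define VPC_F_READ	(1ULL << 2)
#define VPC_F_WRITE	(1ULL << 3)

/* Hypothetical helper: true when every bit in `want` is set in `flags`. */
static bool
example_flags_has(vpc_flags_t flags, vpc_flags_t want)
{
	return (flags & want) == want;
}
```

A caller creating a writable object might pass `VPC_F_CREATE | VPC_F_WRITE` as the flags argument of vpc_open(2), though which combinations the kernel accepts is not stated in the slides.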
  • 40. vpc_ctl(2) int
 vpc_ctl(int vpcd, vpc_op_t op,
 size_t keylen, const void *key,
 size_t *vallen, void *buf); • Only interface for manipulating a VPC object • Available operations different per VPC Object • Inspired by ioctl(2), capsicum(4), and HTTP • System Call 581
  • 41. vpc_ctl(2) enum vpc_vpcsw_op_type {
 VPC_VPCSW_INVALID = 0,
 VPC_VPCSW_PORT_ADD = 1,
 VPC_VPCSW_PORT_DEL = 2,
 VPC_VPCSW_PORT_UPLINK_SET = 3,
 VPC_VPCSW_PORT_UPLINK_GET = 4,
 VPC_VPCSW_STATE_GET = 5,
 VPC_VPCSW_STATE_SET = 6,
 VPC_VPCSW_RESET = 7,
 VPC_VPCSW_RESPONSE_NDV4 = 8,
 VPC_VPCSW_RESPONSE_NDV6 = 9,
 VPC_VPCSW_RESPONSE_DHCPV4 = 10,
 VPC_VPCSW_RESPONSE_DHCPV6 = 11,
 VPC_VPCSW_OP_TYPE_MAX = 11,
 };
  • 42. vpc_ctl(2) enum vpc_vpcp_op_type {
 VPC_VPCP_INVALID = 0,
 VPC_VPCP_CONNECT = 1,
 VPC_VPCP_DISCONNECT = 2,
 VPC_VPCP_VNI_GET = 3,
 VPC_VPCP_VNI_SET = 4,
 VPC_VPCP_VTAG_GET = 5,
 VPC_VPCP_VTAG_SET = 6,
 VPC_VPCP_UNUSED7 = 7,
 VPC_VPCP_UNUSED8 = 8,
 VPC_VPCP_PEER_ID_GET = 9,
 VPC_VPCP_MAX = 9,
 };
  • 43. vpc_ctl(2) enum vpc_vmnic_op_type {
 VPC_VMNIC_INVALID = 0,
 VPC_VMNIC_NQUEUES_GET = 1,
 VPC_VMNIC_NQUEUES_SET = 2,
 VPC_VMNIC_UNUSED3 = 3,
 VPC_VMNIC_UNUSED4 = 4,
 VPC_VMNIC_UNUSED5 = 5,
 VPC_VMNIC_UNUSED6 = 6,
 VPC_VMNIC_ATTACH = 7,
 VPC_VMNIC_MSIX = 8,
 VPC_VMNIC_FREEZE = 9,
 VPC_VMNIC_UNFREEZE = 10,
 VPC_VMNIC_OP_TYPE_MAX = 10,
 };
  • 44. What's it all mean? • Network interfaces are first-class objects • Network interfaces are pluggable • Configurations can be arbitrarily complex • All vpc(4) interfaces are iflib(9) and can be tcpdump(8)'ed • Performance is nice
  • 46. Tooling • Able to be cross-compiled • Developer-friendly • Stable ABI • Idempotent Operations • No a priori knowledge of the target OS required (i.e. no headers required to cross-compile tooling)
  • 49. ELI5: vpc(4) edition [Diagram: two hosts exchanging VXLAN packets over em0, labeled VNI 123 and VNI 987. Each host shows Guest 1 (Customer A), Guest 2 and Guest 3 (Customer B), vmnic0, vmnic1, vmnic2, vpcsw0, vpcsw1, and vpclink0.]
  • 50. ELI5: vpc(4) Assumptions • Guest is running Ubuntu or CentOS Linux • Multi-Queue TX/RX in Host • Multiple network queues available in the guest • Underlay hosts can pass traffic • Overlay hosts are in the same subnet
  • 51. ELI5: vpc(4) edition [Diagram: one host with em0, ethlink0, vpcsw0, vpcsw1, ports vpcp0, vpcp1, vpcp3, vpcp4, vpcp5, and vmnic0, vmnic1, vmnic2 serving Guest 1 (Customer A), Guest 2 and Guest 3 (Customer B).]
  • 52. ELI5: vpc(4) edition [Diagram: the Customer B portion only: em0, ethlink0, vpcsw1, ports vpcp3, vpcp4, vpcp5, vmnic1, vmnic2, Guest 2 and Guest 3.]
  • 53. ELI5: vpcsw(4) Setup VNI=123
 VLAN=456
 VPCSW0_ID=da64c3f3-095d-91e5-df01-5aabcfc52468
 
 vpc switch create \
 --switch-id=${VPCSW0_ID} \
 --vni=${VNI} \
 --vlan=${VLAN}
  • 54. ELI5: vpc(4) edition [Diagram: em0, Guest 2 and Guest 3 (Customer B), and the newly created vpcsw1.]
  • 55. ELI5: vmnic(4) Setup VMNIC0_ID=07f95a11-6788-2ae7-c306-ba95cff1db38
 VPCP0_ID=fd436f9c-1f77-11e8-8002-0cc47a6c7d1e
 
 vpc vmnic create \
 --vmnic-id=${VMNIC0_ID}
 
 vpc switch port add \
 --switch-id=${VPCSW0_ID} \
 --port-id=${VPCP0_ID}
 
 vpc switch port connect \
 --port-id=${VPCP0_ID} \
 --interface-id=${VMNIC0_ID}
  • 56. ELI5: vpc(4) edition [Diagram: em0, Guest 2 and Guest 3 (Customer B), vpcsw1, port vpcp3, and vmnic1.]
  • 57. ELI5: vmnic(4) Setup VMNIC1_ID=a774ba3a-1f77-11e8-8006-0cc47a6c7d1e
 VPCP1_ID=0ebf50e1-1f79-11e8-8002-0cc47a6c7d1e
 
 vpc vmnic create \
 --vmnic-id=${VMNIC1_ID}
 
 vpc switch port add \
 --switch-id=${VPCSW0_ID} \
 --port-id=${VPCP1_ID}
 
 vpc switch port connect \
 --port-id=${VPCP1_ID} \
 --interface-id=${VMNIC1_ID}
  • 58. ELI5: vpc(4) edition [Diagram: em0, Guest 2 and Guest 3 (Customer B), vpcsw1, ports vpcp3 and vpcp4, vmnic1 and vmnic2.]
  • 59. ELI5: ethlink(4) Setup ETHLINK0_ID=5c4acd32-1b8d-11e8-b408-0cc47a6c7d1e
 UPLINK_PORT_ID=ea58b648-203b-a707-cdf6-7a552c8d5295
 UPLINK_IF=em0
 
 vpc switch port add \
 --switch-id=${VPCSW0_ID} \
 --uplink \
 --port-id=${UPLINK_PORT_ID} \
 --l2-name=${UPLINK_IF} \
 --ethlink-id=${ETHLINK0_ID}
  • 60. ELI5: vpc(4) edition [Diagram: em0, ethlink0, vpcsw1, ports vpcp3, vpcp4, vpcp5, vmnic1 and vmnic2, Guest 2 and Guest 3 (Customer B).]
  • 61. ELI5: vpc(4) + VXLAN
  • 62. ELI5: vpc(4) edition [Diagram: two hosts, 10.65.5.161 and 10.65.5.162, exchanging VXLAN packets over em0, labeled VNI 123, VLAN/VTAG 456, and VNI 987. Each host shows Guest 1 (Customer A), Guest 2 and Guest 3 (Customer B), vmnic0, vmnic1, vmnic2, vpcsw0, vpcsw1, and vpclink0.]
  • 63. ELI5: vpclink(4) Setup • How do you do ARP or IPv6 Neighbor Discovery? • Broadcast? • Multicast?
  • 64. ELI5: vpc(4) edition [Diagram: the same two hosts, 10.65.5.161 and 10.65.5.162, with VNI 123, VLAN/VTAG 456, and VNI 987, and "???" marking the unresolved path between them.]
  • 65. VTEP: VXLAN Tunnel End Point • vpcsw(4) traps broadcast packets • If the packet is: • a broadcast packet and • either an IPv4 ARP or IPv6 ND packet
 then the packet is attached to a knote (part of kqueue(2)) and delivered to all kqueue(2) subscribers registered for the EVFILT_VPC filter • Packet payload is stored in a user-allocated buffer pointed to by ext[0] • A userland utility parses the packet and performs the lookup • Overlay and underlay forwarding information is written back to the kernel via vpc_ctl(2) • vpcsw(4) caches the forwarding information for the src/dst MAC tuple
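The trap condition described on this slide can be sketched as a small frame classifier. This is not the vpcsw(4) implementation; `example_should_trap` and the constants are hypothetical names. The sketch assumes untagged Ethernet II frames and no IPv6 extension headers. Because IPv6 Neighbor Discovery uses multicast rather than broadcast, it also assumes an Ethernet destination starting with 33:33 counts as the "broadcast" case for IPv6, and it matches only Neighbor Solicitation (ICMPv6 type 135).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
	EX_ETHER_HDR_LEN  = 14,
	EX_IPV6_HDR_LEN   = 40,
	EX_ETHERTYPE_ARP  = 0x0806,
	EX_ETHERTYPE_IPV6 = 0x86dd,
	EX_IPPROTO_ICMPV6 = 58,
	EX_ND_NEIGHBOR_SOLICIT = 135,
};

/* Hypothetical classifier: should this frame go to the userland resolver? */
static bool
example_should_trap(const uint8_t *frame, size_t len)
{
	static const uint8_t bcast[6] = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff};
	uint16_t ethertype;

	assert(frame != NULL);
	if (len < EX_ETHER_HDR_LEN)
		return false;
	ethertype = (uint16_t)((frame[12] << 8) | frame[13]);

	/* IPv4 ARP sent to the Ethernet broadcast address. */
	if (ethertype == EX_ETHERTYPE_ARP)
		return memcmp(frame, bcast, sizeof(bcast)) == 0;

	/* IPv6 Neighbor Solicitation sent to an IPv6 multicast MAC. */
	if (ethertype == EX_ETHERTYPE_IPV6) {
		if (len < EX_ETHER_HDR_LEN + EX_IPV6_HDR_LEN + 1)
			return false;
		if (frame[0] != 0x33 || frame[1] != 0x33)
			return false;
		/* Next Header is byte 6 of the fixed IPv6 header. */
		if (frame[EX_ETHER_HDR_LEN + 6] != EX_IPPROTO_ICMPV6)
			return false;
		return frame[EX_ETHER_HDR_LEN + EX_IPV6_HDR_LEN] ==
		    EX_ND_NEIGHBOR_SOLICIT;
	}
	return false;
}
```

Everything else (unicast traffic, broadcast IPv4 that is not ARP) would stay on the fast path and never reach the kqueue(2) subscribers.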
  • 66. Ongoing Work • Firewalling - integrated at vpcp(4) • Routing • NAT • Userland Control Plane (including setup and teardown of bhyve(4) guests via something not a shell script)
  • 67. vpc(8) % doas vpc vm create % doas vpc vm run • bhyve(4) VM creation and launch tool • Unifies network isolation with compute isolation
  • 68. Desktop Software + vpc(4) • vpc(4) + NAT == Desktop Software Hotness • Ability to sandbox VMs with no prior knowledge of the host's networking • Uses:
 % vagrant up
 % packer build • No more dependency on pf(4) or dnsmasq(8)
  • 69. Code • Kernel:
 https://github.com/joyent/freebsd/tree/projects/VPC • Kernel Libraries:
 https://github.com/joyent/freebsd/tree/projects/VPC/libexec/go/src/go.freebsd.org/sys/vpc • Userland tooling:
 https://github.com/sean-/vpc
 
 Future home:
 
 https://github.com/joyent-/vpc