Usernetes Gen2
Kubernetes in Rootless Docker, with Multiple Nodes
Akihiro Suda (NTT)
akihiro.suda.cz@hco.ntt.co.jp
Container Plumbing Days (Apr 15, 2024)
https://github.com/rootless-containers/usernetes
(Also works with Rootless Podman and Rootless nerdctl)
[Introduction] Rootless Containers
• Puts container runtimes (as well as containers) in a user namespace
– UserNS: a Linux kernel feature that maps a non-root user to a fake root
(the root privileges are limited to the inside of the namespace)
• Can mitigate potential vulnerabilities of the runtimes
(e.g., the runc breakout CVE-2024-21626, 2024-01-31)
– No access to read/write other users’ files
– No access to modify the kernel (e.g., to inject invisible malware)
– No access to modify the firmware
– No ARP spoofing
– No DNS spoofing
• Also useful for shared hosts (High-performance Computing, etc.)
– Works with GPUs too
[Introduction] User namespaces
• A Linux kernel feature to remap UIDs and GIDs
• UID=1000 gains fake root privileges (UID=0) that are enough to create
containers
• The privileges are limited to the inside of the namespace
• No privilege for setting up vEth pairs with “real” IP addresses;
user-mode TCP/IP (e.g., slirp4netns) is used instead
• Also notorious as the culprit of several kernel CVEs,
but at least it is more secure than just running everything as root
– Ubuntu 24.04 disables unprivileged UserNS by default, except for programs on an allowlist (AppArmor profiles)

# /etc/subuid
1000:100000:65536

UID in the namespace | UID on the host
0 | 1000
1-65536 | 100000-165535
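The UID mapping in the /etc/subuid example above can be sketched as plain shell arithmetic. This is only an illustration of the remapping rule, not code from Usernetes: the function name host_uid is hypothetical, and the constants simply mirror the example values (user 1000, subordinate range 100000 with 65536 entries).

```shell
#!/bin/sh
# Sketch: translate a UID inside the user namespace to the UID on the host.
# The values mirror the example (user 1000, /etc/subuid entry
# 1000:100000:65536); the function name host_uid is hypothetical.
OWNER_UID=1000      # the unprivileged user who created the namespace
SUB_START=100000    # first subordinate UID from /etc/subuid
SUB_COUNT=65536     # number of subordinate UIDs from /etc/subuid

host_uid() {
    ns_uid=$1
    if [ "$ns_uid" -eq 0 ]; then
        # The fake root maps to the user who owns the namespace.
        echo "$OWNER_UID"
    elif [ "$ns_uid" -ge 1 ] && [ "$ns_uid" -le "$SUB_COUNT" ]; then
        # UIDs 1..SUB_COUNT map onto the subordinate range.
        echo $((SUB_START + ns_uid - 1))
    else
        echo "UID $ns_uid is not mapped" >&2
        return 1
    fi
}

host_uid 0       # prints 1000
host_uid 1       # prints 100000
host_uid 65536   # prints 165535
```

On a host with rootless containers running, the same mapping would typically be visible in /proc/self/uid_map when read from inside the namespace.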
[Introduction] Network namespaces
[Figure: network path of rootless containers]
• Container network namespaces: eth0 (vEth) 172.17.0.2 and eth0 (vEth) 172.17.0.3
• Network namespace + user namespace: docker0 (Bridge) 172.17.0.1 with vEth peers, and tap0 (TAP) 10.0.2.100
• slirp4netns (virtual IP: 10.0.2.2) receives Ethernet packets from tap0 and emits unprivileged socket syscalls
• Host: eth0 (Physical Ethernet) 192.168.0.42
Rootless Kubernetes
• Usernetes: Rootless Kubernetes, since 2018
https://github.com/rootless-containers/usernetes
• As old as Rootless Docker (pre-release at that time) and Rootless
Podman
• The changes to upstream were merged in Kubernetes v1.22 (2021)
– Feature gate: KubeletInUserNamespace (Alpha)
– The feature gate is also used by kind, minikube, k3s, etc.
• The first generation (“Gen1”, 2018-2023) of Usernetes didn’t gain
much popularity due to its complexity (“The Hard Way”)
KubeletInUserNamespace feature gate
• The gate name is a slight misnomer, as it requires CRI, OCI, CNI, and
kube-proxy to be in the same UserNS too
• A quite “boring” gate that just tolerates trivial permission errors
https://github.com/search?q=repo%3Akubernetes%2Fkubernetes%20KubeletInUserNamespace&type=code
– dmesg
– sysctl -w vm.overcommit_memory
– etc.
• The UserNS has to be created by an external runtime
– Usernetes Gen1: RootlessKit
– Usernetes Gen2: Rootless Docker/Podman/nerdctl
– LXD/Incus can be used too
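As a rough illustration of how the gate is switched on, the sketch below writes a minimal KubeletConfiguration fragment. The temp-file path is a throwaway chosen for this example, and the fragment is an assumption about the general shape of such a config, not a copy of the file shipped by Usernetes.

```shell
#!/bin/sh
# Sketch: write a KubeletConfiguration fragment that turns the gate on.
# The output path is a throwaway temp file chosen for illustration only.
cfg="$(mktemp)"
cat > "$cfg" <<'EOF'
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  KubeletInUserNamespace: true
EOF

# Show the generated fragment.
cat "$cfg"
```

The gate only relaxes the kubelet side; the user namespace itself still has to be created by the external runtime listed above.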
Usernetes Gen 1 vs Gen 2

Feature | Gen 1 (2018-2023) | Gen 2 (2023-)
Host dependency | RootlessKit | Rootless Docker, Rootless Podman, or Rootless nerdctl (contaiNERD CTL)
Supports kubeadm | No | Yes
Supports multi-node | Yes, but practically No, due to complexity | Yes
Supports hostPath volumes | Yes | Yes, for most paths, but needs an extra config

Gen 1: “The Hard Way”
Gen 2: Similar to `kind` and minikube, but supports real multi-node
File layout
• Makefile
– Defines targets such as `make up`, which wraps `docker compose up`
• Dockerfile
– FROM kindest/node (kind’s node image) with a few additional ADD and RUN
• docker-compose.yaml
– Just defines a single node container
– Currently, node ports, etc. have to be statically defined here
• kubeadm-config.yaml
– Configures feature gates, CIDRs, TLS SANs, etc.
Everything is just a plain text file,
for ease of customization
Usage
# Bootstrap the first node
make up
make kubeadm-init
make install-flannel
# Enable kubectl
make kubeconfig
export KUBECONFIG=$(pwd)/kubeconfig
kubectl get pods -A
# Multi-node (run on the first node; another-host is the node to be joined)
make join-command
scp join-command another-host:~/usernetes
ssh another-host make -C ~/usernetes up kubeadm-join
make sync-external-ip
Multi-node Network
• VXLAN is known to work
– Just kubectl apply -f kube-flannel.yaml
• “External IP” is used, as the containerized kubelet’s IP is not
accessible from other nodes
– kubelet is launched with --cloud-provider=external
– node.status.addresses is dynamically patched with kubectl patch node
– node is also annotated with
flannel.alpha.coreos.com/public-ip-overwrite
– UDP checksums are recomputed with
ethtool -K flannel.1 tx-checksum-ip-generic off
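The “External IP” steps above can be sketched as follows. This is an assumption-laden illustration rather than the actual Usernetes script: print_sync_commands, the node name, and the IP address are hypothetical, and patching the address as type InternalIP via the status subresource is an assumption about how node.status.addresses would be overwritten. The commands are only printed, so no cluster is needed to try it.

```shell
#!/bin/sh
# Sketch: print the kubectl commands that would publish a node's external IP.
# print_sync_commands, the node name, and the IP below are hypothetical; the
# commands are echoed, not executed, so no cluster is needed to try this.
print_sync_commands() {
    node=$1
    ip=$2
    # Tell flannel which address other nodes should use for VXLAN traffic.
    echo "kubectl annotate node $node --overwrite flannel.alpha.coreos.com/public-ip-overwrite=$ip"
    # Overwrite node.status.addresses via the status subresource
    # (the InternalIP address type is an assumption of this sketch).
    patch="{\"status\":{\"addresses\":[{\"type\":\"InternalIP\",\"address\":\"$ip\"}]}}"
    echo "kubectl patch node $node --subresource=status --type=merge -p '$patch'"
}

print_sync_commands u7s-host1 192.168.0.42
```

Printing the commands first makes it easy to review them before running anything against a real cluster.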
Experimental: network acceleration
• Bypass4netns allows bypassing slirp4netns to eliminate the overhead
caused by the user-mode TCP/IP
https://github.com/rootless-containers/bypass4netns
• Captures socket syscalls inside the NetNS, reconstructs the FDs outside the
NetNS, and replaces the FDs inside the NetNS, using seccomp_unotify(2)
• As fast as the host network (e.g., Pod → NodePort on the same node:
1.28 Gbps with slirp4netns vs 49.9 Gbps with bypass4netns)
• Bypass4netns supports both connect(2) and bind(2),
but Usernetes currently only supports accelerating connect(2)
– bind(2) is already fast anyway
• Available for nerdctl
Experimental: network acceleration
• Pod-to-Pod communications across multiple nodes are not
accelerated yet
– VXLAN packets are generated by the kernel itself and cannot be intercepted
via seccomp_unotify(2)
– NodePorts can still be accelerated, as they do not incur VXLAN packets
[Figure: two nodes with Pods, a NodePort, and the Internet; the VXLAN path between Pods on different nodes is marked “Slow”, and the NodePort path is marked “Fast”.]
Experimental: network acceleration
• iperf3 (TCP) benchmark across multiple nodes

Path | slirp4netns | bypass4netns
Pod → Pod (same node) | 37.6 Gbps | 37.6 Gbps
Pod → Pod (different node) | 1.40 Gbps | 1.41 Gbps
Pod → NodePort (same node) | 1.28 Gbps | 49.9 Gbps
Pod → NodePort (different node) | 1.47 Gbps | 9.53 Gbps
Host → NodePort (same node) | 50.2 Gbps | 49.4 Gbps
Host → NodePort (different node) | 9.53 Gbps | 9.52 Gbps

IaaS: Amazon EC2 (m7i.2xlarge)
Versions: Ubuntu 22.04, nerdctl v2.0.0-beta.4, bypass4netns v0.4.1, Usernetes Gen2-v20240410.0 (Kubernetes v1.29)
https://github.com/rootless-containers/usernetes/discussions/329
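To put the benchmark figures in perspective, the speedup of bypass4netns over slirp4netns can be computed from the numbers above. The helper name speedup is hypothetical, and the inputs are copied from the benchmark results; this is just arithmetic on the published figures, not a new measurement.

```shell
#!/bin/sh
# Sketch: compute the bypass4netns speedup from two throughput figures (Gbps).
# speedup is a hypothetical helper; the inputs are copied from the benchmark.
speedup() {
    awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.1f\n", fast / slow }'
}

echo "Pod -> NodePort (same node):      $(speedup 1.28 49.9)x"
echo "Pod -> NodePort (different node): $(speedup 1.47 9.53)x"
echo "Pod -> Pod (different node):      $(speedup 1.40 1.41)x"
```

The last line stays at roughly 1x, which matches the point that Pod-to-Pod traffic across nodes is not accelerated yet.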
Future work
• Integrate bypass4netns into Docker and Podman too
• Support accelerating Pod-to-Pod communications across different nodes,
perhaps with a sidecar proxy that would forward packets to NodePorts
• Support dynamic port forwarding
– Ports are currently statically defined in docker-compose.yaml
– If docker container update could support modifying port forwards, Usernetes could
just watch Kubernetes service events and update the Docker ports accordingly
https://github.com/docker/cli/issues/5013
• Help other Kubernetes distributions support rootless mode
– k3s has supported rootless mode since 2019, but still lacks support for multi-node setups
– Are Podman folks interested in running OKD inside Rootless Podman?
