Akihiro Suda (containerd / NTT)
Rootless Containers 2020
Ask me questions at
#2-kubecon-maintainer ( https://slack.cncf.io )
What is Rootless Containers?
• Running container runtimes (and, of course, the containers themselves) as a non-root user on the host
  • OCI (e.g. runc)
  • CRI (e.g. containerd)
  • CNI (e.g. Flannel)
  • kubelet, dockerd, …
• Protects the host from potential vulnerabilities and misconfigurations
What is Rootless Containers?
Don’t be confused… The following are unrelated:
• .spec.securityContext.runAsUser (≈ docker run --user)
• UserNS KEP (≈ dockerd --userns-remap)
• usermod -aG docker foo
• Singularity with SETUID
Why do we need Rootless?
Most runtimes are designed to be secure by default, but they are still
likely to have vulnerabilities
Identifier              Component   Description
CVE-2017-1002102        kubelet     Files on the host could be removed
containerd#2001 (2018)  containerd  /tmp on the host could be removed
CVE-2018-11235          kubelet     Arbitrary command could be executed on the host
runc#1962 (2019)        runc        Bare procfs was exposed with non-pivot rootfs mode
CVE-2019-5736           runc        runc binary could be replaced with a malicious file
CVE-2019-11245          kubelet     An image could be executed with an unexpected UID
CVE-2019-14271          dockerd     A malicious NSS library could be loaded
…                       …           …
And more!
Why do we need Rootless?
• People often misconfigure things, for example by:
  • Setting up insufficient PodSecurityPolicy / Gatekeeper policies
  • Exposing system components’ TCP ports without mTLS (e.g. etcd, kube-apiserver, kubelet, dockerd…)
  • Exposing private keys as IaaS metadata (169.254.169.254)
  • Using the same kubelet certs for all the nodes
  • …
Why do we need Rootless?
• Rootless Containers can mitigate the impact of such vulnerabilities and misconfigurations
• Even if the host gets compromised, the attacker won’t be able to:
  • access files owned by other users
  • modify firmware and kernel (→ undetectable malware)
  • perform ARP spoofing (→ DNS spoofing)
Not a panacea, of course…
Not effective against:
• Vulnerabilities of kernel and hardware
• DDoS attacks
• Cryptomining …
Not a panacea, of course…
Some caveats apply
• Network throughput is reduced
  (But we are seeing HUGE improvements in 2020)
• No support for NFS and block storage
  (But it doesn’t matter if you use managed DBs and object storage)
History
It began c. 2012, but wasn’t popular until 2018-2019

Year   Low layers                                High layers
2012   Kernel [officially in 2013]
2013   Semi-privileged networking with SETUID    LXC
2014
2015
2016   runc [officially in 2017]
2017
2018   Unprivileged networking (slirp4netns)     BuildKit, based on containerd tech
       Unprivileged FUSE-OverlayFS               Docker [officially in 2019] & containerd
                                                 Podman & CRI-O
                                                 Kubernetes [unofficial, still]
2019   Unprivileged cgroup v2 via systemd        k3s
       Faster port forwarding (RootlessKit)
2020   Faster networking with seccomp addfd
2021+                                            Kubernetes, officially?
Example: Docker
• https://get.docker.com/rootless
• Rootless mode was experimental in v19.03 and will be GA in v20.10
• Other notable updates in v20.10 w.r.t. Rootless:
  • Resource limitation with cgroup v2
  • FUSE-OverlayFS
  • Improved installer
Example: Docker
Easy to install
$ curl -fsSL https://get.docker.com/rootless | sh
$ export DOCKER_HOST=unix:///run/user/1000/docker.sock
$ docker run -d --name caddy -p 8080:80 caddy
$ curl http://localhost:8080
...
<title>Caddy works!</title>
...
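The session above hard-codes UID 1000 in the socket path. As a minimal sketch, assuming the socket really sits at /run/user/<UID>/docker.sock as shown, the value could be derived from the current user instead. The function name rootless_docker_host is hypothetical and not part of Docker.

```shell
# Hypothetical helper (not part of Docker): print the rootless Docker socket
# URL for a UID, assuming the /run/user/<UID>/docker.sock layout shown above.
rootless_docker_host() {
    # Use the first argument if given, otherwise the current user's UID.
    uid="${1:-$(id -u)}"
    printf 'unix:///run/user/%s/docker.sock\n' "$uid"
}
```

Possible usage: export DOCKER_HOST="$(rootless_docker_host)"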
Example: Docker
All processes are running as a non-root user
$ pstree user
sshd───bash───pstree
systemd─┬─(sd-pam)
        ├─containerd-shim─┬─caddy───7*[{caddy}]
        │                 └─12*[{containerd-shim}]
        └─rootlesskit─┬─exe─┬─dockerd─┬─containerd───10*[{containerd}]
                      │     │         ├─rootlesskit-doc─┬─docker-proxy───6*[{docker-proxy}]
                      │     │         │                 └─6*[{rootlesskit-doc}]
                      │     │         └─11*[{dockerd}]
                      │     └─11*[{exe}]
                      ├─vpnkit───4*[{vpnkit}]
                      └─8*[{rootlesskit}]
Example: Usernetes
• https://github.com/rootless-containers/usernetes
• Rootless Kubernetes distribution
• Multi-node demo is provided as a Docker Compose stack
  • CNI: Flannel (VXLAN)
$ docker-compose up -d
$ kubectl get nodes
NAME              STATUS   ROLES    AGE     VERSION
node-containerd   Ready    <none>   3m46s   v1.19.0-usernetes
node-crio         Ready    <none>   3m46s   v1.19.0-usernetes
Example: Usernetes
$ docker exec usernetes_node-containerd_1 pstree user
journalctl---(sd-pam)
systemd-+-(sd-pam)
        |-containerd-fuse---containerd-fuse---4*[{containerd-fuse}]
        |-containerd.sh---containerd---10*[{containerd}]
        |-flanneld.sh---flanneld---9*[{flanneld}]
        |-nsenter.sh---kubelet---13*[{kubelet}]
        |-nsenter.sh---kube-proxy---7*[{kube-proxy}]
        `-rootlesskit.sh---rootlesskit-+-exe-+-rootlesskit.sh---sleep
                                       |     `-9*[{exe}]
                                       |-slirp4netns
                                       `-8*[{rootlesskit}]
Example: k3s
$ k3s server --rootless
$ k3s kubectl apply -f manifest.yaml
• https://k3s.io/
• CNCF Sandbox Project
• Focuses on edge computing
• Incorporates Usernetes patches for supporting rootless, ahead of the Kubernetes upstream
• Uses containerd as the CRI runtime
Example: BuildKit
• https://github.com/moby/buildkit
• A container image builder, built on containerd technology
• Can be executed in several ways
  • As a built-in feature of dockerd
  • As a standalone daemon
  • As a Kubernetes Pod
  • As a Kubernetes Job, without a daemon Pod
  • As a Tekton Task
Example: BuildKit
No need to set securityContext.privileged, but Seccomp and AppArmor constraints need to be relaxed
spec:
  containers:
  - securityContext:
      # Run as a non-root UID inside the Pod
      runAsUser: 1000
      # Relax the seccomp filter for this container
      seccompProfile:
        type: Unconfined
metadata:
  annotations:
    # Relax AppArmor for the container named "buildkitd"
    container.apparmor.security.beta.kubernetes.io/buildkitd: unconfined
How it works
• UserNS
• MountNS
• NetNS
• Cgroup
• New frontier: Seccomp User Notification
How it works: UserNS
• Maps a non-root user (e.g. UID 1000) to a fake root user (UID 0)
  • Not the real root, but enough to run containers
• Subordinate UIDs are mapped as well (typically 65,536 UIDs, defined in /etc/subuid)

UserNS UID       Host UID
0             →  1000
1 … 65536     →  100000 … 165535
(host UID space: 0 … 2^32)
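The mapping on this slide can be sketched as a small calculation. This is an illustration only, assuming the example values from the slide (owner UID 1000, subordinate range starting at 100000 with 65,536 entries, i.e. an /etc/subuid line such as user:100000:65536). The function name userns_to_host_uid is hypothetical.

```shell
# Hypothetical helper: translate a UID seen inside the UserNS to the host UID.
# Usage: userns_to_host_uid NS_UID OWNER_UID SUB_START SUB_COUNT
userns_to_host_uid() {
    ns_uid=$1; owner=$2; sub_start=$3; sub_count=$4
    if [ "$ns_uid" -eq 0 ]; then
        # The fake root (UID 0) is the unprivileged owner on the host.
        echo "$owner"
    elif [ "$ns_uid" -ge 1 ] && [ "$ns_uid" -le "$sub_count" ]; then
        # UIDs 1..SUB_COUNT map onto the subordinate range.
        echo $((sub_start + ns_uid - 1))
    else
        # Anything else has no mapping in this UserNS.
        echo "unmapped UID: $ns_uid" >&2
        return 1
    fi
}
```

With the slide's values, userns_to_host_uid 0 1000 100000 65536 yields the owner UID, and UID 65536 lands on the last subordinate UID, 165535.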
How it works: MountNS
• A non-root user can create a MountNS along with a UserNS
• But cannot mount most filesystems, except bind-mount, tmpfs, procfs, and sysfs...
  • No OverlayFS (on vanilla kernel)
  • No NFS
  • No block storage
• FUSE is supported since kernel 4.18
  • FUSE-OverlayFS can substitute real OverlayFS
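A rough way to check the kernel 4.18 requirement is to compare the running kernel's release string. The sketch below assumes the string starts with MAJOR.MINOR, as uname -r usually prints, and ignores distro backports, so treat the result as a hint only. The function name kernel_at_least is hypothetical.

```shell
# Hypothetical helper: succeed if the kernel release is >= WANT_MAJOR.WANT_MINOR.
# Usage: kernel_at_least WANT_MAJOR WANT_MINOR [RELEASE]
# RELEASE defaults to the running kernel (uname -r), e.g. "5.4.0-42-generic".
kernel_at_least() {
    want_major=$1; want_minor=$2
    release="${3:-$(uname -r)}"
    major=${release%%.*}          # text before the first dot
    rest=${release#*.}            # text after the first dot
    minor=${rest%%[!0-9]*}        # leading digits of the remainder
    [ "$major" -gt "$want_major" ] ||
        { [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]; }
}
```

Possible usage: kernel_at_least 4 18 && echo "FUSE in a UserNS should be available"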
How it works: NetNS
• A non-root user can also create a NetNS along with a UserNS
• But cannot create vEth pairs across the host and the NetNS, i.e. no internet connectivity
• Slirp is used instead of vEth for unprivileged internet connectivity
  • Slow (51.5Gbps → 9.21Gbps), but we are seeing huge improvements
[Diagram: TAP device in the NetNS → Ethernet packets → slirp4netns → socket syscalls → Kernel]
How it works: Cgroup
• No support for cgroup v1
  • i.e. no memory limit, no CPU limit, no fork-bomb guard...
• Cgroup v2 is almost fully supported
  • Fedora has already switched the default to v2
  • Other distros will follow in 2021-2022?
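Whether a host is on cgroup v2 can be checked by looking for the cgroup.controllers file at the cgroup root, which exists only on the unified (v2) hierarchy. A minimal sketch, assuming the usual /sys/fs/cgroup mount point; the function name has_cgroup_v2 is hypothetical.

```shell
# Hypothetical helper: succeed if the given cgroup root is a cgroup v2
# (unified) hierarchy. The root defaults to /sys/fs/cgroup (an assumption
# about where the hierarchy is mounted).
has_cgroup_v2() {
    root="${1:-/sys/fs/cgroup}"
    # cgroup.controllers is present at the root of a v2 hierarchy only.
    [ -f "$root/cgroup.controllers" ]
}
```

Possible usage: has_cgroup_v2 && echo "resource limits can work in rootless mode"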
A new frontier in 2020:
Seccomp User Notification
• Kernel 5.0 merged support for Seccomp User Notification: a new way to hook syscalls in userspace
  • Similar to ptrace, but with fewer context switches
• Allows emulating subordinate UIDs without /etc/subuid
  • POC: https://github.com/rootless-containers/subuidless
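One way to probe for this feature from a shell is to read the kernel's list of available seccomp actions and look for user_notif. This is a sketch that assumes the list is exposed at /proc/sys/kernel/seccomp/actions_avail as a space-separated line; the function name seccomp_has_user_notif is hypothetical.

```shell
# Hypothetical helper: succeed if the kernel advertises the user_notif
# seccomp action. The path is an assumption about where the kernel exposes
# its space-separated list of available actions.
seccomp_has_user_notif() {
    file="${1:-/proc/sys/kernel/seccomp/actions_avail}"
    [ -r "$file" ] || return 1
    # Split the list into one action per line and match the exact name.
    tr ' ' '\n' < "$file" | grep -qx user_notif
}
```

Possible usage: seccomp_has_user_notif && echo "Seccomp User Notification is available"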
A new frontier in 2020:
Seccomp User Notification
• Kernel 5.9 merged support for SECCOMP_IOCTL_NOTIF_ADDFD
• Allows injecting file descriptors from a host process into container processes
  • e.g. replace sockfd on connect(2)
  • No slirp overhead any more
  • POC: https://github.com/rootless-containers/bypass4netns
Recap
• Rootless Containers can protect the host from potential vulnerabilities and misconfigurations
• Already adopted by lots of projects: BuildKit, Docker, containerd, Podman, CRI-O, k3s ...
• Also being proposed to the Kubernetes upstream
• There are some drawbacks, but they are being significantly reduced using Seccomp User Notification
Resources
• Rootless Containers overview: https://rootlesscontaine.rs/
• Rootless containerd: https://github.com/containerd/containerd/blob/master/docs/rootless.md
• Rootless Docker: https://get.docker.com/rootless
• Usernetes: https://github.com/rootless-containers/usernetes
• Rootless KEP: https://github.com/kubernetes/enhancements/pull/1371
Questions?
• Ask me questions at #2-kubecon-maintainer ( https://slack.cncf.io )
[KubeCon NA 2020] containerd: Rootless Containers 2020
