containerd Deep Dive
Akihiro Suda (NTT) & Wei Fu (Alibaba Cloud)
KubeCon EU 2020 Virtual (Aug 19, 2020)
containerd overview
Akihiro Suda (NTT)
What is containerd?
● “Mid-level Container runtime”
○ Below platforms (Docker, Kubernetes)
○ Above lower level runtimes (runc)
● Resource Manager
○ Container processes
○ Image artifacts
○ Filesystem snapshots
○ Metadata and dependencies
● CNCF graduated project since February 2019
○ Following Kubernetes, Prometheus, Envoy, and CoreDNS
[KubeCon EU 2020] containerd Deep Dive
Highly customizable
● Runtime plugins
○ runc, gVisor, Kata, Firecracker...
● Snapshotter plugins
○ OverlayFS, Btrfs, ZFS, …
● Content store plugins
○ Local, IPFS...
● Stream processor plugins
○ ImgCrypt, zstd...
Adoption of containerd
● Container engines
○ Docker & Moby, k3c, PouchContainer
● Kubernetes distributions
○ k3s, kubespray, microk8s, kind, minikube, Charmed Kubernetes
● Managed Kubernetes services
○ Alibaba ACK, Amazon EKS (Fargate nodes), Azure AKS, Google GKE, IBM IKS
And more...
Adoption of containerd
● BuildKit
○ The modern implementation of `docker build`
● LinuxKit
○ Small Linux distro with containerd as the init
● faasd
○ OpenFaaS for containerd
● VMware Fusion Nautilus
○ containerd on macOS, using VMware as the runtime plugin
Upcoming features in v1.4
Akihiro Suda (NTT)
Lazy pulling of images
● Start containers before the image download completes
● Use cases:
○ Python/Ruby/Java/.NET images
○ FaaS
○ Web apps with large numbers of HTML templates and media files
○ Jupyter Notebooks that include large data samples
○ Full GNOME/KDE desktops
Lazy pulling of images: Stargz & eStargz
● The containerd snapshotter plugin for Stargz & eStargz
https://github.com/containerd/stargz-snapshotter
● Stargz: seekable tar.gz for lazy-pullable container images
● eStargz: extended Stargz for batching frequently used files
● Both are fully compatible with legacy OCI tar.gz
Lazy pulling of images: Stargz & eStargz
(Figure: legacy tar.gz vs. Stargz)
○ Legacy tar.gz: Metadata 0, File 0, Metadata 1, File 1, ..., Metadata {n-1}, File {n-1}, Footer, all compressed as one gzip stream. File offsets can't be inspected without reading the whole archive.
○ Stargz: each Metadata/File pair is compressed as its own gzip member, followed by a gzipped metadata entry (stargz.index.json) and the Footer. File offsets can be inspected immediately.
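To make the offset argument concrete, here is a minimal Go sketch of the Stargz idea under simplifying assumptions: each file is gzipped as its own member, the index is an in-memory map rather than stargz.index.json, and there are no tar headers or footer. The names `entry`, `buildSeekable`, `readOne` and `readAll` are hypothetical. It shows that one file can be decompressed from its byte range alone, while the concatenated blob still decompresses as one ordinary gzip stream, which is what keeps Stargz readable by legacy tar.gz consumers.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// entry is a hypothetical, simplified TOC record: where one file's gzip
// member starts in the blob and how many compressed bytes it spans.
type entry struct {
	Offset, Size int64
}

// buildSeekable gzips each file as a separate member, appended in order,
// and records each member's byte range in an index.
func buildSeekable(names []string, files map[string]string) ([]byte, map[string]entry, error) {
	var blob bytes.Buffer
	index := make(map[string]entry, len(names))
	for _, name := range names {
		start := int64(blob.Len())
		zw := gzip.NewWriter(&blob)
		if _, err := zw.Write([]byte(files[name])); err != nil {
			return nil, nil, err
		}
		// Close writes the gzip trailer, ending this member.
		if err := zw.Close(); err != nil {
			return nil, nil, err
		}
		index[name] = entry{Offset: start, Size: int64(blob.Len()) - start}
	}
	return blob.Bytes(), index, nil
}

// readOne decompresses a single file from just its byte range, i.e. the
// bytes an HTTP Range request for that entry would return.
func readOne(blob []byte, e entry) (string, error) {
	zr, err := gzip.NewReader(bytes.NewReader(blob[e.Offset : e.Offset+e.Size]))
	if err != nil {
		return "", err
	}
	b, err := io.ReadAll(zr)
	return string(b), err
}

// readAll decompresses the whole blob the way a legacy consumer would;
// gzip.Reader concatenates consecutive members by default.
func readAll(blob []byte) (string, error) {
	zr, err := gzip.NewReader(bytes.NewReader(blob))
	if err != nil {
		return "", err
	}
	b, err := io.ReadAll(zr)
	return string(b), err
}

func main() {
	names := []string{"/bin/ls", "/app.py", "/lib/libc.so"}
	files := map[string]string{"/bin/ls": "LS", "/app.py": "print('hi')", "/lib/libc.so": "LIBC"}
	blob, index, err := buildSeekable(names, files)
	if err != nil {
		panic(err)
	}
	one, err := readOne(blob, index["/app.py"])
	if err != nil {
		panic(err)
	}
	all, err := readAll(blob)
	if err != nil {
		panic(err)
	}
	fmt.Println(one) // print('hi')
	fmt.Println(all) // LSprint('hi')LIBC
}
```

In a real lazy-pulling snapshotter, the byte range would be fetched from the registry with an HTTP Range request instead of being sliced from memory.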
Lazy pulling of images: Stargz & eStargz
● eStargz profiles the actual file access pattern and reorders the file entries,
so that relevant files can be prefetched in a single HTTP request
(Figure: file entry order in a Stargz layer vs. an eStargz layer)
○ Stargz: /usr/bin/apt-get, /bin/ls, /bin/vi, /lib/libc.so, /lib/libjpeg.so, /usr/bin/python3, ..., /usr/lib/python3/.../foo, /usr/lib/python3/.../bar, /app.py
○ eStargz: /bin/ls, /app.py, /usr/bin/python3, /lib/libc.so, /usr/lib/python3/.../foo, /usr/lib/python3/.../bar, ..., /bin/vi, /lib/libjpeg.so, /usr/bin/apt-get
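The reordering described above can be sketched as follows. This is a hypothetical helper, not the eStargz tooling: the real tool collects the access profile by actually running the container, whereas here the profile is a hand-written list and the prefetch boundary is simply a count of leading entries. Files seen in the profile move to the front in first-access order; everything else keeps its original relative order.

```go
package main

import "fmt"

// reorder returns the layer's entries with files seen in the access profile
// first (in first-access order), followed by the remaining entries in their
// original order. It also returns how many leading entries to prefetch.
func reorder(entries, accessed []string) ([]string, int) {
	inLayer := make(map[string]bool, len(entries))
	for _, e := range entries {
		inLayer[e] = true
	}
	seen := make(map[string]bool)
	out := make([]string, 0, len(entries))
	for _, a := range accessed {
		// Ignore files outside this layer and repeated accesses.
		if inLayer[a] && !seen[a] {
			seen[a] = true
			out = append(out, a)
		}
	}
	prefetch := len(out)
	for _, e := range entries {
		if !seen[e] {
			out = append(out, e)
		}
	}
	return out, prefetch
}

func main() {
	entries := []string{"/usr/bin/apt-get", "/bin/ls", "/bin/vi", "/lib/libc.so",
		"/lib/libjpeg.so", "/usr/bin/python3", "/app.py"}
	accessed := []string{"/bin/ls", "/app.py", "/usr/bin/python3", "/lib/libc.so", "/app.py"}
	order, n := reorder(entries, accessed)
	fmt.Println(order[:n]) // [/bin/ls /app.py /usr/bin/python3 /lib/libc.so]
	fmt.Println(order[n:]) // [/usr/bin/apt-get /bin/vi /lib/libjpeg.so]
}
```

Because the prioritized entries are contiguous at the start of the layer, a single Range request covering the first `n` entries is enough to prefetch them.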
Lazy pulling of images: Stargz & eStargz
Yesterday’s talk: https://sched.co/ZepQ
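Support for SELinux MCS in CRI mode
● MCS: Multi-Category Security
(Figure: two containers both run as UID=0 but are labeled with different MCS categories, C42 and C43; each volume carries the category of its container, so one container cannot access the other container's volume even as root.)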
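The isolation in the figure follows the MCS rule for a single sensitivity level: a process may access an object only if it holds every category on the object. The Go sketch below models just that rule on simplified labels such as "s0:c42"; the helpers `categories` and `canAccess` are hypothetical, and real SELinux contexts carry more fields (user:role:type:level) and category ranges that this sketch does not parse.

```go
package main

import (
	"fmt"
	"strings"
)

// categories parses the category part of a simplified MCS level such as
// "s0:c42" or "s0:c42,c43" into a set. Ranges like c0.c1023 are not handled.
func categories(level string) map[string]bool {
	set := make(map[string]bool)
	parts := strings.SplitN(level, ":", 2)
	if len(parts) < 2 {
		return set // a bare "s0" has no categories
	}
	for _, c := range strings.Split(parts[1], ",") {
		if c != "" {
			set[c] = true
		}
	}
	return set
}

// canAccess reports whether the process's categories are a superset of the
// object's categories (sensitivity levels are assumed equal).
func canAccess(process, object string) bool {
	have := categories(process)
	for c := range categories(object) {
		if !have[c] {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(canAccess("s0:c42", "s0:c42")) // true: same category
	fmt.Println(canAccess("s0:c42", "s0:c43")) // false: other container's volume
}
```

UID=0 never enters the check, which is why two root containers with different categories still cannot read each other's volumes.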
Support for cgroup v2
● The new cgroup hierarchy, used by default in Fedora since Fedora 31
● Simpler layout
○ v1: /sys/fs/cgroup/{memory,cpu,devices,pids,...}/foo
○ v2: /sys/fs/cgroup/foo
● Supports eBPF integration, pressure metrics, improved OOM control, ...
● Supports safe delegation to non-root users
Improved support for rootless mode
● Run containerd (and relevant components) as a non-root user
● Protect the host from potential vulnerabilities
● Adoption in containerd-related projects
○ Docker
○ BuildKit
○ k3s
○ k3c (planned)
○ Kubernetes (proposed in KEP 1371)
Improved support for rootless mode
● [v1.3] No support for resource limits (docker run --cpus … --memory ...)
○ Because unprivileged users cannot control cgroups
● [v1.3] No support for the overlayfs snapshotter
○ Because unprivileged users cannot mount overlayfs
(except on Ubuntu/Debian kernels)
○ The “native” snapshotter can be used instead, but it is slow and wastes disk space
→ v1.4 supports resource limits (requires cgroup v2 and systemd)
→ v1.4 supports the FUSE-OverlayFS snapshotter (requires kernel >= 4.18)
Demo: Rootless Kubernetes with cgroup v2
“Usernetes” https://github.com/rootless-containers/usernetes
https://asciinema.org/a/349859
Other changes in v1.4
● Windows CRI
● systemd NOTIFY_SOCKET
● Support reloading CNI config without restarting the daemon
● The socat binary is no longer needed
Release notes: https://github.com/containerd/containerd/releases
v1.5 planning
● NRI: Node Resource Interface (#4411)
○ A new common interface for node resources such as cgroups
○ The plugin spec is very similar to CNI
● Sandbox API (#4131)
○ Pod sandbox as a first-class object
○ No “/pause” process
● Filesystem quota (#759)
containerd: external plugins
Wei Fu (Alibaba Cloud)
Backend as external plugins
● Big goal: add backends without recompiling containerd
● Stream processors
● gRPC proxy plugins for image storage
● Runtime v2 shim API for OCI runtimes
Stream processor
● OCI image layer data is packaged as a tar archive
● The OCI image spec supports only a few compression algorithms
○ +gzip and +zstd, but +gzip is more common
● How should containerd handle a stream with an experimental media type?
○ Or an encrypted layer?
(Figure: Image Layer (+gzip, or a custom media type?) → Stream Processor → Tar Stream → Diff Service → Snapshot)
Stream processor
● A stream processor (SP) is a binary plugin that handles a media-type stream
○ It accepts custom media types and returns another media type
○ containerd calls the binary as a media-type converter
● Example
○ containerd/imgcrypt
(Figure: Image Layer → SPs → Diff Service → Snapshot, e.g. Tar(+Gzip)+encrypted → SP → Tar+Gzip → SP → Tar, with other custom SPs plugged in the same way)
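The chaining in the figure can be modeled as a lookup loop. Each processor declares which media types it accepts and which one it returns, and processors are applied until the stream reaches a type the diff service can handle natively. This Go sketch is a hypothetical model, not containerd's code: the `processor` type and `resolveChain` are illustrative, the processor named "decrypt" stands in for an imgcrypt-style decoder, and tar and tar+gzip are assumed to be natively supported.

```go
package main

import "fmt"

// processor mirrors the accepts/returns fields of a stream processor entry.
type processor struct {
	name    string
	accepts []string
	returns string
}

// resolveChain repeatedly picks a processor that accepts the current media
// type until the stream is a type the diff service handles natively.
func resolveChain(mediaType string, native map[string]bool, procs []processor) ([]string, error) {
	var chain []string
	for steps := 0; !native[mediaType]; steps++ {
		if steps > len(procs) {
			return nil, fmt.Errorf("processor loop at %s", mediaType)
		}
		found := false
		for _, p := range procs {
			for _, a := range p.accepts {
				if a == mediaType {
					chain = append(chain, p.name)
					mediaType = p.returns
					found = true
					break
				}
			}
			if found {
				break
			}
		}
		if !found {
			return nil, fmt.Errorf("no processor accepts %s", mediaType)
		}
	}
	return chain, nil
}

func main() {
	const (
		tar       = "application/vnd.oci.image.layer.v1.tar"
		tarGzip   = tar + "+gzip"
		tarZstd   = tar + "+zstd"
		encrypted = tarGzip + "+encrypted"
	)
	native := map[string]bool{tar: true, tarGzip: true}
	procs := []processor{
		{name: "zstd", accepts: []string{tarZstd}, returns: tar},
		{name: "decrypt", accepts: []string{encrypted}, returns: tarGzip},
	}
	for _, mt := range []string{tarGzip, tarZstd, encrypted} {
		chain, err := resolveChain(mt, native, procs)
		fmt.Println(mt, chain, err)
	}
}
```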
Stream processor - Demo
● Integrate with +zstd media-type
● asciinema link
[stream_processors]
[stream_processors."zstd"]
accepts = ["application/vnd.oci.image.layer.v1.tar+zstd"]
returns = "application/vnd.oci.image.layer.v1.tar"
path = "zstd"
args = ["-dcf"]
Snapshot proxy plugin
// Snapshot service manages snapshots
service Snapshots {
  rpc Prepare(PrepareSnapshotRequest) returns (PrepareSnapshotResponse);
  rpc View(ViewSnapshotRequest) returns (ViewSnapshotResponse);
  rpc Mounts(MountsRequest) returns (MountsResponse);
  rpc Commit(CommitSnapshotRequest) returns (google.protobuf.Empty);
  rpc Remove(RemoveSnapshotRequest) returns (google.protobuf.Empty);
  rpc Stat(StatSnapshotRequest) returns (StatSnapshotResponse);
  rpc Update(UpdateSnapshotRequest) returns (UpdateSnapshotResponse);
  rpc List(ListSnapshotsRequest) returns (stream ListSnapshotsResponse);
  rpc Usage(UsageRequest) returns (UsageResponse);
}
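To illustrate the lifecycle behind Prepare, View, Commit and Remove, here is a toy in-memory Go model. It is an assumption-laden sketch, not containerd's `Snapshotter` interface: there are no mounts, labels, or usage accounting, and the `store` type and its methods are hypothetical. It captures the core rules: Prepare creates a writable active snapshot on a committed parent, View creates a read-only one, only active snapshots can be committed, and a snapshot that is still a parent cannot be removed.

```go
package main

import (
	"errors"
	"fmt"
)

type snapKind int

const (
	active snapKind = iota
	view
	committed
)

type snapshot struct {
	kind   snapKind
	parent string
}

// store is a toy model of snapshotter semantics keyed by snapshot name.
type store struct{ snaps map[string]snapshot }

func newStore() *store { return &store{snaps: map[string]snapshot{}} }

func (s *store) create(key, parent string, k snapKind) error {
	if _, ok := s.snaps[key]; ok {
		return fmt.Errorf("%s: already exists", key)
	}
	if parent != "" {
		if p, ok := s.snaps[parent]; !ok || p.kind != committed {
			return fmt.Errorf("parent %s must be a committed snapshot", parent)
		}
	}
	s.snaps[key] = snapshot{kind: k, parent: parent}
	return nil
}

// Prepare creates a writable active snapshot on top of a committed parent.
func (s *store) Prepare(key, parent string) error { return s.create(key, parent, active) }

// View creates a read-only snapshot of a committed parent.
func (s *store) View(key, parent string) error { return s.create(key, parent, view) }

// Commit turns an active snapshot into a committed one under a new name.
func (s *store) Commit(name, key string) error {
	sn, ok := s.snaps[key]
	if !ok || sn.kind != active {
		return fmt.Errorf("%s: not an active snapshot", key)
	}
	if _, exists := s.snaps[name]; exists {
		return fmt.Errorf("%s: already exists", name)
	}
	delete(s.snaps, key)
	s.snaps[name] = snapshot{kind: committed, parent: sn.parent}
	return nil
}

// Remove deletes a snapshot that no other snapshot uses as its parent.
func (s *store) Remove(key string) error {
	if _, ok := s.snaps[key]; !ok {
		return errors.New(key + ": not found")
	}
	for k, sn := range s.snaps {
		if sn.parent == key {
			return fmt.Errorf("%s: still the parent of %s", key, k)
		}
	}
	delete(s.snaps, key)
	return nil
}

func main() {
	s := newStore()
	// Unpacking an image: prepare, (apply the layer diff), commit.
	fmt.Println(s.Prepare("extract-1", ""), s.Commit("layer-1", "extract-1"))
	fmt.Println(s.Prepare("extract-2", "layer-1"), s.Commit("layer-2", "extract-2"))
	// A container gets a writable snapshot on top of the top layer.
	fmt.Println(s.Prepare("container-rw", "layer-2"))
	fmt.Println(s.Remove("layer-2")) // fails: still the parent of container-rw
}
```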
Snapshot proxy plugin
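package main

import (
	"log"
	"net"

	"google.golang.org/grpc"

	snapshots "github.com/containerd/containerd/api/services/snapshots/v1"
	"github.com/containerd/containerd/contrib/snapshotservice"
)

func main() {
	rpc := grpc.NewServer()
	// CustomSnapshotter is a placeholder for your own implementation of
	// containerd's snapshots.Snapshotter interface.
	sn := CustomSnapshotter()
	// Wrap the Snapshotter so it is served over the Snapshots gRPC API.
	service := snapshotservice.FromSnapshotter(sn)
	snapshots.RegisterSnapshotsServer(rpc, service)

	// Listen on the socket that containerd's proxy_plugins config points to.
	l, err := net.Listen("unix", "/var/run/mysnapshotter.sock")
	if err != nil {
		log.Fatalf("error: %v", err)
	}
	if err := rpc.Serve(l); err != nil {
		log.Fatalf("error: %v", err)
	}
}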
● Configure with proxy_plugins
● Example
○ stargz-snapshotter
○ CVMFS Containerd Snapshotter
[proxy_plugins]
[proxy_plugins.customsnapshot]
type = "snapshot"
address = "/var/run/mysnapshotter.sock"
Runtime V2
● A first-class shim API for runtime authors to integrate with containerd
○ VM-like runtimes have more internal state and more abstract actions
○ A CLI-based approach introduces state-management problems
○ Each runtime can add its own value while containerd's core stays small and stable
● Examples
○ gVisor
○ Kata Containers
○ Firecracker
Runtime V2
service Task {
  rpc State(StateRequest) returns (StateResponse);
  rpc Create(CreateTaskRequest) returns (CreateTaskResponse);
  rpc Start(StartRequest) returns (StartResponse);
  rpc Delete(DeleteRequest) returns (DeleteResponse);
  rpc Pids(PidsRequest) returns (PidsResponse);
  rpc Pause(PauseRequest) returns (google.protobuf.Empty);
  rpc Resume(ResumeRequest) returns (google.protobuf.Empty);
  rpc Checkpoint(CheckpointTaskRequest) returns (google.protobuf.Empty);
  rpc Kill(KillRequest) returns (google.protobuf.Empty);
  rpc Exec(ExecProcessRequest) returns (google.protobuf.Empty);
  rpc ResizePty(ResizePtyRequest) returns (google.protobuf.Empty);
  rpc CloseIO(CloseIORequest) returns (google.protobuf.Empty);
  rpc Update(UpdateTaskRequest) returns (google.protobuf.Empty);
  rpc Wait(WaitRequest) returns (WaitResponse);
  rpc Stats(StatsRequest) returns (StatsResponse);
  rpc Connect(ConnectRequest) returns (ConnectResponse);
  rpc Shutdown(ShutdownRequest) returns (google.protobuf.Empty);
}
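Several of the Task RPCs above change the task's state, while others (State, Pids, Stats, Wait) only observe it. The Go sketch below models the main state transitions as a table; it is a deliberately simplified assumption, not the shim's actual logic, and ignores exec'd processes, checkpoints and error states. The `task` type and `transitions` table are hypothetical.

```go
package main

import "fmt"

type status string

const (
	none    status = ""
	created status = "created"
	running status = "running"
	paused  status = "paused"
	stopped status = "stopped"
	deleted status = "deleted"
)

// transitions maps a state-changing RPC to the states it is valid in and
// the state it leads to (simplified model).
var transitions = map[string]map[status]status{
	"Create": {none: created},
	"Start":  {created: running},
	"Pause":  {running: paused},
	"Resume": {paused: running},
	"Kill":   {running: stopped},
	"Delete": {created: deleted, stopped: deleted},
}

type task struct{ state status }

// apply performs rpc if the table allows it in the current state.
func (t *task) apply(rpc string) error {
	next, ok := transitions[rpc][t.state]
	if !ok {
		return fmt.Errorf("%s not allowed in state %q", rpc, t.state)
	}
	t.state = next
	return nil
}

func main() {
	t := &task{}
	for _, rpc := range []string{"Create", "Start", "Pause", "Resume", "Kill", "Delete"} {
		if err := t.apply(rpc); err != nil {
			fmt.Println(err)
			return
		}
		fmt.Println(rpc, "->", t.state)
	}
	fmt.Println(t.apply("Start")) // rejected: the task is already deleted
}
```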
Runtime V2 - Binary
● Binary naming convention
○ Runtime name io.containerd.runc.v2 → binary containerd-shim-runc-v2
■ So both io.containerd.runc.v1 and io.containerd.runc.v2 are runtime v2 shims
■ runc.v2 can serve several containers (e.g. the containers of one pod) from a single shim, using fewer resources
■ runc.v2 is the CRI plugin's default runtime
○ The shim binary must be available in containerd's PATH
● Shim binaries must implement the start and delete sub-commands
○ The delete sub-command cleans up resources created for the container
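The naming convention can be expressed as a small function: take the last two dot-separated parts of the runtime name and format them as `containerd-shim-<name>-<version>`, then look that binary up in PATH. This Go sketch follows the convention stated above; `shimBinaryName` is a hypothetical helper, and containerd's own resolver may differ in edge cases.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// shimBinaryName maps a runtime name such as io.containerd.runc.v2 to the
// shim binary name containerd-shim-runc-v2. It returns "" for names that
// lack a name and version part.
func shimBinaryName(runtime string) string {
	parts := strings.Split(runtime, ".")
	if len(parts) < 2 || parts[len(parts)-2] == "" || parts[len(parts)-1] == "" {
		return ""
	}
	return fmt.Sprintf("containerd-shim-%s-%s", parts[len(parts)-2], parts[len(parts)-1])
}

func main() {
	for _, rt := range []string{"io.containerd.runc.v2", "io.containerd.runc.v1", "io.containerd.kata.v2"} {
		bin := shimBinaryName(rt)
		path, err := exec.LookPath(bin)
		if err != nil {
			fmt.Printf("%s -> %s (not found in PATH)\n", rt, bin)
			continue
		}
		fmt.Printf("%s -> %s\n", rt, path)
	}
}
```

This is why both io.containerd.runc.v1 and io.containerd.runc.v2 are runtime v2: the trailing version names the shim binary, not the shim API.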
Runtime V2 - Logging
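● FIFO (Linux) / named pipe (Windows) as the default channel
○ The receiver spends extra resources relaying log output
(Figure: container output flows from the kernel through the containerd shim and a named pipe to the receiver, dockerd or the CRI plugin)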
○ The receiver must stay alive
○ If the receiver is down for too long, running containers are affected (their writes block once the pipe buffer fills)
Runtime V2 - Logging
● Pluggable logging via STDIO URIs
○ fifo - Linux (default)
○ npipe - Windows (default)
○ binary - Linux & Windows
○ file - Linux & Windows
STDIO URI format: scheme://path?key=value
○ file:///var/log/cntr/hi?maxSize=100MB
○ binary:///usr/bin/syslog?addr=192.168.0.3
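A STDIO URI is an ordinary URI, so its parts can be pulled apart with Go's `net/url`. The sketch below splits the two example URIs into scheme, path and parameters and rejects schemes other than the four listed. `logTarget` and `parseStdioURI` are hypothetical names; containerd has its own handling of these URIs, and this sketch only illustrates the format.

```go
package main

import (
	"fmt"
	"net/url"
)

// logTarget is a hypothetical parsed form of a STDIO URI.
type logTarget struct {
	Scheme string
	Path   string
	Params map[string]string
}

// parseStdioURI splits a URI such as file:///var/log/cntr/hi?maxSize=100MB
// into scheme, path and the first value of each query parameter.
func parseStdioURI(raw string) (logTarget, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return logTarget{}, err
	}
	switch u.Scheme {
	case "fifo", "npipe", "binary", "file":
	default:
		return logTarget{}, fmt.Errorf("unsupported scheme %q", u.Scheme)
	}
	params := map[string]string{}
	for k, v := range u.Query() {
		if len(v) > 0 {
			params[k] = v[0]
		}
	}
	return logTarget{Scheme: u.Scheme, Path: u.Path, Params: params}, nil
}

func main() {
	for _, raw := range []string{
		"file:///var/log/cntr/hi?maxSize=100MB",
		"binary:///usr/bin/syslog?addr=192.168.0.3",
	} {
		t, err := parseStdioURI(raw)
		fmt.Printf("%+v %v\n", t, err)
	}
}
```

The empty host (the third slash) is what makes the path absolute; for the binary scheme, the path names the logging program and the query carries its settings.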
Thank you
