Hosted Control Planes and Bare Metal: What, Why, and How
This is a blog post I co-authored with Matthias Winzeler from meltcloud, explaining why Hosted Control Planes matter for bare metal setups and taking a deep dive into the architectural pattern: what they are, why they matter, and how to run them in practice.
If you're running Kubernetes at a reasonably sized organization, you will need multiple Kubernetes clusters: at least separate clusters for dev, staging & production, but often also some dedicated clusters for special projects or teams.
That raises the question: how do we scale the control planes without wasting hardware and multiplying orchestration overhead?
This is where Hosted Control Planes (HCPs) come in: instead of dedicating three or more servers or VMs per cluster to its control plane, the control planes run as workloads inside a shared Kubernetes cluster. Think of them as "control planes as pods".
This post dives into what HCPs are, why they matter, and how to operate them in practice. We'll look at the architecture, the data store and network problems, and where projects like Kamaji, HyperShift, and SAP Gardener fit in.
The Old Model: Control Planes as Dedicated Nodes
In the old model, each Kubernetes cluster comes with a full control plane attached: at least three nodes dedicated to etcd and the Kubernetes control plane processes (API server, scheduler, controllers), alongside its workers.
This makes sense in the cloud or wherever virtualization is available: control plane VMs can be kept relatively cheap by sizing them as small as possible, and each team gets a full cluster while accepting a limited amount of overhead for the control plane VMs.
But on-prem, especially as many orgs are moving off virtualization after Broadcom's licensing changes, the picture looks different: dedicating three physical servers per cluster to its control plane wastes expensive hardware, and every additional cluster multiplies the orchestration and patching overhead.
That's the pain HCPs aim to solve. Instead of attaching dedicated control plane servers to every cluster, they let us collapse control planes into a shared platform.
Why Hosted Control Planes?
In the HCP model, the API server, controller-manager, scheduler, and supporting components all run inside a shared cluster (sometimes called a seed or management cluster), just like normal workloads. Workers - either physical servers or VMs, whatever makes the most sense for the workload profile - can then connect remotely to their control plane pods.
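To make "control planes as pods" concrete, here is a deliberately simplified sketch of a tenant API server running as an ordinary Deployment in the management cluster. All names, the namespace, and the etcd endpoint are hypothetical, and real implementations (Kamaji, HyperShift, Gardener) generate much more than this (certificates, probes, service account keys, and so on):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: tenant-a-apiserver        # hypothetical tenant name
  namespace: tenant-a
spec:
  replicas: 2
  selector:
    matchLabels:
      app: tenant-a-apiserver
  template:
    metadata:
      labels:
        app: tenant-a-apiserver
    spec:
      containers:
        - name: kube-apiserver
          image: registry.k8s.io/kube-apiserver:v1.33.0
          command:
            - kube-apiserver
            - --etcd-servers=https://tenant-a-etcd:2379   # hypothetical etcd Service
            - --secure-port=6443
            # certificates, service account keys, etc. omitted for brevity
          ports:
            - containerPort: 6443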
This model solves the main drawbacks of dedicated control planes. With HCPs, we no longer burn dedicated machines on every control plane: control planes pack densely as pods in one management cluster, new clusters can be stamped out quickly, and upgrades and other lifecycle work happen in a single place.
Let's take a look at what the architecture looks like.
Architecture
The tenant's workers don't know the difference: they see a normal API server endpoint.
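From a worker's (or user's) point of view, the kubeconfig looks exactly like it would for any other cluster; the endpoint just happens to be a LoadBalancer IP that fronts control plane pods. A minimal sketch, with placeholder IP, names, and credentials:

apiVersion: v1
kind: Config
clusters:
  - name: my-cluster
    cluster:
      server: https://203.0.113.10:6443      # LoadBalancer IP of the hosted API server
      certificate-authority-data: <base64-ca>  # placeholder
contexts:
  - name: my-cluster
    context:
      cluster: my-cluster
      user: my-cluster-admin
current-context: my-cluster
users:
  - name: my-cluster-admin
    user:
      client-certificate-data: <base64-cert>   # placeholder
      client-key-data: <base64-key>            # placeholder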
But under the hood, there's an important design choice still to be made: what about the data stores?
The Data Store Problem
Every Kubernetes control plane needs a backend data store. While there are other options, in practice most still run etcd.
However, we have to figure out whether each tenant cluster gets its own etcd instance, or if multiple clusters share one. Let's look at the trade-offs:
Shared etcd across many clusters: one etcd instance (or a small pool of them) backs many tenant API servers. This is dense and cheap to operate, but tenants share a failure domain, and a noisy neighbour can degrade everyone's API performance.
Dedicated etcd per cluster: every tenant control plane gets its own etcd. Isolation and blast radius are much better, but you now run, back up, and monitor many small etcd instances.
It's a trade-off: density and operational simplicity on one side, isolation and independent failure domains on the other.
Projects like Kamaji make this choice explicit and let you pick the model that fits.
The Network Problem
In the old model, control plane nodes usually sit close to the workers, for example in the same subnet. Connectivity is simple.
With hosted control planes the control plane now lives remotely, inside a management cluster. Each API server must be reachable externally, typically exposed via a Service of type LoadBalancer. That requires your management cluster to provide LoadBalancer capability.
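On bare metal, LoadBalancer services don't exist out of the box. MetalLB is one common way to provide them (shown here purely as an illustrative assumption; any LoadBalancer implementation your management cluster offers will do). A minimal sketch with a placeholder address range for the API server VIPs:

apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: control-plane-vips        # hypothetical pool name
  namespace: metallb-system
spec:
  addresses:
    - 203.0.113.10-203.0.113.50   # placeholder range, must be reachable by the workers
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: control-plane-vips
  namespace: metallb-system
spec:
  ipAddressPools:
    - control-plane-vips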
By default, the API server also needs to establish connections into the worker cluster (e.g. to talk to kubelets), which might be undesirable from a firewall point of view. The practical solution is konnectivity: with it, all traffic flows from workers to the API server, eliminating inbound connections from the control plane. In practice, this makes konnectivity close to a requirement for HCP setups.
Tenancy isolation also matters more. Each hosted control plane should be strictly separated: its own namespace, its own certificates and secrets, network policies so tenants can't reach each other's API servers or etcd, and resource quotas so one control plane can't starve the others.
These requirements aren't difficult, but they need deliberate design, especially in on-prem environments where firewalls, routing, and L2/L3 boundaries usually separate workers and the management cluster.
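For the network part of that isolation, a NetworkPolicy per tenant namespace is the usual building block. The sketch below restricts ingress to traffic from within the tenant's own namespace while keeping the externally exposed API server and konnectivity ports reachable; namespace, name, and port numbers are illustrative and need to match your actual setup:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: isolate-tenant            # hypothetical name
  namespace: tenant-a             # the tenant's control plane namespace
spec:
  podSelector: {}                 # applies to all pods of this tenant
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}         # allow traffic from pods in the same namespace
    - ports:                      # allow the exposed endpoints from anywhere
        - protocol: TCP
          port: 6443              # API server
        - protocol: TCP
          port: 8132              # konnectivity server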
How it looks in practice
Let's take Kamaji as an example. It runs tenant control planes as pods inside a management cluster. Make sure you have a management cluster ready that offers PersistentVolumes (for etcd data) and LoadBalancer services (for API server exposure).
Then, installing Kamaji itself is just a matter of installing its helm chart:
# install cert-manager (prerequisite)
helm install \
  cert-manager oci://quay.io/jetstack/charts/cert-manager \
  --version v1.19.1 \
  --namespace cert-manager \
  --create-namespace \
  --set crds.enabled=true

# install kamaji
helm repo add clastix https://clastix.github.io/charts
helm repo update
helm install kamaji clastix/kamaji \
  --version 0.0.0+latest \
  --namespace kamaji-system \
  --create-namespace \
  --set image.tag=latest
By default, Kamaji deploys a shared etcd instance for all control planes. If you prefer a dedicated etcd per cluster, you could deploy one kamaji-etcd for each cluster instead.
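If you go the dedicated route, the rough shape is: run one etcd per tenant (for example via the kamaji-etcd chart), describe it as a Kamaji DataStore, and point the TenantControlPlane at it. The sketch below is only an outline - the endpoints are placeholders and the exact DataStore fields (in particular the TLS configuration) should be checked against the Kamaji documentation:

apiVersion: kamaji.clastix.io/v1alpha1
kind: DataStore
metadata:
  name: my-cluster-etcd                         # one DataStore per tenant cluster
spec:
  driver: etcd
  endpoints:
    - my-cluster-etcd-0.my-cluster-etcd:2379    # placeholder etcd endpoints
    - my-cluster-etcd-1.my-cluster-etcd:2379
    - my-cluster-etcd-2.my-cluster-etcd:2379
  # tlsConfig: ...                              # etcd client certificates, see the docs

The TenantControlPlane shown below would then reference this store through its dataStore field (e.g. dataStore: my-cluster-etcd) instead of the default shared one.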
Now, creating a new tenant control plane is as simple as applying a TenantControlPlane custom resource:
apiVersion: kamaji.clastix.io/v1alpha1
kind: TenantControlPlane
metadata:
  name: my-cluster
  labels:
    tenant.clastix.io: my-cluster
spec:
  controlPlane:
    deployment:
      replicas: 2
    service:
      serviceType: LoadBalancer
  kubernetes:
    version: "v1.33.0"
    kubelet:
      cgroupfs: systemd
  networkProfile:
    port: 6443
  addons:
    coreDNS: {}
    kubeProxy: {}
    konnectivity:
      server:
        port: 8132
      agent:
        mode: DaemonSet
After a few minutes, Kamaji will have created the control plane pods inside the management cluster and exposed the API server endpoint via a LoadBalancer service.
But this is not only about provisioning: Kamaji, being an operator, takes most of the lifecycle burden off your shoulders. It handles upgrades, scaling, and other toil (rotating secrets, CAs, ...) of the control planes for you: just patch the respective field in the TenantControlPlane resource and Kamaji will take care of the rest.
As a next step, you could now connect your workers to that endpoint (for example, using one of the many supported CAPI providers), and start using your new cluster.
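To illustrate that last step, here is a rough sketch of how a Cluster API Cluster object could tie the pieces together when using the Kamaji control plane provider. Treat it as a shape rather than a copy-paste manifest: the infrastructure provider (Metal3 is just an example), API versions, and field details depend on your environment and the provider versions you run:

apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: my-cluster
  namespace: default
spec:
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1alpha1   # Kamaji CAPI control plane provider
    kind: KamajiControlPlane
    name: my-cluster
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1  # e.g. Metal3 for bare metal
    kind: Metal3Cluster                                   # placeholder: pick your provider
    name: my-cluster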
With this, multi-cluster stops being “three servers plus etcd per cluster” and instead becomes “one management cluster, many control planes inside”.
The Road Ahead
Hosted Control Planes are quickly becoming the standard for multi-cluster Kubernetes: hyperscalers already run this way under the hood, Red Hat ships the model as HyperShift for OpenShift, SAP Gardener has been scheduling control planes onto seed clusters for years, and Kamaji brings the same idea to vanilla Kubernetes.
While HCPs give us a clean answer for multi-cluster control planes, they only solve half the story.
On bare metal and on-prem, workers remain a hard problem: how to provision, update, and replace them reliably. And once your bare metal fleet is prepared, how can you slice those large servers into right-sized nodes for true Cluster-as-a-Service?
That's where concepts like immutable workers and elastic pools come in. Together with hosted control planes, they point the way towards something our industry has not figured out yet: a cloud-like managed Kubernetes experience - think GKE/AKS/EKS - on our own premises.
If you're curious about that, check out meltcloud: we're building exactly that.
Summary
Hosted Control Planes let us run control planes as pods in a shared management cluster, cut the hardware overhead of dedicated control plane nodes, and manage the lifecycle of many clusters declaratively from a single place.
They don't remove every challenge, but they offer a new operational model for Kubernetes at scale.
We'd love to hear how you're approaching multi-cluster. Are you already experimenting with Hosted Control Planes?
Let us know, or try Kamaji using CLASTIX's free-of-charge Playground and share your feedback with us (meltcloud) or with Dario's team at CLASTIX.