Hosted Control Planes and Bare Metal: What, Why, and How
This is a blog post I co-authored with Matthias Winzeler from meltcloud, explaining why Hosted Control Planes matter for bare metal setups and taking a deep dive into the architectural pattern: what they are, why they matter, and how to run them in practice.
If you're running Kubernetes at a reasonably sized organization, you will need multiple Kubernetes clusters: at least separate clusters for dev, staging & production, but often also some dedicated clusters for special projects or teams.
That raises the question: how do we scale the control planes without wasting hardware and multiplying orchestration overhead?
This is where Hosted Control Planes (HCPs) come in: instead of dedicating three or more servers or VMs per cluster to its control plane, the control planes run as workloads inside a shared Kubernetes cluster. Think of them as "control planes as pods".
This post dives into what HCPs are, why they matter, and how to operate them in practice. We'll look at the architecture, the data store and network problems, and where projects like Kamaji, HyperShift, and SAP Gardener fit in.
The Old Model: Control Planes as Dedicated Nodes
In the old model, each Kubernetes cluster comes with a full control plane attached: at least three nodes dedicated to etcd and the Kubernetes control plane processes (API server, scheduler, controllers), alongside its workers.
This makes sense in the cloud or wherever virtualization is available: control plane VMs can be kept relatively cheap by sizing them as small as possible, and each team gets a full cluster while accepting a limited amount of overhead for the control plane VMs.
But on-prem, especially as many orgs are moving off virtualization after Broadcom's licensing changes, the picture looks different: dedicating three physical servers per cluster to its control plane wastes expensive hardware, and every additional cluster multiplies the orchestration and patching overhead.
That's the pain HCPs aim to solve. Instead of attaching dedicated control plane servers to every cluster, they let us collapse control planes into a shared platform.
Why Hosted Control Planes?
In the HCP model, the API server, controller-manager, scheduler, and supporting components all run inside a shared cluster (sometimes called a seed or management cluster), just like normal workloads. Workers - either physical servers or VMs, whatever makes the most sense for the workload profile - can then connect remotely to their control plane pods.
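To make "control planes as pods" concrete, here is a deliberately simplified sketch of a tenant API server running as an ordinary Deployment in the management cluster. All names, the namespace, and the etcd endpoint are hypothetical, and real implementations (Kamaji, HyperShift, Gardener) generate much more than this (certificates, probes, service account keys, and so on):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: tenant-a-apiserver        # hypothetical tenant name
  namespace: tenant-a
spec:
  replicas: 2
  selector:
    matchLabels:
      app: tenant-a-apiserver
  template:
    metadata:
      labels:
        app: tenant-a-apiserver
    spec:
      containers:
        - name: kube-apiserver
          image: registry.k8s.io/kube-apiserver:v1.33.0
          command:
            - kube-apiserver
            - --etcd-servers=https://tenant-a-etcd:2379   # hypothetical etcd Service
            - --secure-port=6443
            # certificates, service account keys, etc. omitted for brevity
          ports:
            - containerPort: 6443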
This model solves the main drawbacks of dedicated control planes. With HCPs, we no longer burn dedicated machines on every control plane: control planes pack densely as pods in one management cluster, new clusters can be stamped out quickly, and upgrades and other lifecycle work happen in a single place.
Let's take a look at what the architecture looks like.
Architecture
The tenant's workers don't know the difference: they see a normal API server endpoint.
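From a worker's (or user's) point of view, the kubeconfig looks exactly like it would for any other cluster; the endpoint just happens to be a LoadBalancer IP that fronts control plane pods. A minimal sketch, with placeholder IP, names, and credentials:

apiVersion: v1
kind: Config
clusters:
  - name: my-cluster
    cluster:
      server: https://203.0.113.10:6443      # LoadBalancer IP of the hosted API server
      certificate-authority-data: <base64-ca>  # placeholder
contexts:
  - name: my-cluster
    context:
      cluster: my-cluster
      user: my-cluster-admin
current-context: my-cluster
users:
  - name: my-cluster-admin
    user:
      client-certificate-data: <base64-cert>   # placeholder
      client-key-data: <base64-key>            # placeholder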
But under the hood, there's an important design choice still to be made: what about the data stores?
The Data Store Problem
Every Kubernetes control plane needs a backend data store. While there are other options, in practice most still run etcd.
However, we have to figure out whether each tenant cluster gets its own etcd instance, or if multiple clusters share one. Let's look at the trade-offs:
Shared etcd across many clusters: one etcd instance (or a small pool of them) backs many tenant API servers. This is dense and cheap to operate, but tenants share a failure domain, and a noisy neighbour can degrade everyone's API performance.
Dedicated etcd per cluster: every tenant control plane gets its own etcd. Isolation and blast radius are much better, but you now run, back up, and monitor many small etcd instances.
It's a trade-off: density and operational simplicity on one side, isolation and independent failure domains on the other.
Projects like Kamaji make this choice explicit and let you pick the model that fits.
The Network Problem
In the old model, control plane nodes usually sit close to the workers, for example in the same subnet. Connectivity is simple.
With hosted control planes the control plane now lives remotely, inside a management cluster. Each API server must be reachable externally, typically exposed via a Service of type LoadBalancer. That requires your management cluster to provide LoadBalancer capability.
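On bare metal, LoadBalancer services don't exist out of the box. MetalLB is one common way to provide them (shown here purely as an illustrative assumption; any LoadBalancer implementation your management cluster offers will do). A minimal sketch with a placeholder address range for the API server VIPs:

apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: control-plane-vips        # hypothetical pool name
  namespace: metallb-system
spec:
  addresses:
    - 203.0.113.10-203.0.113.50   # placeholder range, must be reachable by the workers
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: control-plane-vips
  namespace: metallb-system
spec:
  ipAddressPools:
    - control-plane-vips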
By default, the API server also needs to establish connections into the worker cluster (e.g. to talk to kubelets), which might be undesirable from a firewall point of view. The practical solution is konnectivity: with it, all traffic flows from workers to the API server, eliminating inbound connections from the control plane. In practice, this makes konnectivity close to a requirement for HCP setups.
Tenancy isolation also matters more. Each hosted control plane should be strictly separated: its own namespace, its own certificates and secrets, network policies so tenants can't reach each other's API servers or etcd, and resource quotas so one control plane can't starve the others.
These requirements aren't difficult, but they need deliberate design, especially in on-prem environments where firewalls, routing, and L2/L3 boundaries usually separate workers and the management cluster.
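For the network part of that isolation, a NetworkPolicy per tenant namespace is the usual building block. The sketch below restricts ingress to traffic from within the tenant's own namespace while keeping the externally exposed API server and konnectivity ports reachable; namespace, name, and port numbers are illustrative and need to match your actual setup:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: isolate-tenant            # hypothetical name
  namespace: tenant-a             # the tenant's control plane namespace
spec:
  podSelector: {}                 # applies to all pods of this tenant
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}         # allow traffic from pods in the same namespace
    - ports:                      # allow the exposed endpoints from anywhere
        - protocol: TCP
          port: 6443              # API server
        - protocol: TCP
          port: 8132              # konnectivity server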
How it looks in practice
Let's take Kamaji as an example. It runs tenant control planes as pods inside a management cluster. Make sure you have a management cluster ready that offers PersistentVolumes (for etcd data) and LoadBalancer services (for API server exposure).
Then, installing Kamaji itself is just a matter of installing its helm chart:
# install cert-manager (prerequisite)
helm install \
  cert-manager oci://quay.io/jetstack/charts/cert-manager \
  --version v1.19.1 \
  --namespace cert-manager \
  --create-namespace \
  --set crds.enabled=true

# install kamaji
helm repo add clastix https://clastix.github.io/charts
helm repo update
helm install kamaji clastix/kamaji \
  --version 0.0.0+latest \
  --namespace kamaji-system \
  --create-namespace \
  --set image.tag=latest
By default, Kamaji deploys a shared etcd instance for all control planes. If you prefer a dedicated etcd per cluster, you could deploy one kamaji-etcd for each cluster instead.
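If you go the dedicated route, the rough shape is: run one etcd per tenant (for example via the kamaji-etcd chart), describe it as a Kamaji DataStore, and point the TenantControlPlane at it. The sketch below is only an outline - the endpoints are placeholders and the exact DataStore fields (in particular the TLS configuration) should be checked against the Kamaji documentation:

apiVersion: kamaji.clastix.io/v1alpha1
kind: DataStore
metadata:
  name: my-cluster-etcd                         # one DataStore per tenant cluster
spec:
  driver: etcd
  endpoints:
    - my-cluster-etcd-0.my-cluster-etcd:2379    # placeholder etcd endpoints
    - my-cluster-etcd-1.my-cluster-etcd:2379
    - my-cluster-etcd-2.my-cluster-etcd:2379
  # tlsConfig: ...                              # etcd client certificates, see the docs

The TenantControlPlane shown below would then reference this store through its dataStore field (e.g. dataStore: my-cluster-etcd) instead of the default shared one.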
Now, creating a new tenant control plane is as simple as applying a TenantControlPlane custom resource:
apiVersion: kamaji.clastix.io/v1alpha1
kind: TenantControlPlane
metadata:
  name: my-cluster
  labels:
    tenant.clastix.io: my-cluster
spec:
  controlPlane:
    deployment:
      replicas: 2
    service:
      serviceType: LoadBalancer
  kubernetes:
    version: "v1.33.0"
    kubelet:
      cgroupfs: systemd
  networkProfile:
    port: 6443
  addons:
    coreDNS: {}
    kubeProxy: {}
    konnectivity:
      server:
        port: 8132
      agent:
        mode: DaemonSet
After a few minutes, Kamaji will have created the control plane pods inside the management cluster and exposed the API server endpoint via a LoadBalancer service.
But this is not only about provisioning: Kamaji, being an operator, takes most of the lifecycle burden off your shoulders. It handles upgrades, scaling, and other toil (rotating secrets, CAs, ...) of the control planes for you: just patch the respective field in the TenantControlPlane resource and Kamaji will take care of the rest.
As a next step, you could now connect your workers to that endpoint (for example, using one of the many supported CAPI providers), and start using your new cluster.
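To illustrate that last step, here is a rough sketch of how a Cluster API Cluster object could tie the pieces together when using the Kamaji control plane provider. Treat it as a shape rather than a copy-paste manifest: the infrastructure provider (Metal3 is just an example), API versions, and field details depend on your environment and the provider versions you run:

apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: my-cluster
  namespace: default
spec:
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1alpha1   # Kamaji CAPI control plane provider
    kind: KamajiControlPlane
    name: my-cluster
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1  # e.g. Metal3 for bare metal
    kind: Metal3Cluster                                   # placeholder: pick your provider
    name: my-cluster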
With this, multi-cluster stops being “three servers plus etcd per cluster” and instead becomes “one management cluster, many control planes inside”.
The Road Ahead
Hosted Control Planes are quickly becoming the standard for multi-cluster Kubernetes: hyperscalers already run this way under the hood, Red Hat ships the model as HyperShift for OpenShift, SAP Gardener has been scheduling control planes onto seed clusters for years, and Kamaji brings the same idea to vanilla Kubernetes.
While HCPs give us a clean answer for multi-cluster control planes, they only solve half the story.
On bare metal and on-prem, workers remain a hard problem: how to provision, update, and replace them reliably. And once your bare metal fleet is prepared, how can you slice those large servers into right-sized nodes for true Cluster-as-a-Service?
That's where concepts like immutable workers and elastic pools come in. Together with hosted control planes, they point the way towards something our industry has not figured out yet: a cloud-like managed Kubernetes experience - think GKE/AKS/EKS - on our own premises.
If you're curious about that, check out meltcloud: we're building exactly that.
Summary
Hosted Control Planes let us run control planes as pods in a shared management cluster, cut the hardware overhead of dedicated control plane nodes, and manage the lifecycle of many clusters declaratively from a single place.
They don't remove every challenge, but they offer a new operational model for Kubernetes at scale.
We'd love to hear how you're approaching multi-cluster. Are you already experimenting with Hosted Control Planes?
Let us know, or try Kamaji using CLASTIX's free-of-charge Playground and share your feedback with us (meltcloud) or with Dario's team at CLASTIX.