Kubernetes 1.27 Multi-Tenancy, What's Actually Safe and What Still Isn't
TL;DR
- Namespace-as-tenant on Kubernetes 1.27 is safe enough for internal multi-tenancy if you layer ResourceQuota, NetworkPolicy, Pod Security Admission (PSA), and a policy engine.
- The Pod Security Admission controller, which enforces the Pod Security Standards, replaces PSP cleanly; there’s no excuse for not using it.
- For hostile multi-tenancy (untrusted code), you still need separate clusters or strong VM-level isolation.
People keep asking me whether they should run one big cluster or 30 small ones. The answer always depends on the threat model, and most of the time it’s the wrong question. The real question is: which workloads trust each other, and which don’t? Kubernetes 1.27 makes the friendly-multi-tenancy case quite manageable. The hostile case is still hard.
In a platform engineering context — internal teams, shared infrastructure, the same company’s risk surface — namespace-per-tenant works well. Below is the stack I run, the parts of 1.27 that actually help, and the cases where I still spin up dedicated clusters.
For the deployment substrate above this, see ArgoCD ApplicationSets at scale.
The tenancy spectrum
There’s no binary “multi-tenant or not.” There’s a spectrum:
- Soft multi-tenancy: internal teams, same company, mutual trust. Namespace boundaries plus quotas plus policies are enough.
- Hardened multi-tenancy: internal teams handling regulated data (PCI, HIPAA). Add strong network isolation, audit logging, and possibly node-level separation via taints.
- Hostile multi-tenancy: running untrusted user code, as in SaaS or PaaS-style platforms. Namespaces are not a security boundary here. You want gVisor, Kata Containers, Firecracker, or separate clusters.
Most platform engineering work lives in the first two tiers. The hostile tier needs a different post.
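Node-level separation for the hardened tier can be sketched as a pod pinned to a dedicated node pool. This assumes the platform team has already labeled and tainted those nodes; the key tenancy.acme.io/dedicated, the value pci-workloads, the pod name, and the image are hypothetical names chosen for illustration. The securityContext that a restricted namespace would require is left out to keep the focus on scheduling.

```yaml
# Assumes nodes in the dedicated pool already carry:
#   label: tenancy.acme.io/dedicated=pci-workloads
#   taint: tenancy.acme.io/dedicated=pci-workloads:NoSchedule
apiVersion: v1
kind: Pod
metadata:
  name: payments-worker          # hypothetical workload
  namespace: checkout
spec:
  # The nodeSelector keeps this pod ON the dedicated pool.
  nodeSelector:
    tenancy.acme.io/dedicated: pci-workloads
  # The toleration lets it past the taint that keeps other tenants OFF the pool.
  tolerations:
    - key: tenancy.acme.io/dedicated
      operator: Equal
      value: pci-workloads
      effect: NoSchedule
  containers:
    - name: worker
      image: registry.acme.io/payments/worker:1.0.0   # hypothetical image
```

Both halves are needed: a toleration on its own only permits scheduling onto the tainted nodes, it does not force it.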
The control plane: what 1.27 gives you
Kubernetes 1.27 (released April 2023) consolidated a few features that matter for tenancy:
- Pod Security Admission has been GA since 1.25. By 1.27 it’s the only sane choice; PodSecurityPolicy was removed in that same 1.25 release. Three levels (privileged, baseline, restricted) are applied per namespace via labels.
- SeccompDefault is GA, so once the kubelet runs with --seccomp-default, every container gets the RuntimeDefault seccomp profile without per-pod config.
- NamespaceLifecycle and improved finalizer behavior reduce stuck-namespace pain during tenant offboarding.
- In-place pod vertical scaling landed as alpha. Not relevant to security but worth tracking; see the Kubernetes blog for the 1.27 release notes.
- KMSv2 for etcd encryption moved to beta with much better rotation semantics.
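Two of these features only take effect once they are switched on in node and control-plane configuration. The following is a minimal sketch of the two separate config files involved, assuming a self-managed cluster where you control the kubelet and API server; managed offerings expose these settings differently, if at all. The KMS plugin name and socket path are placeholders, not values from any real setup.

```yaml
# File 1: kubelet configuration. Opts every container on this node into the
# RuntimeDefault seccomp profile unless the pod specifies something else.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
seccompDefault: true
---
# File 2: API server encryption config, passed via --encryption-provider-config.
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources: [secrets]
    providers:
      - kms:
          apiVersion: v2
          name: platform-kms                          # hypothetical plugin name
          endpoint: unix:///var/run/kms/plugin.sock   # hypothetical socket path
          timeout: 3s
      # Fallback so secrets written before encryption was enabled stay readable.
      - identity: {}
```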
Here’s the baseline namespace setup I provision per tenant:
apiVersion: v1
kind: Namespace
metadata:
  name: checkout
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: v1.27
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
    tenant: checkout
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: checkout-quota
  namespace: checkout
spec:
  hard:
    requests.cpu: "40"
    requests.memory: 80Gi
    limits.cpu: "60"
    limits.memory: 120Gi
    persistentvolumeclaims: "30"
    services.loadbalancers: "2"
    count/jobs.batch: "100"
---
apiVersion: v1
kind: LimitRange
metadata:
  name: checkout-defaults
  namespace: checkout
spec:
  limits:
    - type: Container
      default:
        cpu: 500m
        memory: 512Mi
      defaultRequest:
        cpu: 100m
        memory: 128Mi
      max:
        cpu: "4"
        memory: 8Gi
The LimitRange is the part most people skip. Without it, a pod that omits resources is rejected outright by the quota above (a quota on requests and limits requires every container to declare them), and in a namespace with no quota it runs as BestEffort with no reservation at all. With it, every container that omits requests or limits gets the defaults, and the max caps the upper bound so nobody single-handedly schedules a 64-core pod.
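To make the defaulting concrete, here is a sketch of a pod submitted with no resources block, with comments showing what the LimitRange values from the manifest above would fill in at admission. The workload name and image tag are made up, and the securityContext that the restricted profile would require is omitted to keep the focus on resources.

```yaml
# Submitted by a developer: no resources block at all.
apiVersion: v1
kind: Pod
metadata:
  name: cart-api                 # hypothetical workload
  namespace: checkout
spec:
  containers:
    - name: app
      image: registry.acme.io/checkout/cart-api:1.4.2   # hypothetical image
      # After admission, the LimitRange defaults give this container the
      # equivalent of:
      # resources:
      #   requests: {cpu: 100m, memory: 128Mi}   # from defaultRequest
      #   limits:   {cpu: 500m, memory: 512Mi}   # from default
```

A container that asks for more than 4 CPUs or 8Gi of memory is rejected at admission by the same LimitRange.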
Network policy is non-optional
Default-deny ingress and egress per namespace, then explicit allow rules. I use Cilium 1.13 for this because the network policy editor and Hubble flow visibility are worth the operational complexity.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny
  namespace: checkout
spec:
  podSelector: {}
  policyTypes: [Ingress, Egress]
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-platform-egress
  namespace: checkout
spec:
  podSelector: {}
  policyTypes: [Egress]
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: platform-observability
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
            except:
              - 169.254.169.254/32 # block cloud metadata
      ports:
        - protocol: TCP
          port: 443
The cloud metadata exclusion matters. On AWS, IMDSv1 (still reachable on many EKS configurations) lets a pod grab the node’s IAM credentials. Block 169.254.169.254 at the network policy level and use IRSA for pod identity instead.
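On the IRSA side, the pod-identity half is a ServiceAccount annotated with the IAM role to assume. A minimal sketch, assuming the cluster already has an IAM OIDC provider associated with it; the account ID, role name, and ServiceAccount name are placeholders.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: checkout-api             # hypothetical; referenced via serviceAccountName
  namespace: checkout
  annotations:
    # Hypothetical account ID and role name.
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/checkout-api
```

Pods that set serviceAccountName: checkout-api get a projected token that the AWS SDKs exchange with STS for role credentials, so they never need the node’s instance profile. That exchange runs over HTTPS, which the port 443 egress rule above already permits.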
Policy as code with Kyverno
Pod Security Admission is necessary but not sufficient. You also need to enforce things like:
- Every pod must set runAsNonRoot: true
- Images must come from your internal registry
- Every namespace must have a team label that matches a real team in Backstage
- No hostPath volumes, ever
Kyverno 1.10 (or OPA Gatekeeper, take your pick) handles this. Kyverno’s syntax is closer to native Kubernetes manifests, which makes it easier to onboard team members:
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-image-registry
spec:
  validationFailureAction: Enforce
  background: false
  rules:
    - name: only-trusted-registry
      match:
        any:
          - resources:
              kinds: [Pod]
              namespaces: ['?*']
      exclude:
        any:
          - resources:
              namespaces: [kube-system, kyverno, argocd]
      validate:
        message: "Images must come from registry.acme.io"
        pattern:
          spec:
            containers:
              - image: 'registry.acme.io/*'
            # =( ) makes the check conditional, so pods without
            # initContainers or ephemeralContainers still pass.
            =(initContainers):
              - image: 'registry.acme.io/*'
            =(ephemeralContainers):
              - image: 'registry.acme.io/*'
Pair this with a mutating policy that automatically injects a team label from a namespace annotation, and you remove most of the “developer forgot the label” support tickets.
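One way such a mutating policy could look is sketched below. Treat it as an untested starting point: the policy name and the team annotation key on the namespace are assumptions, and it relies on Kyverno being allowed to read Namespace objects.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-team-label           # hypothetical policy name
spec:
  background: false
  rules:
    - name: team-label-from-namespace
      match:
        any:
          - resources:
              kinds: [Pod]
      context:
        # Look up the Namespace the Pod is being created in.
        - name: nsTeam
          apiCall:
            urlPath: "/api/v1/namespaces/{{request.namespace}}"
            # "team" is an assumed annotation key; fall back to an empty string.
            jmesPath: "metadata.annotations.team || ''"
      # Skip the mutation when the namespace carries no team annotation.
      preconditions:
        all:
          - key: "{{ nsTeam }}"
            operator: NotEquals
            value: ""
      mutate:
        patchStrategicMerge:
          metadata:
            labels:
              # +( ) adds the label only when the Pod does not already set it.
              +(team): "{{ nsTeam }}"
```

Using the add-if-absent anchor means a team that labels its pods explicitly keeps its own value; whether that override should be allowed is a policy decision for the platform team.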
When you actually need separate clusters
Namespace tenancy fails in a few specific situations. Don’t try to paper over them:
- Different Kubernetes versions — if one tenant needs a controller that only works on 1.25 and another needs a feature that landed in 1.27, you don’t have one cluster, you have two clusters in a trenchcoat.
- Different node OS or kernel — kernel modules, GPU drivers, real-time kernels. Node selectors plus taints can hack around this but it’s fragile.
- Regulated isolation requirements — when your auditor reads “shared control plane” and writes “finding.”
- Blast radius concerns at the platform level — upgrading the API server affects every tenant. If a tenant cannot tolerate a 5-minute control-plane disruption during upgrades, they’re not a shared-cluster tenant.
- Massive scale differences — a tenant with 50,000 pods will dominate scheduler latency for everyone else.
For the in-between cases, vcluster 0.15 is a real option. It runs a virtual control plane (a separate API server) per tenant inside a host cluster namespace. Each vcluster looks like a real cluster to its users; under the hood the workloads schedule onto the host. You get separate CRDs, separate RBAC, separate Kubernetes versions even, while still sharing the underlying node pool.
vcluster’s overhead is small (one or two pods per virtual cluster) and the isolation is meaningfully better than namespace-only. It’s not as strong as separate clusters, because virtual clusters still share the host kernel and CNI, but it’s the right answer when you have 50 small tenants and don’t want to run 50 EKS control planes.
Common Pitfalls
- PSA at enforce: privileged. I’ve seen platforms ship with privileged and a TODO to harden later. Later never comes. Start at restricted and grant per-namespace exceptions explicitly.
- No egress policy. Ingress-only NetworkPolicy is half a solution. Compromised pods that can reach 169.254.169.254 or the internet at large are an exfiltration channel.
- Quota without LimitRange. Teams will set requests: 0 to fit more pods in. Then the scheduler over-commits the node, and you get OOM-killed neighbors at 3am.
- Shared cluster-scoped resources (CRDs). When tenant A installs a CRD that tenant B depends on, you have an implicit coupling. Either reserve CRD installation for the platform team or use vcluster to isolate them.
- Service mesh assumptions. Istio 1.17 and Linkerd 2.13 both have multi-tenant stories now, but mTLS-by-namespace is not network isolation. You still want NetworkPolicy underneath.
- Default service account tokens auto-mounted. Set automountServiceAccountToken: false on each namespace’s default ServiceAccount, or block at admission. Most pods don’t need an API token.
- Treating kubectl auth can-i as the audit log. It only shows current state. You need API server audit logs shipped off-cluster for actual investigations.
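Two of these pitfalls have short fixes that can be sketched in YAML. The first document patches the default ServiceAccount in the example tenant namespace; the second is an audit policy file, which applies only where you control the API server flags (managed control planes expose audit logging through their own settings). The rule selection is an illustrative starting point, not a recommended baseline.

```yaml
# Document 1: stop auto-mounting API tokens for pods that use the
# namespace's default ServiceAccount.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: default
  namespace: checkout
automountServiceAccountToken: false
---
# Document 2: audit policy file, passed to the API server via
# --audit-policy-file. Not a cluster resource; it lives on the control plane.
apiVersion: audit.k8s.io/v1
kind: Policy
omitStages: [RequestReceived]
rules:
  # Never log secret or configmap bodies, only who touched them.
  - level: Metadata
    resources:
      - group: ""
        resources: [secrets, configmaps]
  # Full request and response for RBAC changes.
  - level: RequestResponse
    verbs: [create, update, patch, delete]
    resources:
      - group: rbac.authorization.k8s.io
  # Everything else at metadata level.
  - level: Metadata
```

A pod that does need to talk to the API can still opt back in with its own ServiceAccount or by setting automountServiceAccountToken: true in its spec.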
Wrapping Up
Multi-tenancy on Kubernetes 1.27 is a stack of cheap controls that compound. Each control on its own — quotas, PSA, NetworkPolicy, Kyverno, IRSA — is weak; together they’re enough for the friendly-tenant case that covers most internal platforms.
For anything stronger, accept the operational cost of more clusters or move to vcluster. The next post compares FluxCD and ArgoCD for managing the fleet of clusters that result.