Kubernetes 1.27 Multi-Tenancy, What's Actually Safe and What Still Isn't
June 9, 2023 · 7 min read · by Muhammad Amal

TL;DR

  • Namespace-as-tenant on Kubernetes 1.27 is safe enough for internal multi-tenancy if you layer ResourceQuota, NetworkPolicy, Pod Security Admission (PSA), and a policy engine.
  • PSA replaces PodSecurityPolicy cleanly; there’s no excuse for not using it.
  • For hostile multi-tenancy (untrusted code), you still need separate clusters or strong VM-level isolation.

People keep asking me whether they should run one big cluster or 30 small ones. The answer always depends on the threat model, and most of the time it’s the wrong question. The real question is: which workloads trust each other, and which don’t? Kubernetes 1.27 makes the friendly-multi-tenancy case quite manageable. The hostile case is still hard.

In a platform engineering context — internal teams, shared infrastructure, the same company’s risk surface — namespace-per-tenant works well. Below is the stack I run, the parts of 1.27 that actually help, and the cases where I still spin up dedicated clusters.

For the deployment substrate above this, see ArgoCD ApplicationSets at scale.

The tenancy spectrum

There’s no binary “multi-tenant or not.” There’s a spectrum:

  1. Soft multi-tenancy — internal teams, same company, mutual trust. Namespace boundaries plus quotas plus policies are enough.
  2. Hardened multi-tenancy — internal teams handling regulated data (PCI, HIPAA). Add strong network isolation, audit, and possibly node-level separation via taints.
  3. Hostile multi-tenancy: running untrusted user code, as in SaaS or PaaS-like scenarios. Namespaces are not a security boundary here. You want gVisor, Kata, Firecracker, or separate clusters.

Most platform engineering work lives at level 1 or 2. Level 3 needs a different post.
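
For level 2, node-level separation usually means tainting a dedicated node pool and letting only the regulated tenant tolerate the taint. Here is a minimal sketch, assuming a hypothetical taint and node label `tenancy.acme.io/dedicated=pci` applied by the platform team. The key, namespace, and workload names are illustrative, not Kubernetes built-ins:

```yaml
# Assumes the pool's nodes were tainted and labeled beforehand, e.g.
#   kubectl taint nodes <node> tenancy.acme.io/dedicated=pci:NoSchedule
#   kubectl label nodes <node> tenancy.acme.io/dedicated=pci
apiVersion: v1
kind: Pod
metadata:
  name: card-vault             # hypothetical workload
  namespace: payments-pci      # hypothetical namespace
spec:
  # The toleration allows this pod onto the tainted pool...
  tolerations:
    - key: tenancy.acme.io/dedicated
      operator: Equal
      value: pci
      effect: NoSchedule
  # ...and the nodeSelector keeps it off every other node.
  nodeSelector:
    tenancy.acme.io/dedicated: pci
  containers:
    - name: card-vault
      image: registry.acme.io/payments/card-vault:1.0.0   # hypothetical image
```

The taint keeps other tenants out and the nodeSelector keeps this tenant in. You need both, because a toleration alone only permits scheduling onto the pool; it does not require it.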

The control plane: what 1.27 gives you

Kubernetes 1.27 (released April 2023) consolidated a few features that matter for tenancy:

  • Pod Security Admission has been GA since 1.25, the same release that removed PodSecurityPolicy. By 1.27 it’s the only sane choice. Three levels (privileged, baseline, restricted) are applied per namespace via labels.
  • SeccompDefault is GA in 1.27, but it is still opt-in on the kubelet. Once enabled (seccompDefault: true in the kubelet configuration, or the --seccomp-default flag), containers that don’t set a profile run under RuntimeDefault instead of Unconfined, with no per-pod config.
  • NamespaceLifecycle and improved finalizer behavior reduce stuck-namespace pain during tenant offboarding.
  • In-place pod vertical scaling landed as alpha. Not relevant to security but worth tracking — see the Kubernetes blog for the 1.27 release notes.
  • KMSv2 for etcd encryption moved to beta with much better rotation semantics.
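
SeccompDefault takes effect per node through the kubelet. A minimal sketch of the kubelet configuration fragment follows; where this file lives, and whether you can edit it directly, depends on your distribution or managed node group, so treat the placement as an assumption:

```yaml
# Kubelet configuration file (location varies by distribution)
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Containers that do not set a seccomp profile run under RuntimeDefault
# instead of Unconfined on nodes where this is true.
seccompDefault: true
```

This changes the runtime default only. The restricted Pod Security level still checks the pod spec itself, so pods in a restricted namespace need an explicit seccompProfile either way.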

Here’s the baseline namespace setup I provision per tenant:

apiVersion: v1
kind: Namespace
metadata:
  name: checkout
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: v1.27
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
    tenant: checkout
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: checkout-quota
  namespace: checkout
spec:
  hard:
    requests.cpu: "40"
    requests.memory: 80Gi
    limits.cpu: "60"
    limits.memory: 120Gi
    persistentvolumeclaims: "30"
    services.loadbalancers: "2"
    count/jobs.batch: "100"
---
apiVersion: v1
kind: LimitRange
metadata:
  name: checkout-defaults
  namespace: checkout
spec:
  limits:
    - type: Container
      default:
        cpu: 500m
        memory: 512Mi
      defaultRequest:
        cpu: 100m
        memory: 128Mi
      max:
        cpu: "4"
        memory: 8Gi

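The LimitRange is the part most people skip. In a namespace with no compute quota, a pod that omits resources runs as BestEffort: the scheduler reserves nothing for it, and it’s first in line for eviction under node pressure. With the ResourceQuota above, the failure is louder: the quota tracks CPU and memory requests and limits, so a pod that leaves them out is rejected at admission. The LimitRange fixes both cases. Every container gets default requests and limits, and max caps the upper bound so nobody single-handedly schedules a 64-core pod.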
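
To make the defaulting concrete, here is a sketch of a pod that omits resources entirely. The pod name and image tag are hypothetical. The securityContext fields are there only so the pod passes the restricted level enforced on the namespace:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: worker                 # hypothetical
  namespace: checkout
spec:
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: worker
      image: registry.acme.io/checkout/worker:1.4.2   # hypothetical image
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]
      # No resources block here. With checkout-defaults in place, the
      # LimitRanger admission plugin fills it in as:
      #   resources:
      #     requests: {cpu: 100m, memory: 128Mi}   # from defaultRequest
      #     limits:   {cpu: 500m, memory: 512Mi}   # from default
```

Running kubectl get pod worker -n checkout -o yaml after creation should show the injected values, which is a quick way to confirm the LimitRange is active in a new tenant namespace.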

Network policy is non-optional

Default-deny ingress and egress per namespace, then explicit allow rules. I use Cilium 1.13 for this because the network policy editor and Hubble flow visibility are worth the operational complexity.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny
  namespace: checkout
spec:
  podSelector: {}
  policyTypes: [Ingress, Egress]
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-platform-egress
  namespace: checkout
spec:
  podSelector: {}
  policyTypes: [Egress]
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: platform-observability
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
            except:
              - 169.254.169.254/32  # block cloud metadata
      ports:
        - protocol: TCP
          port: 443

The cloud metadata exclusion matters. On AWS, IMDSv1 (still reachable on many EKS configurations) lets a pod grab the node’s IAM credentials. Block 169.254.169.254 at the network policy level and use IRSA for pod identity instead.
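
On the IRSA side, the pod-facing part is a single annotation on a ServiceAccount. A minimal sketch, with a made-up account ID, role name, and ServiceAccount name:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: checkout-api           # hypothetical
  namespace: checkout
  annotations:
    # Hypothetical account ID and role. The role's trust policy must allow
    # the cluster's OIDC provider and this ServiceAccount to assume it.
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/checkout-api
```

Pods that set serviceAccountName: checkout-api then receive credentials scoped to that role rather than the node’s. The credential exchange goes to AWS STS over HTTPS, so the port 443 egress rule above covers it.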

Policy as code with Kyverno

Pod Security Admission is necessary but not sufficient. You also need to enforce things like:

  • Every pod must set runAsNonRoot: true
  • Images must come from your internal registry
  • Every namespace must have a team label that matches a real team in Backstage
  • No hostPath volumes, ever

Kyverno 1.10 (or OPA Gatekeeper, take your pick) handles this. Kyverno’s syntax is closer to native Kubernetes manifests, which makes it easier to onboard team members:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-image-registry
spec:
  validationFailureAction: Enforce
  background: false
  rules:
    - name: only-trusted-registry
      match:
        any:
          - resources:
              kinds: [Pod]
              namespaces: ['?*']
      exclude:
        any:
          - resources:
              namespaces: [kube-system, kyverno, argocd]
      validate:
        message: "Images must come from registry.acme.io"
        pattern:
          spec:
            containers:
              - image: 'registry.acme.io/*'
            =(initContainers):  # optional: without =(), pods with no initContainers fail validation
              - image: 'registry.acme.io/*'
            =(ephemeralContainers):
              - image: 'registry.acme.io/*'

Pair this with a mutating policy that automatically injects a team label from a namespace annotation, and you remove most of the “developer forgot the label” support tickets.
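
One way such a mutating policy could look is sketched below. It assumes the team name lives in a hypothetical namespace annotation, platform.acme.io/team, and that the label should land on pods. The policy name and the "unassigned" fallback are also illustrative. Check the apiCall context and anchor syntax against your Kyverno version before relying on it:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-team-label          # hypothetical policy name
spec:
  rules:
    - name: team-label-from-namespace
      match:
        any:
          - resources:
              kinds: [Pod]
      exclude:
        any:
          - resources:
              namespaces: [kube-system, kyverno, argocd]
      context:
        # Look up the pod's namespace and read the (hypothetical) annotation.
        # Falls back to "unassigned" when the annotation is missing.
        - name: team
          apiCall:
            urlPath: "/api/v1/namespaces/{{request.namespace}}"
            jmesPath: "metadata.annotations.\"platform.acme.io/team\" || 'unassigned'"
      mutate:
        patchStrategicMerge:
          metadata:
            labels:
              # +() adds the label only if the pod does not already set it
              +(team): "{{team}}"
```

The add-if-absent anchor is a deliberate choice here: a team that labels its pods explicitly keeps its own value, and the policy only fills the gap.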

When you actually need separate clusters

Namespace tenancy fails in a few specific situations. Don’t try to paper over them:

  • Different Kubernetes versions — if one tenant needs a controller that only works on 1.25 and another needs a feature that landed in 1.27, you don’t have one cluster, you have two clusters in a trenchcoat.
  • Different node OS or kernel — kernel modules, GPU drivers, real-time kernels. Node selectors plus taints can hack around this but it’s fragile.
  • Regulated isolation requirements — when your auditor reads “shared control plane” and writes “finding.”
  • Blast radius concerns at the platform level — upgrading the API server affects every tenant. If a tenant cannot tolerate a 5-minute control-plane disruption during upgrades, they’re not a shared-cluster tenant.
  • Massive scale differences — a tenant with 50,000 pods will dominate scheduler latency for everyone else.

For the in-between cases, vcluster 0.15 is a real option. It runs a virtual control plane (a separate API server) per tenant inside a host cluster namespace. Each vcluster looks like a real cluster to its users; under the hood the workloads schedule onto the host. You get separate CRDs, separate RBAC, separate Kubernetes versions even, while still sharing the underlying node pool.

vcluster’s overhead is small (one or two pods per virtual cluster) and the isolation is meaningfully better than namespace-only. It’s not as strong as separate clusters — they still share the host kernel and CNI — but it’s the right answer when you have 50 small tenants and don’t want to run 50 EKS control planes.

Common Pitfalls

  • PSA at enforce: privileged. I’ve seen platforms ship with privileged and a TODO to harden later. Later never comes. Start at restricted and grant per-namespace exceptions explicitly.
  • No egress policy. Ingress-only NetworkPolicy is half a solution. Compromised pods that can reach 169.254.169.254 or the internet at large are an exfiltration channel.
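  • Quota without LimitRange. Teams will set near-zero requests to fit more pods under the quota. Then the scheduler over-commits the node, and you get OOM-killed neighbors at 3am. A LimitRange with a min floor closes this gap; the example earlier sets only defaults and max, so add min if you see this pattern.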
  • Shared cluster-scoped resources (CRDs). When tenant A installs a CRD that tenant B depends on, you have an implicit coupling. Either reserve CRD installation for the platform team or use vcluster to isolate them.
  • Service mesh assumptions. Istio 1.17 and Linkerd 2.13 both have multi-tenant stories now, but mTLS-by-namespace is not network isolation. You still want NetworkPolicy underneath.
  • Default service account tokens auto-mounted. Set automountServiceAccountToken: false on each namespace’s default ServiceAccount (there is no namespace-wide switch), or block at admission. Most pods don’t need an API token.
  • Treating kubectl auth can-i as the audit log. It only shows current state. You need API server audit logs shipped off-cluster for actual investigations.
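
Two of the pitfalls above have short manifest fixes. The sketch below assumes the checkout namespace from earlier. The floor values and the LimitRange name are illustrative, not recommendations:

```yaml
# Floor on requests so the quota cannot be gamed with near-zero values.
apiVersion: v1
kind: LimitRange
metadata:
  name: checkout-floor          # hypothetical
  namespace: checkout
spec:
  limits:
    - type: Container
      min:
        cpu: 50m
        memory: 64Mi
---
# Stop pods that use the default ServiceAccount from mounting an API token.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: default
  namespace: checkout
automountServiceAccountToken: false
```

A pod can still opt back in by setting automountServiceAccountToken: true in its own spec, so pair this with an admission rule if you want the setting to be binding rather than a default.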

Wrapping Up

Multi-tenancy on Kubernetes 1.27 is a stack of cheap controls that compound. Each control on its own — quotas, PSA, NetworkPolicy, Kyverno, IRSA — is weak; together they’re enough for the friendly-tenant case that covers most internal platforms.

For anything stronger, accept the operational cost of more clusters or move to vcluster. The next post compares FluxCD and ArgoCD for managing the resulting fleet of clusters.