mTLS, Service Mesh or Application Layer in 2024
July 24, 2024 · 8 min read · by Muhammad Amal

TL;DR — Mesh-terminated mTLS is the default in 2024 for in-cluster traffic. Application-terminated mTLS is the right choice when you need cryptographic identity inside the process, when you’re crossing a trust boundary, or when you don’t have a mesh. Most teams need both, scoped differently.

The mTLS decision is one of those architecture choices where the right answer depends on what you’re actually defending against and where your trust boundaries live. I get asked this regularly enough that I want to write the answer down once and link to it.

The shorthand framing I use: mesh mTLS protects the wire, application mTLS protects the workload. Both are valuable. They protect different things. If you only do one, you’re missing the other.

This post walks through both approaches with working configurations. I’ll cover the threat model differences, the operational realities, and the specific scenarios where each one is the right call.

What mTLS actually gets you

Before the architecture choice, a clear statement of what mTLS provides:

  • Confidentiality. TLS handshake establishes a session key; subsequent traffic is encrypted.
  • Server authentication. Client verifies the server’s cert against a trusted CA. This is also what plain TLS gives you.
  • Client authentication. Server verifies the client’s cert against a trusted CA. This is the “m” in mTLS.

The third point is the interesting one. With mTLS, a server can refuse a connection from a client without a valid certificate, and it can identify the client cryptographically. That’s stronger than IP allowlists, network policies, or shared bearer tokens, because the identity is bound to a key the workload holds.

What mTLS does not give you, on its own:

  • Authorization. mTLS proves “this client is workload X”. It doesn’t say “workload X is allowed to do this”. You need policy on top.
  • Application-layer auth. mTLS authenticates the workload, not the user. A request from authenticated workload X may still represent a user the server doesn’t trust. JWT-style user auth (covered in securing Go microservices with JWT) is orthogonal.
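To make the split between identity and authorization concrete, here is a minimal sketch in Go of a policy lookup that sits on top of an already authenticated peer. The `policy` table, the `authorize` helper, and the SPIFFE IDs in it are all hypothetical and only there for illustration; a real deployment would more likely express this in mesh policy or a policy engine.

```go
package main

import "fmt"

// policy maps a caller identity to the HTTP methods it may use.
// The identities and rules are made up for illustration.
var policy = map[string]map[string]bool{
	"spiffe://cluster.local/ns/frontend/sa/frontend": {"GET": true, "POST": true},
	"spiffe://cluster.local/ns/reports/sa/reports":   {"GET": true},
}

// authorize answers the question mTLS alone cannot: given an identity
// that has already been authenticated, is this action allowed?
// Unknown identities and unlisted methods both fall through to false.
func authorize(identity, method string) bool {
	return policy[identity][method]
}

func main() {
	fmt.Println(authorize("spiffe://cluster.local/ns/reports/sa/reports", "GET"))
	fmt.Println(authorize("spiffe://cluster.local/ns/reports/sa/reports", "POST"))
	fmt.Println(authorize("spiffe://cluster.local/ns/unknown/sa/x", "GET"))
}
```

The point of the sketch is the shape: mTLS supplies the `identity` argument, and everything after that is policy you have to write yourself.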

Mesh-terminated mTLS, the default

In an Istio or Linkerd deployment, mTLS happens between sidecars. The application talks plain HTTP to its sidecar; the sidecar wraps the connection in mTLS to the remote sidecar; the remote sidecar unwraps it and forwards plain HTTP to the remote application. Certificates are issued and rotated by the mesh control plane, typically by the mesh’s built-in CA, or by an external issuer such as SPIRE where that integration is configured.

The Istio config to enforce strict mTLS namespace-wide:

apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: default
  namespace: orders
spec:
  mtls:
    mode: STRICT
---
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: orders-allow-from-frontend
  namespace: orders
spec:
  selector:
    matchLabels:
      app: orders
  action: ALLOW
  rules:
    - from:
        - source:
            principals: ["cluster.local/ns/frontend/sa/frontend"]
      to:
        - operation:
            methods: ["GET", "POST"]
            paths: ["/v1/orders*"]

PeerAuthentication in STRICT mode rejects any non-mTLS traffic to pods in the orders namespace. The AuthorizationPolicy then layers identity-based policy on top: only the frontend service account, in the frontend namespace, can call the orders endpoints, and only with GET or POST.

What’s good about this:

  • Zero application changes. Your Go code keeps talking plain HTTP. The mesh handles all the TLS work.
  • Centralized cert rotation. Istio issues workload certs with a 24-hour lifetime by default and rotates them before they expire. You never touch a cert file.
  • Identity is the Kubernetes service account. That is the same identity Kubernetes RBAC already binds to, so you aren’t inventing a second naming scheme for workloads.
  • Consistent across services. Every workload in the mesh gets the same treatment with the same config.

What’s not so good:

  • Application sees plain HTTP. Your app doesn’t know the connection is mTLS. If you need to make authorization decisions based on the calling workload’s identity, you have to either trust headers the mesh sets (X-Forwarded-Client-Cert) or do AuthorizationPolicy in the mesh.
  • Mesh is on the critical path. Sidecar crash, control plane outage, cert issuance failure — all affect traffic. The mesh is now part of your blast radius.
  • Cross-cluster gets harder. Mesh-to-mesh or mesh-to-non-mesh traffic needs gateway config, federation, or both. Workable but not trivial.
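If you do read the caller’s identity from X-Forwarded-Client-Cert, the application has to parse it. The sketch below, in Go, assumes the Envoy format: comma-separated elements, each a semicolon-separated list of key=value pairs, with some values in double quotes. It also assumes the nearest proxy appends its element last, which depends on how your mesh configures header forwarding, so check that before relying on it. The helper names `splitUnquoted` and `xfccURI` are hypothetical, and escaped quotes inside quoted values are not handled.

```go
package main

import (
	"fmt"
	"strings"
)

// splitUnquoted splits s on sep, ignoring separators inside double quotes.
func splitUnquoted(s string, sep rune) []string {
	var parts []string
	var cur strings.Builder
	inQuotes := false
	for _, r := range s {
		switch {
		case r == '"':
			inQuotes = !inQuotes
			cur.WriteRune(r)
		case r == sep && !inQuotes:
			parts = append(parts, cur.String())
			cur.Reset()
		default:
			cur.WriteRune(r)
		}
	}
	return append(parts, cur.String())
}

// xfccURI returns the URI field of the last element in an
// X-Forwarded-Client-Cert value, or "" if that element has none.
func xfccURI(header string) string {
	elements := splitUnquoted(header, ',')
	last := elements[len(elements)-1]
	for _, kv := range splitUnquoted(last, ';') {
		key, value, ok := strings.Cut(strings.TrimSpace(kv), "=")
		if ok && strings.EqualFold(key, "URI") {
			return strings.Trim(value, `"`)
		}
	}
	return ""
}

func main() {
	// Example header value; the hash and subject are placeholders.
	h := `By=spiffe://cluster.local/ns/orders/sa/orders;Hash=abc123;` +
		`Subject="CN=frontend,O=example";URI=spiffe://cluster.local/ns/frontend/sa/frontend`
	fmt.Println(xfccURI(h))
}
```

This only tells you what the header says. Whether the header can be trusted is a network question: it is only as good as the guarantee that nothing but the sidecar can reach the application port.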

For most in-cluster, in-trust-boundary traffic in 2024, this is what I default to. The operational simplicity wins.

Application-terminated mTLS, when it matters

There are cases where the mesh approach isn’t enough. The main ones:

  1. Crossing a trust boundary. Mesh mTLS terminates at the sidecar. If your application needs to know cryptographically that the calling workload is who it claims to be — not just that the mesh said so — you need application mTLS.
  2. No mesh available. Bare-metal services, VM deployments, edge nodes. The mesh isn’t there to terminate mTLS for you.
  3. External clients. Partners, third-party integrations, or external services calling your API. The mesh’s notion of identity (spiffe://cluster.local/ns/foo/sa/bar) doesn’t extend to them.
  4. Compliance requirements. Some regulators want application-level cryptographic identity, not infrastructure-level.

A Go server doing mTLS directly:

package main

import (
    "crypto/tls"
    "crypto/x509"
    "log"
    "net/http"
    "os"
)

func main() {
    caPEM, err := os.ReadFile("/etc/certs/ca.crt")
    if err != nil {
        log.Fatal(err)
    }
    caPool := x509.NewCertPool()
    if !caPool.AppendCertsFromPEM(caPEM) {
        log.Fatal("bad ca")
    }

    serverCert, err := tls.LoadX509KeyPair("/etc/certs/server.crt", "/etc/certs/server.key")
    if err != nil {
        log.Fatal(err)
    }

    mux := http.NewServeMux()
    // Method and wildcard patterns like this one require Go 1.22 or later.
    mux.HandleFunc("GET /v1/orders/{id}", func(w http.ResponseWriter, r *http.Request) {
        peer := peerIdentity(r.TLS)
        log.Printf("request from %s", peer)
        // ... handle request
        w.WriteHeader(http.StatusOK)
    })

    server := &http.Server{
        Addr:    ":8443",
        Handler: mux,
        TLSConfig: &tls.Config{
            Certificates: []tls.Certificate{serverCert},
            ClientCAs:    caPool,
            ClientAuth:   tls.RequireAndVerifyClientCert,
            MinVersion:   tls.VersionTLS13,
        },
    }
    log.Fatal(server.ListenAndServeTLS("", ""))
}

func peerIdentity(state *tls.ConnectionState) string {
    if state == nil || len(state.PeerCertificates) == 0 {
        return "unknown"
    }
    cert := state.PeerCertificates[0]
    // SPIFFE URI in SAN, when using SPIRE
    for _, uri := range cert.URIs {
        if uri.Scheme == "spiffe" {
            return uri.String()
        }
    }
    return cert.Subject.CommonName
}

tls.RequireAndVerifyClientCert is the mode that makes this mTLS rather than TLS. The weaker ClientAuth modes either make the client cert optional or accept one without verifying it against ClientCAs, and neither is what you want for a trust boundary.

tls.VersionTLS13 should be your default minimum in 2024. TLS 1.2 is still common, but TLS 1.3 has better handshake performance, removes a swath of vulnerable cipher options, and is supported by every current mainstream TLS library. Drop 1.2 unless you have a known consumer that needs it.

The client side, just for completeness:

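// caPool is the same *x509.CertPool built from ca.crt in the server example.
clientCert, err := tls.LoadX509KeyPair("/etc/certs/client.crt", "/etc/certs/client.key")
if err != nil {
    log.Fatal(err)
}
client := &http.Client{
    Transport: &http.Transport{
        TLSClientConfig: &tls.Config{
            Certificates: []tls.Certificate{clientCert}, // presented to the server
            RootCAs:      caPool,                        // CAs trusted to sign the server cert
            MinVersion:   tls.VersionTLS13,
        },
    },
}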

SPIFFE, the identity framework

If you’re going down the application mTLS path, look at SPIFFE and SPIRE seriously. SPIFFE defines a portable identity format (spiffe://trust-domain/path) that lives in the cert’s SAN. SPIRE is the runtime that issues and rotates these certs to workloads.

The advantage over hand-rolling: SPIRE handles cert rotation (typically every hour), attestation of nodes and workloads (so a workload only gets a cert after both the node it runs on and the workload itself have been verified), and integration with Kubernetes service accounts, AWS IAM, and Azure managed identities.

I won’t dive deeper here — SPIFFE deserves its own post — but if you’re terminating mTLS in your application and you’re rolling your own cert distribution with cert-manager and Secrets, you’re working harder than you need to.
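For a feel of the identity format, here is a minimal Go sketch that splits a SPIFFE ID into its trust domain and path. The `spiffeID` type and `parseSPIFFEID` function are hypothetical names, and the validation is a simplified subset of what the SPIFFE specification requires; for production use, the SPIFFE project’s own Go library is the better choice.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// spiffeID holds the two parts of spiffe://<trust domain>/<path>.
type spiffeID struct {
	TrustDomain string
	Path        string
}

// parseSPIFFEID does basic structural validation only.
func parseSPIFFEID(raw string) (spiffeID, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return spiffeID{}, err
	}
	if u.Scheme != "spiffe" {
		return spiffeID{}, errors.New("not a spiffe:// URI")
	}
	if u.Host == "" {
		return spiffeID{}, errors.New("missing trust domain")
	}
	if u.User != nil || u.Port() != "" || u.RawQuery != "" || u.Fragment != "" {
		return spiffeID{}, errors.New("userinfo, port, query and fragment are not allowed")
	}
	return spiffeID{TrustDomain: u.Host, Path: u.Path}, nil
}

func main() {
	id, err := parseSPIFFEID("spiffe://cluster.local/ns/orders/sa/orders")
	if err != nil {
		fmt.Println("invalid:", err)
		return
	}
	fmt.Println("trust domain:", id.TrustDomain)
	fmt.Println("path:", id.Path)
}
```

Keeping the trust domain separate from the path matters later: a policy such as "accept anything from our own trust domain" is a comparison on the first field, not a string prefix match on the whole URI.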

How they compose

The pattern I see in mature 2024 deployments:

  • Inside the cluster, inside one trust boundary → mesh mTLS, no application work.
  • Inside the cluster, crossing trust boundaries (e.g. between business units, between security zones) → mesh mTLS plus application mTLS layered, with the application doing identity checks against SPIFFE IDs.
  • Cluster boundary (to other clusters, other clouds) → application mTLS or gateway-mediated mTLS at the edge.
  • External (partners, third parties) → always application mTLS, with a separate CA from your internal identities.

This split lets the mesh do what it’s good at (the easy 80%) while application mTLS does what it’s good at (the high-trust boundaries).

Common Pitfalls

  • tls.VerifyClientCertIfGiven instead of tls.RequireAndVerifyClientCert. The first accepts connections without a client cert. That’s not mTLS, that’s TLS with optional client authentication. Use the second.
  • Forgetting MinVersion. Without it, a Go server falls back to the library default, which is TLS 1.2 since Go 1.22 and was TLS 1.0 before that. Set TLS 1.3 explicitly.
  • Long-lived certs. A cert that lives a year is a cert you’ll forget to rotate. Aim for hours or days, automated. SPIRE defaults to 1 hour.
  • Trusting X-Forwarded-Client-Cert headers from outside the mesh. This is a header the mesh sidecar sets when forwarding mTLS state to the application. If your service can be reached from anywhere except the sidecar, the header is forgeable. Lock down sidecar-only ingress.
  • Mixing identity models. Don’t have some services on SPIFFE IDs, others on CN-based identity, others on shared bearer tokens. Pick one identity model for the trust boundary and stick to it.
  • No revocation plan. What happens when a workload’s key is compromised? If your answer is “wait for the cert to expire”, and the cert lives 24 hours, that’s your worst-case exposure. Short cert lifetimes are your revocation mechanism, so choose the lifetime with that exposure window in mind.

Wrapping Up

The mesh-or-application question for mTLS isn’t either/or — in 2024, the right answer for most production systems is both, applied to different scopes. Mesh mTLS for the easy bulk of in-cluster traffic; application mTLS for the trust boundaries that matter.

The trap to avoid is making this decision once at architecture time and then living with it everywhere. Different services have different trust requirements. The orders service that holds payment data needs application mTLS for a few specific callers; the recommendations service probably doesn’t. Get comfortable with the asymmetry and design for it.

What I’d encourage next, if you’ve got mesh mTLS in place and you’re thinking about the application layer: pilot SPIRE on one service that genuinely crosses a trust boundary. Get a feel for the operational shape — cert rotation, attestation, debugging when it goes wrong — before you commit to it across the fleet. The pattern scales, but only if you’ve internalized how the pieces fit on a small, real scope first.