Securing Internal Microservices with JWT and SPIFFE in 2025

July 14, 2025 · 8 min read · by Muhammad Amal programming

TL;DR — Static service tokens are how most internal API breaches escalate. SPIFFE-issued X.509 SVIDs give every workload a cryptographically attested identity, and short-lived JWT-SVIDs let you carry that identity through L7 paths where mTLS doesn’t reach. The combination, wired carefully, gives you zero-trust internal auth that survives audit.

Every team I’ve consulted for in the last three years has the same internal-auth story: a shared secret in Vault, rotated never, scoped poorly, and trusted by every service because the alternative is more work. When the post-mortem shows up, the kill chain always includes that token. It’s the cliche of the decade.

SPIFFE and SPIRE solve the bootstrapping problem for workload identity. You stop asking “does this caller know the secret?” and start asking “is this caller actually the orders service running in the prod cluster?” The answer is signed by SPIRE and attested by the platform (kubelet, AWS metadata, GCP metadata, what have you). It’s the same model TLS gives you for browsers, applied to backends.

I’ll walk through a Go service that authenticates inbound calls with mTLS using SPIFFE SVIDs, and that also accepts JWT-SVIDs for paths where mTLS isn’t possible (Lambda, third-party webhooks, browser sessions proxied through a gateway). This builds on the Connect-Go tutorial, but the patterns transfer to vanilla gRPC and HTTP just as well.

1. The SPIFFE Model in 90 Seconds

SPIFFE defines a name format and a way to express identity. SPIRE is the reference implementation that issues credentials.

A SPIFFE ID looks like spiffe://acme.internal/ns/prod/sa/orders. It encodes the trust domain (acme.internal), the namespace (prod), and the service account (orders). The ns/sa path layout is a Kubernetes convention rather than something SPIFFE itself mandates, but it is opinionated for a reason: it forces you to think about identity as something the platform attests, not something the app claims.

Each workload gets two credentials:

X.509-SVID: a short-lived (1h default) certificate where the SPIFFE ID is in the SAN URI. Used for mTLS.
JWT-SVID: a JWT signed by the trust domain’s key, with the SPIFFE ID as the sub claim. Used where TLS can’t reach.
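
To make the JWT-SVID half concrete, here is a minimal sketch that decodes the payload of a token and prints the three claims this article relies on. The token is a hand-built, unsigned stand-in (fakeToken is hypothetical), and the sketch assumes the aud claim is a JSON array. It skips signature verification entirely, so treat it as a debugging aid, not as validation.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// svidClaims holds only the claims discussed here; real tokens may carry more.
type svidClaims struct {
	Subject  string   `json:"sub"`
	Audience []string `json:"aud"` // assumption: aud is encoded as an array
	Expiry   int64    `json:"exp"`
}

// peekClaims decodes the payload segment WITHOUT verifying the signature.
// Never base an authorization decision on its result.
func peekClaims(token string) (svidClaims, error) {
	var c svidClaims
	parts := strings.Split(token, ".")
	if len(parts) != 3 {
		return c, errors.New("expected three dot-separated segments")
	}
	payload, err := base64.RawURLEncoding.DecodeString(parts[1])
	if err != nil {
		return c, fmt.Errorf("decode payload: %w", err)
	}
	if err := json.Unmarshal(payload, &c); err != nil {
		return c, fmt.Errorf("parse payload: %w", err)
	}
	return c, nil
}

// fakeToken builds an illustrative token with a dummy signature so the
// example runs without a SPIRE deployment.
func fakeToken() string {
	enc := base64.RawURLEncoding.EncodeToString
	header := enc([]byte(`{"alg":"ES256","typ":"JWT"}`))
	payload := enc([]byte(`{"sub":"spiffe://acme.internal/ns/prod/sa/checkout","aud":["orders.prod"],"exp":1752500000}`))
	return header + "." + payload + ".c2ln"
}

func main() {
	c, err := peekClaims(fakeToken())
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("sub:", c.Subject)
	fmt.Println("aud:", c.Audience)
	fmt.Println("exp:", c.Expiry)
}
```

The sub claim is the same SPIFFE ID that would sit in the SAN URI of the workload’s X.509-SVID.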

SPIRE delivers both via the SPIFFE Workload API, exposed on a local Unix domain socket (UDS). No tokens on disk, no environment variables, no rotation logic to write yourself.

+----------+   attestation    +-------+   issuance   +----------+
| kubelet  | ---------------> | SPIRE | -----------> | workload |
+----------+                  +-------+              +----------+
                                                       |  via UDS
                                                       v
                                              X.509 + JWT SVID

2. Running SPIRE on Kubernetes

For this tutorial I’m using the SPIRE Helm charts (v0.21, July 2025). The actual install is a few helm upgrade commands and is well-documented at spiffe.io. The pieces you end up with:

spire-server — a StatefulSet that owns the trust domain CA and the registration database
spire-agent — a DaemonSet that runs on every node, attests pods via the kubelet, and exposes the Workload API socket
A ClusterSPIFFEID custom resource per workload, mapping pod selectors to SPIFFE IDs

The minimal registration for our orders service:

apiVersion: spire.spiffe.io/v1alpha1
kind: ClusterSPIFFEID
metadata:
  name: orders
spec:
  spiffeIDTemplate: spiffe://acme.internal/ns/{{ .PodMeta.Namespace }}/sa/{{ .PodSpec.ServiceAccountName }}
  podSelector:
    matchLabels:
      app: orders
  workloadSelectorTemplates:
    - "k8s:ns:prod"
    - "k8s:sa:orders"

After this is applied, any pod with app=orders in the prod namespace gets the SVID for spiffe://acme.internal/ns/prod/sa/orders. No app changes required.
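
The placeholders in spiffeIDTemplate use Go’s text/template syntax, so you can preview what a given pod would be issued. The sketch below renders the same template string against hypothetical stand-in structs (podMeta, podSpec) that carry only the two fields the template references; the real controller has its own data types.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// podMeta and podSpec are hypothetical stand-ins with only the fields the
// template uses. They are not the controller's actual types.
type podMeta struct{ Namespace string }
type podSpec struct{ ServiceAccountName string }

type templateData struct {
	PodMeta podMeta
	PodSpec podSpec
}

const idTemplate = "spiffe://acme.internal/ns/{{ .PodMeta.Namespace }}/sa/{{ .PodSpec.ServiceAccountName }}"

// renderID fills the template for one pod's namespace and service account.
func renderID(namespace, serviceAccount string) (string, error) {
	tmpl, err := template.New("id").Parse(idTemplate)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	err = tmpl.Execute(&out, templateData{
		PodMeta: podMeta{Namespace: namespace},
		PodSpec: podSpec{ServiceAccountName: serviceAccount},
	})
	return out.String(), err
}

func main() {
	id, err := renderID("prod", "orders")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(id)
}
```

Running a few namespace and service-account pairs through a helper like this is a cheap way to check a trust-domain layout before applying it to a cluster.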

3. Wiring the Workload API in Go

The github.com/spiffe/go-spiffe/v2 library does the heavy lifting. Here’s a server that uses an X.509 SVID for mTLS:

package main

import (
	"context"
	"crypto/tls"
	"errors"
	"log/slog"
	"net/http"
	"time"

	"github.com/spiffe/go-spiffe/v2/spiffeid"
	"github.com/spiffe/go-spiffe/v2/spiffetls/tlsconfig"
	"github.com/spiffe/go-spiffe/v2/workloadapi"
)

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	source, err := workloadapi.NewX509Source(ctx)
	if err != nil {
		slog.Error("workload api", "err", err)
		return
	}
	defer source.Close()

	td := spiffeid.RequireTrustDomainFromString("acme.internal")
	// Transport-level check: accept only callers from our trust domain.
	// The per-service allow-list is enforced in allowed() below.
	authz := tlsconfig.AuthorizeMemberOf(td)
	tlsCfg := tlsconfig.MTLSServerConfig(source, source, authz)

	srv := &http.Server{
		Addr:              ":8443",
		TLSConfig:         tlsCfg,
		Handler:           http.HandlerFunc(handle),
		ReadHeaderTimeout: 5 * time.Second,
	}
	slog.Info("listening", "addr", srv.Addr)
	if err := srv.ListenAndServeTLS("", ""); err != nil && !errors.Is(err, http.ErrServerClosed) {
		slog.Error("serve", "err", err)
	}
}

func handle(w http.ResponseWriter, r *http.Request) {
	id, err := callerSpiffeID(r.TLS)
	if err != nil {
		http.Error(w, "no identity", http.StatusUnauthorized)
		return
	}
	if !allowed(id) {
		http.Error(w, "forbidden", http.StatusForbidden)
		return
	}
	w.WriteHeader(http.StatusOK)
	_, _ = w.Write([]byte("hello " + id.String()))
}

func callerSpiffeID(state *tls.ConnectionState) (spiffeid.ID, error) {
	if state == nil || len(state.PeerCertificates) == 0 {
		return spiffeid.ID{}, errors.New("no peer cert")
	}
	uris := state.PeerCertificates[0].URIs
	if len(uris) == 0 {
		return spiffeid.ID{}, errors.New("no spiffe URI")
	}
	return spiffeid.FromURI(uris[0])
}

func allowed(id spiffeid.ID) bool {
	return id.TrustDomain().String() == "acme.internal" &&
		(id.Path() == "/ns/prod/sa/checkout" || id.Path() == "/ns/prod/sa/reports")
}

The client side is the same shape:

src, err := workloadapi.NewX509Source(ctx)
if err != nil { /* ... */ }
defer src.Close()

serverID := spiffeid.RequireFromString("spiffe://acme.internal/ns/prod/sa/orders")
tlsCfg := tlsconfig.MTLSClientConfig(src, src, tlsconfig.AuthorizeID(serverID))

client := &http.Client{Transport: &http.Transport{TLSClientConfig: tlsCfg}}
resp, err := client.Get("https://orders.prod.svc:8443/healthz")

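Note: tlsconfig.AuthorizeID pins to one exact SPIFFE ID. AuthorizeMemberOf accepts any SVID from a trust domain. Prefer AuthorizeID (or AuthorizeOneOf for a short list of IDs) whenever the client knows which service it should be talking to, and keep AuthorizeMemberOf for cases where a finer check happens later in the handler.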
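
The difference between the two policies is easy to see in a simplified model. The sketch below is not go-spiffe’s implementation; authorizer, exactID, and memberOf are hypothetical names that work on plain strings, purely to show which peers each policy lets through.

```go
package main

import (
	"fmt"
	"net/url"
)

// authorizer is a simplified model of a peer-ID check. It is not the
// tlsconfig.Authorizer type from go-spiffe.
type authorizer func(peerID string) error

// exactID accepts one SPIFFE ID and nothing else.
func exactID(want string) authorizer {
	return func(peerID string) error {
		if peerID != want {
			return fmt.Errorf("unexpected peer %q", peerID)
		}
		return nil
	}
}

// memberOf accepts any SPIFFE ID whose trust domain matches.
func memberOf(trustDomain string) authorizer {
	return func(peerID string) error {
		u, err := url.Parse(peerID)
		if err != nil {
			return fmt.Errorf("parse %q: %w", peerID, err)
		}
		if u.Scheme != "spiffe" || u.Host != trustDomain {
			return fmt.Errorf("%q is not in trust domain %q", peerID, trustDomain)
		}
		return nil
	}
}

func main() {
	orders := "spiffe://acme.internal/ns/prod/sa/orders"
	reports := "spiffe://acme.internal/ns/prod/sa/reports"

	// Pinned to orders: a different workload in the same domain is refused.
	fmt.Println("exactID, orders: ", exactID(orders)(orders))
	fmt.Println("exactID, reports:", exactID(orders)(reports))

	// Trust-domain membership: any workload in acme.internal passes.
	fmt.Println("memberOf, reports:", memberOf("acme.internal")(reports))
}
```

With membership-only checks, a compromised reports pod can impersonate nothing but can still connect everywhere, which is why the exact pin is the safer default on the client side.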

4. JWT-SVIDs for L7 Paths

mTLS doesn’t always reach. An AWS Lambda calling your API goes through API Gateway, which terminates TLS. A browser calling through Envoy Gateway can’t present a workload cert. For those, you fetch a JWT-SVID and put it in the Authorization header.

jwtSrc, err := workloadapi.NewJWTSource(ctx)
if err != nil { /* ... */ }
defer jwtSrc.Close()

svid, err := jwtSrc.FetchJWTSVID(ctx, jwtsvid.Params{
    Audience: "orders.prod",
})
if err != nil { /* ... */ }

req, _ := http.NewRequestWithContext(ctx, "GET", "https://orders.prod/api/foo", nil)
req.Header.Set("Authorization", "Bearer "+svid.Marshal())

The audience matters. A JWT-SVID is bound to the audience (or audiences) requested when it was fetched; a service that validates against a different audience rejects it. Give each service a distinct audience, request only that one, and you’ve cut the blast radius of a leaked token to that one service.
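
Stripped of everything else, the audience check is a membership test. The sketch below shows why a token fetched for orders.prod is useless at another service; billing.prod is a hypothetical second audience, and audienceOK is an illustrative helper, not a library function.

```go
package main

import "fmt"

// audienceOK reports whether a token whose aud claim is tokenAud may be used
// at a service that identifies itself as expected. An empty expected value is
// refused so a misconfigured service cannot accept every token.
func audienceOK(tokenAud []string, expected string) bool {
	if expected == "" {
		return false
	}
	for _, aud := range tokenAud {
		if aud == expected {
			return true
		}
	}
	return false
}

func main() {
	leaked := []string{"orders.prod"} // aud claim of a token fetched for orders

	fmt.Println("orders accepts: ", audienceOK(leaked, "orders.prod"))
	fmt.Println("billing accepts:", audienceOK(leaked, "billing.prod"))
}
```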

4.1 Validating on the Server

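import (
	"context"
	"net/http"

	"github.com/spiffe/go-spiffe/v2/svid/jwtsvid"
	"github.com/spiffe/go-spiffe/v2/workloadapi"
)

// ctxKey is the private context key under which the caller's SPIFFE ID is stored.
type ctxKey struct{}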

func jwtMiddleware(jwtSrc *workloadapi.JWTSource, audience string) func(http.Handler) http.Handler {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			authz := r.Header.Get("Authorization")
			if len(authz) < 8 || authz[:7] != "Bearer " {
				http.Error(w, "missing bearer", http.StatusUnauthorized)
				return
			}
			tok := authz[7:]
			svid, err := jwtsvid.ParseAndValidate(tok, jwtSrc, []string{audience})
			if err != nil {
				http.Error(w, "invalid token", http.StatusUnauthorized)
				return
			}
			ctx := context.WithValue(r.Context(), ctxKey{}, svid.ID)
			next.ServeHTTP(w, r.WithContext(ctx))
		})
	}
}

ParseAndValidate checks signature, expiration (default 5 min for SPIRE-issued JWT-SVIDs), and audience. The trust bundle is fetched from the Workload API and rotated automatically.

5. Mixing Custom Claims with golang-jwt

SPIFFE JWT-SVIDs are minimal by design: the spec requires only sub, aud, and exp, and issuers typically add little beyond that (iss, iat). If you need extra claims (tenant id, request id, scope), you have two options:

Issue a second short-lived JWT signed by the calling service’s SVID-derived key, and bundle that alongside the JWT-SVID.
Use the JWT-SVID for identity and pass extra context in separate headers (signed if you must).

Most teams pick option 2 because it keeps the SPIFFE-issued token unmodified and centralizes claims management in your app. Here’s the helper for option 1, for cases where you need everything in one self-contained token (e.g., long event-driven flows):

import (
    "time"

    "github.com/golang-jwt/jwt/v5"
)

type appClaims struct {
    TenantID string `json:"tenant_id"`
    Scope    string `json:"scope"`
    jwt.RegisteredClaims
}

func signAppJWT(key []byte, sub, aud, tenant, scope string) (string, error) {
    claims := appClaims{
        TenantID: tenant,
        Scope:    scope,
        RegisteredClaims: jwt.RegisteredClaims{
            Issuer:    "orders",
            Subject:   sub,
            Audience:  jwt.ClaimStrings{aud},
            ExpiresAt: jwt.NewNumericDate(time.Now().Add(5 * time.Minute)),
            IssuedAt:  jwt.NewNumericDate(time.Now()),
        },
    }
    tok := jwt.NewWithClaims(jwt.SigningMethodHS256, claims)
    return tok.SignedString(key)
}

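Don’t sign these with a long-lived secret. Either derive a per-pod ephemeral key on startup, or use the X.509-SVID private key (with care about non-repudiation semantics). Keep in mind that HS256, as used above, is symmetric: the verifier needs the same key bytes, so it only fits when both sides can share them. Signing with the SVID private key means switching to an asymmetric method such as ES256, so receivers can verify against the public key in the caller’s certificate.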
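
For the ephemeral-key route, a minimal standard-library sketch looks like the following. It generates a P-256 key that lives only in process memory, signs a payload, and verifies it with the public half. The helper names are hypothetical, and the ASN.1 signature encoding used here is not the JWT wire format; with golang-jwt you would instead pass the *ecdsa.PrivateKey to SignedString using jwt.SigningMethodES256.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/sha256"
	"fmt"
)

// newEphemeralKey creates a P-256 key pair held only in memory. It is lost
// when the pod restarts, which is the point.
func newEphemeralKey() (*ecdsa.PrivateKey, error) {
	return ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
}

// sign returns an ASN.1-encoded ECDSA signature over the SHA-256 digest of msg.
func sign(key *ecdsa.PrivateKey, msg []byte) ([]byte, error) {
	digest := sha256.Sum256(msg)
	return ecdsa.SignASN1(rand.Reader, key, digest[:])
}

// verify checks sig against msg using only the public key.
func verify(pub *ecdsa.PublicKey, msg, sig []byte) bool {
	digest := sha256.Sum256(msg)
	return ecdsa.VerifyASN1(pub, digest[:], sig)
}

func main() {
	key, err := newEphemeralKey()
	if err != nil {
		fmt.Println("keygen:", err)
		return
	}
	msg := []byte(`{"tenant_id":"t-1","scope":"orders:read"}`)
	sig, err := sign(key, msg)
	if err != nil {
		fmt.Println("sign:", err)
		return
	}
	fmt.Println("original verifies:", verify(&key.PublicKey, msg, sig))
	fmt.Println("tampered verifies:", verify(&key.PublicKey, []byte("tampered"), sig))
}
```

The open design question with ephemeral keys is distribution: receivers need the public key from somewhere they already trust, which is one reason teams fall back to option 2.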

6. Common Pitfalls

6.1 Logging the Whole Authorization Header

You will do this once. The token ends up in your log aggregation, which has different access controls than your secret store. Always log only the SPIFFE ID, never the raw token.
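
One way to make this mistake harder is to redact at the logger rather than at every call site. The sketch below uses the ReplaceAttr hook in log/slog to mask attributes whose key names a credential. The key list (authorization, token) is an assumption; adjust it to your own field names.

```go
package main

import (
	"io"
	"log/slog"
	"os"
	"strings"
)

// isSecretKey reports whether a log attribute key names a credential.
// The list is an assumption; extend it to match your own log fields.
func isSecretKey(key string) bool {
	switch strings.ToLower(key) {
	case "authorization", "token":
		return true
	}
	return false
}

// redactAttr is a slog ReplaceAttr hook that masks credential values
// regardless of their type.
func redactAttr(_ []string, a slog.Attr) slog.Attr {
	if isSecretKey(a.Key) {
		a.Value = slog.StringValue("[REDACTED]")
	}
	return a
}

// newLogger builds a JSON logger with the redaction hook installed.
func newLogger(w io.Writer) *slog.Logger {
	return slog.New(slog.NewJSONHandler(w, &slog.HandlerOptions{ReplaceAttr: redactAttr}))
}

func main() {
	log := newLogger(os.Stdout)
	log.Info("request",
		"spiffe_id", "spiffe://acme.internal/ns/prod/sa/checkout",
		"authorization", "Bearer eyJhbGciOi...", // masked by the hook
	)
}
```

This only catches tokens logged under a known key. A token interpolated into the message string still leaks, so the rule about logging the SPIFFE ID instead remains the real fix.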

6.2 Treating SPIFFE IDs as Opaque

The path structure (/ns/prod/sa/orders) is meaningful. Build a tiny matcher library that parses it once and exposes typed accessors (Namespace(), ServiceAccount()). Avoid string substring matches.
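
A minimal sketch of such a matcher, assuming the ns/sa layout used throughout this article. K8sIdentity and ParseK8sIdentity are hypothetical names; in a real service you would likely parse with go-spiffe’s spiffeid package first and apply the layout check to its Path().

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// K8sIdentity is a parsed SPIFFE ID of the form
// spiffe://<trust-domain>/ns/<namespace>/sa/<service-account>.
type K8sIdentity struct {
	trustDomain, namespace, serviceAccount string
}

func (k K8sIdentity) TrustDomain() string    { return k.trustDomain }
func (k K8sIdentity) Namespace() string      { return k.namespace }
func (k K8sIdentity) ServiceAccount() string { return k.serviceAccount }

// ParseK8sIdentity parses id once and rejects anything that does not match
// the expected layout exactly.
func ParseK8sIdentity(id string) (K8sIdentity, error) {
	rest, ok := strings.CutPrefix(id, "spiffe://")
	if !ok {
		return K8sIdentity{}, errors.New("missing spiffe:// scheme")
	}
	parts := strings.Split(rest, "/")
	if len(parts) != 5 || parts[1] != "ns" || parts[3] != "sa" {
		return K8sIdentity{}, fmt.Errorf("unexpected layout: %q", id)
	}
	if parts[0] == "" || parts[2] == "" || parts[4] == "" {
		return K8sIdentity{}, fmt.Errorf("empty segment in %q", id)
	}
	return K8sIdentity{trustDomain: parts[0], namespace: parts[2], serviceAccount: parts[4]}, nil
}

func main() {
	id, err := ParseK8sIdentity("spiffe://acme.internal/ns/prod/sa/orders")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(id.TrustDomain(), id.Namespace(), id.ServiceAccount())

	// A substring check for "/ns/prod" would wrongly match this namespace.
	evil, _ := ParseK8sIdentity("spiffe://acme.internal/ns/prod-shadow/sa/orders")
	fmt.Println(evil.Namespace() == "prod")
}
```

Policy code then compares Namespace() and ServiceAccount() for equality, which removes the prefix and substring bugs that string matching invites.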

6.3 Forgetting Audience Validation

jwtsvid.ParseAndValidate requires you to pass the expected audience. Pass it; don’t accept any. I’ve seen services that pass nil and accept any SVID for the trust domain. That collapses the per-service scoping SPIFFE gives you.

6.4 Trust Domain Drift

Dev, staging, and prod should have different trust domains (acme.dev, acme.staging, acme.internal). Otherwise a dev workload could in principle present a token a prod service would accept. The Helm chart values handle this if you set them at install time and never change them.

6.5 Mixing SPIRE with Cloud IAM Without a Bridge

If half your auth is SPIFFE-issued and half is AWS IAM roles, requests crossing the boundary lose context. SPIRE has integrations for AWS, GCP, and Azure that bridge their attestation into a SPIFFE ID; use those rather than letting the boundary leak.

7. Troubleshooting

7.1 `failed to fetch X509-SVID, no identity issued`

The pod isn’t matched by any ClusterSPIFFEID. Check labels, namespace, and the spec.workloadSelectorTemplates. The SPIRE agent logs (kubectl logs -n spire spire-agent-...) tell you what attestation selectors it observed.

7.2 `tls: failed to verify certificate, unknown authority`

The client and server are in different trust domains, or one side hasn’t picked up the current trust bundle yet. The Workload API pushes bundle updates to workloads automatically; if the discrepancy persists, restart the agent pod on the affected node.

7.3 JWT Validation Fails Right After Rotation

JWT-SVIDs are signed with keys that rotate. If you cache the bundle for too long, validation fails after rotation. Use workloadapi.JWTSource rather than fetching the bundle yourself; it refreshes automatically.
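
If you are stuck with a hand-rolled cache for a while, the failure mode and the usual mitigation look roughly like this. The sketch models a cache keyed by key ID (the JWT kid header) that refetches once when it sees an unknown ID. keyCache and its fetch callback are hypothetical, and key material is an opaque string to keep the example short.

```go
package main

import (
	"fmt"
	"sync"
)

// keyCache maps key IDs to public key material.
type keyCache struct {
	mu    sync.Mutex
	keys  map[string]string
	fetch func() map[string]string // hypothetical bundle fetcher
}

func newKeyCache(fetch func() map[string]string) *keyCache {
	return &keyCache{keys: map[string]string{}, fetch: fetch}
}

// lookup returns the key for kid, refreshing the cache once on a miss so a
// freshly rotated signing key is picked up without a restart.
func (c *keyCache) lookup(kid string) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if k, ok := c.keys[kid]; ok {
		return k, true
	}
	// NOTE: production code should rate-limit this refresh, otherwise
	// tokens with random key IDs can force a fetch on every request.
	c.keys = c.fetch()
	k, ok := c.keys[kid]
	return k, ok
}

func main() {
	bundle := map[string]string{"kid-1": "public-key-1"}
	cache := newKeyCache(func() map[string]string { return bundle })

	_, ok := cache.lookup("kid-1")
	fmt.Println("before rotation, kid-1 found:", ok)

	bundle = map[string]string{"kid-2": "public-key-2"} // simulate rotation
	_, ok = cache.lookup("kid-2")
	fmt.Println("after rotation, kid-2 found:", ok)
}
```

A cache without the refresh-on-miss step keeps serving the old key set and rejects every token signed after rotation, which is exactly the symptom in this section.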

8. Wrapping Up

Replacing static service tokens with SPIFFE-issued credentials is the single highest-impact change you can make to internal API security in 2025. It removes an entire class of incidents from your post-mortems. The investment is real — SPIRE is one more system to run — but the maintenance is mostly Helm-chart values, not bespoke code.

The canonical reference is the SPIFFE specification and the SPIRE docs. Read both before you commit to a trust-domain layout, because changing it later is painful. With auth solved, the next big lever for reliability is how data moves through your services in real time, which is where streaming gRPC comes in.
