Zero Trust Architectures for AI Services: A Step-by-Step Setup
TL;DR: Zero Trust for AI services means every component (model server, vector DB, gateway, agent) authenticates with a cryptographic identity, every request carries that identity, and every authorization is a policy decision on the path. SPIRE 1.10 issues the identities, Envoy enforces mTLS at the data plane, OPA 1.0 makes the decisions.
The marketing version of Zero Trust is “never trust, always verify.” The operational version is “every byte that moves between two processes inside your VPC is signed, the signature is verified against a policy that was decided five microseconds ago, and you can prove to an auditor that this happened.” Most AI shops aren’t there. They have a public-facing API gateway with auth, then everything behind it talks plaintext to everything else because “it’s all internal.”
That model is dead. Modern AI services have too many components (model servers, vector stores, fine-tuning pipelines, agent orchestrators, RAG retrievers, evaluation harnesses) and too many of those components have legitimate reasons to talk to sensitive data. The blast radius of any single compromised process is enormous. Zero Trust is the architecture that contains that blast radius.
This tutorial walks through a working Zero Trust setup for an AI inference service. We’ll wire SPIRE for workload identity, Envoy for mTLS at the data plane, and OPA 1.0 for per-request authorization. I assume you’ve read the prompt injection defenses post because the policy decisions here are how you actually enforce the controls described there.
1. The Architecture We’re Building
Five components, each with a workload identity, mTLS between every pair, and a policy gate on every privileged call.
         +---------------------+
         |     API Gateway     |
         | (Envoy + ext_authz) |
         +----------+----------+
                    | mTLS, SPIFFE ID
                    v
  +-----------------------------------+
  |        Agent Orchestrator         |
  +--+---------+----------+-----------+
     |         |          |
mTLS |         | mTLS     | mTLS
     v         v          v
+----------+ +--------+ +---------+
|  Model   | | Vector | |  Tool   |
|  Server  | | Store  | | Runtime |
+----------+ +--------+ +---------+
     ^         ^          ^
     |         |          |
     +---------+----------+
               |
        +------+------+
        |    SPIRE    |
        |   Server    |
        +-------------+
               ^
               |
        +------+------+
        |     OPA     |
        | (decision)  |
        +-------------+
Nothing talks plaintext over the network. Nothing trusts a header it didn’t set itself. Every call is mTLS-authenticated to a SPIFFE ID, then authorized by OPA. If either step fails, the connection dies.
2. Bootstrapping Identities with SPIRE
SPIRE is the SPIFFE reference implementation. SPIRE 1.10 (released August 2025) is the version we’ll use. Install the server and agent via the official Helm chart.
helm repo add spiffe https://spiffe.github.io/helm-charts-hardened/
helm install spire-crds spiffe/spire-crds \
  --namespace spire-server --create-namespace
helm install spire spiffe/spire \
  --namespace spire-server \
  --set global.spire.trustDomain=ai.internal \
  --set spire-server.controllerManager.enabled=true \
  --version 0.24.0
Once SPIRE is up, register workloads. Each workload gets a SPIFFE ID derived from Kubernetes attributes (service account, namespace, pod selector).
# ClusterSPIFFEID for the model server
apiVersion: spire.spiffe.io/v1alpha1
kind: ClusterSPIFFEID
metadata:
  name: model-server
spec:
  spiffeIDTemplate: "spiffe://ai.internal/ns/{{ .PodMeta.Namespace }}/sa/{{ .PodSpec.ServiceAccountName }}"
  podSelector:
    matchLabels:
      app: model-server
  workloadSelectorTemplates:
  - "k8s:ns:{{ .PodMeta.Namespace }}"
  - "k8s:sa:{{ .PodSpec.ServiceAccountName }}"
A pod with service account model-server in namespace inference will receive the identity spiffe://ai.internal/ns/inference/sa/model-server. That identity is delivered as a short-lived X.509 SVID through the SPIRE workload API.
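To make the template concrete, here is a minimal Python sketch of the substitution it describes. The function name `render_spiffe_id` is hypothetical and only for illustration; in the real setup the SPIRE controller manager renders the ID from pod metadata.

```python
TRUST_DOMAIN = "ai.internal"  # trust domain from the Helm install above


def render_spiffe_id(namespace: str, service_account: str,
                     trust_domain: str = TRUST_DOMAIN) -> str:
    """Mimic the template: spiffe://<trust-domain>/ns/<namespace>/sa/<service-account>."""
    for part in (namespace, service_account):
        # Reject empty segments and embedded slashes, which would change the path shape.
        if not part or "/" in part:
            raise ValueError(f"invalid path segment: {part!r}")
    return f"spiffe://{trust_domain}/ns/{namespace}/sa/{service_account}"


if __name__ == "__main__":
    print(render_spiffe_id("inference", "model-server"))
```

Because the ID is a pure function of namespace and service account, two pods that share both also share an identity. Give each workload its own service account if you want policies to tell them apart.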
2.1 Fetching SVIDs from Python
The model server fetches its SVID via the SPIFFE Workload API socket:
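# Sketch assuming the `spiffe` package from the py-spiffe project (pip install spiffe) and
# SPIFFE_ENDPOINT_SOCKET set to the SPIRE agent socket; verify names against your installed version.
from cryptography.hazmat.primitives import serialization
from spiffe import WorkloadApiClient

with WorkloadApiClient() as client:
    svid = client.fetch_x509_svid()

print("My identity:", svid.spiffe_id)
print("Cert expires:", svid.leaf.not_valid_after_utc)  # needs cryptography>=42

# PEM-encode the chain and key for use in your TLS server
cert_chain_pem = b"".join(c.public_bytes(serialization.Encoding.PEM) for c in svid.cert_chain)
private_key_pem = svid.private_key.private_bytes(
    serialization.Encoding.PEM, serialization.PrivateFormat.PKCS8, serialization.NoEncryption()
)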
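SVIDs rotate every hour by default, so a certificate fetched once at startup goes stale. Your TLS listener must refresh the certificate when the Workload API streams an update. The py-spiffe library can handle the streaming for you, provided you use its watch or source interfaces rather than a single fetch.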
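If you manage the refresh yourself, the timing decision is simple enough to sketch. The helper below assumes a refresh-at-half-lifetime rule; both that threshold and the name `should_refresh` are my assumptions for illustration, not something the library mandates.

```python
from datetime import datetime, timedelta, timezone


def should_refresh(not_before: datetime, not_after: datetime,
                   now: datetime, fraction: float = 0.5) -> bool:
    """Return True once `fraction` of the certificate lifetime has elapsed."""
    lifetime = not_after - not_before
    return now >= not_before + lifetime * fraction


if __name__ == "__main__":
    issued = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
    expires = issued + timedelta(hours=1)  # default one-hour SVID
    print(should_refresh(issued, expires, issued + timedelta(minutes=45)))
```

Refreshing well before expiry leaves room for a retry if the agent is briefly unreachable.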
3. mTLS Between Every Service
Application-level TLS termination is fragile. We’ll use Envoy as a sidecar to handle mTLS, so application code talks plain HTTP to localhost and Envoy handles the cryptography.
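# envoy sidecar config (excerpt)
static_resources:
  listeners:
  - address: { socket_address: { address: 0.0.0.0, port_value: 8443 } }
    filter_chains:
    - transport_socket:
        name: envoy.transport_sockets.tls
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
          # reject clients that present no certificate
          require_client_certificate: true
          common_tls_context:
            tls_certificate_sds_secret_configs:
            # SDS secret name = this workload's own SPIFFE ID
            - name: "spiffe://ai.internal/ns/inference/sa/model-server"
              sds_config:
                resource_api_version: V3
                api_config_source:
                  api_type: GRPC
                  transport_api_version: V3
                  grpc_services:
                  - envoy_grpc:
                      cluster_name: spire_agent
            validation_context_sds_secret_config:
              # SDS secret name = trust domain, which returns its CA bundle
              name: "spiffe://ai.internal"
              sds_config:
                resource_api_version: V3
                api_config_source:
                  api_type: GRPC
                  transport_api_version: V3
                  grpc_services:
                  - envoy_grpc:
                      cluster_name: spire_agent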
Envoy consumes SVIDs over the SDS API directly from the SPIRE agent. No certificate files on disk. Rotation is automatic.
3.1 Mutual verification rules
require_client_certificate: true means inbound connections without a client SVID die at the TLS handshake. Outbound, configure the upstream cluster to require a specific SPIFFE ID:
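clusters:
- name: vector_store
  transport_socket:
    name: envoy.transport_sockets.tls
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
      common_tls_context:
        # client SVID (tls_certificate_sds_secret_configs) omitted; same as the listener
        combined_validation_context:
          default_validation_context:
            match_typed_subject_alt_names:
            - san_type: URI
              matcher:
                exact: "spiffe://ai.internal/ns/storage/sa/vector-store"
          validation_context_sds_secret_config:
            name: "spiffe://ai.internal"  # trust bundle, fetched over SDS
            sds_config: { resource_api_version: V3, api_config_source: { api_type: GRPC, transport_api_version: V3, grpc_services: [ { envoy_grpc: { cluster_name: spire_agent } } ] } }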
If something with a different SPIFFE ID is listening on that address, the connection fails. This kills DNS hijacking and ARP poisoning as a class of attacks.
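The check Envoy performs on the peer certificate can be sketched in plain Python. This assumes the dictionary shape returned by the standard library's `ssl.SSLSocket.getpeercert()`; the helper names `peer_spiffe_ids` and `verify_peer` are hypothetical.

```python
def peer_spiffe_ids(peer_cert: dict) -> list[str]:
    """Collect URI SANs that look like SPIFFE IDs from a getpeercert()-style dict."""
    sans = peer_cert.get("subjectAltName", ())
    return [value for kind, value in sans
            if kind == "URI" and value.startswith("spiffe://")]


def verify_peer(peer_cert: dict, expected_id: str) -> bool:
    """Accept the peer only if it presents exactly one SPIFFE ID and it matches."""
    ids = peer_spiffe_ids(peer_cert)
    return len(ids) == 1 and ids[0] == expected_id


if __name__ == "__main__":
    cert = {"subjectAltName": (("URI", "spiffe://ai.internal/ns/storage/sa/vector-store"),)}
    print(verify_peer(cert, "spiffe://ai.internal/ns/storage/sa/vector-store"))
```

The match is exact on purpose. A prefix match on the trust domain would accept any workload in the mesh, which is precisely what this rule is meant to prevent.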
4. Policy Decisions with OPA 1.0
OPA 1.0 (January 2025) shipped the new rego.v1 syntax as default and dropped a pile of deprecations. The decision model is the same: your request data goes in, a policy says allow or deny.
Envoy calls OPA via the External Authorization filter on every request:
http_filters:
- name: envoy.filters.http.ext_authz
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.ext_authz.v3.ExtAuthz
    grpc_service:
      envoy_grpc:
        cluster_name: opa
    transport_api_version: V3
    failure_mode_allow: false
The OPA policy:
package envoy.authz

import rego.v1

default allow := {"allowed": false}

allow := {"allowed": true} if {
    input.attributes.source.principal == valid_caller
    request_allowed
}

# extract SPIFFE ID from peer certificate
valid_caller := principal if {
    principal := input.attributes.source.principal
    startswith(principal, "spiffe://ai.internal/")
}

request_allowed if {
    input.attributes.request.http.method == "POST"
    input.attributes.request.http.path == "/v1/generate"
    input.attributes.source.principal in data.allowed_callers.generate
}
data.allowed_callers.generate is a JSON document loaded from an OPA bundle. Editing the bundle reconfigures who can call the model server without redeploying anything.
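For readers who think faster in Python than in Rego, the same decision logic can be mirrored as a plain function. This is only a reading aid, not how OPA evaluates anything, and the contents of `ALLOWED_CALLERS` are a hypothetical example of what the bundle's data document might hold.

```python
# Hypothetical contents of data.allowed_callers, as shipped in the bundle.
ALLOWED_CALLERS = {
    "generate": ["spiffe://ai.internal/ns/agents/sa/orchestrator"],
}


def allow(input_doc: dict, allowed_callers: dict = ALLOWED_CALLERS) -> bool:
    """Mirror of the Rego policy: trusted domain, POST /v1/generate, caller on the list."""
    attrs = input_doc.get("attributes", {})
    principal = attrs.get("source", {}).get("principal", "")
    http = attrs.get("request", {}).get("http", {})
    return (
        principal.startswith("spiffe://ai.internal/")
        and http.get("method") == "POST"
        and http.get("path") == "/v1/generate"
        and principal in allowed_callers.get("generate", [])
    )


if __name__ == "__main__":
    request = {"attributes": {
        "source": {"principal": "spiffe://ai.internal/ns/agents/sa/orchestrator"},
        "request": {"http": {"method": "POST", "path": "/v1/generate"}},
    }}
    print(allow(request))
```

Anything missing from the input falls through to a deny, which matches the `default allow` rule in the policy.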
4.1 Decision logs and audit
Turn on OPA decision logging and ship the logs to your SIEM. Every allow/deny gets a structured record with the caller SPIFFE ID, the request path, and the policy that fired. This is the audit trail that satisfies your compliance team.
# opa config
decision_logs:
  console: false
  service: siem
services:
  siem:
    url: https://siem.internal/opa-logs
    credentials:
      bearer:
        token_path: /var/run/secrets/siem-token
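On the SIEM side you will usually want to flatten each event into a compact audit line. The sketch below assumes the event carries `decision_id`, `timestamp`, `input`, and `result` fields, which is my understanding of OPA's decision log format; check it against a real event from your deployment. The function name `audit_line` is hypothetical.

```python
import json


def audit_line(event: dict) -> str:
    """Reduce one decision log event to the fields an auditor cares about."""
    attrs = event.get("input", {}).get("attributes", {})
    result = event.get("result")
    # The policy above returns an object like {"allowed": true}; tolerate bare booleans too.
    allowed = result.get("allowed", False) if isinstance(result, dict) else bool(result)
    return json.dumps({
        "decision_id": event.get("decision_id"),
        "timestamp": event.get("timestamp"),
        "caller": attrs.get("source", {}).get("principal"),
        "path": attrs.get("request", {}).get("http", {}).get("path"),
        "allowed": allowed,
    }, sort_keys=True)
```

A missing or malformed `result` is recorded as a deny, so a truncated event never shows up in the audit trail as an allow.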
5. End-to-End Request Flow
When the agent orchestrator calls the model server, the flow is:
agent -> envoy(agent) -> mTLS -> envoy(model) -> ext_authz -> OPA -> allow
                                      |
                                      v
                            model server process
Step by step:
- Agent process makes a plain HTTP POST to localhost:8443/v1/generate.
- Agent’s Envoy sidecar opens an mTLS connection to the model server’s Envoy, presenting the agent’s SVID.
- Model server’s Envoy verifies the peer certificate, extracts the SPIFFE ID, and calls OPA.
- OPA checks the policy, returns allow with metadata.
- Envoy forwards the request to the model server process on localhost.
- Response flows back through the same path.
Application code on either end has no idea any of this is happening. That’s the point.
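To show how little the application has to know, here is a sketch of the agent side of the first step using only the standard library. The URL comes from the flow above; the `{"prompt": ...}` request body and the function names are hypothetical, since the model server's API schema isn't specified here.

```python
import json
import urllib.request

SIDECAR_URL = "http://localhost:8443/v1/generate"  # local Envoy listener, plain HTTP


def build_generate_request(prompt: str) -> urllib.request.Request:
    """Build the plain HTTP POST; no certificates or tokens in application code."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")  # hypothetical body schema
    return urllib.request.Request(
        SIDECAR_URL,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def generate(prompt: str, timeout: float = 10.0) -> dict:
    """Send the request through the sidecar and decode the JSON response."""
    with urllib.request.urlopen(build_generate_request(prompt), timeout=timeout) as resp:
        return json.loads(resp.read())
```

A 403 from this call means the sidecars and OPA did their job and denied the request; the application should treat it like any other authorization failure.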
5.1 Forwarding identity into the application
When the model server needs to know who called it (for logging or downstream authorization), Envoy injects the SPIFFE ID into a request header:
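http_filters:
- name: envoy.filters.http.ext_authz
  typed_config:
    # ...
    include_peer_certificate: true
- name: envoy.filters.http.lua
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.lua.v3.Lua
    default_source_code:
      inline_string: |
        function envoy_on_request(handle)
          local ssl = handle:streamInfo():downstreamSslConnection()
          local peer = ssl and ssl:uriSanPeerCertificate() or {}
          if peer[1] then
            -- replace, not add, so a caller-supplied value never survives
            handle:headers():replace("x-forwarded-spiffe-id", peer[1])
          else
            handle:headers():remove("x-forwarded-spiffe-id")
          end
        end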
Now the model server sees x-forwarded-spiffe-id: spiffe://ai.internal/ns/agents/sa/orchestrator on every request and can log it or use it for further authorization.
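Inside the model server, reading that header might look like the sketch below. It assumes header names arrive lower-cased in a mapping and that IDs follow the `ns/<namespace>/sa/<service-account>` layout used in this post; `Caller` and `caller_from_headers` are hypothetical names.

```python
from typing import Mapping, NamedTuple


class Caller(NamedTuple):
    trust_domain: str
    namespace: str
    service_account: str


def caller_from_headers(headers: Mapping[str, str]) -> Caller:
    """Parse the SPIFFE ID that the local Envoy sidecar injected."""
    raw = headers.get("x-forwarded-spiffe-id", "")
    prefix = "spiffe://"
    if not raw.startswith(prefix):
        raise ValueError("missing or malformed SPIFFE ID header")
    parts = raw[len(prefix):].split("/")
    # Expect <trust-domain>/ns/<namespace>/sa/<service-account>
    if len(parts) != 5 or parts[1] != "ns" or parts[3] != "sa" or not all(parts):
        raise ValueError(f"unexpected SPIFFE ID layout: {raw!r}")
    return Caller(parts[0], parts[2], parts[4])


if __name__ == "__main__":
    hdrs = {"x-forwarded-spiffe-id": "spiffe://ai.internal/ns/agents/sa/orchestrator"}
    print(caller_from_headers(hdrs))
```

This is only safe because the application listens on localhost behind the sidecar. If anything else can reach the process directly, the header is just user input.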
6. Common Pitfalls
Four pitfalls I keep seeing.
6.1 Trusting the SPIFFE ID header from outside
If your gateway accepts an x-forwarded-spiffe-id header from the public internet on the assumption that only an internal service could have set it, you have a trivial impersonation bug. Strip that header at the perimeter and only let Envoy set it after a successful mTLS handshake.
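The stripping rule itself is tiny. Here it is as a Python sketch, assuming headers are available as a dictionary; in practice you would do this in the gateway's own configuration, and the names below are hypothetical.

```python
# Headers that only a sidecar may set after a verified mTLS handshake.
IDENTITY_HEADERS = frozenset({"x-forwarded-spiffe-id"})


def strip_identity_headers(headers: dict[str, str]) -> dict[str, str]:
    """Drop identity headers from an inbound request, matching names case-insensitively."""
    return {k: v for k, v in headers.items() if k.lower() not in IDENTITY_HEADERS}
```

Matching case-insensitively matters because HTTP header names are case-insensitive, so `X-Forwarded-SPIFFE-ID` must be dropped too.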
6.2 Long-lived SVIDs
Some teams crank the SVID TTL up to 24 hours to “reduce noise.” Don’t. Short TTLs are a feature, not a bug. If an SVID is exfiltrated, you want it useless within minutes, not hours. Default to one hour, drop to 15 minutes for sensitive workloads.
6.3 OPA bundle staleness
OPA caches policy bundles. If your bundle service goes down or returns stale data, OPA keeps using the old policy. This is by design (fail-static is usually safer than fail-open) but it bites you when you push a deny rule and it doesn’t propagate. Monitor bundle freshness and alert if any OPA instance is more than 5 minutes stale.
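A freshness check can be as small as the sketch below. It assumes you can obtain, per OPA instance, an RFC 3339 UTC timestamp of the last successful bundle request (for example from OPA's status reporting); where that timestamp comes from and the function names are assumptions on my part.

```python
import re
from datetime import datetime, timedelta, timezone

MAX_STALENESS = timedelta(minutes=5)  # threshold suggested in the text

_RFC3339_UTC = re.compile(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?Z")


def parse_rfc3339_utc(ts: str) -> datetime:
    """Parse an RFC 3339 UTC timestamp, truncating sub-microsecond digits."""
    m = _RFC3339_UTC.fullmatch(ts)
    if m is None:
        raise ValueError(f"unsupported timestamp: {ts!r}")
    base = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    micros = int((m.group(2) or "0")[:6].ljust(6, "0"))
    return base + timedelta(microseconds=micros)


def is_stale(last_success: str, now: datetime,
             max_age: timedelta = MAX_STALENESS) -> bool:
    """True if the last successful bundle contact is older than max_age."""
    return now - parse_rfc3339_utc(last_success) > max_age
```

Run it against every instance, not just one. A single OPA that silently stopped polling is exactly the case this alert exists for.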
6.4 Forgetting failure_mode_allow: false
Envoy’s ext_authz filter fails closed by default, but copied example configs sometimes set failure_mode_allow: true so the system stays up when OPA is unreachable. For a security-critical decision, that is the wrong setting. Fail closed, and set failure_mode_allow: false explicitly so nobody has to guess. Yes, that means OPA outages take your service down. Accept that as the price of meaningful authorization.
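The same principle applies anywhere your own code calls a policy engine. A minimal fail-closed wrapper, with a hypothetical `authorize` name and a caller-supplied `check` function standing in for the OPA call, might look like this:

```python
from typing import Callable


def authorize(check: Callable[[dict], bool], request: dict) -> bool:
    """Fail closed: only an explicit True from the policy check allows the request."""
    try:
        return check(request) is True
    except Exception:
        # Timeouts, connection errors, and malformed responses all deny.
        return False
```

The `is True` comparison is deliberate. A truthy but unexpected response, such as an error string, must not be mistaken for an allow.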
7. Troubleshooting
When things break, three failure modes account for most of the pain.
7.1 TLS handshake errors after a node restart
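A node restart can leave a pod holding a cached SVID that is no longer valid, typically because it expired while the SPIRE agent was unavailable. The fix is to make the workload reload its SVID from the Workload API when Envoy hot-restarts instead of reusing the cached copy. Watch for certificate errors such as SSL_ERROR_BAD_CERTIFICATE in client logs, and verify that the SVID’s validity timestamps match what the Workload API is currently streaming.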
7.2 OPA decisions exceeding 10ms
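If your policy evaluation is slow, it’s almost always because the policy is doing repeated linear scans over a big data document (the policy equivalent of an N+1 query). Key your data by the value you look up so each check is a direct lookup, use partial evaluation to precompute the parts of the policy that don’t depend on the request, or split the bundle into hot and cold paths. The 99th percentile decision time should stay under 1ms for hot paths.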
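One way to get direct lookups is to reshape the data at bundle build time. The sketch below turns a flat list of rules into an object keyed by route and then by caller; the rule shape (`route`, `caller`) and the function names are hypothetical.

```python
def build_index(rules: list[dict]) -> dict[str, dict[str, bool]]:
    """Reshape [{route, caller}, ...] into {route: {caller: True}} for keyed lookups."""
    index: dict[str, dict[str, bool]] = {}
    for rule in rules:
        index.setdefault(rule["route"], {})[rule["caller"]] = True
    return index


def is_allowed(index: dict[str, dict[str, bool]], route: str, caller: str) -> bool:
    """Two dictionary lookups instead of a scan over every rule."""
    return index.get(route, {}).get(caller, False)


if __name__ == "__main__":
    rules = [{"route": "generate", "caller": "spiffe://ai.internal/ns/agents/sa/orchestrator"}]
    print(build_index(rules))
```

The output is plain JSON-serializable data, so it can ship in the bundle as-is and the policy can index into it by route and principal instead of iterating.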
7.3 Mysterious 403s after a deploy
You deployed a new service, registered it with SPIRE, and traffic still gets rejected. Nine times out of ten, the new service’s SPIFFE ID isn’t in data.allowed_callers. The other one time, the controller manager hasn’t reconciled the new ClusterSPIFFEID yet. Check both before paging anyone.
8. Wrapping Up
Zero Trust isn’t a product you buy. It’s an architectural discipline you commit to. The combination of SPIRE for identity, Envoy for mTLS, and OPA for authorization is the open-source baseline that gets you 90% of the way there in 2025. The remaining 10% is operational: log everything, alert on bundle staleness, rotate SVIDs aggressively, and don’t accept “internal” as a trust boundary.
The pattern in this post scales from a single inference cluster to a multi-region multi-cloud AI platform. SPIRE federates across trust domains, Envoy speaks the same protocol everywhere, and OPA policies are portable. Start with one critical path (your model server) and expand outward.
For more reading, the SPIFFE specification and the OPA 1.0 migration guide are the canonical references. My next post in this series, DevSecOps in AI ML pipelines, covers how to enforce policies like these from CI all the way to runtime.