Zero Trust Architectures for AI Services, A Step by Step Setup

September 3, 2025 · 8 min read · by Muhammad Amal

TL;DR — Zero Trust for AI services means every component (model server, vector DB, gateway, agent) authenticates with a cryptographic identity, every request carries that identity, and every authorization is a policy decision on the path. SPIRE 1.10 issues the identities, Envoy enforces mTLS at the data plane, OPA 1.0 makes the decisions.

The marketing version of Zero Trust is “never trust, always verify.” The operational version is “every byte that moves between two processes inside your VPC is signed, the signature is verified against a policy that was decided five microseconds ago, and you can prove to an auditor that this happened.” Most AI shops aren’t there. They have a public-facing API gateway with auth, then everything behind it talks plaintext to everything else because “it’s all internal.”

That model is dead. Modern AI services have too many components (model servers, vector stores, fine-tuning pipelines, agent orchestrators, RAG retrievers, evaluation harnesses) and too many of those components have legitimate reasons to talk to sensitive data. The blast radius of any single compromised process is enormous. Zero Trust is the architecture that contains that blast radius.

This tutorial walks through a working Zero Trust setup for an AI inference service. We’ll wire SPIRE for workload identity, Envoy for mTLS at the data plane, and OPA 1.0 for per-request authorization. I assume you’ve read the prompt injection defenses post because the policy decisions here are how you actually enforce the controls described there.

1. The Architecture We’re Building

Five components, each with a workload identity, mTLS between every pair, and a policy gate on every privileged call.

                  +---------------------+
                  |   API Gateway       |
                  | (Envoy + ext_authz) |
                  +----------+----------+
                             | mTLS, SPIFFE ID
                             v
            +-----------------------------------+
            |        Agent Orchestrator         |
            +--+---------+----------+-----------+
               |         |          |
        mTLS   |         | mTLS     | mTLS
               v         v          v
        +----------+ +--------+ +---------+
        |  Model   | | Vector | |  Tool   |
        |  Server  | | Store  | | Runtime |
        +----------+ +--------+ +---------+
               ^         ^          ^
               |         |          |
               +---------+----------+
                         |
                  +------+------+
                  |    SPIRE    |
                  |    Server   |
                  +-------------+
                         ^
                         |
                  +------+------+
                  |     OPA     |
                  | (decision)  |
                  +-------------+

Nothing talks plaintext. Nothing trusts a header. Every call is mTLS-authenticated to a SPIFFE ID, then authorized by OPA. If either step fails, the connection dies.

2. Bootstrapping Identities with SPIRE

SPIRE is the SPIFFE reference implementation, and SPIRE 1.10 is the version we’ll use. Install the server and agent via the official Helm chart.

helm repo add spiffe https://spiffe.github.io/helm-charts-hardened/
helm install spire-crds spiffe/spire-crds \
  --namespace spire-server --create-namespace

helm install spire spiffe/spire \
  --namespace spire-server \
  --set global.spire.trustDomain=ai.internal \
  --set spire-server.controllerManager.enabled=true \
  --version 0.24.0

Once SPIRE is up, register workloads. Each workload gets a SPIFFE ID derived from Kubernetes attributes (service account, namespace, pod selector).

# ClusterSPIFFEID for the model server
apiVersion: spire.spiffe.io/v1alpha1
kind: ClusterSPIFFEID
metadata:
  name: model-server
spec:
  spiffeIDTemplate: "spiffe://ai.internal/ns/{{ .PodMeta.Namespace }}/sa/{{ .PodSpec.ServiceAccountName }}"
  podSelector:
    matchLabels:
      app: model-server
  workloadSelectorTemplates:
    - "k8s:ns:{{ .PodMeta.Namespace }}"
    - "k8s:sa:{{ .PodSpec.ServiceAccountName }}"

A pod with service account model-server in namespace inference will receive the identity spiffe://ai.internal/ns/inference/sa/model-server. That identity is delivered as a short-lived X.509 SVID through the SPIRE workload API.
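
To make the mapping concrete, here is a minimal Python sketch of what the template renders and how to take an ID apart again. The helper names spiffe_id_for and parse_spiffe_id are mine, not part of SPIRE, and the parser only understands the ns/.../sa/... layout used in this post.

```python
from urllib.parse import urlsplit

TRUST_DOMAIN = "ai.internal"  # trust domain set in the Helm install


def spiffe_id_for(namespace: str, service_account: str) -> str:
    """Render the same ns/<namespace>/sa/<service account> path the template produces."""
    return f"spiffe://{TRUST_DOMAIN}/ns/{namespace}/sa/{service_account}"


def parse_spiffe_id(spiffe_id: str) -> dict:
    """Split an ID of this shape back into its parts; raise ValueError otherwise."""
    parts = urlsplit(spiffe_id)
    segments = parts.path.strip("/").split("/")
    if parts.scheme != "spiffe" or len(segments) != 4:
        raise ValueError(f"unexpected SPIFFE ID: {spiffe_id!r}")
    if segments[0] != "ns" or segments[2] != "sa":
        raise ValueError(f"unexpected SPIFFE ID: {spiffe_id!r}")
    return {
        "trust_domain": parts.netloc,
        "namespace": segments[1],
        "service_account": segments[3],
    }


print(spiffe_id_for("inference", "model-server"))
```

Because the ID is derived from namespace and service account, two deployments that share a service account also share an identity. Give each workload its own service account if you want policy to tell them apart.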

2.1 Fetching SVIDs from Python

The model server fetches its SVID via the SPIFFE Workload API socket:

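# Assumes the `spiffe` package from the py-spiffe project; older releases
# shipped a `pyspiffe` module with different names, so check your version.
from cryptography.hazmat.primitives import serialization
from spiffe import WorkloadApiClient

# The socket address is read from SPIFFE_ENDPOINT_SOCKET.
with WorkloadApiClient() as client:
    svid = client.fetch_x509_svid()

print("My identity:", svid.spiffe_id)
# not_valid_after_utc needs cryptography >= 42
print("Cert expires:", svid.cert_chain[0].not_valid_after_utc)

# Use these in your TLS server
cert_chain_pem = b"".join(
    cert.public_bytes(serialization.Encoding.PEM) for cert in svid.cert_chain
)
private_key_pem = svid.private_key.private_bytes(
    serialization.Encoding.PEM,
    serialization.PrivateFormat.PKCS8,
    serialization.NoEncryption(),
)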

SVIDs carry a one-hour TTL by default, and SPIRE rotates them before they expire. Your TLS listener must pick up the new certificate whenever the Workload API streams an update. The py-spiffe library handles the streaming for you.
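
If you terminate TLS in-process, it helps to have a sanity check that flags a listener still serving an old certificate. The sketch below is illustrative only: needs_refresh is a hypothetical helper, and the half-lifetime threshold is an assumption I picked for the example, not a value read from SPIRE.

```python
from datetime import datetime, timedelta, timezone


def needs_refresh(not_before: datetime, not_after: datetime,
                  now: datetime, fraction: float = 0.5) -> bool:
    """True once `fraction` of the certificate lifetime has elapsed."""
    lifetime = not_after - not_before
    return now >= not_before + lifetime * fraction


# Example: a certificate issued at noon with a one-hour lifetime.
issued = datetime(2025, 9, 3, 12, 0, tzinfo=timezone.utc)
expires = issued + timedelta(hours=1)
print(needs_refresh(issued, expires, issued + timedelta(minutes=20)))
print(needs_refresh(issued, expires, issued + timedelta(minutes=40)))
```

Feed it the validity window of the certificate your listener is actually serving. If it keeps returning True long after the Workload API has pushed an update, the listener is not reloading.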

3. mTLS Between Every Service

Application-level TLS termination is fragile. We’ll use Envoy as a sidecar to handle mTLS, so application code talks plain HTTP to localhost and Envoy handles the cryptography.

# envoy sidecar config (excerpt)
static_resources:
  listeners:
  - address: { socket_address: { address: 0.0.0.0, port_value: 8443 } }
    filter_chains:
    - transport_socket:
        name: envoy.transport_sockets.tls
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
          common_tls_context:
            tls_certificate_sds_secret_configs:
            # name = the SPIFFE ID whose SVID this sidecar presents
            - name: "spiffe://ai.internal/ns/inference/sa/model-server"
              sds_config:
                resource_api_version: V3
                api_config_source:
                  api_type: GRPC
                  transport_api_version: V3
                  grpc_services:
                  - envoy_grpc:
                      cluster_name: spire_agent
            validation_context_sds_secret_config:
              # name = the trust domain whose bundle validates peers
              name: "spiffe://ai.internal"
              sds_config:
                resource_api_version: V3
                api_config_source:
                  api_type: GRPC
                  transport_api_version: V3
                  grpc_services:
                  - envoy_grpc:
                      cluster_name: spire_agent
          require_client_certificate: true

Envoy consumes SVIDs over the SDS API directly from the SPIRE agent. No certificate files on disk. Rotation is automatic.

3.1 Mutual verification rules

require_client_certificate: true means inbound connections without a client SVID die at the TLS handshake. Outbound, configure the upstream cluster to require a specific SPIFFE ID:

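clusters:
- name: vector_store
  transport_socket:
    name: envoy.transport_sockets.tls
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
      common_tls_context:
        # also present this workload's own SVID via tls_certificate_sds_secret_configs, as on the listener
        combined_validation_context:
          default_validation_context:
            # accept only the vector store's SPIFFE ID as the peer's URI SAN
            match_typed_subject_alt_names:
            - san_type: URI
              matcher:
                exact: "spiffe://ai.internal/ns/storage/sa/vector-store"
          validation_context_sds_secret_config:
            # the trust bundle is fetched by trust domain name
            name: "spiffe://ai.internal"
            sds_config: { resource_api_version: V3, api_config_source: { api_type: GRPC, transport_api_version: V3, grpc_services: [ { envoy_grpc: { cluster_name: spire_agent } } ] } }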
If something with a different SPIFFE ID is listening on that address, the connection fails. This kills DNS hijacking and ARP poisoning as a class of attacks.
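
The check Envoy performs here is an exact match on the peer certificate’s URI SAN. A rough Python equivalent, with peer_is_expected as a hypothetical name, looks like this. It leans on the X.509-SVID rule that a certificate carries exactly one URI SAN.

```python
def peer_is_expected(uri_sans: list[str], expected_id: str) -> bool:
    """Accept a peer only if its certificate carries exactly the expected SPIFFE ID."""
    # An X.509-SVID carries exactly one URI SAN; anything else is rejected here.
    if len(uri_sans) != 1:
        return False
    # Exact comparison on purpose: prefix or substring matching would let
    # "vector-store-evil" impersonate "vector-store".
    return uri_sans[0] == expected_id


VECTOR_STORE_ID = "spiffe://ai.internal/ns/storage/sa/vector-store"
print(peer_is_expected([VECTOR_STORE_ID], VECTOR_STORE_ID))
```

The exact match is the design choice that matters. A valid certificate from the same trust domain is not enough; it has to be the one workload you meant to call.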

4. Policy Decisions with OPA 1.0

OPA 1.0 (released in late December 2024) made the rego.v1 syntax the default and removed a pile of deprecated features. The decision model is the same: your request data goes in, and a policy says allow or deny.

Envoy calls OPA via the External Authorization filter on every request:

http_filters:
- name: envoy.filters.http.ext_authz
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.ext_authz.v3.ExtAuthz
    grpc_service:
      envoy_grpc:
        cluster_name: opa
    transport_api_version: V3
    failure_mode_allow: false

The OPA policy:

package envoy.authz

import rego.v1

default allow := {"allowed": false}

allow := {"allowed": true} if {
    input.attributes.source.principal == valid_caller
    request_allowed
}

# extract SPIFFE ID from peer certificate
valid_caller := principal if {
    principal := input.attributes.source.principal
    startswith(principal, "spiffe://ai.internal/")
}

request_allowed if {
    input.attributes.request.http.method == "POST"
    input.attributes.request.http.path == "/v1/generate"
    input.attributes.source.principal in data.allowed_callers.generate
}

data.allowed_callers.generate is a JSON document loaded from an OPA bundle. Editing the bundle reconfigures who can call the model server without redeploying anything.
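
To reason about which requests pass, it can help to mirror the rules in plain Python. This is a sketch for thinking through inputs, not a replacement for OPA. The data document below is hypothetical (only the allowed_callers.generate key comes from the policy), and make_input builds just the slice of the ext_authz input the policy reads.

```python
import json

# Hypothetical bundle data; only the key layout is taken from the policy.
DATA = json.loads("""
{
  "allowed_callers": {
    "generate": ["spiffe://ai.internal/ns/agents/sa/orchestrator"]
  }
}
""")


def allow(input_doc: dict, data: dict = DATA) -> dict:
    """Python mirror of the Rego rules, for reasoning about inputs only."""
    attrs = input_doc.get("attributes", {})
    principal = attrs.get("source", {}).get("principal", "")
    http = attrs.get("request", {}).get("http", {})

    valid_caller = principal.startswith("spiffe://ai.internal/")
    request_allowed = (
        http.get("method") == "POST"
        and http.get("path") == "/v1/generate"
        and principal in data["allowed_callers"]["generate"]
    )
    return {"allowed": valid_caller and request_allowed}


def make_input(principal: str, method: str, path: str) -> dict:
    """Build the slice of the ext_authz input that the policy reads."""
    return {"attributes": {
        "source": {"principal": principal},
        "request": {"http": {"method": method, "path": path}},
    }}


print(allow(make_input("spiffe://ai.internal/ns/agents/sa/orchestrator",
                       "POST", "/v1/generate")))
```

Adding a second caller means appending one SPIFFE ID to the generate list in the bundle. The policy text itself never changes.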

4.1 Decision logs and audit

Turn on OPA decision logging and ship the logs to your SIEM. Every allow/deny gets a structured record with the caller SPIFFE ID, the request path, and the policy that fired. This is the audit trail that satisfies your compliance team.

# opa config
decision_logs:
  console: false
  service: siem
services:
  siem:
    url: https://siem.internal/opa-logs
    credentials:
      bearer:
        token_path: /var/run/secrets/siem-token
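
On the receiving side, OPA uploads decision logs in batches: each upload is a gzip-compressed JSON array of events. A minimal sketch of a consumer that boils each event down to the audit fields might look like the following. The summarize and read_batch helpers are hypothetical; the event keys they read (decision_id, timestamp, path, input, result) are the ones I expect in a decision log event, so verify them against your OPA version.

```python
import gzip
import json


def summarize(event: dict) -> dict:
    """Reduce one decision log event to the fields an auditor usually asks for."""
    attrs = event.get("input", {}).get("attributes", {})
    http = attrs.get("request", {}).get("http", {})
    result = event.get("result")
    # The policy in this post returns an object; tolerate a bare boolean too.
    allowed = result.get("allowed", False) if isinstance(result, dict) else bool(result)
    return {
        "decision_id": event.get("decision_id"),
        "timestamp": event.get("timestamp"),
        "policy_path": event.get("path"),
        "caller": attrs.get("source", {}).get("principal"),
        "method": http.get("method"),
        "request_path": http.get("path"),
        "allowed": allowed,
    }


def read_batch(body: bytes) -> list[dict]:
    """Decode one uploaded batch: a gzip-compressed JSON array of events."""
    return [summarize(event) for event in json.loads(gzip.decompress(body))]
```

Keeping the summary flat makes it easy to index in a SIEM by caller and request path.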

5. End-to-End Request Flow

When the agent orchestrator calls the model server, the flow is:

agent -> envoy(agent) -> mTLS -> envoy(model) -> ext_authz -> OPA -> allow
                                          |
                                          v
                                     model server process

Step by step:

1. The agent process makes a plain HTTP POST to /v1/generate on its own sidecar’s localhost egress listener (a separate plaintext port, not the 8443 mTLS listener).
2. The agent’s Envoy sidecar opens an mTLS connection to the model server’s Envoy, presenting the agent’s SVID.
3. The model server’s Envoy verifies the peer certificate, extracts the SPIFFE ID, and calls OPA.
4. OPA evaluates the policy and returns allow with metadata.
5. Envoy forwards the request to the model server process on localhost.
6. The response flows back through the same path.

Application code on either end has no idea any of this is happening. That’s the point.

5.1 Forwarding identity into the application

When the model server needs to know who called it (for logging or downstream authorization), Envoy injects the SPIFFE ID into a request header:

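http_filters:
- name: envoy.filters.http.ext_authz
  typed_config:
    # ...
    include_peer_certificate: true
- name: envoy.filters.http.lua
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.http.lua.v3.Lua
    default_source_code:
      inline_string: |
        function envoy_on_request(handle)
          -- drop any caller-supplied value before setting our own
          handle:headers():remove("x-forwarded-spiffe-id")
          local ssl = handle:streamInfo():downstreamSslConnection()
          if ssl ~= nil then
            local peer = ssl:uriSanPeerCertificate()
            if peer ~= nil and peer[1] ~= nil then
              handle:headers():add("x-forwarded-spiffe-id", peer[1])
            end
          end
        end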

Now the model server sees x-forwarded-spiffe-id: spiffe://ai.internal/ns/agents/sa/orchestrator on every request and can log it or use it for further authorization.
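
On the application side, reading that header takes a few lines. The sketch below assumes the model server only accepts connections from its own sidecar on localhost, which is the only reason the header can be trusted at all. caller_identity is a hypothetical helper name.

```python
from typing import Optional

TRUST_DOMAIN_PREFIX = "spiffe://ai.internal/"
HEADER = "x-forwarded-spiffe-id"


def caller_identity(headers: dict) -> Optional[str]:
    """Return the caller's SPIFFE ID from the sidecar-set header, or None."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    value = lowered.get(HEADER, "").strip()
    # Anything outside our trust domain is treated as "no identity".
    if not value.startswith(TRUST_DOMAIN_PREFIX):
        return None
    return value


print(caller_identity({"X-Forwarded-Spiffe-Id": "spiffe://ai.internal/ns/agents/sa/orchestrator"}))
```

Returning None instead of raising lets the caller decide whether a missing identity is a logging gap or a hard failure.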

6. Common Pitfalls

Four pitfalls I keep seeing.

6.1 Trusting the SPIFFE ID header from outside

If your gateway accepts an x-forwarded-spiffe-id header from the public internet on the assumption that some internal service set it, you have a trivial impersonation bug. Strip that header at the perimeter and only let Envoy set it after a successful mTLS handshake.

6.2 Long-lived SVIDs

Some teams crank the SVID TTL up to 24 hours to “reduce noise.” Don’t. Short TTLs are a feature, not a bug. If an SVID is exfiltrated, you want it useless within minutes, not hours. Default to one hour, drop to 15 minutes for sensitive workloads.

6.3 OPA bundle staleness

OPA caches policy bundles. If your bundle service goes down or returns stale data, OPA keeps using the old policy. This is by design (fail-static is usually safer than fail-open) but it bites you when you push a deny rule and it doesn’t propagate. Monitor bundle freshness and alert if any OPA instance is more than 5 minutes stale.
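
A freshness check can be as small as the sketch below. It assumes you already collect each OPA instance’s status document and that every bundle entry carries a last_successful_request timestamp in RFC 3339 UTC form. That field name is my reading of the status API, so confirm it against your OPA version. stale_bundles and parse_rfc3339 are hypothetical helpers.

```python
from datetime import datetime, timedelta, timezone


def parse_rfc3339(ts: str) -> datetime:
    """Parse an RFC 3339 timestamp ending in Z, trimming sub-microsecond digits."""
    ts = ts.rstrip("Z")
    if "." in ts:
        head, frac = ts.split(".", 1)
        ts = f"{head}.{frac[:6].ljust(6, '0')}"
    return datetime.fromisoformat(ts).replace(tzinfo=timezone.utc)


def stale_bundles(status: dict, now: datetime,
                  max_age: timedelta = timedelta(minutes=5)) -> list[str]:
    """Names of bundles whose last successful poll is older than max_age or missing."""
    stale = []
    for name, info in status.get("bundles", {}).items():
        last_ok = info.get("last_successful_request")
        if last_ok is None or now - parse_rfc3339(last_ok) > max_age:
            stale.append(name)
    return stale
```

A bundle that has never polled successfully counts as stale, which is usually what you want right after a deploy.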

6.4 Forgetting failure_mode_allow: false

Envoy’s ext_authz filter defaults to failure_mode_allow: false, but configs in the wild sometimes flip it to true so the system stays up when OPA is unreachable. For a security-critical decision, that is the wrong setting. Fail closed. Yes, that means OPA outages take your service down. Accept that as the price of meaningful authorization.
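
The same rule applies to any authorization call you make from application code. A minimal fail-closed wrapper, with authorize and opa_unreachable as hypothetical names, might look like this:

```python
from typing import Callable


def authorize(check: Callable[[], bool]) -> bool:
    """Fail closed: any error while asking the policy engine counts as a deny."""
    try:
        # Only a literal True is an allow; truthy junk is not.
        return check() is True
    except Exception:
        return False


def opa_unreachable() -> bool:
    """Stand-in for a policy call made during an outage."""
    raise ConnectionError("policy engine unreachable")


print(authorize(lambda: True))
print(authorize(opa_unreachable))
```

Requiring a literal True guards against a malformed response being read as an allow.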

7. Troubleshooting

When things break, three failure modes account for most of the pain.

7.1 TLS handshake errors after a node restart

A node restart can leave a pod holding a cached SVID that has expired or no longer chains to the current trust bundle. The fix is to make the workload reload its SVID when the Envoy hot restart fires. Watch for SSL_ERROR_BAD_CERTIFICATE in client logs and verify the SVID timestamp matches the workload API stream.

7.2 OPA decisions exceeding 10ms

If your policy evaluation is slow, it’s almost always because you’re doing N+1 lookups against a big data document. Index your data with partial evaluation hints, or split the bundle into hot and cold paths. The 99th percentile decision time should stay under 1ms for hot paths.
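
To know whether you are inside that budget, compute the percentile from real samples rather than eyeballing averages. The sketch below uses the nearest-rank method on a list of evaluation durations in nanoseconds, however you collect them. percentile and breaches_budget are hypothetical helpers.

```python
import math


def percentile(samples_ns: list[int], pct: int) -> int:
    """Nearest-rank percentile of a non-empty list of durations in nanoseconds."""
    if not samples_ns:
        raise ValueError("no samples")
    ordered = sorted(samples_ns)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def breaches_budget(samples_ns: list[int], budget_ms: float = 1.0) -> bool:
    """True if the p99 decision time exceeds the budget."""
    return percentile(samples_ns, 99) > budget_ms * 1_000_000
```

With 100 samples, a single slow outlier does not trip the check but two do, which is roughly the sensitivity you want for a p99 alert.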

7.3 Mysterious 403s after a deploy

You deployed a new service, registered it with SPIRE, and traffic still gets rejected. Nine times out of ten, the new service’s SPIFFE ID isn’t in data.allowed_callers. The tenth time, the controller manager hasn’t reconciled the new ClusterSPIFFEID yet. Check both before paging anyone.

8. Wrapping Up

Zero Trust isn’t a product you buy. It’s an architectural discipline you commit to. The combination of SPIRE for identity, Envoy for mTLS, and OPA for authorization is the open-source baseline that gets you 90% of the way there in 2025. The remaining 10% is operational: log everything, alert on bundle staleness, rotate SVIDs aggressively, and don’t accept “internal” as a trust boundary.

The pattern in this post scales from a single inference cluster to a multi-region multi-cloud AI platform. SPIRE federates across trust domains, Envoy speaks the same protocol everywhere, and OPA policies are portable. Start with one critical path (your model server) and expand outward.

For more reading, the SPIFFE specification and the OPA 1.0 migration guide are the canonical references. My next post in this series, DevSecOps in AI ML pipelines, covers how to enforce policies like these from CI all the way to runtime.
