Policy as Code with OPA 1.0, A Production Walkthrough

September 17, 2025 · 9 min read · by Muhammad Amal

TL;DR: OPA 1.0 (January 2025) shipped rego.v1 as the default syntax, dropped deprecated rule forms, and stabilized the bundle and decision log APIs. Run it as a sidecar, push policies through a bundle service, log every decision, and use partial evaluation for hot paths. None of this is new, but the version finally makes it production-comfortable.

Policy as code is one of those ideas that sounds great in slide decks and gets implemented badly in roughly half the orgs that try it. The most common antipatterns: hardcoded policies inside application code that nobody can update without a deploy, monolithic JSON config files that drift between environments, hand-rolled DSLs that nobody but the original author can read. OPA solves none of these by accident; you have to commit to the patterns.

OPA 1.0 is the version where the language churn finally stopped. rego.v1 is the default syntax now, the deprecated rule forms are gone, and the bundle and decision log protocols are stable. If you held off adopting OPA because the Rego dialect kept changing, 2025 is the year to revisit. This post is the production walkthrough, focused on what actually matters once you’re past the “hello world” stage.

I’m going to assume you’ve seen the SPIRE post because workload identity is the input that makes most of these policies useful.

1. The Mental Model

OPA is a decision engine. Your application gathers facts, sends them to OPA, and OPA returns a decision based on policy plus data.

   application                       OPA
   +-----------+   query     +----------------+
   |           |------------>|                |
   | (gathers  | input:{...} | rules (.rego)  |
   |  facts)   |             | + data (.json) |
   |           |<------------|                |
   +-----------+   decision  +----------------+
                    {...}

Three things go in: the rules (your policies, in Rego), the data (slowly-changing reference data like ACLs, roles, allow lists), and the input (the per-request facts). One thing comes out: a decision document, which can be a boolean or a structured object.

The trick to making this work well is keeping the input small, the rules pure, and the data refreshed out-of-band.

2. Step 1, Writing rego.v1 Policies

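OPA 1.0 defaults to rego.v1. The main differences from pre-1.0 Rego are:

- Every rule body must be introduced with if.
- Multi-value (set) rules must use contains.
- The in, every, if and contains keywords work without a future.keywords import. import rego.v1 is still accepted but no longer required.
- Checks that used to be opt-in strict mode are now errors: duplicate imports, shadowing input or data, and deprecated built-ins.

Rules that share a name still OR together, exactly as before.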

# authz.rego
package agent.authz

import rego.v1

default allow := false

# A user can read a document if they own it or are in a sharing list
allow if {
    input.action == "read"
    input.resource.type == "document"
    can_read
}

can_read if {
    input.resource.owner == input.user.id
}

can_read if {
    input.user.id in input.resource.shared_with
}

# Admins can do anything
allow if {
    "admin" in input.user.roles
}

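Several rules share the same name here (allow, can_read). Each body is a conjunction, so every expression in it must hold. Bodies with the same name are ORed, which means the rule is true if any body matches. That is the idiomatic way to express alternatives in Rego. rego.v1 simply requires each body to start with if.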
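For readers who think in Go, the two allow rules and two can_read rules above behave roughly like the boolean functions below. Each rule body becomes an && chain, bodies with the same name are joined with ||, and the default supplies false when nothing matches. This is a hand translation for intuition only. The Input, User and Resource types are hypothetical mirrors of the input document, and this is not how OPA evaluates Rego.

```go
package main

// Hypothetical Go mirrors of the input document used by authz.rego.
type User struct {
	ID    string
	Roles []string
}

type Resource struct {
	Type       string
	Owner      string
	SharedWith []string
}

type Input struct {
	Action   string
	User     User
	Resource Resource
}

func contains(xs []string, x string) bool {
	for _, v := range xs {
		if v == x {
			return true
		}
	}
	return false
}

// canRead: two rule bodies with the same name, ORed together.
func canRead(in Input) bool {
	return in.Resource.Owner == in.User.ID || // body 1: owner
		contains(in.Resource.SharedWith, in.User.ID) // body 2: shared_with
}

// allow: each body is a conjunction, bodies are ORed, false is the default.
func allow(in Input) bool {
	if in.Action == "read" && in.Resource.Type == "document" && canRead(in) {
		return true
	}
	if contains(in.User.Roles, "admin") {
		return true
	}
	return false // default allow := false
}
```

One difference to keep in mind: in Rego, a missing field makes an expression undefined. Here, a missing field is an empty string, so the translation is only approximate.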

2.1 Structured decisions

For anything beyond a simple boolean, return a structured decision:

package agent.authz

import rego.v1

decision := {
    "allow": allow,
    "reasons": reasons,
    "obligations": obligations,
}

reasons contains "owner" if input.resource.owner == input.user.id
reasons contains "shared" if input.user.id in input.resource.shared_with
reasons contains "admin_role" if "admin" in input.user.roles

obligations contains {"type": "audit"} if input.resource.sensitive
obligations contains {"type": "approval"} if input.resource.amount > 10000

The caller can use obligations to fire side effects (write audit records, request approvals) without OPA needing to know about your audit system.
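Here is a caller-side sketch of that split. It assumes the application queries data.agent.authz.decision over the REST API and receives {"result": {...}}. Rego sets serialize to JSON arrays, so reasons and obligations arrive as lists. The handler map is a hypothetical stand-in for your own audit and approval systems.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Decision mirrors the decision document from section 2.1.
type Decision struct {
	Allow       bool         `json:"allow"`
	Reasons     []string     `json:"reasons"`
	Obligations []Obligation `json:"obligations"`
}

type Obligation struct {
	Type string `json:"type"`
}

// parseDecision decodes {"result": {...}}. An undefined result decodes to
// the zero Decision, so allow is false (fail closed).
func parseDecision(body []byte) (Decision, error) {
	var resp struct {
		Result Decision `json:"result"`
	}
	if err := json.Unmarshal(body, &resp); err != nil {
		return Decision{}, err
	}
	return resp.Result, nil
}

// applyObligations runs one handler per obligation type. An unknown type is
// an error, so the caller can deny rather than silently skip a side effect.
func applyObligations(d Decision, handlers map[string]func() error) error {
	for _, ob := range d.Obligations {
		h, ok := handlers[ob.Type]
		if !ok {
			return fmt.Errorf("no handler for obligation %q", ob.Type)
		}
		if err := h(); err != nil {
			return err
		}
	}
	return nil
}
```

Returning an error for unknown obligation types is a design choice. A new obligation added to the policy then fails loudly instead of being ignored by callers that haven't caught up.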

3. Step 2, Testing Policies

Rego tests run with opa test. They’re fast and let you catch policy regressions before they ship.

# authz_test.rego
package agent.authz_test

import data.agent.authz

test_owner_can_read if {
    authz.allow with input as {
        "action": "read",
        "user": {"id": "alice", "roles": []},
        "resource": {"type": "document", "owner": "alice", "shared_with": []},
    }
}

test_stranger_cannot_read if {
    not authz.allow with input as {
        "action": "read",
        "user": {"id": "bob", "roles": []},
        "resource": {"type": "document", "owner": "alice", "shared_with": []},
    }
}

test_admin_can_read_anything if {
    authz.allow with input as {
        "action": "read",
        "user": {"id": "bob", "roles": ["admin"]},
        "resource": {"type": "document", "owner": "alice", "shared_with": []},
    }
}

Run with opa test -v ./policies/. Wire it into CI. Test coverage works too:

opa test --coverage --format=json ./policies/ | jq '.files | map_values(.coverage)'
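To make coverage gate CI rather than just print a number, a small Go checker can read the same JSON report. This is a sketch that assumes only the per-file files.<path>.coverage percentage that the jq expression above extracts. The threshold you pass in (say 80) is an arbitrary example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// coverageReport holds the part of the `opa test --coverage --format=json`
// output this gate reads (assumed: per-file percentages in 0..100).
type coverageReport struct {
	Files map[string]struct {
		Coverage float64 `json:"coverage"`
	} `json:"files"`
}

// checkCoverage returns the files below threshold, sorted for stable output.
func checkCoverage(report []byte, threshold float64) ([]string, error) {
	var rep coverageReport
	if err := json.Unmarshal(report, &rep); err != nil {
		return nil, fmt.Errorf("decode coverage report: %w", err)
	}
	var low []string
	for name, f := range rep.Files {
		if f.Coverage < threshold {
			low = append(low, fmt.Sprintf("%s (%.1f%%)", name, f.Coverage))
		}
	}
	sort.Strings(low)
	return low, nil
}
```

In CI, a small main would read the piped report from stdin, call checkCoverage, print any files it returns, and exit non-zero when the list is not empty.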

4. Step 3, Bundles and the Distribution Layer

Hardcoding policies into OPA container images means redeploys for every policy change. The right pattern is bundles: signed tarballs of policies and data, served over HTTP, pulled by OPA on a schedule.

# OPA configuration
services:
  policy-bundle-service:
    url: https://bundles.internal/v1
    credentials:
      bearer:
        token: ${BUNDLE_TOKEN}

bundles:
  authz:
    service: policy-bundle-service
    resource: bundles/authz.tar.gz
    persist: true
    polling:
      min_delay_seconds: 30
      max_delay_seconds: 120
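    signing:
      keyid: "policy-signer-2025"
      scope: "write"

# Verification key for the bundle signature; the name must match signing.keyid.
keys:
  policy-signer-2025:
    algorithm: RS256
    key: "<PEM-encoded public key of the policy signer>"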

OPA polls the bundle service, verifies the signature, and atomically swaps the policy. Application code never restarts.

4.1 Bundle signing

Sign bundles with a key dedicated to the policy team. OPA verifies before activating:

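# Build the bundle and embed .signatures.json at its root in one step.
# claims.json holds {"scope": "write"}, mirroring signing.scope in the OPA config.
opa build --bundle bundles/authz \
          --signing-key policy-key.pem \
          --signing-alg RS256 \
          --claims-file claims.json \
          --output bundles/authz.tar.gz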

Verification fails closed by default. A tampered bundle is rejected and OPA keeps using the previous good one.

5. Step 4, Decision Logs

You cannot run policy as code in production without decision logs. Every allow and deny needs to be auditable.

decision_logs:
  service: log-service          # must also be defined under services:
  reporting:
    min_delay_seconds: 5
    max_delay_seconds: 10
    upload_size_limit_bytes: 32768
  mask_decision: /system/log/mask

mask_decision points at a Rego rule that lists the fields to scrub from each log entry before it ships:

# mask.rego
package system.log

import rego.v1

mask contains "/input/user/email" if true
mask contains "/input/resource/contents" if true

OPA removes those fields from every decision log entry before upload and lists the removed paths in the entry's erased field. That matters for compliance, especially when policies see content like user emails or document text.
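To make the path semantics concrete: each mask entry is a JSON pointer into the logged event (under /input or /result), and masking deletes the value at that pointer if it exists. The Go sketch below models that on a decoded map. It only illustrates the idea and is not OPA's implementation: it ignores pointer escaping such as ~1 and does not descend into arrays.

```go
package main

import "strings"

// erase removes the value at each JSON pointer (e.g. "/input/user/email")
// from a decoded JSON object and returns the pointers that actually matched.
func erase(doc map[string]any, pointers []string) []string {
	var erased []string
	for _, p := range pointers {
		parts := strings.Split(strings.TrimPrefix(p, "/"), "/")
		node := doc
		for i, key := range parts {
			if i == len(parts)-1 {
				if _, ok := node[key]; ok {
					delete(node, key)
					erased = append(erased, p)
				}
				break
			}
			next, ok := node[key].(map[string]any)
			if !ok {
				break // path does not exist; nothing to erase
			}
			node = next
		}
	}
	return erased
}
```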

5.1 What to do with the logs

Ship to your SIEM, but also build dashboards on top. Useful queries:

Top denied policies by count. Either real attack patterns or a misconfigured app.
Decision latency p99. A policy that suddenly slows down is usually a data-set size problem.
Caller distribution per policy. A new caller is a deploy event; alert on first-seen.
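The first two views need nothing more than a pass over decision log events. The Go sketch below assumes each event carries the queried path, the result, and an evaluation timer under metrics. The metric key timer_rego_query_eval_ns is an assumption, so check it against your own log entries. The percentile uses the nearest-rank method with integer arithmetic.

```go
package main

import "sort"

// Event holds the decision log fields this sketch reads (assumed names).
type Event struct {
	Path    string           `json:"path"`
	Result  any              `json:"result"`
	Metrics map[string]int64 `json:"metrics"`
}

// deniesByPath counts events whose result is exactly false, per policy path.
func deniesByPath(events []Event) map[string]int {
	counts := map[string]int{}
	for _, e := range events {
		if b, ok := e.Result.(bool); ok && !b {
			counts[e.Path]++
		}
	}
	return counts
}

// p99Nanos returns the nearest-rank 99th percentile of the eval timer,
// or 0 when no event carries the (assumed) timer key.
func p99Nanos(events []Event) int64 {
	var xs []int64
	for _, e := range events {
		if v, ok := e.Metrics["timer_rego_query_eval_ns"]; ok {
			xs = append(xs, v)
		}
	}
	if len(xs) == 0 {
		return 0
	}
	sort.Slice(xs, func(i, j int) bool { return xs[i] < xs[j] })
	rank := (99*len(xs) + 99) / 100 // ceil(0.99 * n), 1-based
	return xs[rank-1]
}
```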

6. Step 5, Partial Evaluation for Hot Paths

Some policies are evaluated millions of times a day with mostly static input. OPA’s partial evaluation precomputes the residual policy against known inputs, leaving only the variable parts to evaluate at request time.

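# input.resource stays unknown; everything in known.json is folded in.
opa eval --data policies/ \
         --input known.json \
         --unknowns input.resource \
         --partial \
         --format source \
         'data.agent.authz.allow'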

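The output is the residual Rego: the queries (plus any support rules) that still depend on the unknown part of the input. Wrap them into a policy module and deploy it. Hot-path decisions get cheaper because every check that depended only on the known inputs has already been evaluated away.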

For typical Kubernetes admission controller policies this gets you from ~5ms per decision to under 100µs. Wire it into your build: every time you push a new policy bundle, also publish a partially-evaluated specialization for each known caller class.
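Partial evaluation is a form of program specialization: evaluate everything that depends only on known inputs ahead of time, and keep a residual for the rest. The Go sketch below applies that idea by hand to the allow rule. It assumes the caller's identity and roles are the known part, while the action and resource arrive per request. It is an analogy for what OPA produces, not a use of OPA's partial evaluation API, and the names are hypothetical.

```go
package main

// Resource is the per-request (unknown) part of the input.
type Resource struct {
	Type       string
	Owner      string
	SharedWith []string
}

// specialize folds the known caller facts into a residual decision function.
func specialize(userID string, roles []string) func(action string, r Resource) bool {
	for _, role := range roles {
		if role == "admin" {
			// Residual is constant: nothing left to evaluate per request.
			return func(string, Resource) bool { return true }
		}
	}
	// Residual keeps only the checks that depend on the request.
	return func(action string, r Resource) bool {
		if action != "read" || r.Type != "document" {
			return false
		}
		if r.Owner == userID {
			return true
		}
		for _, id := range r.SharedWith {
			if id == userID {
				return true
			}
		}
		return false
	}
}
```

The admin branch shows the best case. When the known facts settle the decision, the residual does no work at all per request.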

6.1 When not to use partial eval

If the input is genuinely varied (per-document, per-row), partial evaluation doesn’t help. Stick with the full evaluator and tune your data documents instead.

7. Step 6, Integrating from Applications

OPA sidecar pattern: your application talks to localhost:8181.

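// Go client for the OPA REST API (sidecar on localhost:8181)
import (
    "bytes"
    "context"
    "encoding/json"
    "fmt"
    "net/http"
)

type Client struct {
    base   string // e.g. "http://localhost:8181"
    client *http.Client
}

// Allow queries data.agent.authz.allow. An undefined result decodes to false (fail closed).
func (c *Client) Allow(ctx context.Context, input any) (bool, error) {
    body, err := json.Marshal(map[string]any{"input": input})
    if err != nil {
        return false, err
    }
    req, err := http.NewRequestWithContext(ctx, http.MethodPost,
        c.base+"/v1/data/agent/authz/allow", bytes.NewReader(body))
    if err != nil {
        return false, err
    }
    req.Header.Set("Content-Type", "application/json")
    resp, err := c.client.Do(req)
    if err != nil {
        return false, err
    }
    defer resp.Body.Close()
    if resp.StatusCode != http.StatusOK {
        return false, fmt.Errorf("opa: unexpected status %d", resp.StatusCode)
    }
    var out struct{ Result bool `json:"result"` }
    if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
        return false, err
    }
    return out.Result, nil
}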

For latency-critical paths, embed OPA as a library instead. The Go rego package compiles and evaluates policies in-process:

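import "github.com/open-policy-agent/opa/v1/rego"

// Compile once at startup and reuse the prepared query for every request.
query, err := rego.New(
    rego.Query("data.agent.authz.allow"),
    rego.Load([]string{"policies/"}, nil),
).PrepareForEval(ctx)
if err != nil {
    return err // compile errors surface here, not per request
}

rs, err := query.Eval(ctx, rego.EvalInput(input))
if err != nil {
    return err
}
// Allowed() is true only for a single boolean true result, so undefined means deny.
allowed := rs.Allowed()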

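In-process evaluation is 5-10x faster than HTTP for trivial policies. The tradeoff is that with rego.Load as above, picking up a new policy means re-preparing the query or restarting the process, whereas the sidecar swaps bundles for you. If you want bundle polling and decision logs in-process as well, OPA's Go sdk package (github.com/open-policy-agent/opa/v1/sdk) embeds that runtime.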

8. Common Pitfalls

Four pitfalls I see consistently.

8.1 Putting per-user data in `data`

A common mistake: load every user’s permissions into the OPA data document. With 100K users this works fine. At 10M users the bundle is 5GB and bundle distribution falls over. Per-entity data belongs in a separate service that OPA queries via http.send or, better, that the application includes in input for the specific entity being evaluated.

8.2 Treating OPA as a write API

OPA is a decision engine. It doesn’t store state, doesn’t have transactions, doesn’t survive restarts with anything except what’s in its bundle. People sometimes try to use the OPA data API as a write endpoint for per-request updates. That works in development and falls over in production because every replica sees a different state.

8.3 Forgetting that `default allow := false` is per-rule

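A default belongs to the rule as a whole, not to any single body. If you write default allow := false and then add a second allow rule for a different scenario, that one default covers both. Some engineers expect the default to override the other bodies, or expect to need a default per body. Neither is true. All allow bodies OR together, and the default is used only when none of them match. To carve out an exception, add a negated condition to the relevant allow bodies. A separate allow := false if {...} body does not override a matching allow body; when both match, OPA reports a conflict error instead.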

8.4 No version pin on the OPA binary

OPA’s behavior is stable but performance characteristics aren’t. A new OPA version can change evaluation timing by 2x in either direction. Pin the binary version, upgrade deliberately, run benchmarks before rolling.

9. Troubleshooting

Three failure modes I see.

9.1 Decisions returning the default after a deploy

Your bundle deployed, but decisions still come back as the default value. The cause is almost always one of three things:

- Bundle signature verification failed. The OPA logs record the verification error, and OPA keeps serving the previous revision.
- Policy compilation failed. Check the OPA logs for "compile error".
- The new policy is in the wrong package. Rules in package agent.authz aren't found by a query against data.agent.authorization.

9.2 Latency spikes on specific policies

A policy that suddenly takes 100ms when it used to take 1ms usually has a data-set growth problem. Run opa eval with --profile --profile-limit 10, using the same --data and --input as the slow requests, to see which expressions are expensive. A common cause is comprehensions over large data documents that grew without anyone noticing.

9.3 Decision logs not arriving

If decision logs stop showing up in your SIEM, the OPA-side cause is usually buffering. OPA holds decision events in memory until upload. If the log service is unreachable, the buffer keeps growing until it reaches reporting.buffer_size_limit_bytes, and past that limit OPA drops the oldest events. Set that limit explicitly, watch the OPA logs for upload failures, and monitor OPA's decision-log drop counters.
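The drop behavior is easiest to reason about as a byte-bounded queue that evicts the oldest events when a new one would exceed the limit. Below is a Go model of that policy, meant as intuition for sizing buffer_size_limit_bytes. It is not OPA's code, and the type and method names are hypothetical.

```go
package main

// eventBuffer keeps encoded events under a byte limit, dropping the oldest.
type eventBuffer struct {
	limit   int
	size    int
	events  [][]byte
	dropped int
}

// push appends ev, evicting from the front until the buffer fits the limit.
// An event larger than the whole limit is dropped immediately.
func (b *eventBuffer) push(ev []byte) {
	if len(ev) > b.limit {
		b.dropped++
		return
	}
	for b.size+len(ev) > b.limit {
		b.size -= len(b.events[0])
		b.events = b.events[1:]
		b.dropped++
	}
	b.events = append(b.events, ev)
	b.size += len(ev)
}

// drain returns all buffered events (as an upload would) and resets the buffer.
func (b *eventBuffer) drain() [][]byte {
	out := b.events
	b.events, b.size = nil, 0
	return out
}
```

The practical consequence: during a long outage of the log service, the entries that survive are the most recent ones. The evicted ones are lost even after the service recovers.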

10. Wrapping Up

OPA 1.0 is the version where you can adopt policy as code without signing up for a treadmill of language migrations. The shape of a production deployment is well-established now: sidecar OPA, signed bundles, decision logs to SIEM, in-process evaluation for the hot paths, partial evaluation where the inputs allow it.

The harder question is organizational. Who writes the policies? Where do they live? How do they get reviewed? The patterns that hold up: a policy-as-code repo separate from application code, owned by a security or platform team, with policy changes going through pull requests like any other change. Application teams contribute policies via PRs; the platform team owns the bundle distribution infrastructure.

For more reading, the OPA 1.0 release notes cover what changed in detail, and the Rego style guide is the best reference for keeping policies readable. My next post in this series, Content Moderation for LLMs with Llama Guard 3.2, uses these decision patterns for a different domain.
