background-shape
Audit Logging for Backend APIs
November 28, 2022 · 5 min read · by Muhammad Amal programming

TL;DR — Audit logs record WHO did WHAT to WHICH RESOURCE, WHEN. Distinct from operational logs (which capture HOW things happened). Append-only storage with retention policy. Includes user ID, action, resource ID, timestamp, source IP. Does NOT include passwords or sensitive payloads.

After input validation, the operational layer for incident response. Without audit logs, “we got hacked, what did they touch?” is unanswerable.

Audit log vs operational log

Operational log: Loki / CloudWatch / generic. “GET /api/orders took 47ms, status 200.” Variable. Sampled at scale. Retention: 30 days.

Audit log: dedicated. “user 42 deleted order 1247 at 2022-11-28 09:14:33 UTC from 1.2.3.4”. Structured. Never sampled. Retention: 1-7 years.

Different uses, different stores, different retention.

What to log

Always:

  • Authentication events (login success, login fail, logout, MFA challenge)
  • Authorization decisions (especially denials)
  • Data modifications (CREATE, UPDATE, DELETE) on sensitive resources
  • Privilege changes (role assignments, permission grants)
  • Configuration changes (settings, API keys, integrations)
  • Data exports (downloading user data, generating reports)

Sometimes:

  • Reads of sensitive data (PII access, financial info)
  • Failed validation attempts (potential probing)

Never:

  • Routine reads (every list query)
  • Health checks
  • Internal service-to-service calls

The litmus test: would the auditor need to see this in a 1-year-old incident review? If yes, log it.

Schema

CREATE TABLE audit_log (
  id              uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  ts              timestamptz NOT NULL DEFAULT now(),
  actor_type      text NOT NULL,           -- 'user' / 'api_key' / 'system'
  actor_id        text NOT NULL,           -- user_id or 'api_key:abc...'
  action          text NOT NULL,           -- 'order.create', 'user.delete'
  resource_type  text NOT NULL,            -- 'order', 'user'
  resource_id     text NOT NULL,
  organization_id bigint,                  -- for multi-tenant scoping
  source_ip       inet,
  source_user_agent text,
  context         jsonb,                   -- request method, path, etc.
  changes         jsonb,                   -- before/after for updates
  result          text NOT NULL DEFAULT 'success'  -- 'success' / 'failure'
);

CREATE INDEX ON audit_log (ts DESC);
CREATE INDEX ON audit_log (actor_id, ts DESC);
CREATE INDEX ON audit_log (resource_type, resource_id, ts DESC);
CREATE INDEX ON audit_log (organization_id, ts DESC);

Indexes for common queries: “what did user X do?”, “what happened to resource Y?”, “all actions in org Z.”

Application code

A helper:

def audit(req, action: str, resource_type: str, resource_id: str,
          changes: dict = None, result: str = "success"):
    db.execute("""
        INSERT INTO audit_log (
            actor_type, actor_id, action, resource_type, resource_id,
            organization_id, source_ip, source_user_agent, context, changes, result
        ) VALUES (
            :at, :aid, :act, :rt, :rid, :org, :ip, :ua, :ctx, :ch, :res
        )
    """, {
        "at": req.actor_type, "aid": str(req.actor.id),
        "act": action, "rt": resource_type, "rid": str(resource_id),
        "org": req.actor.organization_id,
        "ip": req.client.host, "ua": req.headers.get("user-agent"),
        "ctx": json.dumps({"method": req.method, "path": req.url.path}),
        "ch": json.dumps(changes) if changes else None,
        "res": result,
    })

# Usage
@app.delete("/orders/{order_id}")
def delete_order(req, order_id: int):
    order = orders.get(order_id)
    if not user_can_delete(req.user, order):
        audit(req, "order.delete", "order", str(order_id), result="forbidden")
        raise Forbidden()
    orders.delete(order_id)
    audit(req, "order.delete", "order", str(order_id),
          changes={"before": serialize(order)})

The audit call lives in the controller / use case, not somewhere generic. Each domain action audits itself with appropriate context.

Storage choices

Option A — Postgres in your own DB. Simplest. Same operational story as the rest of your data. Drawback: anyone with DB access can modify logs (insider risk).

Option B — Separate Postgres / write-only DB. Stronger isolation. Different credentials; app can append but not modify.

Option C — S3 + Parquet, append-only. Cheap at scale; immutable by storage policy. Queryable via Athena. Used for long-term retention.

Option D — Managed audit service. AWS CloudTrail-style; Datadog Audit; Splunk. Cost scales with volume.

For most teams: Option A or B, plus periodic Lambda/job archiving to S3 for compliance retention.

Append-only enforcement

If using Postgres:

CREATE ROLE audit_writer LOGIN PASSWORD '...';
GRANT INSERT ON audit_log TO audit_writer;
-- No UPDATE, no DELETE

App connects as audit_writer. Even with full DB access via that connection, can’t modify history. Modify-capable role exists for legitimate operations (rare; e.g., DSR deletion); kept separate, audited.

For S3:

Bucket policy:
  - Deny DeleteObject, PutObjectAcl
  - Allow PutObject only with conditions
  - Object lock with retention period

Object Lock makes deletion impossible until retention expires.

Retention

Depends on:

  • Regulatory requirements (SOX, HIPAA, PCI: usually 1-7 years)
  • Internal policy
  • Incident response needs

Typical:

  • Operational logs: 30 days
  • Audit logs (hot tier, Postgres): 90 days
  • Audit logs (cold tier, S3): 7 years

After hot tier, archive to S3 Glacier. Slow but cheap.

Querying

Common queries operators run:

-- Everything user X did in last 24 hours
SELECT * FROM audit_log
WHERE actor_id = 'user-42' AND ts > now() - interval '24 hours'
ORDER BY ts DESC;

-- Who touched this order?
SELECT * FROM audit_log
WHERE resource_type = 'order' AND resource_id = '1247'
ORDER BY ts;

-- Failed logins from one IP
SELECT * FROM audit_log
WHERE action = 'auth.login' AND result = 'failure'
  AND source_ip = '1.2.3.4' AND ts > now() - interval '1 hour';

Build a basic UI for support / security teams. Don’t make them write SQL.

What to alert on

Audit logs can drive security alerts:

  • Repeated auth failures from one IP / for one user (credential stuffing)
  • Privilege escalation events
  • Mass data exports (e.g., a user downloading 10K records they wouldn’t normally)
  • Off-hours admin activity
  • Anomalous source IP (login from new country)

Stream new audit events into your alert evaluator. Most teams use Loki + Grafana alerts for this.

What NOT to log

  • Passwords. Even hashed; even in fields named “password_attempted.” Never.
  • Full request bodies. May contain sensitive data (credit cards, secrets).
  • Full API tokens. Log the prefix only (sk_live_abc1...).
  • PII when not necessary. Log user IDs, not full names + addresses, unless the action specifically warrants it.
  • GDPR-sensitive data without retention plan. Audit logs are subject to deletion requests.

Common Pitfalls

No audit logs. “We can see operational logs.” Different thing.

Mixing audit and operational logs. Audit gets sampled, dropped, expired. Defeats the purpose.

Audit logs in the same DB with full write access. Insider can modify. Use restricted role.

No retention policy. Disk fills, or compliance violations.

Querying audit by full-text search. Slow. Index by actor_id, resource_id, ts.

Logging password attempts. Don’t, even hashed.

Audit log lookup as a hot path. Causes load on a write-mostly DB. Either materialize summaries or query rarely.

Wrapping Up

Audit log = who did what to which resource when. Separate from ops logs. Append-only. Retention by compliance. Wednesday: November retro.