Audit Logging for Backend APIs
TL;DR — Audit logs record WHO did WHAT to WHICH RESOURCE, WHEN. Distinct from operational logs (which capture HOW things happened). Append-only storage with retention policy. Includes user ID, action, resource ID, timestamp, source IP. Does NOT include passwords or sensitive payloads.
After input validation, the operational layer for incident response. Without audit logs, “we got hacked, what did they touch?” is unanswerable.
Audit log vs operational log
Operational log: Loki / CloudWatch / generic. “GET /api/orders took 47ms, status 200.” Variable. Sampled at scale. Retention: 30 days.
Audit log: dedicated. “user 42 deleted order 1247 at 2022-11-28 09:14:33 UTC from 1.2.3.4”. Structured. Never sampled. Retention: 1-7 years.
Different uses, different stores, different retention.
What to log
Always:
- Authentication events (login success, login fail, logout, MFA challenge)
- Authorization decisions (especially denials)
- Data modifications (CREATE, UPDATE, DELETE) on sensitive resources
- Privilege changes (role assignments, permission grants)
- Configuration changes (settings, API keys, integrations)
- Data exports (downloading user data, generating reports)
Sometimes:
- Reads of sensitive data (PII access, financial info)
- Failed validation attempts (potential probing)
Never:
- Routine reads (every list query)
- Health checks
- Internal service-to-service calls
The litmus test: would the auditor need to see this in a 1-year-old incident review? If yes, log it.
Schema
CREATE TABLE audit_log (
id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
ts timestamptz NOT NULL DEFAULT now(),
actor_type text NOT NULL, -- 'user' / 'api_key' / 'system'
actor_id text NOT NULL, -- user_id or 'api_key:abc...'
action text NOT NULL, -- 'order.create', 'user.delete'
resource_type text NOT NULL, -- 'order', 'user'
resource_id text NOT NULL,
organization_id bigint, -- for multi-tenant scoping
source_ip inet,
source_user_agent text,
context jsonb, -- request method, path, etc.
changes jsonb, -- before/after for updates
result text NOT NULL DEFAULT 'success' -- 'success' / 'failure'
);
CREATE INDEX ON audit_log (ts DESC);
CREATE INDEX ON audit_log (actor_id, ts DESC);
CREATE INDEX ON audit_log (resource_type, resource_id, ts DESC);
CREATE INDEX ON audit_log (organization_id, ts DESC);
Indexes for common queries: “what did user X do?”, “what happened to resource Y?”, “all actions in org Z.”
Application code
A helper:
def audit(req, action: str, resource_type: str, resource_id: str,
changes: dict = None, result: str = "success"):
db.execute("""
INSERT INTO audit_log (
actor_type, actor_id, action, resource_type, resource_id,
organization_id, source_ip, source_user_agent, context, changes, result
) VALUES (
:at, :aid, :act, :rt, :rid, :org, :ip, :ua, :ctx, :ch, :res
)
""", {
"at": req.actor_type, "aid": str(req.actor.id),
"act": action, "rt": resource_type, "rid": str(resource_id),
"org": req.actor.organization_id,
"ip": req.client.host, "ua": req.headers.get("user-agent"),
"ctx": json.dumps({"method": req.method, "path": req.url.path}),
"ch": json.dumps(changes) if changes else None,
"res": result,
})
# Usage
@app.delete("/orders/{order_id}")
def delete_order(req, order_id: int):
order = orders.get(order_id)
if not user_can_delete(req.user, order):
audit(req, "order.delete", "order", str(order_id), result="forbidden")
raise Forbidden()
orders.delete(order_id)
audit(req, "order.delete", "order", str(order_id),
changes={"before": serialize(order)})
The audit call lives in the controller / use case, not somewhere generic. Each domain action audits itself with appropriate context.
Storage choices
Option A — Postgres in your own DB. Simplest. Same operational story as the rest of your data. Drawback: anyone with DB access can modify logs (insider risk).
Option B — Separate Postgres / write-only DB. Stronger isolation. Different credentials; app can append but not modify.
Option C — S3 + Parquet, append-only. Cheap at scale; immutable by storage policy. Queryable via Athena. Used for long-term retention.
Option D — Managed audit service. AWS CloudTrail-style; Datadog Audit; Splunk. Cost scales with volume.
For most teams: Option A or B, plus periodic Lambda/job archiving to S3 for compliance retention.
Append-only enforcement
If using Postgres:
CREATE ROLE audit_writer LOGIN PASSWORD '...';
GRANT INSERT ON audit_log TO audit_writer;
-- No UPDATE, no DELETE
App connects as audit_writer. Even with full DB access via that connection, can’t modify history. Modify-capable role exists for legitimate operations (rare; e.g., DSR deletion); kept separate, audited.
For S3:
Bucket policy:
- Deny DeleteObject, PutObjectAcl
- Allow PutObject only with conditions
- Object lock with retention period
Object Lock makes deletion impossible until retention expires.
Retention
Depends on:
- Regulatory requirements (SOX, HIPAA, PCI: usually 1-7 years)
- Internal policy
- Incident response needs
Typical:
- Operational logs: 30 days
- Audit logs (hot tier, Postgres): 90 days
- Audit logs (cold tier, S3): 7 years
After hot tier, archive to S3 Glacier. Slow but cheap.
Querying
Common queries operators run:
-- Everything user X did in last 24 hours
SELECT * FROM audit_log
WHERE actor_id = 'user-42' AND ts > now() - interval '24 hours'
ORDER BY ts DESC;
-- Who touched this order?
SELECT * FROM audit_log
WHERE resource_type = 'order' AND resource_id = '1247'
ORDER BY ts;
-- Failed logins from one IP
SELECT * FROM audit_log
WHERE action = 'auth.login' AND result = 'failure'
AND source_ip = '1.2.3.4' AND ts > now() - interval '1 hour';
Build a basic UI for support / security teams. Don’t make them write SQL.
What to alert on
Audit logs can drive security alerts:
- Repeated auth failures from one IP / for one user (credential stuffing)
- Privilege escalation events
- Mass data exports (e.g., a user downloading 10K records they wouldn’t normally)
- Off-hours admin activity
- Anomalous source IP (login from new country)
Stream new audit events into your alert evaluator. Most teams use Loki + Grafana alerts for this.
What NOT to log
- Passwords. Even hashed; even in fields named “password_attempted.” Never.
- Full request bodies. May contain sensitive data (credit cards, secrets).
- Full API tokens. Log the prefix only (
sk_live_abc1...). - PII when not necessary. Log user IDs, not full names + addresses, unless the action specifically warrants it.
- GDPR-sensitive data without retention plan. Audit logs are subject to deletion requests.
Common Pitfalls
No audit logs. “We can see operational logs.” Different thing.
Mixing audit and operational logs. Audit gets sampled, dropped, expired. Defeats the purpose.
Audit logs in the same DB with full write access. Insider can modify. Use restricted role.
No retention policy. Disk fills, or compliance violations.
Querying audit by full-text search. Slow. Index by actor_id, resource_id, ts.
Logging password attempts. Don’t, even hashed.
Audit log lookup as a hot path. Causes load on a write-mostly DB. Either materialize summaries or query rarely.
Wrapping Up
Audit log = who did what to which resource when. Separate from ops logs. Append-only. Retention by compliance. Wednesday: November retro.