Wiring an Automated AI Code Review Pipeline into CI


February 13, 2026 · 9 min read · by Muhammad Amal · programming

TL;DR: Diff the PR and send only the diff; get structured findings back and post them as inline comments via the GitHub Reviews API; scope permissions tightly and fail soft.

Human code review is a bottleneck, and not because reviewers are slow. It’s because the boring 80% — the unhandled error path, the missing null check, the function that grew three responsibilities — soaks up attention that should go to architecture and intent. An AI code review pipeline doesn’t replace your reviewers. It clears the boring 80% off their plate so they spend their judgment where it counts.

The trap is treating this as “pipe the diff to a model and dump the reply in a comment.” That produces a wall of unstructured prose nobody reads, fires on every trivial whitespace change, and occasionally leaks more of your code than you intended. A pipeline you’ll actually keep enabled needs three properties: it sends only the diff, it gets back structured findings, and it posts them as inline comments anchored to specific lines.

This article builds that pipeline on GitHub Actions with the Anthropic API and Python 3.12. It’s a sibling to my piece on orchestrating multi-agent workflows with CrewAI — same agentic-engineering theme, applied here to the review gate itself.

Architecture

Four stages, one workflow file plus one Python script.

PR opened/updated
      │
      ▼
GitHub Actions trigger (pull_request)
      │
      ▼
fetch diff  ──▶  call Anthropic API  ──▶  parse structured findings
                                                  │
                                                  ▼
                                  post inline review via GitHub API

The script does the work; the workflow handles triggering, checkout, secrets, and permissions. Keeping logic in Python rather than inline YAML means you can test it locally and the workflow file stays readable.

The Workflow File

Start with the GitHub Actions definition. The two settings that matter most are the trigger and the permissions block.

# .github/workflows/ai-review.yml
name: AI Code Review

on:
  pull_request:
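    # ready_for_review matters because the job skips drafts: without it, a PR
    # that leaves draft is not reviewed until the next push.
    types: [opened, synchronize, reopened, ready_for_review]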

# Least privilege: the job can read code and write PR comments, nothing else.
permissions:
  contents: read
  pull-requests: write

concurrency:
  # Cancel a stale review run when the PR is updated again.
  group: ai-review-${{ github.event.pull_request.number }}
  cancel-in-progress: true

jobs:
  review:
    runs-on: ubuntu-24.04
    # Skip drafts and bot-authored PRs to save tokens.
    if: github.event.pull_request.draft == false && github.event.pull_request.user.type != 'Bot'
    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          fetch-depth: 0  # need full history to compute the diff

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - name: Install dependencies
        run: pip install anthropic==0.45.2

      - name: Run AI review
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          PR_NUMBER: ${{ github.event.pull_request.number }}
          REPO: ${{ github.repository }}
          BASE_SHA: ${{ github.event.pull_request.base.sha }}
          HEAD_SHA: ${{ github.event.pull_request.head.sha }}
        run: python .github/scripts/ai_review.py

fetch-depth: 0 matters: with the default shallow checkout (a single commit), base.sha is usually not in local history and the diff fails with a bad-object error. The concurrency block cancels an in-flight review when someone pushes again, so you stop spending tokens on stale code. The GitHub Actions permissions docs explain why the explicit permissions block beats the default token scope.
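If you want the script to explain a shallow checkout instead of dying on a raw git error, a small preflight check is one option. This is a minimal sketch, not part of the script above: commit_exists is a hypothetical helper name, and it relies only on git cat-file -e, which exits non-zero when the object is missing.

```python
from __future__ import annotations

import subprocess


def commit_exists(sha: str, cwd: str | None = None) -> bool:
    """Return True if `sha` names a commit present in the local repository.

    Hypothetical helper: useful for printing a clear "did you set
    fetch-depth: 0?" hint before attempting the diff.
    """
    try:
        result = subprocess.run(
            # "<sha>^{commit}" asks git to resolve the object to a commit.
            ["git", "cat-file", "-e", f"{sha}^{{commit}}"],
            cwd=cwd,
            capture_output=True,
            text=True,
        )
    except FileNotFoundError:
        # git itself is not installed on this machine.
        return False
    return result.returncode == 0
```

Calling it for both BASE_SHA and HEAD_SHA before the diff step lets you print a hint that points at the checkout configuration.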

Extracting the Diff

The script’s first job is producing a clean diff. We shell out to git, then filter — skip lockfiles, generated files, and anything too large to review meaningfully.

# .github/scripts/ai_review.py
from __future__ import annotations
import os
import subprocess
import sys

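# Files that are noise for a reviewer. str.endswith accepts a tuple, so full
# lockfile names can sit next to plain suffixes.
SKIP_SUFFIXES = (
    ".lock", "package-lock.json", "pnpm-lock.yaml",
    ".min.js", ".svg", ".snap",
)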
SKIP_PATHS = ("dist/", "build/", "vendor/", "node_modules/")
MAX_DIFF_CHARS = 60_000


def get_diff(base_sha: str, head_sha: str) -> str:
    """Return a filtered unified diff between two commits."""
    # Three-dot range: diff from the merge base to head, which matches what the
    # PR page shows. A two-dot diff would also include commits that landed on
    # the base branch after the PR branched off.
    rev_range = f"{base_sha}...{head_sha}"
    try:
        files = subprocess.run(
            ["git", "diff", "--name-only", rev_range],
            capture_output=True, text=True, check=True,
        ).stdout.splitlines()
    except subprocess.CalledProcessError as exc:
        sys.exit(f"failed to list changed files: {exc.stderr}")

    keep = [
        f for f in files
        if not f.endswith(SKIP_SUFFIXES)
        and not any(f.startswith(p) for p in SKIP_PATHS)
    ]
    if not keep:
        print("no reviewable files changed; exiting cleanly")
        sys.exit(0)

    try:
        diff = subprocess.run(
            ["git", "diff", "--unified=3", rev_range, "--", *keep],
            capture_output=True, text=True, check=True,
        ).stdout
    except subprocess.CalledProcessError as exc:
        sys.exit(f"failed to compute diff: {exc.stderr}")

    if len(diff) > MAX_DIFF_CHARS:
        print(f"diff is {len(diff)} chars; truncating to {MAX_DIFF_CHARS}")
        diff = diff[:MAX_DIFF_CHARS] + "\n... [diff truncated] ..."
    return diff

The size cap is both a cost control and a quality control. A 200-file refactor reviewed as one giant prompt produces shallow comments. Truncating — and saying so — is more honest than pretending to review code that didn’t fit.
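A plain character slice can cut a hunk in half. One possible refinement is to drop whole per-file sections once the budget is spent, so the model never sees a partial hunk. The sketch below assumes the diff comes from git diff, where every file section starts with a "diff --git" header; truncate_by_file is a hypothetical name, not something the script above defines.

```python
from __future__ import annotations


def truncate_by_file(diff: str, max_chars: int) -> tuple[str, list[str]]:
    """Keep whole per-file diff sections that fit within `max_chars`.

    Returns the kept diff text and the header line of every dropped section,
    so the caller can report exactly which files were not reviewed.
    """
    marker = "diff --git "
    # Split on header lines; re-attach the marker that split() consumed.
    parts = diff.split("\n" + marker)
    sections = [parts[0]] + [marker + p for p in parts[1:]]

    kept: list[str] = []
    dropped: list[str] = []
    used = 0
    for section in sections:
        if not section:
            continue
        cost = len(section) + 1  # +1 for the newline that joins sections
        if used + cost <= max_chars:
            kept.append(section)
            used += cost
        else:
            # Sections that do not fit are skipped; later, smaller ones may still fit.
            dropped.append(section.splitlines()[0])
    return "\n".join(kept), dropped
```

The dropped headers can go into the review summary, which keeps the "say when you truncated" promise at file granularity.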

Calling the Anthropic API for Structured Findings

The model must return JSON, not prose. We enforce that by declaring the exact shape as a tool's input schema and forcing the model to call that tool, which is a dependable way to get structured output from the Anthropic API.

# .github/scripts/ai_review.py (continued)
import json
from anthropic import Anthropic, APIError

_REVIEW_TOOL = {
    "name": "submit_review",
    "description": "Submit code review findings for the pull request diff.",
    "input_schema": {
        "type": "object",
        "properties": {
            "findings": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "file": {"type": "string", "description": "Path from the diff"},
                        "line": {"type": "integer", "description": "Line in the new file"},
                        "severity": {
                            "type": "string",
                            "enum": ["blocker", "warning", "nit"],
                        },
                        "comment": {"type": "string"},
                    },
                    "required": ["file", "line", "severity", "comment"],
                },
            },
            "summary": {"type": "string", "description": "One-paragraph overview"},
        },
        "required": ["findings", "summary"],
    },
}

_SYSTEM = (
    "You are a senior engineer reviewing a pull request diff. Report only "
    "real issues: correctness bugs, security flaws, resource leaks, missing "
    "error handling, and clear contract violations. Do not comment on style "
    "a formatter would catch. Use 'blocker' only for issues that must be "
    "fixed before merge. If the diff is clean, return an empty findings list."
)


def review_diff(diff: str) -> dict:
    """Send the diff to the model and return parsed findings."""
    client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
    try:
        resp = client.messages.create(
            model="claude-sonnet-4-5-20250929",
            max_tokens=4096,
            system=_SYSTEM,
            tools=[_REVIEW_TOOL],
            tool_choice={"type": "tool", "name": "submit_review"},
            messages=[{"role": "user", "content": f"Review this diff:\n\n{diff}"}],
        )
    except APIError as exc:
        # Fail soft: a review-service outage must not block the PR.
        print(f"::warning::Anthropic API error, skipping review: {exc}")
        sys.exit(0)

    for block in resp.content:
        if block.type == "tool_use" and block.name == "submit_review":
            return block.input
    print("::warning::model returned no structured review")
    sys.exit(0)

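Forcing tool_choice to submit_review makes the model answer with a tool call instead of free text. The schema steers the shape of that call, but the returned fields are still model output, so treat them as untrusted input and check them before use. Note the failure posture too: when the API errors, we emit a workflow warning and exit 0. A review assistant that is down should never block a merge. Fail soft, always.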
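Because the tool input is still model output, a defensive pass over it before posting costs little. The following is a minimal sketch under the assumption that the result has the shape declared in _REVIEW_TOOL; clean_findings and ALLOWED_SEVERITIES are hypothetical names introduced here for illustration.

```python
from __future__ import annotations

# Mirrors the enum in the tool schema above.
ALLOWED_SEVERITIES = {"blocker", "warning", "nit"}


def clean_findings(raw: object) -> list[dict]:
    """Return only the findings that match the expected shape; drop the rest."""
    if not isinstance(raw, dict):
        return []
    items = raw.get("findings")
    if not isinstance(items, list):
        return []

    cleaned: list[dict] = []
    for item in items:
        if not isinstance(item, dict):
            continue
        file, line = item.get("file"), item.get("line")
        severity, comment = item.get("severity"), item.get("comment")
        if not (isinstance(file, str) and file):
            continue
        # bool is a subclass of int, so rule it out explicitly.
        if not isinstance(line, int) or isinstance(line, bool) or line < 1:
            continue
        if not isinstance(severity, str) or severity not in ALLOWED_SEVERITIES:
            continue
        if not (isinstance(comment, str) and comment.strip()):
            continue
        cleaned.append(
            {"file": file, "line": line, "severity": severity, "comment": comment.strip()}
        )
    return cleaned
```

Dropping a malformed finding silently is a design choice; logging the count of discarded items would make prompt regressions easier to spot.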

Posting Inline Comments

The last stage turns findings into a GitHub review. The Reviews API lets you post a batch of inline comments plus a summary in a single call, each comment anchored to a file and line.

# .github/scripts/ai_review.py (continued)
import urllib.request
import urllib.error

SEVERITY_PREFIX = {"blocker": "BLOCKER", "warning": "Warning", "nit": "Nit"}


def post_review(repo: str, pr_number: str, head_sha: str, result: dict) -> None:
    """Post findings as a single GitHub pull request review."""
    token = os.environ["GITHUB_TOKEN"]
    comments = [
        {
            "path": f["file"],
            "line": f["line"],
            "side": "RIGHT",
            "body": f"**{SEVERITY_PREFIX.get(f['severity'], 'Note')}** — {f['comment']}",
        }
        for f in result["findings"]
    ]
    has_blocker = any(f["severity"] == "blocker" for f in result["findings"])
    body = {
        "commit_id": head_sha,
        "body": f"AI review summary\n\n{result['summary']}",
        # COMMENT is advisory. REQUEST_CHANGES can block the merge when branch
        # protection requires reviews, so it is reserved for blocker findings.
        "event": "REQUEST_CHANGES" if has_blocker else "COMMENT",
        "comments": comments,
    }
    req = urllib.request.Request(
        f"https://api.github.com/repos/{repo}/pulls/{pr_number}/reviews",
        data=json.dumps(body).encode(),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "X-GitHub-Api-Version": "2022-11-28",
        },
    )
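    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            print(f"posted review, HTTP {resp.status}, {len(comments)} comments")
    except urllib.error.HTTPError as exc:
        # 422 usually means a comment line is outside the diff; GitHub then
        # rejects the whole review, not just the offending comment.
        detail = exc.read().decode(errors="replace")
        print(f"::warning::failed to post review ({exc.code}): {detail}")
    except urllib.error.URLError as exc:
        # Network-level failure (DNS, refused connection): fail soft here too.
        print(f"::warning::could not reach GitHub API: {exc.reason}")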

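A subtlety worth internalizing: GitHub only accepts inline comments on lines that appear in the PR diff. If even one comment points elsewhere, the request fails with a 422 and none of the review is posted. The model occasionally misjudges line numbers, so log a 422 as a warning rather than failing the job, and check line numbers against the diff before posting if you want the remaining comments to land.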
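One way to avoid most 422s is to work out which new-file lines the diff actually contains and separate out findings that point elsewhere. The sketch below parses standard unified-diff hunk headers. commentable_lines and split_findings are hypothetical helpers, and the parser assumes plain git diff output; it does not handle paths that git quotes because of unusual characters.

```python
from __future__ import annotations

import re

# Hunk header, e.g. "@@ -10,4 +10,5 @@ def f():". Group 1 is the new-file start line.
_HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def commentable_lines(diff: str) -> dict[str, set[int]]:
    """Map each changed file to the new-file line numbers present in the diff."""
    valid: dict[str, set[int]] = {}
    path: str | None = None
    new_line = 0
    in_hunk = False
    for raw in diff.splitlines():
        if raw.startswith("diff --git "):
            path, in_hunk = None, False
            continue
        if not in_hunk and raw.startswith("+++ "):
            target = raw[4:]
            # "+++ /dev/null" marks a deleted file: nothing on the new side.
            path = target[2:] if target.startswith("b/") else None
            if path is not None:
                valid.setdefault(path, set())
            continue
        match = _HUNK_HEADER.match(raw)
        if match:
            new_line, in_hunk = int(match.group(1)), True
            continue
        # Added ("+") and context (" ") lines exist in the new file;
        # removed ("-") lines and "\ No newline" markers do not.
        if in_hunk and path is not None and raw[:1] in ("+", " "):
            valid[path].add(new_line)
            new_line += 1
    return valid


def split_findings(
    findings: list[dict], valid: dict[str, set[int]]
) -> tuple[list[dict], list[dict]]:
    """Separate findings that can be posted inline from those that cannot."""
    inline, orphaned = [], []
    for finding in findings:
        if finding["line"] in valid.get(finding["file"], set()):
            inline.append(finding)
        else:
            orphaned.append(finding)
    return inline, orphaned
```

Orphaned findings do not have to be thrown away: appending them to the review summary body keeps the feedback visible without risking the whole request.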

The Entry Point

# .github/scripts/ai_review.py (continued)
def main() -> None:
    required = ["BASE_SHA", "HEAD_SHA", "REPO", "PR_NUMBER",
                "ANTHROPIC_API_KEY", "GITHUB_TOKEN"]
    missing = [k for k in required if not os.environ.get(k)]
    if missing:
        sys.exit(f"missing required env vars: {', '.join(missing)}")

    diff = get_diff(os.environ["BASE_SHA"], os.environ["HEAD_SHA"])
    result = review_diff(diff)

    # Tolerate a malformed tool result instead of crashing with a KeyError.
    findings = result.get("findings") or []
    print(f"review produced {len(findings)} finding(s)")
    if not findings:
        print("clean diff; nothing to post")
        return

    post_review(
        os.environ["REPO"],
        os.environ["PR_NUMBER"],
        os.environ["HEAD_SHA"],
        result,
    )


if __name__ == "__main__":
    main()

Common Pitfalls

Sending whole files instead of the diff. It costs more tokens and buries the change in unchanged context. Send the unified diff; it’s all the model needs.

Unstructured output. A free-text reply forces fragile regex parsing and can’t be posted as inline comments. Force a schema with a tool definition and tool_choice.

Blocking merges on AI verdicts. Use event: COMMENT by default. Reserve REQUEST_CHANGES for genuine blockers, and even then make sure a human can dismiss it. An AI reviewer that hard-blocks merges will get disabled within a week.

Over-broad token permissions. Depending on repository and organization settings, the default GITHUB_TOKEN can carry far more scopes than this job needs. Pin the permissions: block to contents: read and pull-requests: write.

No diff size cap. A huge PR sent whole produces shallow review and a large bill. Cap the diff and say when you truncated.

Reviewing every push synchronously. Without a concurrency group, ten quick pushes mean ten parallel reviews of soon-stale code. Set cancel-in-progress: true.
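The "blocking merges" pitfall above can be made an explicit opt-in rather than a side effect of one finding's severity. This is a small sketch, assuming findings are dicts with a severity key as in the tool schema; choose_event and its allow_blocking flag are hypothetical names, while COMMENT and REQUEST_CHANGES are the event values the Reviews API accepts.

```python
from __future__ import annotations


def choose_event(findings: list[dict], allow_blocking: bool = False) -> str:
    """Pick the review event, staying advisory unless blocking is opted into."""
    has_blocker = any(f.get("severity") == "blocker" for f in findings)
    if allow_blocking and has_blocker:
        return "REQUEST_CHANGES"
    return "COMMENT"
```

With the default of False, a team has to decide deliberately that the bot may request changes, for example through a repository variable read in the entry point.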

Troubleshooting

Symptom: the job fails with fatal: bad object on the diff. Cause: shallow checkout, so base.sha isn’t in local history. Fix: set fetch-depth: 0 on actions/checkout.

Symptom: no review appears on the PR and the log shows a 422. Cause: at least one inline comment points at a line outside the diff, and GitHub rejects the whole review request. Fix: log the 422 body, then drop out-of-range findings or snap their line numbers to the nearest line in the diff before posting.

Symptom: the model returns prose instead of JSON. Cause: tool_choice not pinned to the review tool. Fix: set tool_choice={"type": "tool", "name": "submit_review"} so the response must use the schema.

Symptom: Resource not accessible by integration when posting. Cause: the workflow lacks pull-requests: write. Fix: add it to the permissions: block; the default token can’t post reviews otherwise.

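Symptom: reviews run on draft PRs and waste tokens. Cause: no draft guard. Fix: add if: github.event.pull_request.draft == false to the job, and include ready_for_review in the trigger types so the review still runs when the PR leaves draft.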

Symptom: an API outage fails the whole pipeline. Cause: the script exits non-zero on APIError. Fix: catch APIError, emit ::warning::, and sys.exit(0) so a service blip never blocks merges.
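The line-snapping idea from the 422 entry above could look roughly like this. It is a sketch under one assumption: the caller has already extracted, for the file in question, the set of new-file line numbers that appear in the diff. snap_line and max_distance are hypothetical names.

```python
from __future__ import annotations


def snap_line(line: int, valid: set[int], max_distance: int = 3) -> int | None:
    """Return `line` if it is in the diff, else the nearest diff line.

    Returns None when no diff line lies within `max_distance`, so the caller
    can fall back to mentioning the finding in the review summary instead.
    """
    if line in valid:
        return line
    if not valid:
        return None
    # Ties are broken toward the lower line number for determinism.
    nearest = min(valid, key=lambda v: (abs(v - line), v))
    return nearest if abs(nearest - line) <= max_distance else None
```

A small max_distance is deliberate: a comment moved far from the code it describes is more confusing than a comment folded into the summary.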

What’s Next

You now have a pipeline that reviews every PR, posts anchored inline comments, and fails soft when its dependencies wobble. The next steps are caching reviews keyed by diff hash to avoid re-reviewing unchanged code, adding a per-repo config file so teams can tune severity thresholds, and feeding accepted-versus-dismissed comment data back as examples to sharpen the prompt over time.
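For the caching step, the key needs to change whenever anything that influences the review changes, not only the diff. A minimal sketch, assuming the model name and system prompt are the other two inputs worth keying on; review_cache_key is a hypothetical helper and where the cached result is stored is left open.

```python
from __future__ import annotations

import hashlib


def review_cache_key(diff: str, model: str, system_prompt: str) -> str:
    """Derive a stable cache key from everything that shapes the review."""
    digest = hashlib.sha256()
    for part in (model, system_prompt, diff):
        data = part.encode("utf-8")
        # Length-prefix each part so ("ab", "c") and ("a", "bc") hash differently.
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()
```

Keying on the prompt as well means a prompt tweak invalidates old reviews automatically, which is usually what you want while tuning.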
