TechDocs at Scale with Backstage, A Production Setup
October 15, 2025 · 9 min read · by Muhammad Amal

TL;DR: TechDocs’ default local builder is a demo mode. Past ten repos you need the external builder pattern, where docs are built in CI and uploaded to S3 (or GCS, or Azure Blob). This post walks through the full production setup, including the search indexer and the CI workflow that keeps it fast.

I’ve watched three teams hit the same wall with TechDocs. The local builder works perfectly for the first month, then someone adds a fiftieth service to the catalog, the backend pod starts OOMing during doc builds, and the docs tab on the entity page starts taking ninety seconds to load. The fix isn’t bigger pods. The fix is moving the build out of the backend entirely.

This tutorial is the production setup I deploy now by default, even on small portals. The upfront cost is a couple hours of CI work. The payoff is that the Backstage backend becomes a thin proxy that reads pre-rendered HTML from object storage, and adding more docs doesn’t add load to the portal itself.

If you haven’t seen the Backstage 1.34 baseline this builds on, the earlier post in this series covers it. The other prerequisite is some object storage you control, and a CI system. I’ll use GitHub Actions and S3, but the pattern translates to GCS plus GitLab CI without much change.

1. The Local Builder Problem

Out of the box, Backstage TechDocs runs mkdocs build inside the backend pod whenever a user opens a docs page that isn’t cached. With the default config in 1.34:

techdocs:
  builder: 'local'
  generator:
    runIn: 'docker'
  publisher:
    type: 'local'

Three things go wrong as you scale. First, every cold cache hit spawns a Docker container in the backend pod, which doesn’t work at all in Kubernetes unless you’ve gone out of your way to enable Docker-in-Docker. Second, even if you switch generator.runIn to local (no Docker), the backend now has to bundle every Python dependency that any team’s mkdocs.yml references, which means the image bloats unboundedly. Third, the cache is local to the pod, so every replica you add rebuilds the same docs, and every restart throws the cache away.

The external builder pattern fixes all three. CI does the build, uploads the rendered HTML to S3 under a deterministic prefix, and the backend just reads from that prefix.

2. Configuring the External Builder

Set techdocs.builder: 'external' in app-config.yaml. This tells the backend to never run a build, only to read what’s already there:

techdocs:
  builder: 'external'
  publisher:
    type: 'awsS3'
    awsS3:
      bucketName: 'acme-techdocs-prod'
      region: 'us-east-1'
      # credentials omitted on purpose: the AWS SDK default chain picks up the IRSA role
  cache:
    ttl: 3600000      # milliseconds (1 hour)
    readTimeout: 500  # milliseconds

The S3 bucket needs a specific layout. Each entity gets its own prefix matching <namespace>/<kind>/<name>/. So a Component named payments-api in the default namespace lives at s3://acme-techdocs-prod/default/component/payments-api/. Backstage normalizes everything to lowercase. If your bucket layout mixes cases, the reader silently returns 404 and the entity page shows an empty docs tab.
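To make the layout concrete, here is a minimal sketch of how the prefix falls out of an entity. The helper name techdocsPrefix and the EntityRef shape are my own illustration, not part of any Backstage API; it only mirrors the lowercase namespace/kind/name rule described above.

```typescript
// Hypothetical helper: mirrors the <namespace>/<kind>/<name>/ layout described above.
interface EntityRef {
  kind: string;
  name: string;
  namespace?: string; // treated as "default" when omitted
}

function techdocsPrefix(bucket: string, entity: EntityRef): string {
  const namespace = entity.namespace ?? 'default';
  // Every segment is lowercased, matching what the reader will look up.
  const triplet = [namespace, entity.kind, entity.name]
    .map(part => part.toLowerCase())
    .join('/');
  return `s3://${bucket}/${triplet}/`;
}

console.log(techdocsPrefix('acme-techdocs-prod', { kind: 'Component', name: 'payments-api' }));
// prints "s3://acme-techdocs-prod/default/component/payments-api/"
```

A check like this is handy in a pre-publish lint step: if the prefix it computes differs from what actually sits in the bucket, the docs tab will come up empty.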

Bucket policy needs read for the Backstage IAM role and write for the CI role. Don’t share credentials between them:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "BackstageRead",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::123456789012:role/backstage-techdocs-reader" },
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::acme-techdocs-prod",
        "arn:aws:s3:::acme-techdocs-prod/*"
      ]
    },
    {
      "Sid": "CIWrite",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::123456789012:role/techdocs-publisher" },
      "Action": ["s3:PutObject", "s3:DeleteObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::acme-techdocs-prod",
        "arn:aws:s3:::acme-techdocs-prod/*"
      ]
    }
  ]
}

The Backstage pod assumes its IAM role via IRSA on EKS. The CI job assumes its role via OIDC from GitHub Actions. Neither uses long-lived access keys.

3. The Standard Repo Layout

Every repository that owns docs follows the same layout. Enforce it through the scaffolder template so new repos start with it for free:

my-service/
+- catalog-info.yaml          # owns the entity
+- mkdocs.yml                 # mkdocs config
+- docs/
|  +- index.md
|  +- architecture.md
|  +- runbook.md
+- .github/
   +- workflows/
      +- techdocs.yml         # publishes on merge

The catalog-info.yaml tells Backstage where the docs live:

apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: payments-api
  annotations:
    backstage.io/techdocs-ref: dir:.
spec:
  type: service
  owner: team-payments
  lifecycle: production

The dir:. ref means “look in the same repo, mkdocs.yml is at the root”. The alternative is url:https://github.com/acme/docs-repo for shared documentation repos. Don’t mix the two patterns across your org; pick one and stick with it.
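If you want to enforce the "pick one" rule in a catalog linter, a small parser for the annotation is enough. This is a sketch under my own assumptions: parseTechdocsRef is a hypothetical helper, and it only recognizes the two forms mentioned here (dir: and url:).

```typescript
// Hypothetical helper for linting backstage.io/techdocs-ref values.
interface TechdocsRef {
  type: 'dir' | 'url';
  target: string;
}

function parseTechdocsRef(annotation: string): TechdocsRef {
  // Split on the FIRST colon only, so "url:https://..." keeps its own colons.
  const separator = annotation.indexOf(':');
  if (separator === -1) {
    throw new Error(`Missing "type:" prefix in "${annotation}"`);
  }
  const type = annotation.slice(0, separator);
  const target = annotation.slice(separator + 1);
  if (target === '') {
    throw new Error(`Empty target in "${annotation}"`);
  }
  if (type === 'dir' || type === 'url') {
    return { type, target };
  }
  throw new Error(`Unsupported techdocs-ref type "${type}"`);
}

console.log(parseTechdocsRef('dir:.').type); // prints "dir"
```

Run it over every catalog-info.yaml in the org and fail the check when more than one type shows up.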

The mkdocs.yml is plain Material for MkDocs:

site_name: Payments API
docs_dir: docs
plugins:
  - techdocs-core
markdown_extensions:
  - admonition
  - codehilite
  - pymdownx.superfences
  - pymdownx.tabbed:
      alternate_style: true
nav:
  - Overview: index.md
  - Architecture: architecture.md
  - Runbook: runbook.md

The techdocs-core plugin is the only required one. It bundles the search index generation and the styling Backstage’s reader expects.

4. The CI Workflow

Here’s the GitHub Actions workflow I ship to every repo via template. It publishes only on merge to main, only when relevant paths change:

name: TechDocs Publish
on:
  push:
    branches: [main]
    paths:
      - 'docs/**'
      - 'mkdocs.yml'
      - 'catalog-info.yaml'
      - '.github/workflows/techdocs.yml'

permissions:
  id-token: write
  contents: read

jobs:
  publish:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-node@v4
        with:
          node-version: 22

      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'

      - name: Install techdocs-cli and mkdocs plugins
        run: |
          npm install -g @techdocs/cli@1.9.0
          pip install mkdocs-techdocs-core==1.5.0

      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/techdocs-publisher
          aws-region: us-east-1

      - name: Resolve entity ref from catalog-info.yaml
        id: entity
        run: |
          NAME=$(yq '.metadata.name' catalog-info.yaml)
          NAMESPACE=$(yq '.metadata.namespace // "default"' catalog-info.yaml)
          KIND=$(yq '.kind | downcase' catalog-info.yaml)
          echo "ref=${NAMESPACE}/${KIND}/${NAME}" >> "$GITHUB_OUTPUT"

      - name: Generate
        run: techdocs-cli generate --no-docker --verbose

      - name: Publish
        run: |
          techdocs-cli publish \
            --publisher-type awsS3 \
            --storage-name acme-techdocs-prod \
            --awsBucketRootPath '' \
            --entity ${{ steps.entity.outputs.ref }}

Key details. --no-docker runs mkdocs directly on the runner, which is much faster than spawning a Docker container. The entity ref is parsed from the catalog file rather than hardcoded, so renaming a service doesn’t desynchronize the publish path from the catalog. The role assumption uses OIDC and short-lived credentials.

A typical publish for a repo with a few markdown files takes 25 seconds. Most of that is pip install. If you want it faster, build a base image with the deps baked in and use it as the workflow’s container.

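5. Search Indexing

TechDocs ships with a search collator that indexes the published docs. With the external builder pattern it needs no storage settings of its own: it reads each entity’s generated search index through the TechDocs backend, so it sees the same S3 bucket the reader does. You only configure the location template and the schedule: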

search:
  collators:
    techdocs:
      locationTemplate: '/docs/:namespace/:kind/:name/:path'
      legacyPathCasing: false
      schedule:
        frequency: { hours: 6 }
        timeout: { minutes: 15 }
        initialDelay: { minutes: 5 }

Wire the collator into the backend with the search module:

// packages/backend/src/index.ts (excerpt)
backend.add(import('@backstage/plugin-search-backend'));
backend.add(import('@backstage/plugin-search-backend-module-techdocs'));
backend.add(import('@backstage/plugin-search-backend-module-catalog'));
backend.add(import('@backstage/plugin-search-backend-module-pg'));

The pg module uses Postgres full-text search, which is enough for most portals. Past about a thousand documents you’ll want to switch to Elasticsearch or OpenSearch. Don’t reach for that complexity unless the query latency demands it.

With the schedule above, the collator runs every six hours. That’s fine for docs because the publish pipeline is the source of truth. If you want fresher search, drop the frequency to one hour, but understand that each run re-reads the search index of every documented entity from the bucket, which adds up to real S3 API costs at scale.
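To get a feel for how the schedule drives request volume, here is a back-of-the-envelope sketch. It assumes roughly one read per documented entity per collator run, which is my simplification; the real count depends on how the collator and publisher touch the bucket. The function name is hypothetical.

```typescript
// Rough estimate only: assumes one S3 read per documented entity per collator run.
function collatorReadsPerDay(documentedEntities: number, frequencyHours: number): number {
  if (frequencyHours <= 0) {
    throw new Error('frequencyHours must be positive');
  }
  const runsPerDay = 24 / frequencyHours;
  return documentedEntities * runsPerDay;
}

console.log(collatorReadsPerDay(500, 6)); // prints 2000
console.log(collatorReadsPerDay(500, 1)); // prints 12000
```

Going from six hours to one multiplies the volume by six, so check it against your entity count before tightening the schedule.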

6. The Backstage Pod’s Role

With external builds, the Backstage pod’s job for TechDocs is to read S3 objects and stream them. Make sure the pod has IRSA configured with the reader role:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: backstage
  namespace: backstage
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/backstage-techdocs-reader
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backstage
  namespace: backstage
spec:
  template:
    spec:
      serviceAccountName: backstage
      containers:
        - name: backstage
          image: ghcr.io/acme-engineering/backstage:1.34.3
          env:
            - name: AWS_REGION
              value: us-east-1
          resources:
            requests: { cpu: 500m, memory: 1Gi }
            limits: { cpu: 2, memory: 2Gi }

The AWS SDK in the Backstage TechDocs reader auto-discovers credentials via the IRSA token mounted by EKS. No access keys, no secrets.

+--------+   push    +---------+   publish   +--------+
| dev    +---------->+ GitHub  +------------>+   S3   |
+--------+           | Actions |             +---+----+
                     +---------+                 ^
                                                 | get HTML
+--------+  open page  +-----------+             |
| user   +------------>+ Backstage +-------------+
+--------+             +-----------+

7. Versioning Docs

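The default setup publishes every commit to main as the only version. For most teams, that’s correct. If you maintain SDKs or APIs with hard version cutoffs, you’ll want multiple versions visible. The pattern I use is to publish each git tag to a versioned prefix. The workflow from section 4 only triggers on pushes to main, so for this step to run, its push trigger also needs a tags filter such as tags: ['v*'] (path filters are not evaluated for tag pushes):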

- name: Publish versioned
  if: startsWith(github.ref, 'refs/tags/v')
  run: |
    VERSION=${GITHUB_REF#refs/tags/}
    techdocs-cli publish \
      --publisher-type awsS3 \
      --storage-name acme-techdocs-prod \
      --awsBucketRootPath versions/${VERSION} \
      --entity ${{ steps.entity.outputs.ref }}

Then the entity page links out to the versioned reader with a custom annotation:

metadata:
  annotations:
    backstage.io/techdocs-ref: dir:.
    acme.com/docs-versions: 'v1.2.0,v1.3.0,v2.0.0'

A small frontend extension reads that annotation and renders a version picker, and the reader then hits the versioned prefix. This isn’t built into TechDocs; it’s a thin custom layer that’s been in production for me for two years.
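The data side of that version picker is small. Below is a sketch of the parsing, assuming the acme.com/docs-versions annotation and the versions/<tag> root path shown above. Both function names are hypothetical, and the actual React component is left out.

```typescript
// Annotation key taken from the example above; it is a custom annotation, not a Backstage built-in.
const VERSIONS_ANNOTATION = 'acme.com/docs-versions';

// Turns 'v1.2.0,v1.3.0,v2.0.0' into ['v1.2.0', 'v1.3.0', 'v2.0.0'], tolerating stray spaces.
function parseDocsVersions(annotations: Record<string, string | undefined>): string[] {
  const raw = annotations[VERSIONS_ANNOTATION] ?? '';
  return raw
    .split(',')
    .map(version => version.trim())
    .filter(version => version.length > 0);
}

// Bucket root path the publish step used for a given tag.
function versionedRootPath(version: string): string {
  return `versions/${version}`;
}

const versions = parseDocsVersions({ 'acme.com/docs-versions': 'v1.2.0, v1.3.0,v2.0.0' });
console.log(versions.map(versionedRootPath));
```

An entity without the annotation yields an empty list, which the picker can treat as "main only".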

Common Pitfalls

  • Case-sensitive S3 keys. The techdocs-cli lowercases the entity ref. The Backstage reader lowercases it. If you manually upload with mixed case, the reader 404s with no useful error. Always let techdocs-cli publish do the upload.
  • Skipping the search collator. People search inside docs, especially long runbooks. Without the collator, the global search bar returns “no results” for content that’s plainly there. Wire it on day one.
  • Building docs in the entity request path. The default local builder triggers a build when a user opens a doc that isn’t cached. Under load, this turns the Backstage pod into a heavyweight CI worker. Always build externally, even if you have ten repos and “it works”.
  • Bundling every Python dep into the backend image. If you keep local builder for any reason, every plugin and theme you reference in any mkdocs.yml becomes a transitive dep of the backend image. External build keeps your image lean.

Troubleshooting

  • Docs tab shows “Documentation has not been generated yet” forever. The S3 object doesn’t exist under the expected key. Open the publish job’s logs and read the exact S3 URI. Compare it to what the backend computes by setting LOG_LEVEL=debug and watching the reader’s GET attempts.
  • Search returns no results for words that are in the docs. The collator hasn’t run since publish, or the collator ran but the document hadn’t been published yet. Hit the search admin endpoint to force a refresh.
  • Publish job 403s on S3. The OIDC trust policy on the publisher role doesn’t include the repo path or the workflow name. AWS’s OIDC trust is strict about the sub claim format. The default GitHub Actions OIDC token has sub of repo:org/repo:ref:refs/heads/main; the trust policy needs to match exactly, including the branch.
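For the 403 case, it helps to generate the trust policy rather than hand-edit it. The sketch below builds the sub claim and a matching policy document. The function names are hypothetical; the account ID, org, and repo are the placeholder values used in this post. The condition keys and the sts.amazonaws.com audience are the ones GitHub’s OIDC provider uses with AWS.

```typescript
// Builds the sub claim GitHub issues for a push to a branch.
function githubSubClaim(org: string, repo: string, branch: string): string {
  return `repo:${org}/${repo}:ref:refs/heads/${branch}`;
}

// Trust policy for the publisher role, pinned to exactly one sub claim.
function publisherTrustPolicy(accountId: string, sub: string) {
  const provider = 'token.actions.githubusercontent.com';
  return {
    Version: '2012-10-17',
    Statement: [
      {
        Effect: 'Allow',
        Principal: { Federated: `arn:aws:iam::${accountId}:oidc-provider/${provider}` },
        Action: 'sts:AssumeRoleWithWebIdentity',
        Condition: {
          StringEquals: {
            [`${provider}:aud`]: 'sts.amazonaws.com',
            [`${provider}:sub`]: sub,
          },
        },
      },
    ],
  };
}

const sub = githubSubClaim('acme', 'payments-api', 'main');
console.log(JSON.stringify(publisherTrustPolicy('123456789012', sub), null, 2));
```

Tag pushes carry a different sub (repo:org/repo:ref:refs/tags/<tag>), so a policy pinned to refs/heads/main will reject a versioned publish until you add a second condition value or switch to StringLike.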

Wrapping Up

A TechDocs setup with external builds is the only one that scales past the demo stage. It separates the build from the serve, lets the docs grow without inflating the backend image, and gives you the right primitives to add versioning later. Add the search collator, IRSA-based reader credentials, and a templated CI workflow, and you’ve got something a hundred-repo org can run on autopilot.

The TechDocs operator guide covers every storage backend. The next post in this series steps away from Backstage internals and gets into Score Spec for workload portability.