TechDocs at Scale with Backstage, A Production Setup
TL;DR: The TechDocs default, the local builder, is a demo mode. Past ten repos you need the external builder pattern, where docs are built in CI and uploaded to S3 (or GCS, or Azure Blob). This post walks through the full production setup, including the search indexer and the CI workflow that keeps it fast.
I’ve watched three teams hit the same wall with TechDocs. The local builder works perfectly for the first month, then someone adds a fiftieth service to the catalog, the backend pod starts OOMing during doc builds, and the docs tab on the entity page starts taking ninety seconds to load. The fix isn’t bigger pods. The fix is moving the build out of the backend entirely.
This tutorial is the production setup I deploy now by default, even on small portals. The upfront cost is a couple hours of CI work. The payoff is that the Backstage backend becomes a thin proxy that reads pre-rendered HTML from object storage, and adding more docs doesn’t add load to the portal itself.
If you haven’t seen the Backstage 1.34 baseline this builds on, the earlier post in this series covers it. The other prerequisite is some object storage you control, and a CI system. I’ll use GitHub Actions and S3, but the pattern translates to GCS plus GitLab CI without much change.
1. The Local Builder Problem
Out of the box, Backstage TechDocs runs mkdocs build inside the backend pod whenever a user opens a docs page that isn’t cached. With the default config in 1.34:
techdocs:
  builder: 'local'
  generator:
    runIn: 'docker'
  publisher:
    type: 'local'
Three things go wrong as you scale. First, every cold cache hit spawns a Docker container in the backend pod, which doesn’t work at all in Kubernetes unless you’ve gone out of your way to enable Docker-in-Docker. Second, even if you switch generator.runIn to local (no Docker), the backend now has to bundle every Python dependency that any team’s mkdocs.yml references, which means the image bloats unboundedly. Third, the cache is local to the pod, so every replica, and every restart, rebuilds docs that another pod has already built.
The external builder pattern fixes all three. CI does the build, uploads the rendered HTML to S3 under a deterministic prefix, and the backend just reads from that prefix.
2. Configuring the External Builder
Set techdocs.builder: 'external' in app-config.yaml. This tells the backend to never run a build, only to read what’s already there:
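techdocs:
  builder: 'external'
  publisher:
    type: 'awsS3'
    awsS3:
      bucketName: 'acme-techdocs-prod'
      region: 'us-east-1'
      # No credentials block: the AWS SDK default provider chain
      # resolves the IRSA role (see section 6).
  cache:
    ttl: 3600000     # milliseconds (1 hour)
    readTimeout: 500 # milliseconds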
The S3 bucket needs a specific layout. Each entity gets its own prefix matching <namespace>/<kind>/<name>/. So a Component named payments-api in the default namespace lives at s3://acme-techdocs-prod/default/component/payments-api/. Backstage normalizes everything to lowercase. If your bucket layout mixes cases, the reader silently returns 404 and the entity page shows an empty docs tab.
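To make the layout rule concrete, here is a minimal TypeScript sketch of how the prefix is derived and how you might spot keys that were uploaded with mixed case. The names techdocsPrefix and hasMixedCasePrefix are hypothetical helpers of mine, not part of any Backstage package, and the sketch assumes the default lowercase behaviour described above.

```typescript
// Hypothetical helpers for illustration; not part of any Backstage package.
interface EntityTriplet {
  kind: string;
  name: string;
  namespace?: string; // treated as "default" when omitted
}

// Build the lowercase <namespace>/<kind>/<name>/ prefix described above.
function techdocsPrefix(entity: EntityTriplet): string {
  const namespace = entity.namespace ?? 'default';
  const parts = [namespace, entity.kind, entity.name];
  return parts.map(part => part.toLowerCase()).join('/') + '/';
}

// True when the first three segments of an existing S3 key contain uppercase
// characters, meaning a reader that lowercases the prefix would miss it.
function hasMixedCasePrefix(key: string): boolean {
  const prefix = key.split('/').slice(0, 3).join('/');
  return prefix !== prefix.toLowerCase();
}

console.log(techdocsPrefix({ kind: 'Component', name: 'payments-api' }));
// default/component/payments-api/
console.log(hasMixedCasePrefix('default/Component/payments-api/index.html'));
// true
```

Only the entity prefix is checked. File names below it (index.html, assets and so on) keep whatever casing the generator produced.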
Bucket policy needs read for the Backstage IAM role and write for the CI role. Don’t share credentials between them:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "BackstageRead",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::123456789012:role/backstage-techdocs-reader" },
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::acme-techdocs-prod",
        "arn:aws:s3:::acme-techdocs-prod/*"
      ]
    },
    {
      "Sid": "CIWrite",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::123456789012:role/techdocs-publisher" },
      "Action": ["s3:PutObject", "s3:DeleteObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::acme-techdocs-prod",
        "arn:aws:s3:::acme-techdocs-prod/*"
      ]
    }
  ]
}
The Backstage pod assumes its IAM role via IRSA on EKS. The CI job assumes its role via OIDC from GitHub Actions. Neither uses long-lived access keys.
3. The Standard Repo Layout
Every repository that owns docs follows the same layout. Enforce it through the scaffolder template so new repos start with it for free:
my-service/
+- catalog-info.yaml          # owns the entity
+- mkdocs.yml                 # mkdocs config
+- docs/
|  +- index.md
|  +- architecture.md
|  +- runbook.md
+- .github/
   +- workflows/
      +- techdocs.yml         # publishes on merge
The catalog-info.yaml tells Backstage where the docs live:
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: payments-api
  annotations:
    backstage.io/techdocs-ref: dir:.
spec:
  type: service
  owner: team-payments
  lifecycle: production
The dir:. ref means “look in the same repo, mkdocs.yml is at the root”. The alternative is url:https://github.com/acme/docs-repo for shared documentation repos. Don’t mix the two patterns across your org; pick one and stick with it.
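If you want to lint annotations across the org (for example, to enforce the “pick one pattern” rule), the ref is easy to take apart. The following is a rough sketch, not Backstage’s own parser: parseTechdocsRef and the TechdocsRef shape are names I made up, and it only recognises the two ref types discussed here.

```typescript
// Hypothetical lint helper; the names and the shape are illustrative only.
interface TechdocsRef {
  type: 'dir' | 'url';
  target: string;
}

// Split "<type>:<target>" on the first colon only, because url targets
// contain colons of their own (https://...).
function parseTechdocsRef(annotation: string): TechdocsRef {
  const separator = annotation.indexOf(':');
  if (separator === -1) {
    throw new Error(`Missing ":" in techdocs-ref "${annotation}"`);
  }
  const type = annotation.slice(0, separator);
  const target = annotation.slice(separator + 1);
  if (type !== 'dir' && type !== 'url') {
    throw new Error(`Unsupported techdocs-ref type "${type}"`);
  }
  if (target === '') {
    throw new Error(`Empty target in techdocs-ref "${annotation}"`);
  }
  return { type, target };
}

console.log(parseTechdocsRef('dir:.'));
// { type: 'dir', target: '.' }
console.log(parseTechdocsRef('url:https://github.com/acme/docs-repo'));
// { type: 'url', target: 'https://github.com/acme/docs-repo' }
```

Counting the type field across all catalog-info.yaml files is then enough to see whether both patterns have crept in.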
The mkdocs.yml is plain Material for MkDocs:
site_name: Payments API
docs_dir: docs
plugins:
  - techdocs-core
markdown_extensions:
  - admonition
  - codehilite
  - pymdownx.superfences
  - pymdownx.tabbed:
      alternate_style: true
nav:
  - Overview: index.md
  - Architecture: architecture.md
  - Runbook: runbook.md
The techdocs-core plugin is the only required one. It bundles the search index generation and the styling Backstage’s reader expects.
4. The CI Workflow
Here’s the GitHub Actions workflow I ship to every repo via template. It publishes only on merge to main, only when relevant paths change:
name: TechDocs Publish
on:
  push:
    branches: [main]
    paths:
      - 'docs/**'
      - 'mkdocs.yml'
      - 'catalog-info.yaml'
      - '.github/workflows/techdocs.yml'
permissions:
  id-token: write
  contents: read
jobs:
  publish:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - name: Install techdocs-cli and mkdocs plugins
        run: |
          npm install -g @techdocs/cli@1.9.0
          pip install mkdocs-techdocs-core==1.5.0
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/techdocs-publisher
          aws-region: us-east-1
      - name: Resolve entity ref from catalog-info.yaml
        id: entity
        run: |
          NAME=$(yq '.metadata.name' catalog-info.yaml)
          NAMESPACE=$(yq '.metadata.namespace // "default"' catalog-info.yaml)
          KIND=$(yq '.kind | downcase' catalog-info.yaml)
          echo "ref=${NAMESPACE}/${KIND}/${NAME}" >> $GITHUB_OUTPUT
      - name: Generate
        run: techdocs-cli generate --no-docker --verbose
      - name: Publish
        run: |
          techdocs-cli publish \
            --publisher-type awsS3 \
            --storage-name acme-techdocs-prod \
            --awsBucketRootPath '' \
            --entity ${{ steps.entity.outputs.ref }}
Key details. --no-docker runs mkdocs directly on the runner, which is much faster than spawning a Docker container. The entity ref is parsed from the catalog file rather than hardcoded, so renaming a service doesn’t desynchronize the publish path from the catalog. The role assumption uses OIDC and short-lived credentials.
A typical publish for a repo with a few markdown files takes 25 seconds. Most of that is pip install. If you want it faster, build a base image with the deps baked in and use it as the workflow’s container.
5. Hooking Up Search
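TechDocs ships with a search collator that indexes published docs. It doesn’t crawl the HTML; it reads the search index that mkdocs writes alongside each site, fetched through the TechDocs backend. With the external builder pattern, that means it reads from the same S3 bucket as the reader, and the main thing left to configure is the schedule: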
search:
  collators:
    techdocs:
      locationTemplate: '/docs/:namespace/:kind/:name/:path'
      legacyPathCasing: false
      schedule:
        frequency: { hours: 6 }
        timeout: { minutes: 15 }
        initialDelay: { minutes: 5 }
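The locationTemplate decides which URL a search hit links to. As a rough illustration of how the placeholders get filled in, here is a small sketch. It reflects my reading of the setting rather than the collator’s actual code: expandLocationTemplate is a hypothetical function, and I’m assuming that legacyPathCasing: false means the namespace, kind, and name are lowercased while the page path is left alone.

```typescript
// Hypothetical illustration; not the collator's real implementation.
interface DocLocationArgs {
  namespace: string;
  kind: string;
  name: string;
  path: string; // page path inside the docs site
}

function expandLocationTemplate(
  template: string,
  args: DocLocationArgs,
  legacyPathCasing = false,
): string {
  // Assumption: only the entity triplet is lowercased, never the page path.
  const normalize = (value: string): string =>
    legacyPathCasing ? value : value.toLowerCase();
  const values: Record<string, string> = {
    namespace: normalize(args.namespace),
    kind: normalize(args.kind),
    name: normalize(args.name),
    path: args.path,
  };
  // Replace each ":placeholder"; unknown placeholders are left untouched.
  return template.replace(
    /:([a-z]+)/g,
    (placeholder: string, key: string) => values[key] ?? placeholder,
  );
}

console.log(
  expandLocationTemplate('/docs/:namespace/:kind/:name/:path', {
    namespace: 'default',
    kind: 'Component',
    name: 'payments-api',
    path: 'runbook/',
  }),
);
// /docs/default/component/payments-api/runbook/
```

If search results link to pages that 404, comparing the expanded location with the route your frontend actually serves is a quick first check.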
Wire the collator into the backend with the search module:
// packages/backend/src/index.ts (excerpt)
backend.add(import('@backstage/plugin-search-backend/alpha'));
backend.add(import('@backstage/plugin-search-backend-module-techdocs/alpha'));
backend.add(import('@backstage/plugin-search-backend-module-catalog/alpha'));
backend.add(import('@backstage/plugin-search-backend-module-pg/alpha'));
The pg module uses Postgres full-text search, which is enough for most portals. Past about a thousand documents you’ll want to switch to Elasticsearch or OpenSearch. Don’t reach for that complexity unless the query latency demands it.
With the schedule above, the collator runs every six hours. That’s fine for docs because the publish pipeline is the source of truth. If you want fresher search, drop the frequency to one hour, but understand that each run re-reads the search index for every documented entity from the bucket, which adds up to real S3 API costs at scale.
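To put a rough number on that trade-off, here is a back-of-the-envelope sketch. The figure of 500 S3 requests per collator run is made up for the example; take the real number from your bucket’s request metrics. The function names are mine as well.

```typescript
// Back-of-the-envelope arithmetic; requestsPerRun is something you measure.
function collatorRunsPerDay(frequencyHours: number): number {
  if (frequencyHours <= 0) {
    throw new Error('frequencyHours must be positive');
  }
  return 24 / frequencyHours;
}

function dailyRequests(frequencyHours: number, requestsPerRun: number): number {
  return collatorRunsPerDay(frequencyHours) * requestsPerRun;
}

const requestsPerRun = 500; // hypothetical value, not a measurement

console.log(dailyRequests(6, requestsPerRun)); // 2000
console.log(dailyRequests(1, requestsPerRun)); // 12000
```

Going from six hours to one hour multiplies the request volume by six, whatever the per-run number turns out to be.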
6. The Backstage Pod’s Role
With external builds, the Backstage pod’s job for TechDocs is to read S3 objects and stream them. Make sure the pod has IRSA configured with the reader role:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: backstage
  namespace: backstage
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/backstage-techdocs-reader
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backstage
  namespace: backstage
spec:
  # apps/v1 requires a selector that matches the pod template labels.
  # The label itself is illustrative.
  selector:
    matchLabels:
      app: backstage
  template:
    metadata:
      labels:
        app: backstage
    spec:
      serviceAccountName: backstage # binds the pod to the IRSA role above
      containers:
        - name: backstage
          image: ghcr.io/acme-engineering/backstage:1.34.3
          env:
            - name: AWS_REGION
              value: us-east-1
          resources:
            requests: { cpu: 500m, memory: 1Gi }
            limits: { cpu: 2, memory: 2Gi }
The AWS SDK in the Backstage TechDocs reader auto-discovers credentials via the IRSA token mounted by EKS. No access keys, no secrets.
+--------+   push    +---------+   publish    +--------+
|  dev   +---------->+ GitHub  +------------->+   S3   |
+--------+           | Actions |              +---+----+
                     +---------+                  |
                                                  | get
                                                  v
+--------+ open page +-----------+          +-----------+
|  user  +---------->+ Backstage +--------->+ read HTML |
+--------+           +-----------+          +-----------+
7. Versioning Docs
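The default setup publishes every commit to main as the only version. For most teams, that’s correct. If you maintain SDKs or APIs with hard version cutoffs, you’ll want multiple versions visible. The pattern I use is to publish each git tag to a versioned prefix. The workflow in section 4 only triggers on pushes to main, so it also needs a tags filter such as 'v*' under on.push before this step can ever run: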
- name: Publish versioned
  if: startsWith(github.ref, 'refs/tags/v')
  run: |
    VERSION=${GITHUB_REF#refs/tags/}
    techdocs-cli publish \
      --publisher-type awsS3 \
      --storage-name acme-techdocs-prod \
      --awsBucketRootPath versions/${VERSION} \
      --entity ${{ steps.entity.outputs.ref }}
Then the entity page links out to the versioned reader with a custom annotation:
metadata:
  annotations:
    backstage.io/techdocs-ref: dir:.
    acme.com/docs-versions: 'v1.2.0,v1.3.0,v2.0.0'
A small frontend extension reads that annotation and renders a version picker. The reader hits the versioned prefix. Not built into TechDocs directly, but a thin layer that’s been in production for me for two years.
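The data handling behind such a version picker is small. Below is a sketch of the two pure functions it needs, under the assumption that versioned docs land at versions/<version>/<namespace>/<kind>/<name>/, which is what the publish step above produces. Both function names are hypothetical, and the React component around them is left out.

```typescript
// Hypothetical helpers for a version picker; not part of TechDocs.

// Turn the comma-separated annotation value into a clean list of versions.
function parseDocsVersions(annotation: string | undefined): string[] {
  if (!annotation) {
    return [];
  }
  return annotation
    .split(',')
    .map(version => version.trim())
    .filter(version => version.length > 0);
}

// Build the bucket prefix for one version of one entity.
// entityRef is the <namespace>/<kind>/<name> triplet used at publish time.
function versionedPrefix(version: string, entityRef: string): string {
  return `versions/${version}/${entityRef.toLowerCase()}/`;
}

const versions = parseDocsVersions('v1.2.0,v1.3.0,v2.0.0');
console.log(versions);
// [ 'v1.2.0', 'v1.3.0', 'v2.0.0' ]
console.log(versionedPrefix('v2.0.0', 'default/Component/payments-api'));
// versions/v2.0.0/default/component/payments-api/
```

The version string itself is not lowercased, so it has to match the git tag exactly as it was published.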
Common Pitfalls
- Case-sensitive S3 keys. The techdocs-cli lowercases the entity ref. The Backstage reader lowercases it. If you manually upload with mixed case, the reader 404s with no useful error. Always let techdocs-cli publish do the upload.
- Skipping the search collator. People search inside docs, especially long runbooks. Without the collator, the global search bar returns “no results” for content that’s plainly there. Wire it on day one.
- Building docs in the entity request path. The default local builder triggers a build when a user opens a doc that isn’t cached. Under load, this turns the Backstage pod into a heavyweight CI worker. Always external build, even if you have ten repos and “it works”.
- Bundling every Python dep into the backend image. If you keep the local builder for any reason, every plugin and theme you reference in any mkdocs.yml becomes a transitive dep of the backend image. External build keeps your image lean.
Troubleshooting
- Docs tab shows “Documentation has not been generated yet” forever. The S3 object doesn’t exist under the expected key. Open the publish job’s logs and read the exact S3 URI. Compare it to what the backend computes by setting LOG_LEVEL=debug and watching the reader’s GET attempts.
- Search returns no results for words that are in the docs. The collator hasn’t run since publish, or the collator ran but the document hadn’t been published yet. Hit the search admin endpoint to force a refresh.
- Publish job 403s on S3. The OIDC trust policy on the publisher role doesn’t include the repo path or the workflow name. AWS’s OIDC trust is strict about the sub claim format. The default GitHub Actions OIDC token has a sub of repo:org/repo:ref:refs/heads/main; the trust policy needs to match exactly, including the branch.
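For the 403 case, it helps to write down what the token’s sub claim actually looks like for the ref that triggered the job. Here is a minimal sketch of the exact-match comparison; subClaimForRef and isTrusted are hypothetical names, and acme/payments-api is a stand-in repository. It models an exact string comparison only, so a trust policy that uses wildcards would be looser than this.

```typescript
// Hypothetical debugging helpers; they mirror an exact string comparison
// between the token's sub claim and the values the trust policy allows.

// Default sub claim shape for a push to a branch or tag.
function subClaimForRef(repository: string, gitRef: string): string {
  return `repo:${repository}:ref:${gitRef}`;
}

function isTrusted(tokenSub: string, allowedSubs: string[]): boolean {
  return allowedSubs.includes(tokenSub);
}

const allowed = [subClaimForRef('acme/payments-api', 'refs/heads/main')];

console.log(isTrusted('repo:acme/payments-api:ref:refs/heads/main', allowed));
// true
console.log(isTrusted(subClaimForRef('acme/payments-api', 'refs/tags/v1.2.0'), allowed));
// false
```

The second case matters if you publish versioned docs from tags: a trust policy that only lists the main branch rejects the tag-triggered run.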
Wrapping Up
A TechDocs setup with external builds is the only one that scales past the demo stage. It separates the build from the serve, lets the docs grow without inflating the backend image, and gives you the right primitives to add versioning later. Add the search collator, IRSA-based reader credentials, and a templated CI workflow, and you’ve got something a hundred-repo org can run on autopilot.
The TechDocs operator guide covers every storage backend. The next post in this series steps away from Backstage internals and gets into Score Spec for workload portability.