Backstage 1.14 as the Backbone of an Internal Developer Platform
June 20, 2023 · 7 min read · by Muhammad Amal

TL;DR

  • Backstage by itself is a service catalog and a portal framework, not a platform.
  • The value is the plugins that wire it to ArgoCD, Kubernetes, GitHub, Prometheus, and your cloud, plus the catalog model that makes “who owns this?” answerable.
  • Treat the Backstage repo like a product: a small dedicated team, real backlog, real release process.

I keep running into orgs that “have Backstage.” When you look closely, what they have is a Backstage install with the demo catalog, three components imported, and nobody using it. That’s not Backstage. That’s a wiki with a worse search.

Spotify’s framework is one of the best things to happen to platform engineering, but it’s a framework, not a finished product. The work is integrating it. Below is how I think about a Backstage rollout that actually gets traction, and the plugin stack I install by default on every greenfield IDP.

For the broader platform picture, see platform engineering and Team Topologies. This post is the “what goes in the portal” half.

The catalog model is the API

The Backstage catalog (entities of kind Component, API, Resource, System, Group, User, Location) is the schema for your entire org. Get this right and everything downstream becomes easier — alerts route correctly, on-call rotations resolve, ownership questions answer themselves. Get it wrong and you have a stale spreadsheet rendered in TypeScript.

A solid catalog-info.yaml for a service looks like:

apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: checkout
  title: Checkout Service
  description: Handles order checkout and payment intent creation
  annotations:
    github.com/project-slug: acme/checkout
    backstage.io/techdocs-ref: dir:.
    argocd/app-name: checkout-prod-us
    backstage.io/kubernetes-id: checkout
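    # NOTE: catalog files do not expand ${...} the way app-config.yaml does;
    # the literal key has to be written here or injected when the file is rendered.
    pagerduty.com/integration-key: ${PD_CHECKOUT_KEY}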
    prometheus.io/rule: 'checkout.SLO'
  links:
    - url: https://runbooks.acme.io/checkout
      title: Runbook
      icon: docs
  tags: [go, payments, tier-1]
spec:
  type: service
  lifecycle: production
  owner: group:checkout-team
  system: payments
  providesApis: [checkout-api]
  dependsOn: [resource:postgres-checkout-prod, component:inventory]

Every annotation here unlocks a Backstage feature. The github.com/project-slug annotation lights up the GitHub Actions and PR plugins. argocd/app-name wires the ArgoCD plugin into the component view. backstage.io/kubernetes-id lets the Kubernetes plugin pull pod status. backstage.io/techdocs-ref enables TechDocs to render the in-repo docs.
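One way to keep teams honest about this is a small lint over the catalog. The sketch below is an assumption-heavy illustration, not part of Backstage: the lookup table, the feature labels, and the missingAnnotations helper are all hypothetical names I made up, and the entity shape is flattened to just what the check needs.

```typescript
// Hypothetical lookup table: annotation key -> portal feature it enables.
// Keys come from the catalog-info.yaml above; the labels are illustrative only.
const FEATURE_BY_ANNOTATION: Record<string, string> = {
  "github.com/project-slug": "GitHub Actions and PR views",
  "argocd/app-name": "ArgoCD sync status",
  "backstage.io/kubernetes-id": "Kubernetes pod status",
  "backstage.io/techdocs-ref": "TechDocs",
  "pagerduty.com/integration-key": "PagerDuty on-call card",
};

// Simplified stand-in for an entity's metadata block.
interface EntityMetadata {
  name: string;
  annotations?: Record<string, string>;
}

// Returns the annotation keys from the table that are absent or blank,
// i.e. the features that would stay dark on the component page.
function missingAnnotations(metadata: EntityMetadata): string[] {
  const present = metadata.annotations ?? {};
  return Object.keys(FEATURE_BY_ANNOTATION).filter((key) => {
    const value = present[key];
    return value === undefined || value.trim() === "";
  });
}
```

Run over every Component in CI, a check like this turns “why is my Kubernetes tab empty?” into a failed PR check instead of a support ticket.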

The catalog ingestion strategy that scales is discovery, not registration. You don’t ask teams to register their services in Backstage. You set up the GitHub integration to scan all repos for catalog-info.yaml and register them automatically. Teams own the YAML in their repos. The platform team owns the ingestion config.

# app-config.yaml
catalog:
  providers:
    githubOrg:
      - id: production
        githubUrl: https://github.com
        orgs: [acme]
        schedule:
          frequency: { minutes: 60 }
          timeout: { minutes: 15 }
    github:
      providerId:
        organization: acme
        catalogPath: /catalog-info.yaml
        filters:
          branch: main
          repository: '.*'
        schedule:
          frequency: { minutes: 15 }
          timeout: { minutes: 5 }

The plugins that earn their keep

Backstage has hundreds of plugins. Most are noise. The ones I install on every IDP:

  • @backstage/plugin-kubernetes: pod status, logs, events per component. The thing developers click into when they get a “service is down” Slack ping.
  • @roadiehq/backstage-plugin-argo-cd: sync status, history, and “click to sync” without leaving the portal.
  • @backstage/plugin-techdocs: MkDocs-rendered docs from each service’s repo. Stops the inevitable “where are the docs?” sprint planning question.
  • @backstage/plugin-scaffolder: the new-service wizard. This is the part that actually changes behavior. Make it the path of least resistance and people will use it.
  • @backstage/plugin-catalog-graph: dependency visualization. Useful for “what blast radius am I looking at if I take this down?”
  • @backstage/plugin-pagerduty: on-call display per component. Answers the 3am “who do I page?” question.
  • @backstage/plugin-github-actions (published under the @backstage-community scope in later releases): pipeline status without context-switching to GitHub.

That’s the baseline. Add cloud-specific plugins (AWS, GCP) where they help, and write internal plugins for anything bespoke.

Scaffolder: where the golden path lives

The scaffolder is Backstage’s killer feature for platform teams. Software templates produce repos preconfigured with the golden path. The template I described in the platform engineering post is one example; a multi-template setup looks like:

apiVersion: backstage.io/v1alpha1
kind: Location
metadata:
  name: platform-templates
spec:
  type: url
  targets:
    - https://github.com/acme/platform-templates/blob/main/go-service/template.yaml
    - https://github.com/acme/platform-templates/blob/main/node-service/template.yaml
    - https://github.com/acme/platform-templates/blob/main/cron-job/template.yaml
    - https://github.com/acme/platform-templates/blob/main/python-worker/template.yaml
    - https://github.com/acme/platform-templates/blob/main/static-frontend/template.yaml

A few principles I follow with templates:

  • Tag the recommended ones. Add a recommended tag. Use the catalog filter to surface them first.
  • Version the templates. Tag the template repo. When you bump the Go version in the skeleton, you’re changing every future service. That’s a release.
  • Don’t accept arbitrary parameter input. Validate names against a regex, owners against the catalog. Pre-fill what you can from the user’s identity.
  • Output a PR, not a merged main. The scaffolder can publish directly to GitHub, but for non-trivial scaffolds I open a draft PR so the new owner can review what was generated before it lands.
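The input-validation principle can be sketched in a few lines. This is a minimal illustration under my own assumptions: the naming rule (lowercase, 3 to 40 characters), the TemplateInput shape, and the validateInput helper are hypothetical, and the set of known groups stands in for a catalog lookup. In a real template the same rules would more likely live in the parameter schema than in hand-written code.

```typescript
// Assumed naming rule: lowercase letters, digits, and hyphens, 3-40 characters,
// starting with a letter and not ending in a hyphen. Adjust to your org's rule.
const SERVICE_NAME = /^[a-z][a-z0-9-]{1,38}[a-z0-9]$/;

// Hypothetical shape of the values a template form collects.
interface TemplateInput {
  name: string;
  owner: string; // entity ref such as "group:checkout-team"
}

// Returns a list of human-readable problems; an empty list means the input is acceptable.
// knownGroups stands in for a catalog query returning existing group refs.
function validateInput(input: TemplateInput, knownGroups: ReadonlySet<string>): string[] {
  const errors: string[] = [];
  if (!SERVICE_NAME.test(input.name)) {
    errors.push(`invalid service name: "${input.name}"`);
  }
  if (!knownGroups.has(input.owner)) {
    errors.push(`owner not found in catalog: "${input.owner}"`);
  }
  return errors;
}
```

Rejecting bad input before the scaffolder runs is cheaper than cleaning up a repo named after someone’s typo.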

The Backstage docs at backstage.io cover the scaffolder actions in detail. Custom actions (an action that creates an ArgoCD Application, registers a PagerDuty service, or opens a Vault path) are where you turn Backstage into a true control plane.

Backstage as a Kubernetes interface

The @backstage/plugin-kubernetes plugin needs a service account in each managed cluster:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: backstage
  namespace: backstage-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: backstage-readonly
rules:
  - apiGroups: ['']
    resources: [pods, services, configmaps, namespaces, nodes]
    verbs: [get, list, watch]
  - apiGroups: [apps]
    resources: [deployments, statefulsets, daemonsets, replicasets]
    verbs: [get, list, watch]
  - apiGroups: [batch]
    resources: [jobs, cronjobs]
    verbs: [get, list, watch]
  - apiGroups: [networking.k8s.io]
    resources: [ingresses]
    verbs: [get, list, watch]
  - apiGroups: [metrics.k8s.io]
    resources: [pods]
    verbs: [get, list]
  - apiGroups: [argoproj.io]
    resources: [rollouts]
    verbs: [get, list, watch]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: backstage-readonly
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: backstage-readonly
subjects:
  - kind: ServiceAccount
    name: backstage
    namespace: backstage-system

Read-only is intentional. Backstage should not write to clusters. If a user needs to restart a pod, that’s an ArgoCD sync or a kubectl delete pod — not a Backstage button. Keeping the portal read-only avoids the worst class of “the IDP did it, I don’t know which person” audit findings.
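Read-only is easy to claim and easy to erode, so I like a guard that fails when someone adds a write verb to the role. The sketch below is illustrative only: PolicyRule mirrors the rules entries in the ClusterRole above, while READ_VERBS and writeGrants are hypothetical names, and parsing the YAML into objects is assumed to happen elsewhere.

```typescript
// Mirrors one entry under `rules:` in a ClusterRole manifest.
interface PolicyRule {
  apiGroups: string[];
  resources: string[];
  verbs: string[];
}

// Verbs treated as read-only for this check; anything else (including "*") is flagged.
const READ_VERBS = new Set(["get", "list", "watch"]);

// Returns one finding per non-read verb, formatted as "<apiGroups>: <verb> on <resources>".
// The empty core API group is printed as "core" for readability.
function writeGrants(rules: PolicyRule[]): string[] {
  const findings: string[] = [];
  for (const rule of rules) {
    for (const verb of rule.verbs) {
      if (!READ_VERBS.has(verb)) {
        const group = rule.apiGroups.join(",") || "core";
        findings.push(`${group}: ${verb} on ${rule.resources.join(",")}`);
      }
    }
  }
  return findings;
}
```

An empty result means the role still matches the read-only intent; anything else is worth a conversation before it merges.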

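The plugin maps a component to its workloads by label: it selects Kubernetes objects whose backstage.io/kubernetes-id label equals the component’s annotation value. Which clusters it queries is set in app-config.yaml: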

# In app-config.yaml
kubernetes:
  serviceLocatorMethod:
    type: multiTenant
  clusterLocatorMethods:
    - type: config
      clusters:
        - name: prod-us
          url: https://prod-us.k8s.acme.io
          authProvider: serviceAccount
          serviceAccountToken: ${K8S_PROD_US_TOKEN}
          skipTLSVerify: false
          caData: ${K8S_PROD_US_CA}
  customResources:
    - group: argoproj.io
      apiVersion: v1alpha1
      plural: rollouts

The customResources block is what makes the Kubernetes plugin show Argo Rollouts status alongside deployments.
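The matching itself is simple enough to sketch. By default the plugin looks for objects labeled backstage.io/kubernetes-id with the same value as the component’s annotation; the code below imitates that selection in memory. The K8sObject shape and the workloadsFor helper are hypothetical, and the real plugin does this through the Kubernetes API rather than over a local array.

```typescript
// Label and annotation key shared by the component and its Kubernetes objects.
const KUBERNETES_ID = "backstage.io/kubernetes-id";

// Simplified stand-in for a Kubernetes object; only what the match needs.
interface K8sObject {
  kind: string;
  name: string;
  labels?: Record<string, string>;
}

// Returns the objects whose kubernetes-id label equals the component's annotation.
// A component without the annotation matches nothing.
function workloadsFor(
  component: { annotations?: Record<string, string> },
  objects: K8sObject[],
): K8sObject[] {
  const id = component.annotations?.[KUBERNETES_ID];
  if (!id) {
    return [];
  }
  return objects.filter((obj) => obj.labels?.[KUBERNETES_ID] === id);
}
```

The practical consequence: if the Deployment, Rollout, or Service manifest is missing the label, the component page shows nothing, no matter how correct the cluster config is.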

TechDocs is worth the setup pain

TechDocs gets unfairly maligned. Yes, the build pipeline (MkDocs in a container, output to object storage) is fiddly to set up. Once it’s running, you get versioned, searchable, in-repo docs alongside every component. The alternative — a Confluence wiki that drifts out of date the moment it’s written — is worse.

Set up the external build approach (TechDocs CLI in CI publishes to S3) rather than building docs in the Backstage backend. The latter doesn’t scale past a hundred components.

Common Pitfalls

  • Treating Backstage like a vendor product. It’s a framework. You will write TypeScript. Budget for a frontend engineer on the platform team, or accept that you’ll only ever use the default plugins.
  • No catalog ownership story. Every Component should have a real owner that maps to a real Group with real members. If owner: unknown exists in your catalog, fix it before adding more services.
  • Auth that bypasses the org IdP. Use the Backstage GitHub or OIDC auth provider connected to your existing IdP. Local users plus passwords plus “remember to remove access when people leave” is a security finding waiting to happen.
  • Plugin sprawl. Each plugin you install is code you now maintain. Audit quarterly; remove anything nobody uses.
  • Database left on SQLite. The default config uses an in-memory SQLite database, so state does not survive a restart. For prod, use Postgres. Backstage state is small, but you’ll want backups and HA.
  • No scaffolder rate limits. The scaffolder hits GitHub, opens repos, writes secrets. Without rate limiting, a runaway template can torch your GitHub API budget for the day.
  • TechDocs without real search. Backstage search ships with the in-memory Lunr engine by default, which is fine for small catalogs and terrible past 200 services. Swap to the Elasticsearch or PostgreSQL search backend at that scale.
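The ownership pitfall is the one I would automate first. Here is a rough sketch of an audit that lists components whose owner does not resolve to a group with at least one member. Everything in it is an assumption for illustration: CatalogEntity is a flattened, hypothetical shape (real entities nest these fields under spec), owner refs are assumed to be either "group:name" or a bare group name, and unownedComponents is a made-up helper.

```typescript
// Flattened, hypothetical view of a catalog entity; real entities nest owner/members under spec.
interface CatalogEntity {
  kind: string;
  name: string;
  owner?: string;     // for Components, e.g. "group:checkout-team" or "checkout-team"
  members?: string[]; // for Groups
}

// Returns the names of Components whose owner is missing, unknown,
// or points at a Group that has no members.
function unownedComponents(entities: CatalogEntity[]): string[] {
  const populatedGroups = new Set(
    entities
      .filter((e) => e.kind === "Group" && (e.members?.length ?? 0) > 0)
      .map((e) => e.name),
  );
  return entities
    .filter((e) => e.kind === "Component")
    .filter((e) => {
      const owner = e.owner ?? "";
      // Strip the "group:" prefix if present; a bare name is assumed to be a group.
      const groupName = owner.startsWith("group:") ? owner.slice("group:".length) : owner;
      return !populatedGroups.has(groupName);
    })
    .map((e) => e.name);
}
```

If that list is non-empty, that is the backlog to clear before onboarding more services.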

Wrapping Up

Backstage works when you treat the catalog as an API and the portal as the UI on top of it. The plugins are the connective tissue to the rest of the platform — ArgoCD, Kubernetes, CI, observability, on-call. Skimp on any of those connections and Backstage becomes a wiki nobody opens.

The last two posts in this series cover the CI side — advanced GitHub Actions patterns — and the cost-engineering layer with Karpenter and KEDA.