Developer Onboarding with Backstage and ArgoCD: An End-to-End Tutorial
TL;DR — A working onboarding flow has four moments: account, scaffold, deploy, observe. Backstage 1.34 owns the first two via auth and the scaffolder. ArgoCD 2.13 owns the third via GitOps. The fourth is the entity page surfacing real signals. This tutorial wires all four into one developer-facing loop.
When I interview platform engineers, I ask how they measure onboarding. The honest answer is rarely “we run a stopwatch from offer-accepted to first-PR-merged” because nobody does that, but it’s the right metric. The cost of an unmeasured onboarding is invisible until you scale past about thirty engineers and the new hires start drifting because nobody’s path is the obvious one.
This tutorial is the onboarding loop I’ve shipped at two companies. A new engineer signs in to Backstage with their corporate identity, runs a scaffolder template to create a service, the template hands the repo to ArgoCD, ArgoCD deploys it to a dev cluster, and the engineer sees the running pod’s health on the Backstage entity page within fifteen minutes. After that they’re shipping changes through PRs to the GitOps repo.
This builds on the Backstage portal from the earlier post in this series, and the templates from the golden-paths post. If you haven’t read those, the configuration here will make more sense in context.
1. The Identity Layer
The onboarding loop starts with authentication. The portal must trust the corporate identity provider, and the resulting Backstage user must be findable in the catalog. Use a single source of truth, usually Okta or Entra ID, with GitHub as the authn provider that Backstage talks to (because the catalog ingestion already uses GitHub team membership).
auth:
  environment: production
  providers:
    github:
      production:
        clientId: ${AUTH_GITHUB_CLIENT_ID}
        clientSecret: ${AUTH_GITHUB_CLIENT_SECRET}
        signIn:
          resolvers:
            - resolver: usernameMatchingUserEntityName
            - resolver: emailMatchingUserEntityProfileEmail
The signIn resolvers chain. The first tries to find a User entity whose name matches the GitHub username. The second falls back to email. If neither matches, the sign-in fails with a clear error message, which prevents the dangling “logged in but no catalog identity” state that produces phantom permission errors later.
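The fallback behaviour can be sketched in plain TypeScript. This illustrates the chaining idea only and is not Backstage’s implementation: the CatalogUser and GithubProfile shapes, the exact-match comparison, and the error text are all assumptions made for the example.

```typescript
// Hypothetical, trimmed-down shapes; real catalog entities carry far more.
interface CatalogUser {
  name: string;   // metadata.name of the User entity
  email?: string; // spec.profile.email of the User entity
}

interface GithubProfile {
  username: string;
  email?: string;
}

type SignInResolver = (
  profile: GithubProfile,
  users: CatalogUser[],
) => CatalogUser | undefined;

// Stand-in for usernameMatchingUserEntityName: match on the entity name.
const byUsername: SignInResolver = (profile, users) => {
  for (const user of users) {
    if (user.name === profile.username) return user;
  }
  return undefined;
};

// Stand-in for emailMatchingUserEntityProfileEmail: match on profile email.
const byEmail: SignInResolver = (profile, users) => {
  if (profile.email === undefined) return undefined;
  for (const user of users) {
    if (user.email === profile.email) return user;
  }
  return undefined;
};

// Try each resolver in order; the first hit wins, no hit means no sign-in.
function resolveSignIn(
  profile: GithubProfile,
  users: CatalogUser[],
  chain: SignInResolver[],
): CatalogUser {
  for (const resolver of chain) {
    const match = resolver(profile, users);
    if (match) return match;
  }
  throw new Error(`No User entity found for GitHub user "${profile.username}"`);
}
```

Order matters here: putting the username resolver first means a renamed email address never locks anyone out, as long as the GitHub handle still lines up with the catalog.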
For the catalog side, the GitHub org provider syncs users and teams every hour:
catalog:
  providers:
    githubOrg:
      production:
        githubUrl: https://github.com
        orgs: [acme-engineering]
        schedule:
          frequency: { hours: 1 }
          timeout: { minutes: 15 }
A new hire’s first login can fail because their GitHub team membership hasn’t been ingested yet. Don’t make them wait an hour. Expose a “refresh me” button via a custom plugin or a manual catalog refresh endpoint, or have HR’s onboarding script trigger the refresh as part of provisioning.
2. The Scaffolder Template
The onboarding scaffolder template is intentionally minimal. The new hire shouldn’t see twelve form fields. They should see three: service name, team owner, system. Everything else is the platform’s opinion.
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: hello-onboarding
  title: My First Service
  description: Scaffolds a Hello World service and deploys it to the dev cluster
  tags: [onboarding, recommended]
spec:
  owner: team-platform
  type: service
  parameters:
    - title: About your service
      required: [name, owner, system]
      properties:
        name:
          type: string
          pattern: '^[a-z][a-z0-9-]{2,30}$'
          ui:help: Lowercase letters, digits, and hyphens; must start with a letter
        owner:
          type: string
          ui:field: OwnerPicker
          ui:options: { allowedKinds: [Group] }
        system:
          type: string
          ui:field: EntityPicker
          ui:options:
            catalogFilter: { kind: System }
  steps:
    - id: render
      name: Render skeleton
      action: fetch:template
      input:
        url: ./skeleton
        values:
          name: ${{ parameters.name }}
          owner: ${{ parameters.owner }}
          system: ${{ parameters.system }}
    - id: publish
      name: Create GitHub repo
      action: publish:github
      input:
        repoUrl: github.com?repo=${{ parameters.name }}&owner=acme-engineering
        defaultBranch: main
        protectDefaultBranch: true
        requiredApprovingReviewCount: 1
        repoVisibility: internal
    - id: gitops
      name: Wire into GitOps repo
      action: acme:gitops:add-app
      input:
        appName: ${{ parameters.name }}
        sourceRepoUrl: ${{ steps.publish.output.remoteUrl }}
        environment: dev
    - id: register
      name: Register in catalog
      action: catalog:register
      input:
        repoContentsUrl: ${{ steps.publish.output.repoContentsUrl }}
        catalogInfoPath: /catalog-info.yaml
  output:
    text:
      - title: Done. Your service will be deployed in ~3 minutes.
        content: |
          - Repo: ${{ steps.publish.output.remoteUrl }}
          - ArgoCD: https://argocd.acme.internal/applications/${{ parameters.name }}
    links:
      - title: Open in Backstage
        icon: catalog
        entityRef: ${{ steps.register.output.entityRef }}
The interesting step is acme:gitops:add-app. Rather than calling ArgoCD’s API directly, the template writes to the GitOps repo. ArgoCD picks up the change on its sync interval. This is the only sane approach for a real onboarding flow because the GitOps repo becomes the single audit log of what’s deployed.
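One detail of the template worth checking before new hires hit it is the name pattern, because the same value later becomes a Kubernetes namespace. The sketch below exercises the template’s pattern against a few sample names. The stricter variant is my own suggestion, not something the template ships with: it keeps the same length range but refuses a trailing hyphen, which Kubernetes rejects in namespace names.

```typescript
// Same pattern as the template's `name` parameter.
const SERVICE_NAME_PATTERN = /^[a-z][a-z0-9-]{2,30}$/;

// Hypothetical stricter variant: still 3 to 31 characters, but the last
// character must be a letter or digit so the name is a valid DNS label.
const STRICT_SERVICE_NAME_PATTERN = /^[a-z][a-z0-9-]{1,29}[a-z0-9]$/;

function isValidServiceName(
  candidate: string,
  pattern: RegExp = SERVICE_NAME_PATTERN,
): boolean {
  return pattern.test(candidate);
}

// Print how each sample fares under both patterns.
const sampleNames = ['payments-api', 'svc2', 'ab', 'Payments', '9lives', 'svc-'];
for (const sample of sampleNames) {
  console.log(
    sample,
    isValidServiceName(sample),
    isValidServiceName(sample, STRICT_SERVICE_NAME_PATTERN),
  );
}
```

The interesting case is svc-: it passes the template’s pattern and fails the stricter one. With the original pattern that name would get through the form and only blow up later, when ArgoCD tries to create the namespace.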
3. The Custom GitOps Action
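The action lives in a backend plugin. It builds an ArgoCD Application manifest and writes it into the GitOps repo through the GitHub contents API. The version here commits directly to main, which is acceptable if you trust the platform team’s scaffolder enough; otherwise have it push a branch and open a PR: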
// plugins/scaffolder-actions/src/actions/gitops-add-app.ts
import { createTemplateAction } from '@backstage/plugin-scaffolder-node';
import { Octokit } from '@octokit/rest';
import * as yaml from 'yaml';

export const createGitopsAddAppAction = (opts: { token: string; gitopsRepo: string }) =>
  createTemplateAction<{
    appName: string;
    sourceRepoUrl: string;
    environment: 'dev' | 'staging' | 'prod';
  }>({
    id: 'acme:gitops:add-app',
    schema: {
      input: {
        type: 'object',
        required: ['appName', 'sourceRepoUrl', 'environment'],
        properties: {
          appName: { type: 'string', pattern: '^[a-z][a-z0-9-]{2,30}$' },
          sourceRepoUrl: { type: 'string', format: 'uri' },
          environment: { type: 'string', enum: ['dev', 'staging', 'prod'] },
        },
      },
    },
    async handler(ctx) {
      const { appName, sourceRepoUrl, environment } = ctx.input;
      const octokit = new Octokit({ auth: opts.token });
      // gitopsRepo is expected in "owner/repo" form.
      const [owner, repo] = opts.gitopsRepo.split('/');
      const path = `environments/${environment}/applications/${appName}.yaml`;

      // ArgoCD Application pointing at the new service's Kustomize overlay.
      const application = {
        apiVersion: 'argoproj.io/v1alpha1',
        kind: 'Application',
        metadata: { name: appName, namespace: 'argocd' },
        spec: {
          project: environment,
          source: {
            repoURL: sourceRepoUrl,
            targetRevision: 'main',
            path: 'deploy/overlays/' + environment,
          },
          destination: {
            server: 'https://kubernetes.default.svc',
            namespace: appName,
          },
          syncPolicy: {
            automated: { prune: true, selfHeal: true },
            syncOptions: ['CreateNamespace=true', 'ServerSideApply=true'],
          },
        },
      };

      // The GitHub contents API expects base64-encoded file content.
      const content = Buffer.from(yaml.stringify(application)).toString('base64');
      // Commits straight to main. No sha is passed, so this only creates the
      // file; GitHub rejects the call if the path already exists.
      await octokit.repos.createOrUpdateFileContents({
        owner,
        repo,
        path,
        message: `feat(${environment}): register ${appName} application`,
        content,
        branch: 'main',
      });

      ctx.output('argocdAppName', appName);
      ctx.logger.info(`Registered ${appName} in GitOps repo at ${path}`);
    },
  });
Register this action against the scaffolder via a backend module (the previous post on custom plugins covers the module pattern). The action runs after publish:github so the source repo URL is real, and before catalog:register so the catalog entry reflects the deployed state.
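One thing to plan for is reruns. GitHub’s contents API requires the existing file’s blob sha when a write replaces a file, so calling the action a second time for the same app name fails unless the sha is supplied. A minimal sketch of building the request either way follows. The PutFileRequest shape is a trimmed, hypothetical stand-in for the real request parameters, and buildPutFileRequest is a helper name I made up for the example.

```typescript
// Trimmed stand-in for the "create or update file contents" parameters.
interface PutFileRequest {
  path: string;
  message: string;
  content: string; // base64-encoded file body
  branch: string;
  sha?: string;    // blob sha of the file being replaced; absent on create
}

// Hypothetical helper: include the sha only when the file already exists.
function buildPutFileRequest(
  appName: string,
  environment: string,
  contentBase64: string,
  existingSha?: string,
): PutFileRequest {
  const request: PutFileRequest = {
    path: `environments/${environment}/applications/${appName}.yaml`,
    message: existingSha
      ? `chore(${environment}): update ${appName} application`
      : `feat(${environment}): register ${appName} application`,
    content: contentBase64,
    branch: 'main',
  };
  if (existingSha) {
    request.sha = existingSha;
  }
  return request;
}
```

In the handler you would look the file up first (for example with octokit.repos.getContent, treating a 404 as “not there yet”) and pass whatever sha comes back. Whether a rerun should silently overwrite or fail loudly is a policy choice; for an onboarding template I lean towards failing loudly.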
4. The Skeleton’s Deploy Manifests
The skeleton folder includes Kubernetes manifests under deploy/. Use Kustomize overlays so each environment differs only by what should differ (replicas, resource limits, image tag policy):
# skeleton/deploy/base/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ${{ values.name }}
  labels:
    # Must equal the entity's backstage.io/kubernetes-id annotation,
    # otherwise the Backstage Kubernetes tab stays empty.
    backstage.io/kubernetes-id: ${{ values.name }}
spec:
  replicas: 1
  selector:
    matchLabels: { app: ${{ values.name }} }
  template:
    metadata:
      labels:
        app: ${{ values.name }}
        backstage.io/kubernetes-id: ${{ values.name }}
    spec:
      containers:
        - name: app
          image: ghcr.io/acme-engineering/${{ values.name }}:main
          ports: [{ containerPort: 8080 }]
          env:
            - name: PORT
              value: "8080"
          readinessProbe:
            httpGet: { path: /health, port: 8080 }
            initialDelaySeconds: 5
          resources:
            requests: { cpu: 100m, memory: 128Mi }
            limits: { cpu: 500m, memory: 256Mi }
# skeleton/deploy/overlays/dev/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: ${{ values.name }}
resources:
  - ../../base
images:
  - name: ghcr.io/acme-engineering/${{ values.name }}
    newTag: main
The base manifest deliberately doesn’t pin a version. The dev overlay floats on main, which matches the ArgoCD application’s targetRevision: main. Production overlays pin specific tags promoted via PR. Don’t let dev’s float-on-main behavior leak into production.
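A cheap way to enforce that last rule is a CI check on the production overlay. The sketch below assumes you have already parsed the overlay’s kustomization into an object; the KustomizeImage shape covers only the fields used here, and the list of tags treated as floating is an assumption you would tune to your own conventions.

```typescript
// One entry of a kustomization's `images` list (only the fields used here).
interface KustomizeImage {
  name: string;
  newTag?: string;
  digest?: string;
}

// Assumption: these are the tags your team treats as floating.
const FLOATING_TAGS = ['main', 'latest'];

// Returns the names of images that are not pinned to a digest or a fixed tag.
function findFloatingImages(images: KustomizeImage[]): string[] {
  return images
    .filter(image => {
      if (image.digest) return false; // pinned by digest
      return image.newTag === undefined || FLOATING_TAGS.indexOf(image.newTag) !== -1;
    })
    .map(image => image.name);
}
```

Run it against the prod overlay in the GitOps or service repo pipeline and fail the build when the returned list is non-empty. The dev overlay is exempt by design.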
5. The ArgoCD Side
ArgoCD 2.13 watches the GitOps repo and reconciles applications. The bootstrap for the cluster is itself an ArgoCD application (the “app of apps” pattern). The dev instance sweeps in every environments/dev/applications/*.yaml:
# gitops/bootstrap/app-of-apps.yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: app-of-apps-dev
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/acme-engineering/gitops.git
    targetRevision: main
    path: environments/dev/applications
    directory: { recurse: true }
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated: { prune: true, selfHeal: true }
When the scaffolder writes the new application file, the app-of-apps detects it and creates the child Application, which ArgoCD then reconciles into a real deployment. The reconciliation interval in 2.13 defaults to three minutes (timeout.reconciliation: 180s). Drop it to one minute for dev cluster snappiness:
data:
  timeout.reconciliation: 60s
  timeout.hard.reconciliation: 0s
(That’s a snippet for the argocd-cm ConfigMap.)
user ---> Backstage scaffolder ---> GitHub (source repo)
                   |
                   +---> GitOps repo (application yaml)
                                |
                                v
                         ArgoCD (watches repo)
                                |
                                v
                         Kubernetes dev cluster
                                |
                                v
                         Backstage entity page (shows pods)
6. The Entity Page Loop Closes
The final piece is the Backstage entity page showing real signals. The Kubernetes plugin renders pods, services, and recent events when the entity carries the right annotation:
metadata:
  annotations:
    backstage.io/kubernetes-id: ${{ values.name }}
    argocd/app-name: ${{ values.name }}
The Kubernetes plugin is configured in app-config.yaml:
kubernetes:
  serviceLocatorMethod: { type: 'multiTenant' }
  clusterLocatorMethods:
    - type: 'config'
      clusters:
        - name: dev
          url: https://dev.k8s.acme.internal
          authProvider: 'serviceAccount'
          serviceAccountToken: ${K8S_DEV_SA_TOKEN}
          skipTLSVerify: false
          caData: ${K8S_DEV_CA_DATA}
For the ArgoCD plugin, add the Roadie plugin and configure it to talk to the same ArgoCD instance the scaffolder targeted. The entity page now has a tab showing the ArgoCD sync status, the last sync’s commit, and links to the ArgoCD UI for full history.
The first-day experience: an engineer who’s never seen the platform before clicks through the scaffolder, watches the “Open in Backstage” link appear, and within three minutes can see their own running pod’s logs and metrics from the same UI. That feedback loop is the whole game.
7. Adding Permissions
The default permission policy on a fresh portal is allow-all, which is fine for the first week and dangerous within a month. Replace it with a policy that gates scaffolder execution and catalog deletions:
// packages/backend/src/permissions.ts
import { createBackendModule } from '@backstage/backend-plugin-api';
import { policyExtensionPoint } from '@backstage/plugin-permission-node/alpha';
import { AuthorizeResult, isPermission } from '@backstage/plugin-permission-common';
import { catalogEntityDeletePermission } from '@backstage/plugin-catalog-common/alpha';
import { actionExecutePermission } from '@backstage/plugin-scaffolder-common/alpha';
import {
  createScaffolderActionConditionalDecision,
  scaffolderActionConditions,
} from '@backstage/plugin-scaffolder-backend/alpha';

export const permissionsModule = createBackendModule({
  pluginId: 'permission',
  moduleId: 'acme-policy',
  register(reg) {
    reg.registerInit({
      deps: { policy: policyExtensionPoint },
      async init({ policy }) {
        policy.setPolicy({
          handle: async (request, user) => {
            const refs = user?.identity.ownershipEntityRefs ?? [];
            const isPlatform = refs.includes('group:default/team-platform');

            // Only the platform team may delete catalog entities.
            if (request.permission.name === catalogEntityDeletePermission.name) {
              return { result: isPlatform ? AuthorizeResult.ALLOW : AuthorizeResult.DENY };
            }

            // The policy only sees the permission, not which action is about
            // to run, so specific action ids are gated with a conditional
            // decision that the scaffolder evaluates per action.
            if (isPermission(request.permission, actionExecutePermission) && !isPlatform) {
              return createScaffolderActionConditionalDecision(request.permission, {
                not: {
                  anyOf: [
                    scaffolderActionConditions.hasActionId({ actionId: 'acme:gitops:add-app-prod' }),
                    scaffolderActionConditions.hasActionId({ actionId: 'acme:vault:create-policy' }),
                  ],
                },
              });
            }

            return { result: AuthorizeResult.ALLOW };
          },
        });
      },
    });
  },
});
New hires get to run the onboarding template (dev environment, low blast radius). They cannot delete catalog entities or run prod-targeted actions. Membership in team-platform lifts the restriction.
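Those three rules reduce to a small decision table, and it is worth pinning that table down in a unit test before trusting the policy with real users. The sketch below is plain TypeScript and independent of the Backstage permission framework. The PolicyCheck shape and the decide function are hypothetical, and the permission name strings are written out as I understand them, so verify them against the packages you have installed.

```typescript
type Verdict = 'ALLOW' | 'DENY';

// Hypothetical flattened view of one permission check.
interface PolicyCheck {
  permissionName: string;
  actionId?: string;       // only meaningful for action-execute checks
  ownershipRefs: string[]; // groups and users the caller belongs to
}

const PLATFORM_GROUP = 'group:default/team-platform';
const DANGEROUS_ACTIONS = ['acme:gitops:add-app-prod', 'acme:vault:create-policy'];

// Open by default; deny only deletions and dangerous actions for non-platform users.
function decide(check: PolicyCheck): Verdict {
  const isPlatform = check.ownershipRefs.indexOf(PLATFORM_GROUP) !== -1;
  if (isPlatform) return 'ALLOW';
  if (check.permissionName === 'catalog.entity.delete') return 'DENY';
  if (
    check.permissionName === 'scaffolder.action.execute' &&
    check.actionId !== undefined &&
    DANGEROUS_ACTIONS.indexOf(check.actionId) !== -1
  ) {
    return 'DENY';
  }
  return 'ALLOW';
}
```

Keeping the table this small is deliberate. Every extra row is another way for a new hire to hit a permission wall on day one.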
Common Pitfalls
- Onboarding template that drops the user in a confused state. Test the template by having a new hire literally use it on their first day with someone watching. Every failure point you fix in that first observation pays back ten times. The classic failure is the entity page showing “no Kubernetes data” because the backstage.io/kubernetes-id annotation didn’t match the deployment label.
- Scaffolder direct-talks to ArgoCD. Skipping the GitOps step and calling ArgoCD’s API from the template seems faster. It is, until you need an audit trail of what changed when. Always go through Git.
- No dev environment to break in. New hires need a sandbox cluster where deploying broken code doesn’t matter. If your only cluster is prod-ish, the onboarding flow has to be much more defensive, and the loop is no longer fast.
- Permission policy that’s too restrictive day one. If the new hire can’t run the onboarding template until a manual approval, you’ve defeated the purpose. Make the policy precise. Open everything except the genuinely dangerous actions.
Troubleshooting
- Scaffolder succeeds but ArgoCD doesn’t show the new app. The app-of-apps Application is out of sync. ArgoCD’s recursive directory mode only picks up new files on its next reconciliation. Force-sync the parent or wait the sync interval. Logs will show “no manifest changes” if the file isn’t being seen.
- Entity page Kubernetes tab is empty. The annotation on the entity and the labels on the deployment don’t match. Specifically, the plugin filters pods where the backstage.io/kubernetes-id annotation on the entity equals the value of the label of the same name on the workload. Yes, “label of the same name as the annotation”. Easy to miss.
- Sign-in fails for new hire with “user entity not found”. The catalog hasn’t refreshed since the user was added to the GitHub org. Either wait the schedule out, or trigger a manual catalog refresh via the catalog API.
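The empty Kubernetes tab is common enough that a small pre-flight check earns its keep. The sketch below compares the entity’s annotation with the workload’s label and says which side is wrong. The Annotated and Labelled shapes and the function name are assumptions for the example; in practice you would feed it the parsed catalog-info.yaml metadata and the parsed Deployment or pod template metadata.

```typescript
const KUBERNETES_ID = 'backstage.io/kubernetes-id';

// Minimal shapes: just the metadata maps this check reads.
interface Annotated {
  annotations?: { [key: string]: string };
}
interface Labelled {
  labels?: { [key: string]: string };
}

// Returns a human-readable problem, or undefined when the two sides agree.
function explainKubernetesIdMismatch(
  entity: Annotated,
  workload: Labelled,
): string | undefined {
  const annotation = entity.annotations?.[KUBERNETES_ID];
  const label = workload.labels?.[KUBERNETES_ID];
  if (annotation === undefined) {
    return `entity is missing the ${KUBERNETES_ID} annotation`;
  }
  if (label === undefined) {
    return `workload is missing the ${KUBERNETES_ID} label`;
  }
  if (annotation !== label) {
    return `annotation "${annotation}" does not match label "${label}"`;
  }
  return undefined;
}
```

Wired into the skeleton’s CI, this catches the mismatch before the new hire ever opens the entity page.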
Wrapping Up
A new hire shipping a real deploy in their first morning is a teachable moment that reverberates for months. They learn the platform’s habits before learning to fight them. The infrastructure to make this happen isn’t exotic: a portal, a templated scaffolder, GitOps, and an entity page that pulls live signals. It’s making them all reliable together that matters.
The official Backstage docs cover each piece in depth. The next post in this series steps back and looks at service catalog design at scale.