Measuring Developer Experience with DORA and SPACE in Backstage

October 29, 2025 · 11 min read · by Muhammad Amal

TL;DR — DORA gives you four lagging indicators; SPACE gives you a frame for the qualitative half. Both live naturally inside a service catalog because the catalog already knows who owns what. This tutorial wires deployment events, change failure signals, and developer surveys into a Backstage 1.34 plugin that shows the metrics per team and per service.

DORA metrics (deployment frequency, lead time, change fail rate, mean time to recover) are useful and overused. They tell you whether your engineering org is shipping at the right pace and whether it’s breaking too much along the way. They don’t tell you why. The SPACE framework (satisfaction and well-being, performance, activity, communication and collaboration, efficiency and flow) covers the why, but much of it depends on self-reported data, which means a survey loop nobody has time to run unless it’s built into the developer’s daily tools.

Backstage is the natural place to surface both, because the catalog already knows ownership. Connect ArgoCD’s deployment events, GitHub’s PR data, and a lightweight in-portal survey, and you’ve got a real DX measurement stack. This tutorial gets you there. It builds on the catalog design and Backstage portal posts earlier in this series.

The pitfall to avoid up front: don’t use these metrics to grade individuals or teams. Use them to identify systemic problems. A team with a high change fail rate is showing you that something in the platform or process is broken. The right response is investigation, not a performance review.

1. The Data Sources

The four DORA metrics need three data sources at minimum:

Deployment events: when a service was deployed, with the commit SHA and the target environment. Sourced from ArgoCD’s sync events or directly from CI.
Pull request data: when a PR was opened, when it was merged, what commits it contained. Sourced from GitHub’s API.
Incident data: when an incident was opened, when it was resolved, which services it affected. Sourced from PagerDuty, Opsgenie, or an in-house system.

The SPACE metrics add a fourth source: developer self-report data, sourced from a recurring in-portal survey.

I store the raw events in a Postgres schema separate from the Backstage catalog schema:

CREATE SCHEMA dora;

CREATE TABLE dora.deployments (
  id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  entity_ref      TEXT NOT NULL,
  environment     TEXT NOT NULL,
  commit_sha      TEXT NOT NULL,
  deployed_at     TIMESTAMPTZ NOT NULL,
  source          TEXT NOT NULL, -- 'argocd', 'github-actions', 'manual'
  failed          BOOLEAN NOT NULL DEFAULT FALSE
);
CREATE INDEX deployments_entity_idx ON dora.deployments(entity_ref, deployed_at DESC);

CREATE TABLE dora.pull_requests (
  id              BIGINT PRIMARY KEY,
  entity_ref      TEXT NOT NULL,
  opened_at       TIMESTAMPTZ NOT NULL,
  merged_at       TIMESTAMPTZ,
  first_commit_at TIMESTAMPTZ NOT NULL,
  author          TEXT NOT NULL
);

CREATE TABLE dora.incidents (
  id              TEXT PRIMARY KEY,
  affected_refs   TEXT[] NOT NULL,
  opened_at       TIMESTAMPTZ NOT NULL,
  resolved_at     TIMESTAMPTZ,
  caused_by_change BOOLEAN
);

The entity_ref column matches Backstage’s catalog refs (component:default/payments-api). This is the join column for everything.
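Because every table joins on entity_ref, it helps to build that string in one place. Inside Backstage code you would normally reach for the helpers in @backstage/catalog-model; the dependency-free sketch below is for ingesters that don’t pull in Backstage packages. The function names are hypothetical, and the lower-casing is an assumption made here to keep joins stable.

```typescript
// Hypothetical helpers, not part of Backstage. They mirror the
// kind:namespace/name shape stored in the entity_ref column.
interface EntityRefParts {
  kind: string;
  namespace: string;
  name: string;
}

// Build a lower-cased ref such as "component:default/payments-api".
function toEntityRef(name: string, kind = 'component', namespace = 'default'): string {
  return `${kind}:${namespace}/${name}`.toLowerCase();
}

// Split a full ref back into its parts; throws on anything that is not
// a fully qualified kind:namespace/name string.
function parseFullEntityRef(ref: string): EntityRefParts {
  const match = /^([^:/]+):([^:/]+)\/([^:/]+)$/.exec(ref);
  if (!match) {
    throw new Error(`Not a full entity ref: ${ref}`);
  }
  return { kind: match[1], namespace: match[2], name: match[3] };
}
```

Using one helper in every ingester avoids the quiet failure mode where two sources spell the same service differently and the join silently drops rows.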

2. Ingesting Deployment Events

ArgoCD 2.13 records the outcome of every sync on the Application resource, under status.operationState. The cleanest way to capture syncs is an informer on Application objects that turns those status updates into rows in dora.deployments. Run it as a Deployment in the same cluster as ArgoCD:

// services/dora-ingester/src/argocd-watcher.ts
import * as k8s from '@kubernetes/client-node';
import { Pool } from 'pg';

const kc = new k8s.KubeConfig();
kc.loadFromCluster();
const client = kc.makeApiClient(k8s.CustomObjectsApi);
const db = new Pool({ connectionString: process.env.DATABASE_URL });

async function watchSyncs() {
  const informer = k8s.makeInformer(
    kc,
    '/apis/argoproj.io/v1alpha1/applications',
    () => client.listClusterCustomObject({ group: 'argoproj.io', version: 'v1alpha1', plural: 'applications' }),
  );

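  informer.on('update', async (app: any) => {
    const status = app.status?.operationState;
    if (!status || status.phase !== 'Succeeded') return;
    if (!status.finishedAt) return;

    const entityRef = `component:default/${app.metadata.name}`;
    // Uses the ArgoCD project name as the environment label.
    const env = app.spec.project ?? 'unknown';
    const sha = status.syncResult?.revision;
    if (!sha) return; // commit_sha is NOT NULL; skip syncs without a revision
    const finishedAt = new Date(status.finishedAt);
    // some() yields undefined when resources is missing; default to false
    // because the failed column is NOT NULL.
    const failed =
      status.syncResult?.resources?.some((r: any) => r.status === 'SyncFailed') ?? false;

    try {
      await db.query(
        `INSERT INTO dora.deployments (entity_ref, environment, commit_sha, deployed_at, source, failed)
         VALUES ($1, $2, $3, $4, 'argocd', $5)
         ON CONFLICT DO NOTHING`,
        [entityRef, env, sha, finishedAt, failed],
      );
    } catch (err) {
      console.error('failed to record deployment', err);
    }
  });

  // Restart the watch after a dropped connection instead of stopping silently.
  informer.on('error', (err: any) => {
    console.error('informer error, restarting in 5s', err);
    setTimeout(() => void informer.start(), 5000);
  });

  await informer.start();
}

watchSyncs().catch(err => {
  console.error(err);
  process.exit(1);
});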

The ingester runs in the same cluster as ArgoCD and needs list and watch access on applications.argoproj.io. Every successful sync becomes a dora.deployments row. ON CONFLICT DO NOTHING only deduplicates when there is a unique constraint to conflict with, so add one on (entity_ref, commit_sha, environment, deployed_at) in the migration. Without it, every later status update on the same Application inserts the same sync again.

For deployments outside ArgoCD (legacy CI pipelines that kubectl-apply), publish a webhook event from CI to the same ingester’s HTTP endpoint.
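A webhook endpoint accepts whatever CI sends, so it is worth validating the payload before it reaches the database. The sketch below is one way to do that as a pure function; the payload field names (entityRef, commitSha, and so on) are assumptions for illustration, not an established contract.

```typescript
// Shape of one row destined for dora.deployments.
interface DeploymentRow {
  entityRef: string;
  environment: string;
  commitSha: string;
  deployedAt: Date;
  source: 'github-actions' | 'manual';
  failed: boolean;
}

// Validate an untrusted webhook body. Field names are assumed for this sketch.
function parseDeploymentEvent(body: unknown): DeploymentRow {
  if (typeof body !== 'object' || body === null) {
    throw new Error('payload must be a JSON object');
  }
  const b = body as Record<string, unknown>;
  for (const key of ['entityRef', 'environment', 'commitSha', 'deployedAt']) {
    if (typeof b[key] !== 'string' || b[key] === '') {
      throw new Error(`missing or empty field: ${key}`);
    }
  }
  const deployedAt = new Date(b['deployedAt'] as string);
  if (Number.isNaN(deployedAt.getTime())) {
    throw new Error('deployedAt is not a valid timestamp');
  }
  return {
    entityRef: b['entityRef'] as string,
    environment: b['environment'] as string,
    commitSha: b['commitSha'] as string,
    deployedAt,
    // Anything not explicitly marked manual is treated as coming from CI.
    source: b['source'] === 'manual' ? 'manual' : 'github-actions',
    // Only an explicit true marks the deploy as failed.
    failed: b['failed'] === true,
  };
}
```

The HTTP handler then only has to parse JSON, call this function, return a 400 on a thrown error, and run the same INSERT the ArgoCD watcher uses.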

3. Ingesting Pull Request Data

GitHub’s GraphQL API lets you batch PR queries efficiently. A simple worker pulls merged PRs hourly:

// services/dora-ingester/src/github-prs.ts
import { graphql } from '@octokit/graphql';
import { Pool } from 'pg';

const gh = graphql.defaults({ headers: { authorization: `token ${process.env.GITHUB_TOKEN}` } });
const db = new Pool({ connectionString: process.env.DATABASE_URL });

const QUERY = `
  query($org: String!, $cursor: String) {
    organization(login: $org) {
      repositories(first: 50, after: $cursor) {
        pageInfo { hasNextPage endCursor }
        nodes {
          name
          pullRequests(states: MERGED, first: 50, orderBy: { field: UPDATED_AT, direction: DESC }) {
            nodes {
              databaseId
              createdAt
              mergedAt
              author { login }
              commits(first: 1) { nodes { commit { committedDate } } }
            }
          }
        }
      }
    }
  }
`;

// Reads the 50 most recently updated merged PRs per repo and stores the ones
// merged at or after `since`.
async function fetchAndStore(since: Date) {
  let cursor: string | null = null;
  do {
    const res: any = await gh(QUERY, { org: 'acme-engineering', cursor });
    for (const repo of res.organization.repositories.nodes) {
      const entityRef = `component:default/${repo.name}`;
      for (const pr of repo.pullRequests.nodes) {
        // pullRequests takes no "since" argument, so filter on mergedAt here.
        if (!pr.mergedAt || new Date(pr.mergedAt) < since) continue;
        const firstCommitAt = pr.commits.nodes[0]?.commit.committedDate;
        if (!firstCommitAt) continue; // first_commit_at is NOT NULL
        await db.query(
          `INSERT INTO dora.pull_requests (id, entity_ref, opened_at, merged_at, first_commit_at, author)
           VALUES ($1, $2, $3, $4, $5, $6)
           ON CONFLICT (id) DO UPDATE SET merged_at = $4`,
          // databaseId is unique across repos; PR numbers are only unique per repo.
          // author is null when the GitHub account has been deleted.
          [pr.databaseId, entityRef, pr.createdAt, pr.mergedAt, firstCommitAt, pr.author?.login ?? 'ghost'],
        );
      }
    }
    cursor = res.organization.repositories.pageInfo.hasNextPage ? res.organization.repositories.pageInfo.endCursor : null;
  } while (cursor);
}

The first_commit_at field is the key for lead-time calculations. Lead time is “time from first commit to deploy in production”, not “PR open to merge”. Distinguishing the two matters when you compare to the DORA benchmarks, which use the broader definition.
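To make the difference concrete, here is a small worked example of the two measurements on the same change. The timestamps are made up, and the type and function names are illustrative.

```typescript
interface ChangeTimestamps {
  firstCommitAt: Date;
  openedAt: Date;
  mergedAt: Date;
  deployedAt: Date;
}

const HOUR_MS = 60 * 60 * 1000;

// Lead time as defined above: first commit until the change runs in production.
function leadTimeHours(c: ChangeTimestamps): number {
  return (c.deployedAt.getTime() - c.firstCommitAt.getTime()) / HOUR_MS;
}

// The narrower PR cycle time: PR opened until merged.
function prCycleHours(c: ChangeTimestamps): number {
  return (c.mergedAt.getTime() - c.openedAt.getTime()) / HOUR_MS;
}

// Made-up example: a day of local work, a six-hour review, a deploy the next morning.
const change: ChangeTimestamps = {
  firstCommitAt: new Date('2025-10-01T09:00:00Z'),
  openedAt: new Date('2025-10-02T09:00:00Z'),
  mergedAt: new Date('2025-10-02T15:00:00Z'),
  deployedAt: new Date('2025-10-03T09:00:00Z'),
};

const leadHours = leadTimeHours(change);  // 48
const cycleHours = prCycleHours(change);  // 6
```

The same change looks eight times faster under the narrower definition, which is why mixing the two makes benchmark comparisons meaningless.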

4. Computing the Metrics

The four DORA metrics get computed by SQL on the ingested data:

-- Deployment frequency: successful deployments per day per entity, last 30 days
SELECT
  entity_ref,
  COUNT(*) / 30.0 AS deploys_per_day
FROM dora.deployments
WHERE environment = 'prod'
  AND NOT failed
  AND deployed_at > NOW() - INTERVAL '30 days'
GROUP BY entity_ref;

-- Lead time for changes: median hours from first commit to prod deploy.
-- Each merged PR is matched to the first successful prod deploy that follows
-- its merge (within 7 days), so a PR is counted once, not once per deploy.
WITH deploys AS (
  SELECT pr.entity_ref, pr.first_commit_at, d.deployed_at
  FROM dora.pull_requests pr
  JOIN LATERAL (
    SELECT dep.deployed_at
    FROM dora.deployments dep
    WHERE dep.entity_ref = pr.entity_ref
      AND dep.environment = 'prod'
      AND NOT dep.failed
      AND dep.deployed_at > pr.merged_at
      AND dep.deployed_at < pr.merged_at + INTERVAL '7 days'
    ORDER BY dep.deployed_at
    LIMIT 1
  ) d ON TRUE
  WHERE d.deployed_at > NOW() - INTERVAL '30 days'
)
SELECT
  entity_ref,
  PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY EXTRACT(EPOCH FROM (deployed_at - first_commit_at)) / 3600) AS median_lead_hours
FROM deploys
GROUP BY entity_ref;

-- Change fail rate: % of deploys followed by an incident within 24h
WITH counted AS (
  SELECT
    d.entity_ref,
    d.id,
    EXISTS (
      SELECT 1 FROM dora.incidents i
      WHERE d.entity_ref = ANY(i.affected_refs)
        AND i.opened_at BETWEEN d.deployed_at AND d.deployed_at + INTERVAL '24 hours'
        AND i.caused_by_change = TRUE
    ) AS caused_incident
  FROM dora.deployments d
  WHERE d.environment = 'prod'
    AND d.deployed_at > NOW() - INTERVAL '30 days'
)
SELECT entity_ref,
       100.0 * AVG(CASE WHEN caused_incident THEN 1 ELSE 0 END) AS change_fail_rate_pct
FROM counted GROUP BY entity_ref;

-- Time to recover: median (not mean) minutes from incident open to resolved
SELECT
  unnest(affected_refs) AS entity_ref,
  PERCENTILE_CONT(0.5) WITHIN GROUP (
    ORDER BY EXTRACT(EPOCH FROM (resolved_at - opened_at)) / 60
  ) AS median_mttr_minutes
FROM dora.incidents
WHERE opened_at > NOW() - INTERVAL '30 days'
  AND resolved_at IS NOT NULL
GROUP BY entity_ref;

Create these as materialized views and refresh them nightly. Don’t compute them on every page load; they’re slow enough that you’ll feel it.
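The nightly refresh can be a small job that issues one REFRESH MATERIALIZED VIEW per view. The sketch below takes the query function as a parameter so it works with pg’s Pool.query or a test double; the view names are hypothetical and should match whatever your migration creates.

```typescript
// Minimal query signature, satisfied by (sql) => pool.query(sql).
type QueryFn = (sql: string) => Promise<unknown>;

// Hypothetical view names for the four metrics.
const DORA_VIEWS = [
  'dora.deploy_frequency_30d',
  'dora.lead_time_30d',
  'dora.change_fail_rate_30d',
  'dora.mttr_30d',
];

// Refresh each view in turn and return the names that failed, so one broken
// view does not block the others.
async function refreshDoraViews(query: QueryFn, views: string[] = DORA_VIEWS): Promise<string[]> {
  const failed: string[] = [];
  for (const view of views) {
    try {
      await query(`REFRESH MATERIALIZED VIEW ${view}`);
    } catch (err) {
      failed.push(view);
    }
  }
  return failed;
}
```

Call it from a nightly cron and alert when the returned list is non-empty. If readers must not be blocked during the refresh, Postgres also offers REFRESH MATERIALIZED VIEW CONCURRENTLY, which requires a unique index on the view.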

5. The Backstage Plugin

Build a Backstage plugin that surfaces these metrics on the entity page. The pattern is the same as the deployments plugin from the earlier custom-plugins post: a backend route and a frontend card. The card:

// plugins/dora/src/components/DoraCard.tsx
import React from 'react';
import { stringifyEntityRef } from '@backstage/catalog-model';
import { useEntity } from '@backstage/plugin-catalog-react';
import { useApi } from '@backstage/core-plugin-api';
import { InfoCard, Progress, ResponseErrorPanel } from '@backstage/core-components';
import useAsync from 'react-use/lib/useAsync';
import { Grid, Typography } from '@material-ui/core';
import { doraApiRef } from '../api';

interface MetricProps {
  label: string;
  value: string;
  unit: string;
  benchmark: string;
}

const Metric = ({ label, value, unit, benchmark }: MetricProps) => (
  <Grid item xs={6} md={3}>
    <Typography variant="overline">{label}</Typography>
    <Typography variant="h4">{value} <Typography component="span" variant="body2">{unit}</Typography></Typography>
    <Typography variant="caption" color="textSecondary">{benchmark}</Typography>
  </Grid>
);

export const DoraCard = () => {
  const { entity } = useEntity();
  const api = useApi(doraApiRef);
  // Use the entity's own kind and namespace instead of assuming component:default.
  const entityRef = stringifyEntityRef(entity);
  const { value, loading, error } = useAsync(() => api.metrics(entityRef), [api, entityRef]);

  if (loading) return <Progress />;
  // Without this branch a failed request would show the spinner forever.
  if (error) return <ResponseErrorPanel error={error} />;
  if (!value) return null;

  return (
    <InfoCard title="DORA metrics (last 30 days)">
      <Grid container spacing={2}>
        <Metric label="Deploy frequency" value={value.deploysPerDay.toFixed(2)} unit="/ day"
                benchmark={value.deploysPerDay > 1 ? 'Elite' : value.deploysPerDay > 0.14 ? 'High' : 'Medium'} />
        <Metric label="Lead time" value={value.medianLeadHours.toFixed(1)} unit="hrs"
                benchmark={value.medianLeadHours < 24 ? 'Elite' : value.medianLeadHours < 168 ? 'High' : 'Medium'} />
        <Metric label="Change fail rate" value={value.changeFailRate.toFixed(1)} unit="%"
                benchmark={value.changeFailRate < 5 ? 'Elite' : value.changeFailRate < 10 ? 'High' : 'Medium'} />
        <Metric label="MTTR" value={value.medianMttrMinutes.toFixed(0)} unit="min"
                benchmark={value.medianMttrMinutes < 60 ? 'Elite' : value.medianMttrMinutes < 1440 ? 'High' : 'Medium'} />
      </Grid>
    </InfoCard>
  );
};

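The benchmark thresholds are anchored on the performance tiers in the 2024 DORA report; treat them as a rough guide rather than exact cut-offs. Note that the card folds everything below High into “Medium”. Don’t obsess over the elite tier. Steady improvement is the goal.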
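The nested ternaries in the card are easy to get subtly wrong, so one option is to pull the tiering into a pure function that can be unit-tested and shared with the backend. This is a sketch using the same cut-offs as the card; the names THRESHOLDS and tierFor are illustrative.

```typescript
type Tier = 'Elite' | 'High' | 'Medium';

interface DoraMetrics {
  deploysPerDay: number;
  medianLeadHours: number;
  changeFailRate: number; // percent
  medianMttrMinutes: number;
}

interface Threshold {
  elite: number;
  high: number;
  higherIsBetter: boolean;
}

// Same cut-offs as the card, collected in one place.
const THRESHOLDS: Record<keyof DoraMetrics, Threshold> = {
  deploysPerDay: { elite: 1, high: 0.14, higherIsBetter: true },
  medianLeadHours: { elite: 24, high: 168, higherIsBetter: false },
  changeFailRate: { elite: 5, high: 10, higherIsBetter: false },
  medianMttrMinutes: { elite: 60, high: 1440, higherIsBetter: false },
};

// Classify one metric value; comparisons are strict, matching the card.
function tierFor(metric: keyof DoraMetrics, value: number): Tier {
  const t = THRESHOLDS[metric];
  const beats = (limit: number): boolean => (t.higherIsBetter ? value > limit : value < limit);
  if (beats(t.elite)) return 'Elite';
  if (beats(t.high)) return 'High';
  return 'Medium';
}
```

In the card, each benchmark prop then becomes a single call such as tierFor('deploysPerDay', value.deploysPerDay).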

6. The SPACE Survey

For the qualitative side, run a quarterly survey from inside Backstage. The plugin renders a banner asking three to five questions, the responses go into a new Postgres table, and the team-aggregated results show up on the team’s Group entity page.

CREATE TABLE dora.survey_responses (
  id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_ref        TEXT NOT NULL,
  group_ref       TEXT NOT NULL,
  survey_id       TEXT NOT NULL,
  submitted_at    TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  satisfaction    INT,  -- 1-5
  flow_state      INT,  -- 1-5
  blockers_text   TEXT,
  CONSTRAINT range_check CHECK (satisfaction BETWEEN 1 AND 5 AND flow_state BETWEEN 1 AND 5)
);

Two numeric questions and one open text. People will answer two numbers in fifteen seconds. In my experience, twelve questions get you roughly a 30 percent response rate; three questions get you closer to 80.

The frontend renders a small banner on the Backstage home page if the current user hasn’t responded to the active survey:

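import React from 'react';
import { useApi } from '@backstage/core-plugin-api';
import { Link } from '@backstage/core-components';
import useAsync from 'react-use/lib/useAsync';
import { surveyApiRef } from '../api';
// Banner is assumed to be a small local component (for example a wrapper
// around an alert); the path is a placeholder.
import { Banner } from './Banner';

const SurveyBanner = () => {
  const api = useApi(surveyApiRef);
  const { value: pending } = useAsync(() => api.pendingSurveys(), [api]);
  // Render nothing while loading, on error, or when no survey is pending.
  if (!pending?.length) return null;
  return (
    <Banner severity="info">
      Quick check-in (under a minute):{' '}
      <Link to={`/dora-survey/${pending[0].id}`}>{pending[0].title}</Link>
    </Banner>
  );
};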

Aggregate responses per team weekly. Don’t show individual responses to managers. Don’t show them ever, in fact; the credibility of the survey collapses the moment someone suspects their answer is identifiable.
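One way to enforce that rule in code is to aggregate on the backend and suppress any team with too few respondents, so a small team’s average can’t be traced back to one person. The sketch below does that in memory; the minimum of four respondents is an assumption for illustration, not a number from any standard.

```typescript
interface SurveyResponse {
  groupRef: string;
  satisfaction: number | null; // 1-5
  flowState: number | null;    // 1-5
}

interface TeamSummary {
  groupRef: string;
  respondents: number;
  avgSatisfaction: number | null;
  avgFlowState: number | null;
}

// Assumed threshold: teams with fewer answers than this are not published.
const MIN_RESPONDENTS = 4;

// Average of the non-null values, or null when there are none.
function mean(values: Array<number | null>): number | null {
  const present = values.filter((v): v is number => v !== null);
  return present.length ? present.reduce((a, b) => a + b, 0) / present.length : null;
}

function summarizeByTeam(responses: SurveyResponse[], minRespondents = MIN_RESPONDENTS): TeamSummary[] {
  // Bucket responses by team.
  const byGroup: Record<string, SurveyResponse[]> = {};
  for (const r of responses) {
    if (!byGroup[r.groupRef]) byGroup[r.groupRef] = [];
    byGroup[r.groupRef].push(r);
  }
  // Publish only teams with enough respondents; no user identifier is ever returned.
  return Object.keys(byGroup)
    .filter(groupRef => byGroup[groupRef].length >= minRespondents)
    .map(groupRef => ({
      groupRef,
      respondents: byGroup[groupRef].length,
      avgSatisfaction: mean(byGroup[groupRef].map(r => r.satisfaction)),
      avgFlowState: mean(byGroup[groupRef].map(r => r.flowState)),
    }));
}
```

The same filter belongs on blockers_text: free-text answers can identify their author by writing style alone, so they need at least as much care as the numbers.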

+----------+    sync    +-----------+      +---------+
|  ArgoCD  +----------->+ ingester  +----->+ Postgres|
+----------+            +-----------+      +----+----+
+----------+    PRs                              |
|  GitHub  +-------------------+                 |
+----------+                   |                 |
+----------+    incidents      |  views          v
| PagerDuty+-------------------+----------> Backstage
+----------+                                  entity card
+----------+    responses                       ^
|  Survey  +-----------------------------------+
+----------+

7. Closing the Loop

Metrics that nobody acts on are worse than no metrics, because they create the illusion of feedback without the substance. Two practices keep the loop closed.

First, the entity page DORA card links to an “improvement candidates” view that suggests next actions. If lead time is high, the suggestion is “median PR review time on this service is 14 hours; review the PR ownership rules”. If change fail rate is high, “5 of the last 12 deploys caused an incident; consider adding a smoke test workflow”.

Second, the quarterly survey results inform OKRs for the platform team, not for individual product teams. “Improve the median flow-state score from 3.2 to 3.6” is a platform team goal. The teams just answer the survey honestly.
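The “improvement candidates” view from the first practice can start as a handful of rules over the metrics you already have. The sketch below is one possible shape; the thresholds are placeholders, and medianReviewHours is an assumed extra signal that the schema above does not collect yet.

```typescript
interface ServiceSignals {
  medianLeadHours: number;
  medianReviewHours: number;         // assumed signal: PR opened to first review
  recentDeploys: number;
  recentDeploysWithIncident: number;
}

// Placeholder thresholds; tune them against your own baseline.
const LEAD_TIME_LIMIT_HOURS = 168;
const FAIL_RATE_LIMIT = 0.1;

// Turn raw signals into plain-language suggestions for the entity page.
function improvementCandidates(s: ServiceSignals): string[] {
  const suggestions: string[] = [];
  if (s.medianLeadHours > LEAD_TIME_LIMIT_HOURS) {
    suggestions.push(
      `Median PR review time on this service is ${s.medianReviewHours} hours; review the PR ownership rules.`,
    );
  }
  if (s.recentDeploys > 0 && s.recentDeploysWithIncident / s.recentDeploys > FAIL_RATE_LIMIT) {
    suggestions.push(
      `${s.recentDeploysWithIncident} of the last ${s.recentDeploys} deploys caused an incident; consider adding a smoke test workflow.`,
    );
  }
  return suggestions;
}
```

Keeping the rules in code rather than in a dashboard query makes them reviewable: when a suggestion turns out to be unhelpful, the fix is a pull request.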

Common Pitfalls

Comparing teams to each other. Even with the same service type, domain, and scope, two teams will have different metrics because they have different histories. Track each team’s trend over time. The cross-team comparison is rarely meaningful.
Counting failed deploys in deploy frequency. The DORA definition counts successful prod deploys only. Easy to get wrong if you don’t filter on success in the ingester.
Pulling lead time from PR merged-at to deploy. The original DORA paper measures first-commit to deploy. PR merge-to-deploy is shorter and flatters the platform. Be consistent and use the longer definition.
Letting the survey become mandatory. The instant compliance becomes the goal, the data quality drops. Keep the survey opt-in and short, and show the team-level results back to the team so people see the value.

Troubleshooting

No deploys showing up in dora.deployments. Either the ArgoCD watcher’s role lacks cluster-wide list and watch permissions on applications.argoproj.io (an informer needs both, not just get), or the informer crashed silently on a network blip. Add a Prometheus counter on the informer’s update handler and alert on it dropping to zero.
Lead time metric returns NULL. The join from deployments to pull_requests couldn’t find a PR for the deployed commit. Usually means the PR was merged via squash and the deployed SHA isn’t a PR head commit. Add a merge_commit_sha column to pull_requests and join on that as a secondary match.
Survey response rate drops below 30%. The survey is too long, or the prompts aren’t relevant. Cut questions until you’re back at 70%+ response. Three questions is fine for ongoing tracking; do longer surveys once a year if you must.

Wrapping Up

DORA and SPACE inside Backstage isn’t a dashboard project. It’s a long-running data ingestion stack with a small UI on top. The ingesters take a week to write and a year to harden. Once they’re stable, you have a feedback loop that catches systemic problems early, and a quarterly survey that tells you whether the platform team’s work is showing up where it matters. Don’t expect overnight insight; expect a year of compounding small adjustments.

The DORA reports and the official Backstage docs are both worth keeping bookmarked. This was the last post in this series; the earlier posts cover the portal, templates, plugins, TechDocs, Score, onboarding, and catalog design.
