Measuring Developer Experience, DORA Metrics in Practice
January 22, 2024 · 7 min read · by Muhammad Amal

TL;DR — DORA’s four metrics still hold up in 2024 but they’re not enough on their own. Pair them with SPACE-style survey data and one platform-specific signal (template adoption). Don’t track per-individual. Don’t tie any of it to performance reviews.

We’ve built the platform, shipped a golden path, and figured out the catalog. The next question every executive asks is: “Is it working?” That question is harder than it sounds, and the way you answer it shapes the platform team’s incentives for the next two years.

I’ve been on platform teams where DX metrics turned into a stick that leadership used to hit engineers. I’ve also been on teams where they were genuinely useful navigation aids. The difference is mostly in how you set them up, who owns them, and what you don’t measure.

DORA, briefly

The DORA team’s research gave us four metrics:

  1. Deployment frequency — how often does the org ship to production?
  2. Lead time for changes — commit to production.
  3. Change failure rate — what percent of deploys cause a problem requiring intervention?
  4. Mean time to recovery (MTTR) — when something does break, how fast does it recover?

The framing that’s held up: high performers improve all four. Optimizing one at the expense of another is usually a sign you’ve gamed the metric. You can ship to prod twenty times a day if you skip tests and let half your changes break. That’s not “high performing.”

The Accelerate State of DevOps reports have since added a fifth measure, reliability (introduced as operational performance in the 2021 report), and the 2023 report clarified that the four-metrics view isn’t a leaderboard. It’s a diagnostic.

Instrumenting DORA without overhead

The pragmatic approach is to derive DORA metrics from your existing CI/CD systems. Don’t build a survey tool. Don’t ask engineers to self-report anything.

For deployment frequency and lead time:

# GitHub Actions workflow that emits deployment events
name: Deploy
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
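      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # full history, so the commits in this push can be inspected
      - name: Deploy
        run: ./scripts/deploy.sh
      - name: Emit deployment event
        env:
          ENDPOINT: ${{ secrets.METRICS_ENDPOINT }}
          SERVICE: ${{ github.event.repository.name }}
          BEFORE_SHA: ${{ github.event.before }}
          HEAD_SHA: ${{ github.sha }}
        run: |
          # Earliest commit in this push; fall back to the head commit if the range is empty or invalid
          FIRST_COMMIT_AT="$(git log --reverse --format=%cI "$BEFORE_SHA..$HEAD_SHA" 2>/dev/null | head -n 1)"
          FIRST_COMMIT_AT="${FIRST_COMMIT_AT:-$(git log -1 --format=%cI "$HEAD_SHA")}"
          curl -fsS -X POST "$ENDPOINT/deployments" \
            -H "Content-Type: application/json" \
            -d @- <<EOF
          {
            "service": "$SERVICE",
            "commit": "$HEAD_SHA",
            "deployed_at": "$(date -u +%FT%TZ)",
            "first_commit_at": "$FIRST_COMMIT_AT"
          }
          EOF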

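That POST hits a small service that records the event in Postgres. Deployment frequency is count() per day per service. Lead time is deployed_at - first_commit_at for each deployment, with the first commit being the earliest commit in the merged PR. You may need to track this more precisely if your branch model is unusual: squash merges, for example, collapse a PR into a single commit on main, so the original first-commit timestamp has to come from the PR itself rather than from git history.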
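To make those two definitions concrete, here is a minimal Python sketch of the aggregation the recording service might run. It assumes events shaped like the JSON payload in the workflow; the function names are hypothetical, and summarizing lead time as a per-service median is an assumption, not something DORA mandates.

```python
from collections import Counter, defaultdict
from datetime import datetime, timezone
from statistics import median


def parse_ts(value: str) -> datetime:
    # Older Pythons reject a trailing "Z" in fromisoformat, so normalize it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def deployment_frequency(events):
    """Count deployments per (service, UTC date)."""
    counts = Counter()
    for e in events:
        day = parse_ts(e["deployed_at"]).astimezone(timezone.utc).date()
        counts[(e["service"], day)] += 1
    return counts


def median_lead_time_hours(events):
    """Median of deployed_at - first_commit_at per service, in hours."""
    by_service = defaultdict(list)
    for e in events:
        delta = parse_ts(e["deployed_at"]) - parse_ts(e["first_commit_at"])
        by_service[e["service"]].append(delta.total_seconds() / 3600)
    return {svc: median(hours) for svc, hours in by_service.items()}
```

A median is less sensitive than a mean to the occasional PR that sat open for weeks, which is why the sketch uses it; swap in whatever summary your dashboard already reports.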

For change failure rate and MTTR, the cheap version is to ingest incident events from PagerDuty or your incident management tool. A “change failure” is an incident opened within a defined window (say, 60 minutes) of a deployment to the affected service. Change failure rate is the share of deployments with at least one such incident. MTTR is the mean of resolved_at - created_at across incidents.
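The windowed join described above can be sketched in a few lines of Python. This is an illustration under assumptions: the record fields (service, deployed_at, created_at, resolved_at) and function names are hypothetical, timestamps are already parsed into datetime objects, and the 60-minute window is the example value from the text, not a recommendation.

```python
from datetime import timedelta


def change_failure_rate(deployments, incidents, window=timedelta(minutes=60)):
    """Share of deployments followed by an incident on the same service within `window`."""
    if not deployments:
        return None
    failed = 0
    for d in deployments:
        if any(
            i["service"] == d["service"]
            and d["deployed_at"] <= i["created_at"] <= d["deployed_at"] + window
            for i in incidents
        ):
            failed += 1
    return failed / len(deployments)


def mttr(incidents):
    """Mean of resolved_at - created_at over resolved incidents, as a timedelta."""
    durations = [
        i["resolved_at"] - i["created_at"]
        for i in incidents
        if i.get("resolved_at") is not None
    ]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)
```

Note that one incident can mark several deployments as failed if they all landed inside the window. For services that deploy many times an hour, you may prefer to attribute the incident only to the most recent deployment.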

This approach gives you DORA metrics with no engineer effort and minimal new infrastructure.

Where DORA breaks down

Two systematic issues:

It’s organization-level by design. DORA scores tell you the org is shipping fast or slow, but they don’t tell you why a particular team is stuck. A team migrating off a legacy stack will have terrible lead times for six months and that’s fine. Aggregate metrics hide both cases: the team that is genuinely stuck and the team that is slow for a good reason.

It can’t see friction. A team can have great DORA numbers and still be miserable because they spend half their week fighting CI flakiness. DORA captures the throughput. It misses the experience of producing it.

This is where SPACE comes in.

SPACE for the parts DORA misses

SPACE (Forsgren, Storey, Maddila, Zimmermann, Houck, Butler, 2021) is a framework, not a fixed set of metrics. The five categories:

  • Satisfaction and wellbeing
  • Performance
  • Activity
  • Communication and collaboration
  • Efficiency and flow

The useful piece of SPACE in 2024 is the explicit acknowledgment that you need both system-level metrics (DORA-style) and human-reported metrics (surveys) to get the full picture.

The lightest-weight implementation: a quarterly developer experience survey, anonymous, six questions:

  1. How often did infra/tooling friction block your work last quarter? (Never / Rarely / Sometimes / Often / Always)
  2. When you needed help from the platform team, how was it? (Very poor / Poor / OK / Good / Excellent)
  3. How confident are you in our deployment process?
  4. How satisfied are you with the developer tooling overall?
  5. What’s the single biggest source of friction in your work?
  6. What’s the single most useful platform feature?

Sample size matters. Below ~50 respondents the trend lines are noise. Above that you can track quarter-over-quarter.
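As a rough illustration of how question 1 might be summarized, the Python sketch below turns raw answers into a distribution and refuses to report anything under the respondent floor. The function name, the "Often or Always" headline figure, and the hard cutoff at 50 are assumptions for the example; the text only gives ~50 as a rule of thumb.

```python
from collections import Counter

FRICTION_SCALE = ["Never", "Rarely", "Sometimes", "Often", "Always"]
MIN_RESPONDENTS = 50  # rule-of-thumb floor from the text, treated here as a hard cutoff


def summarize_friction(responses, min_respondents=MIN_RESPONDENTS):
    """Summarize answers to the friction question.

    Returns None when there are too few respondents to read a trend from.
    """
    n = len(responses)
    if n < min_respondents:
        return None
    counts = Counter(responses)
    distribution = {label: counts[label] / n for label in FRICTION_SCALE}
    return {
        "respondents": n,
        "distribution": distribution,
        # Share of respondents who were blocked "Often" or "Always".
        "blocked_share": distribution["Often"] + distribution["Always"],
    }
```

The input is just a list of anonymous answer strings, which keeps respondent identity out of the pipeline entirely.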

Critically: this is a survey of engineers as customers of the platform. It is not an individual performance instrument. The platform team owns the metrics. They are not shared per-engineer with management.

The platform-specific signal

DORA tells you about the whole org. SPACE tells you how engineers feel. Neither tells you whether the platform is the cause or the effect.

The third signal I track is golden path adoption:

  • Percent of services created in the last 90 days that used a platform template (target: >80%)
  • Percent of services on the current major version of their template (target: >70%)
  • Number of services that have drifted from the template defaults (target: trending down)

These are pure platform metrics. They tell the platform team whether their product is being used. Combined with DORA (is the org shipping?) and SPACE (do engineers like working here?), you can triangulate.

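-- A simplified query against catalog metadata + git activity.
-- Assumes each commit row for catalog-info.yaml carries the service name and
-- the template_version recorded in that file (NULL if no template was used).
WITH service_first_commits AS (
  SELECT
    service_name,
    MIN(committed_at) AS created_at,           -- first catalog commit = service creation
    MAX(template_version) AS template_version  -- non-NULL if any commit records a template
  FROM commits
  WHERE path = 'catalog-info.yaml'
  GROUP BY service_name
)
SELECT
  -- NULLIF returns NULL instead of a division-by-zero error when no services were created
  COUNT(*) FILTER (WHERE template_version IS NOT NULL)::float
    / NULLIF(COUNT(*), 0) AS template_adoption_rate
FROM service_first_commits
WHERE created_at > NOW() - INTERVAL '90 days';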

That single number on a dashboard is more useful than ten vanity metrics about platform team output.
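The other two adoption signals in the list (services on the current major version, and drifted services) can be derived from the same catalog data. Here is a hedged Python sketch: the field names (template, template_version, drifted), the version format, and the choice to use templated services as the denominator are all assumptions for illustration.

```python
def major(version: str) -> int:
    # Assumes versions look like "3.2.1" or "v3.2.1".
    return int(version.lstrip("v").split(".")[0])


def adoption_signals(services, current_versions):
    """Compute version currency and drift for templated services.

    services: dicts with hypothetical keys 'template', 'template_version', 'drifted'.
    current_versions: template name -> latest released version string.
    """
    templated = [s for s in services if s.get("template")]
    if not templated:
        return {"on_current_major": None, "drifted_count": 0}
    on_current = sum(
        1
        for s in templated
        if major(s["template_version"]) == major(current_versions[s["template"]])
    )
    return {
        "on_current_major": on_current / len(templated),
        "drifted_count": sum(1 for s in templated if s.get("drifted")),
    }
```

How "drifted" gets decided is the hard part and is left out here; the sketch only counts a flag that something else would have to compute.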

The surveillance trap

Every DX metrics conversation eventually hits the same fork: “Can we slice these by individual engineer?”

The answer is no. Not because the data isn’t available — it is — but because the moment you slice DORA per engineer, you change behavior in destructive ways. People stop helping teammates because their own throughput dips. They avoid hard problems because failure rates spike on hard work. They game the metric instead of doing the work.

The 2021 SPACE paper (Forsgren et al.) was explicit about this. The original DORA work was the same. However tempting it is, don’t do it.

The rule I follow: any metric collected on individual engineers stays with that individual and is not shared upward. Aggregate metrics are shared. Team-level metrics are shared with the team. Per-engineer metrics, if collected at all, exist only for the engineer’s self-reflection.
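One way to enforce that rule in the pipeline rather than by policy alone is to aggregate before anything reaches a dashboard. The Python sketch below is an assumption-laden illustration: the record fields (team, author, lead_time_hours) and the function name are hypothetical, and it shows only the team-level roll-up.

```python
from collections import defaultdict
from statistics import median


def team_rollup(records):
    """Aggregate per-deployment records to team level.

    The 'author' field, if present, is never read, so it cannot leak into the output.
    """
    by_team = defaultdict(list)
    for r in records:
        by_team[r["team"]].append(r["lead_time_hours"])
    return {
        team: {"deployments": len(hours), "median_lead_time_hours": median(hours)}
        for team, hours in by_team.items()
    }
```

If the dashboard only ever queries the output of a roll-up like this, the per-engineer slice is not one click away when someone asks for it.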

Common Pitfalls

  • Tying any DX metric to compensation or reviews. It will be gamed within a quarter and lose all signal.
  • Measuring without context. A jump in lead time during a migration isn’t a regression. Annotate the timeline.
  • Vanity dashboards. “Deployments per day” without “change failure rate” is misleading. Always pair throughput and stability.
  • Survey fatigue. Once per quarter is enough. Weekly surveys train people to ignore them.
  • Hiding the data from engineers. The platform team should publish the dashboard. Engineers should be able to see how the org is doing. Hiding the numbers signals distrust.
  • No baseline. Collect two quarters of data before you set targets. Otherwise you’re chasing noise.

The one I got wrong: I tried to track 14 different DX metrics in my first platform org. The dashboard was unreadable and nobody made decisions from it. I cut it to five and actually used them. The metrics you don’t use are worse than the ones you don’t have, because they cost effort to collect.

Wrapping Up

DORA gives you the throughput-and-stability view. SPACE gives you the human view. Adoption metrics tell the platform team whether the product is real. Three signals, set up once, reviewed quarterly, never used as a performance stick. That’s the recipe.

Next post moves on to one of the most interesting platform engineering developments of 2024 — workload specifications with Score, and what they let you decouple from your IaC stack. We’ve talked plenty about Kubernetes; it’s time to talk about decoupling from it.