Productivity Metrics That Actually Matter

December 23, 2022 · 5 min read · by Muhammad Amal · programming

TL;DR — Don’t measure lines of code or commits. Use DORA (deploy frequency, lead time, MTTR, change failure rate) or SPACE (satisfaction, performance, activity, communication, efficiency). AI tools shift the inputs, not the outcomes. Outcomes still matter.

After AI legal, the operational layer. With AI tools claiming 25-50% productivity gains, the question is “by what measure?” Most existing measures don’t capture it well.

What NOT to measure

Lines of code, commits per day, hours worked — all bad. They:

Reward verbose code
Reward many small commits over fewer thoughtful ones
Encourage padding

Worse: AI tools make all of these easier to inflate. A Copilot user can write more lines per hour; that doesn’t mean they ship more value.

Don’t go there.

DORA — the four metrics

DevOps Research and Assessment (DORA) metrics. Mature; widely adopted.

1. Deployment frequency. How often do you ship to production?

Elite: multiple times per day
High: between weekly and daily
Medium: between weekly and monthly
Low: less than monthly

2. Lead time for changes. From commit to production.

Elite: < 1 hour
High: < 1 day
Medium: < 1 week
Low: > 1 week
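
A minimal sketch of how the lead-time tiers above could be applied in code. The function name `lead_time_tier` is hypothetical, and treating exactly one week as Low is an assumption, since the list leaves that boundary open.

```python
from datetime import timedelta


def lead_time_tier(lead_time: timedelta) -> str:
    """Map a commit-to-production lead time onto the tiers listed above."""
    if lead_time < timedelta(hours=1):
        return "Elite"
    if lead_time < timedelta(days=1):
        return "High"
    if lead_time < timedelta(weeks=1):
        return "Medium"
    # Assumption: exactly one week and anything longer counts as Low.
    return "Low"


print(lead_time_tier(timedelta(hours=4, minutes=30)))  # High
```

The same shape works for the other three metrics; only the thresholds change.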

3. Mean Time to Recovery (MTTR). When something breaks, how fast do you fix it?

Elite: < 1 hour
High: < 1 day
Medium: < 1 week
Low: > 1 week

4. Change failure rate. What % of deploys cause incidents?

Elite: 0-15%
High: 16-30%
Medium: 16-30%
Low: 16-30%

DORA’s 2021 report gives the same 16-30% band for every tier below Elite, so this metric mainly separates Elite teams from everyone else.

These four cover speed (1, 2) and reliability (3, 4). Improving all four together = real improvement.
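
To make the four definitions concrete, here is a rough sketch that computes them from a list of deploy records. The `Deploy` record and `dora_summary` function are hypothetical, the sample data is made up, and measuring recovery from the moment of the deploy (rather than from when the incident was detected) is a simplifying assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deploy:
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    caused_incident: bool = False           # did this deploy cause an incident?
    restored_at: Optional[datetime] = None  # when service was restored, if it did


def dora_summary(deploys: list[Deploy], window_days: int) -> dict:
    """Compute the four DORA metrics over a window of `window_days` days."""
    lead_times = [d.deployed_at - d.committed_at for d in deploys]
    failures = [d for d in deploys if d.caused_incident]
    # Assumption: recovery time runs from the failing deploy to restoration.
    recoveries = [d.restored_at - d.deployed_at
                  for d in failures if d.restored_at is not None]
    return {
        "deploys_per_day": len(deploys) / window_days,
        "median_lead_time": median(lead_times) if lead_times else None,
        "mean_time_to_recovery": (sum(recoveries, timedelta()) / len(recoveries)
                                  if recoveries else None),
        "change_failure_rate": len(failures) / len(deploys) if deploys else None,
    }


# Made-up sample data: four deploys over four days, one of which broke something.
t0 = datetime(2022, 12, 1, 9, 0)
deploys = [
    Deploy(t0, t0 + timedelta(hours=2)),
    Deploy(t0 + timedelta(days=1), t0 + timedelta(days=1, hours=4)),
    Deploy(t0 + timedelta(days=2), t0 + timedelta(days=2, hours=6),
           caused_incident=True,
           restored_at=t0 + timedelta(days=2, hours=6, minutes=45)),
    Deploy(t0 + timedelta(days=3), t0 + timedelta(days=3, hours=4)),
]

for name, value in dora_summary(deploys, window_days=4).items():
    print(f"{name}: {value}")
```

Note that the input is the team's whole deploy stream, not one engineer's. The median is used for lead time so a single slow change does not dominate the number.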

AI tools shift #1 and #2 (you ship more often, faster). #3 and #4 depend on judgment AI doesn’t help with directly.

SPACE — broader framework

For human-side measurement, SPACE:

Satisfaction: are engineers happy?
Performance: are systems and code working?
Activity: are PRs being merged, code being written?
Communication & collaboration: are decisions reaching the right people?
Efficiency & flow: is work moving without blockers?

Each is tracked with a mix of metrics, surveys, and anecdotes. SPACE pairs well with DORA: DORA for output; SPACE for sustainability.

What changes with AI tools

Output metrics likely improve:

Deploy frequency may rise (faster scaffolding)
Lead time may shrink
Lines of code per dev definitely rise (and matter less)

Outcome metrics might not change:

Customer satisfaction
Revenue per engineer
Number of features shipped that actually got used

If AI makes you 25% faster at writing code but you ship the same features, the productivity gain didn’t translate to outcomes. Common pattern.

A useful dashboard

For a 10-engineer team:

┌─────────────────────────────────────────────────────────────┐
│ Engineering Health — Last 30 Days                            │
├─────────────────────────────────────────────────────────────┤
│ Deploy frequency:      8.2/day      ●●●○ (target: 10/day)   │
│ Lead time:             4.5 hours    ●●●● (target: < 8h)     │
│ MTTR:                  45 min       ●●●● (target: < 2h)     │
│ Change failure rate:   12%          ●●●○ (target: < 15%)    │
├─────────────────────────────────────────────────────────────┤
│ PRs merged:            142          (-5% vs prev period)    │
│ Reviews avg time:      3.2 hours    (target: < 8h)          │
│ Build success rate:    97%          ●●●●                    │
├─────────────────────────────────────────────────────────────┤
│ Engineer satisfaction: 7.2/10       (quarterly survey)      │
│ On-call pages:         18 / week    (-30% vs prev quarter)  │
│ Postmortems published: 3            (process working)       │
└─────────────────────────────────────────────────────────────┘

Three sections: DORA, supporting activity metrics, human / health metrics. Reviewed weekly; trends matter more than absolute numbers.
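
Since trends matter more than absolute numbers, each row can be reduced to "change vs. previous period" plus "on or off target". A small sketch, assuming you already have the current and previous values; the `trend` function and the previous-period numbers below are hypothetical.

```python
def trend(name: str, current: float, previous: float,
          target: float, lower_is_better: bool) -> str:
    """Summarize one dashboard row: change vs. previous period and target status."""
    if previous == 0:
        raise ValueError("previous value must be non-zero to compute a percent change")
    change = (current - previous) / previous * 100
    on_target = current <= target if lower_is_better else current >= target
    if change == 0:
        direction = "flat"
    else:
        # A drop is good for "lower is better" metrics, a rise for the others.
        improving = (change < 0) == lower_is_better
        direction = "improving" if improving else "worsening"
    status = "on target" if on_target else "off target"
    return f"{name}: {current:g} ({change:+.0f}% vs prev period, {status}, {direction})"


# Previous-period values here are invented for illustration.
print(trend("Lead time (h)", 4.5, 5.0, 8.0, lower_is_better=True))
print(trend("Deploys/day", 8.2, 7.5, 10.0, lower_is_better=False))
```

A metric that is off target but improving deserves a different conversation than one that is on target and worsening, which is why both pieces are reported.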

What I personally track

Less formal:

Time from “idea to user-facing feature” — best signal of overall throughput
Number of unplanned interrupts per week — measure of how much OOO / on-call dominates
Fraction of week in “deep work” (uninterrupted 90+ min blocks) — leading indicator of sustained output

These aren’t team-level metrics. They’re personal health checks.
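
The deep-work fraction is easy to compute from a list of focus blocks. A sketch, assuming the blocks come from a calendar export or a manual log and that a working week is 40 hours; `deep_work_fraction` and the sample blocks are hypothetical.

```python
from datetime import datetime, timedelta


def deep_work_fraction(blocks: list[tuple[datetime, datetime]],
                       work_hours_per_week: float = 40.0,  # assumption: 40-hour week
                       min_block: timedelta = timedelta(minutes=90)) -> float:
    """Share of the working week spent in uninterrupted blocks of 90+ minutes."""
    deep = sum((end - start for start, end in blocks if end - start >= min_block),
               timedelta())
    return deep / timedelta(hours=work_hours_per_week)


# Made-up week: three qualifying blocks and one 45-minute block that is ignored.
blocks = [
    (datetime(2022, 12, 19, 9, 0), datetime(2022, 12, 19, 11, 0)),    # 120 min
    (datetime(2022, 12, 19, 14, 0), datetime(2022, 12, 19, 14, 45)),  # 45 min, too short
    (datetime(2022, 12, 20, 9, 0), datetime(2022, 12, 20, 10, 30)),   # 90 min
    (datetime(2022, 12, 21, 13, 0), datetime(2022, 12, 21, 16, 0)),   # 180 min
]

print(f"{deep_work_fraction(blocks):.0%}")  # 16%
```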

Anti-patterns

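Stack ranking by metrics. Once a metric decides rankings, people optimize the metric instead of the work. That is Goodhart’s law: when a measure becomes a target, it stops being a good measure.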

Per-engineer DORA metrics. Deploy frequency is a team metric. Per-engineer breakdown is meaningless and gameable.

Adding tools to track activity. Productivity tracking software (“how many keystrokes per hour”) shows lack of trust; engineers performatively type. Don’t.

AI-tool-specific metrics. “How many Copilot completions accepted?” doesn’t measure productivity. Ignore.

Optimization without buy-in. Imposing DORA metrics top-down without team buy-in = check-the-box compliance. Engineers should want to know the numbers.

When metrics help

For teams looking to identify bottlenecks:

Low deploy frequency? CI is slow OR release process is heavy OR fear of breaking things
High change failure rate? Test gaps OR poor incident response OR over-aggressive deploys
High MTTR? Poor observability OR no on-call rotation OR runbooks missing
High lead time? Approval bottleneck OR code review backlog OR slow CI

Each metric points at specific systemic issues. Address those.
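
The symptom-to-cause list above can be kept as a plain lookup table. A sketch that ranks candidate causes by how many observed symptoms they would explain; the `rank_causes` function is hypothetical, and the causes are copied from the list above, not an exhaustive diagnosis.

```python
from collections import Counter

# Candidate causes per symptom, taken from the list above.
CANDIDATE_CAUSES: dict[str, list[str]] = {
    "low deploy frequency": ["slow CI", "heavy release process", "fear of breaking things"],
    "high change failure rate": ["test gaps", "poor incident response", "over-aggressive deploys"],
    "high MTTR": ["poor observability", "no on-call rotation", "missing runbooks"],
    "high lead time": ["approval bottleneck", "code review backlog", "slow CI"],
}


def rank_causes(symptoms: list[str]) -> list[tuple[str, int]]:
    """Return candidate causes, most frequently implicated first."""
    counts: Counter = Counter()
    for symptom in symptoms:
        counts.update(CANDIDATE_CAUSES[symptom])
    return counts.most_common()


for cause, hits in rank_causes(["low deploy frequency", "high lead time"]):
    print(f"{cause}: explains {hits} symptom(s)")
```

With both low deploy frequency and high lead time flagged, slow CI comes out on top because it appears under both. The ranking is only a starting point for the team's conversation, not a verdict.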

When metrics don’t help

For high-functioning teams already shipping well:

Marginal metric improvement is rarely worth the optimization cost
Focus on outcomes (customer impact, revenue, growth) instead
Engineers’ intuition about “what’s slow” is often more useful than the dashboard

Use metrics as diagnostics. Stop measuring constantly once they’re green.

Common Pitfalls

Lines-of-code metric. Always wrong. AI makes it even more meaningless.

Commits per day. Encourages padding.

Hours worked. Encourages presenteeism.

Per-engineer DORA. Team metric, not individual.

Tools for activity tracking. Trust your team.

Optimization to hit a number. The number isn’t the goal; the underlying improvement is.

Wrapping Up

Measure outcomes more than inputs. DORA + SPACE > LoC + commits. AI tools shift inputs; outcomes still depend on judgment.
