How to measure and improve developer productivity with Nicole Forsgren

Executive overview

Most teams want to move faster but can't agree on what that means. The root problem — identified in 80% of engagements — is failing to define the goal before starting any measurement.

Two research-backed frameworks cut through this: DORA measures software delivery performance with four metrics; SPACE provides a structured way to pick balanced metrics for any complex creative work. Together they form a complete approach to measuring, diagnosing, and improving developer productivity.

Speed and stability are not a trade-off — they move in the same direction. Shipping smaller changes more often produces more stable systems.

The DORA four metrics

  • Lead time for changes: time from code commit to production
  • Deployment frequency: how often code is deployed
  • Mean time to restore (MTTR): how long recovery takes after an incident
  • Change fail rate: percentage of changes that cause incidents requiring intervention
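As a rough illustration of how these four definitions turn into numbers, here is a minimal sketch that summarizes a list of deployment records. The `Deployment` record shape, the field names, and the choice of median for lead time are assumptions made for this example. They are not part of DORA itself, and restore time is approximated here as the gap between the failing deployment and recovery.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean, median
from typing import Optional


@dataclass
class Deployment:
    """One production deployment (hypothetical record shape)."""
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    caused_incident: bool = False           # did it require intervention?
    restored_at: Optional[datetime] = None  # when service recovered, if it failed


def hours(delta: timedelta) -> float:
    """Convert a timedelta to fractional hours."""
    return delta.total_seconds() / 3600


def dora_metrics(deploys: list[Deployment], window_days: int) -> dict:
    """Summarize the four metrics over an observation window of `window_days`."""
    if not deploys or window_days <= 0:
        raise ValueError("need at least one deployment and a positive window")
    lead_times = [hours(d.deployed_at - d.committed_at) for d in deploys]
    failures = [d for d in deploys if d.caused_incident]
    # Assumption: the incident starts at deploy time; real incident data may differ.
    restores = [hours(d.restored_at - d.deployed_at)
                for d in failures if d.restored_at is not None]
    return {
        "lead_time_hours": median(lead_times),
        "deploys_per_day": len(deploys) / window_days,
        "mttr_hours": mean(restores) if restores else None,
        "change_fail_rate": len(failures) / len(deploys),
    }


# Made-up sample data, for illustration only.
t0 = datetime(2024, 1, 1, 9, 0)
deploys = [
    Deployment(t0, t0 + timedelta(hours=2)),
    Deployment(t0, t0 + timedelta(hours=4), caused_incident=True,
               restored_at=t0 + timedelta(hours=4, minutes=30)),
    Deployment(t0, t0 + timedelta(hours=6)),
    Deployment(t0, t0 + timedelta(hours=8)),
]
metrics = dora_metrics(deploys, window_days=2)
print(metrics)
```

In practice the records would come from version control, the deployment pipeline, and the incident tracker rather than being built by hand.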

Elite performer benchmarks (2019 figures, which remain largely consistent):

  1. Deployment frequency: on-demand
  2. Lead time: less than a day
  3. MTTR: less than an hour
  4. Change fail rate: 0–15%
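The thresholds above can be sketched as a simple comparison. The function name and the numeric stand-in for "on-demand" (at least one deployment per day) are assumptions for this example. The benchmark itself does not give a number for on-demand deployment.

```python
def meets_elite_benchmarks(deploys_per_day: float,
                           lead_time_hours: float,
                           restore_hours: float,
                           change_fail_rate: float) -> dict:
    """Compare measured values against the elite thresholds listed above."""
    return {
        # Assumption: "on-demand" is approximated as one or more deploys per day.
        "deployment_frequency": deploys_per_day >= 1.0,
        "lead_time": lead_time_hours < 24,               # less than a day
        "mttr": restore_hours < 1,                       # less than an hour
        "change_fail_rate": 0.0 <= change_fail_rate <= 0.15,
    }


# Made-up measurements, for illustration only.
result = meets_elite_benchmarks(deploys_per_day=2.0, lead_time_hours=5.0,
                                restore_hours=0.5, change_fail_rate=0.25)
print(result)
```

A per-metric result is returned instead of a single pass/fail flag so that the output shows which metric is the current constraint.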

Why speed and stability move together

  • Frequent deployments force smaller batch sizes, which shrinks blast radius
  • Smaller changes are easier to debug and faster to roll back
  • Infrequent deployments cause large batch merges — more merge conflicts, harder root-cause analysis
  • The old ITIL assumption that two-week change freezes improve stability is wrong; the freezes cause instability
  • Company size has no statistically significant effect on these metrics — the benchmarks apply to startups and enterprises equally

The SPACE framework

SPACE provides five dimensions for picking balanced productivity metrics. Use at least three dimensions at once to avoid gaming any single signal.

  • S — Satisfaction and well-being: survey-based; correlates strongly with all other dimensions; early warning signal when things degrade
  • P — Performance: outcome of a process (e.g. reliability, change fail rate)
  • A — Activity: count-based metrics instrumented from systems (PRs, commits, deployments)
  • C — Communication and collaboration: how people and systems interact; includes code searchability, meeting load
  • E — Efficiency and flow: time through the system; number of hops a ticket takes; uninterrupted coding time

DORA is an implementation of SPACE focused on the outer loop (code commit to production). SPACE is the meta-framework for choosing metrics when you want to measure any specific capability.
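The "at least three dimensions" rule is easy to check mechanically. The sketch below tags each candidate metric with a SPACE dimension and counts the distinct dimensions covered. The `Dimension` enum, the function names, and the sample metric names are illustrative assumptions, not an official SPACE tool.

```python
from enum import Enum


class Dimension(Enum):
    SATISFACTION = "S"
    PERFORMANCE = "P"
    ACTIVITY = "A"
    COMMUNICATION = "C"
    EFFICIENCY = "E"


def covered_dimensions(metrics: dict[str, Dimension]) -> set[Dimension]:
    """Return the distinct SPACE dimensions that the chosen metrics touch."""
    return set(metrics.values())


def is_balanced(metrics: dict[str, Dimension], minimum: int = 3) -> bool:
    """True when the metric set spans at least `minimum` dimensions."""
    return len(covered_dimensions(metrics)) >= minimum


# Hypothetical metric selections, for illustration only.
activity_only = {
    "commits": Dimension.ACTIVITY,
    "pull_requests": Dimension.ACTIVITY,
    "deployments": Dimension.ACTIVITY,
}
balanced = {
    "deployments": Dimension.ACTIVITY,
    "change_fail_rate": Dimension.PERFORMANCE,
    "developer_survey_score": Dimension.SATISFACTION,
}

print(is_balanced(activity_only))  # three metrics, but only one dimension
print(is_balanced(balanced))
```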

Picking the right metrics

  • Never pick only activity metrics (commits, PRs, lines of code) — they create perverse incentives
  • Always pair metrics in tension: e.g. alert frequency (activity) vs. uninterrupted coding time (efficiency)
  • Add a satisfaction metric to surface what instrumentation cannot show
  • Run surveys periodically (every few months), not continuously
  • System data and people data are complements — each reveals blind spots the other misses
  • Version control data will never reveal code that isn't being committed; only surveys will
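Some of the rules above can be expressed as a small review pass over a proposed metric set. This is a minimal sketch: the `Metric` tuple, the single-letter dimension codes, the `"system"` and `"survey"` source labels, and the warning wording are all assumptions for this example.

```python
from typing import NamedTuple


class Metric(NamedTuple):
    name: str
    dimension: str  # one of "S", "P", "A", "C", "E"
    source: str     # "system" (instrumented) or "survey" (people data)


def review_metric_set(metrics: list[Metric]) -> list[str]:
    """Return warnings for rule violations; an empty list means none were found."""
    warnings = []
    dimensions = {m.dimension for m in metrics}
    sources = {m.source for m in metrics}
    if dimensions == {"A"}:
        warnings.append("only activity metrics selected")
    if "S" not in dimensions:
        warnings.append("no satisfaction metric")
    if sources != {"system", "survey"}:
        warnings.append("system data and survey data are not both represented")
    return warnings


# Hypothetical metric selections, for illustration only.
risky = [
    Metric("commits", "A", "system"),
    Metric("pull_requests", "A", "system"),
]
better = [
    Metric("alert_frequency", "A", "system"),
    Metric("uninterrupted_coding_time", "E", "system"),
    Metric("developer_satisfaction", "S", "survey"),
]

print(review_metric_set(risky))
print(review_metric_set(better))
```

The check cannot tell whether two metrics are genuinely in tension. That judgment still needs a person who knows the team.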

Common pitfalls

  • Starting without a clearly written problem statement — teams run for months then discover misalignment
  • Pursuing measurement only top-down or only bottom-up; both directions are needed
  • Picking all metrics from one SPACE dimension (usually activity)
  • Treating DORA benchmarks as the goal rather than a starting point — progress matters more than tier
  • Assuming AI tools mean you need fewer engineers: AI shifts time from writing to reviewing code, freeing cognitive capacity for harder problems, not halving headcount

How to start from nothing

  1. Write down the specific problem or goal — be precise about whether it's friction, culture, tooling, or something else
  2. Check whether any existing data or signal is already available
  3. If no data exists, interview a handful of developers: what are the biggest barriers to their productivity?
  4. Use the DORA quick check at dora.dev to benchmark current state and identify likely constraints for your industry and performance profile
  5. Use SPACE to select balanced metrics once you know what you want to improve
  6. Start heavy on survey data; shift toward instrumented system data as measurement matures

The four-box framework for measurement hypotheses

A structured approach to connecting a hypothesis to the data used to test it.

Draw two rows and two columns. Label the top row "words" and the bottom row "data". In each row, an arrow points from the left box to the right box.

  • Top-left: the concept you believe is the cause (e.g. "customer satisfaction")
  • Top-right: the outcome you expect (e.g. "return customers")
  • Bottom-left: the data proxy for the cause (e.g. CSAT score, NPS)
  • Bottom-right: the data proxy for the outcome (e.g. return visits, referral links)

Why it works:

  • Forces clarity on what is actually being measured before touching data
  • Separates disagreements about the hypothesis (top row) from disagreements about the data (bottom row)
  • Prevents spurious correlations from being mistaken for causal relationships
  • Advanced mode: start from available data and work upward to articulate what relationship the data actually represents, then validate with interviews

AI and developer productivity

  • AI coding tools shift the ratio of time spent: roughly 50% of developer time is now reviewing AI-generated code rather than writing from scratch
  • Specific tasks (e.g. building an HTTP server) can be completed ~50% faster, but this is not the right productivity frame
  • The real benefit is cognitive offload that frees capacity for harder, more novel work
  • AI is changing the friction model, cognitive load expectations, and reliance patterns — existing metrics need revisiting
  • SPACE will likely need a new dimension around trust and over-reliance on AI-generated output
  • Impact on novices versus experts is an open and important research question
