How to measure AI developer productivity and improve developer experience

Executive overview

Most productivity metrics are a lie. Lines of code, PR counts, and even DORA metrics miss what matters when AI generates large portions of the code. The real question is not how much code ships, but what value reaches customers and how fast.

Core insight: AI accelerates coding, but developers are not speeding up as much as expected because broken builds, flaky tests, trust gaps in AI-generated code, and poor processes remain unchanged, and those bottlenecks compound.

Nicole Forsgren, creator of the DORA and SPACE frameworks and author of the forthcoming book Frictionless, argues that measuring productivity requires aligning metrics to what leadership actually cares about (speed to market, margin, or transformation) and then building the data foundation to support those specific conversations.

Why most productivity metrics fail with AI

  • Lines of code was always a weak proxy; AI makes it trivially gameable — LLMs produce verbose code by default
  • DORA metrics (deployment frequency, lead time for changes, MTTR, change fail rate) remain valid for pipeline health but miss the new, earlier feedback loops that AI enables
  • SPACE is a framework, not a prescriptive set of metrics, which makes it more durable. Its dimensions are satisfaction and well-being, performance, activity, communication and collaboration, and efficiency and flow
  • A new dimension to add to SPACE: trust. LLMs are non-deterministic, so code can't just be accepted; it must be evaluated for hallucinations, reliability, and style
  • Code survivability rate and which code originated from AI vs. humans are now meaningful downstream questions
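
One way to make the survivability question concrete is sketched below. It assumes you can already label each added line by origin and can tell whether it is still present after some observation window; the `LineRecord` type, the "ai"/"human" labels, and the sample data are all hypothetical and not from the talk.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LineRecord:
    origin: str     # hypothetical label: "ai" or "human"
    survived: bool  # True if the line is still present at the end of the window


def survivability_by_origin(records):
    """Return the share of lines that survived, grouped by origin."""
    totals = defaultdict(int)
    kept = defaultdict(int)
    for record in records:
        totals[record.origin] += 1
        if record.survived:
            kept[record.origin] += 1
    return {origin: kept[origin] / totals[origin] for origin in totals}


# Illustrative data only: four AI-written lines, four human-written lines.
records = [
    LineRecord("ai", True), LineRecord("ai", False),
    LineRecord("ai", False), LineRecord("ai", True),
    LineRecord("human", True), LineRecord("human", True),
    LineRecord("human", False), LineRecord("human", True),
]
rates = survivability_by_origin(records)
print(rates)
```

How origin gets labeled, and how long the window should be, are open choices that depend on your tooling; the arithmetic itself is just a grouped ratio.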

Signs your team has room to improve

  • Builds are frequently breaking
  • Flaky tests are producing false positives
  • High switching cost between projects — people avoid moving across teams because re-onboarding to different systems is as costly as being a new hire
  • Engineers constantly talk about "the system" being hard to work with
  • Context-switching friction is high
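
As a rough illustration of how the flaky-test signal might be detected, the sketch below flags any test that both passed and failed on the same commit. The `(test_name, commit, passed)` tuple shape and the sample runs are assumptions; a real CI system would need its own export step to produce them.

```python
from collections import defaultdict


def find_flaky_tests(runs):
    """Flag tests that both passed and failed on the same commit.

    `runs` is an iterable of (test_name, commit, passed) tuples,
    a hypothetical shape for exported CI results.
    """
    outcomes = defaultdict(set)
    for test_name, commit, passed in runs:
        outcomes[(test_name, commit)].add(passed)
    # Two distinct outcomes for one (test, commit) pair means the code
    # did not change but the result did.
    return sorted({name for (name, _), seen in outcomes.items() if len(seen) == 2})


# Illustrative data only.
runs = [
    ("test_login", "abc", True),
    ("test_login", "abc", False),  # same commit, different result: flaky
    ("test_cart", "abc", True),
    ("test_cart", "abc", True),
    ("test_pay", "abc", False),
    ("test_pay", "def", True),     # different commits: not flagged
]
flaky = find_flaky_tests(runs)
print(flaky)
```

A test whose result changes between commits is not flagged, because there the code change itself may explain the difference.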

Flow state and cognitive load in an AI-assisted world

  • Three interdependent factors: flow state, cognitive load, feedback loops
  • AI interrupts traditional flow (prompt → wait → review → integrate) but can also create a new kind of flow at a higher level of abstraction
  • Senior engineers are building multi-agent workflows: define architecture up front, assign parallel agents to components, enforce API conventions at the start, review consolidated output — output is closer to production quality
  • Humans get roughly four hours of quality deep work per day; AI may make shorter blocks (45 minutes) productive by handling context re-entry and generating system diagrams to resume quickly
  • Reviewing code is now a larger share of work than writing it; rethinking daily structure around this is an open and important research question

What actually drives developer productivity beyond AI tooling

  • AI accelerates coding, not the whole system — broken processes and misaligned strategy remain the ceiling
  • Strategy determines what to ship; without it, you ship trash faster
  • Rapid prototyping and A/B testing timelines have collapsed from months to days, but only if infrastructure supports it
  • AI is particularly effective at: finding gnarly bugs, writing and spinning up unit tests, generating and cleaning up documentation
  • Better documentation improves AI tool output — agents rely on high-quality grounding data

The seven-step Frictionless framework

Nicole Forsgren and Abi Noda's framework for building or improving a developer experience program:

  1. Start the journey — listening tour, synthesise findings, visualise current-state tooling and workload
  2. Get a quick win — start small, pick achievable projects, share results visibly
  3. Use data to optimise — establish data foundation, run surveys for fast signal, start collecting new instrumentation
  4. Decide strategy and priority — use an evaluation framework to choose what to tackle next from the remaining list
  5. Sell your strategy — get feedback, communicate the rationale, don't make stakeholders work to understand value
  6. Drive change at your scale — grassroots (local scope) or top-down (global scope) or both; tailor the approach to your authority
  7. Evaluate progress and show value — loop back; track time savings, cost reduction, speed to value, risk reduction

How to measure the impact of AI on productivity today

  • Start by asking: what does your leadership chain care about most — market share, margin, velocity, or transformation?
  • Frame metrics in their language; don't make them translate
  • If the focus is market share: measure speed from idea to customer or idea to experiment
  • If the focus is margin: quantify time savings, cloud cost from test suite cleanup, vendor spend reductions, recovered headcount cost
  • If the focus is velocity: track feature-to-production or feature-to-experiment cycle time
  • Attribution is messy — disclose that both AI tooling and DevEx improvements likely contributed; don't pretend you can isolate one variable cleanly
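
For the velocity case, a minimal sketch of a feature-to-production cycle time is shown below. It assumes each feature has a recorded start and deploy timestamp; the function names, the choice of median as the summary statistic, and the dates are illustrative assumptions rather than anything prescribed in the talk.

```python
from datetime import datetime
from statistics import median


def cycle_time_days(started, deployed):
    """Elapsed time between two datetimes, in days."""
    return (deployed - started).total_seconds() / 86400


def median_cycle_time(features):
    """Median cycle time in days over (started, deployed) pairs."""
    return median(cycle_time_days(s, d) for s, d in features)


# Illustrative data only: (work started, reached production).
features = [
    (datetime(2024, 1, 1), datetime(2024, 1, 5)),
    (datetime(2024, 1, 2), datetime(2024, 1, 4)),
    (datetime(2024, 1, 10), datetime(2024, 1, 20)),
]
typical = median_cycle_time(features)
print(f"median cycle time: {typical:.1f} days")
```

The median is used here because a few long-running features can skew a mean badly. Comparing the figure across two periods shows a trend, but as the attribution point above notes, it does not show which change caused it.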

Building the data foundation

  • If starting from scratch, surveys beat instrumentation: they're faster and don't assume you already have the right metrics
  • Ask developers to name their top three friction points (not all of them — three forces prioritisation) and rate frequency (hourly, daily, weekly, quarterly)
  • Open-text field catches signal you wouldn't have thought to ask for
  • Avoid multi-part survey questions ("were the build and test systems slow or complicated?") — they produce uninterpretable data; get help from someone familiar with survey design or use an LLM to review question quality
  • Satisfaction is more actionable than happiness — ask about satisfaction with specific tools and processes, not general wellbeing
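
The top-three question can be tallied with very little code. The sketch below weights each named friction point by how often it occurs; the weights (approximate occurrences per quarter, assuming about 65 working days of 8 hours) and the sample responses are assumptions chosen for illustration, not values from the talk.

```python
from collections import Counter

# Assumed weights: rough occurrences per quarter for each frequency answer.
OCCURRENCES_PER_QUARTER = {"hourly": 520, "daily": 65, "weekly": 13, "quarterly": 1}


def rank_friction_points(responses):
    """Rank friction points by frequency-weighted score, highest first.

    Each response is a list of (friction_point, frequency) pairs;
    only the first three per respondent are counted.
    """
    scores = Counter()
    for response in responses:
        for point, frequency in response[:3]:
            scores[point] += OCCURRENCES_PER_QUARTER[frequency]
    return scores.most_common()


# Illustrative data only.
responses = [
    [("flaky tests", "daily"), ("slow builds", "hourly"), ("unclear docs", "weekly")],
    [("slow builds", "daily"), ("flaky tests", "daily")],
    [("unclear docs", "quarterly")],
]
ranking = rank_friction_points(responses)
print(ranking)
```

Free-text answers would need to be grouped into consistent labels before a tally like this is meaningful, and the open-text field is still worth reading in full.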

Starting and scaling a developer experience team

  • Minimum viable team: two engineers plus a product manager (PM), program manager (PGM), or technical program manager (TPM) to handle communication
  • Comms plans are critical — without visibility, DevEx feels like an isolated internal project nobody cares about
  • Look for paper cuts: small, high-frequency friction that any team can fix without major engineering investment
  • Impact follows a J-curve: early quick wins look large, then a dip while infrastructure and telemetry are built out, then compounding returns
  • Reported business impact: hundreds of thousands of dollars for smaller companies; billions for large ones
  • The Atlassian acquisition of DX for ~$1B signals how much enterprise value is locked in developer experience measurement

Applying a product mindset to DevEx

  • Treat DevEx improvements as products: define the user problem, build MVPs, get fast feedback, iterate
  • Know your addressable market (which teams, which workflows)
  • Define success upfront; build a go-to-market function including comms
  • Ask whether existing metrics are still driving useful decisions — sunset metrics that no longer inform action
  • This product discipline is especially important now because AI is changing the meaning of metrics faster than most teams update them
