How OpenAI Codex is redefining the software engineering team

Executive overview

Human review speed — not model intelligence — is now the binding constraint on AI-assisted software development. Codex, OpenAI's coding agent, ships as an IDE extension and CLI tool that acts as a proactive software engineering teammate, not just an autocomplete assistant.

The key insight: the best way for any agent to use a computer is to write code. Coding is therefore a foundational competency for all agents, including general assistants like ChatGPT.

The bottleneck to unlocking AGI-level productivity is human typing and review speed, not model capability.

What Codex is and how it works

  • An IDE extension (VS Code) and CLI tool that pairs with engineers on real codebases
  • Runs tasks inside a sandbox with access to local dependencies — no environment setup required
  • Uses the shell natively rather than bespoke tool APIs, enabling tight model-harness co-optimisation
  • "Compaction" extends task runs beyond context window limits; tasks now routinely run overnight or for 24 hours
  • Codex models are the most-served coding models in both the ChatGPT product and the OpenAI API
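
The compaction idea above can be illustrated with a minimal Python sketch. This is not Codex's actual implementation, which the source does not describe. The names compact, summarize and count_tokens, the word-based token count, and the budget values are all assumptions made for illustration.

```python
from typing import Callable, List


def count_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


def summarize(messages: List[str]) -> str:
    """Hypothetical summarizer; a real agent would ask the model to write this."""
    return f"[summary of {len(messages)} earlier messages]"


def compact(
    history: List[str],
    budget: int,
    keep_recent: int = 2,
    summarizer: Callable[[List[str]], str] = summarize,
) -> List[str]:
    """Return history unchanged if it fits the budget; otherwise replace
    everything except the last `keep_recent` messages with one summary."""
    total = sum(count_tokens(m) for m in history)
    if total <= budget or len(history) <= keep_recent:
        return history
    split = len(history) - keep_recent
    old, recent = history[:split], history[split:]
    return [summarizer(old)] + recent
```

Calling compact on the running history before each model call would keep the prompt within the budget while preserving the most recent steps verbatim, which is one plausible way a task could outlast a single context window.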

Why the initial cloud-only version underperformed

  • Codex Cloud (async, remote, parallel tasks) was the long-term vision but too hard to onboard
  • OpenAI's internal engineers were comfortable with async reasoning; the broader market was not
  • The fix: ship a local, interactive version first — let users build trust incrementally, then migrate them to delegation
  • Dogfooding gave a misleading signal because OpenAI staff are outliers in prompting fluency

How Codex achieved 20x growth since August

  • The GPT-5 launch was the primary catalyst; the latest model (GPT-5.1-Codex-Max) is ~30% faster and unlocks a higher level of reasoning
  • Tightly integrated product and research team iterates on model, API, and harness simultaneously
  • The team monitors feedback on Reddit (real signal) and Twitter/X (hype signal), and watches r/Codex closely
  • D7 retention and early-user experience are the core product metrics — power-user features are deprioritised

Acceleration in practice at OpenAI

  • Sora Android app: zero to employee launch in 18 days, public GA in 28 days, with two to three engineers; became the number one app in the App Store
  • Atlas browser: tasks that previously took two to three engineers two to three weeks now take one engineer one week
  • Designers vibe-code prototypes directly into production PRs; product marketers push string changes from Slack
  • Codex reviews its own training infrastructure code and has caught configuration mistakes; early experiments with Codex monitoring its own training runs
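
The Atlas figures above can be turned into engineer-weeks with a short calculation. The sketch below assumes the low ends of both ranges pair together and the high ends pair together; the source gives only the ranges, so the resulting multipliers are indicative rather than measured.

```python
# Engineer-weeks = engineers * weeks, using the ranges quoted for the Atlas example.
before_low, before_high = 2 * 2, 3 * 3  # two to three engineers for two to three weeks
after = 1 * 1                           # one engineer for one week

print(f"Before: {before_low} to {before_high} engineer-weeks")
print(f"After: {after} engineer-week")
print(f"Reduction: {before_low // after}x to {before_high // after}x")
```

Under that assumption the same work drops from roughly 4 to 9 engineer-weeks to 1.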

The teammate vision and the review bottleneck

  • The goal is a proactive teammate that surfaces work without being prompted — not a tool you invoke thousands of times a day
  • Current blocker: humans must still prompt and review all agent output; this is the underappreciated productivity ceiling
  • Code review UX is being redesigned: show an image preview before the diff, and add AI-assisted confidence scoring before human review
  • Proactivity requires context; the Atlas browser provides first-class context by sitting inside the rendering engine rather than relying on screenshots or accessibility APIs
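
To make the confidence-scoring idea concrete, here is a minimal Python sketch of how scored changes might be ordered for a reviewer. The source does not say how scores are produced or used, so the Change type, the triage function, the 0 to 1 score, and the 0.8 threshold are all hypothetical. Every change still reaches a human; the score only decides where attention goes first.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Change:
    title: str
    confidence: float  # hypothetical score in [0, 1] from an AI pre-review


def triage(
    changes: List[Change], threshold: float = 0.8
) -> Tuple[List[Change], List[Change]]:
    """Split changes into (look_closely, likely_fine).

    Both lists still go to a human reviewer; low-confidence changes are
    sorted so the least certain one comes first.
    """
    look_closely = sorted(
        (c for c in changes if c.confidence < threshold),
        key=lambda c: c.confidence,
    )
    likely_fine = [c for c in changes if c.confidence >= threshold]
    return look_closely, likely_fine
```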

Coding as the foundation of all agents

  • Every agent that uses a computer benefits from writing code — it is faster and more reliable than point-and-click automation
  • Codex is therefore building a core competency that feeds into ChatGPT and all future OpenAI agents
  • Non-technical users will interact with these agents without knowing they are using a coding agent, just as users do not think about whether Wi-Fi is on

AGI timeline and what unlocks the hockey stick

  • AGI is not a single event; it will arrive sector by sector as agent self-sufficiency is unlocked
  • Startups on modern stacks may see hockey-stick productivity as soon as next year
  • Large enterprises with legacy systems (e.g. SAP) will take years to unlock the same gains
  • The inflection point comes when agent productivity loops no longer require constant human prompting and review
  • Execution and deep customer understanding matter more than ever; as the advantage of simply being able to build erodes, distribution and problem insight become the differentiators

Advice for engineers and career direction

  • Give Codex your hardest real task, not a trivial test — it is built for professional-grade problems
  • Start by aligning on a plan (for example, in a plan.md file) before delegating a long task; steps that can be verified let the agent run longer
  • Systems thinking and cross-team communication skills remain critical; typing speed and algorithm recall matter less
  • Being at the knowledge frontier is still valuable — frontier problems force creative use of agents and are where models are weakest
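
The advice about plans with verifiable steps can be sketched in Python as a list of steps, each paired with a command that exits with status 0 once the step is done. This is an illustration only, not a Codex feature: the Step type, the first_unverified helper, and the placeholder checks are assumptions. In a real project each check would be something like the project's own test or build command.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    description: str
    check: List[str]  # command that exits 0 when the step is verifiably done


def first_unverified(steps: List[Step]) -> Optional[Step]:
    """Return the first step whose check fails, or None if every check passes."""
    for step in steps:
        result = subprocess.run(step.check, capture_output=True)
        if result.returncode != 0:
            return step
    return None


# Placeholder checks that use the current Python interpreter so the sketch
# runs anywhere: the first always succeeds, the second always fails.
plan = [
    Step("Add the new module", [sys.executable, "-c", "import json"]),
    Step("Make the test suite pass", [sys.executable, "-c", "raise SystemExit(1)"]),
]

next_step = first_unverified(plan)
```

Because each step carries its own check, an agent (or a person) can tell where a long run stands without rereading the whole diff.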
