How OpenAI Codex is redefining the software engineering team
Executive overview
Human review speed — not model intelligence — is now the binding constraint on AI-assisted software development. Codex, OpenAI's coding agent, ships as an IDE extension and CLI tool that acts as a proactive software engineering teammate, not just an autocomplete assistant.
The key insight: the best way for any agent to use a computer is to write code. Coding is therefore a foundational competency for all agents, including general assistants like ChatGPT.
The bottleneck to unlocking AGI-level productivity is human typing and review speed, not model capability.
What Codex is and how it works
- An IDE extension (VS Code) and CLI tool that pairs with engineers on real codebases
- Runs tasks inside a sandbox with access to local dependencies — no environment setup required
- Uses the shell natively rather than bespoke tool APIs, enabling tight model-harness co-optimisation
- "Compaction" extends task runs beyond context window limits; tasks now routinely run overnight or for a full 24 hours
- Codex models are the most-served coding models in both the ChatGPT product and the OpenAI API
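The summary does not say how compaction is implemented. As a rough illustration only, the general idea of keeping a long run inside a fixed context budget can be sketched as below: older messages are collapsed into one summary and the most recent ones are kept verbatim. The names `count_tokens`, `compact`, and `naive_summary` are hypothetical, the word count is a crude stand-in for real tokenisation, and none of this is Codex's actual mechanism.

```python
from typing import Callable, List


def count_tokens(text: str) -> int:
    # Crude proxy: whitespace-separated words stand in for model tokens.
    return len(text.split())


def naive_summary(messages: List[str]) -> str:
    # Placeholder: a real system would ask a model to write this summary.
    return f"[summary of {len(messages)} earlier messages]"


def compact(
    history: List[str],
    budget: int,
    keep_recent: int,
    summarize: Callable[[List[str]], str] = naive_summary,
) -> List[str]:
    """Collapse older messages into one summary once the budget is exceeded.

    The most recent `keep_recent` messages are kept verbatim.
    """
    total = sum(count_tokens(m) for m in history)
    if total <= budget or len(history) <= keep_recent:
        return history
    cut = len(history) - keep_recent
    old, recent = history[:cut], history[cut:]
    return [summarize(old)] + recent


# Ten 7-word messages (70 "tokens") against a budget of 40.
history = [f"step {i}: ran tests and edited file" for i in range(10)]
compacted = compact(history, budget=40, keep_recent=3)
print(len(compacted))   # 4
print(compacted[0])     # [summary of 7 earlier messages]
```

Keeping the newest messages verbatim is a design choice in this sketch: the latest steps are usually the ones the next action depends on, while older steps can often tolerate being summarised.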
Why the initial cloud-only version underperformed
- Codex Cloud (async, remote, parallel tasks) was the long-term vision but too hard to onboard
- OpenAI's internal engineers were comfortable with async reasoning; the broader market was not
- The fix: ship a local, interactive version first — let users build trust incrementally, then migrate them to delegation
- Dog-fooding gave misleading signal because OpenAI staff are outliers in prompting fluency
How Codex achieved 20x growth since August
- The GPT-5 launch was the primary catalyst; the latest model (GPT-5.1-Codex-Max) is ~30% faster and unlocks higher reasoning
- Tightly integrated product and research team iterates on model, API, and harness simultaneously
- Feedback loops monitored on Reddit (real signal) and Twitter/X (hype signal); r/Codex watched closely
- D7 retention and early-user experience are the core product metrics — power-user features are deprioritised
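For readers unfamiliar with the D7 retention metric mentioned above, here is a minimal sketch of one common definition: the share of a cohort that is active exactly seven days after first use. Teams define the metric differently (for example, active on day 7 or later), and the source does not say which definition the Codex team uses. The function name and the sample data are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, Set


def d7_retention(
    first_seen: Dict[str, date],
    activity: Dict[str, Set[date]],
) -> float:
    """Share of users active exactly 7 days after their first day."""
    if not first_seen:
        return 0.0
    retained = sum(
        1
        for user, start in first_seen.items()
        if start + timedelta(days=7) in activity.get(user, set())
    )
    return retained / len(first_seen)


# Hypothetical cohort of four users; only "ana" returns on day 7.
first_seen = {
    "ana": date(2025, 1, 1),
    "bo": date(2025, 1, 1),
    "cy": date(2025, 1, 2),
    "di": date(2025, 1, 2),
}
activity = {
    "ana": {date(2025, 1, 1), date(2025, 1, 8)},
    "bo": {date(2025, 1, 1), date(2025, 1, 3)},
    "cy": {date(2025, 1, 2)},
}
print(d7_retention(first_seen, activity))  # 0.25
```

In practice the cohort would be limited to users whose seventh day has already passed, so that recent sign-ups do not pull the figure down.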
Acceleration in practice at OpenAI
- Sora Android app: zero to employee launch in 18 days, public GA in 28 days, with two to three engineers; became the number one app in the App Store
- Atlas browser: tasks that previously took two to three engineers two to three weeks now take one engineer one week
- Designers vibe-code prototypes directly into production PRs; product marketers push string changes from Slack
- Codex reviews its own training infrastructure code and has caught configuration mistakes; early experiments with Codex monitoring its own training runs
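The Atlas figures above can be turned into a rough effort comparison. This is back-of-the-envelope arithmetic on the quoted ranges, not a measured result: it treats effort as engineers multiplied by weeks and uses the low and high ends of "two to three engineers, two to three weeks".

```python
def engineer_weeks(engineers: int, weeks: int) -> int:
    # Simple effort proxy: headcount multiplied by calendar weeks.
    return engineers * weeks


before_low = engineer_weeks(2, 2)    # 4 engineer-weeks
before_high = engineer_weeks(3, 3)   # 9 engineer-weeks
after = engineer_weeks(1, 1)         # 1 engineer-week

print(before_low / after, before_high / after)  # 4.0 9.0
```

On these numbers the effort per task falls by a factor of roughly four to nine, while calendar time falls from two or three weeks to one.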
The teammate vision and the review bottleneck
- The goal is a proactive teammate that surfaces work without being prompted — not a tool you invoke thousands of times a day
- Current blocker: humans must still prompt and review all agent output; this is the underappreciated productivity ceiling
- Code review UX is being redesigned — show image preview before diff, AI-assisted confidence scoring before human review
- Proactivity requires context; the Atlas browser provides first-class context by sitting inside the rendering engine rather than relying on screenshots or accessibility APIs
Coding as the foundation of all agents
- Every agent that uses a computer benefits from writing code — it is faster and more reliable than point-and-click automation
- Codex is therefore building a core competency that feeds into ChatGPT and all future OpenAI agents
- Non-technical users will interact with these agents without knowing they are using a coding agent, just as users do not think about whether Wi-Fi is on
AGI timeline and what unlocks the hockey stick
- AGI is not a single event; it will arrive sector by sector as agent self-sufficiency is unlocked
- Startups on modern stacks may see hockey-stick productivity as soon as next year
- Large enterprises with legacy systems (e.g. SAP) will take years to unlock the same gains
- The inflection point comes when agent productivity loops no longer require constant human prompting and review
- Execution and deep customer understanding matter more than ever; the building advantage is eroding, so distribution and problem insight become the differentiators
Advice for engineers and career direction
- Give Codex your hardest real task, not a trivial test — it is built for professional-grade problems
- Start by aligning on a plan (for example, a plan.md file) before delegating a long task; verifiable steps let the agent run longer
- Systems thinking and cross-team communication skills remain critical; typing speed and algorithm recall matter less
- Being at the knowledge frontier is still valuable — frontier problems force creative use of agents and are where models are weakest
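To make the advice about plans with verifiable steps concrete, here is a small sketch that checks a plan for steps lacking a verification command. The checklist format, with a `| verify:` suffix on each step, is an assumption made for this example and not a format defined by Codex. The plan contents and the helper names `parse_plan` and `unverifiable` are likewise hypothetical, and the commands are parsed as text only, never executed.

```python
import re
from typing import List, Tuple

# Hypothetical plan.md contents; the "| verify:" suffix is invented for this sketch.
PLAN = """\
# Plan: add retry logic (hypothetical example)
- [x] Write failing test for timeout handling | verify: pytest tests/test_retry.py
- [ ] Implement exponential backoff | verify: pytest tests/test_retry.py
- [ ] Update the changelog
"""

STEP = re.compile(
    r"^- \[(?P<done>[ x])\] (?P<title>[^|]+?)"
    r"(?:\s*\|\s*verify:\s*(?P<check>.+))?$"
)


def parse_plan(text: str) -> List[Tuple[bool, str, str]]:
    """Return (done, title, verify_command) for each checklist line."""
    steps = []
    for line in text.splitlines():
        m = STEP.match(line.strip())
        if m:
            steps.append((m["done"] == "x", m["title"].strip(), m["check"] or ""))
    return steps


def unverifiable(steps: List[Tuple[bool, str, str]]) -> List[str]:
    """Titles of steps that have no verification command attached."""
    return [title for _, title, check in steps if not check]


steps = parse_plan(PLAN)
print(unverifiable(steps))  # ['Update the changelog']
```

A step flagged here is a candidate for rewording before the task is delegated, so that each step has a check the agent can run on its own.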