Bob McGrew on AI agents, reasoning, and the path to AGI
Executive overview
Pre-training on text is hitting a data wall. Reasoning and test-time compute offer a new scaling curve — one that directly unlocks reliable AI agents.
The shift matters because reliability is the bottleneck for agents. Going from 90% to 99% accuracy takes roughly 10x more compute, and reasoning lets you spend that compute at inference time rather than only at training time. The result is a clear path to continued scaling toward AGI.
The core insight: reasoning doesn't just make models smarter — it makes agents trustworthy enough to act.
From research lab to scaling law
- OpenAI's early projects — Dota 2, the robot hand solving a Rubik's Cube — built conviction that scale was the core lever for AI progress.
- The robot hand and Dota 2 shared the same idea: feed massive amounts of experience into a neural network and let it generalize.
- Alec Radford's GPT-1 showed that predicting the next token on a transformer was sufficient signal for coherent text generation — widely dismissed at the time.
- GPT-2, 3, and 4 applied the Dota/robotics insight of diverse data and larger scale to language.
- OpenAI's culture sat between DeepMind (top-down plan) and Google Brain (unconstrained academia): opinionated about what to pursue, without centralizing all bets.
Why scaling continues despite the data wall
- Pre-training scaling is running into a hard data ceiling.
- Reasoning (as in o1/o3 and Gemini Flash Thinking) is a new S-curve, much as new architectural tricks kept Moore's law going after Dennard scaling ended.
- Going from 90% → 99% → 99.9% reliability historically required training bigger models; reasoning lets longer inference chains substitute for that compute.
- Two levers on any scaling law: pure scale, and improving the slope (better architectures, optimization algorithms).
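The 10x-per-nine rule of thumb quoted in this summary can be made concrete with a little arithmetic. The sketch below assumes, purely for illustration, that every additional "nine" of reliability costs a fixed 10x in compute. The function names and the `cost_per_nine` parameter are hypothetical, and the real relationship is empirical rather than exact.

```python
import math


def nines(reliability: float) -> float:
    """Number of 'nines' in a reliability figure: 0.9 -> 1, 0.99 -> 2, 0.999 -> 3."""
    if not 0.0 <= reliability < 1.0:
        raise ValueError("reliability must be in [0, 1)")
    return -math.log10(1.0 - reliability)


def compute_multiplier(start: float, target: float, cost_per_nine: float = 10.0) -> float:
    """Extra compute needed to move from `start` to `target` reliability.

    Assumption (taken from the rule of thumb, not a measured law):
    each additional nine multiplies the required compute by `cost_per_nine`.
    """
    return cost_per_nine ** (nines(target) - nines(start))


if __name__ == "__main__":
    for start, target in [(0.90, 0.99), (0.99, 0.999), (0.90, 0.999)]:
        factor = compute_multiplier(start, target)
        print(f"{start:.1%} -> {target:.1%}: ~{factor:.0f}x compute")
```

Under this assumption, two extra nines (90% to 99.9%) cost about 100x. Whether that compute is spent on a bigger training run or on longer inference chains is the choice that reasoning models open up.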
Agents: reliability is the unlock
- Agents have always been technically possible; the missing ingredient was reliability.
- Users won't wait five minutes or five hours for an action if it fails regularly.
- Reasoning gives models a coherent chain of thought that sustains progress over long horizons, which is the same property that lets agents take on longer tasks.
- The path to higher reliability is now clear, even if the engineering is hard.
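To see why reliability gates long-running agents, consider a simplified model in which a task is a chain of steps that each succeed independently with the same probability. Independence and the helper names below are assumptions made for illustration. Real agent failures are correlated, and agents can sometimes recover from their own errors.

```python
import math


def task_success_rate(step_reliability: float, num_steps: int) -> float:
    """Probability that every step succeeds, assuming independent steps."""
    return step_reliability ** num_steps


def max_steps(step_reliability: float, target_success: float) -> int:
    """Longest chain that still succeeds with at least `target_success` probability."""
    if not (0.0 < step_reliability < 1.0 and 0.0 < target_success < 1.0):
        raise ValueError("both arguments must be strictly between 0 and 1")
    return math.floor(math.log(target_success) / math.log(step_reliability))


if __name__ == "__main__":
    for p in (0.90, 0.99, 0.999):
        print(
            f"per-step {p:.1%}: a 20-step task succeeds "
            f"{task_success_rate(p, 20):.1%} of the time; "
            f"up to {max_steps(p, 0.5)} steps keep success at or above 50%"
        )
```

In this toy model a 90%-reliable step supports only about six steps before the task fails more often than it succeeds, while a 99.9%-reliable step supports several hundred. That gap is why each added nine extends the horizon an agent can usefully work over.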
Distillation and the startup playbook
- Frontier labs have learned to take a large model's output distribution for a specific task and train a much smaller, faster model to approximate it.
- Every major lab now has a small/fast sibling model (Sonnet/Haiku, o1/o1-mini, Gemini/Gemini Flash).
- For founders: start with the best available model to validate value, then distill once you know what you're building.
- Speed to market matters more than cost optimization early on.
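The distillation idea in this section can be sketched with the classic soft-target loss: the student is trained to minimize the KL divergence between its output distribution and the teacher's. This is a generic textbook formulation, not a description of how any particular lab does it. The logits, the `temperature` parameter, and all function names are illustrative assumptions.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Convert logits to probabilities; a higher temperature flattens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    teacher_logits: Sequence[float],
    student_logits: Sequence[float],
    temperature: float = 2.0,
) -> float:
    """KL(teacher || student) over temperature-softened distributions."""
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    return sum(t * math.log(t / s) for t, s in zip(teacher, student) if t > 0.0)


if __name__ == "__main__":
    teacher = [4.0, 1.0, 0.5]        # hypothetical large-model logits for one prediction
    far_student = [0.5, 1.0, 4.0]    # student that disagrees with the teacher
    near_student = [3.5, 1.2, 0.4]   # student that roughly matches the teacher
    print("far student loss: ", distillation_loss(teacher, far_student))
    print("near student loss:", distillation_loss(teacher, near_student))
```

In an actual training loop this loss would be averaged over many task-specific examples and minimized by gradient descent on the student's parameters. The closer the student's distribution gets to the teacher's, the closer the loss gets to zero.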
The AI adoption puzzle
- Models now pass the original "AGI" bar (Turing test, code, images), yet the impact hasn't materialized as job displacement or in productivity statistics.
- The bottleneck isn't capability — it's software that connects AI to the specific problem a real user actually has.
- The forward deployed engineer model (sitting next to the user, building exactly what they need) is the template: understanding the workflow first, then reimagining it.
- AI needs its "Palantir moment" — not faster access to existing workflows, but a twist that reframes the problem entirely.
Robotics and the innovator horizon
- Robotics companies are roughly where LLM companies were five years ago.
- Foundation models for robots (Skild AI, Physical Intelligence) are showing rapid progress but are still in the zero-to-one phase.
- Reasoning models may unlock scientific innovation (autonomous hypothesis generation) before robotics can run the physical experiments.
- The bottleneck shifts: whatever part of the stack isn't automated becomes the next constraint.
Two roles in an AI-native world
- Future human roles: lone genius (individual leveraged enormously by AI tools) and manager (CEO of a firm that is mostly AI).
- The camera analogy: photography didn't kill painting — it expanded appreciation for visual art and increased the number of people who paint.
- Agricultural automation eliminated roughly 90% of the jobs people once held over the course of a century; the jobs that replaced them would have been incomprehensible to a farmer in 1880.
- Teaching children to code remains valuable not for the output but for building intuition about what is and isn't possible — the resistance of the medium.