Bob McGrew on AI agents, reasoning, and the path to AGI

Executive overview

Pre-training on text is hitting a data wall. Reasoning and test-time compute offer a new scaling curve — one that directly unlocks reliable AI agents.

The shift matters because reliability is the bottleneck for agents: going from 90% to 99% accuracy requires roughly 10x more compute, and reasoning lets you spend that compute at inference time rather than only at training time. In McGrew's view, this gives a clear path to continued scaling toward AGI.

The core insight: reasoning doesn't just make models smarter — it makes agents trustworthy enough to act.

From research lab to scaling law

OpenAI's early projects — Dota 2, the robot hand solving a Rubik's Cube — built conviction that scale was the core lever for AI progress.
The robot hand and Dota 2 shared the same idea: feed massive amounts of experience into a neural network and let it generalize.
Alec Radford's GPT-1 showed that training a transformer to predict the next token was a sufficient signal for coherent text generation, an idea that was widely dismissed at the time.
GPT-2, 3, and 4 applied the Dota/robotics insight of diverse data and larger scale to language.
OpenAI's culture sat between DeepMind (top-down plan) and Google Brain (unconstrained academia): opinionated about what to pursue, without centralizing all bets.

Why scaling continues despite the data wall

Pre-training scaling is running into a hard data ceiling.
Reasoning (as in OpenAI's o1/o3 and Gemini Flash Thinking) is a new S-curve. It follows the same pattern as Moore's law, which kept going through new architectural tricks after Dennard scaling ran out.
Going from 90% → 99% → 99.9% reliability historically required training bigger models; reasoning lets longer inference chains substitute for some of that training compute.
Two levers on any scaling law: pure scale, and improving the slope (better architectures, optimization algorithms).
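As a rough illustration of how inference-time compute can buy extra nines of reliability, the sketch below uses a swapped-in technique, best-of-k sampling with a verifier, and makes assumptions the talk does not: each attempt succeeds independently with probability `p`, and a perfect verifier can recognise a correct attempt. Under those assumptions, `k` attempts succeed with probability 1 - (1 - p)^k. Real reasoning models spend test-time compute on longer chains of thought rather than independent retries, so treat this as a toy model of the trade-off, not a description of how o1 or o3 work. The function names are hypothetical.

```python
def success_with_retries(p: float, k: int) -> float:
    """Probability that at least one of k independent attempts succeeds.

    Toy assumptions (ours, not the talk's): attempts are independent and a
    perfect verifier can recognise which attempt is correct.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    if k < 1:
        raise ValueError("k must be at least 1")
    return 1.0 - (1.0 - p) ** k


def attempts_needed(p: float, target: float, max_k: int = 1000) -> int:
    """Smallest number of attempts k with success_with_retries(p, k) >= target."""
    for k in range(1, max_k + 1):
        # The small tolerance absorbs float rounding (0.9 and 0.99 are inexact).
        if success_with_retries(p, k) >= target - 1e-12:
            return k
    raise ValueError("target not reachable within max_k attempts")


if __name__ == "__main__":
    for target in (0.99, 0.999):
        k = attempts_needed(0.9, target)
        print(f"90% per attempt -> {target:.1%} needs {k} attempts")
```

Under these idealised assumptions each extra nine costs just one more attempt; correlated failures and imperfect verifiers make real-world gains considerably more expensive.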

Agents: reliability is the unlock

Agents have always been technically possible; the missing ingredient was reliability.
Users won't wait five minutes or five hours for an action if it fails regularly.
Reasoning gives models a coherent chain of thought that can sustain progress over long horizons, which is exactly what longer tasks require.
The path to higher reliability is now clear, even if the engineering is hard.
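One way to see why per-step reliability matters so much for agents: if a task requires `n` sequential actions and each succeeds independently with probability `p`, the whole task succeeds with probability p^n. The independence and no-recovery assumptions are ours, not the talk's, and real failures are often correlated or recoverable, but the sketch shows how quickly small per-step error rates compound over long horizons. The function names are hypothetical.

```python
def task_success(p_step: float, n_steps: int) -> float:
    """Probability that an n-step task finishes without any failed step.

    Hypothetical model: steps fail independently, and a single failure sinks
    the whole task (no recovery, no retries).
    """
    if not 0.0 <= p_step <= 1.0:
        raise ValueError("p_step must be a probability")
    if n_steps < 0:
        raise ValueError("n_steps must be non-negative")
    return p_step ** n_steps


def max_steps(p_step: float, min_success: float) -> int:
    """Longest task, in steps, whose overall success rate stays >= min_success."""
    if not 0.0 < p_step < 1.0 or not 0.0 < min_success <= 1.0:
        raise ValueError("need 0 < p_step < 1 and 0 < min_success <= 1")
    n = 0
    while task_success(p_step, n + 1) >= min_success:
        n += 1
    return n


if __name__ == "__main__":
    for p in (0.90, 0.99, 0.999):
        print(f"per-step {p:.1%}: 10-step task succeeds {task_success(p, 10):.1%}; "
              f"longest task at >=90% success: {max_steps(p, 0.9)} steps")
```

With these numbers, each extra nine of per-step reliability stretches the longest task that still succeeds 90% of the time from 1 step to 10 to 105, which is the sense in which reliability, not raw capability, gates long-horizon agents.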

Distillation and the startup playbook

Frontier labs have learned to take a large model's output distribution for a specific task and train a much smaller, faster model to approximate it.
Every major lab now has a small, fast sibling model (Claude Sonnet/Haiku, o1/o1-mini, Gemini/Gemini Flash).
For founders: start with the best available model to validate value, then distill once you know what you're building.
Speed to market matters more than cost optimization early on.
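The distillation recipe above can be sketched in a few lines. This is a toy under stated assumptions: `teacher_logits` is a hypothetical fixed function standing in for the large model, the student is a single linear layer, and training uses the classic soft-target objective (matching the teacher's temperature-softened output distribution) rather than whatever any particular lab does in practice. A real pipeline would instead collect teacher outputs on logged, task-specific prompts and fine-tune a smaller language model on them.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with a temperature (higher = softer)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p, q):
    """KL(p || q) for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def teacher_logits(x):
    """Hypothetical stand-in for a large model: a fixed nonlinear scorer."""
    a, b = x
    return [math.tanh(2 * a) + b * b, a * b, math.sin(3 * b) - a]


class LinearStudent:
    """Small student: one linear layer plus softmax over the same classes."""

    def __init__(self, n_features, n_classes):
        self.w = [[0.0] * n_features for _ in range(n_classes)]
        self.b = [0.0] * n_classes

    def logits(self, x):
        return [sum(wi * xi for wi, xi in zip(row, x)) + bc
                for row, bc in zip(self.w, self.b)]


def distill(student, inputs, temperature=2.0, lr=0.5, epochs=200):
    """Train the student (plain SGD) to match the teacher's softened outputs."""
    # Query the (expensive) teacher once and cache its soft targets.
    targets = [softmax(teacher_logits(x), temperature) for x in inputs]
    for _ in range(epochs):
        for x, p in zip(inputs, targets):
            q = softmax(student.logits(x), temperature)
            # Gradient of cross-entropy(p, softmax(z / T)) w.r.t. z is (q - p) / T.
            grad = [(qc - pc) / temperature for qc, pc in zip(q, p)]
            for c, g in enumerate(grad):
                student.b[c] -= lr * g
                for j, xj in enumerate(x):
                    student.w[c][j] -= lr * g * xj
    return student


def mean_kl(student, inputs, temperature=2.0):
    """Average KL from teacher to student over the inputs."""
    return sum(kl(softmax(teacher_logits(x), temperature),
                  softmax(student.logits(x), temperature))
               for x in inputs) / len(inputs)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical task-specific inputs: the distribution you actually care about.
    data = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(200)]
    student = LinearStudent(n_features=2, n_classes=3)
    before = mean_kl(student, data)
    distill(student, data)
    print(f"mean KL to teacher: {before:.4f} -> {mean_kl(student, data):.4f}")
```

The student cannot represent the teacher's nonlinear terms exactly, so the KL shrinks but does not reach zero; this mirrors the trade-off of a smaller, faster model that approximates the large one well only on the task it was distilled for.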

The AI adoption puzzle

Although models now clear what used to count as the "AGI" bar (passing the Turing test, writing code, generating images), their impact hasn't yet shown up as job displacement or in productivity statistics.
The bottleneck isn't capability — it's software that connects AI to the specific problem a real user actually has.
The forward deployed engineer model (sitting next to the user, building exactly what they need) is the template: understanding the workflow first, then reimagining it.
AI needs its "Palantir moment" — not faster access to existing workflows, but a twist that reframes the problem entirely.

Robotics and the innovator horizon

Robotics companies are roughly where LLM companies were five years ago.
Foundation models for robots (Skild AI, Physical Intelligence) are showing rapid progress but are still in the zero-to-one phase.
Reasoning models may unlock scientific innovation (autonomous hypothesis generation) before robotics can run the physical experiments.
The bottleneck shifts: whatever part of the stack isn't automated becomes the next constraint.
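The bottleneck point has a familiar quantitative form in Amdahl's law, borrowed here as an analogy (the talk does not invoke it): if a fraction `f` of a research pipeline is automated and sped up by a factor `s`, the overall speedup is 1 / ((1 - f) + f / s). Even an infinitely fast automated portion leaves the pipeline capped at 1 / (1 - f), so the unautomated remainder, such as physical experiments, sets the pace. The fractions and speedups below are hypothetical.

```python
import math


def amdahl_speedup(f_automated: float, speedup: float) -> float:
    """Overall speedup when a fraction of the work is accelerated (Amdahl's law)."""
    if not 0.0 <= f_automated <= 1.0:
        raise ValueError("f_automated must be in [0, 1]")
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 / ((1.0 - f_automated) + f_automated / speedup)


def speedup_ceiling(f_automated: float) -> float:
    """Limit of amdahl_speedup as the automated part becomes infinitely fast."""
    if f_automated >= 1.0:
        return math.inf
    return 1.0 / (1.0 - f_automated)


if __name__ == "__main__":
    # Hypothetical: reasoning models automate 80% of the work (hypotheses,
    # analysis) and make it 100x faster, while lab experiments stay manual.
    print(f"{amdahl_speedup(0.8, 100):.2f}x overall, "
          f"ceiling {speedup_ceiling(0.8):.1f}x")
```

With these made-up numbers, a 100x speedup on 80% of the work yields under 5x overall, and no speedup of the automated part can push past 5x until the remaining 20% is automated too.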

Two roles in an AI-native world

Future human roles: lone genius (individual leveraged enormously by AI tools) and manager (CEO of a firm that is mostly AI).
The camera analogy: photography didn't kill painting — it expanded appreciation for visual art and increased the number of people who paint.
Agricultural automation took 90% of human jobs over a century; the new jobs were incomprehensible to farmers of 1880.
Teaching children to code remains valuable not for the output but for building intuition about what is and isn't possible — the resistance of the medium.
