How fluid intelligence and program search point toward AGI
Executive overview
Scaling up pre-training produced AI with vast memorized skill but near-zero fluid intelligence — the ability to handle genuinely novel problems. Test-time adaptation fixed part of this, but deep learning models still lack the compositional generalisation needed for real AGI.
The missing piece is combining two types of abstraction: perception-based (type one) and program-based (type two). AGI requires a system that can synthesise new programs on the fly, guided by deep-learning intuition, rather than fetching pre-recorded templates.
Fluid intelligence is not skill — it is the efficiency with which past experience is converted into the ability to handle future novelty.
Why pre-training scaling failed
- Intelligence is an efficiency ratio: how well past experience converts to performance on genuinely new situations.
- Scaling up base LLMs by 50,000x moved ARC-1 accuracy from ~0% to only ~10%; humans score above 95%.
- Benchmarks built from known tasks measure memorised skill, not intelligence: such exams were designed for humans, who cannot memorise the answers in advance.
- Static inference (querying pre-loaded knowledge) cannot demonstrate fluid intelligence, no matter the model size.
- Goodhart's Law: chasing task-specific skill benchmarks optimises for automation, not invention.
What test-time adaptation changed
- Test-time adaptation (TTA) lets models modify their behaviour during inference — via test-time training, chain-of-thought synthesis, or program synthesis.
- Every AI system that performs meaningfully above zero on ARC uses TTA.
- OpenAI's o3, fine-tuned on ARC, reached human-level performance on ARC-1.
- ARC-1 is now saturated; it acted as a binary signal: a system showed either near-zero fluid intelligence or near-human ability.
ARC-2 and what it measures
- ARC-2 (released March 2025) targets compositional generalisation — tasks requiring deliberate reasoning, not just pattern recall.
- Validated with 400 non-expert humans in San Diego; all tasks solved by at least two people, average seven per task.
- Base LLMs score 0%. Single-chain-of-thought reasoning systems score 1–2%. Only TTA systems do meaningfully better.
- Even o3 remains below human level on ARC-2, showing current TTA is not sufficient for AGI.
- ARC-3 (developer preview July 2025, full launch early 2026) will assess agency: exploring unknown environments, setting and achieving goals, with strict action-efficiency limits matching human performance.
Two types of abstraction
- Type one (value-centric): continuous distance functions; underlies perception, pattern recognition, intuition, and modern deep learning.
- Type two (program-centric): discrete graph comparison via exact structure matching; underlies explicit reasoning, planning, and software engineering abstraction.
- Transformers excel at type one; they struggle with simple type two tasks like sorting or digit addition.
- Human intelligence combines both: type one intuition prunes the search space so type two reasoning stays tractable (e.g., chess: pattern recognition selects which moves to calculate).
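The contrast between the two abstraction types can be made concrete with a minimal sketch. The function names and the nested-tuple program representation below are illustrative assumptions, not anything specified in the talk: `cosine_distance` stands in for a type-one continuous distance, and `same_structure` stands in for a type-two exact structure match.

```python
import math

def cosine_distance(a, b):
    """Type one: a continuous distance between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def same_structure(p, q):
    """Type two: exact match of two program trees, ignoring variable names.

    Assumed representation: a program is either a leaf (a variable name)
    or a tuple of the form (operator, child, child, ...).
    """
    if isinstance(p, tuple) and isinstance(q, tuple):
        return (
            len(p) == len(q)
            and p[0] == q[0]  # operators must be identical
            and all(same_structure(a, b) for a, b in zip(p[1:], q[1:]))
        )
    # Two leaves always match; a leaf never matches a subtree.
    return not isinstance(p, tuple) and not isinstance(q, tuple)

# Type one gives a graded answer: these vectors are close, not identical.
closeness = cosine_distance([1.0, 2.0, 3.0], [1.1, 2.0, 2.9])

# Type two gives a yes/no answer: same shape or not.
prog_a = ("add", "x", ("mul", "y", "z"))
prog_b = ("add", "a", ("mul", "b", "c"))
prog_c = ("mul", "a", ("add", "b", "c"))
matches = (same_structure(prog_a, prog_b), same_structure(prog_a, prog_c))
```

Here `closeness` is a small positive number, while `matches` is `(True, False)`: the first comparison degrades smoothly with distance, the second is all-or-nothing.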
The role of discrete program search
- Deep learning alone does not invent — it automates.
- All known AI systems capable of genuine invention rely on discrete search (genetic algorithms, AlphaGo, which produced Move 37, and AlphaEvolve).
- Program synthesis treats learning as combinatorial search over a graph of symbolic operations.
- Program synthesis is data-efficient (fits from 2–3 examples) but hits combinatorial explosion as complexity grows.
- The solution: use type-one deep learning intuition to guide and prune type-two program search — analogous to embedding a discrete graph into a latent space where approximate distance functions control combinatorial explosion.
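The points above can be sketched on a toy problem. Everything here is a hypothetical illustration: the five-primitive DSL, the function names, and especially `mismatch`, a hand-written heuristic that merely stands in for learned deep-learning intuition. `brute_force` enumerates every program by length, while `guided` is a best-first search that always expands the most promising partial program.

```python
import heapq
import itertools

# Hypothetical toy DSL: each primitive maps a list of ints to a list of ints.
PRIMITIVES = {
    "reverse": lambda xs: xs[::-1],
    "sort": lambda xs: sorted(xs),
    "double": lambda xs: [2 * x for x in xs],
    "increment": lambda xs: [x + 1 for x in xs],
    "negate": lambda xs: [-x for x in xs],
}

def run(program, xs):
    """Apply a sequence of primitive names to an input list, left to right."""
    for name in program:
        xs = PRIMITIVES[name](xs)
    return xs

def fits(program, examples):
    """True if the program reproduces every input/output example."""
    return all(run(program, inp) == out for inp, out in examples)

def brute_force(examples, max_len=4):
    """Enumerate programs by increasing length; return (program, candidates tried)."""
    tried = 0
    for length in range(max_len + 1):
        for program in itertools.product(PRIMITIVES, repeat=length):
            tried += 1
            if fits(program, examples):
                return program, tried
    return None, tried

def mismatch(program, examples):
    """Stand-in for learned intuition: total output error, lower is better."""
    score = 0
    for inp, out in examples:
        got = run(program, inp)
        score += abs(len(got) - len(out))
        score += sum(abs(g - o) for g, o in zip(got, out))
    return score

def guided(examples, max_len=4):
    """Best-first search: expand the partial program with the lowest mismatch."""
    tried = 0
    frontier = [(mismatch((), examples), ())]
    while frontier:
        score, program = heapq.heappop(frontier)
        tried += 1
        if score == 0:  # zero error means every example is reproduced
            return program, tried
        if len(program) < max_len:
            for name in PRIMITIVES:
                child = program + (name,)
                heapq.heappush(frontier, (mismatch(child, examples), child))
    return None, tried

# Two examples of the hidden rule "sort, double, then add one".
EXAMPLES = [([3, 1, 2], [3, 5, 7]), ([5, 4], [9, 11])]
brute_program, brute_tried = brute_force(EXAMPLES)
guided_program, guided_tried = guided(EXAMPLES)
```

Both searches recover a correct three-step program from just two examples, but on this task the guided search examines far fewer candidates than exhaustive enumeration. The heuristic happens to suit this toy task; a misleading heuristic can make guided search slower, which is why the quality of the learned intuition matters.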
Architecture of the target system
- A programmer-like meta-learner that, when given a new task, synthesises a bespoke program on the fly.
- Programs blend deep-learning submodules (type one perception) with algorithmic modules (type two reasoning).
- Assembly is driven by discrete program search guided by learned intuition about program space.
- A global abstraction library accumulates reusable building blocks; new abstractions discovered during task-solving are uploaded back (like open-source libraries on GitHub).
- The system improves continuously: both the library and the intuition over program space grow over time.
- First milestone at Ndea (Chollet's research lab): solve ARC-AGI using a system that starts with zero knowledge of ARC.
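The abstraction-library idea can be illustrated with a small sketch. This is not Ndea's design: the class name `AbstractionLibrary`, its methods, and the list-processing primitives are all hypothetical, and the search is plain unguided enumeration for brevity. It shows only the loop described above: solve a task, upload the solution as a new building block, and reuse it so a later task needs a shallower search.

```python
import itertools

class AbstractionLibrary:
    """Hypothetical library of named building blocks that grows as tasks are solved."""

    def __init__(self, primitives):
        self.blocks = dict(primitives)  # name -> function

    def _run(self, names, value):
        """Apply the named blocks to a value, left to right."""
        for name in names:
            value = self.blocks[name](value)
        return value

    def synthesise(self, examples, max_len=3):
        """Return the shortest block sequence that fits all examples, or None."""
        for length in range(1, max_len + 1):
            for names in itertools.product(self.blocks, repeat=length):
                if all(self._run(names, inp) == out for inp, out in examples):
                    return names
        return None

    def upload(self, new_name, names):
        """Store a solved program as a single reusable block."""
        steps = [self.blocks[n] for n in names]

        def composed(value):
            for step in steps:
                value = step(value)
            return value

        self.blocks[new_name] = composed

library = AbstractionLibrary({
    "sort": sorted,
    "double": lambda xs: [2 * x for x in xs],
    "increment": lambda xs: [x + 1 for x in xs],
})

# Task A: solved from primitives, then uploaded as a new abstraction.
task_a = [([3, 1, 2], [2, 4, 6]), ([5, 4], [8, 10])]
solution_a = library.synthesise(task_a)
library.upload("sorted_doubles", solution_a)

# Task B: would need three primitives, but now needs only two blocks.
task_b = [([3, 1, 2], [3, 5, 7]), ([5, 4], [9, 11])]
solution_b = library.synthesise(task_b)
```

After the upload, Task B is solved as `("sorted_doubles", "increment")`, a two-block program, rather than a three-primitive one. A fuller system would also prune and generalise stored blocks and guide the search with learned intuition.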
Implications for AGI timelines
- Current TTA models are a major step — on-the-fly recombination is now possible — but remain far too compute-inefficient (thousands of dollars to solve ARC-1 at human level).
- Deep learning requires 3–4 orders of magnitude more data than humans to distill simple abstractions.
- You are close to AGI when it becomes hard to construct tasks that humans can solve but AI cannot; we are not there yet.
- AGI defined as autonomous invention and discovery — not just 80% task completion — is what unlocks acceleration of scientific progress.