Why scaling alone won't get us to AGI: François Chollet on ARC and intelligence

Executive overview

Current AI progress automates domains with verifiable rewards — code, math — but this is not the same as general intelligence. General intelligence is human-level skill-acquisition efficiency across arbitrary new tasks, not just performance on trained domains.

Pre-training was scaled roughly 50,000x with no meaningful gain on ARC V1; scores moved only when reasoning models arrived. That gap is the signal: pre-training scale alone cannot produce fluid intelligence.

Chollet's lab, Ndea, is building an alternative foundation, symbolic program synthesis, to target optimality directly rather than patching the LLM stack.

The core insight: intelligence is efficient learning, not stored knowledge, and the two require fundamentally different architectures.

Why the LLM stack has a ceiling

Verifiable-reward domains (code, math) can be fully automated with current technology via RL post-training loops
Domains without formal verification (essays, law) will see slow or stalling progress
Scaling pre-training 50,000x left ARC V1 scores near zero — more parameters alone don't produce fluid intelligence
Reasoning models caused a step-function jump on ARC V1, and RL harnesses saturated ARC V2; neither result indicates higher fluid intelligence, only better training in specific domains
The fact that cracking these benchmarks still requires human-engineered harnesses is itself evidence that we are short of AGI

What Ndea is building

Replacing parametric curves (neural nets) with the shortest possible symbolic models of data — minimum description length as the target
Gradient descent replaced by symbolic descent: search over symbolic space guided by deep learning
Models are expected to be tiny at inference time, generalise better, and compose more cleanly
Chollet puts the chance of success at roughly 10–15% and considers it worth attempting because no one else is doing it
Prediction: in hindsight, AGI will turn out to be fewer than 10,000 lines of code, and the compute of the 1980s would have been sufficient
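
To make the minimum-description-length target above concrete, here is a toy sketch in Python. It enumerates small arithmetic expressions in order of increasing size and returns the first (hence shortest) one that reproduces every data point. The grammar, the size measure (number of leaves), and the name `shortest_program` are illustrative assumptions. Brute-force enumeration stands in for the learned search guidance described above, and nothing here reflects the lab's actual method.

```python
from itertools import product

# Hypothetical toy grammar: leaves are x and small constants, nodes are + and *.
LEAVES = ["x", "1", "2", "3"]
OPS = ["+", "*"]


def expressions(size):
    """Yield every expression string built from exactly `size` leaves."""
    if size == 1:
        yield from LEAVES
        return
    for left_size in range(1, size):
        pairs = product(expressions(left_size), expressions(size - left_size))
        for left, right in pairs:
            for op in OPS:
                yield f"({left} {op} {right})"


def shortest_program(data, max_size=4):
    """Return the first expression, by increasing size, that fits all (x, y) pairs."""
    for size in range(1, max_size + 1):
        for expr in expressions(size):
            # eval is acceptable here only because expr comes from our own grammar.
            fn = eval(f"lambda x: {expr}")
            if all(fn(x) == y for x, y in data):
                return expr
    return None  # nothing within the size budget explains the data


# Four observations generated by x*x + x + 1 (made-up illustration data).
data = [(0, 1), (1, 3), (2, 7), (3, 13)]
print(shortest_program(data))
```

The returned expression is tiny compared with a parametric model fitted to the same points, and it extrapolates exactly to unseen inputs. The cost is the search: the number of candidate expressions grows rapidly with size, which is why a guided search is needed in place of enumeration.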

ARC as a barometer of AI progress

ARC V1 (2019): static pattern tasks requiring causal modelling from provided data; base LLMs scored near zero
ARC V2: same format, harder composition; saturated by RL harnesses fine-tuned on self-generated verified reasoning chains
ARC V3 (2026): agentic. The agent is dropped into an unseen mini game with no instructions, no stated goal, and no explanation of the controls; it must explore, form a world model, set goals, and solve efficiently
Scored on action efficiency matched against human baselines; brute-force exploration scores extremely low
Private test set is deliberately unlike the public set to resist targeted fine-tuning
ARC V4 (planned): continual/curriculum learning across compounding game levels
ARC V5: focused on invention (details withheld)
AGI moment defined as when the measurable gap between human and AI learning efficiency effectively closes
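
As a rough illustration of scoring on action efficiency against a human baseline, the sketch below uses a simple ratio. The formula, the function name `efficiency_score`, and all numbers are assumptions made for this example; the source does not give the benchmark's actual scoring rule.

```python
def efficiency_score(agent_actions, human_baseline_actions, solved):
    """Hypothetical score in [0, 1]; 1.0 means the agent matched or beat the human baseline."""
    if not solved or agent_actions <= 0:
        return 0.0  # unsolved games earn nothing, however few actions were taken
    return min(1.0, human_baseline_actions / agent_actions)


# Made-up numbers, purely for illustration.
print(efficiency_score(agent_actions=400, human_baseline_actions=400, solved=True))     # 1.0
print(efficiency_score(agent_actions=40_000, human_baseline_actions=400, solved=True))  # 0.01
print(efficiency_score(agent_actions=400, human_baseline_actions=400, solved=False))    # 0.0
```

Under any rule of this shape, an agent that solves a game by brute-force exploration takes far more actions than a human and scores close to zero, which matches the behaviour described above.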

What makes domains automatable now

True, trustable verification signals enable RL post-training loops that self-generate training data at scale
Code was first: unit tests provide dense, reliable reward; models learn execution traces the way human programmers mentally simulate code
Mathematics is next for the same reason
The key human contribution shrinks to designing the environment; from that, exponentially more training data is generated autonomously
Removing humans from the improvement loop — not recursive self-improvement per se — is the prerequisite for compounding capability gains
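
The loop described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the hand-written lambdas stand in for sampled model outputs, and `passes_tests` and `collect_verified` are hypothetical helper names. A real pipeline would sample candidates from a model and train on the survivors.

```python
def passes_tests(candidate, tests):
    """Verifier: True only if the candidate returns the expected output on every test."""
    try:
        return all(candidate(*args) == expected for args, expected in tests)
    except Exception:
        return False  # a crashing candidate earns no reward


def collect_verified(candidates, tests):
    """Keep only candidates the verifier accepts; these would become new training data."""
    return [name for name, fn in candidates.items() if passes_tests(fn, tests)]


# Unit tests for an "absolute value" task: (arguments, expected result).
tests = [((3,), 3), ((-4,), 4), ((0,), 0)]

# Hand-written stand-ins for sampled model outputs.
candidates = {
    "identity": lambda x: x,
    "negate": lambda x: -x,
    "branch": lambda x: x if x >= 0 else -x,
    "crash": lambda x: x / 0,
}

print(collect_verified(candidates, tests))  # ['branch']
```

The human contribution is confined to writing `tests`, that is, designing the environment. Everything after that, sampling, checking, and keeping verified solutions, can run without a person in the loop.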

Intelligence, efficiency, and the knowledge–intelligence trade-off

Competence requires either high intelligence or high knowledge; better training substitutes for fluid intelligence in bounded domains
LLMs are effectively large knowledge bases — modular vector programs mapping input patterns to output patterns
Fluid intelligence is the ability to model a new environment efficiently from scratch, with little data
Humans solve novel ARC V3 games in hundreds to thousands of actions with no prior training; frontier models are far from matching this
Science itself is symbolic compression, reducing observations to the shortest symbolic rule that explains them; Ndea is attempting to build this process algorithmically

Advice for researchers and founders exploring alternative approaches

If an idea has low probability but high impact and no one else is doing it, that is sufficient reason to pursue it
Look for approaches that scale without human bottlenecks — capability must improve with compute/data, not engineer-hours
Read AI research from the 1970s–80s: more diverse ideas were being explored before the field collapsed into one paradigm
Genetic algorithms are underexplored and may have significant scaling potential
Build a compounding stack — reusable foundations, not a series of disconnected experiments
For open-source projects: prioritise API simplicity and onboarding; docs should teach the domain, not just the tool; hire your most enthusiastic community members
