Fei-Fei Li on spatial intelligence as the next frontier in AI

Executive overview

By Li's account, language took less than a million years to evolve, while vision took roughly 540 million. She uses that gap to explain why 3D spatial understanding remains unsolved even though LLMs can arguably already pass the Turing test. Her argument: AGI cannot be complete without spatial intelligence, the ability to understand, generate, and reason about the 3D world.

She founded World Labs to build world models that go beyond flat pixels and language tokens. The goal: a foundation model for spatial intelligence with applications spanning robotics, design, gaming, and the metaverse.

In Li's framing, spatial intelligence is both the hardest open problem in AI and the most consequential.

Why spatial intelligence is harder than language

Language is 1D and purely generative; the real world is 3D and physically constrained.
Visual sensing is a projection that collapses 3D to 2D, so recovering 3D structure from images is a mathematically ill-posed inverse problem.
World models must fluidly span generation (gaming, metaverse) and reconstruction (robotics).
Language data is abundant on the internet; spatial data largely exists only in embodied experience.
Visual processing occupies far more of the human brain than the language areas do.
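
The projection point above can be sketched in a few lines. This is a minimal illustration, assuming an idealized pinhole camera with a hypothetical focal length f = 1; the function name project and the sample points are illustrative and do not come from the talk. Two points on the same ray through the camera center land on the same image coordinates, so a single image cannot distinguish them.

```python
# Sketch of an idealized pinhole camera (an assumption for illustration).
# The focal length f = 1.0 is a hypothetical value, not from the talk.

def project(point, f=1.0):
    """Project a 3D camera-frame point (x, y, z) onto the 2D image plane."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    # Perspective division: depth z is divided out and lost.
    return (f * x / z, f * y / z)

near = (1.0, 2.0, 4.0)
far = tuple(3.0 * c for c in near)  # same ray, three times farther away

print(project(near))  # (0.25, 0.5)
print(project(far))   # (0.25, 0.5)
```

Because both points map to the same pixel, going from 2D back to 3D needs extra information such as multiple views, motion, or learned priors about the world.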

The ImageNet origin story

In 2007, Li and her students bet that AI needed a paradigm shift toward data-driven methods.
They downloaded roughly one billion images from the internet and curated them into a dataset organized by a full visual taxonomy.
The project was open-sourced from day one and paired with an annual public challenge.
For three years (2009–2012) there was little signal that it was working.
In 2012, Hinton's team (SuperVision, the AlexNet entry) used CNNs trained on two GPUs to achieve a step-change in error rate. It was the first moment data, GPUs, and neural networks converged.

From objects to scenes to worlds

ImageNet tackled object recognition; the next problem was scene understanding.
Li's lifelong goal was machine storytelling — describing a full scene the way humans do.
Around 2015, Andrej Karpathy and Li published some of the first image-captioning papers, combining vision and natural language.
The reverse problem — generating images from text — seemed like a joke in 2015; it is now generative AI.
The arc: objects → scenes → 3D world models.

World Labs and what spatial AI enables

Li co-founded World Labs with Justin Johnson, Ben Mildenhall (NeRF), and Christoph Lassner (whose work was a precursor to Gaussian Splatting).
Target use cases: 3D content creation for designers, architects, game developers, and artists.
Longer-term: robotics, metaverse, marketing, and entertainment.
World models must obey physics and support both generative and reconstructive use cases.
Data strategy is hybrid — quality matters as much as quantity; details not public.

Hiring and what makes great researchers

Li's single hiring criterion: intellectual fearlessness — the willingness to embrace hard problems and go all in.
Applies equally to PhD students, researchers, and engineering hires at World Labs.
World Labs is actively hiring across engineering, product, 3D, and generative AI.

Advice for founders and PhD students

Grad school is for burning curiosity; startups require a more focused commercial goal — know which you're in.
PhD students should target problems where compute and scale alone won't win: interdisciplinary AI, theory, explainability, small-data regimes.
Academia no longer leads on compute or data; find the North Stars industry can't easily reach.
Immigrant and minority founders: develop the capacity not to over-index on being the outsider, and focus on building.
"Forget what you've done. Forget what others think. Hunker down and build."
