Fei-Fei Li on spatial intelligence as the next frontier in AI

Executive overview

Language took less than a million years to evolve; vision took 540 million. In Fei-Fei Li's telling, that gap explains why 3D spatial understanding remains unsolved while LLMs already pass the Turing test. She argues that AGI cannot be complete without spatial intelligence: the ability to understand, generate, and reason about the 3D world.

She founded World Labs to build world models that go beyond flat pixels and language tokens. The goal: a foundation model for spatial intelligence with applications spanning robotics, design, gaming, and the metaverse.

In Li's framing, spatial intelligence is the hardest open problem in AI, and the most consequential.

Why spatial intelligence is harder than language

  • Language is 1D and purely generative; the real world is 3D and physically constrained.
  • Vision sensing is a projection that collapses 3D to 2D, so recovering the 3D world from images is a mathematically ill-posed inverse problem.
  • World models must fluidly span generation (gaming, metaverse) and reconstruction (robotics).
  • Language data is abundant on the internet; spatial data largely exists only in embodied experience.
  • Visual processing occupies far more of the human brain than the language areas do.
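
The projection point above can be made concrete with a minimal sketch. It assumes an idealized pinhole camera with the 3D point expressed in the camera's own frame; the source names no camera model, so this is an illustration only, and the function names `project` and `back_project` are hypothetical. The sketch shows that two different 3D points on the same ray land on the same 2D pixel, so a single pixel cannot tell you which one was seen.

```python
def project(point, focal_length=1.0):
    """Pinhole projection of a camera-frame 3D point (x, y, z) to 2D (u, v).

    Assumption for illustration: ideal pinhole camera, no lens distortion.
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    return (focal_length * x / z, focal_length * y / z)


def back_project(pixel, depth, focal_length=1.0):
    """Return the 3D point on the pixel's viewing ray at a chosen depth.

    The depth is a free parameter: every positive value gives a different
    3D point that projects to the same pixel.
    """
    u, v = pixel
    return (u * depth / focal_length, v * depth / focal_length, depth)


near = (1.0, 2.0, 4.0)
far = (2.0, 4.0, 8.0)  # same viewing ray as `near`, twice as far away

pixel_near = project(near)
pixel_far = project(far)

# Three of the infinitely many 3D points consistent with that one pixel.
candidates = [back_project(pixel_near, depth) for depth in (1.0, 4.0, 8.0)]

print(pixel_near == pixel_far)  # both points land on the same pixel
print(candidates)
```

Resolving the ambiguity takes extra information, such as a second viewpoint, motion, or learned priors about how scenes are usually laid out.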

The ImageNet origin story

  • In 2007, Li and her students bet that AI needed a paradigm shift toward data-driven methods.
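  • They downloaded roughly a billion candidate images from the internet and cleaned them into a visual taxonomy built on the WordNet hierarchy: about 15 million labeled images across roughly 22,000 categories.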
  • The project was open-sourced from day one and paired with an annual public challenge.
  • For three years (2009–2012) there was little signal that it was working.
  • In 2012, Geoffrey Hinton's team (entered as SuperVision, with the network now known as AlexNet) used CNNs trained on two GPUs to achieve a step-change drop in error rate. It was the first moment data, GPUs, and neural networks converged.

From objects to scenes to worlds

  • ImageNet solved object recognition; the next problem was scene understanding.
  • Li's lifelong goal was machine storytelling — describing a full scene the way humans do.
  • Around 2015, Andrej Karpathy and Li published some of the first image-captioning papers, combining vision and natural language.
  • The reverse problem, generating images from text, seemed like a joke in 2015; it is now a mainstay of generative AI.
  • The arc: objects → scenes → 3D world models.

World Labs and what spatial AI enables

  • Co-founded with Justin Johnson, Ben Mildenhall (NeRF), and Christoph Lassner (precursor to Gaussian Splatting).
  • Target use cases: 3D content creation for designers, architects, game developers, and artists.
  • Longer-term: robotics, metaverse, marketing, and entertainment.
  • World models must obey physics and support both generative and reconstructive use cases.
  • Data strategy is hybrid — quality matters as much as quantity; details not public.

Hiring and what makes great researchers

  • Li's single hiring criterion: intellectual fearlessness — the willingness to embrace hard problems and go all in.
  • Applies equally to PhD students, researchers, and engineering hires at World Labs.
  • World Labs is actively hiring across engineering, product, 3D, and generative AI.

Advice for founders and PhD students

  • Grad school is for burning curiosity; startups require a more focused commercial goal — know which you're in.
  • PhD students should target problems where compute and scale alone won't win: interdisciplinary AI, theory, explainability, small-data regimes.
  • Academia no longer leads on compute or data; find the North Stars industry can't easily reach.
  • Immigrant and minority founders: learn not to over-index on being the outsider, and focus on building.
  • "Forget what you've done. Forget what others think. Hunker down and build."
