Fei-Fei Li on AI history, world models, and spatial intelligence

Executive overview

AI did not emerge from a single breakthrough — it required decades of foundational work, culminating in the ImageNet dataset that unlocked modern deep learning. The missing ingredient was not a smarter algorithm but massive labeled data, mirroring how humans and animals learn through experience.

The next frontier is spatial intelligence: models that can reason, interact, and create in three-dimensional worlds. This capability is the key missing link for robotics and for augmenting human cognition beyond language.

AI is shaped by human choices — not an autonomous force — and everyone has a role in steering it.

From AI winter to modern deep learning

  • Pre-2000 AI cycled through logic systems, expert systems, and early neural networks but lacked scale.
  • The core insight behind ImageNet (2006–2007): human intelligence is built on big-data experience; models need the same.
  • 15 million images across 22,000 concepts were curated and open-sourced to the research community.
  • The 2012 AlexNet result combined ImageNet data, neural networks, and two consumer NVIDIA GPUs — the proof of concept for modern AI.
  • The same trio — internet-scale data, neural networks, GPUs — still underlies LLMs today.
  • "AI" was a marketing liability as late as 2016; by 2017 every company was calling itself an AI company.

Why current models are not enough

  • Today's AI cannot count chairs in a video — a task trivial for a child.
  • No current model could derive Newtonian mechanics from celestial data, even given measurements from modern instruments that Newton never had.
  • Emotional intelligence in conversation — understanding motivation, distress, passion — remains out of reach.
  • Scaling laws (more data, bigger models, more GPUs) have headroom, but the field needs new innovations, not just more compute.
  • AGI is a poorly defined marketing term; the scientific north star is simply: can machines think and act as humans can?

Spatial intelligence and world models

  • Language models work on a tokens-in, tokens-out basis, so training data and output are perfectly aligned.
  • Robots need actions in 3D worlds; web video data lacks that alignment, creating a fundamental gap.
  • Spatial intelligence is the ability to create, reason, and interact within 3D and 4D environments — not just generate flat video.
  • World models take a text prompt or image and produce an infinitely navigable, interactable 3D world.
  • Humans already use spatial intelligence for scientific discovery — Watson and Crick deduced the double helix from a 2D X-ray diffraction photo by reasoning in 3D.

Why the bitter lesson alone won't solve robotics

  • The bitter lesson (from Rich Sutton's essay of that name): simpler models trained on more data consistently beat complex models trained on less data.
  • For language, training data and model output share the same form (text) — the lesson applies cleanly.
  • Robots require actions in physical 3D space; available data (web video) doesn't supply that directly.
  • Physical systems also need hardware maturity, supply chains, and real-world deployment — not just better algorithms.
  • Self-driving cars have had 20 years since the first DARPA challenge and are still not fully solved; robot manipulation is harder.
  • World models can help by generating diverse synthetic training environments for robots at scale.

Marble and World Labs

  • World Labs was founded ~18 months before this recording by Fei-Fei Li, Justin Johnson, Christoph Lassner, and Ben Mildenhall.
  • The company's thesis: spatial intelligence is as important as — and complementary to — language models.
  • Marble is the first product: prompt with text or an image, receive a fully 3D navigable world.
  • Early use cases: virtual film production (a 40x speed-up cited from a collaboration with Sony), game development, robotic simulation, VR, and psychology research (exposure-therapy environments).
  • Team size: ~30 people, predominantly researchers and research engineers.
  • Marble scenes can be exported as video or as 3D mesh for use in game engines.

Founding World Labs and career advice

  • Intellectual fearlessness — willingness to dive into the unknown without cataloguing every failure mode — has guided Fei-Fei's career moves (Princeton to Stanford, SAIL directorship, Google, World Labs).
  • The AI talent market is intensely competitive in ways that surprised even an experienced founder.
  • Advice to young engineers: focus on mission alignment, belief in the team, and impact — not every marginal variable of a job offer.

AI's impact on people and society

  • Technology is historically net positive, but every technology is a double-edged sword — outcomes depend on how society uses it.
  • Everyone has a role: artists should embrace AI as a creative tool; nurses can be augmented by smart cameras and robotic assistance; citizens should have a voice in AI governance.
  • Human dignity and human agency must sit at the center of AI development, deployment, and governance.
  • Stanford's Human-Centered AI Institute (HAI) bridges Silicon Valley and policymakers, supports interdisciplinary research across all eight Stanford schools, and has shaped legislation including a national AI research cloud bill.
