Fei-Fei Li on AI history, world models, and spatial intelligence
Executive overview
AI did not emerge from a single breakthrough — it required decades of foundational work, culminating in the ImageNet dataset that unlocked modern deep learning. The missing ingredient was not a smarter algorithm but massive labeled data, mirroring how humans and animals learn through experience.
The next frontier is spatial intelligence: models that can reason, interact, and create in three-dimensional worlds. This capability is the key missing link for robotics and for augmenting human cognition beyond language.
AI is not an autonomous force: it is shaped by human choices, and everyone has a role in steering it.
From AI winter to modern deep learning
- Pre-2000 AI cycled through logic systems, expert systems, and early neural networks but lacked scale.
- The core insight behind ImageNet (2006–2007): human intelligence is built on big-data experience; models need the same.
- 15 million images across 22,000 concepts were curated and open-sourced to the research community.
- The 2012 AlexNet result combined ImageNet data, neural networks, and two consumer NVIDIA GPUs — the proof of concept for modern AI.
- The same trio — internet-scale data, neural networks, GPUs — still underlies LLMs today.
- "AI" was a marketing liability as late as 2016; by 2017 every company was calling itself an AI company.
Why current models are not enough
- Today's AI cannot count chairs in a video — a task trivial for a child.
- No current model could derive Newtonian mechanics from celestial data, even with modern instruments Newton lacked.
- Emotional intelligence in conversation — understanding motivation, distress, passion — remains out of reach.
- Scaling laws (more data, bigger models, more GPUs) have headroom, but the field needs new innovations, not just more compute.
- AGI is a poorly defined marketing term; the scientific north star is simply: can machines think and act as humans can?
Spatial intelligence and world models
- Language models handle tokens in, tokens out: the training data and the model's output take the same form, so they are perfectly aligned.
- Robots need actions in 3D worlds; web video data lacks that alignment, creating a fundamental gap.
- Spatial intelligence is the ability to create, reason, and interact within 3D and 4D environments — not just generate flat video.
- World models take a text prompt or image and produce an infinitely navigable, interactable 3D world.
- Humans already use spatial intelligence for scientific discovery: Watson and Crick deduced the double helix by reasoning in 3D from a 2D X-ray diffraction photo (Rosalind Franklin's Photo 51).
Why the bitter lesson alone won't solve robotics
- The bitter lesson: simpler models trained on more data consistently beat complex models with less data.
- For language, training data and model output share the same form (text) — the lesson applies cleanly.
- Robots require actions in physical 3D space; available data (web video) doesn't supply that directly.
- Physical systems also need hardware maturity, supply chains, and real-world deployment — not just better algorithms.
- Self-driving cars have had roughly 20 years since the first DARPA Grand Challenge (2004) and are still not fully solved; robot manipulation is harder.
- World models can help by generating diverse synthetic training environments for robots at scale.
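The data-alignment gap described above can be made concrete with a toy sketch. This is only an illustration under stated assumptions, not how any real language model or robot policy is trained: the helper names (`next_token_pairs`, `action_pairs`, `VideoFrame`) are hypothetical, and strings stand in for tokens and pixels. It shows that raw text supplies its own training labels (the next token), while raw web video supplies observations but no recorded actions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


def next_token_pairs(tokens: List[str]) -> List[Tuple[List[str], str]]:
    """Each prefix of a text is an input; the token that follows is its label."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


@dataclass
class VideoFrame:
    pixels: str                   # stand-in for image data (assumption)
    action: Optional[str] = None  # web video records no action, so this stays None


def action_pairs(frames: List[VideoFrame]) -> List[Tuple[str, str]]:
    """Keep only frames that carry an action label usable for supervised training."""
    return [(f.pixels, f.action) for f in frames if f.action is not None]


text = "the robot picks up the cup".split()
video = [VideoFrame("frame0"), VideoFrame("frame1"), VideoFrame("frame2")]

text_examples = next_token_pairs(text)   # labels come for free from the text itself
robot_examples = action_pairs(video)     # nothing survives: no actions were recorded

print(len(text_examples))   # 5
print(len(robot_examples))  # 0
```

In this toy setting, six tokens of text yield five supervised examples with no extra labeling, whereas three unlabeled video frames yield none. Filling that gap is where synthetic environments, in which the action taken at each step is known, could help.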
Marble and World Labs
- World Labs was founded ~18 months before this recording by Fei-Fei Li, Justin Johnson, Christoph Lassner, and Ben Mildenhall.
- The company's thesis: spatial intelligence is as important as — and complementary to — language models.
- Marble is the first product: prompt with text or an image, receive a fully 3D navigable world.
- Early use cases: virtual film production (40x speed-up cited by a Sony collaboration), game development, robotic simulation, VR, and psychology research (exposure therapy environments).
- Team size: ~30 people, predominantly researchers and research engineers.
- Marble scenes can be exported as video or as 3D mesh for use in game engines.
Founding World Labs and career advice
- Intellectual fearlessness — willingness to dive into the unknown without cataloguing every failure mode — has guided Fei-Fei's career moves (Princeton to Stanford, SAIL directorship, Google, World Labs).
- The AI talent market is intensely competitive in ways that surprised even an experienced founder.
- Advice to young engineers: focus on mission alignment, belief in the team, and impact — not every marginal variable of a job offer.
AI's impact on people and society
- Technology is historically net positive, but every technology is a double-edged sword — outcomes depend on how society uses it.
- Everyone has a role: artists should embrace AI as a creative tool; nurses can be augmented by smart cameras and robotic assistance; citizens should have a voice in AI governance.
- Human dignity and human agency must sit at the center of AI development, deployment, and governance.
- Stanford's Human-Centered AI Institute (HAI) bridges Silicon Valley and policymakers, supports interdisciplinary research across all eight Stanford schools, and has shaped legislation including a national AI research cloud bill.