Alexandr Wang on building Scale AI, agentic work, and competing with China

Executive overview

Scale AI began as a data-labeling API built during a chatbot boom, but its real advantage came from betting early on self-driving cars. That narrow focus created the operational foundation to serve OpenAI, the US Department of Defense, and eventually every major vertical of enterprise AI.

The core pattern throughout Scale's history: build ahead of the wave. Because AI needs data before it needs products, Scale was always one step upstream — and that timing has now made it one of the largest AI applications businesses in the industry.

Every enterprise's future competitive moat is a specialized model trained on its own proprietary data and environments.

From chatbots to self-driving to foundation models

  • Started at YC in 2016 targeting chatbot companies; pivoted within weeks to a generic "API for human labor"
  • Cruise became Scale's first major customer organically; Wang convinced investors to go all-in on self-driving despite concern the market was too small
  • Self-driving was too narrow to sustain the business long-term, but it built the operational muscle and credibility to move upstream
  • Started working with OpenAI on language models in 2019 (GPT-2 era); scaling laws became undeniable with GPT-3 in 2020
  • GPT-4 made clear that demand for data would grow to consume all available human knowledge
  • Mechanical Turk was the incumbent; its reputation as "just awful" signaled an opening for a better product

Scale's product evolution

  • Core business: producing high-quality training data for AI model developers
  • 2021–2022: expanded into AI-based applications and agentic workflows for enterprises and government
  • The applications business is growing faster than the data business and is treated as an effectively infinite market
  • Analogy: Amazon building AWS — seemingly unrelated to the core business, but enabled by operational scale and conviction in a permanently expanding market
  • Works with the world's largest pharma, telco, bank, healthcare provider, and extensively with the US DoD
  • Partners with (more than competes with) Palantir; the market is too large for winner-takes-all dynamics

What enterprises should actually do with AI

  • Every firm's core IP will shift from its codebase to its specialized fine-tuned model
  • Data, environments, and evals are the new moat — giving them to a model provider erases your advantage
  • Most enterprise AI workflows start with prompting; reinforcement learning gets you beyond what prompting can reach
  • The playbook: identify repetitive human workflows → convert them into environments and datasets → automate via agents
  • Lowest-hanging fruit: "deep research plus" tasks — pulling information from multiple sources, synthesizing, producing analysis
  • Scale uses agents internally across hiring, quality control, data processes, and sales reporting
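
The playbook above (repetitive workflow → environment and dataset → agent) can be sketched in a few lines. This is an illustrative toy, not Scale's implementation: the `Environment` class, the `should_automate` helper, the 0.9 threshold, and the example data are all hypothetical names and values chosen for the sketch.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical type: an "agent" is anything that maps an input string to an output string.
Agent = Callable[[str], str]


@dataclass
class Environment:
    """A recorded human workflow turned into a task an agent can be scored on."""
    name: str
    examples: list[tuple[str, str]]  # (input, expected output) pairs taken from past human work

    def evaluate(self, agent: Agent) -> float:
        """Return the fraction of examples the agent reproduces exactly."""
        if not self.examples:
            return 0.0
        hits = sum(1 for task, expected in self.examples if agent(task) == expected)
        return hits / len(self.examples)


def should_automate(env: Environment, agent: Agent, threshold: float = 0.9) -> bool:
    """Hand the workflow to the agent only if it clears the (assumed) eval threshold."""
    return env.evaluate(agent) >= threshold


# Step 1: a repetitive workflow, captured as a small dataset (made-up data).
routing_env = Environment(
    name="document-routing",
    examples=[
        ("invoice from ACME", "finance"),
        ("resume from J. Doe", "hiring"),
        ("invoice from Globex", "finance"),
    ],
)


# Step 2: two candidate agents, stand-ins for a prompted or fine-tuned model.
def keyword_agent(text: str) -> str:
    return "finance" if "invoice" in text else "hiring"


def constant_agent(text: str) -> str:
    return "finance"


# Step 3: the eval decides which agent is allowed to run the workflow.
keyword_score = routing_env.evaluate(keyword_agent)
constant_score = routing_env.evaluate(constant_agent)
```

The point of the sketch is the ownership boundary: the dataset and the `evaluate` method stay with the enterprise, which is what the notes mean by data, environments, and evals being the moat.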

The future of work

  • Coding is the case study: assistant → Cursor-style pair programming → swarm of agents managed by one person
  • The terminal state of the economy is large-scale human management of agents
  • Management of agents is not trivial: vision-setting, debugging failures, and coordinating workflows remain hard
  • Self-driving analogy: getting to 90% is easy; the final 10% requires a lot of work — the same will be true for large-scale agent deployments
  • Human demand is historically insatiable; as AI makes the economy more efficient, demand expands to fill the gap
  • The leverage programmers have had for decades (infinite replicas, infinite runs) will extend to all human workers

Humanity's Last Exam and the evaluation problem

  • Built in partnership with the Center for AI Safety: researchers contributed novel problems from their own recent work, never published anywhere
  • At launch, the best models scored ~7–8%; they now score above 20%, a very rapid improvement
  • The AI industry suffers from a lack of hard evals that genuinely probe the frontier of model capabilities
  • A popular benchmark sets the North Star for researchers; building the eval shapes what the field optimizes for
  • Eventually all benchmarks get saturated; the next generation of evals will use real-world tasks, which are fundamentally fuzzier

China, compute, and the AI race

  • Wang's simplest explanation for how fast Chinese labs have progressed: espionage targeting the tacit training knowledge of US frontier labs
  • China is likely a half-step behind on models but has structural advantages on data: government labeling centers, college programs, robotics data factories, and no copyright or privacy constraints
  • US grid capacity is flat; China's has doubled over the past decade — a pure policy failure that constrains US compute build-out
  • On algorithms, the US is more innovative on net, but espionage levels the playing field
  • Overall: ~60–70% probability the US maintains a sustained lead, but many scenarios where China catches up or overtakes
  • Hardware manufacturing is a deeper problem: a humanoid robot costs $20–30k to build in the US; the equivalent is $2–4k in China
  • Future conflict is drone-and-robot-driven, not carrier-and-jet-driven; the shift is toward smaller, faster, more attritable assets
  • Scale is building Thunderforge with Indo-Pacific Command: converts 72-hour military planning cycles into 10-minute agent-driven workflows

Hiring and company-building

  • Wang reviews and approves every hire at Scale personally
  • "Quality is fractal" — high standards trickle down; once people sense their manager doesn't care, they stop caring
  • The single most important trait: caring deeply, to the point where poor work is genuinely painful and great work is genuinely satisfying
  • Young founders often have a poor sense of their alpha (what they are uniquely positioned to do) and gravitate toward mimetic ideas
  • Startups need a strategy for walking up the capability curve: whatever you build must benefit from increasingly capable models
