Alexandr Wang on building Scale AI, agentic work, and competing with China

Executive overview

Scale AI began as a data-labeling API built during a chatbot boom, but its real advantage came from betting early on self-driving cars. That narrow focus created the operational foundation to serve OpenAI, the US Department of Defense, and eventually every major vertical of enterprise AI.

The core pattern throughout Scale's history: build ahead of the wave. Because AI needs data before it needs products, Scale was always one step upstream — and that timing has now made it one of the largest AI applications businesses in the industry.

Every enterprise's future competitive moat is a specialized model trained on its own proprietary data and environments.

From chatbots to self-driving to foundation models

Started at YC in 2016 targeting chatbot companies; pivoted within weeks to a generic "API for human labor"
Cruise became Scale's first major customer organically; Wang convinced investors to go all-in on self-driving despite concerns that the market was too small
Self-driving was too narrow to sustain the business long-term, but it built the operational muscle and credibility to move upstream
Started working with OpenAI on language models in 2019 (GPT-2 era); scaling laws became undeniable with GPT-3 in 2020
GPT-4 made clear that demand for data would grow to consume all available human knowledge
Mechanical Turk was the incumbent; its reputation as "just awful" was a green light

Scale's product evolution

Core business: producing high-quality training data for AI model developers
2021–2022: expanded into AI-based applications and agentic workflows for enterprises and government
The applications business is growing faster than the data business and is treated as an effectively infinite market
Analogy: Amazon building AWS — seemingly unrelated to the core business, but enabled by operational scale and conviction in a permanently expanding market
Works with the world's largest pharma, telco, bank, healthcare provider, and extensively with the US DoD
Partners with (more than competes with) Palantir; the market is too large for winner-takes-all dynamics

What enterprises should actually do with AI

Every firm's core IP will shift from its codebase to its specialized fine-tuned model
Data, environments, and evals are the new moat — giving them to a model provider erases your advantage
Most enterprise AI workflows start with prompting; reinforcement learning gets you beyond what prompting can reach
The playbook: identify repetitive human workflows → convert them into environments and datasets → automate via agents
Lowest-hanging fruit: "deep research plus" tasks — pulling information from multiple sources, synthesizing, producing analysis
Scale uses agents internally across hiring, quality control, data processes, and sales reporting
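
One way to picture the playbook above is as a small evaluation harness: recorded instances of a repetitive workflow become a dataset, a scoring rule becomes the eval, and any candidate agent can be measured against it. The sketch below is illustrative only and is not Scale's tooling; the names Task, score, evaluate, concat_agent, and first_source_agent are hypothetical, and the placeholder agents stand in for a real model call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Task:
    """One recorded instance of a repetitive workflow (hypothetical schema)."""
    sources: list[str]        # documents a person would consult
    required_facts: set[str]  # facts a good report must mention


# An "agent" here is any callable that turns sources into a report.
Agent = Callable[[list[str]], str]


def score(task: Task, report: str) -> float:
    """Eval: fraction of required facts that appear in the report."""
    if not task.required_facts:
        return 1.0
    text = report.lower()
    found = sum(1 for fact in task.required_facts if fact.lower() in text)
    return found / len(task.required_facts)


def evaluate(agent: Agent, dataset: list[Task]) -> float:
    """Run the agent on every task and return its mean score."""
    return sum(score(t, agent(t.sources)) for t in dataset) / len(dataset)


def concat_agent(sources: list[str]) -> str:
    """Placeholder agent: joins all sources instead of calling a model."""
    return " ".join(sources)


def first_source_agent(sources: list[str]) -> str:
    """Weaker placeholder agent: reads only the first source."""
    return sources[0] if sources else ""


# Assumed toy dataset; real data would come from logged human work.
dataset = [
    Task(
        sources=["Revenue grew 12%.", "Churn fell to 3%."],
        required_facts={"12%", "3%"},
    ),
]

full_score = evaluate(concat_agent, dataset)
partial_score = evaluate(first_source_agent, dataset)
```

In this toy setup the agent that reads every source scores higher than the one that reads only the first. That gap is the kind of signal an eval is meant to surface before a workflow is handed to agents.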

The future of work

Coding is the case study: assistant → Cursor-style pair programming → swarm of agents managed by one person
The terminal state of the economy is large-scale human management of agents
Management of agents is not trivial: vision-setting, debugging failures, and coordinating workflows remain hard
Self-driving analogy: getting to 90% is easy; the final 10% requires a lot of work — the same will be true for large-scale agent deployments
Human demand is historically insatiable; as AI makes the economy more efficient, demand expands to fill the gap
The leverage programmers have had for decades (infinite replicas, infinite runs) will extend to all human workers

Humanity's Last Exam and the evaluation problem

Built in partnership with the Center for AI Safety: researchers contributed novel problems from their own recent work, never published anywhere
At launch, the best models scored ~7–8%; they now score above 20%, a very fast climb
The AI industry suffers from a lack of hard evals that genuinely probe the frontier of model capabilities
A popular benchmark sets the North Star for researchers; building the eval shapes what the field optimizes for
Eventually all benchmarks get saturated; the next generation of evals will use real-world tasks, which are fundamentally fuzzier

China, compute, and the AI race

The simplest explanation for how fast Chinese labs have progressed: espionage targeting the tacit training knowledge of US frontier labs
China is likely a half-step behind on models but has structural advantages on data: government labeling centers, college programs, robotics data factories, and no copyright or privacy constraints
US grid capacity has been flat while China's has doubled over the past decade; the US stagnation is a pure policy failure that constrains compute build-out
On algorithms, the US is more innovative on net, but espionage levels the playing field
Overall: ~60–70% probability the US maintains a sustained lead, but many scenarios where China catches up or overtakes
Hardware manufacturing is a deeper problem: a humanoid robot costs $20–30k to build in the US; the equivalent is $2–4k in China
Future conflict is drone-and-robot-driven, not carrier-and-jet-driven; the shift is toward smaller, faster, more attritable assets
Scale is building Thunderforge with Indo-Pacific Command: converts 72-hour military planning cycles into 10-minute agent-driven workflows

Hiring and company-building

Wang reviews and approves every hire at Scale personally
"Quality is fractal" — high standards trickle down; once people sense their manager doesn't care, they stop caring
The single most important trait: caring deeply, to the point where poor work is genuinely painful and great work is genuinely satisfying
Young founders have a poor sense of their alpha (what they are uniquely positioned to do) and gravitate toward mimetic ideas
Startups need a strategy for walking up the capability curve: whatever you build must benefit from increasingly capable models
