How Scale AI trains frontier models and why expert data is the new moat

Executive overview

AI models have shifted from knowing things to doing things — and getting agents to act reliably inside real-world software systems is far harder than headlines suggest. Scale AI sits at the center of this work: supplying expert-labelled data and reinforcement learning environments to frontier labs, and building AI applications for enterprise and government customers.

The core bottleneck is no longer raw data volume but human judgment — specifically, the kind of deep domain expertise that tells a model what "good" looks like in a given context.

The real infrastructure of AI progress is expert humans digitising their judgment so models can act reliably in the real world.

What Scale actually does

Meta invested $14B for a 49% non-voting stake; Scale remains fully independent, with its own board and governance
Alex Wang moved to Meta to lead a superintelligence team; Jason Droege now runs Scale
Two major business units, each with hundreds of millions in revenue: data supply to model builders, and AI applications/services for enterprise and government
About 1,100 employees; 250 open roles as of recording
The company has grown every month since the Meta deal

How data labelling has evolved

18 months ago: annotators compared short stories and gave preference rankings — basic, generalist work
Today: tasks involve world-class engineers building full websites, PhDs explaining nuanced cancer topics to models — hours of work per task
80% of Scale's contributor network holds a bachelor's degree or higher; ~15% hold PhDs
Contributors are found primarily through peer referrals, campus programmes, and LinkedIn — the best come from grassroots networks
Referrals dominate because contributors find the work meaningful: using expertise to fix a model's gaps in their own field is intrinsically motivating

Reinforcement learning environments

RL environments are sandboxes where AI agents practice completing goals inside real software systems (e.g. navigating a Salesforce instance)
The agent must learn: how to read configurations, how to execute a business process reliably, and when to escalate to a human
The key research question: how generalizable is each task? More generalizable data is more valuable
The number of software environments multiplied by the number of goals within each is effectively infinite, so the strategy is to collect data that transfers broadly rather than to cover every case exhaustively
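As a rough illustration of the sandbox idea above, the Python sketch below models a toy environment in which an agent has to read a configuration, carry out a business step, or escalate to a human. Everything in it is an assumption made for illustration: the `ToyCrmEnv` class, the action names, the refund scenario, and the reward values are hypothetical and do not represent Scale's environments or any real CRM API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyCrmEnv:
    """Hypothetical sandbox: handle one refund request in a toy CRM."""

    refund_limit: int = 100      # configuration the agent should read first
    refund_requested: int = 250  # amount the customer is asking for
    config_read: bool = field(default=False, init=False)
    done: bool = field(default=False, init=False)

    def step(self, action: str) -> tuple[str, float, bool]:
        """Apply one action and return (observation, reward, done)."""
        if self.done:
            raise RuntimeError("episode already finished")
        if action == "read_config":
            self.config_read = True
            return f"refund_limit={self.refund_limit}", 0.0, False
        if action == "issue_refund":
            self.done = True
            # Only correct if the agent checked the limit and the amount fits.
            ok = self.config_read and self.refund_requested <= self.refund_limit
            return ("refunded" if ok else "policy_violation"), (1.0 if ok else -1.0), True
        if action == "escalate":
            self.done = True
            # Escalation is rewarded only when the request exceeds the limit.
            needed = self.config_read and self.refund_requested > self.refund_limit
            return "escalated", (1.0 if needed else -0.5), True
        return "unknown_action", -0.1, False


def run_episode(env: ToyCrmEnv, actions: list[str]) -> float:
    """Run a fixed action sequence and return the total reward."""
    total = 0.0
    for action in actions:
        _, reward, done = env.step(action)
        total += reward
        if done:
            break
    return total


if __name__ == "__main__":
    print(run_episode(ToyCrmEnv(), ["read_config", "escalate"]))
    print(run_episode(ToyCrmEnv(), ["issue_refund"]))
```

In this toy setup, an agent that reads the configuration and escalates the over-limit request scores higher than one that issues the refund blindly. The three skills listed above (reading configurations, executing the process, knowing when to escalate) each map to one action.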

Evals: what "good" looks like

Evals are the benchmark for model quality — a comprehensive set of examples showing what the correct or preferred output is
For enterprise and government customers, evals are the primary work: the customer's own experts must define what good looks like in their specific context
A document with identical wording can mean something different at two different companies — off-the-shelf models plus RAG plus fine-tuning can only get you so far
The bottleneck is digitising human judgment at the company-specific level, not just general expertise
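One way to picture a company-specific eval is as a small set of expert-labelled cases scored against a model's outputs. The sketch below is a minimal illustration under stated assumptions: the company names, the document text, the labels, and both toy "models" are hypothetical, and exact-match accuracy stands in for whatever scoring a real eval would use.

```python
from __future__ import annotations

from typing import Callable, NamedTuple


class EvalCase(NamedTuple):
    company: str
    document: str
    expected: str  # label chosen by that company's own experts (hypothetical)


# The same wording carries a different meaning at two hypothetical companies.
CASES = [
    EvalCase("acme", "Payment due in 30 days", "standard_terms"),
    EvalCase("globex", "Payment due in 30 days", "needs_review"),
]


def run_eval(model: Callable[[str, str], str], cases: list[EvalCase]) -> float:
    """Return the fraction of cases where the model matches the expert label."""
    correct = sum(model(c.company, c.document) == c.expected for c in cases)
    return correct / len(cases)


def generic_model(company: str, document: str) -> str:
    """Stand-in for an off-the-shelf model that ignores company context."""
    return "standard_terms"


# Company-specific judgment captured from experts (illustrative only).
COMPANY_RULES = {"globex": "needs_review"}


def company_aware_model(company: str, document: str) -> str:
    """Stand-in for a model that has absorbed company-specific judgment."""
    return COMPANY_RULES.get(company, "standard_terms")


if __name__ == "__main__":
    print(run_eval(generic_model, CASES))
    print(run_eval(company_aware_model, CASES))
```

The generic stand-in gets only one of the two identical documents right, because the deciding information lives in each company's expert labels rather than in the text itself.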

Enterprise AI: what's actually working

Most POCs reach 60–70% accuracy and teams assume "the rest is easy". It isn't: each additional sigma of reliability is an order of magnitude harder
Robust deployment takes 6–12 months when done properly: legal, policy, regulatory, change management, and accuracy thresholds all have to align
The 95% POC failure rate is somewhat overstated — it reflects how easy it is to start a project, not how often serious efforts fail
AI performs best where current human accuracy is low (10–20%); it struggles to close the last 2% in processes already running at 98% accuracy
Healthcare example: an AI tool that reads 200–300 pages of patient records and surfaces the top 5–10 considerations — including non-obvious drug-allergy conflicts a human might miss
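The point about baseline accuracy can be made concrete with a little arithmetic. The sketch below counts wrong outcomes per 10,000 cases. The 70% AI figure is taken from the POC range above, the 15% and 98% baselines from the human-accuracy figures, and the function names are illustrative assumptions.

```python
def errors_per_10k(accuracy: float) -> int:
    """Wrong outcomes per 10,000 cases at a given accuracy (0.0 to 1.0)."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return round((1.0 - accuracy) * 10_000)


def net_errors_avoided(human_accuracy: float, ai_accuracy: float) -> int:
    """Errors avoided per 10,000 cases by switching to the AI (negative = worse)."""
    return errors_per_10k(human_accuracy) - errors_per_10k(ai_accuracy)


if __name__ == "__main__":
    # A 70%-accurate POC against a process where humans are right 15% of the time.
    print(net_errors_avoided(0.15, 0.70))  # 5500
    # The same POC against a process already running at 98% accuracy.
    print(net_errors_avoided(0.98, 0.70))  # -2800
```

Under these assumptions, the same 70% model is a large improvement over a low-accuracy process and a clear regression against a 98% one, where there are at most 200 errors per 10,000 left to remove.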

Where models are heading (2–3 year view)

The shift is from knowing to doing — knowledge benchmarks are near saturation; the frontier is reliable action
Agent reliability inside real systems (calendar, CRM, healthcare) is just beginning; trajectory uncertainty is high
Technology will likely reach a point in 2–3 years that forces policymakers and organisations to respond; the bottleneck then becomes change management, not capability
The "white collar apocalypse" thesis is premature for the next 1–2 years; human adaptability is consistently underestimated in these predictions

Building and evaluating new businesses

Two things make companies work: a founder who is a force of nature over a long duration, and a fundamentally good market/business model
Quick filters: does the business have network effects, lock-in, and increasing value at scale? If not, why not?
High gross margin is a coarse but fast instrument — if you can't defend a high margin, ask why; the answer usually reveals the real competitive problem
The urgency of the buyer matters more than the value of the product — building something valuable that isn't the customer's top daily priority creates a very long road

Hiring and team composition

For ~95% of roles: hire for curious problem-solving, ability to work across people, and leadership potential — not specific prior experience
For the other ~5% (roles where speed to market is critical, e.g. frontier researchers): prior experience and relationships override the general framework
A stable management team that knows each other's strengths and compensates for weaknesses outperforms a serially "upgraded" team as the company scales
The world is changing fast enough that adaptability and growth trajectory matter more than one-to-one experience match

Lessons from Uber Eats

Launched December 2015 in Toronto; $20K in sales within two hours
Grew from zero to $20B GMV in four and a half years; now ~$80B
Key insight: restaurant incremental gross margin on delivery orders is 70–80% (ingredients scale, labour and real estate are fixed) — that justified a 25–30% take rate
Pushed McDonald's away for four to five months on principle; the delay led to better deal terms and an exclusive relationship that accelerated chain adoption
Tried and abandoned: convenience vans, generalist point-to-point delivery, grocery — food delivery was the one signal that kept strengthening on every dimension
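The take-rate reasoning above can be checked with simple arithmetic. The sketch below is an illustration under stated assumptions: the $30 order value is hypothetical, the take rate is treated as a fraction of order value, and costs other than ingredients and the platform fee are ignored.

```python
def restaurant_contribution(order_value: float,
                            incremental_margin: float,
                            take_rate: float) -> float:
    """Amount the restaurant keeps on one delivery order.

    incremental_margin: share of order value left after ingredient costs
    take_rate: share of order value paid to the delivery platform
    """
    for name, value in (("incremental_margin", incremental_margin),
                        ("take_rate", take_rate)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return order_value * (incremental_margin - take_rate)


if __name__ == "__main__":
    order = 30.0  # hypothetical order value in dollars
    # Least favourable and most favourable ends of the ranges quoted above.
    low = restaurant_contribution(order, incremental_margin=0.70, take_rate=0.30)
    high = restaurant_contribution(order, incremental_margin=0.80, take_rate=0.25)
    print(f"restaurant keeps between ${low:.2f} and ${high:.2f} per order")
```

Across the quoted ranges the restaurant still keeps roughly 40–55% of each incremental order, which is the sense in which a 70–80% incremental margin can support a 25–30% take rate.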

On independent thinking and founder mindset

The question to ask before starting anything: "Why do I have an insight that a million other smart, working entrepreneurs don't have?"
Don't fall in love with ideas — the mission is solving the customer's problem, not validating your prior belief
"Not losing is a precursor to winning" — survival enables the timing and insight corrections that eventually produce success; high-risk decisions that fail leave no path forward
The end is never the end: the moments that feel impassable almost always have an imperfect but workable solution
