How Mercor became the fastest-growing company in history by matching experts with AI labs
Executive overview
AI labs can improve models only as fast as they can measure what "good" looks like — and that measurement requires domain experts, not crowdsourced generalists. Mercor identified this bottleneck early and built a marketplace that sources, vets, and deploys high-caliber professionals to write evals and post-training data for the world's top AI labs.
The result: revenue run rate grew from $1M to $400M in 16 months, with zero customer churn and net retention above 1,600%.
The core insight: if the model is the product, the eval is the PRD — and every capability gap needs a human expert to define what success looks like.
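To put the growth figures in perspective, the arithmetic can be sketched as follows. This is a rough illustration only: it assumes a $1M starting run rate (the figure quoted in the founding story) and smooth month-over-month compounding, which real revenue rarely follows. The function names are hypothetical.

```python
def implied_monthly_growth(start: float, end: float, months: int) -> float:
    """Constant month-over-month growth rate that takes `start` to `end` in `months`."""
    if start <= 0 or end <= 0 or months <= 0:
        raise ValueError("start, end, and months must all be positive")
    return (end / start) ** (1 / months) - 1


def net_revenue_retention(prior_revenue: float, current_revenue: float) -> float:
    """Revenue from an existing customer cohort now, as a percentage of a year earlier."""
    if prior_revenue <= 0:
        raise ValueError("prior_revenue must be positive")
    return current_revenue / prior_revenue * 100


# Assumed inputs: $1M -> $400M run rate over 16 months.
monthly = implied_monthly_growth(1e6, 400e6, 16)
print(f"implied monthly growth: {monthly:.1%}")

# Net retention of 1,600% means a cohort that spent $1 a year ago spends $16 now.
nrr = net_revenue_retention(1.0, 16.0)
print(f"net revenue retention: {nrr:.0f}%")
```

Under these assumptions the run rate would have to compound at roughly 45% per month, and each existing customer's spend would have grown sixteenfold year over year.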
What evals are and why they matter
- An eval is a systematic way to measure whether a model achieves a desired capability — equivalent to a product requirement document for AI.
- Researchers run dozens of experiments daily against eval sets; once a good eval exists, reinforcement learning can climb it rapidly.
- Evals serve double duty: internal benchmark for researchers and external sales collateral demonstrating model capabilities.
- The shift from academic evals (Olympiad math, GPQA) to practical evals (legal redlining, investment analysis) is now underway.
- For enterprises: the prerequisite to deploying AI across your value chain is defining how you measure success in that value chain.
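To make the idea of an eval as a measurable requirement concrete, here is a minimal sketch of an eval harness. It assumes an eval is a set of prompts, each paired with an expert-written pass/fail check. The names `EvalCase`, `run_eval`, and `toy_model` are hypothetical and do not come from the source; a real harness would call an actual model and use far richer checks.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalCase:
    """One eval item: a prompt plus an expert-defined pass/fail check (hypothetical)."""
    prompt: str
    check: Callable[[str], bool]


def run_eval(model: Callable[[str], str], cases: List[EvalCase]) -> float:
    """Return the fraction of cases whose model output passes its check."""
    if not cases:
        raise ValueError("eval set is empty")
    passed = sum(1 for case in cases if case.check(model(case.prompt)))
    return passed / len(cases)


# Toy stand-in for a model; it only knows one answer.
def toy_model(prompt: str) -> str:
    return "4" if prompt == "What is 2 + 2?" else "unsure"


cases = [
    EvalCase("What is 2 + 2?", lambda out: out.strip() == "4"),
    EvalCase("What is the capital of France?", lambda out: "paris" in out.lower()),
]

score = run_eval(toy_model, cases)
print(f"pass rate: {score:.0%}")
```

A single number like this pass rate is what lets researchers compare experiments quickly, which is why the quality of the checks matters more than their quantity.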
What Mercor's experts actually do
- Experts create rubrics — structured criteria for what correct model output looks like — used both to score outputs and to reward model trajectories in RL.
- Example: a lawyer defines what an accurate contract redline looks like, point by point; the model then optimizes against that rubric.
- This is post-training work, not pre-training data collection: it shapes reasoning and prioritization, not raw knowledge ingestion.
- Data types range from supervised fine-tuning (input/output pairs) to RLHF preference labels to verifier rubrics for reinforcement learning from AI feedback (RLAIF).
- The market is bounded by the set of things humans can do that models cannot — which remains large across legal, medical, creative, and engineering domains.
- Tens of thousands of experts are active at any time; the broader expert pool numbers in the hundreds of thousands.
- Median pay is $95/hour; specialized experts earn up to $500/hour.
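The rubric idea described above can be sketched as a weighted checklist that turns a model output into a score between 0 and 1, which could then serve as a reward signal. This is an illustrative sketch, not Mercor's or any lab's actual format: `Criterion`, `rubric_score`, the weights, and the contract details are all hypothetical, and the keyword checks stand in for the judgment a human or model grader would apply in practice.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Criterion:
    """One rubric line: what the expert expects, how much it matters, how to check it."""
    description: str
    weight: float
    is_met: Callable[[str], bool]


def rubric_score(output: str, rubric: List[Criterion]) -> float:
    """Weighted fraction of rubric criteria that the output satisfies."""
    total = sum(c.weight for c in rubric)
    if total <= 0:
        raise ValueError("rubric weights must sum to a positive number")
    earned = sum(c.weight for c in rubric if c.is_met(output))
    return earned / total


# Hypothetical rubric a lawyer might write for a contract redline.
# Keyword matching is a crude placeholder for a real grader.
redline_rubric = [
    Criterion("Flags the uncapped liability clause", 3.0,
              lambda o: "liability" in o.lower()),
    Criterion("Proposes a concrete liability cap", 2.0,
              lambda o: "cap" in o.lower()),
    Criterion("Notes the missing governing-law clause", 1.0,
              lambda o: "governing law" in o.lower()),
]

model_output = "Liability is uncapped; propose a cap at 12 months of fees."
reward = rubric_score(model_output, redline_rubric)
print(f"rubric reward: {reward:.2f}")
```

Weighting the criteria lets the expert express prioritization, not just correctness: here, missing the liability issue costs more than missing the governing-law clause.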
The competitive landscape
- Early data labeling (Scale, Surge) focused on volume of low-to-medium-skilled workers writing grammatically correct sentences for early LLMs.
- The market shifted: labs now need sourcing and vetting of professionals — software engineers, bankers, doctors, lawyers, screenwriters.
- Mercor's differentiation: a labor marketplace that retains expert dignity, pays well, and cuts out intermediaries; similar economics to Uber/DoorDash but for high-skilled talent.
- The top 10% of experts on a project typically drive the majority of model improvement — identifying that cohort reliably is Mercor's core proprietary advantage.
How Mercor found product-market fit
- Brendan Foody, Mercor's CEO, co-founded the company in January 2023 at age 19 and bootstrapped it to a $1M revenue run rate before dropping out of college.
- Early signal: a meeting with the xAI co-founding team while still in college revealed that labs urgently needed quality-focused expert talent.
- Inflection point: an incumbent crowdsourcing company used Mercor's platform to hire 1,000+ people and then failed to pay them — exposing incumbents' poor talent experience.
- Response: started working directly with labs in May 2024; grew to $400M run rate in 16 months.
- Key lesson: product-market fit looks like surprising ease of selling to the marginal customer, not grinding through objections.
- Remain stubborn on thesis (how the world will change), but stay open on form (exactly how your company fits into it).
Building Mercor's culture
Three core values:
- Can-do attitude: set audacious revenue targets and build the trajectory around them (e.g., the team targeted a $50M run rate while at $1.5M and hit it within two weeks of the deadline).
- High standards: extreme patience in hiring the first 10 people, half of whom were former founders or executives; initial talent density shapes the entire org as it scales.
- Intensity: output-oriented, not hours-oriented; people who care deeply tend to work hard without mandated schedules.
Two further notes on scaling:
- In retrospect, scaling from 10 to 100 people could have moved faster once demand clearly exceeded capacity.
- Sales and marketing headcount was near zero for the first 18 months; growth came entirely from word of mouth and customer obsession.
The future of evals and labor markets
- Evals are evergreen: as long as there are capabilities to build, experts will be needed to define success criteria for them.
- The entire economy may become an "RL environment machine" — every workflow converted into a measurable context for model improvement.
- Predictions of superintelligence within three years are likely overstated; a 10-year road of post-training improvements is more realistic.
- Post-training is far more data-efficient than scaling pre-training; the frontier will be pushed by the quality of evals, not the volume of compute.
Skills and jobs that will last
- Roles in industries with elastic demand — where 10x productivity creates 10x more work, not 10x fewer jobs — are best positioned.
- Software development is the most elastic: demand for features is effectively unlimited.
- Product management, operations, consulting, and creative work also fall in this category.
- Accounting and similar fixed-volume domains are more exposed.
- The differentiating skill across all fields: the ability to leverage AI to amplify existing domain expertise.
- Assessment is shifting from "can you do the task?" to "what can you build in an hour using all available AI tools?"
Hiring advice for founders
- Wait for genuine pull before stepping on the gas: if selling to the marginal customer is extremely hard, you haven't found the market yet.
- First 10 hires: be patient and disciplined; this talent density compounds across the org.
- Once demand clearly exceeds capacity: prioritize speed over selectivity.
- Focus on people's strengths rather than improving weaknesses — a lesson Brendan attributes partly to being dyslexic and navigating around his own limitations.