How Tom Brown went from GPT-3 to co-founding Anthropic

Executive overview

Tom Brown co-founded Anthropic after a non-linear path: self-taught engineer, early YC startup employee, brief stint at Google Brain, and a key role building GPT-3 at OpenAI. The core insight driving the founding was scaling laws — a straight-line relationship between compute and intelligence across 12 orders of magnitude — which convinced a small group that transformative AI was not a distant abstraction.

Anthropic launched with seven co-founders during COVID, no product roadmap, and no certainty of success. The bet paid off when Claude 3.5 Sonnet hit unexpected product-market fit in coding, and Claude Code — originally an internal engineering tool — became a market-leading agentic coding product.

The model that doesn't teach to the test wins the real-world test.

From software engineer to AI researcher

Left Grouper (a blind-match dating app) in 2014, burned out; spent three months recovering before pivoting to AI.
Ran out of personal runway; took a three-month Twitch contract to fund six months of self-study.
Self-study stack: Coursera ML course, Kaggle projects, Linear Algebra Done Right, a statistics textbook, rented GPU via YC credits.
Got into OpenAI via Greg Brockman; first nine months were pure engineering (StarCraft environment), no ML work.
Key realization: there was a shortage of people who combined ML knowledge with distributed systems — that gap was the entry point.

Scaling laws and GPT-3

Dario Amodei identified the core trend: reliable intelligence gains from more compute with the right recipe.
The original scaling laws paper showed a straight line over 12 orders of magnitude — unprecedented in computer science.
Danny Hernandez's complementary paper showed algorithmic efficiency improving at a compounding rate; the two trends stacked.
Reaction inside the field was hostile: critics called it wasteful, inelegant, "just stacking more layers."
The hardware shift from TPUs to GPUs for GPT-3 was driven by PyTorch being a better software stack than TensorFlow: the gain was faster iteration, not a raw hardware advantage.
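To make the "straight line" claim concrete, here is a minimal sketch of how such a trend is usually checked. It assumes the line refers to a power law plotted on log-log axes, which is a reading of the summary rather than something it states. The data, the constants `TRUE_A` and `TRUE_B`, and the helper `fit_power_law` are all hypothetical and chosen only for illustration; none come from the scaling laws paper.

```python
import math


def fit_power_law(compute, loss):
    """Least-squares fit of loss = a * compute**b, done in log-log space.

    Returns (a, b). A power law is a straight line on log-log axes,
    so b is the slope and log10(a) is the intercept.
    """
    xs = [math.log10(c) for c in compute]
    ys = [math.log10(v) for v in loss]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return 10 ** intercept, slope


# Synthetic, illustrative data only: 13 points spanning 12 orders of magnitude.
TRUE_A, TRUE_B = 10.0, -0.05  # hypothetical constants, not taken from any paper
compute = [10.0 ** k for k in range(13)]
loss = [TRUE_A * c ** TRUE_B for c in compute]

a, b = fit_power_law(compute, loss)
print(f"fitted exponent b = {b:.3f}, prefactor a = {a:.2f}")
```

Because the synthetic points lie exactly on a power law, the fit recovers the constants that generated them. Real measurements would scatter around the line, and the interesting question is how far the line continues to hold.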

Founding Anthropic

The founding group came from OpenAI's safety and scaling orgs — the teams that took scaling laws most seriously.
Started with seven co-founders, grew to ~25 ex-OpenAI people within months; all joined for the mission, not prestige.
First year priorities: build training infrastructure, secure compute, handle company setup (Brex accounts and all).
Had a Slack-bot version of Claude 1 running in the YC Slack ~nine months before ChatGPT launched — but hesitated to productize it, under-investing in serving infrastructure as a result.
Didn't feel like a viable company until Claude 3.5 Sonnet, roughly a year before this interview.

Claude's coding advantage

Claude gained coding share in YC batches from single digits to 80–90%+ for coding use cases.
Anthropic does not have a team dedicated to gaming benchmarks; internal benchmarks and real-world dogfooding drive model development instead.
Train-test mismatch explains the gap between benchmark scores and user preference.
Claude 3.5 Sonnet's product-market fit was a surprise internally; Claude 3.7 Sonnet's unlock of agentic coding was also unexpected.
Dogfooding with internal engineers is a top priority: accelerating Anthropic's own engineers is treated as a primary signal.

Claude Code: from internal hack to product

Claude Code began as a tool built by engineer Boris for internal use at Anthropic.
Anthropic had previously committed to an API-first strategy, assuming startups would build better products on top.
Claude Code broke that assumption — it outperformed existing market products for agentic coding.
Proposed explanation: the team treated Claude itself as a primary user, designing around what Claude needs to be effective (right tools, right context).
The same user-empathy framing produced MCP (Model Context Protocol), the tool-calling standard that succeeded where others failed.

Advice for founders building on AI APIs

Claude Code's advantage was empathy for the model as a user, not a proprietary technical moat — a startup could replicate that.
Anthropic wants to be the most developer- and API-focused lab; infrastructure for others to build on is a strategic priority.
Large opportunity in coaching models to do useful business tasks: current agentic coding covers a tiny fraction of work done in businesses.
Models need better context, better tool access, and better coaching — rich space for startups.

Compute infrastructure and bottlenecks

Humanity is on track for the largest infrastructure build-out in history: on the current trajectory, larger than the Apollo and Manhattan projects combined.
Compute spending is growing ~3x per year; that growth is locked in for next year and still open for 2027.
Anthropic uses GPUs, TPUs, and Trainium — three chip families — to absorb available capacity and match chips to workloads (inference vs. training).
Cost of multi-platform strategy: performance engineering teams are split, multiplying software work.
Power is the primary bottleneck, not chips — US permitting and data center construction are the binding constraints.
Nuclear and renewables are both needed; nuclear permitting reform is a stated policy priority.
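The ~3x-per-year figure compounds quickly, which a few lines of arithmetic make visible. This is a sketch only: the baseline is normalized to 1.0 rather than a real dollar amount, the function name `projected_spend` is hypothetical, and it assumes the growth rate holds for every year shown, whereas the summary says only the next year is locked in.

```python
def projected_spend(base, growth_per_year, years):
    """Return spend for year 0 through `years`, compounding once per year."""
    return [base * growth_per_year ** y for y in range(years + 1)]


# Normalized baseline of 1.0; the 3.0 multiplier is the summary's "~3x per year".
spend = projected_spend(1.0, 3.0, 3)
for year, value in enumerate(spend):
    print(f"year {year}: {value:.0f}x the baseline")
```

Under that assumption, three years of 3x growth is a 27x increase, which is why power and construction timelines become the binding constraint so quickly.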

Career lessons

"Wolf vs. dog" mindset: early startup experience forces you to hunt rather than wait for tasks — the most durable career lesson.
Taking six months to build courage before joining OpenAI was a mistake in retrospect; taking the risk earlier would have been better.
Credentials and big-tech jobs are increasingly irrelevant signals; work on things an idealized version of yourself would be proud of.
Intrinsic motivation over extrinsic: choose work your smartest friends would find genuinely impressive, not socially legible.
