How Tom Brown went from GPT-3 to co-founding Anthropic
Executive overview
Tom Brown co-founded Anthropic after a non-linear path: self-taught engineer, early YC startup employee, brief stint at Google Brain, and a key role building GPT-3 at OpenAI. The core insight driving the founding was scaling laws — a straight-line relationship between compute and intelligence across 12 orders of magnitude — which convinced a small group that transformative AI was not a distant abstraction.
Anthropic launched with seven co-founders during COVID, no product roadmap, and no certainty of success. The bet paid off when Claude 3.5 Sonnet hit unexpected product-market fit in coding, and Claude Code — originally an internal engineering tool — became a market-leading agentic coding product.
The model that doesn't teach to the test wins the real-world test.
From software engineer to AI researcher
- Left Grouper (a blind-match dating app) in 2014, burned out; spent three months recovering before pivoting to AI.
- Ran out of personal runway; took a three-month Twitch contract to fund six months of self-study.
- Self-study stack: Coursera ML course, Kaggle projects, Linear Algebra Done Right, a statistics textbook, rented GPU via YC credits.
- Got into OpenAI via Greg Brockman; first nine months were pure engineering (StarCraft environment), no ML work.
- Key realization: there was a shortage of people who combined ML knowledge with distributed systems — that gap was the entry point.
Scaling laws and GPT-3
- Dario Amodei identified the core trend: reliable intelligence gains from more compute with the right recipe.
- The original scaling laws paper showed a straight line over 12 orders of magnitude — unprecedented in computer science.
- Danny Hernandez's complementary paper showed algorithmic efficiency improving at a compounding rate; the two trends stacked.
- Reaction inside the field was hostile: critics called it wasteful, inelegant, "just stacking more layers."
- The shift from TPUs to GPUs for GPT-3 was driven by PyTorch being a better software stack than TensorFlow: the gain was faster iteration, not raw hardware advantage.
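To make the "straight line" claim concrete: a power law looks like a straight line once both axes are on a logarithmic scale. The sketch below is illustrative only. It assumes the measured quantity is a loss-like metric that falls as compute rises, and the constants `A` and `ALPHA`, the function `fit_power_law`, and the data points are all made up for the example rather than taken from the paper.

```python
import math


def fit_power_law(compute, loss):
    """Least-squares fit of log10(loss) = intercept + slope * log10(compute)."""
    xs = [math.log10(c) for c in compute]
    ys = [math.log10(v) for v in loss]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic data (an assumption for illustration, NOT real measurements):
# loss = A * compute ** (-ALPHA), sampled across 12 orders of magnitude.
A = 10.0
ALPHA = 0.05
compute = [10.0 ** k for k in range(13)]
loss = [A * c ** (-ALPHA) for c in compute]

slope, intercept = fit_power_law(compute, loss)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```

On this synthetic data the fitted slope recovers `-ALPHA` and the intercept recovers `log10(A)`. With real training runs the interesting question is whether the points keep falling on the same line as compute grows, which is the property the bullet above describes.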
Founding Anthropic
- The founding group came from OpenAI's safety and scaling orgs — the teams that took scaling laws most seriously.
- Started with seven co-founders, grew to ~25 ex-OpenAI people within months; all joined for the mission, not prestige.
- First year priorities: build training infrastructure, secure compute, handle company setup (Brex accounts and all).
- Had a Slack-bot version of Claude 1 running in the YC Slack ~nine months before ChatGPT launched — but hesitated to productize it, under-investing in serving infrastructure as a result.
- Didn't feel like a viable company until Claude 3.5 Sonnet, roughly a year before this interview.
Claude's coding advantage
- Claude's share of coding use cases in YC batches grew from single-digit percentages to 80–90%+.
- Anthropic does not have a team dedicated to gaming benchmarks; internal benchmarks and real-world dogfooding drive model development instead.
- Train-test mismatch explains the gap between benchmark scores and user preference.
- Claude 3.5 Sonnet's product-market fit was a surprise internally; Claude 3.7 Sonnet's unlock of agentic coding was also unexpected.
- Dogfooding with internal engineers is a top priority: accelerating Anthropic's own engineers is treated as a primary signal.
Claude Code: from internal hack to product
- Claude Code began as a tool built by engineer Boris for internal use at Anthropic.
- Anthropic had previously committed to an API-first strategy, assuming startups would build better products on top.
- Claude Code broke that assumption — it outperformed existing market products for agentic coding.
- Proposed explanation: the team treated Claude itself as a primary user, designing around what Claude needs to be effective (right tools, right context).
- The same user-empathy framing produced MCP (Model Context Protocol), the tool-calling standard that succeeded where others failed.
Advice for founders building on AI APIs
- Claude Code's advantage was empathy for the model as a user, not a proprietary technical moat — a startup could replicate that.
- Anthropic wants to be the most developer- and API-focused lab; infrastructure for others to build on is a strategic priority.
- Large opportunity in coaching models to do useful business tasks: current agentic coding covers a tiny fraction of work done in businesses.
- Models need better context, better tool access, and better coaching — rich space for startups.
Compute infrastructure and bottlenecks
- On its current trajectory, humanity is heading for the largest infrastructure build-out in history, larger than the Apollo and Manhattan projects combined.
- Compute spending is growing ~3x per year; next year's spending is already locked in, while 2027 is still open.
- Anthropic uses GPUs, TPUs, and Trainium — three chip families — to absorb available capacity and match chips to workloads (inference vs. training).
- Cost of multi-platform strategy: performance engineering teams are split, multiplying software work.
- Power is the primary bottleneck, not chips — US permitting and data center construction are the binding constraints.
- Nuclear and renewables are both needed; nuclear permitting reform is a stated policy priority.
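As a rough illustration of what "~3x per year" implies when it compounds, the sketch below projects spending relative to an arbitrary baseline of 1.0. The function names and the choice of a 100x target are assumptions for the example; the only figure taken from the text above is the 3x annual growth rate, and nothing here says the rate will hold.

```python
import math


def projected_spend(base, growth_per_year, years):
    """Spending for year 0..years, assuming a constant annual multiplier."""
    return [base * growth_per_year ** y for y in range(years + 1)]


def years_to_multiple(target_multiple, growth_per_year):
    """Years of constant compounding needed to reach target_multiple."""
    return math.log(target_multiple) / math.log(growth_per_year)


# Relative spending over three years at 3x per year (baseline = 1.0).
trajectory = projected_spend(1.0, 3.0, 3)
print(trajectory)

# Time to reach 100x the baseline if the 3x rate were sustained.
print(round(years_to_multiple(100, 3.0), 2))
```

Three years at this rate is a 27x increase, and 100x takes a little over four years, which is why a constraint that builds slowly, such as power and permitting, becomes binding quickly.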
Career lessons
- "Wolf vs. dog" mindset: early startup experience forces you to hunt rather than wait for tasks — the most durable career lesson.
- Taking six months to build courage before joining OpenAI was a mistake in retrospect; taking the risk earlier would have been better.
- Credentials and big-tech jobs are increasingly irrelevant signals; work on things an idealized version of yourself would be proud of.
- Intrinsic motivation over extrinsic: choose work your smartest friends would find genuinely impressive, not work that is merely socially legible.