Windsurf CEO: how a GPU startup pivoted twice to build an AI coding tool

Executive overview

Most AI developer tools iterated on GitHub Copilot. Windsurf's founders scrapped their GPU infrastructure business in a weekend, bet on agentic coding before the models could support it, and shipped a forked VS Code IDE in under three months.

The core insight is that every competitive advantage is a depreciating asset: the only durable moat is continuously generating new insights and executing on them so that they compound.

Startups die slowly when they stop generating and testing new hypotheses, not when individual bets fail.

The pivot: from GPU virtualisation to Codeium

  • Company started in 2021 as Exafunction, offering VMware-style GPU virtualisation for deep learning workloads
  • By mid-2022: managing 10,000+ GPUs, $2M+ revenue, 8 people, Series A raised
  • Saw that transformer models would commoditise custom ML pipelines — eliminating the rationale for their product
  • Decision made in a single weekend; entire team told Monday; coding on Codeium started the same day
  • Key conditions enabling the pivot: lean team, cash-flow positive, $28M raised, full team conviction on the new direction
  • Shipped a VS Code extension within two months and posted on Hacker News

Building a better autocomplete: early technical edges

  • First version used a free open-source model — materially worse than Copilot, but free
  • Retained inference speed advantage from prior GPU runtime infrastructure
  • Trained their own fill-in-the-middle model within months — outperformed Copilot on that specific capability by early 2023
  • Expanded to all major IDEs (JetBrains, Eclipse, Vim) early, driven by enterprise need; JP Morgan and Dell were early customers
  • Shared cross-IDE infrastructure meant low marginal cost to add each new editor

Why they built their own IDE (Windsurf)

  • By mid-2024 enterprise revenue was well over eight figures from the Codeium extension product
  • VS Code's ceiling limited the agentic UX they wanted to ship
  • Forked VS Code; shipped across all operating systems in under three months with an engineering team of fewer than 25
  • Core bet: agents, not chat+autocomplete, were the right paradigm — Windsurf was the first agentic editor
  • Deliberately avoided over-configuration; invested instead in deep code-base understanding and fast in-place edits
  • Maintained a unified timeline of developer and agent actions so the AI always has current context

Context retrieval: going beyond RAG

  • Rejected pure vector-database RAG as insufficient for code — precision and recall need to be very high
  • Built a multi-signal retrieval system: keyword search, vector search, abstract syntax tree parsing, and GPU-powered real-time reranking
  • Motivation: a query like "upgrade all instances of this API" fails if embedding search misses even a few hits
  • Prior GPU infrastructure enabled running reranking across large code chunks in under one second
  • Design principle: strive for what works, not complexity; added AST parsing only after evals proved it necessary
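The multi-signal idea above can be illustrated with a small sketch. This is not Windsurf's implementation: the function names, the weights, and the scoring rules are all assumptions made for illustration. A bag-of-words cosine stands in for real embeddings, Python's standard `ast` module stands in for their syntax-tree parsing, and a plain weighted sum stands in for the GPU-powered reranker.

```python
import ast
import math
import re
from collections import Counter


def tokens(text):
    """Lowercased identifier-like tokens."""
    return re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text.lower())


def keyword_score(query, chunk):
    """Fraction of query tokens that appear verbatim in the chunk."""
    q = set(tokens(query))
    return len(q & set(tokens(chunk))) / len(q) if q else 0.0


def vector_score(query, chunk):
    """Cosine similarity of bag-of-words counts (a stand-in for embeddings)."""
    a, b = Counter(tokens(query)), Counter(tokens(chunk))
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def called_names(chunk):
    """Names of functions and methods the chunk calls, found via the AST."""
    try:
        tree = ast.parse(chunk)
    except SyntaxError:
        return set()  # chunk is not parseable Python; no AST signal
    names = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Name):
                names.add(func.id.lower())
            elif isinstance(func, ast.Attribute):
                names.add(func.attr.lower())
    return names


def ast_score(query, chunk):
    """1.0 if the chunk calls something named in the query, else 0.0."""
    return 1.0 if called_names(chunk) & set(tokens(query)) else 0.0


def retrieve(query, chunks, weights=(0.3, 0.3, 0.4)):
    """Score each chunk on all three signals, then rerank by weighted sum.

    The weights are hypothetical; a real system would tune or learn them.
    """
    scored = []
    for chunk in chunks:
        signals = (
            keyword_score(query, chunk),
            vector_score(query, chunk),
            ast_score(query, chunk),
        )
        scored.append((sum(w * s for w, s in zip(weights, signals)), chunk))
    return [c for _, c in sorted(scored, key=lambda pair: pair[0], reverse=True)]
```

For a query such as "upgrade all calls to fetch_user", a chunk that actually calls `fetch_user` ranks above a comment that merely mentions it, because only the former earns the AST signal. That is the kind of distinction a pure embedding search can blur.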

Evaluation infrastructure

  • Used a property unique to code: it can be run
  • Core eval: take open-source commits with tests, delete implementation, ask the model to reproduce it; measure retrieval accuracy, intent accuracy, and test-pass rate
  • Also masks partial changes to test intent prediction (similar to Google's autocomplete task)
  • Evals drove investment decisions — complexity was added only when evals showed it helped
  • Combination of eval-driven and vibe-driven improvement: evals suited to opaque retrieval systems; user data and intuition suited to UI-level changes
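A minimal sketch of the test-pass-rate part of such an eval, assuming each case is reduced to a prompt (the commit with its implementation deleted) and the tests from that commit. `EvalCase`, `passes`, and `pass_rate` are hypothetical names, and the model is any callable that maps a prompt to source code. Retrieval accuracy and intent accuracy are not covered here.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalCase:
    prompt: str     # what the model sees: the code with the implementation removed
    test_code: str  # the commit's tests, expressed as assert statements


def passes(candidate_code: str, test_code: str) -> bool:
    """Run the candidate implementation, then the tests, in a fresh namespace."""
    namespace: dict = {}
    try:
        exec(candidate_code, namespace)  # define the model's implementation
        exec(test_code, namespace)       # any failing assert raises
    except Exception:
        return False
    return True


def pass_rate(model: Callable[[str], str], cases: List[EvalCase]) -> float:
    """Fraction of cases where the model's code passes the held-out tests."""
    if not cases:
        return 0.0
    return sum(passes(model(c.prompt), c.test_code) for c in cases) / len(cases)
```

Running model output with `exec` is only acceptable in a toy setting; a real harness would execute each candidate in an isolated sandbox with time and resource limits.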

Using Windsurf in production

  • Internal teams commit code changes frequently as the primary safety net when using agents
  • Agents can change more than intended if intent is underspecified — surgical prompting and frequent commits reduce this
  • Boilerplate and repetitive tasks (e.g. server deployments) now handled entirely within Windsurf workflows
  • Non-technical employees build internal tools directly — removing the PM/engineer backlog bottleneck
  • Non-technical users represent a meaningful share of active users; many never open the code editor, living entirely in the Cascade agent panel

Hiring and interviews in the AI coding era

  • Still maintain a high technical bar — problem-solving skill remains the key proxy
  • Interviews include AI-assisted sessions to screen out people who resist the tools
  • Also include in-person sessions without AI — basic coding without AI assistance indicates reasoning ability
  • Open-ended system design questions with trade-offs replace pure algorithmic questions; there is no single correct answer
  • Engineering headcount is growing, not shrinking: the problem ceiling is high, and the mission of cutting development time by 99% requires many more capabilities (design, deploy, debug)

Opportunities for new AI coding startups

  • Large legacy migration market: COBOL-to-Java, JVM version upgrades, Rails upgrades — billions spent annually, few specialised tools
  • Automated alert and bug resolution: significant enterprise spend, no clearly dominant product yet
  • Both are niches deep enough to support large companies, not just one winner
  • General principle: pick one thing and do it exceptionally well rather than building another general coding assistant

Compounding advantage and the competition

  • Every insight depreciates; Nvidia-style compounding requires continuous new bets
  • Comfortable with a 50% failure rate on internal bets — 100% success signals insufficient ambition
  • Competitor landscape has shifted repeatedly (Copilot → Devin → Cursor); a long-term strategy combined with flexible execution matters more than reacting to rivals
  • Alpha over base models must grow proportionally as base models improve — the gap between foundation model output and 100% is the product opportunity
  • Treat pivots as a badge of honour; most founders fail because they prefer consistent failure over the discomfort of changing course
