How Fireworks AI scaled 100x in six months by solving GPU cost and latency

Executive overview

Most companies attempting an AI-first transition lack the infrastructure expertise to do it cheaply or quickly. Fireworks AI was built to fill that gap, offering the model serving, GPU efficiency, and multimodal routing that previously only large teams like Meta's could access.

The core insight: GPU cost and latency are the two killers of AI product viability — solving both unlocks the 10x growth loops startups need.

The Fireworks AI origin

  • Lin Qiao and co-founders spent years at Meta building AI infrastructure for hundreds of millions of users
  • Friends at other large companies lacked comparable ML infra teams — and struggled through the AI-first transition
  • Fireworks AI's mission: let any company build on GenAI without a 100-person ML or infrastructure team
  • Two core pain points they set out to solve: high latency and prohibitive GPU costs
  • Software stack designed to minimise GPU usage — making 10x, 100x, 1000x scale economically viable for customers

Growth and traction

  • 100x traffic growth in six months
  • Processing 150 billion tokens per day; generating 1 million images per day
  • Series A: $25M from Benchmark; Series B: $52M led by Sequoia at $552M post-money valuation (4x step-up)

Startup operating principles

  • Only pursue 10x improvements — not incremental gains
  • Speed is a structural advantage: no coordination burden, no slow decision-making
  • Saying no is a core competency — time-slicing across too many priorities kills focus
  • The right question for every project: will this visibly move business metrics?
  • Constant urgency: "Why not today? Why not yesterday? Why not faster?"

The multimodal and compound AI roadmap

  • Text-in, text-out LLMs are no longer sufficient for real business tasks
  • Expanding to image understanding, image generation, and audio models, with 100+ models across modalities already on the platform
  • Hallucination is addressed through a routing layer built on function calling
  • Routes queries to the best-fit specialist model or external API (search, weather, stock prices)
  • This architecture — pulling together models and APIs — is the foundation of compound AI systems
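The routing idea above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Fireworks AI's implementation: the tool names (get_weather, get_stock_price, web_search, generate_image) are hypothetical stubs, and fake_model_function_call stands in for a function-calling model, using a keyword heuristic where a real model would emit a structured JSON tool call.

```python
import json
from typing import Callable, Dict

# Hypothetical tool registry: names and stub bodies are illustrative only,
# not Fireworks AI's API. Real entries would call search, weather, or
# stock-price services, or a specialist model such as an image generator.
TOOLS: Dict[str, Callable[..., str]] = {
    "get_weather": lambda city: f"weather for {city}: (stub)",
    "get_stock_price": lambda ticker: f"price for {ticker}: (stub)",
    "web_search": lambda query: f"search results for {query!r}: (stub)",
    "generate_image": lambda prompt: f"image for {prompt!r}: (stub specialist model)",
}


def fake_model_function_call(query: str) -> str:
    """Hypothetical stand-in for a function-calling model.

    A real model would pick a tool from the declared tool schemas and return
    a JSON tool call; here a keyword heuristic fakes that decision so the
    sketch runs offline.
    """
    q = query.lower()
    if "weather" in q:
        city = query.rsplit(" in ", 1)[-1].rstrip("?")
        return json.dumps({"name": "get_weather", "arguments": {"city": city}})
    if "stock" in q or "price of" in q:
        ticker = query.split()[-1].rstrip("?").upper()
        return json.dumps({"name": "get_stock_price", "arguments": {"ticker": ticker}})
    if "image" in q or "draw" in q:
        return json.dumps({"name": "generate_image", "arguments": {"prompt": query}})
    # Fallback: ground open-ended questions in search results.
    return json.dumps({"name": "web_search", "arguments": {"query": query}})


def route(query: str) -> str:
    """Parse the model's tool call and dispatch it to the matching tool."""
    call = json.loads(fake_model_function_call(query))
    tool = TOOLS.get(call["name"])
    if tool is None:
        raise ValueError(f"model requested unknown tool {call['name']!r}")
    # The answer comes from the tool's output rather than the model's
    # parametric memory, which is how routing helps curb hallucination.
    return tool(**call["arguments"])


if __name__ == "__main__":
    print(route("What's the weather in Paris?"))
    print(route("Stock price of nvda"))
    print(route("Draw an image of a fox"))
    print(route("Who founded Fireworks AI?"))
```

The point of this structure is that the model only decides which tool to call and with which arguments; the answer itself is produced by the tool. Keeping those two steps separate is what lets a router ground responses in live data such as search results or stock prices, and lets a compound system swap specialist models in and out without retraining the router.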

Hiring philosophy

  • Aptitude over experience
  • "Fire in the belly": hunger and motivation outweigh prior credentials
  • Fast learning and determined problem-solving matter most in a fast-moving technology environment
