How o1's reasoning capabilities unlock real-world startup applications

Executive overview

Current AI models already do things that were considered impossible just a few years ago, yet most builders underestimate how large an unlock reasoning models represent, independent of raw scale. o1 is not an incremental improvement: it completes tasks that simply failed with GPT-4o, not because it saw more data, but because it was trained with reinforcement learning to reason step by step (chain of thought).

The implication for startups: the moat is not the model. It is proprietary eval sets, domain-specific data, and the technical depth to reach the final 10% accuracy that enterprise customers will pay for.

Reasoning-capable models create a step-function capability unlock — not a gradual improvement — in domains requiring complex multi-step thinking.

What o1 actually is and how it works

  • o1 is trained with reinforcement learning on chain-of-thought reasoning, not just next-token prediction.
  • The approach echoes OpenAI's earlier Dota work, where an agent learned through RL self-play over millions of games, guided by a reward function rather than brute force.
  • A large dataset of factually correct reasoning traces (math, science problems) was likely used to teach the model how to think, not just what to output.
  • The chain of thought is currently opaque and non-editable — a key limitation that future versions are expected to address.
  • Two parallel research directions: scaling up the base LM, and "unhobbling" models through RL-trained reasoning that spends more compute at inference time. Both are advancing simultaneously.

Circuit design: Diode Computer

  • PCB design has four steps: system design, component selection, schematic layout, routing.
  • Routing is NP-complete, and it previously required armies of electrical engineers at companies like Nvidia and Intel.
  • GPT-4o could automate schematic layout but failed at system design and component selection.
  • With o1, the same prompts worked immediately — no new engineering, just a model swap.
  • The product now takes a high-level spec ("wearable heart rate monitor with accelerometer") and outputs a full PCB, including component matching from data sheets.
  • Architecture: GPT-4o mini extracts structured data from PDF data sheets; o1 handles component reasoning. Different models for different task types.
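
The two-model split described above can be sketched as a small router that sends each task type to a different model. This is a minimal illustration, not Diode's actual implementation: the Router class, the "extract" and "reason" task labels, and the stub models are all hypothetical, and the stubs stand in for real API clients so the sketch runs offline.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# A "model" here is any callable mapping a prompt string to a response string.
# In a real system these would wrap API clients; everything below is a placeholder.
ModelFn = Callable[[str], str]


@dataclass
class Router:
    models: Dict[str, ModelFn]  # task type -> model callable

    def run(self, task_type: str, prompt: str) -> str:
        if task_type not in self.models:
            raise ValueError(f"no model registered for task type {task_type!r}")
        return self.models[task_type](prompt)


def design_step(router: Router, datasheet_text: str, spec: str) -> str:
    # Step 1: a cheap model turns raw data-sheet text into structured fields.
    extracted = router.run("extract", datasheet_text)
    # Step 2: a reasoning model selects components from the spec plus extracted data.
    prompt = f"Spec: {spec}\nComponents: {extracted}"
    return router.run("reason", prompt)


# Stub models (assumptions) so the sketch runs without network access.
def fake_extractor(text: str) -> str:
    return text.strip().upper()


def fake_reasoner(prompt: str) -> str:
    return f"PLAN[{prompt}]"


router = Router(models={"extract": fake_extractor, "reason": fake_reasoner})
result = design_step(router, " ldo 3.3v ", "heart rate monitor")
print(result)
```

Keeping the routing table separate from the pipeline steps means a model swap, like the GPT-4o to o1 change described above, touches one dictionary entry rather than the prompts or the control flow.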

CAD design: Camfer

  • Camfer generates CAD designs from natural language — effectively a co-pilot for SolidWorks.
  • o1 solved partial differential equations (Navier-Stokes) to optimise airfoils for specific flight conditions — work that would normally require a mechanical engineer running simulations.
  • Built as a desktop executable that opens SolidWorks and operates it autonomously.
  • o1's reasoning traces are visible to the team, enabling fine-tuning of individual reasoning steps.
  • Target customer: aerospace and precision manufacturing, where 100% accuracy is required — not just hobbyists prototyping.

Customer support: GigaML

  • GigaML pivoted from fine-tuning open-source models to AI customer support after the fine-tuning market commoditised.
  • The previous implementation (GPT + rules) had a 70% error rate, with practically 0% accuracy on the complex edge cases enterprises care about.
  • After applying o1 (preview only) with rigorous evals following Jake Heller's framework, the error rate dropped to 5% and accuracy on complex cases rose from ~0% to 85%.
  • Zepto signed on, automating 30,000 support tickets per day. The work is so rote that human agents stay in the role for only months on average.
  • The unlock was not just the model but the combination: eval-driven development + o1 reasoning.
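
Eval-driven development of the kind described above can be sketched as a harness that scores a model against labelled test cases and reports accuracy separately for simple and complex cases. This is an illustrative sketch only: the EvalCase fields, the bucket labels, the exact-match scoring rule, and the toy model are assumptions, not GigaML's or Jake Heller's actual framework.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class EvalCase:
    prompt: str
    expected: str
    bucket: str  # e.g. "simple" or "complex"; labels are illustrative


def run_evals(model: Callable[[str], str], cases: List[EvalCase]) -> Dict[str, float]:
    """Return accuracy per bucket, plus an "overall" entry."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for case in cases:
        # Exact string match is an assumed scoring rule; real evals often need graders.
        hit = model(case.prompt).strip() == case.expected
        for key in (case.bucket, "overall"):
            total[key] += 1
            correct[key] += int(hit)
    return {key: correct[key] / total[key] for key in total}


# Toy stand-in for a support model: keyword rule only.
def toy_model(prompt: str) -> str:
    return "refund" if "refund" in prompt else "escalate"


cases = [
    EvalCase("I want a refund", "refund", "simple"),
    EvalCase("refund for a late order", "refund", "simple"),
    EvalCase("charged twice, one order missing", "partial_credit", "complex"),
    EvalCase("wrong item and the courier was rude", "escalate", "complex"),
]
scores = run_evals(toy_model, cases)
print(scores)
```

Reporting the complex bucket on its own matters because an overall score can look acceptable while the hard cases, the ones enterprises care about, still fail.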

Evals as the real moat

  • The core moat question: as GPT-5 and further scaling arrive, what defensible advantage can a startup hold?
  • Proprietary eval sets — 10,000+ test cases built from data not publicly available online — are the answer.
  • Consumer and publicly available data will be absorbed into base models. The edge is everything else: legal, financial, engineering, arcane domain knowledge.
  • Getting embedded in enterprises via direct sales is the mechanism for acquiring this data.
  • Classic moats still apply on top: switching costs, distribution, brand, integrations, and UI.

What kinds of startups benefit most from o1

  • Any domain requiring multi-step reasoning over technical constraints: mechanical engineering, electrical engineering, chemical engineering, bioengineering.
  • Verticals where accuracy requirements are extremely high and the customer will pay a premium for the final 10%.
  • Applications where rules-based systems handle simple cases but fail on complex edge cases — that gap is now closable.
  • Strong technical teams capture disproportionate value; o1 does not commoditise technical depth, it raises the ceiling on what's achievable.
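
The rules-plus-reasoning pattern mentioned above can be sketched as a dispatcher that answers from a rules table when a rule matches and falls back to a reasoning model otherwise. This is a minimal sketch under stated assumptions: the RULES table, the intent names, and the stub reasoning model are hypothetical, and a production system would use far richer matching than exact lookup.

```python
from typing import Callable, Optional, Tuple

# Hypothetical rules table: normalised message -> intent.
RULES = {
    "where is my order": "order_status",
    "cancel my order": "cancel_order",
}


def rules_engine(message: str) -> Optional[str]:
    """Return an intent if a rule matches exactly after normalisation, else None."""
    return RULES.get(message.strip().lower().rstrip("?!."))


def handle(message: str, reasoning_model: Callable[[str], str]) -> Tuple[str, str]:
    """Return (route, answer), where route records which path handled the message."""
    intent = rules_engine(message)
    if intent is not None:
        return ("rules", intent)
    # No rule matched: hand the case to the (more expensive) reasoning model.
    return ("reasoning", reasoning_model(message))


# Stub standing in for a reasoning model call (assumption).
def stub_reasoner(message: str) -> str:
    return "needs_review"


simple = handle("Where is my order?", stub_reasoner)
hard = handle("I was charged twice and one parcel never arrived", stub_reasoner)
print(simple, hard)
```

Recording the route alongside the answer makes it easy to measure how much traffic the fallback absorbs and to run evals on that slice alone.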

What to watch for next

  • Full o1 (beyond preview) represents another significant step up in capability.
  • o2 and o3 are in development and described as "not far behind."
  • Key missing feature: editable chain-of-thought, allowing users to branch or redirect reasoning mid-process.
  • AI coding agents may face headwinds: o1's built-in chain of thought subsumes infrastructure that those teams previously built themselves.
  • Phone-calling AI agents serve as a precedent: it took two batches for them to move from "doesn't work" to "blowing up" once models crossed a capability threshold.
