How o1's reasoning capabilities unlock real-world startup applications
Executive overview
Current AI models already do things that seemed impossible only a few years ago, yet most builders underestimate how big an unlock reasoning models represent, separate from raw scale. o1 is not an incremental improvement: it handles tasks that simply failed with GPT-4o, not because of more data but because of chain-of-thought reasoning trained via reinforcement learning.
The implication for startups: the moat is not the model. It is proprietary eval sets, domain-specific data, and the technical depth to reach the final 10% of accuracy that enterprise customers will pay for.
Reasoning-capable models create a step-function capability unlock — not a gradual improvement — in domains requiring complex multi-step thinking.
What o1 actually is and how it works
- o1 is trained with reinforcement learning on chain-of-thought reasoning, not just next-token prediction.
- Inspired by OpenAI's earlier Dota 2 work (OpenAI Five): agents trained with RL by playing against themselves over millions of games, guided by a reward function rather than brute force.
- A large dataset of factually correct reasoning traces (math, science problems) was likely used to teach the model how to think, not just what to output.
- The chain of thought is currently opaque and non-editable — a key limitation that future versions are expected to address.
- Two research directions are advancing in parallel: scaling up the base language model, and "unhobbling" models via RL at inference time.
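As a toy illustration of what "factually correct reasoning traces" and "a reward function" can mean in practice, the Python sketch below scores sampled traces with a binary, outcome-based reward (does the final answer match a known result?) and keeps only the correct ones. This is not o1's actual method, which the talk only speculates about. The "Answer:" line convention and every function name here are hypothetical.

```python
import re
from typing import Optional

# Assumed convention: each trace ends with a line of the form "Answer: <value>".
ANSWER_LINE = re.compile(r"Answer:\s*(.+)")


def final_answer(trace: str) -> Optional[str]:
    """Return the value on the trace's last line if it is an 'Answer:' line."""
    lines = trace.strip().splitlines()
    if not lines:
        return None
    match = ANSWER_LINE.fullmatch(lines[-1].strip())
    return match.group(1).strip() if match else None


def outcome_reward(trace: str, ground_truth: str) -> float:
    """Binary reward: 1.0 only if the final answer matches the known result."""
    return 1.0 if final_answer(trace) == ground_truth.strip() else 0.0


def keep_correct_traces(samples: list[str], ground_truth: str) -> list[str]:
    """Filter sampled reasoning traces down to those that reach the right answer."""
    return [t for t in samples if outcome_reward(t, ground_truth) == 1.0]


if __name__ == "__main__":
    samples = [
        "3 * 4 is three groups of four.\n4 + 4 + 4 = 12\nAnswer: 12",
        "3 * 4 = 3 + 4 = 7\nAnswer: 7",
        "Multiply the numbers.\n3 * 4 = 12",  # no final answer line, so reward 0
    ]
    print(len(keep_correct_traces(samples, "12")))  # 1
```

The reward only checks the outcome, not the individual steps, which is why it needs problems with a verifiable answer such as math and science.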
Circuit design: Diode Computer
- PCB design has four steps: system design, component selection, schematic layout, routing.
- Routing is NP-complete; it previously required armies of electrical engineers at companies like Nvidia and Intel.
- GPT-4o could automate schematic layout but failed at system design and component selection.
- With o1, the same prompts worked immediately — no new engineering, just a model swap.
- The product now takes a high-level spec ("wearable heart rate monitor with accelerometer") and outputs a full PCB, including component matching from data sheets.
- Architecture: GPT-4o mini extracts structured data from PDF data sheets, while o1 handles component reasoning, so each task type goes to the model suited to it.
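The split between an extraction model and a reasoning model can be sketched as a simple routing table. This is a minimal Python illustration, assuming hypothetical task labels and helper names (TaskType, route, plan_requests). It only builds request objects and makes no API calls. The strings "gpt-4o-mini" and "o1" are used as model identifiers, but Diode's actual architecture is not described beyond the summary above.

```python
from dataclasses import dataclass
from enum import Enum


class TaskType(Enum):
    """Hypothetical task categories for a PCB-design pipeline."""
    EXTRACT_DATASHEET = "extract_datasheet"  # structured extraction from PDF text
    SELECT_COMPONENTS = "select_components"  # multi-step reasoning over constraints


# Assumed routing table: a small, cheap model for extraction, a reasoning model for selection.
MODEL_FOR_TASK: dict[TaskType, str] = {
    TaskType.EXTRACT_DATASHEET: "gpt-4o-mini",
    TaskType.SELECT_COMPONENTS: "o1",
}


@dataclass
class ModelRequest:
    model: str
    prompt: str


def route(task: TaskType, prompt: str) -> ModelRequest:
    """Build a request addressed to the model assigned to this task type."""
    return ModelRequest(model=MODEL_FOR_TASK[task], prompt=prompt)


def plan_requests(datasheet_texts: list[str], spec: str) -> list[ModelRequest]:
    """One extraction request per datasheet, then a single component-selection request.

    In a real pipeline the JSON returned by the extraction calls would be fed into the
    selection prompt; that step and the actual API calls are omitted here.
    """
    requests = [
        route(TaskType.EXTRACT_DATASHEET, f"Extract pin-out and electrical ratings as JSON:\n{text}")
        for text in datasheet_texts
    ]
    requests.append(route(TaskType.SELECT_COMPONENTS, f"Select components for: {spec}"))
    return requests


if __name__ == "__main__":
    plan = plan_requests(["<datasheet A>", "<datasheet B>"],
                         "wearable heart rate monitor with accelerometer")
    print([r.model for r in plan])  # ['gpt-4o-mini', 'gpt-4o-mini', 'o1']
```

Keeping the mapping in one table means a model swap, like the GPT-4o to o1 change described above, touches a single line rather than the prompts.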
CAD design: Camfer
- Camfer generates CAD designs from natural language — effectively a co-pilot for SolidWorks.
- o1 solved partial differential equations (Navier-Stokes) to optimise airfoils for specific flight conditions — work that would normally require a mechanical engineer running simulations.
- Built as a desktop executable that opens SolidWorks and operates it autonomously.
- o1's reasoning traces are visible to the team, enabling fine-tuning of individual reasoning steps.
- Target customer: aerospace and precision manufacturing, where 100% accuracy is required — not just hobbyists prototyping.
Customer support: GigaML
- GigaML pivoted from fine-tuning open-source models to AI customer support after the fine-tuning market commoditised.
- The previous implementation (GPT plus rules) had a 70% error rate, with practically 0% accuracy on the complex edge cases enterprises care about.
- After switching to o1-preview and running rigorous evals following Jake Heller's framework, the error rate dropped to 5%, and accuracy on complex cases rose from ~0% to 85%.
- Zepto signed on to automate 30,000 support tickets per day, work so rote that human agents in the role last only a few months on average.
- The unlock was not just the model but the combination: eval-driven development + o1 reasoning.
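Eval-driven development starts with a harness that scores a system against labelled cases and reports error rates per category. That breakdown is what makes a gap like "~0% on complex cases" visible. Below is a minimal Python sketch under stated assumptions: exact-match grading, a hypothetical EvalCase format with a simple/complex tag, and a toy keyword-rules baseline. It is a generic harness, not Jake Heller's framework or GigaML's setup.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    ticket: str    # customer message (hypothetical format)
    expected: str  # resolution label a domain expert would choose
    category: str  # e.g. "simple" or "complex"


@dataclass
class EvalReport:
    error_rate: float
    accuracy_by_category: dict[str, float]


def run_evals(cases: list[EvalCase], model_fn: Callable[[str], str]) -> EvalReport:
    """Score any str -> str callable against labelled cases using exact-match grading."""
    correct: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    errors = 0
    for case in cases:
        total[case.category] += 1
        if model_fn(case.ticket).strip() == case.expected:
            correct[case.category] += 1
        else:
            errors += 1
    return EvalReport(
        error_rate=errors / len(cases) if cases else 0.0,
        accuracy_by_category={cat: correct[cat] / n for cat, n in total.items()},
    )


def rules_baseline(ticket: str) -> str:
    """Stand-in for a keyword-rules system: fine on simple tickets, blind to nuance."""
    text = ticket.lower()
    if "where is my order" in text:
        return "track_order"
    if "refund" in text:
        return "issue_refund"
    return "escalate"


if __name__ == "__main__":
    cases = [
        EvalCase("I want a refund", "issue_refund", "simple"),
        EvalCase("Where is my order?", "track_order", "simple"),
        EvalCase("Refund the missing item, keep the rest, and my coupon failed",
                 "partial_refund_and_reissue_coupon", "complex"),
    ]
    report = run_evals(cases, rules_baseline)
    print(round(report.error_rate, 2))   # 0.33
    print(report.accuracy_by_category)   # {'simple': 1.0, 'complex': 0.0}
```

Re-running the same cases with a function that calls a reasoning model in place of rules_baseline gives a like-for-like comparison. In practice, free-text answers usually need a more forgiving grader than exact match.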
Evals as the real moat
- The core moat question: as GPT-5 and further scaling arrive, what defensible advantage can a startup hold?
- Proprietary eval sets — 10,000+ test cases built from data not publicly available online — are the answer.
- Consumer and publicly available data will be absorbed into base models. The edge is everything else: legal, financial, engineering, arcane domain knowledge.
- Getting embedded in enterprises via direct sales is the mechanism for acquiring this data.
- Classic moats still apply on top: switching costs, distribution, brand, integrations, and UI.
What kinds of startups benefit most from o1
- Any domain requiring multi-step reasoning over technical constraints: mechanical engineering, electrical engineering, chemical engineering, bioengineering.
- Verticals where accuracy requirements are extremely high and the customer will pay a premium for the final 10%.
- Applications where rules-based systems handle simple cases but fail on complex edge cases — that gap is now closable.
- Strong technical teams capture disproportionate value: o1 does not commoditise technical depth; it raises the ceiling on what's achievable.
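One way to close the gap between rules and complex edge cases is a cascade: deterministic rules answer the cases they recognise, and everything else escalates to a reasoning model. The Python sketch below uses hypothetical names throughout (Rule, answer, order_status_rule), and fake_reasoning_model is a placeholder for a real call to a model such as o1.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# A rule returns an answer when it applies, or None to pass the query on.
Rule = Callable[[str], Optional[str]]


@dataclass
class Answer:
    text: str
    source: str  # "rules" or "reasoning_model"


def answer(query: str, rules: list[Rule], reasoning_model: Callable[[str], str]) -> Answer:
    """Try cheap deterministic rules first; escalate anything they don't cover."""
    for rule in rules:
        result = rule(query)
        if result is not None:
            return Answer(result, "rules")
    return Answer(reasoning_model(query), "reasoning_model")


def order_status_rule(query: str) -> Optional[str]:
    return "track_order" if "where is my order" in query.lower() else None


def fake_reasoning_model(query: str) -> str:
    # Placeholder for a call to a reasoning model; no API is invoked here.
    return f"[reasoned answer for: {query}]"


if __name__ == "__main__":
    rules = [order_status_rule]
    print(answer("Where is my order?", rules, fake_reasoning_model).source)  # rules
    print(answer("Refund one item but keep the rest", rules, fake_reasoning_model).source)  # reasoning_model
```

Keeping rules in front preserves cheap, predictable handling for simple cases, while the reasoning model only sees the queries that rules could not resolve.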
What to watch for next
- Full o1 (beyond preview) represents another significant step up in capability.
- o2 and o3 are in development and described as "not far behind."
- Key missing feature: editable chain-of-thought, allowing users to branch or redirect reasoning mid-process.
- AI coding agents may face headwinds: o1's built-in chain of thought subsumes infrastructure that teams previously built themselves.
- Phone-calling AI agents serve as a precedent: it took two batches for them to go from "doesn't work" to "blowing up" once models crossed a capability threshold.