Recursive self-improvement as a startup alternative to fine-tuning
Executive overview
Fine-tuning frontier models costs millions, and the result is obsolete the moment a new model ships. Poetic builds "reasoning harnesses", agentic systems layered on top of existing LLMs, that outperform the base models without retraining.
The harness is compatible with any new model, so performance gains compound rather than reset. A seven-person team hit state-of-the-art on both ARC-AGI v2 and Humanity's Last Exam at a fraction of frontier training costs.
Automated prompt and reasoning-strategy optimisation is a more durable moat than fine-tuning.
What Poetic builds and why it beats fine-tuning
- Fine-tuning locks you to one model version; the next frontier release erases the investment
- A reasoning harness is code, prompts, and data sitting on top of one or more LLMs
- When a new base model ships, the same harness applies immediately — often with a larger performance jump
- Poetic's meta system generates these harnesses automatically, far faster and cheaper than hand-building them
- The meta system can also optimise an existing agent a startup has already built — targeting prompts, reasoning strategies, or the full pipeline
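To make the "harness sits on top of the model" idea concrete, here is a minimal sketch, not Poetic's implementation. The `Harness` class, the `Model` type, the majority-vote strategy, and both stub models are assumptions made up for illustration. The only point is that the harness takes the model as a parameter, so swapping in a newer model needs no retraining.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable

# Hypothetical type: any function that maps a prompt string to a completion string.
Model = Callable[[str], str]


@dataclass
class Harness:
    """Illustrative harness: a prompt template plus a strategy encoded in code."""

    prompt_template: str
    num_samples: int = 3

    def solve(self, model: Model, task: str) -> str:
        # Sample several answers and return the most common one.
        # Majority voting is only an example strategy, chosen for brevity.
        prompt = self.prompt_template.format(task=task)
        answers = [model(prompt).strip() for _ in range(self.num_samples)]
        return Counter(answers).most_common(1)[0][0]


# Stub models standing in for real LLM calls (assumptions, not real APIs).
def old_model(prompt: str) -> str:
    return "unsure"


def new_model(prompt: str) -> str:
    return " 4 "


harness = Harness(prompt_template="Answer with a single number: {task}")

# The same harness object is reused unchanged across model versions.
old_answer = harness.solve(old_model, "2 + 2")
new_answer = harness.solve(new_model, "2 + 2")
```

Because the model is just an argument to `solve`, everything invested in the template and the strategy carries over when the underlying model changes.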
Benchmark results
- ARC-AGI v2: Poetic scored 54%, versus Gemini 3 Deep Think at 45%; the result ran on Gemini 3 Pro (a cheaper model) at under half the cost per problem (~$32 vs ~$70)
- Humanity's Last Exam: 55% versus Anthropic's Claude Opus 4.6 at 53.1%, with optimisation costs under $100K
- Both results produced by a team of seven research scientists and engineers
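The margins behind these figures can be checked with a few lines of arithmetic. The snippet below uses only the numbers quoted above; the variable names are illustrative, and the dollar amounts are the approximate values given in the source.

```python
# Approximate per-problem costs on ARC-AGI v2, as quoted above (USD).
poetic_cost = 32.0
deep_think_cost = 70.0

# Fraction of the Deep Think cost that the Poetic run needed.
cost_ratio = poetic_cost / deep_think_cost

# Score gaps in percentage points.
arc_gap_points = 54 - 45
hle_gap_points = round(55 - 53.1, 1)

print(f"cost ratio: {cost_ratio:.2f}")
print(f"ARC-AGI v2 gap: {arc_gap_points} points")
print(f"Humanity's Last Exam gap: {hle_gap_points} points")
```

On these figures the cost ratio is a little under one half, and the ARC-AGI v2 margin is much wider than the Humanity's Last Exam margin.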
How the meta system works
- The meta system observes a task, generates candidate reasoning strategies, and recursively refines them
- Outputs include non-obvious prompts — ones a human would not write — and deliberately simple (sometimes technically wrong) examples that still improve performance
- The system decides how much context stuffing, example generation, or re-ranking is needed; humans do not tune this manually
- Prompt optimisation alone yields modest gains; adding reasoning strategies encoded in code is where large jumps occur (one task: 5% → 95% accuracy)
- Each underlying model has its own S-curve; as models and the meta system both improve, the ceiling keeps rising
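The generate-and-refine loop described above can be sketched as a simple hill climb. This is a toy under stated assumptions, not Poetic's meta system: the `Strategy` knobs (`examples`, `rerank`), the `mutate` and `refine` helpers, and the `toy_score` function are all hypothetical. A real system would score each candidate by running it against the task.

```python
import random
from typing import Callable, Dict, Tuple

# Hypothetical strategy description: integer knobs the loop may adjust.
Strategy = Dict[str, int]


def mutate(strategy: Strategy, rng: random.Random) -> Strategy:
    """Return a copy with one knob nudged up or down (never below zero)."""
    child = dict(strategy)
    knob = rng.choice(sorted(child))
    child[knob] = max(0, child[knob] + rng.choice([-1, 1]))
    return child


def refine(
    seed: Strategy,
    score: Callable[[Strategy], float],
    rounds: int = 50,
    children: int = 4,
    rng_seed: int = 0,
) -> Tuple[Strategy, float]:
    """Repeatedly propose variants of the best strategy and keep improvements."""
    rng = random.Random(rng_seed)
    best, best_score = seed, score(seed)
    for _ in range(rounds):
        candidates = [mutate(best, rng) for _ in range(children)]
        for candidate in candidates:
            candidate_score = score(candidate)
            if candidate_score > best_score:
                best, best_score = candidate, candidate_score
    return best, best_score


def toy_score(strategy: Strategy) -> float:
    # Stand-in for a task evaluation: peaks at 3 examples and 2 re-rank passes.
    return -((strategy["examples"] - 3) ** 2 + (strategy["rerank"] - 2) ** 2)


seed_strategy = {"examples": 0, "rerank": 0}
best_strategy, best_score = refine(seed_strategy, toy_score)
```

The loop, not a human, decides how many examples or re-ranking passes to use, which mirrors the claim that these settings are not tuned by hand. The fixed random seed only keeps the sketch reproducible.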
Who Poetic is for
- Startups that have exhausted context engineering and can't get reliable, robust results from base models
- Teams that have already built an agent but need a performance step-change
- Any builder who wants to stay at or above SOTA without re-running expensive training each model cycle
- Early access available at poetic.ai
Ian Fischer's path from mobile tools to DeepMind to Poetic
- His first YC company (Aportable) was acquired by Google; he used the transition to move into AI and robotics research
- Realised hardware was not the focus and moved fully into machine learning research, spending ~10 years at Google Brain and DeepMind
- Advice for engineers entering AI: experiment daily, push the boundaries of what models can do, build what you want to build