Recursive self-improvement as a startup alternative to fine-tuning

Executive overview

Fine-tuning a frontier model costs millions, and the result is obsolete as soon as the next model ships. Poetic builds "reasoning harnesses", agentic systems layered on top of existing LLMs, that outperform the underlying base models without any retraining.

The harness is compatible with any new model, so performance gains compound rather than reset. A seven-person team hit state-of-the-art on both ARC-AGI v2 and Humanity's Last Exam at a fraction of frontier training costs.

Automated prompt and reasoning-strategy optimisation is a more durable moat than fine-tuning.

What Poetic builds and why it beats fine-tuning

Fine-tuning locks you to one model version; the next frontier release erases the investment
A reasoning harness is code, prompts, and data sitting on top of one or more LLMs
When a new base model ships, the same harness applies immediately — often with a larger performance jump
Poetic's meta system generates these harnesses automatically, far faster and cheaper than hand-building them
The meta system can also optimise an existing agent a startup has already built — targeting prompts, reasoning strategies, or the full pipeline
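
To make the "harness sits on top of the model" idea concrete, here is a minimal sketch in Python. It is not Poetic's implementation: the `Harness` class, the `Model` type, the sample-and-vote strategy, and the two stub models are all assumptions chosen for illustration. The point is only that the harness takes the model as a parameter, so swapping in a newer model needs no change to the harness itself.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable

# Hypothetical type: any function that maps a prompt to a completion.
Model = Callable[[str], str]


@dataclass
class Harness:
    """Illustrative harness: a prompt template plus a sample-and-vote strategy."""

    prompt_template: str
    num_samples: int = 3

    def solve(self, model: Model, question: str) -> str:
        answers = []
        for attempt in range(self.num_samples):
            prompt = self.prompt_template.format(question=question, attempt=attempt)
            answers.append(model(prompt).strip())
        # Majority vote across samples; ties go to the earliest-seen answer.
        return Counter(answers).most_common(1)[0][0]


def flaky_model(prompt: str) -> str:
    # Stand-in for a weaker base model: wrong on one of three attempts.
    return "5" if "attempt 1" in prompt else "4"


def newer_model(prompt: str) -> str:
    # Stand-in for a later base model that is always right on this toy task.
    return "4"


harness = Harness(prompt_template="Question: {question}\n(attempt {attempt})\nAnswer:")

# The same harness object is reused unchanged across both models.
answer_old = harness.solve(flaky_model, "What is 2 + 2?")
answer_new = harness.solve(newer_model, "What is 2 + 2?")
```

In this toy setup the vote hides the flaky model's single wrong sample. A real harness would call an LLM API and use far richer strategies than voting.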

Benchmark results

ARC-AGI v2: Poetic scored 54% versus 45% for Gemini 3 Deep Think, running on the cheaper Gemini 3 Pro at under half the cost per problem (~$32 vs ~$70)
Humanity's Last Exam: 55% versus Anthropic's Claude Opus 4.6 at 53.1%, with optimisation costs under $100K
Both results produced by a team of seven research scientists and engineers
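
The ARC-AGI v2 cost comparison can be checked with a few lines of arithmetic. The sketch below uses only the approximate figures quoted above. The "cost per percentage point" metric is an illustrative assumption, not one reported in the source.

```python
# Approximate ARC-AGI v2 figures quoted above (score in %, cost in USD per problem).
poetic = {"score_pct": 54.0, "cost_usd": 32.0}
deep_think = {"score_pct": 45.0, "cost_usd": 70.0}

# Cost ratio: about 0.46, i.e. a little under half the per-problem cost.
cost_ratio = poetic["cost_usd"] / deep_think["cost_usd"]

# Score gap in percentage points.
score_gap = poetic["score_pct"] - deep_think["score_pct"]


def cost_per_point(result: dict) -> float:
    """Illustrative metric: dollars per problem divided by benchmark score."""
    return result["cost_usd"] / result["score_pct"]
```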

How the meta system works

The meta system observes a task, generates candidate reasoning strategies, and recursively refines them
Outputs include non-obvious prompts — ones a human would not write — and deliberately simple (sometimes technically wrong) examples that still improve performance
The system decides how much context stuffing, example generation, or re-ranking is needed; humans do not tune this manually
Prompt optimisation alone yields modest gains; adding reasoning strategies encoded in code is where large jumps occur (one task: 5% → 95% accuracy)
Each underlying model has its own S-curve; as models and the meta system both improve, the ceiling keeps rising
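
The generate-and-refine loop described above can be sketched as a simple greedy search. This is a stand-in technique (hill climbing over a few knobs), not Poetic's actual meta system, whose internals the source does not describe. The `Strategy` fields, `neighbours`, `refine`, and `toy_score` are all hypothetical. In practice the score would come from running the candidate harness on an evaluation set.

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterator, Tuple


@dataclass(frozen=True)
class Strategy:
    """Hypothetical knobs that an automated harness generator might tune."""

    num_examples: int = 0
    context_chunks: int = 0
    rerank: bool = False


def neighbours(s: Strategy) -> Iterator[Strategy]:
    """Yield strategies that differ from `s` by one small change."""
    yield replace(s, num_examples=s.num_examples + 1)
    yield replace(s, context_chunks=s.context_chunks + 1)
    yield replace(s, rerank=not s.rerank)
    if s.num_examples > 0:
        yield replace(s, num_examples=s.num_examples - 1)
    if s.context_chunks > 0:
        yield replace(s, context_chunks=s.context_chunks - 1)


def refine(
    start: Strategy,
    score: Callable[[Strategy], int],
    max_rounds: int = 20,
) -> Tuple[Strategy, int]:
    """Greedy refinement: keep moving to the best neighbour while it improves."""
    best, best_score = start, score(start)
    for _ in range(max_rounds):
        candidate = max(neighbours(best), key=score)
        candidate_score = score(candidate)
        if candidate_score <= best_score:
            break  # no neighbour improves, so stop at this local optimum
        best, best_score = candidate, candidate_score
    return best, best_score


def toy_score(s: Strategy) -> int:
    # Toy stand-in for benchmark accuracy (0-100); peaks at 3 examples,
    # 2 context chunks, and re-ranking enabled.
    penalty = 10 * abs(s.num_examples - 3) + 10 * abs(s.context_chunks - 2)
    return 100 - penalty - (0 if s.rerank else 20)


best_strategy, best_score = refine(Strategy(), toy_score)
```

No human sets the number of examples or the re-ranking flag here: the loop chooses them from the score alone, which is the property the notes above attribute to the meta system.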

Who Poetic is for

Startups that have exhausted context engineering and can't get reliable, robust results from base models
Teams that have already built an agent but need a performance step-change
Any builder who wants to stay at or above SOTA without re-running expensive training each model cycle
Early access available at poetic.ai

Ian Fischer's path from mobile tools to DeepMind to Poetic

His first YC company, Apportable, was acquired by Google; he used the transition to move into AI and robotics research
He realised hardware was not where he wanted to focus, then spent ~10 years in machine learning research at Google Brain and DeepMind
Advice for engineers entering AI: experiment daily, push the boundaries of what models can do, build what you want to build

Building $10,000 software MVPs with AI in under an hour

Brett Malinowski May 14, 2026

One person with Claude Code can replace a three-person agency team
Partner with niche creators who already have audience and distribution
Use pre-built components for payments and chat — don't build infrastructure from scratch

How to actually make money with AI: five brutal truths

Dan Martell May 14, 2026

AI is a hammer — you still need to find the nail
Validate with manual "Wizard of Oz" delivery before automating anything
Future orgs are workflow-based; humans own outcomes, agents own tasks