Recursive self-improvement as a startup alternative to fine-tuning

Executive overview

Fine-tuning frontier models costs millions, and that investment is obsolete the moment a new model ships. Poetic builds "reasoning harnesses" (agentic systems layered on top of existing LLMs) that outperform the base models without retraining.

The harness is compatible with any new model, so performance gains compound rather than reset. A seven-person team hit state-of-the-art on both ARC-AGI v2 and Humanity's Last Exam at a fraction of frontier training costs.

Automated prompt and reasoning-strategy optimisation is a more durable moat than fine-tuning.

What Poetic builds and why it beats fine-tuning

  • Fine-tuning locks you to one model version; the next frontier release erases the investment
  • A reasoning harness is code, prompts, and data sitting on top of one or more LLMs
  • When a new base model ships, the same harness applies immediately — often with a larger performance jump
  • Poetic's meta system generates these harnesses automatically, far faster and cheaper than hand-building them
  • The meta system can also optimise an existing agent a startup has already built — targeting prompts, reasoning strategies, or the full pipeline
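To make the "code, prompts, and data on top of an LLM" idea concrete, here is a minimal sketch of a model-agnostic harness. It is an illustration only, not Poetic's implementation: the `Harness` class, the `Model` type, and the stub models are all hypothetical names, and majority voting stands in for whatever reasoning strategy a real harness would use.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical type: any function that maps a prompt string to a completion string.
Model = Callable[[str], str]


@dataclass
class Harness:
    """Prompts, examples, and control code layered on top of a base model."""
    system_prompt: str
    examples: List[str]
    n_samples: int = 3

    def build_prompt(self, task: str) -> str:
        # Combine the fixed prompt, the stored examples, and the new task.
        shots = "\n".join(self.examples)
        return f"{self.system_prompt}\n{shots}\nTask: {task}\nAnswer:"

    def run(self, model: Model, task: str) -> str:
        # Sample several answers and return the most common one (majority vote).
        prompt = self.build_prompt(task)
        answers = [model(prompt).strip() for _ in range(self.n_samples)]
        return Counter(answers).most_common(1)[0][0]


# Stub models standing in for an older and a newer base LLM (assumptions).
def old_model(prompt: str) -> str:
    return "4"


def new_model(prompt: str) -> str:
    return "four"


harness = Harness(
    system_prompt="Answer concisely.",
    examples=["Task: 1+1\nAnswer: 2"],
)

# The same harness object runs unchanged against either model.
old_answer = harness.run(old_model, "2+2")
new_answer = harness.run(new_model, "2+2")
```

The point of the design is that `run` takes the model as an argument, so swapping in a new base model requires no change to the harness itself.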

Benchmark results

  • ARC-AGI v2: Poetic scored 54% versus 45% for Gemini 3 Deep Think. Poetic's result ran on Gemini 3 Pro (a cheaper model) at roughly half the cost (~$32 vs ~$70 per problem)
  • Humanity's Last Exam: 55% versus Anthropic's Claude Opus 4.6 at 53.1%, with optimisation costs under $100K
  • Both results produced by a team of seven research scientists and engineers
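As a quick check on the figures above, the short calculation below works out the score gaps and the cost ratio. It uses only the numbers quoted in this section; the variable names are illustrative, and the per-problem costs are the approximate values given.

```python
# Figures quoted above (costs are approximate, per problem, in USD).
arc_poetic, arc_deep_think = 54.0, 45.0
cost_poetic, cost_deep_think = 32.0, 70.0
hle_poetic, hle_opus = 55.0, 53.1

arc_gap = arc_poetic - arc_deep_think        # gap in percentage points
hle_gap = round(hle_poetic - hle_opus, 1)    # gap in percentage points
cost_ratio = cost_poetic / cost_deep_think   # fraction of the Deep Think cost

print(f"ARC-AGI v2 gap: {arc_gap:.1f} points")
print(f"Humanity's Last Exam gap: {hle_gap:.1f} points")
print(f"Cost ratio: {cost_ratio:.2f}")
```

On these numbers the cost ratio is a little under one half, which is what "half the cost" is rounding to.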

How the meta system works

  • The meta system observes a task, generates candidate reasoning strategies, and recursively refines them
  • Outputs include non-obvious prompts — ones a human would not write — and deliberately simple (sometimes technically wrong) examples that still improve performance
  • The system decides how much context stuffing, example generation, or re-ranking is needed; humans do not tune this manually
  • Prompt optimisation alone yields modest gains; adding reasoning strategies encoded in code is where large jumps occur (one task: 5% → 95% accuracy)
  • Each underlying model has its own S-curve; as models and the meta system both improve, the ceiling keeps rising
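The generate-and-refine loop described above can be sketched as a simple search over candidate strategies. This is a generic hill-climbing illustration under stated assumptions, not Poetic's actual meta system: the `Strategy` tuple, `refine`, `toy_mutate`, and `toy_score` are hypothetical, and the toy scoring function stands in for running a harness against a real evaluation set.

```python
import random
from typing import Callable, Tuple

# Hypothetical strategy encoding: (number of examples, number of re-ranking samples).
Strategy = Tuple[int, int]


def refine(
    seed: Strategy,
    mutate: Callable[[Strategy, random.Random], Strategy],
    score: Callable[[Strategy], float],
    rounds: int = 20,
    candidates_per_round: int = 4,
    rng_seed: int = 0,
) -> Tuple[Strategy, float]:
    """Repeatedly generate variants of the best strategy and keep improvements."""
    rng = random.Random(rng_seed)
    best, best_score = seed, score(seed)
    for _ in range(rounds):
        # Generate candidate variations of the current best strategy.
        pool = [mutate(best, rng) for _ in range(candidates_per_round)]
        for candidate in pool:
            candidate_score = score(candidate)
            if candidate_score > best_score:  # keep only strict improvements
                best, best_score = candidate, candidate_score
    return best, best_score


def toy_mutate(strategy: Strategy, rng: random.Random) -> Strategy:
    # Nudge one of the two settings up or down by one, within sensible bounds.
    n_examples, n_samples = strategy
    if rng.random() < 0.5:
        n_examples = max(0, n_examples + rng.choice([-1, 1]))
    else:
        n_samples = max(1, n_samples + rng.choice([-1, 1]))
    return (n_examples, n_samples)


def toy_score(strategy: Strategy) -> float:
    # Stand-in for task accuracy: peaks at 3 examples and 5 samples (an assumption).
    n_examples, n_samples = strategy
    return -float((n_examples - 3) ** 2 + (n_samples - 5) ** 2)


seed_strategy: Strategy = (0, 1)
best_strategy, best_value = refine(seed_strategy, toy_mutate, toy_score)
```

Because the loop only accepts strict improvements, the returned strategy never scores worse than the seed. In a real setting the scoring step would be the expensive part, since each candidate has to be run against held-out tasks.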

Who Poetic is for

  • Startups that have exhausted context engineering and can't get reliable, robust results from base models
  • Teams that have already built an agent but need a performance step-change
  • Any builder who wants to stay at or above SOTA without re-running expensive training each model cycle
  • Early access available at poetic.ai

Ian Fischer's path from mobile tools to DeepMind to Poetic

  • First YC company (Apportable) was acquired by Google; used the transition to move into AI and robotics research
  • Realised hardware was not the focus; pivoted fully into machine learning research at Google Brain and DeepMind for ~10 years
  • Advice for engineers entering AI: experiment daily, push the boundaries of what models can do, build what you want to build
