Three AI Capabilities That Crossed the Threshold in Six Months

Executive overview

AI capabilities cross a usability threshold unpredictably — tasks that failed months ago may now work out of the box. Missing that moment means competitors gain a lead while you play catch-up. The fix is a systematic retesting process tied to model releases and quarterly reviews.

The competitive edge is not adopting AI early — it is knowing when a failed test is worth retrying.

Three capabilities that crossed the threshold

  • Complex PDF extraction: six months ago, a project took five weeks of engineering and six different models; it now works with a single prompt to Gemini Flash.
  • Image consistency: keeping the same face, product, or logo consistent across multiple generated images was nearly impossible; current models handle it accurately.
  • Handwriting extraction: accuracy on cursive and poor handwriting has jumped from roughly 60% to near 100% with models like Gemini Pro.

Why tests fail for the wrong reason

  • Vague prompts and missing context make AI appear incapable when it is not.
  • Before writing off a capability, confirm the prompt includes specific task context, clear expectations, and concrete success criteria.
  • Only after a high-quality prompt still fails can you judge the model's actual limit.
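If you script your tests, one way to enforce this check is to refuse to send a prompt until all three parts are filled in. The sketch below is a minimal, hypothetical structure (`PromptSpec` and its field names are illustrative, not part of any tool or API) that renders a prompt only when task context, expectations, and success criteria are all present.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class PromptSpec:
    """Hypothetical container for the three elements a test prompt should carry."""
    task_context: str      # background the model needs (source, domain, audience)
    expectations: str      # what the output should look like
    success_criteria: str  # how you will judge pass or fail

    def missing_elements(self) -> List[str]:
        # An element counts as missing if it is empty or whitespace only.
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def render(self, task: str) -> str:
        missing = self.missing_elements()
        if missing:
            # Refuse to run the test: a failure now would say more about the prompt than the model.
            raise ValueError("Prompt is missing: " + ", ".join(missing))
        return (
            f"Context: {self.task_context}\n"
            f"Task: {task}\n"
            f"Expectations: {self.expectations}\n"
            f"Success criteria: {self.success_criteria}"
        )
```

Calling `render` on a spec with a blank field raises `ValueError` naming that field, which is the cue to improve the prompt before recording a failure.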

The AI wish list system

  • Keep a running list of tasks AI cannot yet do: record the task, the date tested, and the result.
  • Store it wherever you will actually check it: a Google Doc, Apple Notes, or similar.
  • This list is a retesting queue, not a graveyard.
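For anyone who prefers to keep the list in code or a script, the three recorded fields map naturally onto a small record type. This is a hypothetical sketch (`WishListItem`, `retest_queue`, and `to_line` are illustrative names), assuming the queue should surface the items that have gone longest without a retest.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class WishListItem:
    task: str          # what you asked the model to do
    date_tested: date  # when it last failed
    result: str        # short note on how it failed


def retest_queue(items: List[WishListItem]) -> List[WishListItem]:
    """Return the wish list ordered so the longest-untested items come first."""
    return sorted(items, key=lambda item: item.date_tested)


def to_line(item: WishListItem) -> str:
    # One plain-text line per entry, easy to paste into a Google Doc or Apple Notes.
    return f"{item.date_tested.isoformat()} | {item.task} | {item.result}"
```

Sorting by date rather than deleting old entries reflects the point above: an old failure is the first thing to retry, not something to forget.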

When to retest

  • New model release from Anthropic, OpenAI, or Google: test relevant wish-list items immediately.
  • New feature release from any of the big three: test whether it addresses your use case.
  • Quarterly calendar reminder: review all items even without a major announcement, as models improve silently in the background.
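The three triggers can be combined into a single check. This is a sketch under stated assumptions: releases are logged by hand as `Release` records (a hypothetical structure), any model or feature release from the three named providers counts as a trigger, and "quarterly" is read as "a new calendar quarter has begun since the item was last tested." Whether a release is actually relevant to a given item is still your judgment; the function only flags candidates.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple

BIG_THREE = {"Anthropic", "OpenAI", "Google"}


@dataclass
class Release:
    provider: str       # e.g. "Google"
    released_on: date
    kind: str           # "model" or "feature" (both count as triggers here)


def quarter(d: date) -> Tuple[int, int]:
    # Calendar quarter as (year, 1..4).
    return d.year, (d.month - 1) // 3 + 1


def is_due_for_retest(last_tested: date, releases: List[Release], today: date) -> bool:
    # Triggers 1 and 2: a model or feature release from the big three since the last test.
    for release in releases:
        if release.provider in BIG_THREE and last_tested < release.released_on <= today:
            return True
    # Trigger 3: a new calendar quarter has started since the last test.
    return quarter(today) != quarter(last_tested)
```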

Retesting protocol

  • Run the wish-list task with a basic prompt against the new model or feature.
  • If it works, remove it from the list.
  • If it fails, improve the prompt: add context, sharpen expectations, check current best practices.
  • Only if it still fails with a strong prompt does the item go back on the list with an updated date.
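The protocol is a short decision sequence. The sketch below assumes a hypothetical `run_task` callable that sends a prompt to the new model or feature and returns whether the output met your success criteria; how you implement that check depends on your provider's API and your task. It repeats the illustrative `WishListItem` shape so the block runs on its own.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Callable, Optional


@dataclass
class WishListItem:
    task: str
    date_tested: date
    result: str


def retest(
    item: WishListItem,
    basic_prompt: str,
    strong_prompt: str,
    run_task: Callable[[str], bool],  # hypothetical: True if the output passes
    today: date,
) -> Optional[WishListItem]:
    """Return None if the task now works (remove it), else the item with an updated date."""
    # Step 1: try a basic prompt against the new model or feature.
    if run_task(basic_prompt):
        return None
    # Step 2: retry with the improved prompt (more context, sharper expectations).
    if run_task(strong_prompt):
        return None
    # Step 3: it still fails with a strong prompt, so it goes back on the list with today's date.
    return replace(item, date_tested=today, result="Failed with strong prompt")
```

Returning `None` means drop the item; returning an item means it goes back on the list, stamped with the date of this retest.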
