How to audit your Claude prompts after a model upgrade

Executive overview

When Anthropic releases a new model, most use cases improve — but some quietly degrade. Four behavioral shifts in Claude Opus 4.7 (literalness, response length variability, tone, and tool-use skipping) can silently break setups that worked perfectly before.

Run the Canary Test: four targeted checks on your three to five most critical projects or skills. Takes 15 minutes.

More intelligent models need fewer instructions, but more precise ones.

The four habits that changed in Opus 4.7

  • Literal — the model interprets every word exactly; vague terms now produce unpredictable results
  • Adaptive length — the model sizes responses to perceived task complexity, so output length is no longer consistent
  • Direct tone — personality shifted; words like "warm" or "conversational" produce different results than in 4.6
  • Tool skipping — the model may decide a tool call is unnecessary and silently omit it

Check 1: clarity

  • Audit system prompts for vague terms: "worth pursuing," "appropriate," "handle correctly," "flag anything important"
  • Replace each with explicit criteria — e.g. "worth pursuing = company >50 employees, contact is director or above, pain point stated in prior conversation"
  • If the model can't infer intent, it either asks (good) or acts on its own interpretation (often bad)

Check 2: length

  • Test whether outputs are consistent across multiple runs of the same input
  • If length varies, add an explicit format constraint — e.g. "always return exactly five bullets, one sentence each"
  • Variability is caused by adaptive thinking: the model sizes effort to perceived complexity

Check 3: tone

  • Adjectives alone ("warm," "casual") are insufficient — the model's baseline for those words has shifted
  • Collect three to five real examples of writing you're happy with
  • Add them to the project knowledge base and instruct the model to match rhythm, openers, and sentence length
  • Teach by example rather than by description

Check 4: tool use (actions)

  • Verify each tool call in every multi-step workflow still fires reliably
  • Silent omissions compound — a skipped CRM update may go unnoticed for weeks
  • Fix: be explicit in the prompt — "when given a meeting transcript, you must update the Airtable CRM before drafting the email"
  • Specify the required order; don't leave sequencing to the model's judgment

Golden input / golden output

  • For each high-priority use case, save one representative input and the best output you've ever received
  • Store both in a folder labelled with model name, date, and use case
  • When a new model ships, rerun the saved input and compare outputs directly
  • This gives a concrete before/after signal — not guesswork

Subtracting from prompts

  • As models get more intelligent, over-specified prompts become a liability
  • Expect to remove instructions more often than add them
  • Precision matters more than volume — every word in a prompt is acted on literally

More like this — when you're ready for early access.

Join the waitlist for a personal account and content recommendations based on what you're working on.

No spam. Unsubscribe at any time.

You're on the list. We'll be in touch before launch.

Get early access to the full library.

Join the waitlist for a personal account and content recommendations based on what you're working on.

No spam. Unsubscribe at any time.

You're on the list. We'll be in touch before launch.

Be among the first to get personalised recommendations tailored to your stage in business.

No spam.

You're on the list. We'll be in touch before launch.

Be among the first to get personalised recommendations tailored to your stage in business.

No spam.

You're on the list. We'll be in touch before launch.