How GPT-4.1 changes prompting best practices

Executive overview

Newer models like GPT-4.1, Gemini 2.5 Pro, and Claude 3.7 have made many legacy prompting tricks obsolete. Better instruction-following means you no longer need all-caps yelling, emotional manipulation, or convoluted workarounds to get reliable behaviour. OpenAI's accompanying prompt guide codifies a cleaner system prompt structure and signals a broader convergence on XML delimiters across providers.

The core shift: prompting is becoming less about coaxing models and more about clearly structured specification.

What older tricks you can drop

  • All-caps emphasis, threats, and bribery prompts are no longer needed — models follow plain instructions reliably.
  • Negative instructions ("never do X", "do not include Y") are now safe to use; models are far less likely to do the very thing they were told to avoid.
  • For RAG setups, a simple "say I don't know if the answer isn't in the database" now reliably prevents hallucination without elaborate grounding workarounds.
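A minimal sketch of the plain-instruction approach to grounding, assuming a simple pipeline where retrieved passages are pasted into the prompt as numbered text. The function `build_rag_prompt` and the sample documents are hypothetical illustrations, not part of any provider's API.

```python
def build_rag_prompt(question: str, documents: list[str]) -> str:
    """Assemble a grounded RAG prompt that uses a plain fallback instruction."""
    # Number each retrieved passage so the model (and the reader) can refer to it.
    doc_block = "\n".join(
        f"[{i}] {doc}" for i, doc in enumerate(documents, start=1)
    )
    # One plain sentence replaces elaborate anti-hallucination scaffolding.
    return (
        "Answer the question using only the documents below.\n"
        "If the answer is not in the documents, say \"I don't know\".\n\n"
        f"Documents:\n{doc_block}\n\n"
        f"Question: {question}"
    )


# Hypothetical usage with made-up documents.
prompt = build_rag_prompt(
    "How long are parts covered under warranty?",
    [
        "Warranty policy, revised March 2024: parts are covered for two years.",
        "Shipping times: 3 to 5 business days.",
    ],
)
print(prompt)
```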

Instruction following improvements

  • Models can now handle ordered multi-step instructions ("first do X, then Y, then Z") reliably — critical for agent pipelines.
  • Reranking criteria and content requirements can be stated plainly, without forcing constructs such as repeated emphasis.
  • Overconfidence suppression in RAG agents works with a plain instruction rather than complex system-prompt scaffolding.
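For agent pipelines, ordered instructions can simply be rendered as an explicit numbered list. This is a minimal sketch; `numbered_steps` and the example steps are hypothetical.

```python
def numbered_steps(steps: list[str]) -> str:
    """Render agent steps as an explicit, ordered list for the prompt."""
    if not steps:
        raise ValueError("at least one step is required")
    lines = [f"{n}. {step}" for n, step in enumerate(steps, start=1)]
    return "Follow these steps in order:\n" + "\n".join(lines)


# Hypothetical three-step coding task.
print(numbered_steps([
    "Search the codebase for the failing function.",
    "Write a test that reproduces the bug.",
    "Fix the bug and rerun the test.",
]))
```

Numbering the steps, rather than chaining them in one sentence, makes the required order unambiguous and easy to check in the model's output.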

Context window and global rules

  • GPT-4.1 supports a one-million-token context window.
  • Historically, AI IDEs (Cursor, Windsurf) struggled to honour global rules once project context filled most of the model's context window.
  • Gemini 2.5 Pro is the first model to consistently reference global rules across large project contexts; GPT-4.1 also improves here.
  • Reliable global rule adherence reduces errors and improves one-shot completion rates in AI-assisted development.
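A minimal sketch of how an editor-style tool might assemble a large prompt so that global rules sit at the very top, with a rough size check against the one-million-token window. The 4-characters-per-token ratio is a crude assumption (a real tokenizer gives accurate counts), and `assemble_with_global_rules` is hypothetical; it does not describe how Cursor or Windsurf actually build their prompts.

```python
# Assumption: roughly 4 characters per token for English text and code.
CHARS_PER_TOKEN = 4
# Approximate GPT-4.1 context window, in tokens.
CONTEXT_WINDOW_TOKENS = 1_000_000


def estimate_tokens(text: str) -> int:
    """Crude token estimate using ceiling division on character count."""
    return -(-len(text) // CHARS_PER_TOKEN)


def assemble_with_global_rules(rules: list[str], project_files: dict[str, str]) -> str:
    """Put global rules first, append project files, and check the size."""
    rules_block = "Global rules:\n" + "\n".join(f"- {r}" for r in rules)
    files_block = "\n\n".join(
        f"File: {path}\n{body}" for path, body in project_files.items()
    )
    prompt = rules_block + "\n\n" + files_block
    if estimate_tokens(prompt) > CONTEXT_WINDOW_TOKENS:
        raise ValueError("prompt likely exceeds the context window")
    return prompt


# Hypothetical rules and a one-file project.
print(assemble_with_global_rules(
    ["Use type hints.", "Never edit generated files."],
    {"app.py": "print('hello')"},
))
```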

Recommended system prompt structure (OpenAI guide)

  1. Role — the persona (e.g. professional coder, writer).
  2. Task/objective — the goal to achieve.
  3. Instructions — with subcategories for specificity.
  4. Reasoning steps — explicit step-by-step logic written into the prompt (GPT-4.1 is not a dedicated reasoning model, so any reasoning process must be spelled out).
  5. Output format — XML is now recommended for complex prompts, matching Anthropic's long-standing practice.
  6. Examples — few-shot examples to increase response reliability.
  7. Context — the large, variable context block, placed between the two copies of the repeated instructions.
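The structure above can be sketched as a builder that wraps each section in its own XML tag, in the listed order. This is a minimal sketch under assumptions: the tag names (`role`, `objective`, `reasoning_steps`, and so on) are hypothetical labels chosen for illustration, not names prescribed by OpenAI's guide.

```python
# Hypothetical tag names mirroring the seven sections, in order.
SECTION_ORDER = [
    "role", "objective", "instructions", "reasoning_steps",
    "output_format", "examples", "context",
]


def build_system_prompt(sections: dict[str, str]) -> str:
    """Wrap each section in an XML tag and emit them in a fixed order."""
    missing = [name for name in SECTION_ORDER if name not in sections]
    if missing:
        raise ValueError(f"missing sections: {missing}")
    return "\n\n".join(
        f"<{name}>\n{sections[name].strip()}\n</{name}>" for name in SECTION_ORDER
    )


# Hypothetical usage for a coding task.
prompt = build_system_prompt({
    "role": "You are a senior Python developer.",
    "objective": "Fix the failing test in the provided repository.",
    "instructions": "- Change only files under src/.\n- Keep the public API stable.",
    "reasoning_steps": "1. Read the failing test.\n2. Locate the cause.\n3. Propose a minimal fix.",
    "output_format": "Return a unified diff inside <diff> tags.",
    "examples": "(one or two worked examples go here)",
    "context": "(repository files go here)",
})
print(prompt)
```

Failing fast on a missing section keeps every prompt in the same shape, which makes outputs easier to compare across runs.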

Instruction placement for large context prompts

  • Repeat critical instructions at both the top and bottom of the prompt when context is large.
  • If choosing only one position, the top outperforms the bottom.
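  • This conflicts with earlier caching advice, which puts static content first so it can be reused as a cached prefix; the copy of the instructions that follows the variable context cannot be cached. The trade-off between cost savings and instruction retention is unresolved for GPT-4.1.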
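A minimal sketch of the "sandwich" layout, assuming a provider whose prompt cache reuses an identical leading prefix. The function `sandwich_prompt` and the `<instructions>`/`<context>` tag names are hypothetical. It also returns the length of the leading instruction block, the part that stays identical across requests, to make the caching trade-off concrete.

```python
def sandwich_prompt(instructions: str, context: str) -> tuple[str, int]:
    """Repeat instructions above and below a large context block.

    Returns the prompt and the character length of the leading instruction
    block. That block is identical across requests; the repeated copy at the
    bottom follows the variable context, so a prefix cache cannot reuse it.
    """
    prefix = f"<instructions>\n{instructions}\n</instructions>\n\n"
    prompt = (
        prefix
        + f"<context>\n{context}\n</context>\n\n"
        + f"<instructions>\n{instructions}\n</instructions>"
    )
    return prompt, len(prefix)


# Hypothetical usage with a short stand-in for a large document.
prompt, static_chars = sandwich_prompt(
    "Cite the source file for every claim.",
    "(large project context goes here)",
)
print(prompt)
print("reusable leading characters:", static_chars)
```

Dropping the bottom copy keeps everything after the context cache-friendly but gives up the retention benefit of repetition; keeping only the top copy is the better single position, per the guidance above.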

XML delimiter convergence

  • OpenAI now recommends XML tags for structuring complex system prompts.
  • Anthropic has used XML from the start; other providers are moving in the same direction.
  • XML outperforms markdown delimiters when prompts are long and multi-sectioned.
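One practical wrinkle with XML delimiters is pasted content that itself contains tag-like text (for example a stray `</context>`), which could appear to close a section early. One defensive option, sketched here as an assumption rather than a provider recommendation, is to escape `&`, `<` and `>` with the standard library's `xml.sax.saxutils.escape`. The helper `xml_section` is hypothetical.

```python
from xml.sax.saxutils import escape


def xml_section(tag: str, body: str) -> str:
    """Wrap text in an XML-style tag, escaping &, < and > in the body."""
    # Restrict tag names to simple identifiers so the delimiter itself stays clean.
    if not tag.isidentifier():
        raise ValueError(f"unsafe tag name: {tag!r}")
    return f"<{tag}>\n{escape(body)}\n</{tag}>"


print(xml_section("user_document", "if a < b && b > c: return"))
# <user_document>
# if a &lt; b &amp;&amp; b &gt; c: return
# </user_document>
```

Escaping changes what the model sees, so for code-heavy context some teams prefer leaving content raw and choosing tag names unlikely to appear in it.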

Benchmark context: GPT-4.1 vs. the field

  • Fiction.LiveBench tests multi-fact reasoning across a long document, which makes it more representative of real use cases than needle-in-a-haystack tests.
  • Gemini 2.5 Pro leads this benchmark by a wide margin at 120k tokens; GPT-4.1 scores ~62, below Grok Mini and GPT-4o.
  • For tool calling, Gemini 2.5 Pro again ranks highest; Claude 3.5 Sonnet outperforms 3.7 Sonnet for execution tasks.
  • Practical recommendation: use Gemini 2.5 Pro or Claude 3.7 for strategy/planning; Claude 3.5 Sonnet for execution; GPT-4.1 is improved but not the benchmark leader.
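The recommendation above could be encoded as a small routing table with fallbacks. This is a minimal sketch: the model strings are human-readable placeholder labels, not exact API model IDs, and `pick_model` is hypothetical.

```python
# Placeholder labels (not API model IDs), in order of preference per stage.
ROUTES: dict[str, list[str]] = {
    "planning": ["Gemini 2.5 Pro", "Claude 3.7 Sonnet"],
    "execution": ["Claude 3.5 Sonnet"],
}


def pick_model(stage: str, unavailable: frozenset[str] = frozenset()) -> str:
    """Return the first preferred model for a stage that is not unavailable."""
    for model in ROUTES[stage]:
        if model not in unavailable:
            return model
    raise LookupError(f"no available model for stage {stage!r}")


print(pick_model("planning"))                               # Gemini 2.5 Pro
print(pick_model("planning", frozenset({"Gemini 2.5 Pro"})))  # Claude 3.7 Sonnet
print(pick_model("execution"))                              # Claude 3.5 Sonnet
```

Keeping the preferences in data rather than code makes it cheap to reorder them as new benchmark results arrive.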
