How GPT-4.1 changes prompting best practices

Executive overview

Newer models like GPT-4.1, Gemini 2.5 Pro, and Claude 3.7 have made many legacy prompting tricks obsolete. Better instruction-following means you no longer need all-caps yelling, emotional manipulation, or convoluted workarounds to get reliable behaviour. OpenAI's accompanying prompt guide codifies a cleaner system prompt structure and signals a broader convergence on XML delimiters across providers.

The core shift: prompting is becoming less about coaxing models and more about clearly structured specification.

What older tricks you can drop

All-caps emphasis, threats, and bribery prompts are no longer needed — models follow plain instructions reliably.
Negative instructions ("never do X", "do not include Y") are now safe to use; newer models no longer tend to do the forbidden thing simply because it was mentioned.
For RAG setups, a simple "say I don't know if the answer isn't in the database" now reliably prevents hallucination without elaborate grounding workarounds.
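As a minimal sketch of what that plain instruction might look like in practice, the snippet below assembles a RAG prompt with a single fallback sentence. The helper name build_rag_prompt, the exact wording of the instruction, and the sample passages are all illustrative assumptions, not text from OpenAI's guide.

```python
def build_rag_prompt(question: str, passages: list[str]) -> str:
    """Assemble a grounded prompt with a plain fallback instruction (hypothetical helper)."""
    # One plain-language instruction: no all-caps, no threats, no elaborate scaffolding.
    instruction = (
        "Answer using only the passages below. "
        "If the answer is not in the passages, say \"I don't know\"."
    )
    # Number the passages so the model (and a human reader) can refer to them.
    numbered = "\n".join(f"[{i}] {text}" for i, text in enumerate(passages, start=1))
    return f"{instruction}\n\nPassages:\n{numbered}\n\nQuestion: {question}"


prompt = build_rag_prompt(
    "What is the refund window?",
    [
        "Refunds are accepted within 30 days of purchase.",
        "Shipping takes 3-5 business days.",
    ],
)
print(prompt)
```

Whether one sentence is enough will depend on the model and the retrieval quality, so it is worth checking against questions whose answers are known to be missing from the database.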

Instruction following improvements

Models can now handle ordered multi-step instructions ("first do X, then Y, then Z") reliably — critical for agent pipelines.
Reranking criteria and content requirements can be stated as ordinary instructions, without forceful phrasing or workaround constructs.
Overconfidence suppression in RAG agents works with a plain instruction rather than complex system-prompt scaffolding.
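The "first do X, then Y, then Z" pattern can be sketched as a small helper that renders agent steps as an explicitly numbered list. The function name ordered_steps and the three example steps are hypothetical and only illustrate the shape of such a prompt.

```python
def ordered_steps(steps: list[str]) -> str:
    """Render steps as an explicitly ordered list for a prompt (hypothetical helper)."""
    # Numbering makes the intended order unambiguous to the model.
    lines = [f"{n}. {step}" for n, step in enumerate(steps, start=1)]
    return "Complete the following steps in order:\n" + "\n".join(lines)


instructions = ordered_steps([
    "Search the knowledge base for relevant passages.",
    "Rank the passages by relevance to the question.",
    "Answer using only the top-ranked passages.",
])
print(instructions)
```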

Context window and global rules

GPT-4.1 supports a one-million-token context window.
Historically, AI IDEs (Cursor, Windsurf) struggled to honour global rules when project context filled the model's working memory.
Gemini 2.5 Pro is the first model to consistently reference global rules across large project contexts; GPT-4.1 also improves here.
Reliable global rule adherence reduces errors and improves one-shot completion rates in AI-assisted development.

Recommended system prompt structure (OpenAI guide)

Role: the persona (e.g. professional coder, writer).
Task/objective: the goal to achieve.
Instructions: the rules to follow, with subcategories for specificity.
Reasoning steps: explicit step-by-step logic written into the prompt (GPT-4.1 is a standard generative model rather than a reasoning model, so it will not reason step by step unless the prompt asks it to).
Output format: XML is now recommended for complex prompts, which brings OpenAI in line with Anthropic's long-standing practice.
Examples: few-shot examples to increase response reliability.
Context: the large, variable context block, placed between the two copies of the instructions.
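The section order above can be sketched as a small prompt builder. This is an illustration only: the names SECTION_ORDER and build_system_prompt, the "Name:" heading style, and the sample section text are assumptions rather than anything prescribed by OpenAI's guide.

```python
# Section names taken from the list above, in the order they are listed.
SECTION_ORDER = [
    "Role",
    "Task/objective",
    "Instructions",
    "Reasoning steps",
    "Output format",
    "Examples",
    "Context",
]


def build_system_prompt(sections: dict[str, str]) -> str:
    """Join the sections in a fixed order, failing loudly if one is missing."""
    missing = [name for name in SECTION_ORDER if name not in sections]
    if missing:
        raise ValueError("missing sections: " + ", ".join(missing))
    # The order comes from SECTION_ORDER, not from the order of the dict keys.
    return "\n\n".join(f"{name}:\n{sections[name].strip()}" for name in SECTION_ORDER)


system_prompt = build_system_prompt({
    "Context": "(retrieved documents go here)",
    "Role": "You are a professional technical writer.",
    "Task/objective": "Summarise the supplied document for a busy executive.",
    "Instructions": "Use plain language. Keep it under 200 words.",
    "Reasoning steps": "First list the key claims, then draft the summary.",
    "Output format": "Return the summary inside <summary> tags.",
    "Examples": "(one or two sample summaries go here)",
})
print(system_prompt)
```

Keeping the order in one constant means every prompt in a project follows the same layout, even when callers supply the sections in a different order.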

Instruction placement for large context prompts

Repeat critical instructions at both the top and bottom of the prompt when context is large.
If choosing only one position, the top outperforms the bottom.
This conflicts with previous caching advice (put static content at the top only) — the trade-off between cost savings and instruction retention is unresolved for GPT-4.1.
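A minimal sketch of the placement rule: instructions always go at the top, and are repeated at the bottom only when the context is large. The function name place_instructions and the 20,000-character threshold are assumptions for illustration; the source does not say how large "large" is.

```python
def place_instructions(
    instructions: str,
    context: str,
    repeat_threshold_chars: int = 20_000,  # assumed cut-off, not from the guide
) -> str:
    """Put instructions first, and repeat them after a large context block."""
    instructions = instructions.strip()
    context = context.strip()
    if len(context) >= repeat_threshold_chars:
        # Large context: sandwich it between two copies of the instructions.
        return "\n\n".join([instructions, context, instructions])
    # Small context: a single copy at the top is the better single position.
    return "\n\n".join([instructions, context])


short_prompt = place_instructions("Cite your sources.", "A short document.")
long_prompt = place_instructions("Cite your sources.", "x" * 25_000)
print(short_prompt.count("Cite your sources."), long_prompt.count("Cite your sources."))
```

Repeating the instructions adds tokens after the variable context, so it will not benefit from prefix caching the way a static top block does, which is the trade-off noted above.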

XML delimiter convergence

OpenAI now recommends XML tags for structuring complex system prompts.
Anthropic has used XML from the start; other providers are moving the same direction.
XML outperforms markdown delimiters when prompts are long and multi-sectioned.
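As a rough illustration of XML delimiters, the sketch below wraps several context documents in tags using Python's standard library. The tag names documents and document, the id attribute, and the helper name wrap_documents are assumptions chosen for the example, not a schema required by any provider.

```python
from xml.sax.saxutils import escape, quoteattr


def wrap_documents(docs: dict[str, str]) -> str:
    """Wrap each document in its own tag so section boundaries are explicit."""
    blocks = []
    for doc_id, text in docs.items():
        # Escape &, < and > so stray markup in the text cannot break the delimiters.
        blocks.append(f"<document id={quoteattr(doc_id)}>\n{escape(text)}\n</document>")
    return "<documents>\n" + "\n".join(blocks) + "\n</documents>"


xml_block = wrap_documents({
    "faq": "Refunds within 30 days & no restocking fees.",
    "notes": "Use <b> tags sparingly.",
})
print(xml_block)
```

Escaping is a design choice here: it keeps the wrapper well-formed, at the cost of the model seeing entities such as &amp; in place of the raw characters.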

Benchmark context: GPT-4.1 vs. the field

Fiction.LiveBench tests multi-fact reasoning across a long document, which is more representative of real use cases than needle-in-a-haystack tests.
Gemini 2.5 Pro leads this benchmark by a wide margin at 120k tokens; GPT-4.1 scores ~62, below Grok Mini and GPT-4o.
For tool calling, Gemini 2.5 Pro again ranks highest; Claude 3.5 Sonnet outperforms 3.7 Sonnet for execution tasks.
Practical recommendation: use Gemini 2.5 Pro or Claude 3.7 for strategy/planning; Claude 3.5 Sonnet for execution; GPT-4.1 is improved but not the benchmark leader.

