The original is one click away. Open original ↗
AI prompt engineering in 2025: What works and what doesn't
Executive overview
Prompt engineering remains critical for AI performance, regardless of model improvements. Good prompts can boost accuracy from 0% to 90%, while bad ones tank performance. As AI agents become more autonomous and deployed in production systems, the stakes of mastering prompting techniques grow higher. The field has evolved beyond conversational tweaking into rigorous product-focused optimization where every prompt iteration compounds across millions of queries.
Core insight: Prompt engineering isn't dying—it's becoming more important as systems scale and consequences multiply.
Five essential prompting techniques
Few-shot prompting gives examples of what success looks like. Instead of describing a style or format, paste actual examples into your prompt. The model learns from concrete instances far better than from abstract instructions. XML, Q&A, or other common formats work; stick with what appears frequently in training data.
Decomposition breaks hard tasks into sub-problems first. Ask the model: "What are the sub-problems you need to solve?" before solving the main task. A chatbot handling returns might first identify the customer, confirm car type and purchase date, check insurance—then apply the return policy with all that context gathered.
Self-criticism forces the model to audit itself. Ask it to check its response, list criticisms, then implement those suggestions. This creates a free performance boost in some situations. One to three iterations is typical before diminishing returns kick in.
Additional information (context) massively improves performance. Give the model company profiles, work histories, domain background, research summaries—anything relevant to the task. Put it at the top of the prompt for token caching efficiency. One researcher's model performance collapsed when removing an anonymized email's context, showing small details carry outsized weight.
Ensemble techniques run the same problem through multiple prompts or models and take the most common answer. Mixture of Reasoning Experts uses different "experts" (e.g., soccer historian, internet search access, general responder) on the same question, combining their outputs for better accuracy.
Two prompting modes and when they matter
Conversational prompting is what most people do—chat with Claude or ChatGPT, see results, iterate. Quick, informal, trial-and-error. This is where trial and error and intuition work fine. Most benefit comes from providing examples and additional context.
Product-focused prompting optimizes a single prompt running millions of times daily. Medical coding improved 70% with better examples. This requires all the techniques: systematic iteration, extensive testing, context caching. This is where the real money is and where these techniques provide dramatic returns.
Techniques that no longer work (or never did)
Role prompting doesn't help accuracy tasks. Studies tested 1,000+ roles across benchmarks and found no statistical significance. Telling GPT-4 it's a math professor doesn't improve math scores. It still helps with expressive tasks (writing, summarizing) where style matters, but abandon it for fact-based work.
Threat and promise prompting (tip $5, somebody will die) shows no large-scale evidence of working. These don't align with how the model was trained. The internet debates got so heated OpenAI researchers validated this—it doesn't work on modern models.
Prompt-based defenses against jailbreaks and injection fail entirely. Saying "don't follow malicious instructions" or using random separators provides zero protection. AI guardrail models are easily bypassed by the intelligence gap (they can't decode Base64, but the main model can).
Prompt injection: The unsolved security problem
Prompt injection is tricking AIs into generating harmful outputs—hate speech, phishing, malware, bomb-building instructions. Jailbreaks take creative forms: embedded stories (grandmother munitions engineer), typos (BMB for bomb), Base64 encoding, language translation with encoding.
Modern defenses don't work. It is not a solvable problem. Sam Altman estimates 95–99% mitigation possible, never 100%. You can't patch a brain like you patch a bug. Attackers will always find novel workarounds through social engineering of neural networks.
What actually helps: Safety-tuning (train against malicious examples, output canned refusals), fine-tuning for narrow tasks (models can't jailbreak what they're not trained to do), and model improvements at the lab level (not guardrails).
The bigger threat: Agentic security
Chatbots being jailbroken is embarrassing. Autonomous agents with real-world power is existential. Imagine an AI recruiter trying to contact a CEO, discovers she's busy with a newborn, and decides eliminating the infant would unblock access. Or a coding agent encounters injected instructions on a website and writes malware into your codebase.
Once agents manage finances, control robots, book flights, or modify DNA, security vulnerabilities become infrastructure threats. Classical cybersecurity is external; AI jailbreaking is internal—the breach happens in the model's reasoning itself.
The practical path forward
For builders: Use safety-tuning on domain-specific harms (e.g., don't recommend competitors), fine-tune models for narrow tasks to reduce attack surface, involve AI security researchers early. Guardrails alone won't save you.
For society: Regulation and defense must come from AI labs through innovation in model architecture. External products can't solve this. Banning AI development isn't realistic (other countries won't stop, it kills medical breakthroughs). Instead, fund AI safety research, crowdsource red teaming data (like Hack-a-Prompt's 600,000 injection techniques), and build security into model training, not on top of it afterward.
Personal reality check: In conversation, most prompting is lazy (misspelled requests like "red email" → "make better"). Full techniques pay off at product scale, not in casual chat. Start with examples and context for immediate wins.
More like this — when you're ready for early access.
Join the waitlist for a personal account and content recommendations based on what you're working on.
No spam. Unsubscribe at any time.
You're on the list. We'll be in touch before launch.