How self-reflection prompting improves GPT-5 output quality

Executive overview

Most prompts hand a task to the model and accept whatever comes back first. Self-reflection prompting inserts a private quality loop before any output reaches you.

The model builds its own rubric, drafts a response, critiques it against the rubric, and redrafts until every category passes a 90% threshold, all internally. Reported quality gains range from 5% to 40% depending on task type, along with lower hallucination rates. OpenAI endorses the technique in its official prompting guide.

Make the model grade its own work before it shows you anything.

How the self-reflection loop works

  • Model creates a private rubric with 5–7 categories tailored to the task
  • Drafts an initial response, then critiques it against each rubric category
  • Redrafts any section that falls below the 90% threshold
  • Iterates 5–7 times internally; only the final output is returned
  • Cap at 5–7 categories to prevent overthinking, which degrades quality
  • Keep all reasoning internal so tokens go into inference, not visible output
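
The control flow of the loop above can be sketched in Python. This is only an illustration: in the technique itself the loop runs inside the model during a single response, not in your code. The `draft` and `score` callables are hypothetical stand-ins for that internal work, and the cap of 7 passes is taken from the upper end of the range above.

```python
from typing import Callable, Dict, List

PASS_THRESHOLD = 0.90  # every rubric category must reach this score
MAX_ITERATIONS = 7     # upper end of the 5-7 internal passes


def reflect(
    task: str,
    rubric: List[str],
    draft: Callable[[str, Dict[str, float]], str],
    score: Callable[[str, str], float],
) -> str:
    """Redraft until every rubric category passes or the cap is reached.

    `draft(task, failing)` and `score(text, category)` are hypothetical
    placeholders for what the model does internally.
    """
    if not 5 <= len(rubric) <= 7:
        raise ValueError("rubric should have 5-7 categories")

    text = draft(task, {})  # first draft: nothing to fix yet
    for _ in range(MAX_ITERATIONS):
        scores = {category: score(text, category) for category in rubric}
        failing = {c: s for c, s in scores.items() if s < PASS_THRESHOLD}
        if not failing:
            break  # stopping criterion: rubric satisfied
        text = draft(task, failing)  # redraft, targeting weak categories
    return text  # only the final text leaves the loop
```

The early `break` is the stopping criterion: once nothing falls below the threshold, further passes are skipped.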

Base prompt structure

  • Open with: "Before answering, create a private rubric with 5–7 excellence criteria for this task"
  • Instruct the model to draft, critique, and redo until all rubric categories pass
  • Explicitly state: show only the final result, not the internal iterations
  • Optional: add an alternate-draft step for high-stakes tasks, in which the model drafts a second version and selects the stronger one
  • Add stopping criteria to prevent over-iteration once the rubric is satisfied
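
One way to assemble these pieces into a reusable prompt is sketched below. The function name `build_reflection_prompt` and the exact wording of every line after the opening instruction are assumptions for illustration, not a canonical template.

```python
def build_reflection_prompt(task: str, alternate_draft: bool = False) -> str:
    """Assemble a self-reflection prompt from the parts listed above."""
    lines = [
        "Before answering, create a private rubric with 5–7 excellence "
        "criteria for this task.",
        "Draft a response, critique it against the rubric, and redo it "
        "until every rubric category passes.",
    ]
    if alternate_draft:
        # Optional step for high-stakes tasks.
        lines.append("Then draft a second version and select the stronger one.")
    lines += [
        "Stop iterating as soon as the rubric is satisfied.",
        "Show only the final result, not the rubric or the internal iterations.",
        "",
        f"Task: {task}",
    ]
    return "\n".join(lines)


print(build_reflection_prompt("Summarise the attached report", alternate_draft=True))
```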

Practical examples

Research

  • Specify the audience (analyst vs. executive) — this shapes the rubric categories the model chooses
  • Add an explicit critique: at least 3 claims must be backed by credible sources
  • Define the output format: What's new / Risk / Next steps
  • Explicit rubric categories: accuracy, claim-source match, recency, completeness, clarity
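
The research settings above could be combined as in the following sketch. The rubric categories and output headings come from the list above. The function name, the sentence wording, and the sample topic are assumptions.

```python
RESEARCH_RUBRIC = [
    "accuracy", "claim-source match", "recency", "completeness", "clarity",
]
OUTPUT_SECTIONS = ["What's new", "Risk", "Next steps"]


def research_prompt(topic: str, audience: str) -> str:
    """Build a research prompt with explicit rubric, critique, and format."""
    return "\n".join([
        f"Audience: {audience}.",
        "Before answering, privately score your draft on this rubric: "
        + ", ".join(RESEARCH_RUBRIC) + ".",
        "Self-critique: at least 3 claims must be backed by credible "
        "sources; redo the draft if not.",
        "Format the final answer under these headings: "
        + " / ".join(OUTPUT_SECTIONS) + ".",
        "Show only the final result.",
        "",
        f"Topic: {topic}",
    ])


print(research_prompt("Q3 changes in cloud GPU pricing", audience="executive"))
```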

Writing (emails, blogs, LinkedIn posts)

  • Voice specification drives the rubric categories
  • Explicit critique: the hook in the first two lines must be concrete, benefit-led, and cliche-free; redo the whole draft if not
  • Explicit rubric: hook strength, specificity, structure, brevity, tone fit, scannability

Analysis

  • Goal statement drives the rubric categories
  • Require top 3 assumptions with confidence levels in the output
  • Explicit critique: address at least one strong counter-argument internally; redo if missing
  • Output: decision, rationale, risks, next 3 steps
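
If the analysis output is consumed by code, the required structure can be checked after the fact. The sketch below is one possible shape. The class names, the three-level confidence scale, and the `problems` helper are all assumptions, since the text above only says "confidence levels".

```python
from dataclasses import dataclass
from typing import List

CONFIDENCE_LEVELS = {"low", "medium", "high"}  # assumed scale


@dataclass
class Assumption:
    statement: str
    confidence: str  # one of CONFIDENCE_LEVELS


@dataclass
class AnalysisOutput:
    decision: str
    rationale: str
    risks: List[str]
    next_steps: List[str]
    assumptions: List[Assumption]

    def problems(self) -> List[str]:
        """Return a list of structural issues; an empty list means it passes."""
        issues = []
        if not self.decision.strip():
            issues.append("decision is empty")
        if len(self.next_steps) != 3:
            issues.append("expected exactly 3 next steps")
        if len(self.assumptions) != 3:
            issues.append("expected the top 3 assumptions")
        if any(a.confidence not in CONFIDENCE_LEVELS for a in self.assumptions):
            issues.append("assumption with unknown confidence level")
        return issues
```

A check like this only confirms the shape of the answer. Whether the model actually addressed a strong counter-argument internally cannot be verified from the final output.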

When to use each approach

  • Low stakes / quick fix: Add rubric only; let the model define its own categories; no explicit critiques
  • High stakes (public-facing, factual, code): Name the rubric categories explicitly; add 1–3 explicit self-critiques
  • Cursor uses this technique daily for its AI coding product
  • GPT-5 outperformed o3 on an agentic coding benchmark when this technique was applied
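
The low-stakes and high-stakes variants above can be captured in one helper. This is a minimal sketch: `reflection_block` and its parameters are hypothetical names, and the range checks simply mirror the 5–7 categories and 1–3 critiques stated above.

```python
from typing import Sequence

RUBRIC_LINE = (
    "Before answering, create a private rubric with 5–7 excellence "
    "criteria for this task."
)
FINAL_ONLY = "Show only the final result, not the internal iterations."


def reflection_block(
    high_stakes: bool,
    categories: Sequence[str] = (),
    critiques: Sequence[str] = (),
) -> str:
    """Return the reflection instructions for the given stakes level."""
    if not high_stakes:
        # Low stakes: the model picks its own categories, no explicit critiques.
        return "\n".join([RUBRIC_LINE, FINAL_ONLY])
    if not 5 <= len(categories) <= 7:
        raise ValueError("name 5-7 rubric categories for high-stakes tasks")
    if not 1 <= len(critiques) <= 3:
        raise ValueError("add 1-3 explicit self-critiques for high-stakes tasks")
    lines = [
        "Before answering, privately score your draft on these rubric "
        "categories: " + ", ".join(categories) + "."
    ]
    lines += [f"Self-critique: {c}" for c in critiques]
    lines.append(FINAL_ONLY)
    return "\n".join(lines)
```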
