Automating client data extraction with an AI funnel

Executive overview

Every client sends the same data in a different format — PDFs, photos, messy spreadsheets — and copying it field by field is pure waste. A four-part AI system eliminates this by funnelling chaotic inputs into a consistent output automatically.

The system works in three phases: define the output fields, set rules to prevent hallucination, then build and test manually before scaling.

The AI will always try to give you an answer — your rules must make "blank" safer than "wrong."

Defining the output

  • Start with the form or template you already fill in — its fields are your extraction targets.
  • For each field, note the data type (text, date, number) and whether it is required or optional.
  • If field names vary by client, add aliases (e.g. "vendor name, also known as supplier, payee").
  • No clean template? Feed three to five completed examples to a high-end model and prompt it to extract every field that appears, flagging which fields appear in every document (required) and which appear in only some (optional).
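
As a rough illustration, the field list above could be captured in code before it is written into a prompt. This is a minimal sketch, not a prescribed format: the `FieldSpec` class, the `render_template` and `classify_fields` helpers, and the sample field names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """One extraction target (hypothetical structure for illustration)."""
    name: str
    data_type: str                  # "text", "date", or "number"
    required: bool = True
    aliases: tuple = ()             # alternative names used by some clients

    def describe(self) -> str:
        status = "required" if self.required else "optional"
        line = f"- {self.name} ({self.data_type}, {status})"
        if self.aliases:
            line += f"; also known as {', '.join(self.aliases)}"
        return line


def render_template(fields) -> str:
    """Turn the field list into text that can be pasted into a prompt."""
    return "\n".join(f.describe() for f in fields)


def classify_fields(examples):
    """Given one set of field names per example document, mark fields seen
    in every document as required and the rest as optional."""
    in_all = set.intersection(*examples)
    anywhere = set.union(*examples)
    return {"required": sorted(in_all), "optional": sorted(anywhere - in_all)}


# Sample fields, assumed for illustration only.
TEMPLATE = [
    FieldSpec("vendor name", "text", aliases=("supplier", "payee")),
    FieldSpec("invoice date", "date"),
    FieldSpec("purchase order number", "text", required=False),
]
```

Calling `render_template(TEMPLATE)` gives one line per field, ready to drop into the target template part of the prompt.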

Setting rules to prevent hallucination

Three rules constrain the AI and shift it away from guessing:

  • Grounding — instruct the AI to extract only from the uploaded document, not its own knowledge or the internet.
  • Changed incentives — tell it explicitly that a blank answer is preferred over a wrong one; frame any wrong answer as 3× worse than a blank.
  • Safety net (show your work) — require the AI to include the exact source quote for every extracted value, enabling fast auditing.
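
The source-quote rule also makes a simple automated check possible: if a quote cannot be found in the document text, the value deserves a closer look. The sketch below assumes the document has already been converted to plain text; the `quote_is_grounded` function is a hypothetical helper, and the whitespace and case normalisation is an assumption about how forgiving the match should be.

```python
import re


def _normalise(text: str) -> str:
    # Collapse runs of whitespace and ignore case so that line breaks
    # in the source do not cause false alarms.
    return re.sub(r"\s+", " ", text).strip().lower()


def quote_is_grounded(source_quote: str, document_text: str) -> bool:
    """Return True only if the quote is non-empty and appears in the document."""
    quote = _normalise(source_quote)
    return bool(quote) and quote in _normalise(document_text)
```

A failed check does not prove the value is wrong (the model may have paraphrased the quote), but it is a cheap way to decide which rows to audit by hand.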

Prompt structure

The base system prompt contains five parts in order:

  1. Persona — "You are a document extraction specialist."
  2. Target template — the list of fields with types and required/optional status.
  3. Rules — grounding, incentives, safety net.
  4. Audit output format — a table with columns: field name, extracted value, source quote, status.
  5. Context — brief explanation of why the task matters.
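
One way to keep the five parts in order is to assemble the prompt from separate pieces. This is only a sketch: the persona line is taken from the list above, while the section labels, the exact rule wording, and the `build_system_prompt` function are assumptions made for illustration.

```python
PERSONA = "You are a document extraction specialist."

# Illustrative wording for the three rules; adjust to your situation.
RULES = [
    "Extract only from the uploaded document, not from your own knowledge or the internet.",
    "A blank answer is preferred over a wrong one; treat a wrong answer as 3x worse than a blank.",
    "Include the exact source quote for every extracted value.",
]

AUDIT_FORMAT = "Return a table with columns: field name, extracted value, source quote, status."


def build_system_prompt(template_lines, context: str) -> str:
    """Join persona, target template, rules, audit format, and context in that order."""
    sections = [
        PERSONA,
        "Target template:\n" + "\n".join(template_lines),
        "Rules:\n" + "\n".join(f"{i}. {rule}" for i, rule in enumerate(RULES, 1)),
        "Audit output format:\n" + AUDIT_FORMAT,
        "Context:\n" + context,
    ]
    return "\n\n".join(sections)
```

Keeping each part in its own variable makes later edits smaller, since a change to one rule does not touch the rest of the prompt.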

Status labels in the audit table:

  • Extracted — exact match pulled from document.
  • Inferred — AI guessed; review these first.
  • Missing — field not found; left blank.
  • Ambiguous — conflicting signals in the document.
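
If the audit table is exported as rows, the status labels can drive the review order. In this sketch only "Inferred first" comes from the list above; the ordering of the remaining labels, the row layout (a dict with a `status` key), and the `review_order` function are assumptions.

```python
from enum import Enum


class Status(Enum):
    EXTRACTED = "Extracted"
    INFERRED = "Inferred"
    MISSING = "Missing"
    AMBIGUOUS = "Ambiguous"


# Inferred rows are reviewed first; the rest of this ordering is an assumption.
REVIEW_PRIORITY = {
    Status.INFERRED: 0,
    Status.AMBIGUOUS: 1,
    Status.MISSING: 2,
    Status.EXTRACTED: 3,
}


def review_order(rows):
    """Sort audit rows so the riskiest statuses come first.
    Raises ValueError if a row carries an unknown status label."""
    return sorted(rows, key=lambda row: REVIEW_PRIORITY[Status(row["status"])])
```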

Building and testing

  • Create a project in Claude, ChatGPT, or Gemini.
  • Paste the system prompt, customised for your situation.
  • Upload two to three completed examples to the knowledge file section as reference.
  • Test across five to eight diverse client inputs to confirm consistent extraction.
  • Adjust the prompt based on failures before moving to any automation layer.
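
When testing across several client inputs, it can help to compare each extraction against values you entered by hand. The sketch below assumes both sides are available as dictionaries of field name to string; the `compare_extraction` and `penalty` functions are hypothetical, and the default weight simply echoes the "wrong is 3× worse than blank" framing from the rules.

```python
def compare_extraction(expected: dict, extracted: dict) -> dict:
    """Sort each expected field into correct, blank, or wrong."""
    outcome = {"correct": [], "blank": [], "wrong": []}
    for name, want in expected.items():
        got = (extracted.get(name) or "").strip()
        want = (want or "").strip()
        if got == want:
            outcome["correct"].append(name)
        elif got == "":
            outcome["blank"].append(name)   # model left it empty
        else:
            outcome["wrong"].append(name)   # model returned a different value
    return outcome


def penalty(outcome: dict, wrong_weight: int = 3) -> int:
    """Score a run: each blank costs 1, each wrong value costs wrong_weight."""
    return len(outcome["blank"]) + wrong_weight * len(outcome["wrong"])
```

Tracking the penalty per input across prompt revisions gives a quick signal of whether an adjustment helped or hurt.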

Recreating branded output documents

Once extraction is validated, you can also have the AI produce a formatted output document:

  • Feed your branded template to a high-end model (Claude preferred for its auto-skill creation).
  • Prompt it to reverse-engineer the aesthetics — fonts, spacing, colours — for pixel-accurate recreation.
  • Ask it to create a reusable skill from that understanding.
  • Embed the skill in the existing system prompt so the full flow produces both an audit table and a completed, branded document.

When to move from browser to desktop agent

Stay in the browser while testing. Consider a desktop agent (Claude Code, Claude Cowork, Codex) when:

  • You process more than eight to ten files per day; browser uploads cap out at around that volume.
  • You need automatic error logging — instruct the agent to write failures to an error file rather than requiring manual tracking.
  • You want the AI to connect to other systems rather than only produce files.
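
The error file itself can be very simple. The sketch below appends one JSON object per line; the file layout, the key names (`file`, `field`, `reason`, `logged_at`), and the `log_failure` function are assumptions, not a format required by any of the agents named above.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def log_failure(log_path, file_name: str, field_name: str, reason: str) -> dict:
    """Append one failure record to a JSON Lines error file and return it."""
    entry = {
        "logged_at": datetime.now(timezone.utc).isoformat(),
        "file": file_name,
        "field": field_name,
        "reason": reason,
    }
    with Path(log_path).open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry) + "\n")
    return entry
```

An append-only file with one record per line is easy for both a person and an agent to read back later.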

Self-improving prompt loop

  1. AI extracts data and logs any errors to a dedicated file.
  2. On a recurring basis (weekly or monthly), prompt the agent to review the error log for recurring themes.
  3. Ask it to make a minimal, surgical update to the system prompt to fix the pattern.
  4. Verify the fix holds across new inputs, then clear the resolved errors from the log.

Keep updates minimal — an over-ambitious rewrite bloats the prompt and degrades performance.
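
Before asking the agent to review the log, a quick count can show which patterns actually recur. This sketch assumes each log line is a JSON object with `field` and `reason` keys (a hypothetical layout); the `recurring_themes` function and its `min_count` threshold are likewise illustrative.

```python
import json
from collections import Counter


def recurring_themes(log_lines, min_count: int = 2):
    """Count (field, reason) pairs and return those seen at least min_count times,
    most frequent first."""
    counts = Counter()
    for line in log_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the log
        entry = json.loads(line)
        counts[(entry["field"], entry["reason"])] += 1
    return [(key, n) for key, n in counts.most_common() if n >= min_count]
```

Only the pairs that clear the threshold are candidates for a prompt change; one-off failures can stay in the log until they repeat.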
