Practical techniques to improve RAG retrieval without complexity

Executive overview

RAG systems are easy to set up but hard to master. Most teams jump to complex solutions — fine-tuning, agent routing, multi-doc agents — when simpler techniques would give them 80% of the results.

Focusing on beginner-mode fundamentals plus re-ranking delivers most of the retrieval gains. Nightmare-mode complexity is only worth pursuing once these foundations are solid.

The 80/20 of RAG: data quality, smart chunking, metadata filtering, and re-ranking outperform almost every advanced technique.

Cleaning data before ingestion

  • Remove junk: ads, redundant headers, cover pages, irrelevant boilerplate.
  • Fix or remove poorly formatted content — weird layouts confuse retrieval.
  • Strip broken text: garbled ASCII, non-language symbols, encoding artifacts.
  • Goal: only meaningful, clean text enters the vector database.
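
A minimal sketch of such a cleaning pass, using only the Python standard library. The boilerplate patterns and the 0.5 "text ratio" threshold are illustrative assumptions, not values from the source; tune both against your own documents.

```python
import re
import unicodedata

# Hypothetical boilerplate patterns; replace with ones that match your corpus.
BOILERPLATE = [
    re.compile(r"^advertisement$", re.IGNORECASE),
    re.compile(r"^page \d+ of \d+$", re.IGNORECASE),
]


def clean_text(raw: str, min_text_ratio: float = 0.5) -> str:
    """Return raw text with junk lines and encoding debris removed."""
    text = unicodedata.normalize("NFKC", raw).replace("\t", " ")
    kept = []
    for line in text.splitlines():
        # Drop control characters and the U+FFFD replacement character.
        line = "".join(
            ch for ch in line
            if ch != "\ufffd" and not unicodedata.category(ch).startswith("C")
        ).strip()
        # Skip lines that are pure boilerplate.
        if any(p.match(line) for p in BOILERPLATE):
            continue
        # Skip lines that are mostly symbols rather than language.
        if line:
            texty = sum(ch.isalnum() or ch.isspace() for ch in line)
            if texty / len(line) < min_text_ratio:
                continue
        kept.append(line)
    # Collapse runs of blank lines left behind by removed content.
    return re.sub(r"\n{3,}", "\n\n", "\n".join(kept)).strip()
```

Run this before chunking and embedding, and spot-check a sample of the output by hand: regex rules like these can also remove legitimate content.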

Choosing and using embedders

  • Use recent, widely adopted embedding models; newer models generally retrieve better than older ones.
  • OpenAI's text-embedding-3-small is a strong default: it is cheap and performs close to larger models.
  • For specialist domains (e.g. dermatology), test embedders on your specific vocabulary.
  • Good embedders capture semantic neighbours — "money back", "return", "refund" cluster together; unrelated words do not.
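
One way to test an embedder on your own vocabulary is to check that terms you consider related sit closer together than terms you consider unrelated. The sketch below assumes you supply an `embed` function that maps text to a vector; `toy_embed` and its hand-written vectors are made up purely so the example runs, and are not output from any real model.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def clusters_correctly(
    embed: Callable[[str], Vector],
    related: Sequence[str],
    unrelated: Sequence[str],
) -> bool:
    """True if every related pair is closer than any related/unrelated pair.

    Expects at least two related terms and one unrelated term.
    """
    rel = [embed(t) for t in related]
    unrel = [embed(t) for t in unrelated]
    weakest_related = min(
        cosine(a, b) for i, a in enumerate(rel) for b in rel[i + 1:]
    )
    strongest_cross = max(cosine(a, b) for a in rel for b in unrel)
    return weakest_related > strongest_cross


# Made-up vectors standing in for a real embedding model.
TOY_VECTORS = {
    "money back": [0.9, 0.1, 0.0],
    "return": [0.8, 0.2, 0.1],
    "refund": [0.95, 0.05, 0.0],
    "giraffe": [0.0, 0.1, 0.9],
}


def toy_embed(text: str) -> Vector:
    return TOY_VECTORS[text]
```

For a specialist domain, swap `toy_embed` for a call to the embedder under test and fill `related` and `unrelated` with terms from your own field.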

Chunking strategy

  • Cut at meaning boundaries — sentence ends, paragraph ends, section ends — not at arbitrary character counts.
  • Overlap chunks by 20–30% so the LLM can see the connection between adjacent pieces.
  • Adjust chunk size to the content: a short paragraph warrants a smaller chunk than a long section; don't force a static size.
  • Never cut mid-sentence — it destroys the LLM's ability to interpret the chunk.
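
The rules above can be sketched as a small chunker that splits on paragraph and sentence boundaries and carries trailing sentences forward as overlap. The function names, the 500-character limit, and the regex-based sentence splitter are assumptions for illustration; the splitter is naive and will mis-split abbreviations such as "e.g.".

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def chunk_paragraph(paragraph: str, max_chars: int, overlap: float) -> list[str]:
    """Pack whole sentences into chunks, repeating trailing sentences as overlap."""
    chunks: list[str] = []
    current: list[str] = []
    budget = int(max_chars * overlap)  # characters allowed for carried-over text
    for sentence in split_sentences(paragraph):
        if current and len(" ".join(current + [sentence])) > max_chars:
            chunks.append(" ".join(current))
            # Carry whole trailing sentences forward, up to the overlap budget.
            carried: list[str] = []
            for prev in reversed(current):
                if len(" ".join([prev] + carried)) > budget:
                    break
                carried.insert(0, prev)
            current = carried
        # A sentence longer than max_chars is kept whole rather than cut.
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


def chunk_document(text: str, max_chars: int = 500, overlap: float = 0.25) -> list[str]:
    """Chunk each paragraph separately so chunks never span a paragraph break."""
    chunks: list[str] = []
    for paragraph in re.split(r"\n\s*\n", text):
        if paragraph.strip():
            chunks.extend(chunk_paragraph(paragraph, max_chars, overlap))
    return chunks
```

Because each paragraph is chunked on its own, a short paragraph becomes one short chunk instead of being padded out to a fixed size.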

Metadata and filtering

  • Attach structured tags to each chunk before vectorising: date, topic, entity names, location, document section.
  • Add a one-sentence LLM-generated summary as a metadata field on each chunk.
  • At query time, combine vector similarity with metadata filters — this dramatically narrows the result set.
  • Example: a "Tesla Q3 earnings" query yields the filters topic: earnings, company: Tesla, period: Q3, which are applied before hitting the vector DB.
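
A minimal in-memory sketch of combining metadata filters with vector similarity. The `Chunk` class and `filtered_search` function are hypothetical names, and a real vector database would apply the filter inside its own query API; the point here is only the order of operations: filter first, then rank by similarity.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    vector: list[float]
    metadata: dict[str, str] = field(default_factory=dict)


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filtered_search(
    chunks: list[Chunk],
    query_vector: list[float],
    filters: dict[str, str],
    top_k: int = 5,
) -> list[Chunk]:
    """Keep chunks whose metadata matches every filter, then rank by similarity."""
    candidates = [
        c for c in chunks
        if all(c.metadata.get(key) == value for key, value in filters.items())
    ]
    candidates.sort(key=lambda c: cosine(query_vector, c.vector), reverse=True)
    return candidates[:top_k]
```

With filters such as `{"company": "Tesla", "topic": "earnings", "period": "Q3"}`, a chunk about another company or another quarter is excluded even if its vector happens to be closer to the query.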

Source tracking

  • Prompt the LLM to return the source reference alongside every answer.
  • Format: inline citation (e.g. [Section 2.1 — Warranty Policy]) so the user can verify.
  • Prevents hallucinated answers from going unchecked.
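
One possible shape for this, assuming each retrieved chunk arrives with a human-readable source label. The prompt wording and both function names are illustrative, not a prescribed format. The second function checks whether the answer cites any label that was never retrieved, which is a cheap signal that a citation may be invented.

```python
import re


def build_cited_prompt(question: str, sources: dict[str, str]) -> str:
    """Build a prompt that shows each chunk under its label and asks for citations."""
    context = "\n\n".join(f"[{label}]\n{text}" for label, text in sources.items())
    return (
        "Answer using only the context below. After every claim, cite the "
        "source label in square brackets. If the context does not contain "
        "the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


def unknown_citations(answer: str, sources: dict[str, str]) -> set[str]:
    """Return cited labels that do not match any retrieved source."""
    cited = set(re.findall(r"\[([^\[\]]+)\]", answer))
    return cited - set(sources)
```

An empty set from `unknown_citations` does not prove the answer is correct, only that every citation points at something the user can actually open and check.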

Query rewriting before retrieval

  • Insert an intermediate LLM that rewrites the user's raw query before it hits the vector database.
  • Vague queries ("tell me about Tesla earnings") become specific ones with added context, time period, and relevant keywords.
  • Improves retrieval precision without changing the user-facing interface.
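
A sketch of the rewriting step, kept independent of any particular LLM provider. The `complete` argument is an assumption: any function you supply that takes a prompt string and returns the model's text. The prompt template is illustrative wording.

```python
from typing import Callable

# Illustrative instruction; adjust the wording for your domain.
REWRITE_TEMPLATE = (
    "Rewrite the search query below so it is specific and self-contained. "
    "Add the likely time period, entities, and keywords. "
    "Return only the rewritten query.\n\nQuery: {query}"
)


def rewrite_query(raw_query: str, complete: Callable[[str], str]) -> str:
    """Ask an LLM to sharpen the query; fall back to the original if it returns nothing."""
    rewritten = complete(REWRITE_TEMPLATE.format(query=raw_query)).strip()
    return rewritten or raw_query
```

The rewritten string is what gets embedded and sent to the vector database; the user still sees only their original question and the final answer.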

Re-ranking results

  • A typical retrieval returns 10–50 chunks; not all are equally relevant.
  • Score each chunk using a combination of vector similarity and metadata filter matches.
  • Re-rank by combined score; pass only the top 5 (or similar cutoff) to the final LLM.
  • Avoids returning the first result blindly — surfaces the most contextually accurate chunks instead.
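
A minimal sketch of a combined score. The 0.7 weighting and the "fraction of wanted metadata fields matched" measure are assumptions chosen for illustration; the source does not specify how the two signals are combined.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    text: str
    similarity: float  # vector similarity from the initial retrieval, 0..1
    metadata: dict[str, str] = field(default_factory=dict)


def rerank(
    candidates: list[Candidate],
    wanted: dict[str, str],
    top_n: int = 5,
    similarity_weight: float = 0.7,
) -> list[Candidate]:
    """Sort by a weighted mix of similarity and metadata match rate; keep top_n."""

    def score(c: Candidate) -> float:
        if wanted:
            hits = sum(1 for k, v in wanted.items() if c.metadata.get(k) == v)
            match_rate = hits / len(wanted)
        else:
            match_rate = 0.0
        return similarity_weight * c.similarity + (1 - similarity_weight) * match_rate

    return sorted(candidates, key=score, reverse=True)[:top_n]
```

Here a chunk with slightly lower similarity but matching company and topic can outrank a closer vector about the wrong company, which is the behaviour the bullets describe.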

Adventure and nightmare mode (brief reference)

Adventure mode techniques worth considering when fundamentals are solid:

  • Recursive retrieval: iteratively fetch more context based on each prior retrieval step.
  • Embedded tables: preserve relationships between cells, not just raw text.
  • Small-to-big: start with a small seed chunk, expand outward until context is sufficient.
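
As one illustration, small-to-big can be sketched as growing a window around the seed chunk. "Sufficient" is approximated here by a minimum character count, which is an assumption; a real system might instead ask the LLM whether the context answers the question.

```python
def expand_context(chunks: list[str], seed_index: int, min_chars: int) -> list[str]:
    """Grow a window of neighbouring chunks around the seed until min_chars is reached."""
    lo = hi = seed_index
    total = len(chunks[seed_index])
    last = len(chunks) - 1
    while total < min_chars and (lo > 0 or hi < last):
        # Extend forward first, then backward, one chunk at a time.
        if hi < last:
            hi += 1
            total += len(chunks[hi])
        if total < min_chars and lo > 0:
            lo -= 1
            total += len(chunks[lo])
    return chunks[lo:hi + 1]
```

The chunks must be stored in document order for the neighbours to be meaningful.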

Nightmare mode (high cost and complexity — avoid unless necessary):

  • LLM fine-tuning for domain-specific tasks.
  • Embedding fine-tuning for specialist terminology.
  • Agent routing across multiple vector databases.
  • Query planning: decompose complex queries into sub-queries, search each, aggregate.
  • Multi-doc agents: parallel agents search separate document sets and consolidate answers.
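
For reference, the query-planning loop reduces to three steps: decompose, search each sub-query, aggregate. In the sketch below, `decompose` and `search` are hypothetical callables you would supply (typically an LLM call and a vector search); aggregation here is simple de-duplication in order of first appearance.

```python
from typing import Callable


def plan_and_search(
    query: str,
    decompose: Callable[[str], list[str]],
    search: Callable[[str], list[str]],
    per_query: int = 3,
) -> list[str]:
    """Split a query into sub-queries, search each, and merge unique results."""
    seen: set[str] = set()
    merged: list[str] = []
    for sub_query in decompose(query):
        for chunk in search(sub_query)[:per_query]:
            if chunk not in seen:
                seen.add(chunk)
                merged.append(chunk)
    return merged
```

Most of the cost and complexity sits inside `decompose`, since every extra sub-query adds an LLM call and a retrieval round trip.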
