Practical techniques to improve RAG retrieval without complexity
Executive overview
RAG systems are easy to set up but hard to master. Most teams jump to complex solutions — fine-tuning, agent routing, multi-doc agents — when simpler techniques would give them 80% of the results.
Focusing on beginner-mode fundamentals plus re-ranking delivers most of the retrieval gains. Nightmare-mode complexity is only worth pursuing once these foundations are solid.
The 80/20 of RAG: data quality, smart chunking, metadata filtering, and re-ranking outperform almost every advanced technique.
Cleaning data before ingestion
- Remove junk: ads, redundant headers, cover pages, irrelevant boilerplate.
- Fix or remove poorly formatted content — weird layouts confuse retrieval.
- Strip broken text: garbled ASCII, non-language symbols, encoding artifacts.
- Goal: only meaningful, clean text enters the vector database.
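A minimal sketch of such a cleaning pass is shown here. The boilerplate patterns and the 50% "mostly symbols" threshold are illustrative assumptions, not values from the original; tune both against your own corpus.

```python
import re
import unicodedata

# Hypothetical boilerplate patterns; replace with ones that match your corpus.
BOILERPLATE_PATTERNS = [
    re.compile(r"^\s*advertisement\s*$", re.IGNORECASE),
    re.compile(r"^\s*page \d+ of \d+\s*$", re.IGNORECASE),
]


def clean_text(raw: str) -> str:
    """Return raw text with boilerplate lines and encoding debris removed."""
    # Normalise Unicode so visually identical characters compare equal.
    text = unicodedata.normalize("NFKC", raw)
    # Drop the replacement character left behind by failed decoding.
    text = text.replace("\ufffd", "")
    kept = []
    for line in text.splitlines():
        # Remove control and format characters (Unicode categories starting with "C").
        line = "".join(
            ch for ch in line if ch == "\t" or unicodedata.category(ch)[0] != "C"
        )
        line = line.strip()
        if not line:
            continue
        if any(p.match(line) for p in BOILERPLATE_PATTERNS):
            continue
        # Assumed heuristic: skip lines where under half the characters
        # are letters, digits, or spaces.
        wordlike = sum(ch.isalnum() or ch.isspace() for ch in line)
        if wordlike / len(line) < 0.5:
            continue
        kept.append(line)
    return "\n".join(kept)
```

Running this before chunking keeps junk out of the index, so it never has to be filtered out at query time.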
Choosing and using embedders
- Use recent, widely adopted embedding models; newer models generally produce better embeddings than older ones.
- OpenAI's text-embedding-3-small is a strong default: cheap and near-large-model performance.
- For specialist domains (e.g. dermatology), test embedders on your specific vocabulary.
- Good embedders capture semantic neighbours — "money back", "return", "refund" cluster together; unrelated words do not.
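One rough way to test an embedder on your own vocabulary is to check that related terms score closer to each other than to unrelated ones. The sketch below assumes an `embed` callable that maps a string to a vector. `toy_embed` and its hand-written vectors are stand-ins for a real model and do not come from any actual embedder.

```python
import math
from itertools import combinations, product


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def neighbour_gap(embed, related, unrelated):
    """Mean similarity within `related` minus mean similarity to `unrelated`.

    A larger gap suggests the embedder separates the domain terms well.
    """
    within = [
        cosine_similarity(embed(a), embed(b)) for a, b in combinations(related, 2)
    ]
    across = [
        cosine_similarity(embed(a), embed(b)) for a, b in product(related, unrelated)
    ]
    return sum(within) / len(within) - sum(across) / len(across)


# Hand-made toy vectors, used only so the example runs without a model.
TOY_VECTORS = {
    "refund": [0.9, 0.1, 0.0],
    "return": [0.8, 0.2, 0.1],
    "money back": [0.85, 0.15, 0.05],
    "giraffe": [0.0, 0.1, 0.9],
}


def toy_embed(term):
    return TOY_VECTORS[term]


gap = neighbour_gap(toy_embed, ["refund", "return", "money back"], ["giraffe"])
```

To compare candidate embedders, swap `toy_embed` for a function that calls each model and run it on term lists drawn from your domain.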
Chunking strategy
- Cut at meaning boundaries — sentence ends, paragraph ends, section ends — not at arbitrary character counts.
- Overlap chunks by 20–30% so the LLM can see the connection between adjacent pieces.
- Adjust chunk size to the content: a short paragraph warrants a smaller chunk than a long section; don't force a static size.
- Never cut mid-sentence — it destroys the LLM's ability to interpret the chunk.
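The rules above can be sketched as a sentence-aware chunker. The regex splitter is a deliberately naive assumption (it will mis-split abbreviations such as "e.g."), and `max_chars` and `overlap_ratio` are illustrative parameters, not values prescribed by the original.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def chunk_by_sentences(text, max_chars=500, overlap_ratio=0.25):
    """Group whole sentences into chunks, carrying trailing sentences forward.

    A sentence is never split: one that exceeds max_chars on its own
    still becomes part of a single (oversized) chunk.
    """
    chunks = []
    current = []
    for sentence in split_sentences(text):
        candidate_len = len(" ".join(current + [sentence]))
        if current and candidate_len > max_chars:
            chunks.append(" ".join(current))
            # Carry over trailing sentences that fit in the overlap budget.
            budget = int(max_chars * overlap_ratio)
            carried = []
            for prev in reversed(current):
                if len(" ".join([prev] + carried)) > budget:
                    break
                carried.insert(0, prev)
            current = carried
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


sample = (
    "Refunds are issued within 30 days. Items must be unused. "
    "Shipping fees are not refunded. Contact support to start a return."
)
chunks = chunk_by_sentences(sample, max_chars=70, overlap_ratio=0.4)
```

Because the overlap is built from whole sentences, the shared text between adjacent chunks is only approximately the requested ratio. That trade-off keeps every chunk readable on its own.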
Metadata and filtering
- Attach structured tags to each chunk before vectorising: date, topic, entity names, location, document section.
- Add a one-sentence LLM-generated summary as a metadata field on each chunk.
- At query time, combine vector similarity with metadata filters — this dramatically narrows the result set.
- Example: a "Tesla earnings" query extracts topic: earnings, company: Tesla, period: Q3 as filters before hitting the vector DB.
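As a rough illustration, filtering on metadata first and ranking by similarity second might look like the sketch below. It uses an in-memory list in place of a real vector database. The `Chunk` class, the exact-match filter semantics, and the two-dimensional placeholder embeddings are all assumptions made for the example.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    embedding: list
    metadata: dict = field(default_factory=dict)


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def filtered_search(query_embedding, filters, chunks, top_k=3):
    """Keep chunks whose metadata matches every filter, then rank by similarity."""
    candidates = [
        c for c in chunks
        if all(c.metadata.get(key) == value for key, value in filters.items())
    ]
    candidates.sort(
        key=lambda c: cosine_similarity(query_embedding, c.embedding), reverse=True
    )
    return candidates[:top_k]


# Placeholder chunks with toy embeddings (not real model output).
index = [
    Chunk("Q3 earnings summary", [0.9, 0.1],
          {"topic": "earnings", "company": "Tesla", "period": "Q3"}),
    Chunk("Q2 earnings summary", [0.95, 0.05],
          {"topic": "earnings", "company": "Tesla", "period": "Q2"}),
    Chunk("Q3 factory update", [0.2, 0.8],
          {"topic": "operations", "company": "Tesla", "period": "Q3"}),
    Chunk("Q3 earnings call notes", [0.7, 0.3],
          {"topic": "earnings", "company": "Tesla", "period": "Q3"}),
]
filters = {"topic": "earnings", "company": "Tesla", "period": "Q3"}
results = filtered_search([1.0, 0.0], filters, index)
```

In this toy index the Q2 chunk is the closest vector to the query, but the period filter removes it before ranking. That is the narrowing effect described above.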
Source tracking
- Prompt the LLM to return the source reference alongside every answer.
- Format: inline citation (e.g. [Section 2.1 — Warranty Policy]) so the user can verify.
- Prevents hallucinated answers from going unchecked.
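One possible way to wire this up is sketched below. The prompt wording, the bracketed label format, and the `unknown_citations` check are assumptions for illustration. The check only confirms that each cited label was actually retrieved, not that the claim is supported by that source.

```python
import re


def build_cited_prompt(question, sources):
    """Build a prompt that labels each chunk and asks for inline citations.

    `sources` is a list of (label, text) pairs, e.g. ("Section 2.1", "...").
    """
    context = "\n\n".join(f"[{label}]\n{text}" for label, text in sources)
    return (
        "Answer the question using only the context below.\n"
        "After each claim, cite the supporting source label in square brackets.\n"
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
    )


def unknown_citations(answer, sources):
    """Return labels cited in the answer that match no retrieved source."""
    cited = re.findall(r"\[([^\[\]]+)\]", answer)
    known = {label for label, _ in sources}
    return [label for label in cited if label not in known]


sources = [
    ("Section 2.1", "Placeholder warranty text."),
    ("Section 4.3", "Placeholder returns text."),
]
prompt = build_cited_prompt("How long is the warranty?", sources)
```

A non-empty result from `unknown_citations` is a cheap signal that the answer cites something that was never retrieved and should be reviewed.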
Query rewriting before retrieval
- Insert an intermediate LLM that rewrites the user's raw query before it hits the vector database.
- Vague queries ("tell me about Tesla earnings") become specific ones with added context, time period, and relevant keywords.
- Improves retrieval precision without changing the user-facing interface.
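A minimal sketch of the rewriting step follows. It assumes the model is passed in as a plain `llm` callable (prompt in, text out), so no particular provider API is implied. The prompt template and `stub_llm` are hypothetical.

```python
from typing import Callable

# Hypothetical prompt wording; adjust to your model and domain.
REWRITE_TEMPLATE = (
    "Rewrite the search query below so it retrieves better results from a "
    "document index. Make it specific: add the likely time period, entities, "
    "and relevant keywords. Today's date is {today}. "
    "Return only the rewritten query.\n\nQuery: {query}"
)


def rewrite_query(raw_query: str, llm: Callable[[str], str], today: str) -> str:
    """Ask the model for a more specific query before retrieval."""
    prompt = REWRITE_TEMPLATE.format(query=raw_query, today=today)
    rewritten = llm(prompt).strip()
    # Fall back to the user's own wording if the model returns nothing usable.
    return rewritten or raw_query


def stub_llm(prompt: str) -> str:
    """Stand-in for a real model call so the example runs offline."""
    return "  Tesla quarterly earnings: revenue, net income, guidance  \n"


search_query = rewrite_query("tell me about Tesla earnings", stub_llm, "2024-11-01")
```

The user still sees and types the original question. Only the string sent to the vector database changes.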
Re-ranking results
- A typical retrieval returns 10–50 chunks; not all are equally relevant.
- Score each chunk using a combination of vector similarity and metadata filter matches.
- Re-rank by combined score; pass only the top 5 (or similar cutoff) to the final LLM.
- Avoids returning the first result blindly — surfaces the most contextually accurate chunks instead.
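The combined scoring described above could be sketched as a weighted sum. The 0.7 / 0.3 weighting, the exact-match metadata ratio, and the assumption that similarities lie in [0, 1] are illustrative choices, not values from the original.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    text: str
    similarity: float  # first-pass vector similarity, assumed to lie in [0, 1]
    metadata: dict = field(default_factory=dict)


def metadata_match_ratio(metadata, filters):
    """Fraction of the requested filters that this chunk's metadata satisfies."""
    if not filters:
        return 0.0
    hits = sum(1 for key, value in filters.items() if metadata.get(key) == value)
    return hits / len(filters)


def rerank(candidates, filters, top_k=5, similarity_weight=0.7):
    """Order candidates by a weighted mix of similarity and metadata matches."""
    def score(c):
        return (
            similarity_weight * c.similarity
            + (1 - similarity_weight) * metadata_match_ratio(c.metadata, filters)
        )
    return sorted(candidates, key=score, reverse=True)[:top_k]


retrieved = [
    Candidate("A", 0.80, {"company": "Ford", "period": "Q1"}),
    Candidate("B", 0.70, {"company": "Tesla", "period": "Q3"}),
    Candidate("C", 0.75, {"company": "Tesla", "period": "Q2"}),
]
top = rerank(retrieved, {"company": "Tesla", "period": "Q3"}, top_k=2)
```

Here candidate A has the highest raw similarity but matches none of the filters, so it drops out of the top two. Dedicated re-ranking models are a common alternative to a hand-weighted score like this one.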
Adventure and nightmare mode (brief reference)
Adventure mode techniques worth considering when fundamentals are solid:
- Recursive retrieval: iteratively fetch more context based on each prior retrieval step.
- Embedded tables: preserve relationships between cells, not just raw text.
- Small-to-big: start with a small seed chunk, expand outward until context is sufficient.
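As one example, small-to-big expansion can be sketched as growing a window of neighbouring chunks around the seed. Treating "sufficient context" as a minimum character count (`min_chars`) is an assumption made here. A real system might instead stop at a section boundary or a token budget.

```python
def expand_small_to_big(chunks, seed_index, min_chars=400):
    """Grow a window around chunks[seed_index] until it holds min_chars.

    `chunks` is the ordered list of chunk texts from a single document.
    """
    lo = hi = seed_index
    while sum(len(c) for c in chunks[lo:hi + 1]) < min_chars:
        if lo == 0 and hi == len(chunks) - 1:
            break  # the whole document is already included
        # Extend one chunk in each direction, clamped to the document bounds.
        lo = max(lo - 1, 0)
        hi = min(hi + 1, len(chunks) - 1)
    return " ".join(chunks[lo:hi + 1])


doc_chunks = ["aaaa", "bbbb", "cccc", "dddd", "eeee"]
context = expand_small_to_big(doc_chunks, seed_index=2, min_chars=10)
```

The small seed chunk is what gets matched against the query, while the expanded window is what gets passed to the LLM.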
Nightmare mode (high cost and complexity — avoid unless necessary):
- LLM fine-tuning for domain-specific tasks.
- Embedding fine-tuning for specialist terminology.
- Agent routing across multiple vector databases.
- Query planning: decompose complex queries into sub-queries, search each, aggregate.
- Multi-doc agents: parallel agents search separate document sets and consolidate answers.