5 RAG Pipeline Mistakes We Made and How to Avoid Them

Mar 14, 2026 | 8 min read | Zorithm Technologies

Building a RAG system sounds straightforward until production.

Retrieval-Augmented Generation sounds simple in demos: embed your documents, retrieve relevant chunks, pass them to an LLM, get an answer. In production, each of those steps can go wrong in its own way. These are the five mistakes that cost us the most.

Mistake 1: Fixed-Size Chunking

Our first RAG pipeline split documents into fixed 512-token chunks. The results were mediocre because important context was routinely split across chunk boundaries, so a retrieved chunk often held only half of the passage needed to answer a question. We switched to semantic chunking: splitting on paragraph boundaries and sentence endings, with overlap between adjacent chunks.
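
The chunking approach described above can be sketched roughly as follows. This is a minimal illustration, not our production code: the function names are hypothetical, the sentence splitter is a naive regex, and whitespace-separated words stand in for model tokens. A real pipeline would use the tokenizer that matches its embedding model.

```python
import re


def split_sentences(paragraph):
    # Naive splitter: break after ., ! or ? when followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]


def count_tokens(text):
    # Assumption: words approximate tokens closely enough for a sketch.
    return len(text.split())


def chunk_text(text, max_tokens=512, overlap_sentences=1):
    """Pack whole sentences into chunks of roughly max_tokens.

    Paragraph breaks (blank lines) and sentence endings are the only
    places where a chunk may end. The last `overlap_sentences` sentences
    of each chunk are repeated at the start of the next one.
    """
    chunks = []
    current = []
    current_tokens = 0
    for paragraph in re.split(r"\n\s*\n", text):
        for sentence in split_sentences(paragraph):
            n = count_tokens(sentence)
            if current and current_tokens + n > max_tokens:
                chunks.append(" ".join(current))
                # Carry the tail of the finished chunk forward as overlap.
                current = current[-overlap_sentences:] if overlap_sentences else []
                current_tokens = sum(count_tokens(s) for s in current)
            current.append(sentence)
            current_tokens += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Because the overlap is added before the next sentence, a chunk can exceed max_tokens slightly when sentences are long. Treat the limit as a target rather than a hard cap, or leave headroom when choosing it.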

Mistake 2: Using the Wrong Embedding Model

We started with text-embedding-ada-002 because it was the obvious default. For our domain, it underperformed. After benchmarking against our specific test set, we switched to a domain-fine-tuned model and saw a 23% improvement in retrieval precision.
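
A benchmark of this kind can be as simple as computing precision@k for each candidate model over the same labelled queries. The sketch below assumes a hypothetical `embed` callable that maps text to a vector, and a test set of (query, relevant document ids) pairs. Neither reflects a specific library API.

```python
import math


def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def precision_at_k(query_vec, doc_vecs, relevant_ids, k):
    # Fraction of the top-k retrieved documents that are labelled relevant.
    ranked = sorted(
        doc_vecs,
        key=lambda doc_id: cosine(query_vec, doc_vecs[doc_id]),
        reverse=True,
    )
    return sum(1 for doc_id in ranked[:k] if doc_id in relevant_ids) / k


def benchmark(embed, queries, documents, k=5):
    """Mean precision@k for one embedding function.

    embed:     hypothetical callable, text -> list of floats
    queries:   list of (query_text, set_of_relevant_doc_ids)
    documents: dict of doc_id -> text
    """
    doc_vecs = {doc_id: embed(text) for doc_id, text in documents.items()}
    scores = [
        precision_at_k(embed(query), doc_vecs, relevant, k)
        for query, relevant in queries
    ]
    return sum(scores) / len(scores)
```

Running `benchmark` once per candidate model on the same queries and documents gives directly comparable numbers, which is what makes a claim like "model B beats model A on our data" checkable.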

Mistake 3: No Reranking

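Embedding similarity is a coarse retrieval signal: the query and each chunk are embedded independently, so the score cannot account for how the two interact. A cross-encoder reads the query and a candidate chunk together and scores the pair directly. Adding a cross-encoder reranker as a second retrieval pass significantly improved answer relevance.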
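
The two-pass pattern looks roughly like this. The sketch keeps the scorer pluggable: `score_pairs` is a hypothetical callable that takes a list of (query, passage) pairs and returns one score per pair. The `overlap_scorer` here is only a toy stand-in so the example runs without downloading a model. It is not a cross-encoder.

```python
def rerank(query, candidates, score_pairs, top_n=3):
    """Re-score first-pass candidates and keep the best top_n.

    candidates:  passages returned by the embedding search
    score_pairs: callable, list of (query, passage) -> list of scores.
                 In practice this would wrap a cross-encoder model, for
                 example the predict method of a sentence-transformers
                 CrossEncoder (an assumption, not shown here).
    """
    pairs = [(query, passage) for passage in candidates]
    scores = score_pairs(pairs)
    ranked = sorted(zip(candidates, scores), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


def overlap_scorer(pairs):
    # Toy scorer: share of query words that also appear in the passage.
    scores = []
    for query, passage in pairs:
        query_words = set(query.lower().split())
        passage_words = set(passage.lower().split())
        scores.append(len(query_words & passage_words) / len(query_words) if query_words else 0.0)
    return scores
```

A common arrangement is to retrieve a generous candidate set with embeddings, then rerank only that set, since scoring every pair with a cross-encoder is far more expensive than a vector lookup.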

Mistake 4: Ignoring Context Window Management

As document volume grew, we hit context window limits. The fix was a budget-aware context assembly function that prioritises the highest-scoring chunks and truncates gracefully when the limit is approached.
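
A budget-aware assembly function along these lines might look like the sketch below. It is an illustration under stated assumptions rather than our exact implementation: the name `assemble_context` is hypothetical, tokens are approximated by whitespace-separated words, and "graceful" truncation is taken to mean trimming the last chunk at a word boundary instead of dropping it or cutting mid-word.

```python
def assemble_context(scored_chunks, max_tokens):
    """Build a context string from (text, score) pairs within a token budget.

    Chunks are added in descending score order. The first chunk that does
    not fit is trimmed to the remaining budget and assembly stops there.
    """
    selected = []
    used = 0
    for text, _score in sorted(scored_chunks, key=lambda c: c[1], reverse=True):
        remaining = max_tokens - used
        if remaining <= 0:
            break
        words = text.split()
        if len(words) <= remaining:
            selected.append(text)
            used += len(words)
        else:
            # Trim at a word boundary so the budget is never exceeded.
            selected.append(" ".join(words[:remaining]))
            used = max_tokens
            break
    return "\n\n".join(selected)
```

The budget passed in should already exclude the tokens reserved for the system prompt, the question, and the model's answer, otherwise the assembled context can still overflow the window.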

Mistake 5: No Evaluation Framework

We shipped without a systematic way to evaluate pipeline quality, so we could not tell whether a change helped or hurt. To fix that, we built a small test set of 50 question-answer pairs and now run automated evaluation after every pipeline change.
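
A harness for this does not need to be elaborate. The sketch below assumes a hypothetical `pipeline` callable that takes a question and returns an answer string, and it scores answers with token-level F1 against the expected answer. The metric and the 0.5 pass threshold are illustrative choices, not the ones our evaluation necessarily uses.

```python
from collections import Counter


def token_f1(predicted, expected):
    # Token-level F1 between two answers, case-insensitive.
    pred = predicted.lower().split()
    gold = expected.lower().split()
    common = sum((Counter(pred) & Counter(gold)).values())
    if common == 0:
        return 0.0
    precision = common / len(pred)
    recall = common / len(gold)
    return 2 * precision * recall / (precision + recall)


def evaluate(pipeline, test_set, threshold=0.5):
    """Run every question through the pipeline and report the pass rate.

    pipeline: hypothetical callable, question -> answer string
    test_set: list of (question, expected_answer) pairs
    """
    results = []
    for question, expected in test_set:
        score = token_f1(pipeline(question), expected)
        results.append(
            {"question": question, "score": score, "passed": score >= threshold}
        )
    pass_rate = sum(r["passed"] for r in results) / len(results)
    return pass_rate, results
```

Storing the pass rate for each pipeline version makes regressions visible: if a change to chunking or retrieval lowers the number, the per-question results show which questions broke.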
