Engineering Retrieval-Augmented Generation (RAG): chunking, hybrid search, reranking, freshness, citations, and evaluation.

RAG engineering

Overview

RAG grounds agent outputs in retrieved sources, turning them from merely plausible into verifiable. Production RAG requires careful chunking, permission enforcement, reranking, freshness handling, and continuous evaluation.

Key topics

  • Chunking by structure and semantics, not only by token size.
  • Hybrid retrieval (lexical + vector) and reranking for precision.
  • Permission filtering before any passage is exposed to the model.
  • Freshness controls, cache invalidation, and live fetch patterns.
  • Citation coverage and groundedness metrics.
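A minimal sketch of structure-aware chunking, assuming Markdown-style `#` headings and blank-line paragraph breaks: the text is split at headings first, then whole paragraphs are packed into chunks up to a word budget. The names `Chunk` and `chunk_by_structure` are hypothetical, and word count is a crude stand-in for a real tokenizer.

```python
import re
from dataclasses import dataclass

# Assumption: Markdown-style headings ("# Title", "## Subtitle", ...).
HEADING = re.compile(r"^#{1,6}\s+(.*)$")


@dataclass
class Chunk:
    heading: str  # nearest preceding heading, kept as metadata for filtering and citation
    text: str


def chunk_by_structure(doc: str, max_words: int = 200) -> list[Chunk]:
    """Split at headings, then pack whole paragraphs into chunks of at most max_words."""
    chunks: list[Chunk] = []
    heading = ""
    buf: list[str] = []   # paragraphs in the chunk being built
    words = 0             # word count of buf (a crude stand-in for tokens)
    para: list[str] = []  # lines of the paragraph being read

    def flush_chunk() -> None:
        nonlocal buf, words
        if buf:
            chunks.append(Chunk(heading, "\n\n".join(buf)))
        buf, words = [], 0

    def end_paragraph() -> None:
        nonlocal para, words
        if not para:
            return
        text = " ".join(para)
        n = len(text.split())
        # Start a new chunk instead of splitting a paragraph; an oversized
        # paragraph still becomes one chunk on its own.
        if buf and words + n > max_words:
            flush_chunk()
        buf.append(text)
        words += n
        para = []

    for line in doc.splitlines():
        match = HEADING.match(line)
        if match:
            end_paragraph()
            flush_chunk()  # never let a chunk span two sections
            heading = match.group(1).strip()
        elif line.strip():
            para.append(line.strip())
        else:
            end_paragraph()
    end_paragraph()
    flush_chunk()
    return chunks
```

Keeping the heading on each chunk gives later stages something to filter and cite by. A paragraph longer than the budget is kept whole here rather than split mid-sentence.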
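One common way to combine lexical and vector results is reciprocal rank fusion (RRF), sketched below. It assumes each retriever has already returned a ranked list of document IDs; `reciprocal_rank_fusion` is a hypothetical name, and `k = 60` is a commonly used default rather than a tuned value. The fused list would then typically go to a reranker, which is not shown.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of doc IDs; each appearance adds 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for deterministic output.
    return sorted(scores, key=lambda d: (-scores[d], d))


lexical = ["a", "b", "c"]  # e.g. from a BM25 index
vector = ["c", "a", "d"]   # e.g. from an embedding index
print(reciprocal_rank_fusion([lexical, vector]))
```

Because RRF uses only ranks, it sidesteps the fact that lexical scores and cosine similarities live on incomparable scales.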
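Freshness controls can be sketched as a cache that invalidates on either a time-to-live or a change in the source's version stamp, falling back to a live fetch. `FreshnessCache` is hypothetical, and it assumes the caller can cheaply obtain a version identifier for the underlying source (for example an ETag or last-modified value).

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CacheEntry:
    value: list[str]
    version: str      # source version (e.g. an ETag) at the time of caching
    stored_at: float  # clock reading when the entry was stored


class FreshnessCache:
    """Hypothetical cache: entries expire after a TTL or when the source version changes."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries: dict[str, CacheEntry] = {}

    def get(self, key: str, current_version: str) -> Optional[list[str]]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expired = self.clock() - entry.stored_at > self.ttl
        stale = entry.version != current_version
        if expired or stale:
            del self._entries[key]  # invalidate so the next read refetches
            return None
        return entry.value

    def put(self, key: str, value: list[str], version: str) -> None:
        self._entries[key] = CacheEntry(value, version, self.clock())

    def get_or_fetch(self, key: str, current_version: str,
                     fetch: Callable[[], list[str]]) -> list[str]:
        cached = self.get(key, current_version)
        if cached is not None:
            return cached
        value = fetch()  # live fetch from the source of truth
        self.put(key, value, current_version)
        return value
```

The clock is injectable so expiry can be tested; `time.monotonic` is the default so wall-clock adjustments do not shorten or extend TTLs.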

Common pitfalls

  • Over-chunking (fragments too small to carry their own context) or under-chunking (chunks too large to match precisely), leading to missing context.
  • Ignoring metadata filters (tenant, role, recency) and returning noise.
  • Returning too many passages and overwhelming the model.
  • No evidence mapping from outputs to source sections.
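The metadata-filter and passage-count pitfalls can both be addressed at selection time. The sketch below, with a hypothetical `Passage` record and `select_passages` function, applies tenant, role, and recency filters before anything else, then keeps the highest-scoring passages within a count and word budget. Where the vector store supports it, pushing these filters into the index query is preferable, so that the top-k is not filled with passages that are later discarded.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Passage:
    doc_id: str
    text: str
    tenant: str
    allowed_roles: frozenset[str]
    updated_at: datetime  # timezone-aware
    score: float          # retrieval or rerank score; higher is better


def select_passages(
    candidates: list[Passage],
    tenant: str,
    roles: set[str],
    max_age: timedelta,
    max_passages: int = 5,
    max_words: int = 800,
    now: Optional[datetime] = None,
) -> list[Passage]:
    now = now or datetime.now(timezone.utc)
    # 1. Hard filters first, so nothing outside the caller's tenant, roles,
    #    or recency window can reach the prompt.
    visible = [
        p for p in candidates
        if p.tenant == tenant
        and p.allowed_roles & roles
        and now - p.updated_at <= max_age
    ]
    # 2. Keep the best-scoring passages within a count and word budget,
    #    stopping at the first passage that would exceed either limit.
    selected: list[Passage] = []
    used = 0
    for p in sorted(visible, key=lambda p: p.score, reverse=True):
        n = len(p.text.split())
        if len(selected) == max_passages or used + n > max_words:
            break
        selected.append(p)
        used += n
    return selected
```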

Recommended practices

  • Start with a curated golden set of queries and expected sources.
  • Use rerankers and diversity selection to reduce redundancy.
  • Require citations for key claims and extracted fields.
  • Continuously monitor retrieval quality and drift.
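A golden set makes retrieval quality measurable. The sketch below computes recall@k and mean reciprocal rank (MRR) over a mapping from queries to expected source IDs; `evaluate_retrieval` and the `retrieve` callable are hypothetical stand-ins for whatever pipeline is under test.

```python
from typing import Callable


def evaluate_retrieval(
    golden: dict[str, set[str]],
    retrieve: Callable[[str], list[str]],
    k: int = 5,
) -> dict[str, float]:
    """Average recall@k and MRR over a golden set of query -> expected source IDs."""
    recall_sum = 0.0
    rr_sum = 0.0
    for query, expected in golden.items():
        top_k = retrieve(query)[:k]
        # Recall@k: share of the expected sources that appear in the top k.
        recall_sum += len(expected & set(top_k)) / len(expected)
        # Reciprocal rank of the first expected source; 0 if none is in the top k.
        rr_sum += next(
            (1.0 / rank for rank, doc in enumerate(top_k, start=1) if doc in expected),
            0.0,
        )
    n = len(golden)
    return {"recall@k": recall_sum / n, "mrr": rr_sum / n}
```

Re-running the same golden set on every index, chunking, or model change, and tracking these numbers over time, is one simple way to catch retrieval drift.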
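Diversity selection is often done with maximal marginal relevance (MMR), which greedily picks the passage that best balances relevance to the query against similarity to passages already chosen. This is a pure-Python sketch assuming embeddings are plain lists of floats; `mmr` and `lam` are illustrative names. A `lam` near 1.0 favors relevance, and lower values favor diversity.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mmr(query: list[float], docs: list[list[float]], k: int, lam: float = 0.7) -> list[int]:
    """Greedy maximal marginal relevance; returns indices into docs in pick order."""
    selected: list[int] = []
    remaining = list(range(len(docs)))

    def marginal(i: int) -> float:
        relevance = cosine(query, docs[i])
        # Penalize similarity to the closest passage already selected.
        redundancy = max((cosine(docs[i], docs[j]) for j in selected), default=0.0)
        return lam * relevance - (1 - lam) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=marginal)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With `lam=1.0` this reduces to plain top-k by relevance, which makes it easy to compare the two behaviors on the same candidates.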
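Citation requirements can be checked mechanically before an answer is returned. This sketch assumes the model is prompted to cite supplied passages as `[1]`, `[2]`, and so on (a hypothetical convention), treats every sentence as a claim, and reports coverage, uncited sentences, and references to passages that were never supplied. It does not check groundedness, that is, whether a cited passage actually supports the sentence; that typically needs an entailment model or an LLM judge.

```python
import re

# Assumption: the model is instructed to cite supplied passages as [1], [2], ...
CITATION = re.compile(r"\[(\d+)\]")
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def citation_report(answer: str, num_passages: int) -> dict:
    """Treat each sentence as a claim and check that it cites a supplied passage."""
    sentences = [s for s in SENTENCE_END.split(answer.strip()) if s]
    uncited: list[str] = []
    invalid_refs: list[int] = []
    for sentence in sentences:
        refs = [int(n) for n in CITATION.findall(sentence)]
        valid = [n for n in refs if 1 <= n <= num_passages]
        # References to passages that were never supplied are a red flag.
        invalid_refs.extend(n for n in refs if not 1 <= n <= num_passages)
        if not valid:
            uncited.append(sentence)
    coverage = 1 - len(uncited) / len(sentences) if sentences else 0.0
    return {"coverage": coverage, "uncited": uncited, "invalid_refs": invalid_refs}
```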

This page is intended to be actionable for engineering teams. For platform-specific details, cross-reference /platform/agents, /platform/orchestration, and /platform/knowledge.