Ingestion and normalization
Connectors pull content from common repositories (drives, wikis, ticketing systems, git, internal databases). Ingestion supports incremental sync and webhooks so changes propagate quickly.
Documents are cleaned, structured, and enriched with metadata (owner, timestamps, tags, tenant, confidentiality level). Where required, sensitive fields can be redacted before indexing.
Chunking strategy
Chunking is performed along semantic and structural boundaries (headings, sections, tables) rather than fixed token sizes alone. Controlled overlap preserves context without inflating retrieval noise.
Each chunk retains stable references to the source document and location, enabling precise citations and audit trails.
Hybrid retrieval and reranking
Knowledge retrieval combines lexical search (for exact terms, identifiers, and numbers) with vector similarity (for semantic matching). Filters enforce tenant separation and permissioning before reranking occurs.
A reranker then selects the most useful passages for the query, improving precision and reducing the risk of irrelevant context contaminating the answer.
Citations and proof
Agents receive context bundles that include the retrieved passages and their source identifiers. Outputs can include citations with document name, section anchor, and last-updated timestamps.
When knowledge coverage is incomplete or contradictory, the agent can surface uncertainty explicitly and propose next steps (e.g., request missing documents or escalate to a human reviewer).
Freshness and change control
Freshness is managed through incremental indexing, cache invalidation, and optional recency boosting. For high-stakes workflows, the system can perform live fetches from authoritative sources (subject to policy) before finalizing an answer.
Versioning of sources supports audits and reproducibility, especially when regulatory or contractual decisions depend on specific document snapshots.
Security and prompt-injection resilience
Sources are treated as untrusted by default. The system isolates instructions found in documents, prevents them from modifying system policies, and enforces that only explicit user intent can trigger side-effectful tool calls.
RBAC/ABAC is enforced at retrieval time, ensuring agents only see what the requesting principal is allowed to access.
Quality measurement
Knowledge quality is measured using golden queries and retrieval metrics (precision, recall@k), combined with answer-level metrics such as citation coverage and groundedness.
These signals feed continuous improvement of chunking, metadata, and reranking policies.