A model layer that supports routing, embeddings, specialization, safety configurations, and cost/latency controls for production workloads.

Model layer

Overview

The model layer abstracts LLM selection and configuration so applications can evolve without rewriting business logic. It covers routing between generation models, embedding models, and specialized components such as rerankers.

Model choices are governed by policies: which models are allowed for which workloads, what data can be sent, and what latency/cost budgets apply.
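As a rough illustration of the three policy dimensions above (allowed models, permitted data, and latency/cost budgets), a policy check might look like the following minimal sketch. The names `WorkloadPolicy`, `POLICIES`, and `check_request`, along with the workload, model, and data-class labels and the budget values, are all hypothetical and not part of any specific product.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class WorkloadPolicy:
    """Hypothetical policy record for one workload."""
    allowed_models: FrozenSet[str]
    allowed_data_classes: FrozenSet[str]
    max_latency_ms: int
    max_cost_usd: float


# Illustrative policy table; every name and number here is an assumption.
POLICIES: Dict[str, WorkloadPolicy] = {
    "support_chat": WorkloadPolicy(
        allowed_models=frozenset({"small-chat", "large-chat"}),
        allowed_data_classes=frozenset({"public", "internal"}),
        max_latency_ms=2000,
        max_cost_usd=0.01,
    ),
}


def check_request(workload: str, model: str, data_class: str,
                  est_latency_ms: int, est_cost_usd: float) -> List[str]:
    """Return a list of policy violations; an empty list means the call is allowed."""
    policy = POLICIES.get(workload)
    if policy is None:
        # Deny by default when a workload has no policy.
        return [f"no policy defined for workload {workload!r}"]
    violations = []
    if model not in policy.allowed_models:
        violations.append(f"model {model!r} is not allowed for {workload!r}")
    if data_class not in policy.allowed_data_classes:
        violations.append(f"data class {data_class!r} may not be sent")
    if est_latency_ms > policy.max_latency_ms:
        violations.append("estimated latency exceeds budget")
    if est_cost_usd > policy.max_cost_usd:
        violations.append("estimated cost exceeds budget")
    return violations
```

Returning every violation at once, rather than failing on the first, is one possible design choice; it tends to make policy rejections easier to debug.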

Routing and specialization

Different tasks benefit from different models. The model layer can route classification, extraction, summarization, and free-form generation each to a suitable engine, while keeping observability and evaluation consistent across all routes.
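One way to picture task-based routing is a small dispatch table that maps task types to engines and records every call in one place. The sketch below assumes a hypothetical `ModelRouter` class and stub engines; real engines would call actual model endpoints, and the call log stands in for whatever observability system is in use.

```python
from typing import Callable, Dict, List, Tuple

# An engine is modeled as a plain function from input text to output text.
Engine = Callable[[str], str]


class ModelRouter:
    """Hypothetical router: maps task types to named engines, with a default."""

    def __init__(self, default_engine: str) -> None:
        self._engines: Dict[str, Engine] = {}
        self._routes: Dict[str, str] = {}
        self._default = default_engine
        # Single log shared by all routes, so every call is observed the same way.
        self.call_log: List[Tuple[str, str]] = []

    def add_engine(self, name: str, engine: Engine) -> None:
        self._engines[name] = engine

    def add_route(self, task: str, engine_name: str) -> None:
        if engine_name not in self._engines:
            raise KeyError(f"unknown engine {engine_name!r}")
        self._routes[task] = engine_name

    def run(self, task: str, text: str) -> str:
        # Unmapped tasks fall back to the default engine.
        engine_name = self._routes.get(task, self._default)
        self.call_log.append((task, engine_name))
        return self._engines[engine_name](text)


# Stub engines for illustration only; they do not call any real model.
router = ModelRouter(default_engine="general")
router.add_engine("general", lambda text: f"[general] {text}")
router.add_engine("classifier", lambda text: "positive" if "good" in text else "negative")
router.add_route("classification", "classifier")

label = router.run("classification", "good service")
reply = router.run("generation", "write a greeting")
```

Because callers name a task rather than a model, an engine can be swapped behind a route without touching application code.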

Cost and performance controls

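Streaming, caching, batching, and token budgeting are standard controls. Caching avoids repeated calls for identical requests, batching amortizes per-request overhead, and token budgeting caps prompt and output size; together these reduce cost. Streaming returns output as it is generated, which helps meet latency expectations in interactive channels.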
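Two of these controls, caching and token budgeting, can be sketched in a few lines. The example below is a simplification under stated assumptions: `BudgetedCachingClient` is a hypothetical wrapper, tokens are approximated by whitespace-separated words rather than a real tokenizer, and the cache is an unbounded in-memory dictionary.

```python
import hashlib
from typing import Callable, Dict


class BudgetedCachingClient:
    """Hypothetical wrapper that trims prompts to a budget and caches responses."""

    def __init__(self, call_model: Callable[[str], str], max_prompt_tokens: int) -> None:
        if max_prompt_tokens <= 0:
            raise ValueError("max_prompt_tokens must be positive")
        self._call_model = call_model
        self._max_prompt_tokens = max_prompt_tokens
        self._cache: Dict[str, str] = {}
        self.cache_hits = 0

    def complete(self, prompt: str) -> str:
        # Token budgeting: whitespace words stand in for real tokens (an assumption).
        words = prompt.split()
        if len(words) > self._max_prompt_tokens:
            # Keep the most recent words and drop the oldest.
            prompt = " ".join(words[-self._max_prompt_tokens:])

        # Caching: key on a hash of the (possibly trimmed) prompt.
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key in self._cache:
            self.cache_hits += 1
            return self._cache[key]

        result = self._call_model(prompt)
        self._cache[key] = result
        return result


# Stub model for illustration; it records calls and upper-cases the prompt.
model_calls = []


def fake_model(prompt: str) -> str:
    model_calls.append(prompt)
    return prompt.upper()


client = BudgetedCachingClient(fake_model, max_prompt_tokens=3)
first = client.complete("a b c d e")   # trimmed to "c d e" before the model call
second = client.complete("a b c d e")  # served from the cache
```

In a production setting the cache would likely need eviction and expiry, and exact-match caching only helps when identical prompts recur.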