Security engineering for AI agents: prompt injection, data boundaries, least privilege, secrets, auditing, and retention.

Security engineering

Overview

Security in agent systems includes access control for retrieval and tools, defense against prompt injection, strict handling of secrets, and auditable operations. The goal is to prevent data leakage and unsafe side effects by default.

Key topics

  • Threat modeling for agent workflows and tool surfaces.
  • Prompt injection defenses for documents, web content, and user input.
  • Least privilege design for tools and knowledge access.
  • Secret management patterns (never expose secrets to the model).
  • Retention policies and logging redaction.

Common pitfalls

  • Allowing documents to instruct the system (policy bypass).
  • Over-privileged tools available in general workflows.
  • Logging sensitive payloads without redaction.
  • No audit trail for writes or data access.

Recommended practices

  • Implement ABAC/RBAC at retrieval and tool execution time.
  • Use allowlists and explicit approvals for risky actions.
  • Maintain immutable audit logs and run traces.
  • Regularly run red-team evals and fix findings systematically.

This page is intended to be actionable for engineering teams. For platform-specific details, cross-reference /platform/agents, /platform/orchestration, and /platform/knowledge.