Guardrails are defensive mechanisms that keep agent behavior within acceptable limits. They act as safety nets that operate independently of the agent's own decision-making, so they still apply when the agent reasons incorrectly or is manipulated.
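One way to picture this independence is a wrapper that runs checks before and after the agent without the agent's code knowing about them. The sketch below assumes the agent is a plain function from prompt to response. The names `guarded`, `GuardrailViolation`, `toy_agent`, and the two check functions are hypothetical and chosen for illustration only.

```python
from typing import Callable

class GuardrailViolation(Exception):
    """Raised when a guardrail blocks a request or a response."""

Check = Callable[[str], bool]  # returns True if the text is acceptable

def guarded(input_checks: list[Check], output_checks: list[Check]):
    """Wrap an agent function with checks it neither sees nor controls (hypothetical helper)."""
    def decorator(agent: Callable[[str], str]) -> Callable[[str], str]:
        def wrapper(prompt: str) -> str:
            # Checks run outside the agent, so its reasoning cannot skip them.
            for check in input_checks:
                if not check(prompt):
                    raise GuardrailViolation(f"input rejected by {check.__name__}")
            response = agent(prompt)
            for check in output_checks:
                if not check(response):
                    raise GuardrailViolation(f"output rejected by {check.__name__}")
            return response
        return wrapper
    return decorator

def not_empty(text: str) -> bool:
    return bool(text.strip())

def under_500_chars(text: str) -> bool:
    return len(text) <= 500

@guarded(input_checks=[not_empty], output_checks=[under_500_chars])
def toy_agent(prompt: str) -> str:
    # Stand-in for a real agent; it contains no guardrail logic of its own.
    return f"Echo: {prompt}"
```

Keeping the checks in a separate layer means they can be audited, tested, and changed without touching the agent itself.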
Types
- Input guardrails: Filter or reject harmful prompts
- Output guardrails: Block or modify unsafe responses
- Action guardrails: Prevent unauthorized tool use
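The three types above can be sketched as small, separate functions. This is a minimal illustration under stated assumptions: the injection phrase, the email-redaction rule, the `ToolCall` structure, and the tool names `search` and `read_file` are all hypothetical. Note that the output guardrail here modifies the response rather than blocking it, matching the "block or modify" wording.

```python
import re
from dataclasses import dataclass, field

# Hypothetical patterns for illustration; real deployments need far broader coverage.
INJECTION_PATTERN = re.compile(r"ignore (all )?previous instructions", re.IGNORECASE)
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def input_guardrail(prompt: str) -> bool:
    """Input: reject prompts containing a known prompt-injection phrase."""
    return INJECTION_PATTERN.search(prompt) is None

def output_guardrail(response: str) -> str:
    """Output: modify the response by redacting email addresses."""
    return EMAIL_PATTERN.sub("[REDACTED]", response)

@dataclass
class ToolCall:
    name: str
    args: dict = field(default_factory=dict)

# Hypothetical allowlist: each permitted tool and the argument names it accepts.
ALLOWED_TOOLS = {
    "search": {"query"},
    "read_file": {"path"},
}

def action_guardrail(call: ToolCall) -> bool:
    """Action: permit a tool call only if the tool is allowlisted and its arguments are expected."""
    allowed_args = ALLOWED_TOOLS.get(call.name)
    if allowed_args is None:
        return False  # unknown tool: deny by default
    return set(call.args) <= allowed_args
```

The action guardrail denies by default: any tool not explicitly listed is treated as unauthorized, which is generally safer than trying to enumerate forbidden tools.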
Implementation
Guardrails can be rule-based (e.g., filters and blocklists) or model-based (e.g., classifier models that detect harmful content).
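The contrast between the two approaches can be sketched side by side. Both functions below are illustrative assumptions: the blocklist terms are hypothetical, and `toy_classifier` is only a stand-in for a trained model, which in practice would return a learned probability of harm.

```python
import re
from typing import Callable

# Rule-based: deterministic and transparent, but easy to evade by rephrasing.
BLOCKLIST = ["drop table", "rm -rf"]  # hypothetical terms for illustration
BLOCK_RE = re.compile("|".join(re.escape(term) for term in BLOCKLIST), re.IGNORECASE)

def rule_based_guardrail(text: str) -> bool:
    """Return True if the text contains no blocklisted term."""
    return BLOCK_RE.search(text) is None

# Model-based: a classifier scores the text, and a threshold turns the score into a decision.
Classifier = Callable[[str], float]  # estimated probability that the text is harmful

def model_based_guardrail(text: str, classify: Classifier, threshold: float = 0.5) -> bool:
    """Return True if the classifier's harm score is below the threshold."""
    return classify(text) < threshold

def toy_classifier(text: str) -> float:
    # Stand-in for a trained model: the fraction of words that look suspicious.
    suspicious = {"exploit", "malware", "bypass"}
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(word in suspicious for word in words) / len(words)
```

The trade-off is roughly this: rule-based checks are cheap and predictable but miss paraphrases, while model-based checks generalize better but are probabilistic, so the threshold must balance false positives against false negatives. The two are often combined.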