## Overview

The Guardrails pattern adds safety layers around agent execution. Input guardrails validate and sanitize incoming requests; output guardrails check agent responses before they're delivered or executed. This creates a protective envelope around agent behavior.
## Architecture

```
User Input → [Input Guardrails] → Agent → [Output Guardrails] → Response
                     │                            │
                     ▼                            ▼
                Block/Modify                 Block/Modify
```
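In code, the envelope amounts to wrapping the agent call between the two guardrail stages. A minimal sketch, assuming each guardrail returns a small result object with `blocked`, `value`, and `message` fields and that the agent exposes a `run` method:

```python
def guarded_run(agent, user_input, input_guardrails, output_guardrails):
    # Input side: each guardrail may block the request or rewrite it
    for guardrail in input_guardrails:
        result = guardrail(user_input)
        if result.blocked:
            return result.message      # short-circuit before the agent ever runs
        user_input = result.value      # carry forward any sanitized input

    response = agent.run(user_input)

    # Output side: same contract, applied to the agent's response
    for guardrail in output_guardrails:
        result = guardrail(response)
        if result.blocked:
            return result.message
        response = result.value

    return response
```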
## Input Guardrails

### Content Filtering

```python
def input_guardrail(user_input):
    # Check for prompt injection patterns
    if contains_injection_pattern(user_input):
        return blocked("Potential prompt injection detected")

    # Check for prohibited topics
    if matches_prohibited_topic(user_input):
        return blocked("Topic not allowed")

    # Check for PII: sanitize rather than block
    if contains_pii(user_input):
        return sanitize_pii(user_input)

    return allowed(user_input)
```
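The helpers above (`contains_injection_pattern`, `blocked`, `allowed`, and the rest) are deliberately abstract. As one illustration, the injection check could start as a regex heuristic like the hypothetical sketch below; real deployments typically pair patterns like these with a classifier:

```python
import re

# Hypothetical heuristic: phrases commonly used to override system prompts.
INJECTION_PATTERNS = [
    r"ignore (all|any|previous) instructions",
    r"disregard (the|your) (system|previous) prompt",
    r"you are now (in )?developer mode",
    r"reveal (your|the) (system|hidden) prompt",
]

def contains_injection_pattern(user_input: str) -> bool:
    text = user_input.lower()
    return any(re.search(pattern, text) for pattern in INJECTION_PATTERNS)
```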
### Rate Limiting

Prevent abuse by capping how many requests a user or client can send within a given window.
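A minimal sketch of a per-user rate-limit guardrail, assuming a single-process in-memory token bucket (a shared store such as Redis would be needed across multiple workers):

```python
import time
from collections import defaultdict

class TokenBucket:
    """Allows `capacity` requests, refilled at `refill_rate` tokens per second."""
    def __init__(self, capacity=10, refill_rate=0.5):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = capacity
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

buckets = defaultdict(TokenBucket)

def rate_limit_guardrail(user_id, user_input):
    if not buckets[user_id].allow():
        return blocked("Rate limit exceeded, try again later")
    return allowed(user_input)
```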
### Authentication

Verify the caller's identity and check their permissions before the request reaches the agent.
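A sketch of an authorization guardrail placed in front of the agent; the `verify_token` helper, the `request` fields, and the role-to-capability map are all hypothetical:

```python
# Hypothetical role model: which agent capabilities each role may invoke.
ROLE_CAPABILITIES = {
    "viewer": {"ask_question"},
    "analyst": {"ask_question", "run_report"},
    "admin": {"ask_question", "run_report", "modify_records"},
}

def auth_guardrail(request):
    user = verify_token(request.auth_token)  # assumed to return None on failure
    if user is None:
        return blocked("Authentication failed")

    allowed_caps = ROLE_CAPABILITIES.get(user.role, set())
    if request.capability not in allowed_caps:
        return blocked(f"Role '{user.role}' may not use '{request.capability}'")

    return allowed(request)
```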
## Output Guardrails

### Response Validation

```python
def output_guardrail(agent_response):
    # Check for hallucinated facts
    if confidence_too_low(agent_response):
        return add_uncertainty_disclaimer(agent_response)

    # Check for harmful content
    if contains_harmful_content(agent_response):
        return blocked("Response contains harmful content")

    # Check for data leakage
    if contains_sensitive_data(agent_response):
        return redact_sensitive_data(agent_response)

    return allowed(agent_response)
```
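The `redact_sensitive_data` helper can be as simple as pattern substitution. A rough sketch, assuming regex detection of a few common PII formats (production systems usually rely on a dedicated PII detection service instead):

```python
import re

# Illustrative patterns only; not an exhaustive PII taxonomy.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def redact_sensitive_data(agent_response: str) -> str:
    redacted = agent_response
    for label, pattern in PII_PATTERNS.items():
        redacted = pattern.sub(f"[REDACTED {label.upper()}]", redacted)
    return redacted
```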
### Action Validation

For tool-using agents, guardrails can also vet each proposed tool call before it executes:

```python
def action_guardrail(action):
    # Check against allowlist
    if action.tool not in ALLOWED_TOOLS:
        return blocked("Tool not permitted")

    # Check parameters
    if not validate_parameters(action):
        return blocked("Invalid parameters")

    # Check for dangerous operations
    if is_destructive_action(action):
        return require_human_approval(action)

    return allowed(action)
```
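The `validate_parameters` helper is left abstract above. One plausible implementation checks each tool call against a declared parameter spec; the tool names and the `action.params` attribute below are hypothetical:

```python
# Hypothetical per-tool parameter specs: required argument names and types.
TOOL_PARAM_SPECS = {
    "search_docs": {"query": str, "max_results": int},
    "send_email": {"to": str, "subject": str, "body": str},
}

def validate_parameters(action) -> bool:
    spec = TOOL_PARAM_SPECS.get(action.tool)
    if spec is None:
        return False
    for name, expected_type in spec.items():
        if name not in action.params:
            return False
        if not isinstance(action.params[name], expected_type):
            return False
    # Reject unexpected extra arguments as well
    return set(action.params) <= set(spec)
```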
## OpenAI Agents SDK Guardrails

The OpenAI Agents SDK includes guardrails as a core primitive:

```python
from agents import Agent, InputGuardrail, OutputGuardrail

agent = Agent(
    name="assistant",
    input_guardrails=[
        InputGuardrail(check_injection),
        InputGuardrail(check_topic_policy),
    ],
    output_guardrails=[
        OutputGuardrail(check_harmful_content),
        OutputGuardrail(check_hallucination),
    ],
)
```
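What those guardrail callables look like depends on the SDK version. As a rough sketch: recent versions expect a guardrail function that receives the run context, the agent, and the input, and returns a `GuardrailFunctionOutput` whose `tripwire_triggered` flag halts the run. Treat the exact signature as an assumption and check the current SDK docs:

```python
from agents import GuardrailFunctionOutput

def check_injection(context, agent, user_input):
    # Reuses the regex heuristic sketched earlier; a tripped wire stops the run
    flagged = contains_injection_pattern(str(user_input))
    return GuardrailFunctionOutput(
        output_info={"injection_detected": flagged},
        tripwire_triggered=flagged,
    )
```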
## Guardrail Categories

### Safety Guardrails

- Harmful content detection
- Violence/self-harm prevention
- Illegal activity blocking

### Compliance Guardrails

- PII/PHI protection (HIPAA, GDPR)
- Financial advice disclaimers
- Industry-specific regulations

### Quality Guardrails

- Hallucination detection
- Factual accuracy checks
- Consistency validation

### Security Guardrails

- Prompt injection detection
- Data exfiltration prevention
- Access control enforcement
## Implementation Tips

- Run guardrails in parallel where possible (see the sketch after this list)
- Log all guardrail activations for analysis
- Tune sensitivity to balance safety vs. usability
- Update guardrails based on new attack patterns
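As a sketch of the first two tips, independent guardrails can run concurrently with `asyncio.gather`, logging every activation; the `blocked` and `reason` attributes on the result object are assumptions, matching the sketches above:

```python
import asyncio
import logging

logger = logging.getLogger("guardrails")

async def run_guardrails(guardrails, payload):
    """Run independent guardrails concurrently and log every activation."""
    results = await asyncio.gather(*(g(payload) for g in guardrails))
    for guardrail, result in zip(guardrails, results):
        if result.blocked:
            # Record activations so sensitivity can be tuned from real traffic
            logger.warning("guardrail=%s blocked: %s",
                           guardrail.__name__, result.reason)
            return result
    return payload
```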