Agent Evaluation Patterns
Battle-tested approaches for building, evaluating, and governing AI agent systems. Each pattern captures hard-won knowledge from production deployments.
34 Patterns
5 Categories
3 Quick Wins
Browse by Category
Orchestration: Coordinate multi-agent workflows (11 patterns)
Evaluation: Assess agent outputs & quality (5 patterns)
Safety: Guardrails and protection (3 patterns)
Coordination: Agent collaboration patterns (10 patterns)
Discovery: Capability & service discovery (5 patterns)
Pattern Stacks
Curated combinations that work well together
Safety Stack: Essential guardrails for production agents
Evaluation Stack: Comprehensive quality assessment
Getting Started: Foundational patterns for new projects
Learning Paths
Pattern Comparison Matrix
| Pattern | Description | Safety | Accuracy | Cost | Latency |
|---|---|---|---|---|---|
| Red Teaming Pattern | Discovering vulnerabilities, edge cases, and failure modes before production deployment | 5 | 3 | 2 | 1 |
| Capability Attestation Pattern | Verifying agent capabilities with proofs rather than trusting self-reported claims | 5 | 5 | 2 | 2 |
| Agent Service Mesh Pattern | Infrastructure-level agent discovery, routing, and observability | 5 | 4 | 2 | 3 |
| Semantic Capability Matching Pattern | Finding agents by natural language description rather than exact capability tags | 3 | 4 | 3 | 3 |
| Agent Registry Pattern | Centralized or federated discovery of available agents and their capabilities | 3 | 4 | 4 | 3 |
| Emergence-Aware Monitoring Pattern | Detecting and adapting to emergent behaviors in multi-agent systems | 5 | 4 | 3 | 4 |
| Byzantine-Resilient Consensus Pattern | Fault-tolerant agreement in adversarial or unreliable environments | 5 | 5 | 1 | 1 |
| Model Context Protocol (MCP) Pattern | Standardized tool and context exchange between agents | 4 | 4 | 3 | 3 |
| Dynamic Task Routing Pattern | Intelligent task distribution based on real-time agent capabilities | 3 | 4 | 4 | 3 |
| Sub-Agent Delegation Pattern | Complex tasks requiring context isolation and recursive decomposition | 3 | 4 | 3 | 3 |
| Market-Based Coordination Pattern | Decentralized task allocation using auction and trading mechanisms | 3 | 4 | 4 | 3 |
| Consensus-Based Decision Pattern | Multi-agent collective decision-making with deliberation or voting | 4 | 5 | 2 | 2 |
| A2A Protocol Pattern | Cross-vendor agent interoperability and standardized communication | 4 | 4 | 3 | 3 |
| Guardrails Pattern | Production agents requiring content safety and policy compliance | 5 | 3 | 3 | 3 |
| Reflection Pattern | Improving output quality through iterative self-critique | 3 | 5 | 2 | 2 |
| ReAct Pattern (Reason + Act) | Adaptive, tool-using agents that need to respond to dynamic situations | 3 | 4 | 3 | 2 |
| Defense in Depth Pattern | Production agent systems handling untrusted inputs with tool access | 5 | 4 | 2 | 2 |
| Human-in-the-Loop Pattern | High-stakes decisions requiring human oversight and approval | 5 | 5 | 2 | 1 |
| LLM-as-Judge Pattern | Scalable quality assessment of agent outputs without human reviewers | 3 | 4 | 4 | 4 |
| Blackboard Pattern | Asynchronous multi-agent collaboration on complex problems | 3 | 4 | 3 | 3 |
| Supervisor Pattern | Multi-agent workflows requiring clear coordination and audit trails | 4 | 4 | 3 | 2 |
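One way to use the matrix is to load a few rows into code and shortlist or rank patterns against your own priorities. The sketch below is illustrative only: the `PatternScore` type and the `rank` and `shortlist` helpers are hypothetical names, not part of any published API, and it assumes a higher score is more favorable on every axis (including Cost and Latency), which the page does not state explicitly. Only five rows are copied in to keep it short.

```python
from dataclasses import dataclass

AXES = ("safety", "accuracy", "cost", "latency")


@dataclass(frozen=True)
class PatternScore:
    """One row of the comparison matrix (hypothetical container type)."""
    name: str
    safety: int
    accuracy: int
    cost: int
    latency: int


# A few rows copied from the matrix; extend with the rest as needed.
MATRIX = [
    PatternScore("Red Teaming", 5, 3, 2, 1),
    PatternScore("Guardrails", 5, 3, 3, 3),
    PatternScore("Human-in-the-Loop", 5, 5, 2, 1),
    PatternScore("LLM-as-Judge", 3, 4, 4, 4),
    PatternScore("Reflection", 3, 5, 2, 2),
]


def rank(patterns, weights):
    """Sort patterns by weighted score, highest first.

    Assumption: higher is better on every axis. Axes missing from
    `weights` contribute nothing to the total.
    """
    def total(p):
        return sum(weights.get(axis, 0) * getattr(p, axis) for axis in AXES)
    return sorted(patterns, key=total, reverse=True)


def shortlist(patterns, **minimums):
    """Keep only patterns that meet every per-axis minimum score."""
    return [
        p for p in patterns
        if all(getattr(p, axis) >= floor for axis, floor in minimums.items())
    ]


if __name__ == "__main__":
    # Weight safety twice as heavily as accuracy; ignore cost and latency.
    for p in rank(MATRIX, {"safety": 2, "accuracy": 1}):
        print(p.name)
    # Patterns scoring 5 on both safety and accuracy.
    print([p.name for p in shortlist(MATRIX, safety=5, accuracy=5)])
```

A weighted sum is only one possible selection rule. Hard minimums via `shortlist` may suit cases where a low score on one axis should never be offset by high scores elsewhere.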