Agent Evaluation Patterns

Battle-tested approaches for building, evaluating, and governing AI agent systems. Each pattern captures hard-won knowledge from production deployments.

34 Patterns
5 Categories
3 Quick Wins

Browse by Category

Orchestration
Coordinate multi-agent workflows (11 patterns)
Evaluation
Assess agent outputs & quality (5 patterns)
Safety
Guardrails and protection (3 patterns)
Coordination
Agent collaboration patterns (10 patterns)
Discovery
Capability & service discovery (5 patterns)

Pattern Stacks

Curated combinations that work well together
Safety Stack
Essential guardrails for production agents
Evaluation Stack
Comprehensive quality assessment
Getting Started
Foundational patterns for new projects

Pattern Comparison Matrix

Each pattern is scored from 1 to 5 on Safety, Accuracy, Cost, and Latency.

Red Teaming Pattern
Discovering vulnerabilities, edge cases, and failure modes before production deployment
Safety 5, Accuracy 3, Cost 2, Latency 1

Capability Attestation Pattern
Verifying agent capabilities with proofs rather than trusting self-reported claims
Safety 5, Accuracy 5, Cost 2, Latency 2

Agent Service Mesh Pattern
Infrastructure-level agent discovery, routing, and observability
Safety 5, Accuracy 4, Cost 2, Latency 3

Semantic Capability Matching Pattern
Finding agents by natural language description rather than exact capability tags
Safety 3, Accuracy 4, Cost 3, Latency 3

Agent Registry Pattern
Centralized or federated discovery of available agents and their capabilities
Safety 3, Accuracy 4, Cost 4, Latency 3

Emergence-Aware Monitoring Pattern
Detecting and adapting to emergent behaviors in multi-agent systems
Safety 5, Accuracy 4, Cost 3, Latency 4

Byzantine-Resilient Consensus Pattern
Fault-tolerant agreement in adversarial or unreliable environments
Safety 5, Accuracy 5, Cost 1, Latency 1

Model Context Protocol (MCP) Pattern
Standardized tool and context exchange between agents
Safety 4, Accuracy 4, Cost 3, Latency 3

Dynamic Task Routing Pattern
Intelligent task distribution based on real-time agent capabilities
Safety 3, Accuracy 4, Cost 4, Latency 3

Sub-Agent Delegation Pattern
Complex tasks requiring context isolation and recursive decomposition
Safety 3, Accuracy 4, Cost 3, Latency 3

Market-Based Coordination Pattern
Decentralized task allocation using auction and trading mechanisms
Safety 3, Accuracy 4, Cost 4, Latency 3

Consensus-Based Decision Pattern
Multi-agent collective decision-making with deliberation or voting
Safety 4, Accuracy 5, Cost 2, Latency 2

A2A Protocol Pattern
Cross-vendor agent interoperability and standardized communication
Safety 4, Accuracy 4, Cost 3, Latency 3

Guardrails Pattern
Production agents requiring content safety and policy compliance
Safety 5, Accuracy 3, Cost 3, Latency 3

Reflection Pattern
Improving output quality through iterative self-critique
Safety 3, Accuracy 5, Cost 2, Latency 2

ReAct Pattern (Reason + Act)
Adaptive, tool-using agents that need to respond to dynamic situations
Safety 3, Accuracy 4, Cost 3, Latency 2

Defense in Depth Pattern
Production agent systems handling untrusted inputs with tool access
Safety 5, Accuracy 4, Cost 2, Latency 2

Human-in-the-Loop Pattern
High-stakes decisions requiring human oversight and approval
Safety 5, Accuracy 5, Cost 2, Latency 1

LLM-as-Judge Pattern
Scalable quality assessment of agent outputs without human reviewers
Safety 3, Accuracy 4, Cost 4, Latency 4

Blackboard Pattern
Asynchronous multi-agent collaboration on complex problems
Safety 3, Accuracy 4, Cost 3, Latency 3

Supervisor Pattern
Multi-agent workflows requiring clear coordination and audit trails
Safety 4, Accuracy 4, Cost 3, Latency 2
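
The comparison matrix above can drive a first-pass shortlist. A minimal sketch, assuming higher is better on every axis (so a Cost or Latency score of 5 means cheap or fast; the scores appear to follow this, since Human-in-the-Loop scores 1 on Latency) and that a weighted sum is an acceptable way to combine axes. The `PatternScore` type, the weights, and the `rank` helper are hypothetical; only the scores are copied from the matrix.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class PatternScore:
    name: str
    safety: int
    accuracy: int
    cost: int      # assumed: 5 = cheapest
    latency: int   # assumed: 5 = fastest

# A subset of rows copied from the comparison matrix.
MATRIX = [
    PatternScore("Red Teaming", 5, 3, 2, 1),
    PatternScore("Guardrails", 5, 3, 3, 3),
    PatternScore("Reflection", 3, 5, 2, 2),
    PatternScore("Human-in-the-Loop", 5, 5, 2, 1),
    PatternScore("LLM-as-Judge", 3, 4, 4, 4),
    PatternScore("Supervisor", 4, 4, 3, 2),
]

def rank(patterns: List[PatternScore], weights: Dict[str, float]) -> List[PatternScore]:
    """Sort patterns by the weighted sum of their scores, best first.

    weights maps axis names ("safety", "accuracy", "cost", "latency") to weights;
    axes left out count as weight 0. Ties keep input order because sorted() is stable.
    """
    def total(p: PatternScore) -> float:
        return sum(w * getattr(p, axis) for axis, w in weights.items())
    return sorted(patterns, key=total, reverse=True)

if __name__ == "__main__":
    # A latency-sensitive evaluation pipeline that still cares about accuracy.
    for p in rank(MATRIX, {"accuracy": 1.0, "latency": 2.0}):
        print(p.name)
```

With accuracy weighted 1 and latency weighted 2, LLM-as-Judge comes out on top of this subset and Red Teaming last; a safety-only weighting puts the patterns scoring 5 on Safety first.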
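
LLM-as-Judge replaces the human reviewer with a model prompted with a grading rubric. The sketch shows the plumbing around that call, assuming a hypothetical `Judge` callable that sends a prompt to some LLM and returns its text. The rubric wording, the 1 to 5 scale, the `SCORE:` line format, and taking the median of several samples are illustrative choices, not a prescribed method; sampling more than once and taking the median damps the effect of a single erratic grade.

```python
import re
import statistics
from typing import Callable, Optional

# Hypothetical judge interface: prompt in, raw model text out. Wire this to your LLM client.
Judge = Callable[[str], str]

RUBRIC = (
    "You are grading an AI agent. Rate the RESPONSE to the TASK for correctness "
    "and completeness on a 1-5 scale. End your reply with a line 'SCORE: <n>'."
)

def build_prompt(task: str, response: str) -> str:
    return f"{RUBRIC}\n\nTASK:\n{task}\n\nRESPONSE:\n{response}"

def parse_score(reply: str) -> Optional[int]:
    """Return the last 'SCORE: n' (n in 1..5) in the reply, or None if absent."""
    matches = re.findall(r"SCORE:\s*([1-5])\b", reply)
    return int(matches[-1]) if matches else None

def judge_output(judge: Judge, task: str, response: str, samples: int = 3) -> Optional[float]:
    """Ask the judge `samples` times and return the median of the parseable scores."""
    prompt = build_prompt(task, response)
    scores = []
    for _ in range(samples):
        score = parse_score(judge(prompt))
        if score is not None:  # drop replies that ignore the output format
            scores.append(score)
    return float(statistics.median(scores)) if scores else None

if __name__ == "__main__":
    replies = iter(["Correct and complete.\nSCORE: 4", "SCORE: 5", "I cannot grade this."])
    print(judge_output(lambda prompt: next(replies), "Add 2 and 2.", "4"))  # prints 4.5
```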
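
The Reflection pattern's iterative self-critique reduces to a bounded draft, critique, revise loop. A minimal sketch, assuming hypothetical `generate` and `critique` callables (in practice LLM calls, often to the same model) and a critic that returns `None` when it accepts a draft; the toy functions exist only so the loop runs.

```python
from typing import Callable, Optional, Tuple

# Hypothetical interfaces for the generator and the critic.
Generate = Callable[[str, Optional[str], Optional[str]], str]  # (task, previous draft, feedback) -> draft
Critique = Callable[[str, str], Optional[str]]                 # (task, draft) -> feedback, or None if acceptable

def reflect(task: str, generate: Generate, critique: Critique,
            max_revisions: int = 3) -> Tuple[str, int]:
    """Draft, critique, revise until the critic accepts or the revision budget is spent.

    Returns (final draft, number of revisions made). The draft produced by the last
    allowed revision is returned without a further critique.
    """
    draft = generate(task, None, None)
    for revisions in range(max_revisions):
        feedback = critique(task, draft)
        if feedback is None:  # critic is satisfied
            return draft, revisions
        draft = generate(task, draft, feedback)
    return draft, max_revisions

# Toy stand-ins: the "model" fixes exactly the problem the critic names.
def toy_generate(task: str, draft: Optional[str], feedback: Optional[str]) -> str:
    if draft is None:
        return "hello world"
    if feedback == "start with a capital letter":
        return draft[0].upper() + draft[1:]
    if feedback == "end with a period":
        return draft + "."
    return draft

def toy_critique(task: str, draft: str) -> Optional[str]:
    if not draft[0].isupper():
        return "start with a capital letter"
    if not draft.endswith("."):
        return "end with a period"
    return None

if __name__ == "__main__":
    print(reflect("Write a greeting sentence.", toy_generate, toy_critique))  # ('Hello world.', 2)
```

The revision cap matters in practice: each round costs at least two model calls, which is consistent with Reflection's low Cost and Latency scores in the matrix.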
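
Byzantine-Resilient Consensus rests on quorum arithmetic: n agents can tolerate at most f Byzantine (arbitrarily faulty or malicious) agents when n ≥ 3f + 1, and BFT protocols such as PBFT accept a value once 2f + 1 agents agree, which guarantees at least f + 1 of them are honest. The sketch implements only that quorum-counting step, not a full protocol (PBFT adds ordering phases, view changes, and message authentication); the function names are hypothetical.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence

def max_faults(n: int) -> int:
    """Largest f such that n >= 3f + 1, i.e. how many Byzantine agents n agents can tolerate."""
    return (n - 1) // 3

def byzantine_decide(votes: Sequence[Hashable], n: int) -> Optional[Hashable]:
    """Accept a value only if at least 2f + 1 of the n agents reported it.

    `votes` holds one answer per agent that responded; silent or timed-out agents
    are simply absent. Returns None when no single value reaches the quorum.
    """
    if len(votes) > n:
        raise ValueError("more votes than agents")
    quorum = 2 * max_faults(n) + 1
    winners = [value for value, count in Counter(votes).items() if count >= quorum]
    return winners[0] if len(winners) == 1 else None

if __name__ == "__main__":
    # n = 4 tolerates f = 1 faulty agent, so 3 matching answers are required.
    print(byzantine_decide(["approve", "approve", "approve", "reject"], n=4))  # approve
    print(byzantine_decide(["approve", "approve", "reject"], n=4))             # None
```

The 3f + 1 agent count and the extra rounds of messages needed in a full protocol are why this pattern scores 1 on both Cost and Latency.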
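
Market-Based Coordination allocates tasks by auction: agents bid what a task would cost them, and the lowest bid wins. This sketch runs one sealed-bid auction per task, similar in spirit to the Contract Net Protocol, with a hypothetical `bid` callable per agent and a per-agent capacity limit. Paying the first (lowest) price and breaking ties by agent name are simplifying assumptions.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical bid interface: the agent's estimated cost for a task, or None to abstain.
BidFn = Callable[[str], Optional[float]]

def run_auctions(tasks: List[str], bidders: Dict[str, BidFn],
                 capacity: int = 1) -> Dict[str, Optional[str]]:
    """Auction tasks one at a time; each goes to the lowest bid from an agent with spare capacity."""
    load = {name: 0 for name in bidders}
    assignment: Dict[str, Optional[str]] = {}
    for task in tasks:
        bids: List[Tuple[float, str]] = []
        for name, bid in bidders.items():
            if load[name] >= capacity:
                continue  # busy agents sit this round out
            price = bid(task)
            if price is not None:
                bids.append((price, name))
        if not bids:
            assignment[task] = None  # no capable, available agent
            continue
        _, winner = min(bids)  # lowest price wins; ties go to the alphabetically first name
        assignment[task] = winner
        load[winner] += 1
    return assignment

if __name__ == "__main__":
    bidders = {
        "coder": lambda t: 1.0 if t.startswith("code:") else None,
        "generalist": lambda t: 3.0,
        "reviewer": lambda t: 2.0 if t.startswith("review:") else None,
    }
    print(run_auctions(["code:parser", "review:parser", "code:lexer"], bidders))
    # {'code:parser': 'coder', 'review:parser': 'reviewer', 'code:lexer': 'generalist'}
```

With capacity 1 the coder is busy by the time `code:lexer` is auctioned, so the generalist takes it at a higher price; trading cost for availability without a central planner is the point of the market.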
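
Semantic Capability Matching ranks agents by how close a natural-language request is to each agent's description, typically by cosine similarity between embedding vectors. The sketch keeps the ranking logic and swaps the embedding model for a bag-of-words stand-in so it runs without dependencies; `toy_embed`, the agent descriptions, and the `min_score` cutoff are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Tuple

Vector = Dict[str, float]
Embed = Callable[[str], Vector]

def toy_embed(text: str) -> Vector:
    """Stand-in for a real embedding model: bag-of-words counts."""
    return {word: float(n) for word, n in Counter(re.findall(r"[a-z]+", text.lower())).items()}

def cosine(a: Vector, b: Vector) -> float:
    dot = sum(weight * b.get(key, 0.0) for key, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def match_agents(query: str, descriptions: Dict[str, str], embed: Embed = toy_embed,
                 top_k: int = 3, min_score: float = 0.1) -> List[Tuple[str, float]]:
    """Rank agents by similarity between the request and each agent's free-text description."""
    q = embed(query)
    scored = [(cosine(q, embed(text)), name) for name, text in descriptions.items()]
    ranked = sorted((s for s in scored if s[0] >= min_score), key=lambda s: (-s[0], s[1]))
    return [(name, round(score, 3)) for score, name in ranked[:top_k]]

AGENTS = {
    "sql-agent": "translates questions into SQL queries over relational databases",
    "pdf-agent": "extracts tables and text from PDF documents",
    "web-agent": "searches the web and summarizes pages",
}

if __name__ == "__main__":
    print(match_agents("extract text from a PDF", AGENTS))
```

Word counts only overlap on identical tokens ("extract" does not match "extracts"), so the semantic part of this pattern comes from replacing `toy_embed` with a real embedding model.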
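
The Agent Registry and Dynamic Task Routing patterns combine naturally: agents register their capability tags and report load through heartbeats, and the router sends each task to the least-loaded live agent. A minimal in-memory sketch; the heartbeat TTL, the load metric, and the least-loaded policy are assumptions, and a federated registry would replicate this state rather than hold it in one process.

```python
import time
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

@dataclass
class AgentRecord:
    name: str
    capabilities: frozenset
    load: int = 0            # in-flight tasks, as last reported plus our own assignments
    last_seen: float = 0.0   # monotonic timestamp of the last heartbeat

class AgentRegistry:
    """In-memory registry; agents that miss heartbeats for ttl_seconds are skipped."""

    def __init__(self, ttl_seconds: float = 30.0) -> None:
        self.ttl = ttl_seconds
        self.agents: Dict[str, AgentRecord] = {}

    @staticmethod
    def _now(now: Optional[float]) -> float:
        return time.monotonic() if now is None else now

    def register(self, name: str, capabilities: Iterable[str], now: Optional[float] = None) -> None:
        self.agents[name] = AgentRecord(name, frozenset(capabilities), last_seen=self._now(now))

    def heartbeat(self, name: str, load: int, now: Optional[float] = None) -> None:
        record = self.agents[name]
        record.load = load
        record.last_seen = self._now(now)

    def route(self, capability: str, now: Optional[float] = None) -> Optional[str]:
        """Pick the least-loaded live agent that advertises `capability`."""
        t = self._now(now)
        live = [r for r in self.agents.values()
                if capability in r.capabilities and t - r.last_seen <= self.ttl]
        if not live:
            return None
        best = min(live, key=lambda r: (r.load, r.name))
        best.load += 1  # count the new task until the agent's next heartbeat corrects it
        return best.name
```

Routing here uses exact capability tags; the semantic matching pattern relaxes exactly that constraint.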
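
Capability Attestation replaces an agent's self-reported claim with a statement issued by a trusted evaluator that measured the capability itself. A minimal sketch using an HMAC-tagged claim with an expiry; the claim fields, the shared `key`, and `expires_at` as a Unix timestamp are assumptions. With HMAC the verifier shares the evaluator's secret and could mint attestations itself, so a production design would more likely use public-key signatures, letting verifiers hold only a public key.

```python
import hashlib
import hmac
import json
from typing import Any, Dict

def _canonical(claim: Dict[str, Any]) -> bytes:
    # Stable serialization so issuer and verifier MAC exactly the same bytes.
    return json.dumps(claim, sort_keys=True, separators=(",", ":")).encode("utf-8")

def _mac(key: bytes, claim: Dict[str, Any]) -> str:
    return hmac.new(key, _canonical(claim), hashlib.sha256).hexdigest()

def issue_attestation(key: bytes, agent_id: str, capability: str,
                      score: float, expires_at: float) -> Dict[str, Any]:
    """Called by the trusted evaluator after it has measured the capability itself."""
    claim = {"agent_id": agent_id, "capability": capability,
             "score": score, "expires_at": expires_at}
    return {"claim": claim, "tag": _mac(key, claim)}

def verify_attestation(key: bytes, attestation: Dict[str, Any], agent_id: str,
                       capability: str, min_score: float, now: float) -> bool:
    """Trust the claim only if it is authentic, about this agent and capability, good enough, and fresh."""
    claim = attestation["claim"]
    if not hmac.compare_digest(_mac(key, claim), attestation["tag"]):
        return False  # altered, or not issued by the holder of `key`
    return (claim["agent_id"] == agent_id
            and claim["capability"] == capability
            and claim["score"] >= min_score
            and now < claim["expires_at"])
```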
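
Defense in Depth stacks several independent checks so that one bypassed layer does not expose the tools; Guardrails and Human-in-the-Loop slot in as individual layers. A minimal sketch for gating a proposed tool call, with hypothetical layer functions, tool names, and a deliberately naive regex filter; real prompt-injection detection needs far more than a keyword match.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

@dataclass
class ToolCall:
    tool: str
    args: Dict[str, str] = field(default_factory=dict)
    source_text: str = ""  # untrusted content (user input, retrieved page) that led to the call

# A layer returns None to let the call through, or a reason string to block it.
Layer = Callable[[ToolCall], Optional[str]]

INJECTION_HINTS = re.compile(r"ignore (all|previous) instructions|reveal (the|your) system prompt", re.IGNORECASE)
ALLOWED_TOOLS = frozenset({"search", "read_file", "send_email"})

def input_filter(call: ToolCall) -> Optional[str]:
    return "possible prompt injection" if INJECTION_HINTS.search(call.source_text) else None

def tool_allowlist(call: ToolCall) -> Optional[str]:
    return None if call.tool in ALLOWED_TOOLS else f"tool {call.tool!r} not allowed"

def argument_policy(call: ToolCall) -> Optional[str]:
    if call.tool == "read_file" and ".." in call.args.get("path", ""):
        return "path traversal"
    return None

def check(call: ToolCall, layers: List[Layer],
          needs_human: Callable[[ToolCall], bool],
          approve: Callable[[ToolCall], bool]) -> Tuple[bool, str]:
    """Run every automated layer in order; escalate risky calls to a human last."""
    for layer in layers:
        reason = layer(call)
        if reason is not None:
            return False, reason
    if needs_human(call) and not approve(call):
        return False, "rejected by human reviewer"
    return True, "ok"

LAYERS = [input_filter, tool_allowlist, argument_policy]
```

Putting the human check last means reviewers only see calls that already passed the cheap automated layers, which keeps the slow, expensive step (Latency 1 in the matrix) for the cases that need it.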
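
The ReAct pattern interleaves reasoning with tool use: the model writes a thought and an action, the runtime executes the action, and the observation is appended to the transcript for the next step. A minimal driver loop, assuming a hypothetical `policy` callable in place of the LLM, the `Action: tool[input]` and `Final Answer:` text conventions, and a toy lookup tool; the `max_steps` cap keeps a confused agent from looping forever.

```python
import re
from typing import Callable, Dict, Optional

# Hypothetical policy: in practice an LLM that continues the transcript with one
# "Thought: ... Action: tool[input]" step or a "Final Answer: ..." line.
Policy = Callable[[str], str]
Tools = Dict[str, Callable[[str], str]]

ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.*)\]")

def react(question: str, policy: Policy, tools: Tools, max_steps: int = 5) -> Optional[str]:
    """Alternate reasoning/acting steps with tool observations until a final answer."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = policy(transcript)
        transcript += step + "\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip()
        match = ACTION_RE.search(step)
        if match is None:
            observation = "No action found; reply with Action: tool[input] or Final Answer: ..."
        elif match.group(1) not in tools:
            observation = f"Unknown tool {match.group(1)!r}."
        else:
            observation = tools[match.group(1)](match.group(2))
        transcript += f"Observation: {observation}\n"
    return None  # step budget exhausted without an answer

# Toy stand-ins so the loop can run without a model.
FACTS = {"capital_of_peru": "Lima"}
TOOLS: Tools = {"lookup": lambda key: FACTS.get(key.strip(), "not found")}

def scripted_policy(transcript: str) -> str:
    if "Observation: Lima" in transcript:
        return "Thought: The lookup answered the question.\nFinal Answer: Lima"
    return "Thought: I should look this up.\nAction: lookup[capital_of_peru]"

if __name__ == "__main__":
    print(react("What is the capital of Peru?", scripted_policy, TOOLS))  # Lima
```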

All Patterns (34)
