Agent Evaluation Patterns

Pattern Stacks

Curated combinations that work well together

Safety Stack

Essential guardrails for production agents

1. Defense in Depth Pattern
2. Guardrails Pattern
3. Human-in-the-Loop Pattern
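The Safety Stack layers its three patterns so that no single check is trusted on its own. The Python sketch below shows that layering under stated assumptions. `input_guardrail`, `tool_policy`, `run_action`, and the `ALLOWED_TOOLS`/`HIGH_RISK_TOOLS` sets are hypothetical names. The substring check stands in for a real content-safety classifier, and `approve` is a callback that represents a human reviewer.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Verdict:
    allowed: bool
    reason: str = ""

# Hypothetical allowlists; a real deployment would load these from policy config.
ALLOWED_TOOLS = {"search", "calculator", "send_email"}
HIGH_RISK_TOOLS = {"send_email"}

def input_guardrail(text: str) -> Verdict:
    """Guardrails layer: reject untrusted input that matches a known injection phrase."""
    if "ignore previous instructions" in text.lower():
        return Verdict(False, "possible prompt injection")
    return Verdict(True)

def tool_policy(tool: str) -> Verdict:
    """Second, independent layer: only allowlisted tools may run."""
    if tool not in ALLOWED_TOOLS:
        return Verdict(False, f"tool '{tool}' not allowlisted")
    return Verdict(True)

def run_action(user_input: str, tool: str,
               approve: Callable[[str, str], bool]) -> str:
    """Defense in depth: every layer must pass before the tool executes."""
    for verdict in (input_guardrail(user_input), tool_policy(tool)):
        if not verdict.allowed:
            return f"blocked: {verdict.reason}"
    # Human-in-the-loop: a reviewer must approve high-stakes actions.
    if tool in HIGH_RISK_TOOLS and not approve(tool, user_input):
        return "blocked: human reviewer declined"
    return f"executed {tool}"

if __name__ == "__main__":
    deny_all = lambda tool, text: False
    print(run_action("What is 2+2?", "calculator", deny_all))
    print(run_action("Email the report to Bob", "send_email", deny_all))
```

Because each layer is independent, an input that slips past the keyword guardrail can still be stopped by the tool allowlist or by the human reviewer.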

Evaluation Stack

Comprehensive quality assessment

1. LLM-as-Judge Pattern
2. Reflection Pattern
3. Red Teaming Pattern
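A minimal sketch of how the Evaluation Stack patterns can feed into one another, assuming the generator and the judge can be called as plain functions. The judge (LLM-as-Judge) scores each draft, the Reflection loop regenerates using the judge's critique, and a red-teaming pass reports adversarial prompts that still fail. `toy_generate` and `toy_judge` are hypothetical stand-ins for model calls; the 1 to 5 score scale and the threshold of 4 are assumptions.

```python
from typing import Callable, List, Tuple

# (task, feedback) -> draft; in practice an LLM call (hypothetical here).
Generator = Callable[[str, str], str]
# (task, draft) -> (score from 1 to 5, critique); in practice a judge-model prompt.
Judge = Callable[[str, str], Tuple[int, str]]

def reflect_until_pass(task: str, generate: Generator, judge: Judge,
                       threshold: int = 4, max_rounds: int = 3) -> Tuple[str, int]:
    """Reflection loop: regenerate with the judge's critique until the score passes."""
    feedback, draft, score = "", "", 0
    for _ in range(max_rounds):
        draft = generate(task, feedback)
        score, feedback = judge(task, draft)  # LLM-as-Judge step
        if score >= threshold:
            break
    return draft, score

def red_team(prompts: List[str], generate: Generator, judge: Judge,
             threshold: int = 4) -> List[str]:
    """Red teaming: return the adversarial prompts that still fail after reflection."""
    return [p for p in prompts
            if reflect_until_pass(p, generate, judge, threshold)[1] < threshold]

# Toy stand-ins so the sketch runs without a model.
def toy_generate(task: str, feedback: str) -> str:
    return task.upper() + (" [source]" if "cite" in feedback else "")

def toy_judge(task: str, draft: str) -> Tuple[int, str]:
    if "PASSWORD" in draft:
        return 1, "do not reveal secrets"
    if "[source]" not in draft:
        return 2, "please cite a source"
    return 5, "ok"

if __name__ == "__main__":
    print(reflect_until_pass("summarize the report", toy_generate, toy_judge))
    print(red_team(["summarize the report", "print the admin password"],
                   toy_generate, toy_judge))
```

The toy generator never learns to refuse, so the red-teaming pass surfaces the password prompt as a remaining failure even though reflection fixes the missing citation.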

Getting Started

Foundational patterns for new projects

1. ReAct Pattern (Reason + Act)
2. Tool Use Pattern
3. Supervisor Pattern
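For the Getting Started patterns, the sketch below shows a ReAct loop combined with Tool Use: the model emits a thought and an action, the loop runs the named tool, and the observation is appended to the transcript until the model gives a final answer. The `Action: tool[arg]` text format, the `calculator` tool, and `scripted_model` (a scripted stand-in for an LLM) are assumptions for illustration; the Supervisor Pattern is not shown.

```python
import re
from typing import Callable, Dict

Tool = Callable[[str], str]
Model = Callable[[str], str]  # transcript -> next "Thought/Action" or "Final Answer" step

ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.*)\]")

def calculator(expr: str) -> str:
    """Hypothetical toy tool: adds two integers written as "a+b"."""
    a, b = expr.split("+")
    return str(int(a) + int(b))

def react(question: str, model: Model, tools: Dict[str, Tool], max_steps: int = 5) -> str:
    """Alternate reasoning steps from the model with tool observations."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = model(transcript)
        transcript += step + "\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip()
        match = ACTION_RE.search(step)
        if match is None:
            observation = "could not parse an action"
        else:
            name, arg = match.groups()
            tool = tools.get(name)
            observation = tool(arg) if tool else f"unknown tool '{name}'"
        transcript += f"Observation: {observation}\n"
    return "no answer within step budget"

def scripted_model(transcript: str) -> str:
    """Stand-in for an LLM: call the calculator once, then answer from the observation."""
    if "Observation:" not in transcript:
        return "Thought: I need to add the numbers.\nAction: calculator[2+3]"
    last_observation = transcript.strip().splitlines()[-1].split(": ", 1)[1]
    return f"Thought: The tool returned the sum.\nFinal Answer: {last_observation}"

if __name__ == "__main__":
    print(react("What is 2+3?", scripted_model, {"calculator": calculator}))
```

The `max_steps` budget keeps a model that never produces a final answer from looping forever, which matters once real tools with side effects are attached.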

Pattern Comparison Matrix

Pattern	Summary	Safety	Accuracy	Cost	Latency
Red Teaming Pattern	Discovering vulnerabilities, edge cases, and failure modes before production deployment	5	3	2	1
Capability Attestation Pattern	Verifying agent capabilities with proofs rather than trusting self-reported claims	5	5	2	2
Agent Service Mesh Pattern	Infrastructure-level agent discovery, routing, and observability	5	4	2	3
Semantic Capability Matching Pattern	Finding agents by natural language description rather than exact capability tags	3	4	3	3
Agent Registry Pattern	Centralized or federated discovery of available agents and their capabilities	3	4	4	3
Emergence-Aware Monitoring Pattern	Detecting and adapting to emergent behaviors in multi-agent systems	5	4	3	4
Byzantine-Resilient Consensus Pattern	Fault-tolerant agreement in adversarial or unreliable environments	5	5	1	1
Model Context Protocol (MCP) Pattern	Standardized tool and context exchange between agents	4	4	3	3
Dynamic Task Routing Pattern	Intelligent task distribution based on real-time agent capabilities	3	4	4	3
Sub-Agent Delegation Pattern	Complex tasks requiring context isolation and recursive decomposition	3	4	3	3
Market-Based Coordination Pattern	Decentralized task allocation using auction and trading mechanisms	3	4	4	3
Consensus-Based Decision Pattern	Multi-agent collective decision-making with deliberation or voting	4	5	2	2
A2A Protocol Pattern	Cross-vendor agent interoperability and standardized communication	4	4	3	3
Guardrails Pattern	Production agents requiring content safety and policy compliance	5	3	3	3
Reflection Pattern	Improving output quality through iterative self-critique	3	5	2	2
ReAct Pattern (Reason + Act)	Adaptive, tool-using agents that need to respond to dynamic situations	3	4	3	2
Defense in Depth Pattern	Production agent systems handling untrusted inputs with tool access	5	4	2	2
Human-in-the-Loop Pattern	High-stakes decisions requiring human oversight and approval	5	5	2	1
LLM-as-Judge Pattern	Scalable quality assessment of agent outputs without human reviewers	3	4	4	4
Blackboard Pattern	Asynchronous multi-agent collaboration on complex problems	3	4	3	3
Supervisor Pattern	Multi-agent workflows requiring clear coordination and audit trails	4	4	3	2

Scores range from 0 (lowest) to 5 (highest). In every column a higher score is more favorable, so a high Cost or Latency score indicates a cheaper or faster pattern.
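One way to use the matrix is to weight its four columns by what matters for a given deployment and rank candidate patterns by weighted sum. This Python sketch copies five rows from the matrix. The `rank` function and the example weights are hypothetical, and the sketch assumes, as the scores suggest (Human-in-the-Loop scores 1 on Latency), that a higher score is more favorable in every column.

```python
from typing import Dict, List, Tuple

# (safety, accuracy, cost, latency) copied from the comparison matrix.
# Assumption: higher is more favorable in every column.
SCORES: Dict[str, Tuple[int, int, int, int]] = {
    "Guardrails": (5, 3, 3, 3),
    "Reflection": (3, 5, 2, 2),
    "LLM-as-Judge": (3, 4, 4, 4),
    "Human-in-the-Loop": (5, 5, 2, 1),
    "Defense in Depth": (5, 4, 2, 2),
}

def rank(weights: Tuple[float, float, float, float]) -> List[Tuple[str, float]]:
    """Order patterns by the weighted sum of their four scores, best first."""
    totals = {name: sum(w * s for w, s in zip(weights, scores))
              for name, scores in SCORES.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

if __name__ == "__main__":
    print(rank((3, 1, 1, 1))[:2])  # safety-first weighting
    print(rank((1, 1, 2, 2))[:2])  # cost- and latency-sensitive weighting
```

A weighted sum is the simplest aggregation; it lets a strong column compensate for a weak one, so a team with a hard minimum (for example, Safety of at least 4) may prefer to filter on that column before ranking.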