
Mutual Verification Pattern

Overview

The Challenge

In multi-agent systems, agents may propagate hallucinations or errors, creating false consensus through mutual reinforcement.

The Solution

Implement cross-agent verification where agents independently evaluate each other's outputs before accepting them as valid.


Deep Dive

Overview

When multiple agents collaborate, there's a risk of cascading errors: Agent A hallucinates, Agent B accepts it as fact, and both reinforce the mistake. The Mutual Verification Pattern breaks this cycle through independent cross-checking.

The Problem: Mutual Hallucination Reinforcement

Agent A: "The capital of Australia is Sydney."
Agent B: "I agree with Agent A's reasoning."
Agent A: "Agent B confirms my answer."
Result: Both agents confidently wrong.

Without independent verification, agents may:

  • Copy each other's reasoning to save compute
  • Defer to apparently confident assertions
  • Create echo chambers of false consensus

Verification Mechanisms

Cross-Validation Architecture

        ┌─────────────┐
        │ Coordinator │
        └──────┬──────┘
               │
     ┌─────────┼─────────┐
     ▼         ▼         ▼
┌────────┐ ┌────────┐ ┌────────┐
│Agent A │ │Agent B │ │Agent C │
│Research│ │Verify A│ │Verify B│
└────────┘ └────────┘ └────────┘
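
A coordinator can wire this topology together by routing each agent's output to a different verifier. A minimal sketch, assuming hypothetical agent objects with async research() and verify() methods (the names are illustrative, not a fixed API):

async def run_cross_validation(task, agent_a, agent_b, agent_c):
    # Agent A researches the task and produces a claim with cited sources.
    claim = await agent_a.research(task)

    # Agent B independently checks A's claim; Agent C then checks B's
    # assessment. Neither verifier sees the other's reasoning.
    check_b = await agent_b.verify(claim=claim.statement, sources=claim.sources)
    check_c = await agent_c.verify(claim=check_b.statement, sources=check_b.sources)

    return claim, check_b, check_c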

Independent Grounding

Each verifying agent must:

  • Access original sources independently
  • Not see the original agent's reasoning
  • Apply different verification methods

async def verify_claim(claim, verifying_agent):
    # Agent verifies WITHOUT seeing original reasoning
    result = await verifying_agent.verify(
        claim=claim.statement,
        sources=claim.sources,  # Original sources, not summary
        show_original_reasoning=False
    )
    return result
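
When several verifiers check the same claim, they can run concurrently, since none depends on another's verdict. A sketch reusing verify_claim from above:

import asyncio

async def verify_independently(claim, verifying_agents):
    # Run all verifications in parallel; each agent sees only the claim
    # and its original sources, never another verifier's conclusion.
    return await asyncio.gather(
        *(verify_claim(claim, agent) for agent in verifying_agents)
    )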

Confidence Scoring

Combine independent assessments:

def aggregate_verification(results):
    if all(r.confident and r.agrees for r in results):
        return {"status": "verified", "confidence": "high"}
    elif any(r.disagrees for r in results):
        return {"status": "disputed", "details": get_disputes(results)}
    else:
        return {"status": "uncertain", "needs_human_review": True}

Diversity Requirements

Model Diversity

Use different model families for generation and verification:

  • Generate with GPT-4
  • Verify with Claude
  • Cross-check with Gemini

This reduces the risk of correlated errors arising from shared training data.
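
One way to enforce this is a role-to-model mapping that the coordinator validates before dispatching work; the model identifiers below are illustrative:

# Illustrative assignment; any mapping works as long as generation and
# verification never share a model family.
ROLE_MODELS = {
    "generate":    "gpt-4",
    "verify":      "claude-3-5-sonnet",
    "cross_check": "gemini-1.5-pro",
}

def assert_model_diversity(role_models):
    # Treat the leading token of the model name as its family.
    families = [name.split("-")[0] for name in role_models.values()]
    if len(set(families)) < len(families):
        raise ValueError("Generation and verification must use different model families")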

Method Diversity

Apply different verification approaches:

  • Fact-checking against sources
  • Logical consistency analysis
  • Domain expert evaluation (specialized agents)
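
These approaches can be combined by running each method against the same claim and treating any failure as a dispute. A sketch with hypothetical checker interfaces:

async def verify_with_diverse_methods(claim, fact_checker, logic_checker, domain_expert):
    # Each checker applies a different verification strategy.
    checks = {
        "fact_check": await fact_checker.check_against_sources(claim),
        "logic_check": await logic_checker.check_consistency(claim),
        "domain_review": await domain_expert.evaluate(claim),
    }
    # Accept the claim only if every independent method passes.
    return all(checks.values()), checks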

Anti-Patterns

Shallow Agreement

Agent B: "I agree with Agent A."  ❌

Require substantive verification, not mere agreement.
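
One guard is to reject any verdict that does not cite concrete evidence. A minimal check; the evidence and checked_sources fields are assumed for this sketch:

def is_substantive(verdict):
    # A bare "I agree" carries no evidence and no checked sources.
    return bool(verdict.get("evidence")) and bool(verdict.get("checked_sources"))

shallow = {"agrees": True, "evidence": [], "checked_sources": []}
assert not is_substantive(shallow)  # mere agreement is rejected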

Shared Context

# Wrong: Sharing the original reasoning
verify(claim, original_reasoning=agent_a.reasoning)  ❌

Verifiers must work independently.

Homogeneous Verifiers

Using the same model for all verification creates correlated failures.

Implementation Tips

  • Set verification thresholds based on stake level
  • Log disagreements for analysis and improvement
  • Implement escalation paths when verification fails
  • Consider compute/latency tradeoffs—not everything needs full verification
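
A minimal sketch of stake-based thresholds with an escalation path, reusing aggregate_verification from above; the stake levels and verifier counts are assumptions to adapt to your own risk model:

# Assumed stake levels mapped to the number of independent verifiers required.
VERIFIERS_REQUIRED = {"low": 0, "medium": 1, "high": 3}

def route_verification(stake, results):
    required = VERIFIERS_REQUIRED[stake]
    if required == 0:
        return {"status": "accepted", "note": "low stakes, no verification"}
    if len(results) < required:
        return {"status": "pending", "needed": required - len(results)}
    outcome = aggregate_verification(results[:required])
    if outcome["status"] != "verified":
        # Escalate disputes and uncertainty rather than silently accepting.
        outcome["escalate_to_human"] = True
    return outcome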
Considerations

Verification adds latency and cost. Reserve full mutual verification for high-stakes decisions.