
Mutual Verification Pattern

Overview

The Challenge

In multi-agent systems, agents may propagate hallucinations or errors, creating false consensus through mutual reinforcement.

The Solution

Implement cross-agent verification where agents independently evaluate each other's outputs before accepting them as valid.


Deep Dive

Overview

When multiple agents collaborate, there's a risk of cascading errors: Agent A hallucinates, Agent B accepts it as fact, and both reinforce the mistake. The Mutual Verification Pattern breaks this cycle through independent cross-checking.

The Problem: Mutual Hallucination Reinforcement

Agent A: "The capital of Australia is Sydney."
Agent B: "I agree with Agent A's reasoning."
Agent A: "Agent B confirms my answer."
Result: Both agents confidently wrong.

Without independent verification, agents may:

  • Copy each other's reasoning to save compute
  • Defer to apparently confident assertions
  • Create echo chambers of false consensus

Verification Mechanisms

Cross-Validation Architecture

        ┌─────────────┐
        │ Coordinator │
        └──────┬──────┘
               │
     ┌─────────┼─────────┐
     ▼         ▼         ▼
┌────────┐ ┌────────┐ ┌────────┐
│Agent A │ │Agent B │ │Agent C │
│Research│ │Verify A│ │Verify B│
└────────┘ └────────┘ └────────┘
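
A coordinator can wire this topology together by routing each agent's output to a different verifier. A minimal sketch, assuming hypothetical agent objects with async research() and verify() methods (the names are illustrative, not a fixed API):

async def run_cross_validation(task, agent_a, agent_b, agent_c):
    # Agent A researches the task and produces a claim with cited sources.
    claim = await agent_a.research(task)

    # Agent B independently checks A's claim; Agent C then checks B's
    # assessment. Neither verifier sees the other's reasoning.
    check_b = await agent_b.verify(claim=claim.statement, sources=claim.sources)
    check_c = await agent_c.verify(claim=check_b.statement, sources=check_b.sources)

    return claim, check_b, check_c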

Independent Grounding

Each verifying agent must:

  • Access original sources independently
  • Not see the original agent's reasoning
  • Apply different verification methods

async def verify_claim(claim, verifying_agent):
    # Agent verifies WITHOUT seeing original reasoning
    result = await verifying_agent.verify(
        claim=claim.statement,
        sources=claim.sources,  # Original sources, not summary
        show_original_reasoning=False
    )
    return result
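
When several verifiers check the same claim, they can run concurrently, since none depends on another's verdict. A sketch reusing verify_claim from above:

import asyncio

async def verify_independently(claim, verifying_agents):
    # Run all verifications in parallel; each agent sees only the claim
    # and its original sources, never another verifier's conclusion.
    return await asyncio.gather(
        *(verify_claim(claim, agent) for agent in verifying_agents)
    )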

Confidence Scoring

Combine independent assessments:

def aggregate_verification(results):
    if all(r.confident and r.agrees for r in results):
        return {"status": "verified", "confidence": "high"}
    elif any(r.disagrees for r in results):
        return {"status": "disputed", "details": get_disputes(results)}
    else:
        return {"status": "uncertain", "needs_human_review": True}

Diversity Requirements

Model Diversity

Use different model families for generation and verification:

  • Generate with GPT-4
  • Verify with Claude
  • Cross-check with Gemini

This reduces the risk of correlated errors arising from shared training data.
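
One way to enforce this is a role-to-model mapping that the coordinator validates before dispatching work; the model identifiers below are illustrative:

# Illustrative assignment; any mapping works as long as generation and
# verification never share a model family.
ROLE_MODELS = {
    "generate":    "gpt-4",
    "verify":      "claude-3-5-sonnet",
    "cross_check": "gemini-1.5-pro",
}

def assert_model_diversity(role_models):
    # Treat the leading token of the model name as its family.
    families = [name.split("-")[0] for name in role_models.values()]
    if len(set(families)) < len(families):
        raise ValueError("Generation and verification must use different model families")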

Method Diversity

Apply different verification approaches:

  • Fact-checking against sources
  • Logical consistency analysis
  • Domain expert evaluation (specialized agents)
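
These approaches can be combined by running each method against the same claim and treating any failure as a dispute. A sketch with hypothetical checker interfaces:

async def verify_with_diverse_methods(claim, fact_checker, logic_checker, domain_expert):
    # Each checker applies a different verification strategy.
    checks = {
        "fact_check": await fact_checker.check_against_sources(claim),
        "logic_check": await logic_checker.check_consistency(claim),
        "domain_review": await domain_expert.evaluate(claim),
    }
    # Accept the claim only if every independent method passes.
    return all(checks.values()), checks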

Anti-Patterns

Shallow Agreement

Agent B: "I agree with Agent A."  ❌

Require substantive verification, not mere agreement.
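
One guard is to reject any verdict that does not cite concrete evidence. A minimal check; the evidence and checked_sources fields are assumed for this sketch:

def is_substantive(verdict):
    # A bare "I agree" carries no evidence and no checked sources.
    return bool(verdict.get("evidence")) and bool(verdict.get("checked_sources"))

shallow = {"agrees": True, "evidence": [], "checked_sources": []}
assert not is_substantive(shallow)  # mere agreement is rejected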

Shared Context

# Wrong: Sharing the original reasoning
verify(claim, original_reasoning=agent_a.reasoning)  ❌

Verifiers must work independently.

Homogeneous Verifiers

Using the same model for all verification creates correlated failures.

Implementation Tips

  • Set verification thresholds based on stake level
  • Log disagreements for analysis and improvement
  • Implement escalation paths when verification fails
  • Consider compute/latency tradeoffs—not everything needs full verification
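
A minimal sketch of stake-based thresholds with an escalation path, reusing aggregate_verification from above; the stake levels and verifier counts are assumptions to adapt to your own risk model:

# Assumed stake levels mapped to the number of independent verifiers required.
VERIFIERS_REQUIRED = {"low": 0, "medium": 1, "high": 3}

def route_verification(stake, results):
    required = VERIFIERS_REQUIRED[stake]
    if required == 0:
        return {"status": "accepted", "note": "low stakes, no verification"}
    if len(results) < required:
        return {"status": "pending", "needed": required - len(results)}
    outcome = aggregate_verification(results[:required])
    if outcome["status"] != "verified":
        # Escalate disputes and uncertainty rather than silently accepting.
        outcome["escalate_to_human"] = True
    return outcome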
Considerations

Verification adds latency and cost. Reserve full mutual verification for high-stakes decisions.