Overview
Explanation degradation occurs when multi-agent systems lose the ability to provide coherent explanations for their decisions. As information passes through multiple agents, reasoning chains break down, context is lost, and the final output becomes unexplainable.
The Explanation Problem
Single Agent: Traceable
Input → Reasoning → Output
"I recommended X because:
1. Data showed Y
2. Policy Z applies
3. Calculation: Y + Z = X"
Clear, auditable explanation
Multi-Agent: Degraded
Input → Agent A → Agent B → Agent C → Output
"Why X?"
Agent C: "Agent B told me to"
Agent B: "I processed Agent A's output"
Agent A: "Based on my analysis..."
But what was the actual reasoning chain?
Degradation Patterns
Reasoning Chain Breaks
Agent A: "High risk because metric > threshold"
Agent B: Receives "high risk" (loses metric details)
Agent C: Receives "concern flagged" (loses severity)
Agent D: Outputs "rejected" (no context why)
Final explanation: "Rejected due to concerns"
Actual reason: Lost in translation
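Condensed to three agents, a hypothetical sketch of how each handoff drops the detail the final explanation would need. The agent functions, the 0.92 metric, and the 0.8 threshold are illustrative, not from the text:

def agent_a(metric, threshold=0.8):
    # Forwards only a label; the metric and the threshold are dropped here.
    return {"risk": "high"} if metric > threshold else {"risk": "low"}

def agent_b(assessment):
    # Forwards only a flag; the severity ("high") is dropped here.
    return {"flag": "concern"} if assessment["risk"] == "high" else {"flag": "none"}

def agent_c(flagged):
    # Outputs a decision; no context for why the concern was raised survives.
    return "rejected" if flagged["flag"] == "concern" else "approved"

print(agent_c(agent_b(agent_a(0.92))))  # "rejected" -- the 0.92 never reaches the output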
Context Compression
Original: "Customer has excellent 10-year history
but recent missed payment due to
documented medical emergency"
After 3 agents: "Customer has payment issues"
Nuance lost, explanation misleading
Circular Explanations
Q: "Why was loan denied?"
A: "Risk assessment was negative"
Q: "Why was risk assessment negative?"
A: "Multiple factors indicated high risk"
Q: "What factors?"
A: "The factors that led to denial"
[Circular, no actual explanation]
Black Box Composition
Agent A: Explainable model
Agent B: Explainable model
Agent C: Explainable model
A + B + C: Emergent black box
Individual explanations don't compose into a system-level explanation
Regulatory Requirements
GDPR Article 22
"Meaningful information about the logic involved"
- Multi-agent: Which agent's logic?
- Emergent decisions: What logic?
Fair Lending Laws
"Specific reasons for adverse action"
- Must cite actual factors
- "AI determined" not acceptable
Healthcare Regulations
"Clinical decision support must be explainable"
- Physicians must understand AI reasoning
- Cannot rely on unexplainable recommendations
Financial Services
"Model risk management requires explanation"
- Regulators audit decision logic
- Can't audit what can't be explained
Explanation Debt
Explanation quality over agent chain:
Quality
│
│ ████
│ ████ ███
│ ████ ███ ██
│ ████ ███ ██ █
└──────────────────
   A    B  C  D
Each handoff loses explanation fidelity
Solutions
Explanation Preservation Protocol
from datetime import datetime, timezone

class ExplainableMessage:
    """A message that carries its reasoning trail along with its content."""

    def __init__(self, content, explanation=None):
        self.content = content
        # Reuse the chain handed over by the previous agent; start a new one otherwise
        # (a minimal ExplanationChain sketch follows below).
        self.explanation = explanation if explanation is not None else ExplanationChain()

    def add_reasoning_step(self, agent_id, reasoning, evidence):
        # Each agent appends its own step instead of overwriting upstream reasoning.
        self.explanation.add_step({
            "agent": agent_id,
            "reasoning": reasoning,
            "evidence": evidence,
            "timestamp": datetime.now(timezone.utc).isoformat()
        })

    def get_full_explanation(self):
        return self.explanation.compose_narrative()
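ExplanationChain is referenced but not defined above; a minimal sketch, assuming it only needs to record steps in order and compose them into a narrative, followed by a hypothetical usage with invented agent names and evidence:

class ExplanationChain:
    """Minimal step log assumed by ExplainableMessage above."""

    def __init__(self):
        self.steps = []

    def add_step(self, step):
        self.steps.append(step)

    def compose_narrative(self):
        # One auditable narrative built from every recorded step, in order.
        return " -> ".join(
            f"{s['agent']}: {s['reasoning']} (evidence: {s['evidence']})"
            for s in self.steps
        )

msg = ExplainableMessage("recommend denial of application")
msg.add_reasoning_step("agent_a", "high risk", "debt-to-income ratio 0.62 exceeds 0.45 limit")
msg.add_reasoning_step("agent_b", "deny", "policy P-7 requires denial for high-risk applicants")
print(msg.get_full_explanation())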
Explanation Checkpoints
After every N agents, generate an explanation summary (sketched in code after the example):
Checkpoint 1 (After Agent B):
"Decision trending toward X because [A's reasoning] + [B's refinement]"
Checkpoint 2 (After Agent D):
"Final decision X because [Summary 1] + [C's analysis] + [D's validation]"
Counterfactual Preservation
Track what would change the decision:
"Denied because income < $50K
Would approve if income >= $50K"
Even if the reasoning chain is lost, the counterfactual still explains the decision
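A minimal counterfactual record for the income rule in the example above; the Counterfactual dataclass and its field names are illustrative, not from the text:

from dataclasses import dataclass

@dataclass
class Counterfactual:
    decision: str
    factor: str
    actual: float
    threshold: float

    def explain(self):
        # The flip condition survives even when the upstream reasoning does not.
        return (f"{self.decision} because {self.factor} ${self.actual:,.0f} "
                f"is below ${self.threshold:,.0f}; would approve if "
                f"{self.factor} >= ${self.threshold:,.0f}")

print(Counterfactual("Denied", "income", 42_000, 50_000).explain())
# Denied because income $42,000 is below $50,000; would approve if income >= $50,000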