Stop Blaming the Wrong Source: How to Check That AI Claims Come From the Right Tool

At a Glance

Source-aware verification can spot when an AI attaches a fact to the wrong tool or record: it matches standard support detection while adding reliable claim-to-source labels that pooled checks miss.

What They Found

ProvenanceGuard keeps the original tool and source IDs from multi-tool agent traces, breaks answers into atomic claims, routes each claim to the specific evidence object, and then checks whether that evidence actually supports the claim. It reaches similar or better support-blocking performance than source-blind baselines while also reporting which tool or record a supported claim came from. The system trades some precision for very high recall, which is useful in sensitive, review-heavy settings where falsely allowing a wrongly attributed claim is costly. All targeted probes that deliberately swapped source attributions were detected, showing that the method catches explicit provenance mistakes.

Key Data

1. Held-out binary blocking: 0.802 F1 (precision 0.673, recall 0.993) on the 40-trace split.

2. 0.858 exact source accuracy over 260 source-eligible held-out claims.

3. 0.681 source-plus-relation accuracy overall; 1.00 detection on 50 targeted source-conflation probes where attributions were deliberately swapped.
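
As a quick sanity check, the reported F1 can be recomputed from the reported precision and recall, since F1 is their harmonic mean. The snippet below is only an arithmetic sketch; the function name `f1_score` is ours and not something taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values as reported for the held-out 40-trace split.
reported_precision = 0.673
reported_recall = 0.993

f1 = f1_score(reported_precision, reported_recall)
print(round(f1, 3))  # 0.802
```

The recomputed value agrees with the reported 0.802, so the three blocking numbers are internally consistent.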

Implications

Engineers building agent systems that call multiple external tools or databases should care, because claims must be tied to the correct record or paper, not just 'supported somewhere.' Technical leaders and compliance teams in sensitive domains (healthcare, finance, legal) will want provenance-aware checks to avoid misattributed facts that can cause harm. Researchers studying agent reliability and trust can treat source attribution as a separate evaluation axis beyond standard factuality checks.

Yes, But...

Results come from a medical-agent stack and a relatively small held-out split (40 traces), so numbers have wide confidence intervals and may not generalize to other domains. Many labels were produced with LLM assistance and only a subset were human-reviewed, so benchmark noise and labeling bias remain possible. The method prioritizes high recall (fail-closed behavior) at the cost of precision; deployments with high reviewer load may need different thresholds or additional tuning.
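
To make the precision and recall trade-off concrete, here is a minimal sketch of how moving a blocking threshold shifts the two metrics. Everything in it is an assumption for illustration: the scores and labels are synthetic toy values, not data from the paper, and `blocking_metrics` is a hypothetical helper, not part of ProvenanceGuard.

```python
def blocking_metrics(scores, should_block, threshold):
    """Return (precision, recall) for the 'block' decision.

    A claim is blocked when its support score falls below the threshold.
    """
    tp = fp = fn = 0
    for score, label in zip(scores, should_block):
        blocked = score < threshold
        if blocked and label:
            tp += 1          # correctly blocked
        elif blocked and not label:
            fp += 1          # blocked a claim that was fine (reviewer load)
        elif not blocked and label:
            fn += 1          # let a bad claim through
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Synthetic support scores and ground-truth "should be blocked" labels.
scores = [0.10, 0.30, 0.45, 0.55, 0.60, 0.70, 0.80, 0.90]
should_block = [True, True, True, False, True, False, False, False]

lenient = blocking_metrics(scores, should_block, threshold=0.50)
strict = blocking_metrics(scores, should_block, threshold=0.75)
print(lenient)  # precision 1.0, recall 0.75
print(strict)   # precision about 0.667, recall 1.0
```

On this toy data the stricter threshold catches every bad claim but also blocks two acceptable ones, which is the same shape of trade-off described above: fewer missed misattributions in exchange for more items sent to review.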

Methodology & More

ProvenanceGuard treats a tool-using agent’s full trace (user question, final answer, and every tool output with its stable source ID) as the canonical evidence set and never collapses different tool calls into one anonymous context. It deterministically breaks the assistant reply into atomic claims, preserves each claim’s stated attribution, and then routes each claim to the relevant evidence object by comparing claim text to the specific tool output. Support is scored with a natural-language entailment model (a model that judges whether a piece of evidence supports a claim) plus alignment signals; the verifier then compares the routed source to the answer’s stated source to detect mismatches.
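
The flow described above (split into claims, route each claim to one evidence object, score support, compare routed and stated sources) can be sketched in a few lines. This is not the paper's implementation. It assumes claims are single sentences carrying a bracketed `[source_id]` tag, and it swaps in a simple token-overlap score where the paper uses an entailment model plus alignment signals. All names (`Evidence`, `Verdict`, `verify`), the 0.6 threshold, and the toy records are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Evidence:
    source_id: str  # stable ID preserved from the tool call
    text: str       # raw tool output


@dataclass(frozen=True)
class Verdict:
    claim: str
    stated_source: Optional[str]
    routed_source: Optional[str]
    support: float
    source_mismatch: bool
    allowed: bool


def tokens(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap(claim: str, evidence: str) -> float:
    """Fraction of claim tokens found in the evidence (stand-in for entailment)."""
    claim_tokens = tokens(claim)
    if not claim_tokens:
        return 0.0
    return len(claim_tokens & tokens(evidence)) / len(claim_tokens)


def split_claims(answer: str) -> List[Tuple[str, Optional[str]]]:
    """Deterministic split: one claim per sentence, keeping its [source] tag."""
    claims = []
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        if not sentence:
            continue
        tag = re.search(r"\[([^\]]+)\]", sentence)
        stated = tag.group(1) if tag else None
        text = re.sub(r"\s*\[[^\]]+\]", "", sentence).strip()
        claims.append((text, stated))
    return claims


def verify(answer: str, evidence: List[Evidence], threshold: float = 0.6) -> List[Verdict]:
    verdicts = []
    for text, stated in split_claims(answer):
        # Route to the single best-matching evidence object; sources are never pooled.
        best = max(evidence, key=lambda e: overlap(text, e.text), default=None)
        support = overlap(text, best.text) if best else 0.0
        routed = best.source_id if best and support > 0 else None
        mismatch = stated is not None and stated != routed
        # Fail closed: allow only supported claims whose attribution matches.
        allowed = support >= threshold and not mismatch
        verdicts.append(Verdict(text, stated, routed, support, mismatch, allowed))
    return verdicts


# Invented toy trace: the second claim is true but credited to the wrong record.
evidence = [
    Evidence("labs:rec-17", "Serum potassium was 4.1 mmol/L on the latest panel."),
    Evidence("pharmacy:rec-03", "The patient was dispensed lisinopril 10 mg daily."),
]
answer = (
    "Serum potassium was 4.1 mmol/L [labs:rec-17]. "
    "The patient was dispensed lisinopril 10 mg daily [labs:rec-17]."
)
verdicts = verify(answer, evidence)
for v in verdicts:
    print(v.claim, "| routed:", v.routed_source, "| allowed:", v.allowed)
```

In this toy run the second claim is fully supported by the pharmacy record, so a pooled, source-blind check would pass it. Routing it to `pharmacy:rec-03` while the answer credits `labs:rec-17` is what flags the mismatch.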

Credibility Assessment:

ArXiv with no affiliations or notable author signals provided.
