eval-view

Name: eval-view
Rating: 3.0 (124 reviews)
Author: hidai25

CI-friendly regression testing for agent behavior and tool-call diffs

Python

Updated Jul 26, 2026

Stars: 124

Overview

Detects regressions in agent behavior by snapshotting outputs and diffing tool calls across runs. It runs in CI to catch behavior drift and API-level changes by comparing agent responses and tool interactions against stored golden snapshots. Distinctive features include CLI-friendly snapshots and integrations with LangGraph, CrewAI, OpenAI, and Anthropic for multi-agent and agent-framework workflows. See also the A2A Protocol Pattern and the Open Agent Specification (Agent Spec) for how agents coordinate across tool boundaries.
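To make the tool-call diffing idea concrete, here is a minimal sketch using only the Python standard library. It is not eval-view's API: the `normalize` and `diff_tool_calls` helpers and the `{"tool": ..., "args": ...}` record shape are assumptions chosen for illustration.

```python
import difflib
import json


def normalize(call):
    # Render one tool call as a stable single-line string.
    # The {"tool": ..., "args": ...} shape is an assumption for this sketch.
    args = json.dumps(call.get("args", {}), sort_keys=True)
    return f"{call['tool']}({args})"


def diff_tool_calls(golden, current):
    """Return unified-diff lines between two tool-call sequences.

    An empty list means the sequences are identical.
    """
    a = [normalize(c) for c in golden]
    b = [normalize(c) for c in current]
    return list(
        difflib.unified_diff(a, b, fromfile="golden", tofile="current", lineterm="")
    )


# Hypothetical recorded runs: the second call changed between runs.
golden = [
    {"tool": "search", "args": {"query": "refund policy"}},
    {"tool": "send_email", "args": {"to": "user@example.com"}},
]
current = [
    {"tool": "search", "args": {"query": "refund policy"}},
    {"tool": "create_ticket", "args": {"priority": "high"}},
]

drift = diff_tool_calls(golden, current)
print("\n".join(drift))
```

Sorting the argument keys before comparison keeps the diff insensitive to dictionary ordering, so only real changes in the tool name, arguments, or call order show up as drift.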

Key Benefits

As agents evolve and come to depend on other agents or tools, silent behavioral regressions become a major trust risk. Eval-view makes it possible to follow an agent's behavior over time and surface behavioral deltas before they reach production. That visibility is essential for continuous agent evaluation and for building reproducible agent-to-agent evaluation pipelines. This aligns with the Mutual Verification Pattern for establishing cross-agent confidence.

Ideal For

Developer teams adding automated regression checks for agent outputs and tool interactions in CI, especially those using LangGraph or CrewAI with LLM API providers such as OpenAI and Anthropic. Consider adding the Human-in-the-Loop Pattern for critical scenarios where human oversight should complement automated checks.

Use Cases

Catch behavioral regressions after model or prompt updates by diffing snapshots in CI
Validate that agent tool-call sequences remain stable during refactors or dependency changes
Audit agent-to-agent interaction changes when switching LLM providers (OpenAI/Anthropic)
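
The use cases above all reduce to comparing a fresh run against a stored golden snapshot and failing the CI job on a mismatch. The following is a rough sketch of that loop under stated assumptions: `check_snapshot`, the JSON file layout, and the `output` / `tool_calls` fields are hypothetical and do not reflect eval-view's actual commands or storage format.

```python
import json
import tempfile
from pathlib import Path


def check_snapshot(path, run, update=False):
    """Compare a run against the golden snapshot stored at `path`.

    Returns 0 when the run matches (or the snapshot was just written)
    and 1 when any field differs, so the value can be used as an exit code.
    """
    path = Path(path)
    if update or not path.exists():
        # First run, or an intentional refresh: record the new golden snapshot.
        path.write_text(json.dumps(run, indent=2, sort_keys=True))
        return 0
    golden = json.loads(path.read_text())
    changed = sorted(
        key for key in set(golden) | set(run) if golden.get(key) != run.get(key)
    )
    for key in changed:
        print(f"regression in {key!r}: {golden.get(key)!r} -> {run.get(key)!r}")
    return 1 if changed else 0


# Hypothetical runs before and after a prompt or model update.
baseline = {"output": "Refund issued.", "tool_calls": ["lookup_order", "issue_refund"]}
after_update = {"output": "Refund issued.", "tool_calls": ["lookup_order", "escalate"]}

with tempfile.TemporaryDirectory() as tmp:
    snap = Path(tmp) / "refund_flow.json"
    first = check_snapshot(snap, baseline)       # writes the golden snapshot
    second = check_snapshot(snap, baseline)      # identical run, no drift
    third = check_snapshot(snap, after_update)   # tool-call sequence changed
```

In a CI job the return value would typically be passed to `sys.exit`, so that a changed snapshot fails the build until someone reviews the change and refreshes the snapshot with `update=True`.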