eval-view
by hidai25
CI-friendly regression testing for agent behavior and tool-call diffs
Overview
Detects regressions in agent behavior by snapshotting outputs and diffing tool calls across runs. It runs in CI and compares agent responses and tool interactions against stored golden snapshots, catching behavior drift and API-level changes before they ship. Distinctive features include CLI-friendly snapshots and integrations with LangGraph, CrewAI, OpenAI, and Anthropic for multi-agent and agent-framework workflows. See also the A2A Protocol Pattern and the Open Agent Specification (Agent Spec) for how agents coordinate across tool boundaries.
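To make the idea of diffing tool calls against a golden snapshot concrete, here is a minimal Python sketch. It is not eval-view's actual API: the function names (`render_calls`, `diff_tool_calls`) and the tool-call record shape (`tool`, `args`) are assumptions chosen for illustration, and the diff itself uses only the standard library.

```python
import difflib
import json


def render_calls(calls):
    """Serialize each tool call to one stable line so the diff is line-oriented."""
    return [f"{c['tool']} {json.dumps(c['args'], sort_keys=True)}" for c in calls]


def diff_tool_calls(golden, current):
    """Return unified-diff lines between golden and current tool-call sequences.

    An empty list means the current run matches the golden snapshot.
    """
    return list(
        difflib.unified_diff(
            render_calls(golden),
            render_calls(current),
            fromfile="golden",
            tofile="current",
            lineterm="",
        )
    )


# Hypothetical data: the current run added an extra lookup_order call.
golden = [
    {"tool": "search", "args": {"query": "refund policy"}},
    {"tool": "send_email", "args": {"to": "user@example.com"}},
]
current = [
    {"tool": "search", "args": {"query": "refund policy"}},
    {"tool": "lookup_order", "args": {"order_id": 42}},
    {"tool": "send_email", "args": {"to": "user@example.com"}},
]

drift = diff_tool_calls(golden, current)
for line in drift:
    print(line)
```

Sorting the argument keys before serializing keeps the comparison insensitive to dictionary ordering, so only real changes in the call sequence or its arguments show up as drift.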
Key Benefits
Ideal For
Developer teams adding automated regression checks for agent outputs and tool interactions in CI, especially those building on LangGraph or CrewAI with hosted LLM providers such as OpenAI and Anthropic. For critical scenarios, consider pairing these checks with the Human-in-the-Loop Pattern so that human oversight complements the automated results.
Use Cases
- Catch behavioral regressions after model or prompt updates by diffing snapshots in CI
- Validate that agent tool-call sequences remain stable during refactors or dependency changes
- Audit agent-to-agent interaction changes when switching LLM providers (OpenAI/Anthropic)
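The use cases above all reduce to the same CI gate: record a golden snapshot once, then fail the build when a later run differs. The following is a rough Python sketch of that gate under stated assumptions. The helper `check_snapshot`, its `update` flag, and the JSON file layout are hypothetical and do not describe eval-view's real CLI or storage format.

```python
import json
import tempfile
from pathlib import Path


def check_snapshot(path, current, update=False):
    """Compare a run against a stored golden snapshot.

    Returns 0 when the run matches (or the snapshot was just written),
    and 1 on a mismatch, so the value can be used as a CI exit code.
    """
    snapshot = Path(path)
    if update or not snapshot.exists():
        # First run, or an explicit refresh: record the current behavior as golden.
        snapshot.write_text(json.dumps(current, indent=2, sort_keys=True))
        return 0
    golden = json.loads(snapshot.read_text())
    return 0 if golden == current else 1


# Hypothetical runs, before and after a prompt or provider change.
baseline = {"output": "Refund issued.", "tool_calls": ["search", "send_email"]}
changed = {"output": "Refund issued.", "tool_calls": ["send_email"]}

with tempfile.TemporaryDirectory() as tmp:
    golden_path = Path(tmp) / "refund_flow.json"
    first_run = check_snapshot(golden_path, baseline)    # writes the golden snapshot
    second_run = check_snapshot(golden_path, baseline)   # identical run passes
    third_run = check_snapshot(golden_path, changed)     # dropped tool call fails
    refreshed = check_snapshot(golden_path, changed, update=True)  # accept new behavior
```

In a CI job, a script would typically pass the returned value to `sys.exit` so that a mismatch fails the pipeline, while the `update` path gives reviewers an explicit way to accept an intentional behavior change.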