Operations · Production Ready

phoenix

by Arize-ai

Model and LLM observability + evaluation for production monitoring

Jupyter Notebook
Updated Feb 12, 2026
8.5k Stars · 717 Forks

View on GitHub

What It Does

Provides AI observability and evaluation tooling to monitor model behavior, data drift, and performance over time. Combines evaluation notebooks, metrics dashboards, and dataset-aware monitoring to surface regressions and failure modes. Includes integrations for common LLM stacks via the Open Agent Specification, plus automated alerting for production deployments.
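A minimal sketch of wiring an LLM application into Phoenix for trace collection, assuming the arize-phoenix and openinference-instrumentation-openai packages are installed and a Phoenix instance is launched locally; the project name and prompt are illustrative placeholders, not part of the tool's documentation.

```python
# Sketch: launch a local Phoenix instance and stream OpenAI traces into it.
# Assumes `arize-phoenix` and `openinference-instrumentation-openai` are installed;
# the project name and prompt below are illustrative placeholders.
import phoenix as px
from phoenix.otel import register
from openinference.instrumentation.openai import OpenAIInstrumentor
from openai import OpenAI

# Start the Phoenix UI/collector locally (a hosted deployment would skip this).
session = px.launch_app()

# Register an OpenTelemetry tracer provider pointed at the local Phoenix collector.
tracer_provider = register(project_name="agent-observability-demo")

# Auto-instrument the OpenAI client so each completion becomes a traced span.
OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)

client = OpenAI()
client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarize today's deployment changes."}],
)

# Pull the recorded spans back out as a DataFrame for offline analysis or evals.
spans_df = px.Client().get_spans_dataframe()
print(spans_df.head())
```

Once spans are flowing, the same collector backs the dashboards and drift views described above, so instrumentation is typically the only change needed in the application itself.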

The Value Proposition

As agents operate autonomously, continuous visibility into their outputs and failure modes becomes essential for trust and safety. Centralized observability lets teams correlate model regressions with upstream changes and follow an agent's track record across tasks. This matters for ReputAgent because operational telemetry is a key signal for agent-to-agent evaluation and long-term reputation, and it complements Human-in-the-Loop review.

Target Use Cases

SREs and ML engineers running production LLMs or multi-agent systems who need continuous monitoring, automated evaluation, and drift detection. This setup pairs well with the MCP Pattern, which standardizes context sharing and protocol-driven interactions.

Applications

  • Detect model regressions and data drift before deploying updates to agents
  • Correlate agent failure modes with dataset or prompt changes for root cause analysis
  • Continuously evaluate model outputs against benchmarks and custom metrics in production (see the evaluation sketch after this list)
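As referenced in the last item above, here is a sketch of an automated evaluation pass over collected outputs using Phoenix's evals module, assuming arize-phoenix-evals is installed and an OpenAI API key is available; the example DataFrame stands in for spans exported from production, and its column names follow the hallucination template's expected variables.

```python
# Sketch of a continuous-evaluation job: classify recent outputs for hallucination
# with Phoenix evals. Assumes `arize-phoenix-evals` is installed and OPENAI_API_KEY
# is set; the example DataFrame is a stand-in for spans exported from production.
import pandas as pd
from phoenix.evals import (
    HALLUCINATION_PROMPT_RAILS_MAP,
    HALLUCINATION_PROMPT_TEMPLATE,
    OpenAIModel,
    llm_classify,
)

# In production this frame would come from exported traces; the columns match the
# hallucination template's variables (input, reference, output).
df = pd.DataFrame(
    {
        "input": ["What region is the service deployed in?"],
        "reference": ["The service runs in us-east-1 behind a regional load balancer."],
        "output": ["It is deployed in eu-west-2."],
    }
)

evals_df = llm_classify(
    dataframe=df,
    model=OpenAIModel(model="gpt-4o-mini"),
    template=HALLUCINATION_PROMPT_TEMPLATE,
    rails=list(HALLUCINATION_PROMPT_RAILS_MAP.values()),
    provide_explanation=True,
)

# Each row receives a label plus an explanation, which can feed dashboards,
# alerts, or an agent's long-term reputation signal.
print(evals_df[["label", "explanation"]])
```

Scheduling a job like this against freshly exported traces is one way to turn the catalog's "continuous evaluation" claim into a concrete regression gate.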
Works With
langchain, llamaindex, openai, anthropic, huggingface, datasets
Topics
agents, ai-monitoring, ai-observability, aiengineering, anthropic, datasets, evals, langchain, llamaindex, llm-eval, +6 more
Similar Tools
whylabs, evidently
Keywords
production agent monitoring, continuous agent evaluation, agent reliability, model observability