Evaluation · Production Ready

giskard-oss

by Giskard-AI

Open-source LLM and agent evaluation, red-teaming, and continuous testing

Python
Updated Feb 11, 2026
5.1k Stars · 393 Forks · 12 Commits/Week · 12 Commits/Month

View on GitHub

Summary

Provides an open-source framework for evaluating and testing LLMs and agent behaviors. Runs red-team tests, metrics-driven evaluations, and fairness checks using configurable test suites and data sinks. Offers interactive dashboards, automated test pipelines, and connectors to common model providers for reproducible LLM/agent validation. Related patterns: MCP Pattern, Defense in Depth Pattern.
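
A minimal sketch of what such an evaluation can look like in code, assuming the Giskard Python API as documented upstream (giskard.Model, giskard.Dataset, giskard.scan); the prediction function and names below are placeholders for your own agent, and exact signatures may differ between releases:

```python
# Minimal sketch: wrap an LLM-backed function and scan it for unsafe or
# low-quality behaviors. Class/function names follow the Giskard Python
# docs; signatures may differ between releases, and the prediction
# function below is a placeholder for the agent under test.
import pandas as pd

import giskard


def answer_questions(df: pd.DataFrame) -> list[str]:
    # Placeholder: call your own LLM or agent for each input row.
    return [f"(answer to: {q})" for q in df["question"]]


model = giskard.Model(
    model=answer_questions,
    model_type="text_generation",
    name="support-agent",
    description="Answers customer questions about billing.",
    feature_names=["question"],
)

dataset = giskard.Dataset(
    pd.DataFrame({"question": ["How do I cancel my plan?"]})
)

# Run the automated scan (red-team probes plus quality/robustness detectors).
# The LLM-assisted detectors typically need an LLM client configured,
# e.g. an OpenAI API key in the environment.
report = giskard.scan(model, dataset)
report.to_html("scan_report.html")
```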

Why It Matters

As agents are composed and delegated across services, systematic evaluation is needed to surface failure modes and measure reliability. Giskard makes continuous evaluation and red-team testing practical, so teams can track an agent's record and catch regressions over time. For multi-agent trust, it supplies the metrics and test harnesses needed to compare agents and feed reputation systems such as RepKit. Related pattern: Agent Registry Pattern.

Target Use Cases

Teams validating LLMs or agent components before deployment who need automated tests, fairness checks, and dashboards for continuous agent evaluation. Related pattern: Human-in-the-Loop.

How It's Used

  • Run red-team and adversarial tests against LLM-driven agents to find unsafe behaviors
  • Automate regression and continuous evaluation pipelines for model updates
  • Measure fairness, robustness, and performance across model providers for pre-production gating (see the sketch after this list)
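
The pre-production gate from the last two bullets can be sketched as follows, again assuming the Giskard Python API; generate_test_suite, run, and passed are taken from the upstream docs and may vary by version, and the wrappers mirror the sketch above:

```python
# Minimal sketch of a regression / pre-production gate: turn scan findings
# into a reusable test suite and re-run it on every model update.
# API names are assumed from the Giskard Python docs.
import sys

import pandas as pd

import giskard


def answer_questions(df: pd.DataFrame) -> list[str]:
    # Placeholder: call the updated model or agent being gated.
    return [f"(answer to: {q})" for q in df["question"]]


model = giskard.Model(
    model=answer_questions,
    model_type="text_generation",
    name="support-agent",
    description="Answers customer questions about billing.",
    feature_names=["question"],
)
dataset = giskard.Dataset(pd.DataFrame({"question": ["How do I cancel my plan?"]}))

# Generate the suite once from a scan, then run it as a regression check in CI;
# a non-zero exit code is what blocks the deployment step.
suite = giskard.scan(model, dataset).generate_test_suite("pre-release gate")
results = suite.run()

if not results.passed:
    sys.exit("Evaluation suite failed -- blocking deployment.")
```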

Works With
openai, huggingface, transformers, langchain

Topics
agent-evaluation, ai-red-team, ai-security, ai-testing, fairness-ai, llm, llm-eval, llm-evaluation, llm-security, llmops, +7 more

Similar Tools
lm-eval-harness, evidently

Keywords
agent-to-agent evaluation, multi-agent trust, ai-testing, llm-evaluation