RepKit — A reputation SDK for AI agents.
Log agent evaluations, compute versioned reputation scores, and expose trust signals for downstream systems — all with a few lines of code.
Because reputation should be as easy to implement as logging.
from repkit import RepKit

rk = RepKit(api_key="your-key")

# Log an evaluation from an agent-to-agent interaction
rk.log_interaction_evaluation(
    interaction_id="txn-789",
    agent="agent-123",
    dimensions={
        "accuracy": 0.95,
        "safety": 0.88,
        "helpfulness": 0.93
    }
)

rep = rk.get_reputation("agent-123")
print(rep.score)

Reputation Should Be Operational Infrastructure
Evaluation isn't a gate before deployment—it's infrastructure that runs during production. Every agent interaction is an opportunity to build evidence.
RepKit makes continuous evaluation as simple as adding a few lines of code. Every interaction becomes data. Data becomes reputation. Reputation provides durable signals for routing, gating, and oversight systems.
Without RepKit
- Evaluation disconnected from production work
- No accumulated evidence to inform trust decisions
- Reactive fixes after issues emerge
With RepKit
- Continuous evaluation in production
- Real-time reputation tracking
- Proactive alerts before issues spread
Three Steps to Reputation
Integrate RepKit in minutes. No infrastructure to manage, no complex setup—just import and start logging.
1. Get Started
Add RepKit to your stack. Available for Python, TypeScript, and REST API.
2. Log Evaluations
Log evaluations for the actual interactions in each task. Capture scores, dimensions, and metadata per interaction.
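As a rough illustration of what a per-interaction record might look like before it is logged, the sketch below defines a small local container with basic validation. `EvaluationRecord`, its `metadata` field, and the assumption that dimension scores fall in the range 0.0 to 1.0 are hypothetical and not part of the RepKit SDK; only the three keyword arguments it produces mirror the `log_interaction_evaluation` call shown at the top of this page.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class EvaluationRecord:
    """Hypothetical local container for one evaluation; not a RepKit type."""

    interaction_id: str
    agent: str
    dimensions: Dict[str, float]
    # Kept locally only: how metadata is passed to RepKit is not shown in the source.
    metadata: Dict[str, Any] = field(default_factory=dict)

    def validate(self) -> None:
        # Assumption: dimension scores live in [0.0, 1.0], as in the sample call.
        if not self.dimensions:
            raise ValueError("at least one dimension score is required")
        for name, value in self.dimensions.items():
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"dimension {name!r} out of range: {value}")

    def to_kwargs(self) -> Dict[str, Any]:
        """Keyword arguments matching the log_interaction_evaluation call shown earlier."""
        self.validate()
        return {
            "interaction_id": self.interaction_id,
            "agent": self.agent,
            "dimensions": dict(self.dimensions),
        }
```

With a client created as in the quickstart, such a record could be logged with `rk.log_interaction_evaluation(**record.to_kwargs())`.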
3. Consume Reputation
Query reputation scores to inform routing, access, and orchestration systems.
Everything You Need
From logging to alerting, RepKit handles the infrastructure so you can focus on building agents.
Simple SDK
Three lines of code to start logging evaluations. Python, TypeScript, and REST API.
Evaluation Storage
Every evaluation logged, timestamped, and queryable. Build history over time.
Reputation Scores
Automatic reputation computation from evaluation history. Track trends and changes.
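RepKit's actual scoring method is not described on this page. To make the idea of computing reputation from evaluation history concrete, here is a minimal sketch that assumes one plausible approach: average each evaluation's dimensions, then take an exponentially weighted mean so recent evaluations count more. The function names and the `decay` parameter are illustrative assumptions.

```python
from typing import Dict, Sequence


def dimension_mean(dimensions: Dict[str, float]) -> float:
    """Collapse one evaluation's dimensions into a single value (unweighted mean)."""
    return sum(dimensions.values()) / len(dimensions)


def reputation_score(history: Sequence[Dict[str, float]], decay: float = 0.9) -> float:
    """Exponentially weighted mean over evaluations ordered oldest to newest.

    The newest evaluation gets weight 1, the one before it `decay`,
    the one before that `decay ** 2`, and so on.
    This is an assumed scheme, not RepKit's documented algorithm.
    """
    if not history:
        raise ValueError("history is empty")
    weighted_sum = 0.0
    weight_total = 0.0
    weight = 1.0
    for dims in reversed(history):  # walk from newest to oldest
        weighted_sum += weight * dimension_mean(dims)
        weight_total += weight
        weight *= decay
    return weighted_sum / weight_total
```

A lower `decay` makes the score react faster to recent behavior; a value near 1.0 approaches a plain average over the whole history.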
Webhooks & Alerts
Get notified when reputation drops or anomalies are detected.
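The alerting rules RepKit applies are not specified here. As a sketch of what "reputation drops" could mean in practice, the function below compares the mean of the most recent scores against the mean of everything before them. The function name, the `window` size, and the `max_drop` threshold are all hypothetical.

```python
from statistics import mean
from typing import Sequence


def reputation_dropped(scores: Sequence[float], window: int = 5, max_drop: float = 0.1) -> bool:
    """Return True when the mean of the last `window` scores is more than
    `max_drop` below the mean of all earlier scores.

    Illustrative rule only; scores are ordered oldest to newest.
    """
    if len(scores) <= window:
        return False  # not enough history to form a baseline
    baseline = mean(scores[:-window])
    recent = mean(scores[-window:])
    return baseline - recent > max_drop
```

A check like this could run in your own system whenever a new score arrives, with the result used to decide whether to notify an operator.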
Access Control
Use reputation scores to gate agent capabilities and permissions.
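Because enforcement stays in your own systems, gating logic is something you would write yourself. A minimal sketch, assuming a hypothetical policy table that maps each capability to a minimum reputation score (the capability names and thresholds are invented for illustration):

```python
from typing import Dict, Set

# Hypothetical policy: minimum reputation score required for each capability.
CAPABILITY_THRESHOLDS: Dict[str, float] = {
    "read_docs": 0.50,
    "send_email": 0.80,
    "execute_payment": 0.95,
}


def allowed_capabilities(
    score: float, thresholds: Dict[str, float] = CAPABILITY_THRESHOLDS
) -> Set[str]:
    """Return the capabilities whose threshold the score meets or exceeds."""
    return {name for name, minimum in thresholds.items() if score >= minimum}
```

The `score` argument could be the `rep.score` value returned by `rk.get_reputation(...)` in the quickstart.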
Multi-Agent Support
Track reputation across agent fleets. Compare and route based on scores.
Deploy Your Way
Run RepKit in the cloud for quick evaluation or deploy locally for sensitive data and air-gapped environments.
Cloud SaaS
Recommended
- Instant setup, no infrastructure
- Managed scenario library
- Automatic updates and improvements
- SOC 2 compliant infrastructure
- Best for: Most teams, fast iteration
Self-Hosted
Enterprise
- Full control over data
- Air-gapped deployment option
- Custom scenario creation
- Integration with internal CI/CD
- Best for: Regulated industries, sensitive data
Built for Real Workflows
RepKit integrates with how you already build and deploy agents.
LLM Evaluator Pipelines
Log evaluation inputs from LLM evaluators directly to RepKit. Track how evaluation signals evolve over time.
Learn about LLM-as-Judge
Agent-to-Agent Transactions
Log evaluations for agent-to-agent handoffs and delegation. Track trust signals across interactions and time.
Learn about A2A Protocol
Multi-Agent Routing
Query reputation to inform routing in your orchestration system. Use historical evidence to guide delegation.
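A routing decision based on reputation can be as simple as picking the best-scoring eligible agent. The sketch below assumes you have already collected scores into a dictionary keyed by agent ID; `pick_agent` and `min_score` are hypothetical names, and the tie-breaking rule is an arbitrary choice made for determinism.

```python
from typing import Dict, Optional


def pick_agent(scores: Dict[str, float], min_score: float = 0.0) -> Optional[str]:
    """Return the agent ID with the highest score at or above `min_score`.

    Ties are broken alphabetically by agent ID. Returns None when no agent qualifies.
    """
    eligible = {agent: score for agent, score in scores.items() if score >= min_score}
    if not eligible:
        return None
    # Sorting key: highest score first, then agent ID for deterministic ties.
    return min(eligible, key=lambda agent: (-eligible[agent], agent))
```

The `scores` dictionary could be filled by calling `rk.get_reputation(agent_id).score` for each candidate, as in the quickstart. Returning `None` leaves the fallback behavior (queue, escalate, reject) to your orchestration layer.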
Learn about Routing
Get Early Access
Be first to build with RepKit. Get early access.
How RepKit Works (and What It Doesn’t)
Versioned reputation, clear principles, and explicit boundaries—so you know exactly what signals you’re getting.
Version‑Aware Reputation
Scores are tied to agent identifiers and implementation versions for continuity across updates.
- Associate evaluations with versioned agent identities
- Compare performance across versions for canaries and rollbacks
- Preserve historical evidence to understand long‑term behavior
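How RepKit attaches versions to evaluations is not shown on this page. As a sketch of the canary comparison described above, the code below assumes evaluations are available locally as `(agent, version, score)` tuples; the function names and the `tolerance` parameter are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

VersionKey = Tuple[str, str]  # (agent ID, version label)


def mean_score_by_version(
    evaluations: Iterable[Tuple[str, str, float]]
) -> Dict[VersionKey, float]:
    """Group scores by (agent, version) and average each group."""
    buckets: Dict[VersionKey, List[float]] = defaultdict(list)
    for agent, version, score in evaluations:
        buckets[(agent, version)].append(score)
    return {key: mean(values) for key, values in buckets.items()}


def should_roll_back(
    by_version: Dict[VersionKey, float],
    agent: str,
    stable: str,
    canary: str,
    tolerance: float = 0.05,
) -> bool:
    """True when the canary's mean score trails the stable version by more than `tolerance`."""
    return by_version[(agent, canary)] < by_version[(agent, stable)] - tolerance
```

Keeping scores keyed by version, rather than overwriting a single per-agent number, is what makes this kind of before-and-after comparison possible.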
Design Principles
- Evidence Over Assertions. RepKit aggregates structured evaluation inputs over time instead of relying on single‑run judgments.
- Reputation Over Scores. Signals accumulate across interactions and versions, producing durable reputation rather than point‑in‑time grades.
- Signals, Not Decisions. RepKit computes reputation signals; enforcement remains under customer control.
What RepKit Does Not Do
RepKit records evaluations, computes reputation, and exposes results via API. Enforcement remains with your systems.
- Does not mandate a specific judge model or evaluator
- Does not require a routing framework or agent runtime
- Does not enforce decisions — you remain in control
- Does not make claims about “truth”; it reports evaluation signals and evidence
Simple, Usage-Based Pricing
Pay for evaluations logged, not seats or features. Free tier for small projects. RepKit provides evaluation infrastructure and scoring APIs. Enforcement logic remains under customer control.
FAQ
Is RepKit a single embodiment of a patent‑pending system?
Yes. RepKit represents one embodiment of our patent‑pending approach to real‑time evaluation and reputation. The description here is illustrative and does not limit the scope of current or future claims, including continuations.
Does RepKit make decisions or enforce routing?
No. RepKit computes reputation signals and exposes them via API. Enforcement and decision logic remain under your control and run in your systems.
Where do evaluation inputs come from?
Evaluation inputs can come from LLM evaluators, human reviewers, or system analyzers. RepKit normalizes and aggregates these signals over time.
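To illustrate what normalizing signals from different sources can involve, here is a minimal sketch that rescales raw scores onto a common 0.0 to 1.0 range. The source names and their native scales are assumptions made for this example; RepKit's own normalization is not described here.

```python
from typing import Dict, Tuple

# Hypothetical native (min, max) scale for each evaluator source.
SOURCE_SCALES: Dict[str, Tuple[float, float]] = {
    "llm_judge": (0.0, 1.0),          # e.g. a judge model that already emits 0..1
    "human_review": (1.0, 5.0),       # e.g. a 1-to-5 rating form
    "system_analyzer": (0.0, 100.0),  # e.g. a percentage from an automated check
}


def normalize(source: str, raw: float) -> float:
    """Linearly rescale a raw score from its source's native scale to [0.0, 1.0]."""
    low, high = SOURCE_SCALES[source]
    if not low <= raw <= high:
        raise ValueError(f"{source} score {raw} is outside [{low}, {high}]")
    return (raw - low) / (high - low)
```

Once every input shares one scale, values from different evaluators can sit side by side in the same `dimensions` dictionary.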