AI Agent Evaluation Glossary

Key terms and concepts for understanding AI agent evaluation, reputation, and trust.

112 terms

A

Evaluation
A/B Testing

Comparing two versions of an agent or system by randomly assigning users to each version and measuring outcome differences.
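
As an illustration, one common way to implement the random assignment is to hash user IDs deterministically, so each user always lands in the same variant across sessions. A minimal Python sketch, assuming nothing beyond the standard library:

    import hashlib

    def assign_variant(user_id: str, variants=("A", "B")) -> str:
        """Deterministically bucket a user so they always see the same variant."""
        digest = hashlib.sha256(user_id.encode()).hexdigest()
        return variants[int(digest, 16) % len(variants)]

    print(assign_variant("user-42"))  # stable assignment; compare outcome metrics per bucket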

Governance
AI Governance

The frameworks, policies, and processes for managing AI systems throughout their lifecycle.

Evaluation
Ablation Study

Systematic removal or modification of system components to understand their contribution to overall performance.

Governance
Access Control

Mechanisms that determine what resources, tools, or actions an agent is permitted to use.

Failures
Adversarial Input

Carefully crafted inputs designed to cause AI systems to make mistakes they wouldn't make on normal inputs.

Agents
Agent

An AI system that can perceive its environment, make decisions, and take actions to achieve goals with some degree of autonomy.

Protocols
Agent Card

A standardized description of an agent's capabilities, limitations, and intended use cases.

Agents
Agent Communication

The protocols and formats by which agents exchange information, requests, and results.

Agents
Agent Handoff

The transfer of a conversation or task from one agent to another, including relevant context.

Agents
Agent Loop

The iterative cycle where an agent observes state, decides on actions, executes them, and repeats until task completion.
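
A minimal sketch of the cycle in Python; observe, decide, and execute are hypothetical stand-ins for environment and model calls, not any particular framework's API:

    def run_agent(task, observe, decide, execute, max_steps: int = 10):
        """Minimal observe-decide-act loop; callers supply the callables."""
        for _ in range(max_steps):
            state = observe()             # gather the current environment state
            action = decide(task, state)  # ask the model for the next action
            if action is None:            # the model signals task completion
                return state
            execute(action)               # apply the action to the environment
        raise TimeoutError("agent loop hit max_steps without finishing")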

Protocols
Agent-to-Agent Protocol

Standardized communication formats and patterns for agents to interact with each other.

Agents
Agentic AI

AI systems designed to take autonomous actions toward goals, as opposed to purely responding to prompts.

Trust
Alignment

The degree to which an AI system's goals, behaviors, and values match those intended by its designers and users.

Agents
Anthropic

An AI safety company that develops the Claude family of AI assistants and conducts research on AI alignment.

Agents
Attention Mechanism

The core innovation in transformers that allows models to weigh the relevance of different parts of the input.

Governance
Audit Trail

A chronological record of agent actions, decisions, and their outcomes for accountability and debugging.

Agents
Autonomous Agent

An AI agent capable of operating independently over extended periods to achieve complex goals with minimal human intervention.

B

Evaluation
Benchmark

A standardized test suite designed to measure specific capabilities of AI systems, enabling comparison across models and versions.

C

Evaluation
Calibration

The alignment between an agent's expressed confidence and its actual accuracy—a well-calibrated agent is right 80% of the time when it says it's 80% confident.
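
To make the idea concrete, this sketch bins logged (confidence, correct) pairs and compares each bin's mean confidence with its empirical accuracy; the records are made up:

    from collections import defaultdict

    def calibration_table(records: list[tuple[float, bool]], bins: int = 10):
        """Group (confidence, was_correct) records into bins and compare
        each bin's average confidence with its empirical accuracy."""
        grouped = defaultdict(list)
        for confidence, correct in records:
            grouped[min(int(confidence * bins), bins - 1)].append((confidence, correct))
        for b in sorted(grouped):
            pairs = grouped[b]
            avg_conf = sum(c for c, _ in pairs) / len(pairs)
            accuracy = sum(ok for _, ok in pairs) / len(pairs)
            print(f"bin {b}: mean confidence {avg_conf:.2f}, accuracy {accuracy:.2f}")

    # Hypothetical log: a well-calibrated agent's 0.8-confidence answers
    # should be right about 80% of the time.
    calibration_table([(0.8, True), (0.8, True), (0.8, True), (0.8, True), (0.8, False)])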

Governance
Canary Deployment

Rolling out agent changes to a small subset of users first, then expanding gradually once the new version proves stable.

Protocols
Capability Discovery

The process by which one agent learns what another agent can do, enabling dynamic collaboration.

Evaluation
Capability Elicitation

Techniques to determine what an AI system can actually do, potentially uncovering hidden capabilities.

Failures
Cascading Failure

When an error in one agent or component triggers failures in dependent agents, amplifying the impact.

Failures
Catastrophic Forgetting

When an agent loses previously learned capabilities after being trained on new tasks or data.

Agents
Chain-of-Thought

A prompting technique where the model explicitly shows intermediate reasoning steps before reaching a conclusion.

Agents
Compound AI System

A system combining multiple AI models, retrievers, tools, and logic into an integrated application.

Agents
Consensus

Agreement among multiple agents on a decision, result, or state, often required for collective action.

Evaluation
Consensus Evaluation

An evaluation pattern where multiple judges (human or AI) must agree before a result is accepted.

Trust
Constitutional AI

An approach to training AI systems to follow a set of principles (a "constitution") for safer behavior.

Governance
Containment

Limiting an agent's ability to affect systems and data beyond what is necessary for its task.

Failures
Context Confusion

When an agent misinterprets which parts of its context apply to the current task, mixing up instructions or data.

Agents
Context Window

The maximum amount of text (measured in tokens) that an LLM can process in a single interaction.

Governance
Continuous Monitoring

Ongoing observation of agent behavior and performance to detect degradation, drift, or anomalies.

Agents
Coordinator Agent

An agent responsible for assigning tasks, managing workflow, and aggregating results from other agents.

Evaluation
Cost Per Task

The total computational and API costs required to complete a single agent task.

D

Failures
Data Leakage

When an agent inadvertently exposes sensitive information from its training data, context, or connected systems.

Failures
Deceptive Alignment

A hypothetical failure mode where an agent behaves well during training/testing but pursues different goals when deployed.

Agents
Delegation

When one agent assigns a task to another agent, transferring responsibility for completion.

Failures
Drift

Gradual degradation of agent performance over time due to changes in data, environment, or the agent itself.

E

Agents
Embedding

A dense vector representation of text that captures semantic meaning, enabling similarity comparisons.
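
A minimal illustration of the similarity comparison, using cosine similarity over toy 3-dimensional vectors; real embedding models emit hundreds or thousands of dimensions:

    import math

    def cosine_similarity(a: list[float], b: list[float]) -> float:
        """Similarity of two embedding vectors: 1.0 = same direction, 0 = unrelated."""
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

    print(cosine_similarity([0.2, 0.9, 0.1], [0.25, 0.85, 0.05]))  # ~0.996: very similar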

Agents
Emergent Behavior

Capabilities or behaviors that appear in AI systems at scale without being explicitly programmed.

Evaluation
Evaluation

A single assessment event where an agent's performance is measured against specific criteria.

Trust
Explainability

The ability to understand and communicate why an agent made a particular decision or produced a specific output.

F

Evaluation
F1 Score

The harmonic mean of precision and recall, providing a single metric that balances both concerns.
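
A worked sketch computing all three from hypothetical true-positive, false-positive, and false-negative counts (see also the Precision and Recall entries below):

    def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
        """Compute precision, recall, and their harmonic mean (F1)."""
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        return precision, recall, f1

    # Hypothetical counts from one evaluation run.
    print(precision_recall_f1(tp=80, fp=20, fn=40))  # (0.8, ~0.667, ~0.727)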

Agents
Few-Shot Learning

Providing a small number of examples in the prompt to demonstrate desired behavior.

Agents
Fine-Tuning

Additional training of a pre-trained model on domain-specific data to improve performance on particular tasks.

Agents
Foundation Model

A large AI model trained on broad data that can be adapted to many downstream tasks.

Agents
Function Calling

A structured mechanism for LLMs to invoke predefined functions with properly formatted arguments.
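
A schematic example: a tool described in the JSON-Schema style most function-calling APIs use, plus a dispatcher that parses the model's structured call. The schema shape and field names are illustrative, not any vendor's exact format:

    import json

    # Hypothetical tool schema in the JSON-Schema style.
    get_weather_schema = {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }

    def dispatch(model_output: str, tools: dict):
        """Parse a structured function call emitted by the model and invoke it."""
        call = json.loads(model_output)
        return tools[call["name"]](**call["arguments"])

    tools = {"get_weather": lambda city: f"(stub) weather for {city}"}
    print(dispatch('{"name": "get_weather", "arguments": {"city": "Oslo"}}', tools))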

G

Failures
Goal Misgeneralization

When an agent learns to pursue a goal that worked in training but fails to transfer correctly to deployment.

Evaluation
Ground Truth

The verified correct answer or outcome against which agent outputs are compared during evaluation.

Trust
Grounding

Connecting AI outputs to verifiable sources of truth to reduce hallucination and increase accuracy.

Governance
Guardrails

Safety constraints that prevent agents from taking harmful or unauthorized actions, even if instructed to do so.

H

Failures
Hallucination

When an AI generates plausible-sounding but factually incorrect or fabricated information.

Evaluation
Held-Out Test Set

Evaluation data kept separate from training to assess how well an agent generalizes to unseen examples.

Governance
Human-in-the-Loop

A system design where human oversight is required at critical decision points in an agent workflow.

I

Agents
In-Context Learning

The ability of LLMs to learn from examples provided in the prompt without updating model weights.

Governance
Incident Response

The process of detecting, investigating, and recovering from agent failures or harmful behaviors.

Agents
Inference

The process of running a trained model to generate outputs from inputs.

Agents
Inference Cost

The computational and financial expense of running an AI model to generate outputs.

Evaluation
Inter-Rater Reliability

The degree to which different human evaluators agree when assessing the same agent outputs.
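
One standard way to quantify this is Cohen's kappa, which corrects raw agreement for chance; a sketch over hypothetical pass/fail judgments:

    def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
        """Cohen's kappa: agreement between two raters, corrected for chance."""
        n = len(rater_a)
        observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
        labels = set(rater_a) | set(rater_b)
        expected = sum(
            (rater_a.count(label) / n) * (rater_b.count(label) / n) for label in labels
        )
        return (observed - expected) / (1 - expected)

    # Two raters grading the same five agent outputs.
    print(cohens_kappa(["pass", "pass", "fail", "pass", "fail"],
                       ["pass", "fail", "fail", "pass", "fail"]))  # ~0.62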

J

Failures
Jailbreak

A prompt technique designed to bypass an AI system's safety measures or content policies.

L

Evaluation
LLM-as-Judge

Using a large language model to evaluate another agent's outputs, replacing or supplementing human evaluation.
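
A minimal sketch of the pattern; JUDGE_PROMPT and call_model are hypothetical placeholders for a real grading rubric and LLM API:

    JUDGE_PROMPT = """You are grading an AI agent's answer.
    Question: {question}
    Agent answer: {answer}
    Reference answer: {reference}
    Reply with exactly one word: PASS or FAIL."""

    def judge(question: str, answer: str, reference: str, call_model) -> bool:
        """Ask a judge model for a verdict and parse it into a boolean."""
        verdict = call_model(JUDGE_PROMPT.format(
            question=question, answer=answer, reference=reference))
        return verdict.strip().upper().startswith("PASS")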

Agents
Large Language Model

A neural network trained on vast text data that can generate, understand, and reason about natural language.

Evaluation
Latency

The time delay between sending a request to an agent and receiving its response.

Agents
Latent Space

The internal representation space where models encode meaning, enabling operations like similarity search.

M

Agents
Memory

Mechanisms that allow agents to retain and recall information across interactions or within long tasks.

Failures
Mode Collapse

When an agent converges to producing a limited set of repetitive outputs regardless of input variety.

Protocols
Model Context Protocol

A standard protocol for providing context and tools to AI models in a consistent, interoperable way.

Governance
Model Risk Management

Systematic processes for identifying, measuring, and mitigating risks from AI/ML models.

Agents
Multi-Agent System

A system composed of multiple interacting agents that collaborate, compete, or coordinate to accomplish tasks.

O

Agents
OpenAI

An AI research company that created ChatGPT, GPT-4, and pioneered many modern AI agent capabilities.

Agents
Orchestration

Coordinating multiple agents, tools, or processing steps to accomplish complex tasks.

P

Evaluation
Pass@k

An evaluation metric measuring the probability that at least one of k generated solutions is correct.
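
A sketch of the unbiased estimator popularized by code-generation benchmarks: generate n samples, count c correct, and estimate the chance that a draw of k contains at least one correct solution:

    from math import comb

    def pass_at_k(n: int, c: int, k: int) -> float:
        """Unbiased pass@k estimate: 1 - C(n-c, k) / C(n, k)."""
        if n - c < k:
            return 1.0  # every draw of k must include a correct sample
        return 1.0 - comb(n - c, k) / comb(n, k)

    print(pass_at_k(n=20, c=5, k=3))  # ~0.60 with 5 correct out of 20 samples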

Agents
Planning

The agent capability to decompose complex goals into sequences of achievable sub-tasks.

Evaluation
Precision

The proportion of positive predictions that are actually correct—of all the things the agent said were true, how many actually were.

Agents
Prompt Engineering

The practice of designing and optimizing inputs to LLMs to elicit desired behaviors and outputs.

Failures
Prompt Injection

An attack where malicious instructions are embedded in user input to override or manipulate an agent's intended behavior.

Governance
Prompt Injection Defense

Techniques and architectures designed to prevent prompt injection attacks from succeeding.

R

Trust
RLHF

Reinforcement Learning from Human Feedback—training AI models using human preferences as the reward signal.

Governance
Rate Limiting

Controlling how frequently agents can perform actions or consume resources to prevent abuse or runaway costs.
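
A classic implementation is the token bucket, which allows short bursts up to a capacity while enforcing a steady average rate; a self-contained sketch:

    import time

    class TokenBucket:
        """Token-bucket rate limiter: refills at `rate` tokens/second up to
        `capacity`; each action spends one token."""
        def __init__(self, rate: float, capacity: float):
            self.rate, self.capacity = rate, capacity
            self.tokens, self.last = capacity, time.monotonic()

        def allow(self) -> bool:
            now = time.monotonic()
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False

    bucket = TokenBucket(rate=2.0, capacity=5.0)  # ~2 agent actions/second, bursts of 5
    print([bucket.allow() for _ in range(7)])     # typically five True, then False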

Agents
ReAct

A prompting framework combining Reasoning and Acting, where agents alternate between thinking about what to do and taking actions.
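
A schematic trace in the Thought/Action/Observation format the ReAct paper popularized; the tool and facts here are invented for illustration:

    Thought: I need the population of Oslo before I can answer.
    Action: search("Oslo population")
    Observation: Oslo has roughly 700,000 residents.
    Thought: That is enough to answer.
    Final Answer: About 700,000 people.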

Agents
Reasoning

The ability of AI systems to draw logical conclusions, solve problems, and think through multi-step challenges.

Evaluation
Recall

The proportion of actual positives that were correctly identified—of all the things that were true, how many did the agent find.

Evaluation
Red Teaming

Adversarial testing where evaluators actively try to make an AI system fail, misbehave, or produce harmful outputs.

Agents
Reflection

The practice of having an agent review and critique its own outputs to identify errors or improvements.

Trust
Reputation

The accumulated picture of an agent's performance across many scenarios over time, based on verifiable evaluation history.

Governance
Responsible AI

Practices and principles for developing and deploying AI systems that are safe, fair, transparent, and beneficial.

Agents
Retrieval-Augmented Generation

An architecture that enhances LLM responses by first retrieving relevant information from external knowledge sources.
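
A minimal sketch of the retrieve-then-generate flow; embed, search, and call_model are hypothetical stand-ins for an embedding model, a vector index, and an LLM API:

    def rag_answer(question: str, embed, search, call_model, k: int = 3) -> str:
        """Retrieve top-k passages for the question, then generate from them."""
        query_vector = embed(question)
        passages = search(query_vector, top_k=k)  # retrieve relevant context
        context = "\n\n".join(passages)
        return call_model(
            f"Answer using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}"
        )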

Failures
Reward Hacking

When an agent finds unintended ways to maximize its reward signal without achieving the underlying goal.

Trust
Reward Model

A model trained to predict human preferences, used to guide AI training via reinforcement learning.

Agents
Routing

The process of directing tasks to appropriate agents based on task requirements and agent capabilities.

S

Trust
Safety Layer

A component specifically designed to detect and prevent harmful agent behaviors before they affect users or systems.

Failures
Sandbagging

When an AI system deliberately underperforms on evaluations while retaining hidden capabilities.

Agents
Scaling Laws

Empirical relationships showing how AI capabilities improve predictably with increased compute, data, or parameters.

Governance
Shadow Mode

Running a new agent version alongside production without affecting users, to validate behavior before full deployment.

Agents
Specialist Agent

An agent optimized for a specific task type or domain, trading generality for expertise.

Failures
Specification Gaming

When an agent finds unintended ways to satisfy its objective that violate the spirit of the task.

Agents
Swarm Intelligence

Collective behavior emerging from many simple agents following local rules, without centralized control.

Failures
Sycophancy

A failure mode where an agent agrees with or validates user inputs even when incorrect, prioritizing approval over accuracy.

Agents
System Prompt

Initial instructions that define an agent's role, capabilities, constraints, and behavioral guidelines.

T

Agents
Temperature

A parameter controlling randomness in LLM outputs—higher temperature means more varied/creative responses.
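
Mechanically, temperature divides the logits before the softmax; this small sketch shows how a low value sharpens the output distribution and a high value flattens it:

    import math

    def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
        """Lower temperature sharpens the distribution; higher flattens it."""
        scaled = [x / temperature for x in logits]
        m = max(scaled)  # subtract the max for numerical stability
        exps = [math.exp(x - m) for x in scaled]
        total = sum(exps)
        return [e / total for e in exps]

    logits = [2.0, 1.0, 0.1]
    print(softmax_with_temperature(logits, 0.5))  # peaked: strongly favors the top token
    print(softmax_with_temperature(logits, 2.0))  # flat: spreads probability out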

Evaluation
Throughput

The number of requests or tasks an agent system can process per unit time.

Agents
Token

The basic unit of text processing for LLMs—roughly 4 characters or 0.75 words in English.

Agents
Tokenizer

The component that converts text into tokens that a language model can process.

Failures
Tool Misuse

When an agent uses available tools incorrectly, calling wrong functions, passing bad arguments, or using tools unnecessarily.

Agents
Tool Use

The ability of an agent to invoke external functions, APIs, or services to extend its capabilities beyond text generation.

Agents
Transformer

The neural network architecture underlying modern LLMs, based on self-attention mechanisms.

Trust
Trust Signal

Observable evidence that influences trust decisions about an agent's reliability or capability.

U

Trust
Uncertainty Quantification

Methods for measuring and communicating how confident an agent is in its outputs.

V

Agents
Vector Database

A database optimized for storing and querying high-dimensional vectors, typically embeddings.

Governance
Versioning

Tracking and managing different versions of agents, models, and prompts to enable rollback and comparison.

Z

Agents
Zero-Shot Learning

Performing tasks without any task-specific examples, relying only on instructions and pre-trained knowledge.