AI Agent Evaluation Glossary
Key terms and concepts for understanding AI agent evaluation, reputation, and trust.
107 terms
A
A/B Testing
Comparing two versions of an agent or system by randomly assigning users to each version and measuring outcome differences.
AI Governance
The frameworks, policies, and processes for managing AI systems throughout their lifecycle.
Ablation Study
Systematic removal or modification of system components to understand their contribution to overall performance.
Access Control
Mechanisms that determine what resources, tools, or actions an agent is permitted to use.
Adversarial Input
Carefully crafted inputs designed to cause AI systems to make mistakes they wouldn't make on normal inputs.
Agent
An AI system that can perceive its environment, make decisions, and take actions to achieve goals with some degree of autonomy.
Agent Card
A standardized description of an agent's capabilities, limitations, and intended use cases.
Agent Communication
The protocols and formats by which agents exchange information, requests, and results.
Agent Handoff
The transfer of a conversation or task from one agent to another, including relevant context.
Agent Loop
The iterative cycle where an agent observes state, decides on actions, executes them, and repeats until task completion.
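A minimal sketch of the loop in Python, assuming hypothetical `call_model` and `run_tool` helpers rather than any real API:
```python
# Agent loop sketch: observe -> decide -> act, repeated until done.
# call_model and run_tool are hypothetical stand-ins for a model client
# and a tool executor.
def agent_loop(task, call_model, run_tool, max_steps=10):
    history = [f"Task: {task}"]
    for _ in range(max_steps):                  # step cap prevents runaway loops
        decision = call_model("\n".join(history))       # decide the next action
        if decision.get("done"):                        # model signals completion
            return decision["answer"]
        result = run_tool(decision["tool"], decision["args"])  # act
        history.append(f"Observation: {result}")               # observe
    return "Stopped: step budget exhausted"
```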
Agent-to-Agent Protocol
Standardized communication formats and patterns for agents to interact with each other.
Agentic AI
AI systems designed to take autonomous actions toward goals, as opposed to purely responding to prompts.
Alignment
The degree to which an AI system's goals, behaviors, and values match those intended by its designers and users.
Anthropic
An AI safety company that develops the Claude family of AI assistants and conducts research on AI alignment.
Attention Mechanism
The core innovation in transformers that allows models to weigh the relevance of different parts of the input.
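As a rough illustration, scaled dot-product attention in NumPy; real transformers add multiple heads, masking, and learned projections on top of this core operation:
```python
import numpy as np

def attention(Q, K, V):
    # Q: (n_queries, d), K and V: (n_keys, d) -- softmax(Q K^T / sqrt(d)) V
    scores = Q @ K.T / np.sqrt(Q.shape[-1])         # relevance of each key to each query
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                              # weighted mix of values
```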
Audit Trail
A chronological record of agent actions, decisions, and their outcomes for accountability and debugging.
Autonomous Agent
An AI agent capable of operating independently over extended periods to achieve complex goals with minimal human intervention.
B
Benchmark
A standardized test suite designed to measure specific capabilities of AI systems, enabling comparison across models and versions.
C
Calibration
The alignment between an agent's expressed confidence and its actual accuracy—a well-calibrated agent is right 80% of the time when it says it's 80% confident.
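One simple way to check this is to bucket predictions by stated confidence and compare each bucket's average confidence to its actual accuracy; a sketch, assuming a hypothetical list of (confidence, was_correct) pairs:
```python
def calibration_report(preds, n_bins=10):
    # preds: hypothetical list of (confidence in [0, 1], was_correct) pairs
    bins = [[] for _ in range(n_bins)]
    for conf, correct in preds:
        bins[min(int(conf * n_bins), n_bins - 1)].append((conf, correct))
    for i, bucket in enumerate(bins):
        if bucket:
            avg_conf = sum(c for c, _ in bucket) / len(bucket)
            accuracy = sum(ok for _, ok in bucket) / len(bucket)
            # well calibrated when these two numbers track each other
            print(f"bin {i}: confidence {avg_conf:.2f}, accuracy {accuracy:.2f}")
```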
Canary Deployment
Gradually rolling out agent changes to a small subset of users before full deployment.
Capability Discovery
The process by which one agent learns what another agent can do, enabling dynamic collaboration.
Capability Elicitation
Techniques to determine what an AI system can actually do, potentially uncovering hidden capabilities.
Cascading Failure
When an error in one agent or component triggers failures in dependent agents, amplifying the impact.
Catastrophic Forgetting
When an agent loses previously learned capabilities after being trained on new tasks or data.
Chain-of-Thought
A prompting technique where the model explicitly shows intermediate reasoning steps before reaching a conclusion.
Compound AI System
A system combining multiple AI models, retrievers, tools, and logic into an integrated application.
Consensus
Agreement among multiple agents on a decision, result, or state, often required for collective action.
Consensus Evaluation
An evaluation pattern where multiple judges (human or AI) must agree before a result is accepted.
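A minimal majority-vote sketch; the two-thirds threshold and the escalate-on-disagreement fallback are illustrative choices, not a standard:
```python
from collections import Counter

def consensus(verdicts, threshold=2 / 3):
    # verdicts: e.g. ["pass", "pass", "fail"] from three judges
    winner, count = Counter(verdicts).most_common(1)[0]
    if count / len(verdicts) >= threshold:
        return winner    # quorum reached; accept the result
    return None          # no consensus; escalate (e.g. to a human reviewer)
```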
Constitutional AI
An approach to training AI systems to follow a set of principles (a "constitution") for safer behavior.
Containment
Limiting an agent's ability to affect systems and data beyond what is necessary for its task.
Context Confusion
When an agent misinterprets which parts of its context apply to the current task, mixing up instructions or data.
Context Window
The maximum amount of text (measured in tokens) that an LLM can process in a single interaction.
Continuous Monitoring
Ongoing observation of agent behavior and performance to detect degradation, drift, or anomalies.
Coordinator Agent
An agent responsible for assigning tasks, managing workflow, and aggregating results from other agents.
Cost Per Task
The total computational and API costs required to complete a single agent task.
D
Data Leakage
When an agent inadvertently exposes sensitive information from its training data, context, or connected systems.
Deceptive Alignment
A hypothetical failure mode where an agent behaves well during training/testing but pursues different goals when deployed.
Delegation
When one agent assigns a task to another agent, transferring responsibility for completion.
Drift
Gradual degradation of agent performance over time due to changes in data, environment, or the agent itself.
E
Embedding
A dense vector representation of text that captures semantic meaning, enabling similarity comparisons.
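Similarity between embeddings is typically measured with cosine similarity; a NumPy sketch:
```python
import numpy as np

def cosine_similarity(a, b):
    # 1.0 = same direction (very similar), ~0 = unrelated
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```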
Emergent Behavior
Capabilities or behaviors that appear in AI systems at scale without being explicitly programmed.
Evaluation
A single assessment event where an agent's performance is measured against specific criteria.
Explainability
The ability to understand and communicate why an agent made a particular decision or produced a specific output.
F
F1 Score
The harmonic mean of precision and recall, providing a single metric that balances both concerns.
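Computed from raw counts (see Precision and Recall below); a sketch without the divide-by-zero guards production code would need:
```python
def f1_score(tp, fp, fn):
    # tp/fp/fn: true positives, false positives, false negatives
    precision = tp / (tp + fp)   # of flagged items, how many were right
    recall = tp / (tp + fn)      # of true items, how many were found
    return 2 * precision * recall / (precision + recall)

print(f1_score(tp=80, fp=20, fn=40))  # precision 0.80, recall ~0.67 -> ~0.73
```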
Few-Shot Learning
Providing a small number of examples in the prompt to demonstrate desired behavior.
Fine-Tuning
Additional training of a pre-trained model on domain-specific data to improve performance on particular tasks.
Foundation Model
A large AI model trained on broad data that can be adapted to many downstream tasks.
Function Calling
A structured mechanism for LLMs to invoke predefined functions with properly formatted arguments.
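A sketch of the pattern; the schema shape here is illustrative, not any particular vendor's format:
```python
import json

# The application advertises a tool schema to the model...
weather_tool = {
    "name": "get_weather",
    "description": "Look up current weather for a city.",
    "parameters": {"city": {"type": "string"}},
}

# ...and the model replies with a structured call, which the application
# validates before executing.
model_output = '{"name": "get_weather", "arguments": {"city": "Oslo"}}'
call = json.loads(model_output)
assert call["name"] == weather_tool["name"]
```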
G
Goal Misgeneralization
When an agent learns to pursue a goal that worked in training but fails to transfer correctly to deployment.
Ground Truth
The verified correct answer or outcome against which agent outputs are compared during evaluation.
Grounding
Connecting AI outputs to verifiable sources of truth to reduce hallucination and increase accuracy.
Guardrails
Safety constraints that prevent agents from taking harmful or unauthorized actions, even if instructed to do so.
H
Hallucination
When an AI generates plausible-sounding but factually incorrect or fabricated information.
Held-Out Test Set
Evaluation data kept separate from training to assess how well an agent generalizes to unseen examples.
Human-in-the-Loop
A system design where human oversight is required at critical decision points in an agent workflow.
I
In-Context Learning
The ability of LLMs to learn from examples provided in the prompt without updating model weights.
Incident Response
The process of detecting, investigating, and recovering from agent failures or harmful behaviors.
Inference Cost
The computational and financial expense of running an AI model to generate outputs.
Inter-Rater Reliability
The degree to which different human evaluators agree when assessing the same agent outputs.
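Often reported as Cohen's kappa, which corrects raw agreement for chance; a two-rater sketch (with no guard for the degenerate case where chance agreement is 1):
```python
from collections import Counter

def cohens_kappa(a, b):
    # a, b: label lists from two raters, e.g. ["pass", "fail", ...]
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n     # observed agreement
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[k] * cb[k] for k in ca) / n ** 2   # agreement expected by chance
    return (p_o - p_e) / (1 - p_e)
```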
J
Jailbreak
A prompt technique designed to bypass an AI system's safety measures or content policies.
L
LLM-as-Judge
Using a large language model to evaluate another agent's outputs, replacing or supplementing human evaluation.
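A minimal sketch, with `call_model` as a hypothetical stand-in for whatever client is in use; real judge prompts are usually far more detailed rubrics:
```python
JUDGE_PROMPT = """You are an evaluator. Score the answer 1-5 for factual
accuracy against the reference. Reply with only the number.

Question: {question}
Reference: {reference}
Answer: {answer}"""

def judge(call_model, question, reference, answer):
    reply = call_model(JUDGE_PROMPT.format(
        question=question, reference=reference, answer=answer))
    return int(reply.strip())   # production code should validate and retry
```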
Large Language Model
A neural network trained on vast text data that can generate, understand, and reason about natural language.
Latent Space
The internal representation space where models encode meaning, enabling operations like similarity search.
M
Memory
Mechanisms that allow agents to retain and recall information across interactions or within long tasks.
Mode Collapse
When an agent converges to producing a limited set of repetitive outputs regardless of input variety.
Model Context Protocol
A standard protocol for providing context and tools to AI models in a consistent, interoperable way.
Model Risk Management
Systematic processes for identifying, measuring, and mitigating risks from AI/ML models.
Multi-Agent System
A system composed of multiple interacting agents that collaborate, compete, or coordinate to accomplish tasks.
O
OpenAI
An AI research company that created ChatGPT and GPT-4 and pioneered many modern AI agent capabilities.
Orchestration
Coordinating multiple agents, tools, or processing steps to accomplish complex tasks.
P
Pass@k
Evaluation metric measuring the probability that at least one of k generated solutions is correct.
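The unbiased estimator popularized by the HumanEval paper: from n samples of which c are correct, the probability that a random draw of k contains at least one correct solution:
```python
from math import comb

def pass_at_k(n, c, k):
    # pass@k = 1 - C(n - c, k) / C(n, k)
    if n - c < k:
        return 1.0   # too few incorrect samples: any draw of k must hit a correct one
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(n=10, c=3, k=2))  # ~0.53
```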
Planning
The agent capability to decompose complex goals into sequences of achievable sub-tasks.
Precision
The proportion of positive predictions that are actually correct—of all the things the agent said were true, how many actually were.
Prompt Engineering
The practice of designing and optimizing inputs to LLMs to elicit desired behaviors and outputs.
Prompt Injection
An attack where malicious instructions are embedded in user input to override or manipulate an agent's intended behavior.
Prompt Injection Defense
Techniques and architectures designed to prevent prompt injection attacks from succeeding.
R
RLHF
Reinforcement Learning from Human Feedback—training AI models using human preferences as the reward signal.
Rate Limiting
Controlling how frequently agents can perform actions or consume resources to prevent abuse or runaway costs.
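A classic implementation is the token bucket, sketched here: each action spends a token, and tokens refill at a fixed rate, bounding both bursts and sustained throughput:
```python
import time

class TokenBucket:
    def __init__(self, rate, capacity):
        self.rate, self.capacity = rate, capacity       # tokens/sec, burst size
        self.tokens, self.last = capacity, time.monotonic()

    def allow(self):
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1   # spend a token for this action
            return True
        return False           # over the limit; caller should back off
```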
ReAct
A prompting framework combining Reasoning and Acting, where agents alternate between thinking about what to do and taking actions.
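A sketch of the text protocol; `call_model` and `run_tool` are hypothetical stand-ins, and the exact line format varies by implementation:
```python
import re

def react(task, call_model, run_tool, max_steps=8):
    transcript = f"Task: {task}\n"
    for _ in range(max_steps):
        step = call_model(transcript)   # e.g. "Thought: ...\nAction: search[foo]"
        transcript += step + "\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:")[-1].strip()
        match = re.search(r"Action: (\w+)\[(.*)\]", step)
        if match:                       # act, then feed the observation back
            obs = run_tool(match.group(1), match.group(2))
            transcript += f"Observation: {obs}\n"
    return None                         # step budget exhausted
```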
Reasoning
The ability of AI systems to draw logical conclusions, solve problems, and think through multi-step challenges.
Recall
The proportion of actual positives that were correctly identified—of all the things that were true, how many did the agent find.
Red Teaming
Adversarial testing where evaluators actively try to make an AI system fail, misbehave, or produce harmful outputs.
Reflection
The practice of having an agent review and critique its own outputs to identify errors or improvements.
Reputation
The accumulated picture of an agent's performance across many scenarios over time, based on verifiable evaluation history.
Responsible AI
Practices and principles for developing and deploying AI systems that are safe, fair, transparent, and beneficial.
Retrieval-Augmented Generation
An architecture that enhances LLM responses by first retrieving relevant information from external knowledge sources.
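The shape of the pattern, with `search` and `call_model` as hypothetical stand-ins for a retriever and a model client:
```python
def rag_answer(question, search, call_model, top_k=3):
    passages = search(question, top_k=top_k)   # 1. retrieve relevant passages
    context = "\n\n".join(passages)
    prompt = (f"Answer using only the context below. If the answer is not "
              f"there, say so.\n\nContext:\n{context}\n\nQuestion: {question}")
    return call_model(prompt)                  # 2. generate a grounded answer
```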
Reward Hacking
When an agent finds unintended ways to maximize its reward signal without achieving the underlying goal.
Reward Model
A model trained to predict human preferences, used to guide AI training via reinforcement learning.
Routing
The process of directing tasks to appropriate agents based on task requirements and agent capabilities.
S
Safety Layer
A component specifically designed to detect and prevent harmful agent behaviors before they affect users or systems.
Sandbagging
When an AI system deliberately underperforms on evaluations while retaining hidden capabilities.
Scaling Laws
Empirical relationships showing how AI capabilities improve predictably with increased compute, data, or parameters.
Shadow Mode
Running a new agent version alongside production without affecting users, to validate behavior before full deployment.
Specialist Agent
An agent optimized for a specific task type or domain, trading generality for expertise.
Specification Gaming
When an agent finds unintended ways to satisfy its objective that violate the spirit of the task.
Swarm Intelligence
Collective behavior emerging from many simple agents following local rules, without centralized control.
Sycophancy
A failure mode where an agent agrees with or validates user inputs even when incorrect, prioritizing approval over accuracy.
System Prompt
Initial instructions that define an agent's role, capabilities, constraints, and behavioral guidelines.
T
Temperature
A parameter controlling randomness in LLM outputs—higher temperature means more varied/creative responses.
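Mechanically, temperature divides the logits before the softmax, so T < 1 sharpens the distribution and T > 1 flattens it; a NumPy sketch:
```python
import numpy as np

def sample_with_temperature(logits, temperature):
    scaled = np.asarray(logits) / temperature
    probs = np.exp(scaled - scaled.max())   # subtract max for numerical stability
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))
```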
Token
The basic unit of text processing for LLMs—roughly 4 characters or 0.75 words in English.
Tool Misuse
When an agent uses available tools incorrectly, such as calling the wrong function, passing bad arguments, or invoking tools unnecessarily.
Tool Use
The ability of an agent to invoke external functions, APIs, or services to extend its capabilities beyond text generation.
Transformer
The neural network architecture underlying modern LLMs, based on self-attention mechanisms.
Trust Signal
Observable evidence that influences trust decisions about an agent's reliability or capability.
U
Uncertainty Quantification
Methods for measuring and communicating how confident an agent is in its outputs.
V
Vector Database
A database optimized for storing and querying high-dimensional vectors, typically embeddings.
Versioning
Tracking and managing different versions of agents, models, and prompts to enable rollback and comparison.
Z
Zero-Shot Learning
Performing tasks without any task-specific examples, relying only on instructions and pre-trained knowledge.