Evaluation

Latency


Definition

The time delay between sending a request to an agent and receiving its response.

Latency directly impacts user experience and system throughput. For agents, latency includes model inference time, tool calls, and any orchestration overhead.

Components

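Time to first token: how long after the request is sent the first part of the response arrives
Total completion time: how long until the full response has been received
Tool execution time: time spent waiting on tool calls, such as external API requests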
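
As a rough illustration of how the first two components can be measured, the sketch below times a streamed response. It assumes the agent exposes its output as an iterable of tokens; `measure_stream` and `fake_agent_stream` are hypothetical names invented for this example, and the fake stream only simulates delays with `time.sleep`.

```python
import time
from typing import Dict, Iterable, Iterator, Optional


def measure_stream(tokens: Iterable[str]) -> Dict[str, Optional[float]]:
    """Consume a token stream and record when the first and last tokens arrive."""
    start = time.perf_counter()
    first_token_at = None
    count = 0
    for _ in tokens:
        if first_token_at is None:
            # The first token marks the end of the "time to first token" window.
            first_token_at = time.perf_counter()
        count += 1
    end = time.perf_counter()
    return {
        "time_to_first_token": None if first_token_at is None else first_token_at - start,
        "total_completion_time": end - start,
        "token_count": count,
    }


def fake_agent_stream(n_tokens: int = 5,
                      startup_delay: float = 0.05,
                      per_token_delay: float = 0.01) -> Iterator[str]:
    """Hypothetical stand-in for an agent: waits, then yields tokens one by one."""
    time.sleep(startup_delay)  # simulated inference and orchestration overhead
    for i in range(n_tokens):
        yield f"tok{i}"
        time.sleep(per_token_delay)  # simulated per-token generation time


if __name__ == "__main__":
    print(measure_stream(fake_agent_stream()))
```

Tool execution time could be captured the same way by wrapping each tool call with a pair of `time.perf_counter()` readings.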

Optimization

Model quantization
Caching frequent requests
Parallel tool execution
Streaming responses
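
To make one of these concrete, here is a minimal sketch of parallel tool execution using Python's `asyncio`. The tools are hypothetical: `call_tool` just sleeps to simulate an external call, and the tool names and delays are made up for illustration. The point is only that independent calls awaited together take roughly as long as the slowest one, rather than the sum of all of them.

```python
import asyncio
import time
from typing import List, Tuple

Tool = Tuple[str, float]  # (tool name, simulated delay in seconds)


async def call_tool(name: str, delay: float) -> str:
    """Hypothetical tool call; the sleep stands in for network or API latency."""
    await asyncio.sleep(delay)
    return f"{name}: done"


async def run_sequential(tools: List[Tool]) -> List[str]:
    # Each call waits for the previous one, so the delays add up.
    return [await call_tool(name, delay) for name, delay in tools]


async def run_parallel(tools: List[Tool]) -> List[str]:
    # gather() runs the calls concurrently and returns results in input order.
    return list(await asyncio.gather(*(call_tool(n, d) for n, d in tools)))


def timed(runner, tools: List[Tool]) -> Tuple[List[str], float]:
    """Run one of the coroutines above and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = asyncio.run(runner(tools))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    tools = [("search", 0.05), ("weather", 0.05), ("database", 0.05)]
    print("sequential:", timed(run_sequential, tools)[1])
    print("parallel:  ", timed(run_parallel, tools)[1])
```

This only helps when the tool calls do not depend on each other's results; dependent calls still have to run in order.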


evaluation, performance, metrics
