Evaluation

Latency

1 min read

Definition

The time delay between sending a request to an agent and receiving its response.

Latency directly impacts user experience and system throughput. For agents, latency includes model inference time, tool calls, and any orchestration overhead.

Components

  • Time to first token: How fast response starts
  • Total completion time: Full response duration
  • Tool execution time: External API calls

Optimization

  • Model quantization
  • Caching frequent requests
  • Parallel tool execution
  • Streaming responses
evaluationperformancemetrics