Latency directly impacts user experience and system throughput. For agents, latency includes model inference time, tool calls, and any orchestration overhead.
Components
- Time to first token: How fast response starts
- Total completion time: Full response duration
- Tool execution time: External API calls
Optimization
- Model quantization
- Caching frequent requests
- Parallel tool execution
- Streaming responses