Inference

What It Means

The process of running a trained model to generate outputs from inputs.

Inference is the phase in which a trained model is actually used, as opposed to training. For agents, every interaction involves inference.
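As a minimal sketch (a toy NumPy network with placeholder weights, not a real model), inference is just a forward pass over frozen parameters: inputs go in, outputs come out, and nothing is updated.

```python
import numpy as np

# Placeholder "trained" weights; real weights would come from a training run.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(16, 64)), np.zeros(64)
W2, b2 = rng.normal(size=(64, 4)), np.zeros(4)

def infer(x: np.ndarray) -> np.ndarray:
    """One forward pass: inputs in, outputs out, no weight updates."""
    h = np.maximum(x @ W1 + b1, 0.0)               # hidden layer (ReLU)
    logits = h @ W2 + b2
    logits -= logits.max(axis=-1, keepdims=True)   # numerical stability
    return np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

# Each agent interaction triggers one or more calls like this.
print(infer(rng.normal(size=(1, 16))).round(3))
```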

Considerations

  • Latency requirements
  • Cost per request
  • Hardware requirements
  • Batching strategies (see the sketch after this list)
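In its simplest form, batching means stacking queued requests and running one forward pass instead of many. The sketch below uses a hypothetical `run_model` stand-in rather than any serving framework's API; it only shows that the batched and unbatched paths produce the same outputs while the batched path amortizes per-call overhead.

```python
import numpy as np

def run_model(batch: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for one model forward pass over a batch."""
    return batch * 2.0

requests = [np.ones(8) * i for i in range(32)]   # queued agent requests

# Unbatched: one forward pass per request (many launches, poor utilization).
unbatched = [run_model(r[None, :])[0] for r in requests]

# Batched: stack the queue and amortize overhead across one forward pass.
batched = run_model(np.stack(requests))

assert all(np.allclose(a, b) for a, b in zip(unbatched, batched))
```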

Optimization

  • Model quantization (sketched after this list)
  • Speculative decoding
  • Caching
  • Smaller models for simple tasks
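As one illustration, post-training weight quantization can be sketched like this. It is a toy symmetric int8 scheme with a single per-tensor scale; production schemes are more sophisticated, but the memory saving (4 bytes per weight down to 1) is the core idea.

```python
import numpy as np

def quantize_int8(w: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor quantization: float32 weights -> int8 + scale."""
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.default_rng(1).normal(size=(256, 256)).astype(np.float32)
q, scale = quantize_int8(w)
print("weights:", w.nbytes, "bytes ->", q.nbytes, "bytes")   # 4x smaller
print("max abs error:", float(np.abs(w - dequantize(q, scale)).max()))
```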
agents · operations · performance