Throughput, the amount of work a system completes per unit of time, determines its capacity and scaling requirements. Higher throughput means more users can be served with the same infrastructure.
Measurement
- Requests per second (RPS)
- Tasks completed per hour
- Tokens processed per minute
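
As a rough illustration, the three measurements above can be derived from the same raw counts taken over one measurement window. The sketch below is only an assumption about how such a report might be computed: the names `ThroughputReport` and `measure_throughput` are hypothetical, and it treats each completed request as one task.

```python
from dataclasses import dataclass


@dataclass
class ThroughputReport:
    requests_per_second: float
    tasks_per_hour: float
    tokens_per_minute: float


def measure_throughput(
    completed_requests: int, total_tokens: int, elapsed_seconds: float
) -> ThroughputReport:
    """Convert raw counts from one measurement window into throughput rates.

    Assumption: one completed request counts as one task.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    rps = completed_requests / elapsed_seconds
    return ThroughputReport(
        requests_per_second=rps,
        tasks_per_hour=rps * 3600,  # 3600 seconds per hour
        tokens_per_minute=total_tokens / elapsed_seconds * 60,
    )


# Example window: 600 requests and 120,000 tokens completed in 60 seconds.
report = measure_throughput(completed_requests=600, total_tokens=120_000, elapsed_seconds=60.0)
print(report)
```

Reporting all three rates from one window keeps them comparable; which one matters most depends on whether the workload is dominated by request count or by token volume.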
Factors
- Model size and hardware
- Batching efficiency
- Queue management
- Rate limiting
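
To make the interaction between batching and rate limiting concrete, here is a minimal sketch under a simplified cost model. It assumes each batch pays a fixed overhead plus a constant per-item cost, and that a rate limit simply caps the result. The function names and the cost model are illustrative assumptions, not measurements of any real system.

```python
def batched_throughput(batch_size: int, fixed_overhead_s: float, per_item_s: float) -> float:
    """Items per second when each batch costs fixed_overhead_s + per_item_s * batch_size.

    Hypothetical linear cost model; real hardware rarely scales this cleanly.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch_latency_s = fixed_overhead_s + per_item_s * batch_size
    return batch_size / batch_latency_s


def effective_throughput(
    batch_size: int, fixed_overhead_s: float, per_item_s: float, rate_limit_rps: float
) -> float:
    """Throughput after applying a rate limit as a hard cap."""
    return min(batched_throughput(batch_size, fixed_overhead_s, per_item_s), rate_limit_rps)


# Larger batches spread the fixed overhead over more items.
for size in (1, 8, 32):
    print(size, round(batched_throughput(size, fixed_overhead_s=0.05, per_item_s=0.01), 1))
```

Under this model, throughput rises with batch size but approaches a ceiling of `1 / per_item_s`, and a rate limit below that ceiling becomes the binding constraint. Larger batches also raise per-request latency, which this sketch does not capture.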