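Pass@k is a standard metric in code generation evaluation: the probability that at least one of k sampled solutions to a problem passes that problem's tests. Because it is defined over multiple samples, it accounts for the stochastic nature of LLM outputs.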
Variants
- Pass@1: Single-attempt accuracy
- Pass@10: At least one of 10 attempts succeeds
- Pass@100: Very lenient; at least one of 100 attempts succeeds
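
In practice, each variant above is usually estimated from a larger pool of samples rather than from exactly k attempts. The sketch below shows one common approach, the unbiased estimator 1 - C(n-c, k) / C(n, k) introduced with the HumanEval benchmark (Chen et al., 2021). The function name `pass_at_k` and the parameters `n` (samples generated per problem) and `c` (samples that passed) are illustrative names, not part of the text above.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem from n samples, c of which passed.

    Assumes the n samples were drawn independently from the same model
    with the same sampling settings.
    """
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    # comb(n - c, k) / comb(n, k) is the chance that a random subset of
    # k samples contains no passing sample. math.comb returns 0 when
    # k > n - c, so the estimate is 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Example: 200 samples for one problem, 20 of them pass the tests.
for k in (1, 10, 100):
    print(f"pass@{k} ~ {pass_at_k(200, 20, k):.3f}")
```

A benchmark score would then typically be the mean of this per-problem estimate over all problems.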
Interpretation
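Higher k values indicate potential capability (whether the model can solve a problem at all given many attempts); Pass@1 indicates practical reliability (whether a single response can be trusted).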
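
To see why the two ends of the range can tell such different stories, consider a simplified model in which every attempt succeeds independently with the same probability `p`. Under that assumption (an idealization, not a claim about any real model), pass@k equals 1 - (1 - p)^k. The helper name `pass_at_k_independent` is hypothetical.

```python
def pass_at_k_independent(p: float, k: int) -> float:
    """pass@k when each of k attempts succeeds independently with probability p."""
    # All k attempts fail with probability (1 - p) ** k.
    return 1.0 - (1.0 - p) ** k


# A model that solves a problem only 10% of the time per attempt.
for k in (1, 10, 100):
    print(f"pass@{k} = {pass_at_k_independent(0.1, k):.4f}")
```

With p = 0.1, pass@1 is 0.10, pass@10 is roughly 0.65, and pass@100 exceeds 0.99, so a model that is unreliable on any single attempt can still look near-perfect at large k.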