Evaluation

Pass@k

Definition

An evaluation metric that measures the probability that at least one of k generated solutions to a problem is correct.

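Pass@k is common in code generation evaluation, where a solution typically counts as correct if it passes the problem's unit tests. Because LLM outputs are sampled stochastically, repeated attempts at the same problem can differ; Pass@k accounts for this by crediting the model when any of k attempts succeeds.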
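
The definition can be made concrete with a small sketch. One widely used approach, not stated in this entry and assumed here, is to generate n >= k samples per problem, count the c samples that pass, and estimate Pass@k as 1 - C(n-c, k) / C(n, k). The function name `pass_at_k` and its parameter names are illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: total samples generated (assumed n >= k)
    c: number of those samples that were correct
    k: number of attempts the metric allows
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # comb(n - c, k) counts the size-k subsets containing no correct sample;
    # it is 0 when fewer than k incorrect samples exist.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Hypothetical numbers: 20 samples, 3 of them correct.
    for k in (1, 5, 10):
        print(f"pass@{k} = {pass_at_k(20, 3, k):.3f}")
```

A benchmark score is then usually the mean of this per-problem value over all problems.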

Variants

  • Pass@1: Accuracy with a single attempt
  • Pass@10: At least one of 10 attempts succeeds
  • Pass@100: At least one of 100 attempts succeeds; very lenient
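
To see how quickly the variants diverge, here is a rough sketch under a simplifying assumption: every attempt succeeds independently with the same probability p, so Pass@k = 1 - (1 - p)^k. Real models do not behave this uniformly across problems, and the value p = 0.05 and the function name are hypothetical.

```python
def pass_at_k_independent(p: float, k: int) -> float:
    """Chance that at least one of k independent attempts succeeds,
    assuming each attempt succeeds with probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if k < 1:
        raise ValueError("k must be at least 1")
    return 1.0 - (1.0 - p) ** k


if __name__ == "__main__":
    p = 0.05  # hypothetical per-attempt success rate
    for k in (1, 10, 100):
        print(f"pass@{k} = {pass_at_k_independent(p, k):.3f}")
    # prints:
    # pass@1 = 0.050
    # pass@10 = 0.401
    # pass@100 = 0.994
```

Under this assumption, a model that is right only 5% of the time on a single try still scores near 1 at k = 100, which is why Pass@100 is described as lenient.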

Interpretation

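Pass@k at higher k shows potential capability (whether the model can produce a correct solution at all), while Pass@1 shows practical reliability (how often a single attempt is correct).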

evaluation, coding, metrics