

Pass@k


Definition

An evaluation metric that measures the probability that at least one of k solutions generated for a problem is correct.
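
The definition can be made concrete with the estimator commonly used in code generation benchmarks. This is a sketch rather than the only possible formulation: the symbols n (samples drawn per problem) and c (samples that turn out correct) are introduced here for illustration and do not appear in the definition above.

```latex
\text{pass@}k \;=\; \mathbb{E}_{\text{problems}}\left[\, 1 - \frac{\binom{n-c}{k}}{\binom{n}{k}} \,\right], \qquad n \ge k
```

The fraction is the probability that k samples drawn without replacement from the n are all incorrect, so one minus it is the probability that at least one is correct.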


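Pass@k is common in code generation evaluation, where a solution typically counts as correct if it passes the problem's unit tests. It accounts for the stochastic nature of LLM outputs: because sampling can yield a different solution on each attempt, a single attempt does not capture what the model can produce.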
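
In practice, Pass@k is usually estimated by drawing n >= k samples per problem, counting the c that are correct, and computing 1 - C(n-c, k) / C(n, k). The following is a minimal sketch of that calculation; the function names `pass_at_k` and `mean_pass_at_k`, and the `(n, c)` input format, are assumptions made for this example, not part of any particular benchmark harness.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem from n samples, c of which are correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) counts the ways to pick k samples that are all incorrect.
    # It is 0 when fewer than k samples are incorrect, giving an estimate of 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results, k: int) -> float:
    """Average the per-problem estimates; results is a list of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For example, with 4 samples of which 1 is correct, `pass_at_k(4, 1, 2)` evaluates to 0.5, since 3 of the 6 possible pairs contain only incorrect samples.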

Variants

Pass@1: Single-attempt accuracy
Pass@10: At least one of 10 attempts succeeds
Pass@100: At least one of 100 attempts succeeds (very lenient)

Interpretation

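Higher k values show potential capability, that is, what the model can solve when given many attempts; Pass@1 shows practical reliability, that is, how often a single attempt succeeds.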
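
A small illustration of this gap, under a simplifying assumption that is not part of the metric itself: if each attempt on a problem succeeds independently with the same probability p, the chance that at least one of k attempts succeeds is 1 - (1 - p)^k. The value p = 0.05 and the function name `pass_at_k_independent` are hypothetical, chosen only for this example.

```python
def pass_at_k_independent(p: float, k: int) -> float:
    """Probability that at least one of k independent attempts succeeds,
    assuming every attempt has the same success probability p."""
    return 1.0 - (1.0 - p) ** k


# Hypothetical per-attempt success rate of 5%.
for k in (1, 10, 100):
    print(f"pass@{k}: {pass_at_k_independent(0.05, k):.3f}")
# pass@1: 0.050
# pass@10: 0.401
# pass@100: 0.994
```

Under this assumption, a model that rarely succeeds on a single attempt can still score close to 1 at k = 100, which is why a high Pass@100 alone says little about first-try reliability.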
