Evaluation

Calibration

Quick Definition

The alignment between an agent's expressed confidence and its actual accuracy. A well-calibrated agent is right about 80% of the time across the answers where it says it is 80% confident.

Calibration is crucial for trustworthy agents. Overconfident agents lead users to trust wrong answers; underconfident agents push users into unnecessary verification of answers that were already correct.

Measurement

  • Reliability diagrams (calibration curves)
  • Expected calibration error (ECE)
  • Brier score
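
The three measures above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: it assumes each answer comes with a stated confidence in [0, 1] and a 0/1 correctness label, and it uses ten equal-width bins, which is a common choice but still an assumption here. The function names (reliability_bins, expected_calibration_error, brier_score) are hypothetical.

```python
def reliability_bins(confidences, outcomes, n_bins=10):
    """Return (mean confidence, accuracy, count) for each non-empty bin.

    Plotting accuracy against mean confidence gives a reliability diagram;
    a perfectly calibrated agent lies on the diagonal.
    """
    bins = [[] for _ in range(n_bins)]
    for c, o in zip(confidences, outcomes):
        # Equal-width bins; a confidence of exactly 1.0 goes in the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append((c, o))
    points = []
    for members in bins:
        if members:
            mean_conf = sum(c for c, _ in members) / len(members)
            accuracy = sum(o for _, o in members) / len(members)
            points.append((mean_conf, accuracy, len(members)))
    return points


def expected_calibration_error(confidences, outcomes, n_bins=10):
    """Average |accuracy - confidence| over bins, weighted by bin size."""
    total = len(confidences)
    return sum(
        (count / total) * abs(accuracy - mean_conf)
        for mean_conf, accuracy, count in reliability_bins(confidences, outcomes, n_bins)
    )


def brier_score(confidences, outcomes):
    """Mean squared gap between stated confidence and the 0/1 outcome."""
    return sum((c - o) ** 2 for c, o in zip(confidences, outcomes)) / len(confidences)


# Toy data: five answers at 90% confidence (four correct) and
# five answers at 60% confidence (three correct).
confidences = [0.9] * 5 + [0.6] * 5
outcomes = [1, 1, 1, 1, 0, 1, 1, 1, 0, 0]

print(reliability_bins(confidences, outcomes))
print(expected_calibration_error(confidences, outcomes))
print(brier_score(confidences, outcomes))
```

In this toy data the 60% group is well calibrated and the 90% group is slightly overconfident, so the ECE is small but nonzero. Lower is better for both ECE and the Brier score.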

Improving Calibration

  • Temperature scaling
  • Confidence training
  • Ensemble methods
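
As an illustration of the first technique, temperature scaling divides a model's logits by a single scalar T before the softmax and picks T to minimize negative log-likelihood on held-out data. The sketch below assumes access to raw logits, which an agent that only verbalizes its confidence may not expose. It also swaps the usual gradient-based fit for a simple grid search to stay dependency-free. All function names and the candidate grid are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax over logits divided by a temperature (T > 1 softens, T < 1 sharpens)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mean_nll(logit_rows, labels, temperature):
    """Average negative log-likelihood of the true labels at a given temperature."""
    total = 0.0
    for logits, label in zip(logit_rows, labels):
        total -= math.log(softmax(logits, temperature)[label])
    return total / len(labels)


def fit_temperature(logit_rows, labels, candidates=None):
    """Pick the candidate temperature with the lowest held-out NLL (grid search)."""
    if candidates is None:
        # Assumed search range: 0.5 to 5.0 in steps of 0.05.
        candidates = [0.5 + 0.05 * i for i in range(91)]
    return min(candidates, key=lambda t: mean_nll(logit_rows, labels, t))


# Toy held-out set: the model always prefers class 0 with ~98% confidence,
# but class 0 is correct only 7 times out of 10.
logit_rows = [[4.0, 0.0]] * 10
labels = [0] * 7 + [1] * 3

T = fit_temperature(logit_rows, labels)
print("fitted temperature:", T)
print("confidence before:", softmax([4.0, 0.0])[0])
print("confidence after:", softmax([4.0, 0.0], T)[0])
```

Because dividing every logit by the same positive T never changes which class scores highest, this adjustment changes the reported confidence without changing accuracy.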

evaluation, trust, uncertainty