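Calibration is crucial for trustworthy agents. An agent is well calibrated when its stated confidence matches how often it is actually right: of the answers it gives with 80% confidence, about 80% should be correct. Overconfident agents lead users to trust wrong answers; underconfident agents waste effort on unnecessary verification.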
Measurement
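- Reliability diagrams (calibration curves): group predictions into confidence bins and plot observed accuracy against mean confidence; a perfectly calibrated agent lies on the diagonal
- Expected calibration error (ECE): the average gap between confidence and accuracy across bins, weighted by bin size
- Brier score: the mean squared difference between the predicted probability and the 0/1 outcome; lower is better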
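The three measurements above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function names, the equal-width binning, and the default of 10 bins are assumptions made for the example. Each confidence is taken to be the agent's stated probability of being correct, and each outcome is 1 if the answer was right and 0 otherwise.

```python
from typing import List, Sequence, Tuple


def brier_score(confidences: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between stated confidence and the 0/1 outcome."""
    return sum((c - o) ** 2 for c, o in zip(confidences, outcomes)) / len(confidences)


def reliability_bins(
    confidences: Sequence[float], outcomes: Sequence[int], n_bins: int = 10
) -> List[Tuple[float, float, int]]:
    """Return (mean confidence, accuracy, count) for each non-empty bin.

    Plotting accuracy against mean confidence gives the reliability diagram.
    Equal-width bins over [0, 1] are an assumption of this sketch.
    """
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for c, o in zip(confidences, outcomes):
        idx = min(int(c * n_bins), n_bins - 1)  # a confidence of 1.0 goes in the last bin
        bins[idx].append((c, o))
    stats = []
    for b in bins:
        if b:
            mean_conf = sum(c for c, _ in b) / len(b)
            accuracy = sum(o for _, o in b) / len(b)
            stats.append((mean_conf, accuracy, len(b)))
    return stats


def expected_calibration_error(
    confidences: Sequence[float], outcomes: Sequence[int], n_bins: int = 10
) -> float:
    """Bin-size-weighted average of |mean confidence - accuracy|."""
    total = len(confidences)
    return sum(
        (count / total) * abs(mean_conf - accuracy)
        for mean_conf, accuracy, count in reliability_bins(confidences, outcomes, n_bins)
    )
```

For example, four answers all given at 0.9 confidence, of which three are correct, land in a single bin with accuracy 0.75, giving an ECE of about 0.15 and a Brier score of about 0.21. Note that ECE depends on the binning choice, so values are only comparable when computed with the same bins.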
Improving Calibration
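- Temperature scaling: divide the model's logits by a single scalar fitted on held-out data, which softens or sharpens confidence without changing which answer is predicted
- Confidence training: train the agent so that the confidence it reports tracks how often it is correct
- Ensemble methods: combine predictions from several models or samples, for example by averaging their probabilities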
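As an illustration of the first item, temperature scaling can be sketched as follows. The example makes several simplifying assumptions: it handles the binary case (a single logit passed through a sigmoid, where the multi-class version would use a softmax), it picks the temperature by a plain grid search over hypothetical candidate values rather than a gradient-based optimizer, and all function names are invented for the sketch. The temperature should be fitted on held-out data, not on the training set.

```python
import math
from typing import Optional, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def nll(logits: Sequence[float], labels: Sequence[int], temperature: float) -> float:
    """Average negative log-likelihood after dividing logits by the temperature."""
    eps = 1e-12  # keeps log() away from 0
    total = 0.0
    for z, y in zip(logits, labels):
        p = min(max(sigmoid(z / temperature), eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(logits)


def fit_temperature(
    logits: Sequence[float],
    labels: Sequence[int],
    candidates: Optional[Sequence[float]] = None,
) -> float:
    """Pick the candidate temperature with the lowest held-out NLL.

    The default grid (0.25 to 5.0 in steps of 0.05) is an arbitrary choice
    for this sketch.
    """
    if candidates is None:
        candidates = [0.25 + 0.05 * i for i in range(96)]
    return min(candidates, key=lambda t: nll(logits, labels, t))


# Held-out set from an overconfident model: logits of +/-4 imply about 98%
# confidence, but only 6 of the 8 predictions are correct.
held_out_logits = [4.0, 4.0, 4.0, 4.0, -4.0, -4.0, -4.0, -4.0]
held_out_labels = [1, 1, 1, 0, 0, 0, 0, 1]
temperature = fit_temperature(held_out_logits, held_out_labels)
calibrated_confidence = sigmoid(4.0 / temperature)
```

On this toy data the fitted temperature is greater than 1, which pulls the calibrated confidence down toward the observed 75% accuracy. Because dividing by a positive temperature never changes the sign of a logit, the predicted answers stay the same; only the reported confidence moves.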