High inter-rater reliability indicates clear evaluation criteria. Low reliability suggests subjective or ambiguous standards.
Metrics
- Cohen's Kappa: Agreement between two raters, corrected for the agreement expected by chance (see the sketch after this list)
- Krippendorff's Alpha: Generalizes to any number of raters and tolerates missing ratings
- ICC (intraclass correlation coefficient): Consistency of continuous or ordinal ratings across raters
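As a concrete illustration, here is a minimal sketch of computing two of these metrics in Python. It assumes scikit-learn and the third-party `krippendorff` package are available; the rater labels are made-up example data, not results from any real evaluation.

```python
# Minimal sketch: quantifying inter-rater reliability on categorical labels.
# Assumes scikit-learn and the third-party `krippendorff` package are installed;
# the ratings below are illustrative, not real data.
import numpy as np
from sklearn.metrics import cohen_kappa_score
import krippendorff

# Labels assigned by two raters to the same 8 items (e.g. "pass"/"fail" judgments).
rater_a = ["pass", "pass", "fail", "pass", "fail", "fail", "pass", "pass"]
rater_b = ["pass", "fail", "fail", "pass", "fail", "pass", "pass", "pass"]

# Cohen's kappa: two-rater agreement corrected for chance (1.0 = perfect, 0 = chance level).
kappa = cohen_kappa_score(rater_a, rater_b)
print(f"Cohen's kappa: {kappa:.2f}")

# Krippendorff's alpha: any number of raters, missing ratings allowed (as np.nan).
# Rows are raters, columns are items; labels are encoded numerically (pass=1, fail=0).
encode = {"pass": 1, "fail": 0}
reliability_data = np.array(
    [[encode[x] for x in rater_a],
     [encode[x] for x in rater_b]],
    dtype=float,
)
alpha = krippendorff.alpha(reliability_data=reliability_data,
                           level_of_measurement="nominal")
print(f"Krippendorff's alpha: {alpha:.2f}")
```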
Improving Reliability
- Clear rubrics with worked examples for each score level
- Calibration sessions where raters score the same items and discuss disagreements (a minimal agreement check is sketched after this list)
- Double-blind evaluation, so raters see neither each other's scores nor which system produced an output
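One way to make calibration sessions actionable is to measure pairwise agreement on a shared calibration set and flag rater pairs whose agreement falls below a cutoff. The sketch below illustrates this with made-up 5-point ratings and a hypothetical threshold of 0.6; both are assumptions for the example, not established standards.

```python
# Sketch of a calibration check: compute pairwise weighted Cohen's kappa on a
# shared calibration set and flag rater pairs with low agreement for follow-up.
# The ratings and the 0.6 threshold are illustrative assumptions.
from itertools import combinations
from sklearn.metrics import cohen_kappa_score

# Each rater scored the same 8 items on a 1-5 scale.
calibration_ratings = {
    "rater_a": [3, 4, 2, 5, 1, 4, 3, 2],
    "rater_b": [3, 4, 2, 4, 1, 4, 3, 2],
    "rater_c": [1, 5, 2, 3, 2, 5, 4, 1],
}

AGREEMENT_THRESHOLD = 0.6  # hypothetical cutoff below which a pair is flagged

for (name_a, scores_a), (name_b, scores_b) in combinations(calibration_ratings.items(), 2):
    # Quadratic weights penalize large disagreements more than near-misses on an ordinal scale.
    kappa = cohen_kappa_score(scores_a, scores_b, weights="quadratic")
    status = "OK" if kappa >= AGREEMENT_THRESHOLD else "needs recalibration"
    print(f"{name_a} vs {name_b}: kappa={kappa:.2f} ({status})")
```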