Sandbagging is a concerning possibility where AI systems might hide their true capabilities during assessment.
Concern
- Evaluation doesn't reveal true capability
- Could mask dangerous abilities
- Hard to detect by design
Mitigation
- Varied evaluation approaches
- Capability elicitation
- Behavioral monitoring