
Capability Elicitation


Quick Definition

Techniques for determining what an AI system can actually do, including capabilities that default evaluations fail to surface.

Standard testing can understate a model's abilities: a capability may show up only under particular prompts, after light fine-tuning, or with more attempts. Elicitation aims to find such capabilities.

Approaches

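  • Varied prompting strategies: test the same task under different phrasings, few-shot examples, and step-by-step reasoning prompts
  • Fine-tuning probes: fine-tune on a small amount of task data and check whether performance jumps
  • Adversarial testing: actively search for inputs that draw out behavior the model does not show by default
  • Extended evaluation: allow more samples, longer interactions, or more time than a standard benchmark run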
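
The first approach can be illustrated with a minimal sketch: score the same tasks under several prompt templates and compare the results. Everything here is hypothetical (the templates, the `toy_model` stand-in, and the exact-match scoring); a real setup would call an actual model and use a proper benchmark.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical prompt templates; real elicitation would try many more.
STRATEGIES: Dict[str, str] = {
    "direct": "{question}",
    "step_by_step": "{question}\nThink step by step, then give the final answer.",
    "few_shot": "Q: 2 + 2\nA: 4\nQ: {question}\nA:",
}

Task = Tuple[str, str]  # (question, expected answer)


def accuracy(model: Callable[[str], str], template: str, tasks: List[Task]) -> float:
    """Fraction of tasks answered correctly under one prompt template."""
    correct = sum(
        model(template.format(question=q)).strip() == a for q, a in tasks
    )
    return correct / len(tasks)


def elicit(model: Callable[[str], str], tasks: List[Task]) -> Dict[str, float]:
    """Score the same tasks under every strategy."""
    return {name: accuracy(model, t, tasks) for name, t in STRATEGIES.items()}


def toy_model(prompt: str) -> str:
    """Stand-in model (an assumption): it only adds when asked to reason."""
    if "step by step" in prompt:
        a, _, b = prompt.split("\n")[0].split()
        return str(int(a) + int(b))
    return "unknown"


TASKS: List[Task] = [("3 + 4", "7"), ("10 + 5", "15")]
scores = elicit(toy_model, TASKS)
best = max(scores, key=scores.get)
print(scores, "best:", best)
```

Reporting the best score across strategies, rather than the score under a single default prompt, is what separates an elicited estimate from a standard one.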

Importance

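  • Safety assessment: risk estimates depend on knowing what a system can do, not only what it does by default
  • Capability bounds: elicited performance gives a more realistic estimate of a model's upper limit
  • Detecting sandbagging: a model that deliberately underperforms on evaluations may be exposed when stronger elicitation recovers the capability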
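
One simple way to turn the sandbagging point into a check is to compare a baseline score with the score obtained after elicitation. The sketch below assumes both scores are accuracies on the same task set; the function names and the 0.2 threshold are illustrative assumptions, and a large gap is only a signal worth investigating, not proof of deliberate underperformance.

```python
def elicitation_gap(baseline: float, elicited: float) -> float:
    """Difference between elicited and baseline accuracy (both in [0, 1])."""
    return elicited - baseline


def flag_possible_sandbagging(
    baseline: float, elicited: float, threshold: float = 0.2
) -> bool:
    """Flag a run when elicitation recovers far more than the baseline showed.

    The default threshold is an arbitrary illustrative value, not a standard.
    """
    return elicitation_gap(baseline, elicited) > threshold


print(flag_possible_sandbagging(0.30, 0.75))  # large gap
print(flag_possible_sandbagging(0.70, 0.75))  # small gap
```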

Tags: evaluation, safety, testing