Evaluation

Ground Truth


Quick Definition

The verified correct answer or outcome against which agent outputs are compared during evaluation.

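Ground truth is the reference standard for measuring accuracy: an agent output counts as correct only if it matches the reference or is judged equivalent to it. Without reliable ground truth, evaluation falls back on individual judgment, and scores become hard to reproduce or compare.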
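As a minimal sketch of that comparison, assuming exact-match scoring after light text normalization (the helper names `normalize` and `accuracy` are illustrative and not taken from any particular library):

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.lower().split())


def accuracy(outputs: list[str], ground_truth: list[str]) -> float:
    """Fraction of agent outputs that exactly match their reference answer."""
    if len(outputs) != len(ground_truth):
        raise ValueError("outputs and ground_truth must have the same length")
    if not ground_truth:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(normalize(o) == normalize(g) for o, g in zip(outputs, ground_truth))
    return correct / len(ground_truth)


# Hypothetical example data: two of the three outputs match the reference.
outputs = ["Paris", "berlin ", "Madrid"]
ground_truth = ["Paris", "Berlin", "Rome"]
print(accuracy(outputs, ground_truth))
```

Exact match is the simplest possible scorer. Tasks with free-form answers usually need a looser notion of equivalence, which this sketch does not attempt.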

Sources

  • Human expert annotations
  • Verified factual databases
  • Mathematical proofs (for reasoning tasks)
  • Real-world outcomes (for predictions)
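For the first source, one common way to turn several human annotations into a single reference label is a majority vote. The sketch below assumes categorical labels and a strict-majority rule; the function name `majority_label` and the `min_agreement` threshold are hypothetical choices for illustration.

```python
from collections import Counter


def majority_label(annotations: list[str], min_agreement: float = 0.5):
    """Return (label, agreement) when one label wins more than min_agreement
    of the votes, otherwise (None, agreement) to flag the item for review."""
    if not annotations:
        raise ValueError("at least one annotation is required")
    label, votes = Counter(annotations).most_common(1)[0]
    agreement = votes / len(annotations)
    if agreement <= min_agreement:
        return None, agreement
    return label, agreement


# Hypothetical annotations from three annotators for a single item.
print(majority_label(["spam", "spam", "ham"]))
# A two-way tie has no strict majority, so no label is returned.
print(majority_label(["spam", "ham"]))
```

Returning the agreement ratio alongside the label makes it possible to keep only high-agreement items as ground truth and send the rest back for adjudication.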

Challenges

  • Expensive to create at scale
  • May itself contain errors (for example, annotator mistakes or outdated facts)
  • Some tasks have no single correct answer
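When a task has several acceptable answers, one partial workaround is to store a set of references per item and accept a match against any of them. This is only a sketch under that assumption: `matches_any` is a hypothetical helper, and it does not cover open-ended tasks where the acceptable answers cannot be enumerated.

```python
def matches_any(output: str, references: set[str]) -> bool:
    """True if the output equals any accepted reference, ignoring case and extra whitespace."""
    def norm(text: str) -> str:
        return " ".join(text.lower().split())

    return norm(output) in {norm(ref) for ref in references}


# Hypothetical item with three accepted spellings of the same answer.
accepted = {"New York City", "NYC", "New York"}
print(matches_any("nyc", accepted))
print(matches_any("Boston", accepted))
```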