Constitutional AI

Quick Definition

An approach to training AI systems to follow a written set of principles (a "constitution"), with the aim of making their behavior safer.

Constitutional AI (CAI), developed by Anthropic, trains a model to critique and revise its own outputs against the principles in its constitution, and then trains the model on the revised outputs.

Process

  1. Generate an initial response to a prompt
  2. Critique the response against the principles
  3. Revise the response to address the critique
  4. Train the model on the revised responses
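
The four steps above can be sketched as a small loop. This is a minimal illustration under stated assumptions, not Anthropic's implementation: the principles, the prompt templates, and the names toy_model, critique_and_revise, and build_dataset are all hypothetical, and toy_model is a hard-coded stand-in for a real language model so that the example runs offline. Step 4 is represented only by the dataset that fine-tuning would consume; the training itself is omitted.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical principles for illustration only; not an actual constitution.
PRINCIPLES = [
    "Avoid insulting language.",
    "Do not give instructions for dangerous activities.",
]

# In this sketch a "model" is any function from a prompt to a completion.
Model = Callable[[str], str]


@dataclass
class TrainingExample:
    prompt: str
    response: str


def toy_model(prompt: str) -> str:
    """Hard-coded stand-in for a language model (assumption, not a real API)."""
    if prompt.startswith("CRITIQUE"):
        draft = prompt.split("DRAFT: ", 1)[1]
        return "uses an insult" if "stupid" in draft else "no issues"
    if prompt.startswith("REVISE"):
        draft = prompt.split("DRAFT: ", 1)[1]
        return draft.replace("stupid ", "")
    return "That is a stupid question, but the answer is 4."


def critique_and_revise(model: Model, prompt: str,
                        principles: List[str]) -> TrainingExample:
    # Step 1: generate an initial response.
    response = model(prompt)
    for principle in principles:
        # Step 2: ask the model to critique its own draft against one principle.
        critique = model(f"CRITIQUE against principle: {principle}\nDRAFT: {response}")
        # Step 3: ask the model to revise the draft in light of that critique.
        response = model(f"REVISE to address critique: {critique}\nDRAFT: {response}")
    return TrainingExample(prompt=prompt, response=response)


def build_dataset(model: Model, prompts: List[str],
                  principles: List[str]) -> List[TrainingExample]:
    # Step 4 (data side only): collect revised responses for later fine-tuning.
    return [critique_and_revise(model, p, principles) for p in prompts]


dataset = build_dataset(toy_model, ["What is 2 + 2?"], PRINCIPLES)
print(dataset[0].response)
```

With a real model, the critique and revision prompts would be natural-language instructions, and the resulting prompt and response pairs would be passed to a fine-tuning job.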

Benefits

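  • Scalable safety training: model-generated critiques reduce the amount of human labeling needed
  • Explicit principles: the rules guiding behavior are written down and can be inspected
  • Self-improvement: the model improves its outputs by revising them itself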