Constitutional AI (CAI), developed by Anthropic, trains models to critique and revise their own outputs according to principles.
Process
- Generate response
- Critique against principles
- Revise response
- Train on improved outputs
Benefits
- Scalable safety training
- Explicit principles
- Self-improvement