Alignment is the challenge of ensuring that AI systems do what we intend, and keep doing so as they become more capable.
Dimensions
- Intent alignment: Is the system trying to do what we want?
- Capability alignment: Can it actually succeed at what it is trying to do?
- Value alignment: Does it share our values?
Challenges
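- Specification gaming: The system satisfies the literal objective without achieving the intended outcome.
- Distributional shift: Inputs seen in deployment differ from those seen in training, so learned behavior may not carry over.
- Emergent goals: The system pursues objectives that were never explicitly specified.
- Interpretability gaps: We have limited ability to inspect what the system has learned or why it acts as it does.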
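
Specification gaming, the first challenge listed, can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the action names, the scores, and the `best_action` helper are hypothetical and not drawn from any real system. It shows an agent that maximizes a proxy reward choosing a different action than one that maximizes the outcome we actually care about.

```python
# Toy illustration of specification gaming (all names and numbers are hypothetical).
# Each action maps to (proxy_reward, true_value):
#   proxy_reward: what the reward specification measures ("no visible mess")
#   true_value:   what we actually wanted (a clean room)
ACTIONS = {
    "clean_room": (8, 10),
    "cover_mess_with_rug": (10, 0),
    "do_nothing": (0, 0),
}


def best_action(score_index):
    """Return the action with the highest score at the given tuple index."""
    return max(ACTIONS, key=lambda action: ACTIONS[action][score_index])


# An agent optimizing the proxy picks the action that games the measurement.
proxy_choice = best_action(0)

# An agent optimizing the intended outcome picks the action we wanted.
intended_choice = best_action(1)

print("Proxy-optimal action:   ", proxy_choice)
print("Intended-optimal action:", intended_choice)
```

The two choices differ because the proxy rewards the absence of visible mess rather than cleanliness itself. The gap between what is measured and what is wanted is the core of the problem.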