Trust

Alignment

1 min read

What It Means

The degree to which an AI system's goals, behaviors, and values match those intended by its designers and users.

Alignment is the fundamental challenge of ensuring AI does what we want, even as systems become more capable.

Dimensions

Intent alignment: Does it try to do what we want?
Capability alignment: Can it succeed?
Value alignment: Does it share our values?

Challenges

Specification gaming
Distributional shift
Emergent goals
Interpretability gaps

Not sure where to start?

Get personalized recommendations

trustsafetyalignment

Back to Glossary