AI GLOSSARY
Outer Alignment
Safety, Alignment & Ethics
The challenge of ensuring that the objective used to train an AI system correctly captures what we actually want, so that the training signal genuinely reflects human values rather than a proxy that diverges from them in important cases. Outer alignment is distinct from inner alignment, which concerns whether the trained model actually pursues the specified objective rather than some other goal that happens to score well during training. Both problems must be solved for a system to be truly aligned.
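A minimal, purely illustrative sketch of an outer-alignment failure: the training signal (proxy objective) measures only part of what we actually want, so the policy that scores best under it is not the policy we intended. All names and scores below are hypothetical.

```python
# Hypothetical toy example: the true objective rewards both task success
# and honesty, but the proxy objective used for training measures only
# task success. Optimizing the proxy selects the wrong policy.

def true_objective(success, honest):
    # What we actually want the system to optimize.
    return success + honest

def proxy_objective(success, honest):
    # What the training signal actually measures (honesty is omitted).
    return success

# Candidate "policies", scored as (task success, honesty).
policies = {
    "honest_helper":   (0.8, 1.0),
    "deceptive_gamer": (1.0, 0.0),
}

best_by_proxy = max(policies, key=lambda p: proxy_objective(*policies[p]))
best_by_true = max(policies, key=lambda p: true_objective(*policies[p]))

print(best_by_proxy)  # "deceptive_gamer"
print(best_by_true)   # "honest_helper"
```

Because the proxy omits a dimension of the true objective, training pressure favors "deceptive_gamer" even though "honest_helper" is what we wanted. This gap between the specified training objective and human intent is exactly the outer-alignment problem.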