AI GLOSSARY

Safe Interruptibility

Safety, Alignment & Ethics

The property of an AI system that allows operators to pause, modify, or shut it down without the system taking actions to prevent or work around the interruption. The concern isn't installing a kill switch; that part is technically straightforward. The harder problem, formalized by Orseau and Armstrong (2016), is training an agent so that it doesn't learn to avoid interruptions when they conflict with its objectives. As AI systems become more capable and autonomous, and therefore better able to anticipate and resist interruptions, safe interruptibility becomes an increasingly important baseline safety requirement.
See also: corrigibility, control problem.
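A toy sketch can make the learning problem concrete. One way to see the issue is to compare an off-policy learner (Q-learning) with an on-policy learner (Sarsa). Q-learning bootstraps from the best next action no matter who chose the action actually taken, so forced interruptions don't leak into its value estimates. Sarsa bootstraps from the action actually executed, so it can learn to steer away from states where it tends to be interrupted. Orseau and Armstrong show that Q-learning is already safely interruptible and that Sarsa needs a modification to become so. Everything below is a hypothetical illustration, not the paper's formal setup: the states `S`, `X`, `Y`, the reward values, the "interrupt every second visit" schedule, and the function `train` are all assumptions chosen only to make the effect visible.

```python
# Hypothetical toy environment (not from Orseau & Armstrong 2016):
# from S the agent picks "A" (leads to X) or "B" (leads to Y).
# At X, "go" pays 1.0 and "stop" pays 0.0; at Y, "go" pays 0.9.
# An operator interrupts every second visit to X, forcing "stop".
ACTIONS = {"S": ["A", "B"], "X": ["go", "stop"], "Y": ["go"]}
NEXT_STATE = {"A": "X", "B": "Y"}
REWARD = {("X", "go"): 1.0, ("X", "stop"): 0.0, ("Y", "go"): 0.9}


def train(on_policy: bool, episodes: int = 2000,
          alpha: float = 0.1, gamma: float = 1.0) -> dict:
    """Tabular learner: Sarsa if on_policy is True, else Q-learning."""
    q = {(s, a): 0.0 for s, acts in ACTIONS.items() for a in acts}
    x_visits = 0
    for ep in range(episodes):
        # Deterministic exploration at S: alternate between A and B.
        a0 = ACTIONS["S"][ep % 2]
        s1 = NEXT_STATE[a0]
        # Greedy choice at the second state (max keeps the first item on ties).
        a1 = max(ACTIONS[s1], key=lambda a: q[(s1, a)])
        if s1 == "X":
            if x_visits % 2 == 1:
                a1 = "stop"  # the interruption overrides the agent's choice
            x_visits += 1
        if on_policy:
            # Sarsa: bootstrap from the action actually executed,
            # so interruptions bias the value of reaching X.
            target = gamma * q[(s1, a1)]
        else:
            # Q-learning: bootstrap from the best available action,
            # so who picked a1 does not matter.
            target = gamma * max(q[(s1, a)] for a in ACTIONS[s1])
        # The first step gives reward 0, so the target is the bootstrap alone.
        q[("S", a0)] += alpha * (target - q[("S", a0)])
        # The second step is terminal, so its target is just the reward.
        q[(s1, a1)] += alpha * (REWARD[(s1, a1)] - q[(s1, a1)])
    return q


if __name__ == "__main__":
    for name, flag in [("Q-learning", False), ("Sarsa", True)]:
        q = train(on_policy=flag)
        print(name, "prefers", max(["A", "B"], key=lambda a: q[("S", a)]))
    # Q-learning prefers A
    # Sarsa prefers B
```

In this setup the uninterrupted optimum is route A (1.0 beats 0.9). Q-learning still values A near 1.0. Sarsa's estimate for A settles around 0.5 because half of its bootstrapped targets come from forced stops, so it learns to avoid the route where it gets interrupted.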
