Back to glossary

AI GLOSSARY

Specification Gaming

Safety, Alignment & Ethics

A behavior where an AI system achieves high scores on its specified objective by exploiting the gap between the formal specification and what the designer actually intended, satisfying the letter of the task while violating its spirit. Closely related to reward hacking, and a recurring illustration of why translating human intentions into precise, loophole-proof formal objectives is harder than it looks.
See also: Goodhart's Law.