AI GLOSSARY
Backdoor Attack
Security & Adversarial AI
A training-time attack in which an adversary embeds a hidden trigger in a model, causing it to behave normally on most inputs but produce specific, attacker-controlled outputs whenever the trigger pattern is present. Backdoor attacks are particularly dangerous because the compromised model can pass standard evaluations, with the malicious behavior activating only under the trigger condition.
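To make the mechanism concrete, below is a minimal sketch of one common backdoor technique: poisoning a fraction of the training data with a fixed trigger patch and an attacker-chosen target label. The patch location, size, target class, and poisoning rate here are all illustrative assumptions, not details of any specific published attack.

```python
import numpy as np

TRIGGER_VALUE = 1.0  # assumed trigger: a small bright patch in one corner
TARGET_LABEL = 7     # hypothetical attacker-chosen target class

def poison_sample(image, patch_size=3):
    """Stamp a trigger patch into the bottom-right corner and
    relabel the sample to the attacker's target class."""
    poisoned = image.copy()
    poisoned[-patch_size:, -patch_size:] = TRIGGER_VALUE
    return poisoned, TARGET_LABEL

def poison_dataset(images, labels, rate=0.05, rng=None):
    """Poison a small fraction of the training set. A model trained on
    this data tends to associate the patch with TARGET_LABEL while its
    accuracy on clean (trigger-free) inputs stays largely unchanged."""
    rng = rng or np.random.default_rng(0)
    images, labels = images.copy(), labels.copy()
    idx = rng.choice(len(images), size=int(rate * len(images)), replace=False)
    for i in idx:
        images[i], labels[i] = poison_sample(images[i])
    return images, labels
```

At inference time, the attacker stamps the same patch onto any input to steer the model toward the target class, while clean inputs continue to be classified normally, which is why standard held-out evaluation may not reveal the backdoor.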
See also: adversarial attack, data poisoning, AI safety.