AI GLOSSARY

Jailbreak

Security & Adversarial AI

A technique used to bypass an AI model's safety measures and content policies, typically through carefully crafted prompts that trick the model into ignoring its guidelines and producing outputs it would normally refuse. Jailbreaks exploit the tension between a model's instruction-following capabilities and its safety training, and represent an ongoing challenge as attackers continuously develop new techniques in response to defensive improvements.
See also: guardrail, prompt injection.