Guardrail
Large Language Model (LLM) Terms
A mechanism, either built into a model or applied around it, that prevents the model from producing certain types of outputs, such as harmful content, sensitive personal information, or off-topic responses. Guardrails can be implemented through fine-tuning, output filtering, prompting, or external classifiers, and they are a key component of responsible AI deployment.
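The output-filtering approach mentioned above can be sketched as a wrapper around a model call. This is a minimal, hypothetical example (the pattern list, function names, and stub model are illustrative, not a real library API): the guardrail inspects the model's output and withholds it when a blocked pattern appears.

```python
import re

# Illustrative blocklist: patterns the guardrail should never let through.
# A real deployment would use trained classifiers, not regexes alone.
BLOCKED_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),   # SSN-like number (sensitive PII)
    re.compile(r"(?i)\bhow to build a bomb\b"),  # harmful-content phrase
]

def guarded_generate(model_fn, prompt: str) -> str:
    """Call the model, then refuse to return output that trips the filter."""
    output = model_fn(prompt)
    if any(p.search(output) for p in BLOCKED_PATTERNS):
        return "[response withheld by guardrail]"
    return output

# Stub standing in for a real LLM call, so the sketch is self-contained.
def fake_model(prompt: str) -> str:
    return "Sure, my SSN is 123-45-6789."

print(guarded_generate(fake_model, "Tell me about yourself"))
```

The same wrapper shape works for input-side guardrails (screening the prompt before it reaches the model) or for swapping the regex check for an external moderation classifier.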
See also: content moderation, reinforcement learning from human feedback, AI safety.