
Output Filtering

Security & Adversarial AI

The automated screening of an AI model's outputs before they are returned to users, detecting and blocking harmful, policy-violating, or sensitive content that the model may have generated despite safety training. Output filtering acts as a last line of defense in the content safety pipeline, complementing model-level safety training with an additional layer of protection that can be updated independently.
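To make the idea concrete, the sketch below shows a minimal, hypothetical output filter sitting between the model and the user. The rule names, the `filter_output` and `respond` functions, and the regex patterns are illustrative assumptions, not part of any particular product; a production filter would typically combine such rules with a trained moderation classifier.

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class FilterResult:
    allowed: bool            # whether the output may be shown to the user
    reason: Optional[str]    # name of the rule that blocked it, if any

# Illustrative policy patterns (placeholders, not a real content policy).
BLOCKED_PATTERNS = {
    "credential_leak": re.compile(r"(api[_-]?key|password)\s*[:=]\s*\S+", re.IGNORECASE),
    "card_number": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def filter_output(model_output: str) -> FilterResult:
    """Screen a model response against policy rules before it reaches the user."""
    for reason, pattern in BLOCKED_PATTERNS.items():
        if pattern.search(model_output):
            return FilterResult(allowed=False, reason=reason)
    return FilterResult(allowed=True, reason=None)

def respond(model_output: str) -> str:
    """Return the model's output, or a refusal notice if the filter blocks it."""
    result = filter_output(model_output)
    if not result.allowed:
        return "This response was withheld by the output filter."
    return model_output
```

Because the filter runs outside the model, its rules can be tightened or extended without retraining, which is what makes it an independently updatable layer of defense.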
See also: guardrail.