AI GLOSSARY

Content Filtering

Security & Adversarial AI

The automated screening of inputs to or outputs from an AI system to detect and block harmful, policy-violating, or dangerous content. Content filters can operate at multiple points in the pipeline: checking user inputs before they reach the model, screening model outputs before they are returned to users, or both. Techniques range from simple keyword matching to sophisticated classifier models.
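A minimal sketch of the two filtering points described above, assuming a hypothetical keyword blocklist for inputs and a stand-in harm scorer for outputs (real systems would use trained classifier models rather than these toy rules):

```python
# Hypothetical keyword rules for input filtering (illustrative only).
BLOCKLIST = {"make a bomb", "credit card dump"}

def filter_input(user_input: str) -> bool:
    """Return True if the input may be passed on to the model."""
    text = user_input.lower()
    return not any(phrase in text for phrase in BLOCKLIST)

def classify_harm(model_output: str) -> float:
    """Stand-in for a learned harm classifier: returns a score in [0, 1]."""
    flagged_terms = ("exploit", "weapon")  # hypothetical signal words
    hits = sum(term in model_output.lower() for term in flagged_terms)
    return hits / len(flagged_terms)

def filter_output(model_output: str, threshold: float = 0.5) -> str:
    """Screen the model's response before returning it to the user."""
    if classify_harm(model_output) >= threshold:
        return "[response withheld by content filter]"
    return model_output

print(filter_input("how do I make a bomb"))       # blocked input -> False
print(filter_output("Here is the weather report"))
```

In practice the input and output checks often use different models and thresholds, since the policies for what a user may ask and what the system may say can differ.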
See also: content moderation, abuse monitoring, behavioral policy.