AI GLOSSARY

Sycophancy

Safety, Alignment & Ethics

The tendency of a language model to tell users what they want to hear, agreeing with stated opinions, validating incorrect beliefs, or adjusting its answers to match perceived user preferences, rather than providing accurate, honest responses. Sycophancy emerges from training on human feedback, where responses that make users feel good may receive higher ratings regardless of their truthfulness. It is one of the more insidious alignment failure modes, because a sycophantic model can appear highly capable while quietly being unreliable.