AI GLOSSARY
Human Evaluation
Evaluation & Performance
The process of having people assess the quality of a model's outputs by rating responses for accuracy, helpfulness, fluency, or other criteria. Human evaluation is widely treated as the gold standard for many AI tasks, particularly language generation, where automated metrics often fail to capture what people actually care about. Human judgments of this kind are also a key input into reinforcement learning from human feedback.
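As a rough illustration of how such ratings might be combined, the minimal sketch below averages scores from several raters for each response and criterion. The rater names, response IDs, the 1-to-5 scale, and the `aggregate_ratings` helper are all hypothetical, chosen only for this example; real evaluation setups vary widely in scale and aggregation method.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical ratings: (rater, response_id, criterion, score on an assumed 1-5 scale)
ratings = [
    ("rater_a", "resp_1", "accuracy", 4),
    ("rater_b", "resp_1", "accuracy", 5),
    ("rater_a", "resp_1", "helpfulness", 3),
    ("rater_b", "resp_1", "helpfulness", 4),
    ("rater_a", "resp_2", "accuracy", 2),
    ("rater_b", "resp_2", "accuracy", 3),
]


def aggregate_ratings(rows):
    """Average the scores for each (response, criterion) pair across raters."""
    grouped = defaultdict(list)
    for _rater, response_id, criterion, score in rows:
        grouped[(response_id, criterion)].append(score)
    return {key: mean(scores) for key, scores in grouped.items()}


summary = aggregate_ratings(ratings)
for (response_id, criterion), avg in sorted(summary.items()):
    print(f"{response_id} {criterion}: {avg:.2f}")
```

Averaging is only one possible choice; pairwise preference comparisons between two responses are another common way to collect and summarize human judgments.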
See also: reinforcement learning from human feedback, human feedback, benchmark.