AI GLOSSARY

Human Evaluation

Evaluation & Performance

The process of having people assess the quality of a model's outputs by rating responses for accuracy, helpfulness, fluency, or other criteria. Human evaluation is widely treated as the gold standard for many AI tasks, particularly language and other generative tasks, where automated metrics often fail to capture what people actually care about. Human ratings are also a key input into reinforcement learning from human feedback.
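
As a rough illustration of how such ratings might be summarized, the sketch below averages scores per response and criterion across raters. The response IDs, rater IDs, the 1-5 scale, and the `mean_scores` helper are all hypothetical, chosen only for this example; real evaluation setups differ in criteria, scales, and aggregation.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical ratings: (response_id, rater_id, criterion, score on an assumed 1-5 scale)
ratings = [
    ("resp_a", "rater_1", "accuracy", 5),
    ("resp_a", "rater_2", "accuracy", 4),
    ("resp_a", "rater_1", "helpfulness", 4),
    ("resp_a", "rater_2", "helpfulness", 4),
    ("resp_b", "rater_1", "accuracy", 2),
    ("resp_b", "rater_2", "accuracy", 3),
    ("resp_b", "rater_1", "helpfulness", 3),
    ("resp_b", "rater_2", "helpfulness", 1),
]


def mean_scores(rows):
    """Average the scores for each (response, criterion) pair across raters."""
    grouped = defaultdict(list)
    for response_id, _rater_id, criterion, score in rows:
        grouped[(response_id, criterion)].append(score)
    return {key: mean(scores) for key, scores in grouped.items()}


summary = mean_scores(ratings)
for (response_id, criterion), avg in sorted(summary.items()):
    print(f"{response_id} {criterion}: {avg:.2f}")
```

Averaging is only one possible choice here; pairwise preference comparisons or agreement statistics between raters are common alternatives.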
See also: reinforcement learning from human feedback, human feedback, benchmark.