AI GLOSSARY

Reward Model

Research & Advanced Concepts

A model trained to predict how strongly a human would prefer a given output, assigning a scalar reward score that captures human judgment about quality, helpfulness, or safety. Reward models are central to reinforcement learning from human feedback (RLHF): they translate human preferences into a training signal for fine-tuning language models and serve as a proxy for human evaluation at scale.
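
As a rough illustration of how preferences can become a scalar score, the sketch below fits a tiny linear reward model on pairs of "chosen" and "rejected" outputs using a pairwise logistic (Bradley-Terry style) loss, which is one common choice and not the only one. Everything here is an assumption made for illustration: the two-number feature vectors (a helpfulness signal and a toxicity signal), the function names, and the toy data are hypothetical, and practical reward models are usually neural networks that score text directly.

```python
import math

def reward(weights, features):
    """Scalar reward: a linear score over hypothetical output features."""
    return sum(w * x for w, x in zip(weights, features))

def pairwise_loss(weights, chosen, rejected):
    """Pairwise logistic loss: -log(sigmoid(r(chosen) - r(rejected)))."""
    margin = reward(weights, chosen) - reward(weights, rejected)
    return math.log1p(math.exp(-margin))

def train(pairs, dim, lr=0.1, epochs=200):
    """Fit the weights by plain gradient descent on the pairwise loss."""
    weights = [0.0] * dim
    for _ in range(epochs):
        for chosen, rejected in pairs:
            margin = reward(weights, chosen) - reward(weights, rejected)
            # Gradient of the loss w.r.t. weights is
            # -(1 - sigmoid(margin)) * (chosen - rejected).
            scale = 1.0 - 1.0 / (1.0 + math.exp(-margin))
            for i in range(dim):
                weights[i] += lr * scale * (chosen[i] - rejected[i])
    return weights

# Toy preference data (hypothetical): each output is described by
# [helpfulness_signal, toxicity_signal]; the first item of each pair
# is the one a human annotator preferred.
pairs = [
    ([1.0, 0.0], [0.2, 0.0]),
    ([0.8, 0.0], [0.8, 1.0]),
    ([0.6, 0.1], [0.3, 0.9]),
]

weights = train(pairs, dim=2)
for chosen, rejected in pairs:
    print(reward(weights, chosen), reward(weights, rejected))
```

After training, the preferred output in each pair receives the higher score. In an RLHF setup, a score of this kind is what the language model is then fine-tuned to increase.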