AI GLOSSARY
Reinforcement Learning from Human Feedback
RLHF
Large Language Model (LLM) Terms
A training approach where a language model is fine-tuned using feedback from human raters who compare and rank model outputs. The human preferences are used to train a reward model, which then guides further training via reinforcement learning. RLHF has been central to making large language models more helpful, harmless, and honest.
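The reward-model step can be illustrated as pairwise preference learning: given a response a rater preferred and one they rejected, the model is trained to assign the preferred response a higher scalar score (a Bradley-Terry style loss). The sketch below is a minimal, self-contained PyTorch example; the `RewardModel` class, the feature dimensions, and the random vectors standing in for response embeddings are all illustrative assumptions, since a real pipeline would score tokenized text with a language-model backbone.

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Illustrative stand-in for a language-model backbone plus a scalar head."""

    def __init__(self, dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # One scalar reward per input.
        return self.net(x).squeeze(-1)

model = RewardModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(100):
    # Toy stand-ins for embeddings of the rater-preferred ("chosen")
    # and rater-rejected response in each comparison pair.
    chosen = torch.randn(32, 128)
    rejected = torch.randn(32, 128)

    # Pairwise preference loss: -log sigmoid(r_chosen - r_rejected)
    # pushes the preferred response's reward above the rejected one's.
    loss = -torch.nn.functional.logsigmoid(model(chosen) - model(rejected)).mean()

    opt.zero_grad()
    loss.backward()
    opt.step()
```

In a full RLHF pipeline, the trained reward model's scalar score then drives the reinforcement-learning stage, commonly a policy-gradient method such as PPO, typically with a KL penalty that keeps the fine-tuned model close to the original.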