

Reinforcement Learning from Human Feedback

RLHF
Large Language Model (LLM) Terms

A training approach in which a language model is fine-tuned using feedback from human raters who compare and rank the model's outputs. These human preferences are used to train a reward model, which then guides further fine-tuning via reinforcement learning. RLHF has been central to making large language models more helpful, honest, and harmless.
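To make the first stage of that pipeline concrete, here is a minimal, hypothetical PyTorch sketch of training a reward model from pairwise human preferences. The RewardModel class, the preference_loss function, and the random embeddings standing in for the language model's hidden states are illustrative assumptions, not any particular system's implementation.

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Maps a response representation to a single scalar reward."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.score = nn.Linear(hidden_size, 1)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # One reward per example in the batch.
        return self.score(hidden).squeeze(-1)

def preference_loss(reward_chosen: torch.Tensor,
                    reward_rejected: torch.Tensor) -> torch.Tensor:
    """Pairwise (Bradley-Terry style) loss: push the reward of the
    human-preferred response above the reward of the rejected one."""
    return -torch.nn.functional.logsigmoid(
        reward_chosen - reward_rejected
    ).mean()

# Toy usage: random vectors stand in for the LLM's final hidden states
# for the chosen and rejected responses in each comparison pair.
model = RewardModel(hidden_size=16)
chosen = torch.randn(4, 16)
rejected = torch.randn(4, 16)
loss = preference_loss(model(chosen), model(rejected))
loss.backward()
```

In the second stage, the trained reward model scores the policy model's generations, and a reinforcement learning algorithm such as PPO updates the policy to maximize that reward, typically with a penalty that keeps the policy close to the original model so it does not drift into degenerate outputs.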
