AI GLOSSARY

Policy Gradient

Research & Advanced Concepts

A family of reinforcement learning algorithms that directly optimize the policy (the mapping from states to actions) by estimating the gradient of the expected cumulative reward with respect to the policy parameters and updating those parameters by gradient ascent. Policy gradient methods are particularly useful for problems with continuous action spaces or where the policy is represented by a neural network, and they underlie many modern RL methods, including those used for reinforcement learning from human feedback (RLHF).
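
As a rough illustration of the idea, the sketch below applies REINFORCE, one of the simplest policy gradient methods, to a toy multi-armed bandit. It makes several simplifying assumptions that are not part of the definition above: the problem has no states, the policy is a softmax over one preference per action, rewards are Gaussian with unit variance, and a running-average baseline is subtracted to reduce variance. The names softmax and reinforce_bandit are hypothetical and chosen only for this example.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(mean_rewards, steps=2000, lr=0.1, seed=0):
    """REINFORCE on a stateless bandit with a softmax policy.

    theta holds one preference per action; the policy is softmax(theta).
    Returns the final action probabilities.
    """
    rng = random.Random(seed)
    theta = [0.0] * len(mean_rewards)
    baseline = 0.0  # running average reward (assumed variance-reduction trick)
    for t in range(1, steps + 1):
        probs = softmax(theta)
        # Sample an action from the current policy.
        action = rng.choices(range(len(theta)), weights=probs)[0]
        # Toy environment: noisy reward around the chosen arm's mean.
        reward = rng.gauss(mean_rewards[action], 1.0)
        advantage = reward - baseline
        # For a softmax policy, d log pi(action) / d theta[i]
        # equals 1[i == action] - probs[i].
        for i in range(len(theta)):
            grad_log_pi = (1.0 if i == action else 0.0) - probs[i]
            theta[i] += lr * advantage * grad_log_pi  # gradient ascent step
        # Update the running average of observed rewards.
        baseline += (reward - baseline) / t
    return softmax(theta)


if __name__ == "__main__":
    # The second arm pays more on average, so its probability should grow.
    print(reinforce_bandit([0.0, 2.0], steps=5000))
```

In a full reinforcement learning problem the same update would typically be applied to the parameters of a neural network, with the reward replaced by the return observed after each action.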
