AI GLOSSARY

Bandit Problem

Research & Advanced Concepts

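A classic sequential decision-making problem in reinforcement learning, also called the multi-armed bandit problem. An agent repeatedly chooses among several options ("arms"), each with an unknown reward probability, observes the reward of the option it chose, and tries to maximize its total reward over time. The name comes from "one-armed bandit", a slang term for a slot machine: the agent is like a gambler deciding which of several machines to play. Bandit problems formalize the exploration-exploitation tradeoff, the choice between trying less-known options to learn their payoffs and playing the option that currently looks best. Practical applications include recommendation systems, clinical trials, and online advertising.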
See also: reinforcement learning, exploration-exploitation tradeoff.
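
Example: the exploration-exploitation tradeoff can be illustrated with a minimal sketch of the epsilon-greedy strategy, one common simple approach among several. It assumes each arm pays a reward of 1 with a fixed probability and 0 otherwise. The function name run_epsilon_greedy, its parameters, and the arm probabilities are hypothetical and chosen only for illustration.

```python
import random


def run_epsilon_greedy(true_probs, steps=5000, epsilon=0.1, seed=0):
    """Simulate epsilon-greedy on a Bernoulli bandit (illustrative sketch).

    true_probs: assumed reward probability of each arm, hidden from the agent.
    Returns (pull counts per arm, estimated reward per arm, total reward).
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    counts = [0] * n_arms        # how many times each arm was pulled
    estimates = [0.0] * n_arms   # running mean reward observed per arm
    total_reward = 0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest current estimate
            # (ties go to the lowest index).
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # The environment returns a reward of 1 or 0.
        reward = 1 if rng.random() < true_probs[arm] else 0

        # Incremental update of the running mean for the chosen arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return counts, estimates, total_reward


if __name__ == "__main__":
    counts, estimates, total = run_epsilon_greedy([0.2, 0.5, 0.8])
    print("pulls per arm:", counts)
    print("estimated reward per arm:", [round(e, 3) for e in estimates])
    print("total reward:", total)
```

A larger epsilon spends more pulls on exploration, which improves the estimates for every arm but gives up reward in the short term. A smaller epsilon exploits sooner but risks settling on a worse arm.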
