{"version":"1.0","type":"rich","provider_name":"gaks.ai AI Glossary","provider_url":"https://gaks.ai/glossary","title":"Bandit Problem — AI Glossary","author_name":"Glenn Katrud Solheim","author_url":"https://gaks.ai","width":600,"height":200,"html":"<div style=\"font-family:sans-serif;border:1px solid #e0e0e0;border-radius:8px;padding:16px;max-width:600px;background:#ffffff;color:#111111;\"><p style=\"margin:0 0 4px;font-size:11px;color:#666;\">AI Glossary — gaks.ai</p><h3 style=\"margin:0 0 8px;font-size:16px;\">Bandit Problem</h3><p style=\"margin:0 0 12px;font-size:14px;line-height:1.6;\">A classic decision-making problem in reinforcement learning where an agent repeatedly chooses among multiple options (arms), each with an unknown reward distribution, and learns from feedback over time. The name comes from the analogy of a gambler choosing between slot machines ('one-armed bandits'). Bandit problems formalize the exploration-exploitation tradeoff and have practical applications in recommendation systems, clinical trials, and online advertising. See also: reinforcement learning, exploration-exploitation tradeoff.</p><a href=\"https://gaks.ai/glossary/bandit-problem\" style=\"font-size:12px;color:#0077aa;\">Source: gaks.ai/glossary/bandit-problem →</a></div>"}