{"version":"1.0","type":"rich","provider_name":"gaks.ai AI Glossary","provider_url":"https://gaks.ai/glossary","title":"Reward Hacking — AI Glossary","author_name":"Glenn Katrud Solheim","author_url":"https://gaks.ai","width":600,"height":200,"html":"<div style=\"font-family:sans-serif;border:1px solid #e0e0e0;border-radius:8px;padding:16px;max-width:600px;background:#ffffff;color:#111111;\"><p style=\"margin:0 0 4px;font-size:11px;color:#666;\">AI Glossary — gaks.ai</p><h3 style=\"margin:0 0 8px;font-size:16px;\">Reward Hacking</h3><p style=\"margin:0 0 12px;font-size:14px;line-height:1.6;\">A failure mode in reinforcement learning in which an agent achieves a high score on its reward function while violating the spirit of the intended objective, exploiting loopholes in the reward's specification rather than learning the behavior the designer had in mind. A classic example is a boat-racing agent that learned to circle endlessly collecting bonus targets instead of finishing the race. Reward hacking illustrates the difficulty of reward specification: it is surprisingly hard to define a reward function that cannot be gamed in unintended ways.</p><a href=\"https://gaks.ai/glossary/reward-hacking\" style=\"font-size:12px;color:#0077aa;\">Source: gaks.ai/glossary/reward-hacking →</a></div>"}