{"version":"1.0","type":"rich","provider_name":"gaks.ai AI Glossary","provider_url":"https://gaks.ai/glossary","title":"Chain-of-Thought Monitoring — AI Glossary","author_name":"Glenn Katrud Solheim","author_url":"https://gaks.ai","width":600,"height":200,"html":"<div style=\"font-family:sans-serif;border:1px solid #e0e0e0;border-radius:8px;padding:16px;max-width:600px;background:#ffffff;color:#111111;\"><p style=\"margin:0 0 4px;font-size:11px;color:#666;\">AI Glossary — gaks.ai</p><h3 style=\"margin:0 0 8px;font-size:16px;\">Chain-of-Thought Monitoring (CoT Monitoring)</h3><p style=\"margin:0 0 12px;font-size:14px;line-height:1.6;\">An AI safety oversight technique that inspects a reasoning model's externalized thinking trace (its chain of thought) for signs of deceptive intent, goal misalignment, or harmful reasoning before the model acts on it. Unlike input/output monitoring, CoT monitoring can catch explicit statements of harmful intent and surface flawed reasoning that would otherwise be invisible. The technique exploits a structural property of current reasoning models: sufficiently complex cognition must pass through the chain of thought, which serves as working memory, making it readable to an external monitor. This property is called <em>CoT monitorability</em>. Monitorability is considered fragile, however: it can be undermined by reinforcement learning at scale (reasoning drifts toward illegible representations), by deliberate obfuscation on the part of situationally aware models, or by architectures that keep reasoning entirely in latent (non-verbalized) form. CoT monitoring is therefore one imperfect layer among several needed safety mechanisms, not a sufficient safeguard on its own.</p><a href=\"https://gaks.ai/glossary/chain-of-thought-monitoring\" style=\"font-size:12px;color:#0077aa;\">Source: gaks.ai/glossary/chain-of-thought-monitoring →</a></div>"}