{"version":"1.0","type":"rich","provider_name":"gaks.ai AI Glossary","provider_url":"https://gaks.ai/glossary","title":"Mechanistic Interpretability — AI Glossary","author_name":"Glenn Katrud Solheim","author_url":"https://gaks.ai","width":600,"height":200,"html":"<div style=\"font-family:sans-serif;border:1px solid #e0e0e0;border-radius:8px;padding:16px;max-width:600px;background:#ffffff;color:#111111;\"><p style=\"margin:0 0 4px;font-size:11px;color:#666;\">AI Glossary — gaks.ai</p><h3 style=\"margin:0 0 8px;font-size:16px;\">Mechanistic Interpretability</h3><p style=\"margin:0 0 12px;font-size:14px;line-height:1.6;\">A research field that aims to reverse-engineer neural networks, understanding not just what they do but precisely how they do it at the level of individual neurons, weights, and circuits. The term was coined by Chris Olah, and the field seeks to open the black box of deep learning, making AI systems more transparent, predictable, and safe. It is a core area of AI safety research. See also: interpretability, circuit.</p><a href=\"https://gaks.ai/glossary/mechanistic-interpretability\" style=\"font-size:12px;color:#0077aa;\">Source: gaks.ai/glossary/mechanistic-interpretability →</a></div>"}