{"version":"1.0","type":"rich","provider_name":"gaks.ai AI Glossary","provider_url":"https://gaks.ai/glossary","title":"Activation Steering — AI Glossary","author_name":"Glenn Katrud Solheim","author_url":"https://gaks.ai","width":600,"height":200,"html":"<div style=\"font-family:sans-serif;border:1px solid #e0e0e0;border-radius:8px;padding:16px;max-width:600px;background:#ffffff;color:#111111;\"><p style=\"margin:0 0 4px;font-size:11px;color:#666;\">AI Glossary — gaks.ai</p><h3 style=\"margin:0 0 8px;font-size:16px;\">Activation Steering</h3><p style=\"margin:0 0 12px;font-size:14px;line-height:1.6;\">A technique for influencing a neural network's behavior by directly modifying its internal activations during inference, typically by adding or subtracting vectors that correspond to specific concepts or behaviors. Rather than changing the model's weights or its prompt, activation steering intervenes on the model's intermediate computations, which makes it a useful tool for interpretability and alignment research. Adding a vector is called activation addition; subtracting one is called activation subtraction.</p><a href=\"https://gaks.ai/glossary/activation-steering\" style=\"font-size:12px;color:#0077aa;\">Source: gaks.ai/glossary/activation-steering →</a></div>"}