AI GLOSSARY

Activation Addition

Research & Advanced Concepts

A form of activation steering in which a concept vector (often called a steering vector) is added to a model's internal activations during inference to amplify or introduce a target behaviour. The vector is typically derived by contrasting activations from prompts that do and do not exhibit the concept. For example, subtracting the activations for a 'sad' prompt from those for a 'happy' prompt yields a direction that, when added back into the model's activations, steers it toward happier outputs.
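The contrast-then-add idea can be sketched in a few lines of plain Python. This is a toy illustration, not a real model integration: the activation values are made up, the helper names (`mean_vector`, `steering_vector`, `add_steering`) are hypothetical, and the scaling coefficient `coeff` is an assumption added to show how the strength of the steering might be tuned.

```python
from statistics import fmean


def mean_vector(rows):
    """Average a list of equal-length activation vectors element-wise."""
    return [fmean(col) for col in zip(*rows)]


def steering_vector(positive_acts, negative_acts):
    """Mean activation with the concept minus mean activation without it."""
    pos = mean_vector(positive_acts)
    neg = mean_vector(negative_acts)
    return [p - n for p, n in zip(pos, neg)]


def add_steering(hidden, vector, coeff=1.0):
    """Add the scaled steering vector to a single hidden-state vector."""
    return [h + coeff * v for h, v in zip(hidden, vector)]


# Toy 3-dimensional activations (illustrative numbers, not from a real model).
happy_acts = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
sad_acts = [[0.0, 1.0, 2.0], [0.0, 3.0, 2.0]]

# 'happy' minus 'sad' gives the direction to steer along.
direction = steering_vector(happy_acts, sad_acts)

# Nudge a hidden state along that direction during inference.
steered = add_steering([0.5, 0.5, 0.5], direction, coeff=2.0)

print(direction)  # [2.0, -2.0, 0.0]
print(steered)    # [4.5, -3.5, 0.5]
```

In a real model the same addition would be applied to the activations of a chosen layer on each forward pass, with the layer and the coefficient treated as tunable choices.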

See also: Activation Steering, Activation Subtraction, Representation Engineering.
