
AI GLOSSARY

Linear Probe

Research & Advanced Concepts

A simple interpretability technique in which a linear classifier is trained on top of a neural network's internal representations to test whether a particular concept, such as sentiment, part of speech, or color, is encoded in those representations. The network's weights stay frozen; only the probe is trained. If the probe achieves high accuracy, the concept is linearly decodable from the representations, which suggests the network has learned to encode it even though it was never explicitly trained to do so.
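
Below is a minimal sketch of the idea, assuming PyTorch and scikit-learn. The toy network, the synthetic inputs, and the binary "concept" label are placeholders for illustration; in practice the activations would come from a real pretrained model and the labels from annotated data.

```python
# Minimal linear-probe sketch: train a linear classifier on frozen hidden
# activations of a toy network. All data here is synthetic.
import torch
import torch.nn as nn
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

torch.manual_seed(0)

# Toy "pretrained" network; its weights are never updated below.
net = nn.Sequential(
    nn.Linear(32, 64), nn.ReLU(),   # hidden representation we want to probe
    nn.Linear(64, 10),
)

# Synthetic inputs and a hypothetical binary concept label (stand-in for
# something like sentiment or part of speech).
X = torch.randn(1000, 32)
concept = (X[:, 0] > 0).long().numpy()

# Capture the hidden activations with a forward hook.
activations = {}
def hook(module, inputs, output):
    activations["hidden"] = output.detach()

net[1].register_forward_hook(hook)      # hook the ReLU output
with torch.no_grad():
    net(X)

H = activations["hidden"].numpy()

# The probe itself: a plain linear classifier trained on the frozen activations.
H_train, H_test, y_train, y_test = train_test_split(H, concept, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(H_train, y_train)
print("probe accuracy:", probe.score(H_test, y_test))
```

Because the probe is linear and the network is frozen, any accuracy above chance reflects information that is already present, in linearly readable form, in the representations rather than something the probe computed on its own.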