AI GLOSSARY

Multi-Head Attention

Neural Network Architectures

An extension of the attention mechanism that runs multiple attention operations in parallel, each focusing on different aspects or relationships in the input, and combines their outputs. Multi-head attention allows the model to simultaneously attend to different types of information, such as syntactic structure and semantic meaning, and is a core building block of transformer architectures.
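The split-project-attend-concatenate flow described above can be sketched in a minimal NumPy implementation. This is an illustrative sketch, not a production transformer layer: the weight matrices `Wq`, `Wk`, `Wv`, `Wo` and the `multi_head_attention` function are hypothetical names chosen for the example, and biases, masking, and dropout are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, Wq, Wk, Wv, Wo, num_heads):
    # x: (seq_len, d_model); each weight matrix: (d_model, d_model)
    seq_len, d_model = x.shape
    d_head = d_model // num_heads
    # Project inputs, then split the model dimension into separate heads
    q = (x @ Wq).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    k = (x @ Wk).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    v = (x @ Wv).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    # Scaled dot-product attention, computed independently per head
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)   # (heads, seq, seq)
    weights = softmax(scores, axis=-1)
    heads = weights @ v                                    # (heads, seq, d_head)
    # Concatenate head outputs and mix them with the output projection
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ Wo

rng = np.random.default_rng(0)
d_model, seq_len, num_heads = 8, 4, 2
Wq, Wk, Wv, Wo = (rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(4))
x = rng.normal(size=(seq_len, d_model))
out = multi_head_attention(x, Wq, Wk, Wv, Wo, num_heads)
print(out.shape)  # (4, 8)
```

Because each head works on its own `d_head`-sized slice of the projections, the heads can specialize in different relationships while the total computation stays comparable to a single full-width attention operation.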