AI GLOSSARY
Transformer
Neural Network Architectures
The neural network architecture underlying virtually all modern large language models, introduced in the landmark 2017 paper "Attention Is All You Need". Unlike recurrent architectures, which process tokens one at a time, transformers process entire sequences in parallel using self-attention mechanisms, capturing long-range dependencies far more effectively. The transformer has since become the dominant architecture not just for language but increasingly for vision, audio, and multimodal AI.
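The self-attention computation at the heart of the transformer can be sketched in a few lines of NumPy. This is a minimal single-head illustration (the projection sizes and toy inputs are arbitrary assumptions, not from the paper's experiments): every position's query is compared against every position's key at once, which is what makes the whole sequence processable in parallel.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the row max before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a whole sequence at once.

    X: (seq_len, d_model) token embeddings.
    Wq, Wk, Wv: (d_model, d_k) learned projection matrices.
    Returns: (seq_len, d_k) context-mixed representations.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    # All pairwise query-key scores computed in one matrix product,
    # scaled by sqrt(d_k) as in the original paper.
    scores = Q @ K.T / np.sqrt(d_k)       # (seq_len, seq_len)
    weights = softmax(scores, axis=-1)    # each row sums to 1
    return weights @ V                    # weighted mix of all positions

# Toy example: a "sequence" of 4 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8): one output vector per input position
```

A full transformer layer runs several such heads in parallel, concatenates their outputs, and follows them with a position-wise feed-forward network, residual connections, and layer normalization.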