AI GLOSSARY
Tokenizer
AI & Machine Learning
The component of a language model pipeline that converts raw text into tokens (discrete units such as words, subword pieces, or characters, each mapped to an integer ID) before the model processes them. Different models use different tokenizers with different vocabularies, which affects how they handle multilingual text, rare words, and special characters.
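To illustrate, here is a minimal sketch of a greedy longest-match tokenizer in pure Python. The vocabulary is hypothetical and hand-picked for the example; real tokenizers (such as BPE-based ones) learn their vocabularies from training data. The sketch shows why a word present in the vocabulary stays whole while a rarer word splits into several smaller tokens.

```python
# Toy greedy longest-match tokenizer. The vocabulary below is hypothetical,
# chosen purely for illustration; real tokenizer vocabularies are learned.
VOCAB = {"token", "izer", "un", "common", "word", "s", " ",
         "t", "o", "k", "e", "n", "i", "z", "r", "u", "c", "m", "w", "d"}

def tokenize(text: str) -> list[str]:
    """Greedily match the longest vocabulary entry at each position."""
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):   # try the longest substring first
            if text[i:j] in VOCAB:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])          # unknown character: emit it as-is
            i += 1
    return tokens

print(tokenize("tokenizer"))        # ['token', 'izer']
print(tokenize("uncommon words"))   # ['un', 'common', ' ', 'word', 's']
```

Because "token" and "izer" are in this toy vocabulary, "tokenizer" becomes just two tokens, while "uncommon words" breaks into five. The same effect explains why a model's tokenizer handles frequent words compactly but spends many more tokens on rare words or text in languages underrepresented in its vocabulary.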