AI GLOSSARY

Tokenizer

AI & Machine Learning

The component of a language model pipeline that converts raw text into tokens before the model processes it. Different models use different tokenizers with different vocabularies, which affects how they handle multilingual text, rare words, and special characters.
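The vocabulary's effect can be sketched with a toy greedy longest-match tokenizer. The function and both vocabularies below are hypothetical illustrations, not any real model's tokenizer (production tokenizers such as BPE learn their vocabularies from data): a vocabulary containing the right subwords splits a word into a few tokens, while a vocabulary lacking them falls back to single characters.

```python
def tokenize(text, vocab):
    # Greedy longest-match tokenization: at each position, consume the
    # longest vocabulary entry that matches; fall back to one character.
    tokens = []
    i = 0
    max_len = max((len(t) for t in vocab), default=1)
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab or length == 1:
                tokens.append(piece)
                i += length
                break
    return tokens

# Two hypothetical vocabularies: one with English subwords, one without.
vocab_english = {"token", "izer", "ization", "the", " "}
vocab_other = {"der", "die", "das", " "}

print(tokenize("tokenizer", vocab_english))  # ['token', 'izer']
print(tokenize("tokenizer", vocab_other))    # single characters: ['t', 'o', 'k', ...]
```

The same word costs 2 tokens under one vocabulary and 9 under the other, which is why a model's tokenizer shapes its efficiency on multilingual text and rare words.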