The complete set of tokens a model knows and can work with, defined by its tokenizer. Text that does not map directly to a vocabulary entry is either broken into smaller known pieces or replaced with a special unknown token (often written `[UNK]` or `<unk>`). A larger vocabulary can represent text more precisely but requires more memory, since the embedding and output layers grow with vocabulary size.
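
A minimal sketch of greedy longest-match subword tokenization illustrates the fallback behavior described above. The toy vocabulary and the `##` continuation-prefix convention (BERT-style) are illustrative assumptions, not the internals of any particular tokenizer:

```python
# Toy vocabulary; "##" marks a piece that continues a word (assumed convention).
VOCAB = {"the", "token", "##izer", "##ize", "un", "##known", "[UNK]"}

def tokenize_word(word: str) -> list[str]:
    """Split one word into the longest matching vocabulary pieces."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        prefix = "##" if start > 0 else ""  # continuation marker after the first piece
        while end > start:
            candidate = prefix + word[start:end]
            if candidate in VOCAB:
                pieces.append(candidate)
                break
            end -= 1  # try a shorter piece
        else:
            # No piece of any length matched: replace the whole word
            # with the special unknown token.
            return ["[UNK]"]
        start = end
    return pieces

print(tokenize_word("tokenizer"))  # ['token', '##izer'] -- split into known pieces
print(tokenize_word("unknown"))    # ['un', '##known']
print(tokenize_word("xyzzy"))      # ['[UNK]'] -- nothing in the vocabulary matches
```

With a larger vocabulary, more words would match as single pieces (fewer splits, fewer unknowns), at the cost of a bigger lookup table and, in a real model, larger embedding matrices.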