AI GLOSSARY
Corpus
Data
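A large, organized collection of text or other data used to train or evaluate an AI model. A corpus might consist of books, websites, scientific papers, or conversations. In general, the broader and more diverse the corpus, the wider the range of language, topics, and patterns the model can learn from it, although the quality of the data matters as well as its size.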
See also: training data, data preprocessing, pre-training.