
AI GLOSSARY

Corpus

Data

A large, structured collection of text or other data used to train or evaluate an AI model. A corpus might consist of books, websites, scientific papers, or conversations. In general, the broader and more diverse a corpus is, the wider the range of language, topics, and patterns a model can learn from it.
See also: training data, data preprocessing, pre-training.
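As a toy illustration, a corpus can be modeled as a list of documents, and simple statistics such as document count, token count, and vocabulary size give a rough sense of its size and breadth. This is a minimal sketch under stated assumptions: the three documents and the helper names `tokenize` and `corpus_stats` are hypothetical, and the tokenizer is a naive whitespace splitter rather than the subword tokenizers real training pipelines use.

```python
from collections import Counter

# Hypothetical toy corpus: each string stands in for one document
# (e.g. a book excerpt, a scientific abstract, a conversation).
corpus = [
    "The cat sat on the mat.",
    "Photosynthesis converts light energy into chemical energy.",
    "User: How do I reset my password? Assistant: Open the settings page.",
]

PUNCTUATION = ".,?!:"


def tokenize(text: str) -> list[str]:
    # Naive tokenizer (assumption): split on whitespace, strip surrounding
    # punctuation, lowercase, and drop tokens that end up empty.
    tokens = []
    for word in text.split():
        cleaned = word.strip(PUNCTUATION).lower()
        if cleaned:
            tokens.append(cleaned)
    return tokens


def corpus_stats(docs: list[str]) -> dict[str, int]:
    # Count every token across all documents.
    counts: Counter[str] = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return {
        "documents": len(docs),
        "tokens": sum(counts.values()),  # total token occurrences
        "vocabulary": len(counts),       # distinct tokens, a crude breadth signal
    }


if __name__ == "__main__":
    print(corpus_stats(corpus))
    # → {'documents': 3, 'tokens': 25, 'vocabulary': 22}
```

Vocabulary size is only a crude proxy for diversity; it does not capture topic coverage or data quality, which is why corpora usually go through data preprocessing before training.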
