Batch Inference
Deployment & Infrastructure
Running a model on a large collection of inputs at once, rather than processing each input as it arrives. Because inputs are grouped into batches, the fixed per-call costs of invoking the model are amortized across many examples and the hardware stays fully utilized, making batch inference more computationally efficient than real-time inference. It is used for tasks that do not require an immediate response, such as scoring a night's worth of transactions or pre-generating recommendations.
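As a rough illustration, here is a minimal Python sketch of the batching pattern described above. The predict function, the batch size, and the transaction data are all hypothetical stand-ins, not a specific library's API:

```python
import numpy as np

def predict(batch: np.ndarray) -> np.ndarray:
    """Stand-in for a real model call (e.g., a neural network forward pass)."""
    weights = np.ones(batch.shape[1])
    return batch @ weights

def batch_inference(inputs: np.ndarray, batch_size: int = 256) -> np.ndarray:
    """Run the model over all inputs in fixed-size batches.

    Each predict() call spreads per-call overhead across batch_size
    examples, which is why this is cheaper per input than issuing one
    call per arriving request.
    """
    outputs = []
    for start in range(0, len(inputs), batch_size):
        outputs.append(predict(inputs[start:start + batch_size]))
    return np.concatenate(outputs)

# Example: score a night's worth of accumulated transactions in one job.
transactions = np.random.rand(10_000, 8)   # 10k rows, 8 features each
scores = batch_inference(transactions, batch_size=512)
print(scores.shape)                        # (10000,)
```

The trade-off is latency: no individual input is answered until its batch runs, which is why this pattern suits scheduled offline jobs rather than interactive requests.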
See also: inference, latency.