AI GLOSSARY

Batch Inference

Deployment & Infrastructure

Running a model over a large collection of inputs in one scheduled job, rather than processing each input as it arrives. Batching amortizes per-request overhead (such as loading the model) and keeps hardware fully utilized, so batch inference achieves higher throughput than real-time inference at the cost of latency. It suits tasks that do not require an immediate response, such as scoring a night's worth of transactions or pre-generating recommendations.
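The pattern can be sketched in a few lines. This is a minimal illustration, not a real serving system: `score_batch` is a hypothetical stand-in for a model call that, in practice, would run one forward pass per batch (for example, on a GPU).

```python
def score_batch(inputs):
    """Hypothetical model call: scores a whole batch in one invocation.

    A dummy "model" stands in here; a real system would invoke the
    model once per batch rather than once per input.
    """
    return [len(text) for text in inputs]

def batch_inference(inputs, batch_size=32):
    """Run the model over all inputs in fixed-size batches."""
    results = []
    for start in range(0, len(inputs), batch_size):
        batch = inputs[start:start + batch_size]
        results.extend(score_batch(batch))  # one model call per batch
    return results

# e.g. scoring a backlog of transactions accumulated overnight
transactions = [f"txn-{i}" for i in range(100)]
scores = batch_inference(transactions, batch_size=32)
print(len(scores))  # 100
```

The batch size is a tuning knob: larger batches mean fewer model invocations and better hardware utilization, bounded by available memory.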
See also: inference, latency.