AI GLOSSARY
Inference Endpoint
Deployment & Infrastructure
A live, network-accessible service that hosts a trained model and responds to prediction requests in real time, typically over an HTTP API. When a user interacts with an AI application, their input is sent to an inference endpoint, which runs the model and returns the output.
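As a minimal sketch, calling an inference endpoint usually amounts to an authenticated HTTP POST with a JSON payload. The URL, API key, and request/response fields below (`inputs`, the `Bearer` token scheme) are illustrative assumptions; real providers define their own schemas.

```python
import json
import urllib.request

# Hypothetical endpoint URL and API key -- real providers differ.
ENDPOINT_URL = "https://models.example.com/v1/predict"
API_KEY = "YOUR_API_KEY"


def build_request(prompt: str) -> urllib.request.Request:
    """Package a prediction request for the inference endpoint."""
    body = json.dumps({"inputs": prompt}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )


def query_endpoint(prompt: str) -> dict:
    """Send the request and return the model's JSON output."""
    with urllib.request.urlopen(build_request(prompt), timeout=30) as resp:
        return json.load(resp)
```

The endpoint hides the model behind a stable interface: the client only sends inputs and reads outputs, while hardware, scaling, and model versions are managed server-side.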
See also: inference, hosted model, API.