AI GLOSSARY

Inference Endpoint

Deployment & Infrastructure

A live, network-accessible service that hosts a trained model and responds to prediction requests in real time, typically over an HTTP API. When a user interacts with an AI application, their input is sent to an inference endpoint, which runs the model on that input and returns the output.
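In practice, a client usually reaches an inference endpoint by posting its input as JSON to a URL and reading the model's output from the response. A minimal sketch using Python's standard library, where the endpoint URL and the `{"input": ...}` payload schema are illustrative assumptions, not any specific provider's API:

```python
import json
import urllib.request

def build_request(endpoint_url: str, prompt: str) -> urllib.request.Request:
    """Package a prediction request for a (hypothetical) inference endpoint."""
    payload = json.dumps({"input": prompt}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Sending this request (e.g. with urllib.request.urlopen) would invoke the
# hosted model and return its prediction in the HTTP response body.
req = build_request("https://example.com/v1/predict", "Hello, model!")
```

Real endpoints add authentication headers, batching, and versioned paths on top of this basic request/response pattern.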
See also: inference, hosted model, API.