HuggingFace Chat

HuggingFace Inference Endpoints allow you to deploy and serve machine learning models in the cloud, making them accessible via an API.

Getting Started

Further details can be found in the HuggingFace Inference Endpoints documentation.


Add the spring-ai-huggingface dependency (groupId org.springframework.ai, artifactId spring-ai-huggingface) to your build file. Refer to the Dependency Management section to add the Spring AI BOM to your build file.

Obtain your HuggingFace API key and set it as an environment variable:

export HUGGINGFACE_API_KEY=your_api_key_here
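Application code then needs to read HUGGINGFACE_API_KEY before creating the client. The following is a minimal sketch of one way to do that with the JDK alone. The HuggingfaceApiKey class is a hypothetical helper for illustration and is not part of Spring AI.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical helper (not part of Spring AI): looks up the exported API key.
final class HuggingfaceApiKey {

    static final String ENV_NAME = "HUGGINGFACE_API_KEY";

    // Returns the trimmed key, or an empty Optional if the variable is unset or blank.
    static Optional<String> from(Map<String, String> env) {
        return Optional.ofNullable(env.get(ENV_NAME))
                .map(String::trim)
                .filter(key -> !key.isEmpty());
    }

    public static void main(String[] args) {
        Optional<String> apiKey = from(System.getenv());
        // Report only whether the key is present; never print the key itself.
        System.out.println(apiKey.isPresent()
                ? ENV_NAME + " is set"
                : ENV_NAME + " is missing; export it before creating the client");
    }
}
```

Taking the environment as a Map parameter keeps the lookup easy to test without changing the real process environment.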

Note, there is not yet a Spring Boot Starter for this client implementation.

Obtain the endpoint URL of your Inference Endpoint. You can find it in the Inference Endpoints UI.
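Because the endpoint URL is copied by hand from the UI, it can be worth checking it before passing it to the client. The sketch below is one possible check. The EndpointUrl class and the placeholder host name are hypothetical, and the requirement that the URL use https is an assumption rather than something Spring AI enforces.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper (not part of Spring AI): sanity-checks a copied endpoint URL.
final class EndpointUrl {

    // Parses the URL and rejects empty, malformed, or non-https values.
    static URI parse(String raw) {
        if (raw == null || raw.isBlank()) {
            throw new IllegalArgumentException("endpoint URL is empty");
        }
        URI uri;
        try {
            uri = new URI(raw.trim());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("endpoint URL is malformed: " + raw, e);
        }
        // Assumption: the endpoint is served over https with an explicit host.
        if (!"https".equalsIgnoreCase(uri.getScheme()) || uri.getHost() == null) {
            throw new IllegalArgumentException("expected an https URL with a host: " + raw);
        }
        return uri;
    }

    public static void main(String[] args) {
        // Placeholder value; substitute the URL shown for your own endpoint.
        URI endpoint = parse("https://my-endpoint.example.com");
        System.out.println("Using endpoint host " + endpoint.getHost());
    }
}
```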

Making a call to the model

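// apiKey is the HuggingFace API key; basePath is the Inference Endpoint URL obtained above.
HuggingfaceChatClient client = new HuggingfaceChatClient(apiKey, basePath);
Prompt prompt = new Prompt("Your text here...");
ChatResponse response = client.call(prompt);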
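To see what a call amounts to at the HTTP level, the sketch below builds a request by hand using only the JDK. It assumes that the endpoint accepts a POST with a bearer token and a JSON body containing an inputs field, as text-generation endpoints on HuggingFace commonly do. It is not a description of how HuggingfaceChatClient is implemented. The RawEndpointCall class and the placeholder values are hypothetical, and the request is only constructed, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical illustration (not Spring AI code): builds a raw request for an endpoint.
final class RawEndpointCall {

    // Minimal JSON string escaping for the prompt text.
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"') sb.append("\\\"");
            else if (c == '\\') sb.append("\\\\");
            else if (c == '\n') sb.append("\\n");
            else if (c == '\r') sb.append("\\r");
            else if (c == '\t') sb.append("\\t");
            else if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
            else sb.append(c);
        }
        return sb.append('"').toString();
    }

    // Assumption: the endpoint expects {"inputs": "..."} and a bearer token.
    static HttpRequest buildRequest(URI endpoint, String apiKey, String inputs) {
        String body = "{\"inputs\":" + jsonString(inputs) + "}";
        return HttpRequest.newBuilder(endpoint)
                .header("Authorization", "Bearer " + apiKey)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) {
        // Placeholder endpoint and key; nothing is sent over the network here.
        HttpRequest request = buildRequest(
                URI.create("https://my-endpoint.example.com"), "your_api_key_here", "Your text here...");
        System.out.println(request.method() + " " + request.uri());
    }
}
```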


Using the example found here

String mistral7bInstruct = """
        [INST] You are a helpful code assistant. Your task is to generate a valid JSON object based on the given information:
        name: John
        lastname: Smith
        address: #1 Samuel St.
        Just generate the JSON object without explanations:
        [/INST]""";
Prompt prompt = new Prompt(mistral7bInstruct);
ChatResponse aiResponse = client.call(prompt);
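The prompt above opens with the [INST] tag of the Mistral instruct format, which is closed with [/INST] after the instruction. When several such prompts are needed, the wrapping can be factored into a small helper. The sketch below is one way to do it. The MistralInstructPrompt class is hypothetical, and the exact whitespace around the tags is an assumption modeled on this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper (not part of Spring AI): builds Mistral-style instruct prompts.
final class MistralInstructPrompt {

    // Wraps an instruction in [INST] ... [/INST], closing tag on its own line.
    static String wrap(String instruction) {
        return "[INST] " + instruction.strip() + "\n[/INST]";
    }

    // Builds the JSON-generation task from this example for arbitrary fields.
    static String jsonTask(Map<String, String> fields) {
        String lines = fields.entrySet().stream()
                .map(e -> e.getKey() + ": " + e.getValue())
                .collect(Collectors.joining("\n"));
        return wrap("You are a helpful code assistant. Your task is to generate "
                + "a valid JSON object based on the given information:\n"
                + lines
                + "\nJust generate the JSON object without explanations:");
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the fields in insertion order.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("name", "John");
        fields.put("lastname", "Smith");
        fields.put("address", "#1 Samuel St.");
        System.out.println(jsonTask(fields));
    }
}
```

The resulting string can be passed to new Prompt(...) in the same way as the mistral7bInstruct text block.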

This produces the following output:

{
    "name": "John",
    "lastname": "Smith",
    "address": "#1 Samuel St."
}