Hugging Face Chat

Hugging Face Text Generation Inference (TGI) is a specialized deployment solution for serving Large Language Models (LLMs) in the cloud, making them accessible via an API. TGI provides optimized performance for text generation tasks through features like continuous batching, token streaming, and efficient memory management.

Text Generation Inference requires models to be compatible with its architecture-specific optimizations. While many popular LLMs are supported, not all models on Hugging Face Hub can be deployed using TGI. If you need to deploy other types of models, consider using standard Hugging Face Inference Endpoints instead.
For a complete and up-to-date list of supported models and architectures, see the Text Generation Inference supported models documentation.

Prerequisites

You will need to create an Inference Endpoint on Hugging Face and create an API token to access the endpoint. Further details can be found in the Hugging Face Inference Endpoints documentation.

The Spring AI project defines a configuration property named spring.ai.huggingface.chat.api-key that you should set to the value of the API token obtained from Hugging Face, and a configuration property named spring.ai.huggingface.chat.url that you should set to the inference endpoint URL obtained when provisioning your model. You can find this URL in the Inference Endpoint's UI. Exporting environment variables is one way to set these configuration properties:

export SPRING_AI_HUGGINGFACE_CHAT_API_KEY=<INSERT KEY HERE>
export SPRING_AI_HUGGINGFACE_CHAT_URL=<INSERT INFERENCE ENDPOINT URL HERE>
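
Before configuring Spring AI, you can verify the endpoint and token directly from the command line. This is a minimal sanity check, assuming a text-generation (TGI) endpoint; the exact request and response shapes depend on the deployed model:

curl "$SPRING_AI_HUGGINGFACE_CHAT_URL" \
    -H "Authorization: Bearer $SPRING_AI_HUGGINGFACE_CHAT_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"inputs": "Hello"}'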

Add Repositories and BOM

Spring AI artifacts are published in Spring Milestone and Snapshot repositories. Refer to the Repositories section to add these repositories to your build system.

To help with dependency management, Spring AI provides a BOM (bill of materials) to ensure that a consistent version of Spring AI is used throughout the entire project. Refer to the Dependency Management section to add the Spring AI BOM to your build system.
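
As a quick reference, the Maven dependencyManagement entry for the BOM looks like the following; check the Dependency Management section for the version to use:

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>${spring-ai.version}</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>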

Auto-configuration

Spring AI provides Spring Boot auto-configuration for the Hugging Face Chat Client. To enable it, add the following dependency to your project’s Maven pom.xml file:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-huggingface-spring-boot-starter</artifactId>
</dependency>

or to your Gradle build.gradle file:

dependencies {
    implementation 'org.springframework.ai:spring-ai-huggingface-spring-boot-starter'
}

Refer to the Dependency Management section to add the Spring AI BOM to your build file.

Chat Properties

The prefix spring.ai.huggingface is the property prefix that lets you configure the chat model implementation for Hugging Face.

| Property | Description | Default |
|----------|-------------|---------|
| spring.ai.huggingface.chat.api-key | API key to authenticate with the Inference Endpoint. | - |
| spring.ai.huggingface.chat.url | URL of the Inference Endpoint to connect to. | - |
| spring.ai.huggingface.chat.enabled | Enable the Hugging Face chat model. | true |
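
For example, to keep the starter on the classpath but turn the chat model off in a given environment, you can flip the enabled flag in application.properties:

spring.ai.huggingface.chat.enabled=false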

Sample Controller (Auto-configuration)

Create a new Spring Boot project and add the spring-ai-huggingface-spring-boot-starter to your Maven pom.xml (or Gradle build.gradle) dependencies.

Add an application.properties file, under the src/main/resources directory, to enable and configure the Hugging Face chat model:

spring.ai.huggingface.chat.api-key=YOUR_API_KEY
spring.ai.huggingface.chat.url=YOUR_INFERENCE_ENDPOINT_URL

Replace the api-key and url values with your Hugging Face API key and inference endpoint URL.

This will create a HuggingfaceChatModel implementation that you can inject into your classes. Here is an example of a simple @RestController class that uses the chat model for text generation.

@RestController
public class ChatController {

    private final HuggingfaceChatModel chatModel;

    @Autowired
    public ChatController(HuggingfaceChatModel chatModel) {
        this.chatModel = chatModel;
    }

    @GetMapping("/ai/generate")
    public Map<String, String> generate(@RequestParam(value = "message", defaultValue = "Tell me a joke") String message) {
        return Map.of("generation", this.chatModel.call(message));
    }
}
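
With the application running, you can exercise the controller from the command line; this assumes the default Spring Boot port 8080:

curl "http://localhost:8080/ai/generate?message=Tell%20me%20a%20joke"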

Manual Configuration

The HuggingfaceChatModel implements the ChatModel interface and uses the low-level API to connect to the Hugging Face inference endpoints.

Add the spring-ai-huggingface dependency to your project’s Maven pom.xml file:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-huggingface</artifactId>
</dependency>

or to your Gradle build.gradle file:

dependencies {
    implementation 'org.springframework.ai:spring-ai-huggingface'
}

Refer to the Dependency Management section to add the Spring AI BOM to your build file.

Next, create a HuggingfaceChatModel and use it for text generation:

HuggingfaceChatModel chatModel = new HuggingfaceChatModel(apiKey, url);

ChatResponse response = chatModel.call(
    new Prompt("Generate the names of 5 famous pirates."));

System.out.println(response.getResult().getOutput().getContent());
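
If you want the manually created model to participate in dependency injection, you can register it as a bean yourself. Below is a minimal sketch (the class name is illustrative), assuming the key and URL are supplied through the same configuration properties used above:

import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class HuggingfaceChatConfig {

    // Reuse the same properties that the auto-configuration binds to.
    @Bean
    public HuggingfaceChatModel huggingfaceChatModel(
            @Value("${spring.ai.huggingface.chat.api-key}") String apiKey,
            @Value("${spring.ai.huggingface.chat.url}") String url) {
        return new HuggingfaceChatModel(apiKey, url);
    }
}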