Hugging Face Chat

Hugging Face Text Generation Inference (TGI) is a specialized deployment solution for serving Large Language Models (LLMs) in the cloud, making them accessible via an API. TGI provides optimized performance for text generation tasks through features like continuous batching, token streaming, and efficient memory management.

Text Generation Inference requires models to be compatible with its architecture-specific optimizations. While many popular LLMs are supported, not all models on Hugging Face Hub can be deployed using TGI. If you need to deploy other types of models, consider using standard Hugging Face Inference Endpoints instead.

For a complete and up-to-date list of supported models and architectures, see the Text Generation Inference supported models documentation.

Prerequisites

You will need to create an Inference Endpoint on Hugging Face and create an API token to access the endpoint. Further details can be found here.

The Spring AI project defines two configuration properties:

spring.ai.huggingface.chat.api-key: Set this to the value of the API token obtained from Hugging Face.
spring.ai.huggingface.chat.url: Set this to the inference endpoint URL obtained when provisioning your model in Hugging Face.

You can find your inference endpoint URL on the Inference Endpoint’s UI here.

You can set these configuration properties in your application.properties file:

spring.ai.huggingface.chat.api-key=<your-huggingface-api-key>
spring.ai.huggingface.chat.url=<your-inference-endpoint-url>

For enhanced security when handling sensitive information like API keys, you can use Spring Expression Language (SpEL) to reference custom environment variables:

# In application.yml
spring:
  ai:
    huggingface:
      chat:
        api-key: ${HUGGINGFACE_API_KEY}
        url: ${HUGGINGFACE_ENDPOINT_URL}

# In your environment or .env file
export HUGGINGFACE_API_KEY=<your-huggingface-api-key>
export HUGGINGFACE_ENDPOINT_URL=<your-inference-endpoint-url>

You can also set these configurations programmatically in your application code:

// Retrieve API key and endpoint URL from secure sources or environment variables
String apiKey = System.getenv("HUGGINGFACE_API_KEY");
String endpointUrl = System.getenv("HUGGINGFACE_ENDPOINT_URL");

Add Repositories and BOM

Spring AI artifacts are published in Maven Central and Spring Snapshot repositories. Refer to the Artifact Repositories section to add these repositories to your build system.

To help with dependency management, Spring AI provides a BOM (bill of materials) to ensure that a consistent version of Spring AI is used throughout the entire project. Refer to the Dependency Management section to add the Spring AI BOM to your build system.

Auto-configuration

There has been a significant change in the Spring AI auto-configuration, starter modules' artifact names. Please refer to the upgrade notes for more information.

Spring AI provides Spring Boot auto-configuration for the Hugging Face Chat Client. To enable it add the following dependency to your project’s Maven pom.xml file:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-huggingface</artifactId>
</dependency>

or to your Gradle build.gradle build file.

dependencies {
    implementation 'org.springframework.ai:spring-ai-starter-model-huggingface'
}

Refer to the Dependency Management section to add the Spring AI BOM to your build file.

Chat Properties

Enabling and disabling of the chat auto-configurations are now configured via top level properties with the prefix spring.ai.model.chat.

To enable, spring.ai.model.chat=huggingface (It is enabled by default)

To disable, spring.ai.model.chat=none (or any value which doesn’t match huggingface)

This change is done to allow configuration of multiple models.

The prefix spring.ai.huggingface is the property prefix that lets you configure the chat model implementation for Hugging Face.

Property

Description

Default

spring.ai.huggingface.chat.api-key

API Key to authenticate with the Inference Endpoint.

spring.ai.huggingface.chat.url

URL of the Inference Endpoint to connect to

spring.ai.huggingface.chat.enabled (Removed and no longer valid)

Enable Hugging Face chat model.

true

spring.ai.model.chat

Enable Hugging Face chat model.

huggingface

Sample Controller (Auto-configuration)

Create a new Spring Boot project and add the spring-ai-starter-model-huggingface to your pom (or gradle) dependencies.

Add an application.properties file, under the src/main/resources directory, to enable and configure the Hugging Face chat model:

spring.ai.huggingface.chat.api-key=YOUR_API_KEY
spring.ai.huggingface.chat.url=YOUR_INFERENCE_ENDPOINT_URL

replace the api-key and url with your Hugging Face values.

This will create a HuggingfaceChatModel implementation that you can inject into your class. Here is an example of a simple @Controller class that uses the chat model for text generations.

@RestController
public class ChatController {

    private final HuggingfaceChatModel chatModel;

    @Autowired
    public ChatController(HuggingfaceChatModel chatModel) {
        this.chatModel = chatModel;
    }

    @GetMapping("/ai/generate")
    public Map generate(@RequestParam(value = "message", defaultValue = "Tell me a joke") String message) {
        return Map.of("generation", this.chatModel.call(message));
    }
}

Manual Configuration

The HuggingfaceChatModel implements the ChatModel interface and uses the [low-level-api] to connect to the Hugging Face inference endpoints.

Add the spring-ai-huggingface dependency to your project’s Maven pom.xml file:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-huggingface</artifactId>
</dependency>

or to your Gradle build.gradle build file.

dependencies {
    implementation 'org.springframework.ai:spring-ai-huggingface'
}

Refer to the Dependency Management section to add the Spring AI BOM to your build file.

Next, create a HuggingfaceChatModel and use it for text generations:

HuggingfaceChatModel chatModel = new HuggingfaceChatModel(apiKey, url);

ChatResponse response = this.chatModel.call(
    new Prompt("Generate the names of 5 famous pirates."));

System.out.println(response.getResult().getOutput().getText());