This version is still in development and is not considered stable yet. For the latest snapshot version, please use Spring AI 1.0.0-SNAPSHOT!

watsonx.ai Chat

With watsonx.ai you can run various Large Language Models (LLMs) locally and generate text from them. Spring AI supports the watsonx.ai text generation with WatsonxAiChatModel.

Prerequisites

You first need to have a SaaS instance of watsonx.ai (as well as an IBM Cloud account).

Refer to free-trial to try watsonx.ai for free

More info can be found here

Auto-configuration

There has been a significant change in the Spring AI auto-configuration, starter modules' artifact names. Please refer to the upgrade notes for more information.

Spring AI provides Spring Boot auto-configuration for the watsonx.ai Chat Client. To enable it add the following dependency to your project’s Maven pom.xml file:

<dependency>
   <groupId>org.springframework.ai</groupId>
   <artifactId>spring-ai-starter-model-watsonx-ai</artifactId>
</dependency>

or to your Gradle build.gradle build file.

dependencies {
    implementation 'org.springframework.ai:spring-ai-starter-model-watsonx-ai'
}

Chat Properties

Connection Properties

The prefix spring.ai.watsonx.ai is used as the property prefix that lets you connect to watsonx.ai.

Property	Description	Default
spring.ai.watsonx.ai.base-url	The URL to connect to	us-south.ml.cloud.ibm.com
spring.ai.watsonx.ai.stream-endpoint	The streaming endpoint	ml/v1/text/generation_stream?version=2023-05-29
spring.ai.watsonx.ai.text-endpoint	The text endpoint	ml/v1/text/generation?version=2023-05-29
spring.ai.watsonx.ai.project-id	The project ID	-
spring.ai.watsonx.ai.iam-token	The IBM Cloud account IAM token	-

Property

Description

Default

spring.ai.watsonx.ai.base-url

The URL to connect to

us-south.ml.cloud.ibm.com

spring.ai.watsonx.ai.stream-endpoint

The streaming endpoint

ml/v1/text/generation_stream?version=2023-05-29

spring.ai.watsonx.ai.text-endpoint

The text endpoint

ml/v1/text/generation?version=2023-05-29

spring.ai.watsonx.ai.project-id

The project ID

spring.ai.watsonx.ai.iam-token

The IBM Cloud account IAM token

Configuration Properties

Enabling and disabling of the chat auto-configurations are now configured via top level properties with the prefix spring.ai.model.chat.

To enable, spring.ai.model.chat=watsonx (It is enabled by default)

To disable, spring.ai.model.chat=none (or any value which doesn’t match watsonx)

This change is done to allow configuration of multiple models.

The prefix spring.ai.watsonx.ai.chat is the property prefix that lets you configure the chat model implementation for Watsonx.AI.

Property	Description	Default
spring.ai.watsonx.ai.chat.enabled (Removed and no longer valid)	Enable Watsonx.AI chat model.	true
spring.ai.model.chat	Enable Watsonx.AI chat model.	watsonx
spring.ai.watsonx.ai.chat.options.temperature	The temperature of the model. Increasing the temperature will make the model answer more creatively.	0.7
spring.ai.watsonx.ai.chat.options.top-p	Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.2) will generate more focused and conservative text.	1.0
spring.ai.watsonx.ai.chat.options.top-k	Reduces the probability of generating nonsense. A higher value (e.g. 100) will give more diverse answers, while a lower value (e.g. 10) will be more conservative.	50
spring.ai.watsonx.ai.chat.options.decoding-method	Decoding is the process that a model uses to choose the tokens in the generated output.	greedy
spring.ai.watsonx.ai.chat.options.max-new-tokens	Sets the limit of tokens that the LLM follow.	20
spring.ai.watsonx.ai.chat.options.min-new-tokens	Sets how many tokens must the LLM generate.	0
spring.ai.watsonx.ai.chat.options.stop-sequences	Sets when the LLM should stop. (e.g., ["\n\n\n"]) then when the LLM generates three consecutive line breaks it will terminate. Stop sequences are ignored until after the number of tokens that are specified in the Min tokens parameter are generated.	-
spring.ai.watsonx.ai.chat.options.repetition-penalty	Sets how strongly to penalize repetitions. A higher value (e.g., 1.8) will penalize repetitions more strongly, while a lower value (e.g., 1.1) will be more lenient.	1.0
spring.ai.watsonx.ai.chat.options.random-seed	Produce repeatable results, set the same random seed value every time.	randomly generated
spring.ai.watsonx.ai.chat.options.model	Model is the identifier of the LLM Model to be used.	google/flan-ul2

Property

Description

Default

spring.ai.watsonx.ai.chat.enabled (Removed and no longer valid)

Enable Watsonx.AI chat model.

true

spring.ai.model.chat

Enable Watsonx.AI chat model.

watsonx

spring.ai.watsonx.ai.chat.options.temperature

The temperature of the model. Increasing the temperature will make the model answer more creatively.

0.7

spring.ai.watsonx.ai.chat.options.top-p

Works together with top-k. A higher value (e.g., 0.95) will lead to more diverse text, while a lower value (e.g., 0.2) will generate more focused and conservative text.

1.0

spring.ai.watsonx.ai.chat.options.top-k

Reduces the probability of generating nonsense. A higher value (e.g. 100) will give more diverse answers, while a lower value (e.g. 10) will be more conservative.

spring.ai.watsonx.ai.chat.options.decoding-method

Decoding is the process that a model uses to choose the tokens in the generated output.

greedy

spring.ai.watsonx.ai.chat.options.max-new-tokens

Sets the limit of tokens that the LLM follow.

spring.ai.watsonx.ai.chat.options.min-new-tokens

Sets how many tokens must the LLM generate.

spring.ai.watsonx.ai.chat.options.stop-sequences

Sets when the LLM should stop. (e.g., ["\n\n\n"]) then when the LLM generates three consecutive line breaks it will terminate. Stop sequences are ignored until after the number of tokens that are specified in the Min tokens parameter are generated.

spring.ai.watsonx.ai.chat.options.repetition-penalty

Sets how strongly to penalize repetitions. A higher value (e.g., 1.8) will penalize repetitions more strongly, while a lower value (e.g., 1.1) will be more lenient.

1.0

spring.ai.watsonx.ai.chat.options.random-seed

Produce repeatable results, set the same random seed value every time.

randomly generated

spring.ai.watsonx.ai.chat.options.model

Model is the identifier of the LLM Model to be used.

google/flan-ul2

Runtime Options

The WatsonxAiChatOptions.java provides model configurations, such as the model to use, the temperature, the frequency penalty, etc.

On start-up, the default options can be configured with the WatsonxAiChatModel(api, options) constructor or the spring.ai.watsonxai.chat.options.* properties.

At run-time you can override the default options by adding new, request specific, options to the Prompt call. For example to override the default model and temperature for a specific request:

ChatResponse response = chatModel.call(
    new Prompt(
        "Generate the names of 5 famous pirates.",
        WatsonxAiChatOptions.builder()
            .temperature(0.4)
        .build()
    ));

In addition to the model specific WatsonxAiChatOptions.java you can use a portable ChatOptions instance, created with the ChatOptionsBuilder#builder().

For more information go to watsonx-parameters-info

Usage example

public class MyClass {

    private static final String MODEL = "google/flan-ul2";
    private final WatsonxAiChatModel chatModel;

    @Autowired
    MyClass(WatsonxAiChatModel chatModel) {
        this.chatModel = chatModel;
    }

    public String generate(String userInput) {

        WatsonxAiChatOptions options = WatsonxAiChatOptions.builder()
            .model(MODEL)
            .decodingMethod("sample")
            .randomSeed(1)
            .build();

        Prompt prompt = new Prompt(new SystemMessage(userInput), options);

        var results = this.chatModel.call(prompt);

        var generatedText = results.getResult().getOutput().getContent();

        return generatedText;
    }

    public String generateStream(String userInput) {

        WatsonxAiChatOptions options = WatsonxAiChatOptions.builder()
            .model(MODEL)
            .decodingMethod("greedy")
            .randomSeed(2)
            .build();

        Prompt prompt = new Prompt(new SystemMessage(userInput), options);

        var results = this.chatModel.stream(prompt).collectList().block(); // wait till the stream is resolved (completed)

        var generatedText = results.stream()
            .map(generation -> generation.getResult().getOutput().getContent())
            .collect(Collectors.joining());

        return generatedText;
    }

}