Chat Client API

The ChatClient offers a fluent API for communicating with an AI model. It supports both synchronous and reactive programming models.

The fluent API has methods for building up the constituent parts of a Prompt that is passed to the AI model as input. The Prompt contains the instructional text to guide the AI model’s output and behavior. From the API point of view, prompts consist of a collection of messages.

The AI model processes two main types of messages: user messages, which are direct inputs from the user, and system messages, which are generated by the system to guide the conversation.

These messages often contain placeholders that are substituted at runtime based on user input to customize the response of the AI model to the user input.
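As a rough illustration of runtime placeholder substitution, the following plain-Java sketch replaces {name}-style placeholders with values from a parameter map. It is not Spring AI's template engine; the class PlaceholderSketch and its render method are hypothetical names used only for this example.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlaceholderSketch {

    // Matches placeholders such as {voice} or {format}.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{(\\w+)\\}");

    /** Replaces every {key} in the template with the matching value from params. */
    public static String render(String template, Map<String, ?> params) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (matcher.find()) {
            String key = matcher.group(1);
            if (!params.containsKey(key)) {
                throw new IllegalArgumentException("No value for placeholder: " + key);
            }
            // quoteReplacement keeps '$' and '\' in the value from being treated as regex syntax.
            matcher.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(params.get(key))));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String system = "You are a friendly chat bot that answers questions in the voice of a {voice}";
        System.out.println(render(system, Map.of("voice", "Pirate")));
    }
}
```

The same template can then be reused for every request, with only the parameter values changing.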

There are also Prompt options that can be specified, such as the name of the AI Model to use and the temperature setting that controls the randomness or creativity of the generated output.

Creating a ChatClient

The ChatClient is created using a ChatClient.Builder object. You can obtain an autoconfigured ChatClient.Builder instance for any ChatModel Spring Boot autoconfiguration or create one programmatically.

Using an autoconfigured ChatClient.Builder

In the simplest use case, Spring AI provides Spring Boot autoconfiguration, creating a prototype ChatClient.Builder bean for you to inject into your class. Here is an example of retrieving a String response to a user request.

@RestController
class MyController {

    private final ChatClient chatClient;

    public MyController(ChatClient.Builder chatClientBuilder) {
        this.chatClient = chatClientBuilder.build();
    }

    @GetMapping("/ai")
    String generation(String userInput) {
        return this.chatClient.prompt()
            .user(userInput)
            .call()
            .content();
    }
}

In this simple example, the user input sets the contents of the user message. The call method sends a request to the AI model, and the content method returns the AI model’s response as a String.

Create a ChatClient programmatically

You can disable the ChatClient.Builder autoconfiguration by setting the property spring.ai.chat.client.enabled=false. This is useful if multiple chat models are used together. Then create a ChatClient.Builder instance for every ChatModel programmatically:

ChatModel myChatModel = ... // usually autowired

ChatClient.Builder builder = ChatClient.builder(myChatModel);

// or create a ChatClient with the default builder settings:

ChatClient chatClient = ChatClient.create(myChatModel);

ChatClient Responses

The ChatClient API offers several ways to format the response from the AI Model.

Returning a ChatResponse

The response from the AI model is a rich structure defined by the type ChatResponse. It includes metadata about how the response was generated and can also contain multiple responses, known as Generations, each with its own metadata. The metadata includes the number of tokens (each token is approximately 3/4 of a word) used to create the response. This information is important because hosted AI models charge based on the number of tokens used per request.
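To make the rule of thumb concrete, here is a small sketch that estimates a token count from a word count using the 3/4-word approximation mentioned above. The class and method names are hypothetical, and real tokenizers will produce different counts, so treat the result as a rough estimate only.

```java
public class TokenEstimateSketch {

    /**
     * Rule of thumb from the text: one token is roughly 3/4 of a word,
     * so tokens ~ words * 4 / 3, rounded up. This is an approximation only.
     */
    public static int estimateTokens(int wordCount) {
        return (wordCount * 4 + 2) / 3; // integer ceiling of wordCount * 4 / 3
    }

    /** Counts whitespace-separated words; blank text has zero words. */
    public static int countWords(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    public static void main(String[] args) {
        String prompt = "Tell me a joke";
        int words = countWords(prompt);
        System.out.println(words + " words ~ " + estimateTokens(words) + " tokens");
    }
}
```

For actual usage figures, rely on the token counts reported in the response metadata rather than on an estimate like this one.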

The following example returns the ChatResponse object, including its metadata, by invoking chatResponse() after the call() method.

ChatResponse chatResponse = chatClient.prompt()
    .user("Tell me a joke")
    .call()
    .chatResponse();

Returning an Entity

You often want to return an entity class that is mapped from the returned String. The entity method provides this functionality.

For example, given the Java record:

record ActorFilms(String actor, List<String> movies) {
}

You can easily map the AI model’s output to this record using the entity method, as shown below:

ActorFilms actorFilms = chatClient.prompt()
    .user("Generate the filmography for a random actor.")
    .call()
    .entity(ActorFilms.class);

There is also an overloaded entity method with the signature entity(ParameterizedTypeReference<T> type) that lets you specify types such as generic Lists:

List<ActorFilms> actorFilms = chatClient.prompt()
    .user("Generate the filmography of 5 movies for Tom Hanks and Bill Murray.")
    .call()
    .entity(new ParameterizedTypeReference<List<ActorFilms>>() {
    });

Streaming Responses

The stream method lets you get an asynchronous response, as shown below:

Flux<String> output = chatClient.prompt()
    .user("Tell me a joke")
    .stream()
    .content();

You can also stream the ChatResponse using the method Flux<ChatResponse> chatResponse().

In 1.0.0 M2, we will offer a convenience method that lets you return a Java entity with the reactive stream() method. In the meantime, you should use the Structured Output Converter to convert the aggregated response explicitly, as shown below. This also demonstrates the use of parameters in the fluent API, which is discussed in more detail in a later section of the documentation.

    var converter = new BeanOutputConverter<>(new ParameterizedTypeReference<List<ActorFilms>>() {
    });

    Flux<String> flux = this.chatClient.prompt()
        .user(u -> u.text("""
                            Generate the filmography for a random actor.
                            {format}
                          """)
                .param("format", converter.getFormat()))
        .stream()
        .content();

    String content = flux.collectList().block().stream().collect(Collectors.joining());

    List<ActorFilms> actorFilms = converter.convert(content);

call() return values

After specifying the call method on ChatClient, there are a few different options for the response type:

  • String content(): returns the String content of the response

  • ChatResponse chatResponse(): returns the ChatResponse object that contains multiple generations and also metadata about the response, for example how many tokens were used to create the response.

  • entity to return a Java type

    • entity(ParameterizedTypeReference<T> type): used to return a Collection of entity types.

    • entity(Class<T> type): used to return a specific entity type.

    • entity(StructuredOutputConverter<T> structuredOutputConverter): used to specify an instance of a StructuredOutputConverter to convert a String to an entity type.

You can also invoke the stream method instead of call.

stream() return values

After specifying the stream method on ChatClient, there are a few options for the response type:

  • Flux<String> content(): Returns a Flux of the string being generated by the AI model.

  • Flux<ChatResponse> chatResponse(): Returns a Flux of the ChatResponse object, which contains additional metadata about the response.

Using Defaults

Creating a ChatClient with default system text in an @Configuration class simplifies runtime code. By setting defaults, you only need to specify user text when calling ChatClient, eliminating the need to set system text for each request in your runtime code path.

Default System Text

In the following example, we will configure the system text to always reply in a pirate’s voice. To avoid repeating the system text in runtime code, we will create a ChatClient instance in an @Configuration class.

@Configuration
class Config {

    @Bean
    ChatClient chatClient(ChatClient.Builder builder) {
        return builder.defaultSystem("You are a friendly chat bot that answers question in the voice of a Pirate")
                .build();
    }

}

And a @RestController to invoke it:

@RestController
class AIController {

	private final ChatClient chatClient;

	AIController(ChatClient chatClient) {
		this.chatClient = chatClient;
	}

	@GetMapping("/ai/simple")
	public Map<String, String> completion(@RequestParam(value = "message", defaultValue = "Tell me a joke") String message) {
		return Map.of("completion", chatClient.prompt().user(message).call().content());
	}
}

Invoking it via curl gives:

❯ curl localhost:8080/ai/simple
{"completion":"Why did the pirate go to the comedy club? To hear some arrr-rated jokes! Arrr, matey!"}

Default System Text with parameters

In the following example, we will use a placeholder in the system text to specify the voice of the completion at runtime instead of design time.

@Configuration
class Config {

    @Bean
    ChatClient chatClient(ChatClient.Builder builder) {
        return builder.defaultSystem("You are a friendly chat bot that answers question in the voice of a {voice}")
                .build();
    }

}

@RestController
class AIController {

	private final ChatClient chatClient;

	AIController(ChatClient chatClient) {
		this.chatClient = chatClient;
	}

	@GetMapping("/ai")
	Map<String, String> completion(@RequestParam(value = "message", defaultValue = "Tell me a joke") String message, String voice) {
		return Map.of(
				"completion",
				chatClient.prompt()
						.system(sp -> sp.param("voice", voice))
						.user(message)
						.call()
						.content());
	}
}

Invoking it with the HTTPie command-line client gives:

http localhost:8080/ai voice=='Robert DeNiro'
{
    "completion": "You talkin' to me? Okay, here's a joke for ya: Why couldn't the bicycle stand up by itself? Because it was two tired! Classic, right?"
}

Other defaults

At the ChatClient.Builder level, you can specify the default prompt configuration:

  • defaultOptions(ChatOptions chatOptions): Pass in either portable options defined in the ChatOptions class or model-specific options such as those in OpenAiChatOptions. For more information on model-specific ChatOptions implementations, refer to the JavaDocs.

  • defaultFunction(String name, String description, java.util.function.Function<I, O> function): The name is used to refer to the function in user text. The description explains the function’s purpose and helps the AI model choose the correct function for an accurate response. The function argument is a Java function instance that the model will execute when necessary.

  • defaultFunctions(String…​ functionNames): The bean names of java.util.function.Function beans defined in the application context.

  • defaultUser(String text), defaultUser(Resource text), defaultUser(Consumer<UserSpec> userSpecConsumer): These methods let you define the user text. The Consumer<UserSpec> allows you to use a lambda to specify the user text and any default parameters.

  • defaultAdvisors(RequestResponseAdvisor…​ advisor): Advisors allow modification of the data used to create the Prompt. The QuestionAnswerAdvisor implementation enables the Retrieval Augmented Generation pattern by appending context information related to the user text to the prompt.

  • defaultAdvisors(Consumer<AdvisorSpec> advisorSpecConsumer): This method allows you to define a Consumer to configure multiple advisors using the AdvisorSpec. The Consumer<AdvisorSpec> lets you specify a lambda to add advisors, such as QuestionAnswerAdvisor.

You can override these defaults at runtime using the corresponding methods without the default prefix.

  • options(ChatOptions chatOptions)

  • function(String name, String description, java.util.function.Function<I, O> function)

  • functions(String…​ functionNames)

  • user(String text), user(Resource text), user(Consumer<UserSpec> userSpecConsumer)

  • advisors(RequestResponseAdvisor…​ advisor)

  • advisors(Consumer<AdvisorSpec> advisorSpecConsumer)
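The fallback behaviour for text-valued settings can be pictured with a minimal sketch: a per-request value wins when present, otherwise the builder-level default applies. The Defaults and Request records below are hypothetical stand-ins written for this illustration and are not Spring AI types; how Spring AI combines non-text settings such as advisors or functions is not modelled here.

```java
public class DefaultsSketch {

    /** Hypothetical holder for builder-level defaults; not a Spring AI type. */
    public record Defaults(String system, String user) {}

    /** Hypothetical per-request values; null means "not set for this request". */
    public record Request(String system, String user) {}

    /** Effective values: the per-request value if present, otherwise the default. */
    public static Request resolve(Defaults defaults, Request request) {
        String system = request.system() != null ? request.system() : defaults.system();
        String user = request.user() != null ? request.user() : defaults.user();
        return new Request(system, user);
    }

    public static void main(String[] args) {
        Defaults defaults = new Defaults("Answer in the voice of a Pirate", "Tell me a joke");
        // Only the user text is set on the request; the system text falls back to the default.
        Request effective = resolve(defaults, new Request(null, "What is Spring AI?"));
        System.out.println("system: " + effective.system());
        System.out.println("user:   " + effective.user());
    }
}
```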

Advisors

A common pattern when calling an AI model with user text is to append or augment the prompt with contextual data.

This contextual data can be of different types. Common types include:

  • Your own data: This is data the AI model hasn’t been trained on. Even if the model has seen similar data, the appended contextual data takes precedence in generating the response.

  • Conversational history: The chat model’s API is stateless. If you tell the AI model your name, it won’t remember it in subsequent interactions. Conversational history must be sent with each request to ensure previous interactions are considered when generating a response.
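Because the model API is stateless, the caller has to keep the history and resend it. The sketch below shows one simple way to do that in plain Java: messages are stored per conversation ID, and each new request is built from the most recent messages plus the new user text. All names here (ConversationHistorySketch, Message, buildRequest) are hypothetical, and the class is a single-threaded illustration rather than Spring AI's own chat memory implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ConversationHistorySketch {

    /** Hypothetical message type: a role ("user", "assistant") and its text. */
    public record Message(String role, String text) {}

    // Conversation ID -> messages in the order they were added.
    private final Map<String, List<Message>> conversations = new HashMap<>();

    public void add(String conversationId, Message message) {
        conversations.computeIfAbsent(conversationId, id -> new ArrayList<>()).add(message);
    }

    /** Returns up to lastN of the most recent messages, oldest first. */
    public List<Message> get(String conversationId, int lastN) {
        List<Message> all = conversations.getOrDefault(conversationId, List.of());
        int from = Math.max(0, all.size() - Math.max(0, lastN));
        return List.copyOf(all.subList(from, all.size()));
    }

    public void clear(String conversationId) {
        conversations.remove(conversationId);
    }

    /** Builds the message list for the next request: prior history plus the new user message. */
    public List<Message> buildRequest(String conversationId, String userText, int lastN) {
        List<Message> request = new ArrayList<>(get(conversationId, lastN));
        request.add(new Message("user", userText));
        return request;
    }

    public static void main(String[] args) {
        ConversationHistorySketch history = new ConversationHistorySketch();
        history.add("chat-1", new Message("user", "My name is Ada."));
        history.add("chat-1", new Message("assistant", "Nice to meet you, Ada!"));
        // The earlier exchange is resent so the model can answer the follow-up question.
        history.buildRequest("chat-1", "What is my name?", 10)
                .forEach(m -> System.out.println(m.role() + ": " + m.text()));
    }
}
```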

Retrieval Augmented Generation

A vector database stores data that the AI model is unaware of. When a user question is sent to the AI model, a QuestionAnswerAdvisor queries the vector database for documents related to the user question.

The response from the vector database is appended to the user text to provide context for the AI model to generate a response.
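The retrieve-then-append flow described above can be sketched without any library. In this illustration, documents carry hand-made two-dimensional embeddings (real embeddings come from an embedding model), the most similar documents are found by cosine similarity, and their text is appended to the user text. The names RagSketch, Doc, topK, and augment, as well as the "Context:" layout, are assumptions made for this example and do not reflect how QuestionAnswerAdvisor is implemented.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class RagSketch {

    /** A stored document with a hand-made embedding vector (hypothetical values). */
    public record Doc(String text, double[] embedding) {}

    /** Cosine similarity between two vectors of equal length. */
    public static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns the k documents most similar to the query vector, best match first. */
    public static List<Doc> topK(List<Doc> store, double[] query, int k) {
        return store.stream()
                .sorted(Comparator.comparingDouble((Doc d) -> cosine(d.embedding(), query)).reversed())
                .limit(k)
                .collect(Collectors.toList());
    }

    /** Appends the retrieved document text to the user text as context. */
    public static String augment(String userText, List<Doc> context) {
        String joined = context.stream().map(Doc::text).collect(Collectors.joining("\n"));
        return userText + "\n\nContext:\n" + joined;
    }

    public static void main(String[] args) {
        List<Doc> store = List.of(
                new Doc("Spring AI supports vector stores.", new double[] {1.0, 0.0}),
                new Doc("Pirates sail the seas.", new double[] {0.0, 1.0}),
                new Doc("ChatClient has a fluent API.", new double[] {0.8, 0.2}));
        double[] queryEmbedding = {1.0, 0.1}; // stand-in for the embedded user question
        System.out.println(augment("What does Spring AI support?", topK(store, queryEmbedding, 2)));
    }
}
```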

Assuming you have already loaded data into a VectorStore, you can perform Retrieval Augmented Generation (RAG) by providing an instance of QuestionAnswerAdvisor to the ChatClient.

ChatResponse response = ChatClient.builder(chatModel)
        .build().prompt()
        .advisors(new QuestionAnswerAdvisor(vectorStore, SearchRequest.defaults()))
        .user(userText)
        .call()
        .chatResponse();

In this example, the SearchRequest.defaults() will perform a similarity search over all documents in the vector database. To restrict the types of documents that are searched, the SearchRequest takes a SQL-like filter expression that is portable across all VectorStores.

Dynamic Filter Expressions

Update the SearchRequest filter expression at runtime using the FILTER_EXPRESSION advisor context parameter:

ChatClient chatClient = ChatClient.builder(chatModel)
    .defaultAdvisors(new QuestionAnswerAdvisor(vectorStore, SearchRequest.defaults()))
    .build();

// Update filter expression at runtime
String content = chatClient.prompt()
    .user("Please answer my question XYZ")
    .advisors(a -> a.param(QuestionAnswerAdvisor.FILTER_EXPRESSION, "type == 'Spring'"))
    .call()
    .content();

The FILTER_EXPRESSION parameter allows you to dynamically filter the search results based on the provided expression.

Chat Memory

The interface ChatMemory represents a storage for chat conversation history. It provides methods to add messages to a conversation, retrieve messages from a conversation, and clear the conversation history.

There are two implementations, InMemoryChatMemory and CassandraChatMemory, that provide storage for chat conversation history: in memory and persisted with time-to-live, respectively.

To create a CassandraChatMemory with time-to-live:

CassandraChatMemory.create(CassandraChatMemoryConfig.builder().withTimeToLive(Duration.ofDays(1)).build());

The following advisor implementations advise the prompt with conversation history. They differ in the details of how the memory is added to the prompt:

  • MessageChatMemoryAdvisor: Memory is retrieved and added as a collection of messages to the prompt.

  • PromptChatMemoryAdvisor: Memory is retrieved and added to the prompt’s system text.

  • VectorStoreChatMemoryAdvisor: The constructor VectorStoreChatMemoryAdvisor(VectorStore vectorStore, String defaultConversationId, int chatHistoryWindowSize) lets you specify the VectorStore to retrieve the chat history from, the unique conversation ID, and the size of the chat history to be retrieved, in token size.

A sample @Service implementation that uses several advisors is shown below:

import static org.springframework.ai.chat.client.advisor.AbstractChatMemoryAdvisor.CHAT_MEMORY_CONVERSATION_ID_KEY;
import static org.springframework.ai.chat.client.advisor.AbstractChatMemoryAdvisor.CHAT_MEMORY_RETRIEVE_SIZE_KEY;

@Service
public class CustomerSupportAssistant {

    private final ChatClient chatClient;

    public CustomerSupportAssistant(ChatClient.Builder builder, VectorStore vectorStore, ChatMemory chatMemory) {

        this.chatClient = builder
                .defaultSystem("""
                        You are a customer chat support agent of an airline named "Funnair". Respond in a friendly,
                        helpful, and joyful manner.

                        Before providing information about a booking or cancelling a booking, you MUST always
                        get the following information from the user: booking number, customer first name and last name.

                        Before changing a booking you MUST ensure it is permitted by the terms.

                        If there is a charge for the change, you MUST ask the user to consent before proceeding.
                        """)
                .defaultAdvisors(
                        new PromptChatMemoryAdvisor(chatMemory), // CHAT MEMORY
                        // new MessageChatMemoryAdvisor(chatMemory), // alternative CHAT MEMORY advisor
                        new QuestionAnswerAdvisor(vectorStore, SearchRequest.defaults()), // RAG
                        new LoggingAdvisor()) // LOGGING
                .defaultFunctions("getBookingDetails", "changeBooking", "cancelBooking") // FUNCTION CALLING
                .build();
    }

    public Flux<String> chat(String chatId, String userMessageContent) {

        return this.chatClient.prompt()
                .user(userMessageContent)
                .advisors(a -> a
                        .param(CHAT_MEMORY_CONVERSATION_ID_KEY, chatId)
                        .param(CHAT_MEMORY_RETRIEVE_SIZE_KEY, 100))
                .stream().content();
    }
}

Logging

The SimpleLoggerAdvisor is an advisor that logs the request and response data of the ChatClient. This can be useful for debugging and monitoring your AI interactions.

To enable logging, add the SimpleLoggerAdvisor to the advisor chain when creating your ChatClient. It’s recommended to add it toward the end of the chain:

ChatResponse response = ChatClient.create(chatModel).prompt()
        .advisors(new SimpleLoggerAdvisor())
        .user("Tell me a joke?")
        .call()
        .chatResponse();

To see the logs, set the logging level for the advisor package to DEBUG:

logging.level.org.springframework.ai.chat.client.advisor=DEBUG

Add this to your application.properties or application.yaml file.

You can customize what data from AdvisedRequest and ChatResponse is logged by using the following constructor:

SimpleLoggerAdvisor(
    Function<AdvisedRequest, String> requestToString,
    Function<ChatResponse, String> responseToString
)

Example usage:

SimpleLoggerAdvisor customLogger = new SimpleLoggerAdvisor(
    request -> "Custom request: " + request.userText(),
    response -> "Custom response: " + response.getResult()
);

This allows you to tailor the logged information to your specific needs.

Be cautious about logging sensitive information in production environments.