Chat Client API
The ChatClient offers a fluent API for communicating with an AI Model.
It supports both a synchronous and streaming programming model.
The fluent API has methods for building up the constituent parts of a Prompt that is passed to the AI model as input.
The Prompt contains the instructional text to guide the AI model’s output and behavior. From the API point of view, prompts consist of a collection of messages.
The AI model processes two main types of messages: user messages, which are direct inputs from the user, and system messages, which are generated by the system to guide the conversation.
These messages often contain placeholders that are substituted at runtime based on user input to customize the response of the AI model to the user input.
There are also Prompt options that can be specified, such as the name of the AI Model to use and the temperature setting that controls the randomness or creativity of the generated output.
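For example, these options can be set fluently on a request. The following is a minimal sketch, assuming a chatClient instance like the ones created below; the model name and temperature values are illustrative, and OpenAiChatOptions is just one model-specific options type:
// Set the model name and the temperature as prompt options for this request.
// The values here are illustrative assumptions, not recommendations.
String answer = chatClient.prompt()
    .options(OpenAiChatOptions.builder()
        .model("gpt-4")      // name of the AI model to use
        .temperature(0.7)    // higher values increase randomness/creativity
        .build())
    .user("Write a haiku about spring")
    .call()
    .content();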
Creating a ChatClient
The ChatClient is created using a ChatClient.Builder object. You can obtain an autoconfigured ChatClient.Builder instance from any ChatModel Spring Boot autoconfiguration, or create one programmatically.
Using an autoconfigured ChatClient.Builder
In the simplest use case, Spring AI provides Spring Boot autoconfiguration, creating a prototype ChatClient.Builder bean for you to inject into your class. Here is a simple example of retrieving a String response for a user request.
@RestController
class MyController {

    private final ChatClient chatClient;

    public MyController(ChatClient.Builder chatClientBuilder) {
        this.chatClient = chatClientBuilder.build();
    }

    @GetMapping("/ai")
    String generation(String userInput) {
        return this.chatClient.prompt()
            .user(userInput)
            .call()
            .content();
    }
}
In this simple example, the user input sets the contents of the user message. The call() method sends a request to the AI model, and the content() method returns the AI model’s response as a String.
Working with Multiple Chat Models
There are several scenarios where you might need to work with multiple chat models in a single application:
- Using different models for different types of tasks (e.g., a powerful model for complex reasoning and a faster, cheaper model for simpler tasks)
- Implementing fallback mechanisms when one model service is unavailable
- A/B testing different models or configurations
- Providing users with a choice of models based on their preferences
- Combining specialized models (one for code generation, another for creative content, etc.)
By default, Spring AI autoconfigures a single ChatClient.Builder bean. However, you may need to work with multiple chat models in your application. Here’s how to handle this scenario:

In all cases, you need to disable the ChatClient.Builder autoconfiguration by setting the property spring.ai.chat.client.enabled=false. This allows you to create multiple ChatClient instances manually.
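For example, in your application.properties file:
spring.ai.chat.client.enabled=false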
Multiple ChatClients with a Single Model Type
This section covers a common use case where you need to create multiple ChatClient instances that all use the same underlying model type but with different configurations.
// Create ChatClient instances programmatically
ChatModel myChatModel = ... // already autoconfigured by Spring Boot
ChatClient chatClient = ChatClient.create(myChatModel);

// Or use the builder for more control
ChatClient.Builder builder = ChatClient.builder(myChatModel);
ChatClient customChatClient = builder
    .defaultSystem("You are a helpful assistant.")
    .build();
ChatClients for Different Model Types
When working with multiple AI models, you can define separate ChatClient beans for each model:
import org.springframework.ai.anthropic.AnthropicChatModel;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.openai.OpenAiChatModel;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ChatClientConfig {

    @Bean
    public ChatClient openAiChatClient(OpenAiChatModel chatModel) {
        return ChatClient.create(chatModel);
    }

    @Bean
    public ChatClient anthropicChatClient(AnthropicChatModel chatModel) {
        return ChatClient.create(chatModel);
    }
}
You can then inject these beans into your application components using the @Qualifier annotation:
@Configuration
public class ChatClientExample {

    @Bean
    CommandLineRunner cli(
            @Qualifier("openAiChatClient") ChatClient openAiChatClient,
            @Qualifier("anthropicChatClient") ChatClient anthropicChatClient) {
        return args -> {
            var scanner = new Scanner(System.in);
            ChatClient chat;

            // Model selection
            System.out.println("\nSelect your AI model:");
            System.out.println("1. OpenAI");
            System.out.println("2. Anthropic");
            System.out.print("Enter your choice (1 or 2): ");

            String choice = scanner.nextLine().trim();
            if (choice.equals("1")) {
                chat = openAiChatClient;
                System.out.println("Using OpenAI model");
            } else {
                chat = anthropicChatClient;
                System.out.println("Using Anthropic model");
            }

            // Use the selected chat client
            System.out.print("\nEnter your question: ");
            String input = scanner.nextLine();
            String response = chat.prompt(input).call().content();
            System.out.println("ASSISTANT: " + response);

            scanner.close();
        };
    }
}
Multiple OpenAI-Compatible API Endpoints
The OpenAiApi and OpenAiChatModel classes provide a mutate() method that allows you to create variations of existing instances with different properties. This is particularly useful when you need to work with multiple OpenAI-compatible APIs.
@Service
public class MultiModelService {

    private static final Logger logger = LoggerFactory.getLogger(MultiModelService.class);

    @Autowired
    private OpenAiChatModel baseChatModel;

    @Autowired
    private OpenAiApi baseOpenAiApi;

    public void multiClientFlow() {
        try {
            // Derive a new OpenAiApi for Groq (Llama3)
            OpenAiApi groqApi = baseOpenAiApi.mutate()
                .baseUrl("https://api.groq.com/openai")
                .apiKey(System.getenv("GROQ_API_KEY"))
                .build();

            // Derive a new OpenAiApi for OpenAI GPT-4
            OpenAiApi gpt4Api = baseOpenAiApi.mutate()
                .baseUrl("https://api.openai.com")
                .apiKey(System.getenv("OPENAI_API_KEY"))
                .build();

            // Derive a new OpenAiChatModel for Groq
            OpenAiChatModel groqModel = baseChatModel.mutate()
                .openAiApi(groqApi)
                .defaultOptions(OpenAiChatOptions.builder().model("llama3-70b-8192").temperature(0.5).build())
                .build();

            // Derive a new OpenAiChatModel for GPT-4
            OpenAiChatModel gpt4Model = baseChatModel.mutate()
                .openAiApi(gpt4Api)
                .defaultOptions(OpenAiChatOptions.builder().model("gpt-4").temperature(0.7).build())
                .build();

            // Simple prompt for both models
            String prompt = "What is the capital of France?";

            String groqResponse = ChatClient.builder(groqModel).build().prompt(prompt).call().content();
            String gpt4Response = ChatClient.builder(gpt4Model).build().prompt(prompt).call().content();

            logger.info("Groq (Llama3) response: {}", groqResponse);
            logger.info("OpenAI GPT-4 response: {}", gpt4Response);
        }
        catch (Exception e) {
            logger.error("Error in multi-client flow", e);
        }
    }
}
ChatClient Fluent API
The ChatClient fluent API allows you to create a prompt in three distinct ways using an overloaded prompt method to initiate the fluent API:
- prompt(): This method with no arguments lets you start using the fluent API, allowing you to build up user, system, and other parts of the prompt.
- prompt(Prompt prompt): This method accepts a Prompt argument, letting you pass in a Prompt instance that you have created using the Prompt’s non-fluent APIs.
- prompt(String content): This is a convenience method similar to the previous overload. It takes the user’s text content.
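For example, here is a minimal sketch of all three entry points; the prompt text is illustrative:
// 1. Build the prompt parts fluently
String a = chatClient.prompt()
    .user("Tell me a joke")
    .call()
    .content();

// 2. Pass in a Prompt instance created with the non-fluent APIs
String b = chatClient.prompt(new Prompt("Tell me a joke"))
    .call()
    .content();

// 3. Convenience overload that takes the user's text content
String c = chatClient.prompt("Tell me a joke")
    .call()
    .content();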
ChatClient Responses
The ChatClient API offers several ways to format the response from the AI Model using the fluent API.
Returning a ChatResponse
The response from the AI model is a rich structure defined by the type ChatResponse. It includes metadata about how the response was generated and can also contain multiple responses, known as Generations, each with its own metadata. The metadata includes the number of tokens (each token is approximately 3/4 of a word) used to create the response. This information is important because hosted AI models charge based on the number of tokens used per request.
An example of returning the ChatResponse object, which contains the metadata, is shown below; it invokes chatResponse() after the call() method.
ChatResponse chatResponse = chatClient.prompt()
    .user("Tell me a joke")
    .call()
    .chatResponse();
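You can then read the usage metadata from the response. This is a minimal sketch; the accessor names below assume the ChatResponseMetadata and Usage types shipped with Spring AI:
// Read token usage from the response metadata (accessor names are assumptions)
Usage usage = chatResponse.getMetadata().getUsage();
System.out.println("Prompt tokens: " + usage.getPromptTokens());
System.out.println("Total tokens: " + usage.getTotalTokens());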
Returning an Entity
You often want to return an entity class that is mapped from the returned String. The entity() method provides this functionality.
For example, given the Java record:
record ActorFilms(String actor, List<String> movies) {}
You can easily map the AI model’s output to this record using the entity() method, as shown below:
ActorFilms actorFilms = chatClient.prompt()
    .user("Generate the filmography for a random actor.")
    .call()
    .entity(ActorFilms.class);
There is also an overloaded entity method with the signature entity(ParameterizedTypeReference<T> type) that lets you specify types such as generic Lists:
List<ActorFilms> actorFilms = chatClient.prompt()
    .user("Generate the filmography of 5 movies for Tom Hanks and Bill Murray.")
    .call()
    .entity(new ParameterizedTypeReference<List<ActorFilms>>() {});
Streaming Responses
The stream() method lets you get an asynchronous response as shown below:
Flux<String> output = chatClient.prompt()
    .user("Tell me a joke")
    .stream()
    .content();
You can also stream the ChatResponse using the method Flux<ChatResponse> chatResponse().

In the future, we will offer a convenience method that will let you return a Java entity with the reactive stream() method. In the meantime, you should use the Structured Output Converter to convert the aggregated response explicitly, as shown below. This also demonstrates the use of parameters in the fluent API, which will be discussed in more detail in a later section of the documentation.
var converter = new BeanOutputConverter<>(new ParameterizedTypeReference<List<ActorFilms>>() {});

Flux<String> flux = this.chatClient.prompt()
    .user(u -> u.text("""
            Generate the filmography for a random actor.
            {format}
            """)
        .param("format", converter.getFormat()))
    .stream()
    .content();

String content = flux.collectList().block().stream().collect(Collectors.joining());

List<ActorFilms> actorFilms = converter.convert(content);
Prompt Templates
The ChatClient fluent API lets you provide user and system text as templates with variables that are replaced at runtime.
String answer = ChatClient.create(chatModel).prompt()
    .user(u -> u
        .text("Tell me the names of 5 movies whose soundtrack was composed by {composer}")
        .param("composer", "John Williams"))
    .call()
    .content();
Internally, the ChatClient uses the PromptTemplate class to handle the user and system text and to replace the variables with the values provided at runtime, relying on a given TemplateRenderer implementation. By default, Spring AI uses the StTemplateRenderer implementation, which is based on the open-source StringTemplate engine developed by Terence Parr. Spring AI also provides a NoOpTemplateRenderer for cases where no template processing is desired.
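For example, a minimal sketch that passes the user text through unchanged, assuming NoOpTemplateRenderer has a public no-arg constructor:
// Braces in the text are not treated as template variables with the NoOpTemplateRenderer
String answer = ChatClient.create(chatModel).prompt()
    .user("Explain this JSON literally: {\"name\": \"value\"}")
    .templateRenderer(new NoOpTemplateRenderer())
    .call()
    .content();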
The TemplateRenderer configured directly on the ChatClient (via .templateRenderer()) applies only to the prompt content defined directly in the ChatClient builder chain (e.g., via .user(), .system()). It does not affect templates used internally by Advisors like QuestionAnswerAdvisor, which have their own template customization mechanisms (see Custom Advisor Templates).
If you’d rather use a different template engine, you can provide a custom implementation of the TemplateRenderer interface directly to the ChatClient. You can also keep using the default StTemplateRenderer, but with a custom configuration.

For example, by default, template variables are identified by the {} syntax. If you’re planning to include JSON in your prompt, you might want to use a different syntax to avoid conflicts with JSON syntax. For example, you can use the < and > delimiters.
String answer = ChatClient.create(chatModel).prompt()
    .user(u -> u
        .text("Tell me the names of 5 movies whose soundtrack was composed by <composer>")
        .param("composer", "John Williams"))
    .templateRenderer(StTemplateRenderer.builder().startDelimiterToken('<').endDelimiterToken('>').build())
    .call()
    .content();
call() return values
After specifying the call() method on ChatClient, there are a few different options for the response type:
- String content(): returns the String content of the response.
- ChatResponse chatResponse(): returns the ChatResponse object that contains multiple generations and also metadata about the response, for example how many tokens were used to create the response.
- ChatClientResponse chatClientResponse(): returns a ChatClientResponse object that contains the ChatResponse object and the ChatClient execution context, giving you access to additional data used during the execution of advisors (e.g. the relevant documents retrieved in a RAG flow); see the sketch after this list.
- entity(): returns a Java type.
  - entity(ParameterizedTypeReference<T> type): used to return a Collection of entity types.
  - entity(Class<T> type): used to return a specific entity type.
  - entity(StructuredOutputConverter<T> structuredOutputConverter): used to specify an instance of a StructuredOutputConverter to convert a String to an entity type.

You can also invoke the stream() method instead of call().
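For example, a minimal sketch of inspecting the advisor context alongside the chat response; it assumes ChatClientResponse exposes chatResponse() and context() accessors:
ChatClientResponse clientResponse = chatClient.prompt()
    .user("Tell me a joke")
    .call()
    .chatClientResponse();

// The model's reply plus the advisor execution context (accessor names are assumptions)
ChatResponse chatResponse = clientResponse.chatResponse();
Map<String, Object> advisorContext = clientResponse.context();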
stream() return values
After specifying the stream() method on ChatClient, there are a few options for the response type:
- Flux<String> content(): returns a Flux of the string being generated by the AI model.
- Flux<ChatResponse> chatResponse(): returns a Flux of the ChatResponse object, which contains additional metadata about the response; see the sketch after this list.
- Flux<ChatClientResponse> chatClientResponse(): returns a Flux of the ChatClientResponse object that contains the ChatResponse object and the ChatClient execution context, giving you access to additional data used during the execution of advisors (e.g. the relevant documents retrieved in a RAG flow).
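For example, a minimal sketch of consuming the streamed ChatResponse objects; the accessor chain and the blocking call are illustrative assumptions for a demo:
Flux<ChatResponse> chatResponseFlux = chatClient.prompt()
    .user("Tell me a joke")
    .stream()
    .chatResponse();

// Print each chunk's content as it arrives; blockLast() is for demonstration only
chatResponseFlux
    .mapNotNull(response -> response.getResult() == null ? null : response.getResult().getOutput().getText())
    .doOnNext(System.out::print)
    .blockLast();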
Using Defaults
Creating a ChatClient with a default system text in an @Configuration class simplifies runtime code. By setting defaults, you only need to specify the user text when calling ChatClient, eliminating the need to set a system text for each request in your runtime code path.
Default System Text
In the following example, we will configure the system text to always reply in a pirate’s voice. To avoid repeating the system text in runtime code, we will create a ChatClient instance in a @Configuration class.
@Configuration
class Config {

    @Bean
    ChatClient chatClient(ChatClient.Builder builder) {
        return builder.defaultSystem("You are a friendly chat bot that answers questions in the voice of a Pirate")
            .build();
    }
}
and a @RestController to invoke it:
@RestController
class AIController {

    private final ChatClient chatClient;

    AIController(ChatClient chatClient) {
        this.chatClient = chatClient;
    }

    @GetMapping("/ai/simple")
    public Map<String, String> completion(@RequestParam(value = "message", defaultValue = "Tell me a joke") String message) {
        return Map.of("completion", this.chatClient.prompt().user(message).call().content());
    }
}
When calling the application endpoint via curl, the result is:
❯ curl localhost:8080/ai/simple
{"completion":"Why did the pirate go to the comedy club? To hear some arrr-rated jokes! Arrr, matey!"}
Default System Text with parameters
In the following example, we will use a placeholder in the system text to specify the voice of the completion at runtime instead of design time.
@Configuration
class Config {

    @Bean
    ChatClient chatClient(ChatClient.Builder builder) {
        return builder.defaultSystem("You are a friendly chat bot that answers questions in the voice of a {voice}")
            .build();
    }
}
@RestController
class AIController {

    private final ChatClient chatClient;

    AIController(ChatClient chatClient) {
        this.chatClient = chatClient;
    }

    @GetMapping("/ai")
    Map<String, String> completion(@RequestParam(value = "message", defaultValue = "Tell me a joke") String message, String voice) {
        return Map.of("completion",
            this.chatClient.prompt()
                .system(sp -> sp.param("voice", voice))
                .user(message)
                .call()
                .content());
    }
}
When calling the application endpoint via httpie, the result is:
http localhost:8080/ai voice=='Robert DeNiro'
{
"completion": "You talkin' to me? Okay, here's a joke for ya: Why couldn't the bicycle stand up by itself? Because it was two tired! Classic, right?"
}
Other defaults
At the ChatClient.Builder level, you can specify the default prompt configuration; a combined sketch follows the list below.
- defaultOptions(ChatOptions chatOptions): Pass in either portable options defined in the ChatOptions class or model-specific options such as those in OpenAiChatOptions. For more information on model-specific ChatOptions implementations, refer to the JavaDocs.
- defaultFunction(String name, String description, java.util.function.Function<I, O> function): The name is used to refer to the function in user text. The description explains the function’s purpose and helps the AI model choose the correct function for an accurate response. The function argument is a Java function instance that the model will execute when necessary.
- defaultFunctions(String… functionNames): The bean names of `java.util.Function`s defined in the application context.
- defaultUser(String text), defaultUser(Resource text), defaultUser(Consumer<UserSpec> userSpecConsumer): These methods let you define the user text. The Consumer<UserSpec> allows you to use a lambda to specify the user text and any default parameters.
- defaultAdvisors(Advisor… advisor): Advisors allow modification of the data used to create the Prompt. The QuestionAnswerAdvisor implementation enables the pattern of Retrieval Augmented Generation by appending the prompt with context information related to the user text.
- defaultAdvisors(Consumer<AdvisorSpec> advisorSpecConsumer): This method allows you to define a Consumer to configure multiple advisors using the AdvisorSpec. Advisors can modify the data used to create the final Prompt. The Consumer<AdvisorSpec> lets you specify a lambda to add advisors, such as QuestionAnswerAdvisor, which supports Retrieval Augmented Generation by appending the prompt with relevant context information based on the user text.
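Putting several of these together, here is a minimal sketch of a builder with combined defaults; the option values and the advisor choice are illustrative, and a VectorStore bean is assumed to exist:
@Configuration
class ChatClientDefaultsConfig {

    @Bean
    ChatClient chatClient(ChatClient.Builder builder, VectorStore vectorStore) {
        return builder
            .defaultSystem("You are a helpful assistant.")                        // default system text
            .defaultOptions(OpenAiChatOptions.builder().temperature(0.4).build()) // default options (values are illustrative)
            .defaultAdvisors(QuestionAnswerAdvisor.builder(vectorStore).build())  // default advisor for RAG
            .build();
    }
}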
You can override these defaults at runtime using the corresponding methods without the default prefix:
- options(ChatOptions chatOptions)
- function(String name, String description, java.util.function.Function<I, O> function)
- functions(String… functionNames)
- user(String text), user(Resource text), user(Consumer<UserSpec> userSpecConsumer)
- advisors(Advisor… advisor)
- advisors(Consumer<AdvisorSpec> advisorSpecConsumer)
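For example, a minimal sketch that overrides the default system text and options for a single request; the values are illustrative:
String answer = chatClient.prompt()
    .system("You are a terse assistant.")                             // overrides defaultSystem
    .options(OpenAiChatOptions.builder().temperature(0.9).build())    // overrides defaultOptions
    .user("Tell me a joke")
    .call()
    .content();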
Advisors
The Advisors API provides a flexible and powerful way to intercept, modify, and enhance AI-driven interactions in your Spring applications.
A common pattern when calling an AI model with user text is to append or augment the prompt with contextual data.
This contextual data can be of different types. Common types include:
- Your own data: This is data the AI model hasn’t been trained on. Even if the model has seen similar data, the appended contextual data takes precedence in generating the response.
- Conversational history: The chat model’s API is stateless. If you tell the AI model your name, it won’t remember it in subsequent interactions. Conversational history must be sent with each request to ensure previous interactions are considered when generating a response.
Advisor Configuration in ChatClient
The ChatClient fluent API provides an AdvisorSpec interface for configuring advisors. This interface offers methods to add parameters, set multiple parameters at once, and add one or more advisors to the chain.
interface AdvisorSpec {
    AdvisorSpec param(String k, Object v);
    AdvisorSpec params(Map<String, Object> p);
    AdvisorSpec advisors(Advisor... advisors);
    AdvisorSpec advisors(List<Advisor> advisors);
}
The order in which advisors are added to the chain is crucial, as it determines the sequence of their execution. Each advisor modifies the prompt or the context in some way, and the changes made by one advisor are passed on to the next in the chain.
ChatClient.builder(chatModel)
    .build()
    .prompt()
    .advisors(
        MessageChatMemoryAdvisor.builder(chatMemory).build(),
        QuestionAnswerAdvisor.builder(vectorStore).build()
    )
    .user(userText)
    .call()
    .content();
In this configuration, the MessageChatMemoryAdvisor will be executed first, adding the conversation history to the prompt. Then, the QuestionAnswerAdvisor will perform its search based on the user’s question and the added conversation history, potentially providing more relevant results.
Retrieval Augmented Generation
Refer to the Retrieval Augmented Generation guide.
Logging
The SimpleLoggerAdvisor is an advisor that logs the request and response data of the ChatClient. This can be useful for debugging and monitoring your AI interactions.
Spring AI supports observability for LLM and vector store interactions. Refer to the Observability guide for more information.
To enable logging, add the SimpleLoggerAdvisor to the advisor chain when creating your ChatClient. It’s recommended to add it toward the end of the chain:
ChatResponse response = ChatClient.create(chatModel).prompt()
    .advisors(new SimpleLoggerAdvisor())
    .user("Tell me a joke?")
    .call()
    .chatResponse();
To see the logs, set the logging level for the advisor package to DEBUG:

logging.level.org.springframework.ai.chat.client.advisor=DEBUG

Add this to your application.properties or application.yaml file.
You can customize what data from AdvisedRequest and ChatResponse is logged by using the following constructor:
SimpleLoggerAdvisor(
    Function<AdvisedRequest, String> requestToString,
    Function<ChatResponse, String> responseToString
)
Example usage:
SimpleLoggerAdvisor customLogger = new SimpleLoggerAdvisor(
    request -> "Custom request: " + request.userText(),
    response -> "Custom response: " + response.getResult()
);
This allows you to tailor the logged information to your specific needs.
Be cautious about logging sensitive information in production environments.
Chat Memory
The interface ChatMemory represents a storage for chat conversation memory. It provides methods to add messages to a conversation, retrieve messages from a conversation, and clear the conversation history. There is currently one built-in implementation: MessageWindowChatMemory.

MessageWindowChatMemory is a chat memory implementation that maintains a window of messages up to a specified maximum size (default: 20 messages). When the number of messages exceeds this limit, older messages are evicted, but system messages are preserved. If a new system message is added, all previous system messages are removed from memory. This ensures that the most recent context is always available for the conversation while keeping memory usage bounded.

MessageWindowChatMemory is backed by the ChatMemoryRepository abstraction, which provides storage implementations for the chat conversation memory. There are several implementations available, including InMemoryChatMemoryRepository, JdbcChatMemoryRepository, CassandraChatMemoryRepository, and Neo4jChatMemoryRepository.
For more details and usage examples, see the Chat Memory documentation.
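As a quick orientation, here is a minimal sketch that wires a MessageWindowChatMemory into the advisor chain; the builder method names are assumptions based on the types above:
// Keep a window of the 20 most recent messages per conversation (the default size)
ChatMemory chatMemory = MessageWindowChatMemory.builder()
    .chatMemoryRepository(new InMemoryChatMemoryRepository())
    .maxMessages(20)
    .build();

String answer = ChatClient.builder(chatModel)
    .defaultAdvisors(MessageChatMemoryAdvisor.builder(chatMemory).build())
    .build()
    .prompt()
    .user("My name is Alice. Please remember it.")
    .call()
    .content();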