Google GenAI Chat

The Google GenAI API allows developers to build generative AI applications using Google’s Gemini models through either the Gemini Developer API or Vertex AI. The Google GenAI API supports multimodal prompts as input and outputs text or code. A multimodal model is capable of processing information from multiple modalities, including images, videos, and text. For example, you can send the model a photo of a plate of cookies and ask it to give you a recipe for those cookies.

Gemini is a family of generative AI models developed by Google DeepMind that is designed for multimodal use cases. The Gemini API gives you access to Gemini 2.0 Flash, Gemini 2.0 Flash-Lite, and all Gemini Pro models up to and including the most recent Gemini 3 Pro.

This implementation provides two authentication modes:

Gemini Developer API: Use an API key for quick prototyping and development
Vertex AI: Use Google Cloud credentials for production deployments with enterprise features

Gemini API Reference

Prerequisites

Choose one of the following authentication methods:

Option 1: Gemini Developer API (API Key)

Obtain an API key from the Google AI Studio
Set the API key as an environment variable or in your application properties

Option 2: Vertex AI (Google Cloud)

Install the gcloud CLI, appropriate for your OS.
Authenticate by running the following command. Replace PROJECT_ID with your Google Cloud project ID and ACCOUNT with your Google Cloud username.

gcloud config set project <PROJECT_ID> &&
gcloud auth application-default login <ACCOUNT>

Auto-configuration

The artifact names of the Spring AI auto-configuration and starter modules have changed significantly. Please refer to the upgrade notes for more information.

Spring AI provides Spring Boot auto-configuration for the Google GenAI Chat Client. To enable it add the following dependency to your project’s Maven pom.xml or Gradle build.gradle build files:

Maven

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-google-genai</artifactId>
</dependency>

Gradle

dependencies {
    implementation 'org.springframework.ai:spring-ai-starter-model-google-genai'
}

Refer to the Dependency Management section to add the Spring AI BOM to your build file.

Chat Properties

Chat auto-configuration is now enabled and disabled through top-level properties with the prefix spring.ai.model.chat.

To enable, set spring.ai.model.chat=google-genai (enabled by default).

To disable, set spring.ai.model.chat=none (or any value that does not match google-genai).

This change allows multiple models to be configured.
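
The enable/disable rule above can be sketched in plain Java. The class and method names are hypothetical and are not part of Spring AI; the sketch only mirrors the documented behavior, assuming that an absent property counts as "enabled by default".

```java
// Hypothetical helper, not part of Spring AI: mirrors the documented rule only.
final class ChatModelToggle {

    static final String GOOGLE_GENAI = "google-genai";

    // propertyValue is the value of spring.ai.model.chat, or null when the property is not set.
    static boolean isGoogleGenAiChatEnabled(String propertyValue) {
        if (propertyValue == null) {
            return true; // enabled by default
        }
        // Any other value, such as "none", disables the Google GenAI chat model.
        return GOOGLE_GENAI.equals(propertyValue);
    }
}
```

For instance, isGoogleGenAiChatEnabled("none") returns false, while a null or "google-genai" value returns true.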

Connection Properties

The prefix spring.ai.google.genai is used as the property prefix that lets you connect to Google GenAI.

Property	Description	Default
spring.ai.model.chat	Enable Chat Model client	google-genai
spring.ai.google.genai.api-key	API key for Gemini Developer API. When provided, the client uses the Gemini Developer API instead of Vertex AI.	-
spring.ai.google.genai.project-id	Google Cloud Platform project ID (required for Vertex AI mode)	-
spring.ai.google.genai.location	Google Cloud region (required for Vertex AI mode)	-
spring.ai.google.genai.credentials-uri	URI to Google Cloud credentials. When provided it is used to create a `GoogleCredentials` instance for authentication.	-
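
As a rough illustration of how these properties relate to the two authentication modes, the following sketch picks a mode from a property map. AuthModeResolver and its methods are hypothetical names, and this is not the auto-configuration's actual code; it only assumes the rules stated in the table (an API key selects the Gemini Developer API, otherwise project ID and location are required for Vertex AI).

```java
import java.util.Map;

// Hypothetical helper, not part of Spring AI: illustrates how the connection
// properties relate to the two authentication modes described in this section.
final class AuthModeResolver {

    enum AuthMode { GEMINI_DEVELOPER_API, VERTEX_AI }

    static AuthMode resolve(Map<String, String> props) {
        String apiKey = props.get("spring.ai.google.genai.api-key");
        if (apiKey != null && !apiKey.isBlank()) {
            // An API key selects the Gemini Developer API.
            return AuthMode.GEMINI_DEVELOPER_API;
        }
        // Without an API key, Vertex AI mode needs both a project ID and a location.
        requireNonBlank(props, "spring.ai.google.genai.project-id");
        requireNonBlank(props, "spring.ai.google.genai.location");
        return AuthMode.VERTEX_AI;
    }

    private static void requireNonBlank(Map<String, String> props, String key) {
        String value = props.get(key);
        if (value == null || value.isBlank()) {
            throw new IllegalStateException("Missing required property: " + key);
        }
    }
}
```

Failing fast on a missing project ID or location is a design choice of this sketch; it surfaces a configuration gap at start-up rather than on the first request.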

Chat Model Properties

The prefix spring.ai.google.genai.chat is the property prefix that lets you configure the chat model implementation for Google GenAI Chat.

Property	Description	Default
spring.ai.google.genai.chat.options.model	Supported Google GenAI Chat models to use include `gemini-2.0-flash`, `gemini-2.0-flash-lite`, `gemini-pro`, and `gemini-1.5-flash`.	gemini-2.0-flash
spring.ai.google.genai.chat.options.response-mime-type	Output response mimetype of the generated candidate text.	`text/plain`: (default) Text output or `application/json`: JSON response.
spring.ai.google.genai.chat.options.google-search-retrieval	Use Google search Grounding feature	`true` or `false`, default `false`.
spring.ai.google.genai.chat.options.temperature	Controls the randomness of the output. Values can range over [0.0,1.0], inclusive. A value closer to 1.0 will produce responses that are more varied, while a value closer to 0.0 will typically result in less surprising responses from the model.	0.7
spring.ai.google.genai.chat.options.top-k	The maximum number of tokens to consider when sampling. The model uses combined Top-k and nucleus sampling. Top-k sampling considers the set of topK most probable tokens.	-
spring.ai.google.genai.chat.options.top-p	The maximum cumulative probability of tokens to consider when sampling. The model uses combined Top-k and nucleus sampling. Nucleus sampling considers the smallest set of tokens whose probability sum is at least topP.	-
spring.ai.google.genai.chat.options.candidate-count	The number of generated response messages to return. This value must be between [1, 8], inclusive. Defaults to 1.	1
spring.ai.google.genai.chat.options.max-output-tokens	The maximum number of tokens to generate.	-
spring.ai.google.genai.chat.options.frequency-penalty	Frequency penalties for reducing repetition.	-
spring.ai.google.genai.chat.options.presence-penalty	Presence penalties for reducing repetition.	-
spring.ai.google.genai.chat.options.thinking-budget	Thinking budget for the thinking process. See Thinking Configuration.	-
spring.ai.google.genai.chat.options.thinking-level	The level of thinking tokens the model should generate. Valid values: `LOW`, `HIGH`, `THINKING_LEVEL_UNSPECIFIED`. See Thinking Configuration.	-
spring.ai.google.genai.chat.options.include-thoughts	Enable thought signatures for function calling. Required for Gemini 3 Pro to avoid validation errors during the internal tool execution loop. See Thought Signatures.	false
spring.ai.google.genai.chat.options.tool-names	List of tools, identified by their names, to enable for function calling in a single prompt request. Tools with those names must exist in the ToolCallback registry.	-
spring.ai.google.genai.chat.options.tool-callbacks	Tool Callbacks to register with the ChatModel.	-
spring.ai.google.genai.chat.options.internal-tool-execution-enabled	If true, the tool execution is performed internally; otherwise the response from the model is returned to the user. The default is null, in which case `ToolCallingChatOptions.DEFAULT_TOOL_EXECUTION_ENABLED` (which is true) is used.	-
spring.ai.google.genai.chat.options.safety-settings	List of safety settings to control safety filters, as defined by Google GenAI Safety Settings. Each safety setting can have a method, threshold, and category.	-
spring.ai.google.genai.chat.options.cached-content-name	The name of cached content to use for this request. When set along with `use-cached-content=true`, the cached content will be used as context. See Cached Content.	-
spring.ai.google.genai.chat.options.use-cached-content	Whether to use cached content if available. When true and `cached-content-name` is set, the system will use the cached content.	false
spring.ai.google.genai.chat.options.auto-cache-threshold	Automatically cache prompts that exceed this token threshold. When set, prompts larger than this value will be automatically cached for reuse. Set to null to disable auto-caching.	-
spring.ai.google.genai.chat.options.auto-cache-ttl	Time-to-live (Duration) for auto-cached content in ISO-8601 format (e.g., `PT1H` for 1 hour). Used when auto-caching is enabled.	PT1H
spring.ai.google.genai.chat.enable-cached-content	Enable the `GoogleGenAiCachedContentService` bean for managing cached content.	true
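
The top-k and top-p descriptions above can be made concrete with a small sketch of combined top-k and nucleus filtering. This is an illustration of the two rules only, not Gemini's implementation; SamplingFilter and its method are hypothetical names, and the sketch assumes the probabilities are already normalized.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch of combined top-k and nucleus (top-p) filtering.
// This is not Gemini's implementation; it only demonstrates the two rules.
final class SamplingFilter {

    // Returns the indices of the tokens that remain candidates for sampling,
    // ordered from most to least probable.
    static List<Integer> candidates(double[] probs, int topK, double topP) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < probs.length; i++) {
            order.add(i);
        }
        // Most probable tokens first.
        order.sort(Comparator.comparingDouble((Integer i) -> probs[i]).reversed());

        List<Integer> kept = new ArrayList<>();
        double cumulative = 0.0;
        for (int index : order) {
            if (kept.size() >= topK) {
                break; // top-k: keep at most topK tokens
            }
            kept.add(index);
            cumulative += probs[index];
            if (cumulative >= topP) {
                break; // nucleus: smallest set whose probability sum reaches topP
            }
        }
        return kept;
    }
}
```

With probabilities {0.1, 0.5, 0.3, 0.1}, topK = 3 and topP = 0.7, only the tokens at indices 1 and 2 remain, because their combined probability already reaches topP.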

All properties prefixed with spring.ai.google.genai.chat.options can be overridden at runtime by adding a request specific Runtime options to the Prompt call.

Runtime options

The GoogleGenAiChatOptions.java provides model configurations, such as the temperature, the topK, etc.

On start-up, the default options can be configured with the GoogleGenAiChatModel(client, options) constructor or the spring.ai.google.genai.chat.options.* properties.

At runtime, you can override the default options by adding new, request specific, options to the Prompt call. For example, to override the default temperature for a specific request:

ChatResponse response = chatModel.call(
    new Prompt(
        "Generate the names of 5 famous pirates.",
        GoogleGenAiChatOptions.builder()
            .temperature(0.4)
        .build()
    ));

In addition to the model specific GoogleGenAiChatOptions you can use a portable ChatOptions instance, created with the ChatOptions#builder().
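
To illustrate the idea of request-specific options overriding start-up defaults, here is a minimal stand-alone sketch. SimpleChatOptions is a hypothetical, simplified stand-in and not the Spring AI class; the merge rule shown (a non-null request value wins, otherwise the default applies) is an assumption made for the example.

```java
// Hypothetical, simplified stand-in for chat options: shows how request-specific
// values can take precedence over start-up defaults. Not the Spring AI class.
final class SimpleChatOptions {

    final String model;        // null means "not set"
    final Double temperature;  // null means "not set"
    final Integer topK;        // null means "not set"

    SimpleChatOptions(String model, Double temperature, Integer topK) {
        this.model = model;
        this.temperature = temperature;
        this.topK = topK;
    }

    // Any value set on the request wins; unset values fall back to the defaults.
    static SimpleChatOptions merge(SimpleChatOptions defaults, SimpleChatOptions request) {
        if (request == null) {
            return defaults;
        }
        return new SimpleChatOptions(
                request.model != null ? request.model : defaults.model,
                request.temperature != null ? request.temperature : defaults.temperature,
                request.topK != null ? request.topK : defaults.topK);
    }
}
```

Merging defaults of ("gemini-2.0-flash", 0.7, 40) with a request that only sets temperature 0.4 yields the default model and topK with the request's temperature.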

Tool Calling

The Google GenAI model supports tool calling (function calling) capabilities, allowing models to use tools during conversations. Here’s an example of how to define and use @Tool-based tools:

public class WeatherService {

    @Tool(description = "Get the weather in location")
    public String weatherByLocation(@ToolParam(description= "City or state name") String location) {
        ...
    }
}

String response = ChatClient.create(this.chatModel)
        .prompt("What's the weather like in Boston?")
        .tools(new WeatherService())
        .call()
        .content();

You can use the java.util.function beans as tools as well:

@Bean
@Description("Get the weather in location. Return temperature in 36°F or 36°C format.")
public Function<Request, Response> weatherFunction() {
    return new MockWeatherService();
}

String response = ChatClient.create(this.chatModel)
        .prompt("What's the weather like in Boston?")
        .toolNames("weatherFunction")
        .call()
        .content();

Find more in Tools documentation.

Thinking Configuration

Gemini models support a "thinking" capability that allows the model to perform deeper reasoning before generating responses. This is controlled through the ThinkingConfig which includes three related options: thinkingBudget, thinkingLevel, and includeThoughts.

Thinking Level

The thinkingLevel option controls the depth of reasoning tokens the model generates. This is available for models that support thinking (e.g., Gemini 3 Pro Preview).

Value	Description
`LOW`	Minimal thinking. Use for simple queries where speed is preferred over deep analysis.
`HIGH`	Extensive thinking. Use for complex problems requiring deep analysis and step-by-step reasoning.
`THINKING_LEVEL_UNSPECIFIED`	The model uses its default behavior.

Configuration via Properties

spring.ai.google.genai.chat.options.model=gemini-3-pro-preview
spring.ai.google.genai.chat.options.thinking-level=HIGH

Programmatic Configuration

import org.springframework.ai.google.genai.common.GoogleGenAiThinkingLevel;

ChatResponse response = chatModel.call(
    new Prompt(
        "Explain the theory of relativity in simple terms.",
        GoogleGenAiChatOptions.builder()
            .model("gemini-3-pro-preview")
            .thinkingLevel(GoogleGenAiThinkingLevel.HIGH)
            .build()
    ));

Thinking Budget

The thinkingBudget option sets a token budget for the thinking process:

Positive value: Maximum number of tokens for thinking (e.g., 8192)
Zero (0): Disables thinking entirely
Not set: Model decides automatically based on query complexity

ChatResponse response = chatModel.call(
    new Prompt(
        "Solve this complex math problem step by step.",
        GoogleGenAiChatOptions.builder()
            .model("gemini-2.5-pro")
            .thinkingBudget(8192)
            .build()
    ));

Option Compatibility

thinkingLevel and thinkingBudget are mutually exclusive. You cannot use both in the same request - doing so will result in an API error.

Use thinkingLevel (LOW, HIGH) for Gemini 3 Pro models
Use thinkingBudget (token count) for Gemini 2.5 series models

You can combine includeThoughts with either thinkingLevel or thinkingBudget (but not both):

// For Gemini 3 Pro: use thinkingLevel + includeThoughts
ChatResponse response = chatModel.call(
    new Prompt(
        "Analyze this complex scenario.",
        GoogleGenAiChatOptions.builder()
            .model("gemini-3-pro-preview")
            .thinkingLevel(GoogleGenAiThinkingLevel.HIGH)
            .includeThoughts(true)
            .build()
    ));

// For Gemini 2.5: use thinkingBudget + includeThoughts
ChatResponse response = chatModel.call(
    new Prompt(
        "Analyze this complex scenario.",
        GoogleGenAiChatOptions.builder()
            .model("gemini-2.5-pro")
            .thinkingBudget(8192)
            .includeThoughts(true)
            .build()
    ));
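
Because setting both options leads to an API error, a request can be checked before it is sent. The following is a minimal sketch of such a pre-flight check; ThinkingOptionsCheck is a hypothetical helper and not part of Spring AI, and it represents the thinking level as a plain string for simplicity.

```java
// Hypothetical pre-flight check, not part of Spring AI: rejects a request that
// sets both thinking options before it reaches the API.
final class ThinkingOptionsCheck {

    // thinkingLevel: for example "LOW" or "HIGH", or null when not set.
    // thinkingBudget: token budget, or null when not set.
    static void validate(String thinkingLevel, Integer thinkingBudget) {
        if (thinkingLevel != null && thinkingBudget != null) {
            throw new IllegalArgumentException(
                    "thinkingLevel and thinkingBudget are mutually exclusive; set only one");
        }
    }
}
```

Calling validate("HIGH", 8192) throws, while validate("HIGH", null) and validate(null, 8192) pass.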

Model Support

The thinking configuration options are model-specific:

Model	thinkingLevel	thinkingBudget	Notes
Gemini 3 Pro (Preview)	✅ Supported	⚠️ Backwards compatible only	Use `thinkingLevel`. Cannot disable thinking. Requires global endpoint.
Gemini 2.5 Pro	❌ Not supported	✅ Supported	Use `thinkingBudget`. Set to 0 to disable, -1 for dynamic.
Gemini 2.5 Flash	❌ Not supported	✅ Supported	Use `thinkingBudget`. Set to 0 to disable, -1 for dynamic.
Gemini 2.5 Flash-Lite	❌ Not supported	✅ Supported	Thinking disabled by default. Set `thinkingBudget` to enable.
Gemini 2.0 Flash	❌ Not supported	❌ Not supported	Thinking not available.

Using thinkingLevel with unsupported models (e.g., Gemini 2.5 or earlier) will result in an API error.
Gemini 3 Pro Preview is only available on global endpoints. Set spring.ai.google.genai.location=global or GOOGLE_CLOUD_LOCATION=global.
Check the Google GenAI Thinking documentation for the latest model capabilities.
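
The model support table can be condensed into a small lookup. This is a sketch under assumptions: ThinkingSupport is a hypothetical helper, the model-name prefixes are inferred from the model IDs used in this section, and treating every other model as having no thinking option is a simplification. Check the current model documentation before relying on it.

```java
// Hypothetical lookup derived from the model support table in this section.
// The prefixes are assumptions based on the model IDs used in the examples.
final class ThinkingSupport {

    enum Option { THINKING_LEVEL, THINKING_BUDGET, NONE }

    static Option preferredOption(String model) {
        if (model.startsWith("gemini-3-pro")) {
            return Option.THINKING_LEVEL;  // Gemini 3 Pro: use thinkingLevel
        }
        if (model.startsWith("gemini-2.5-")) {
            return Option.THINKING_BUDGET; // Gemini 2.5 series: use thinkingBudget
        }
        return Option.NONE;                // for example Gemini 2.0 Flash: no thinking
    }
}
```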

Enabling thinking features increases token usage and API costs. Use appropriately based on the complexity of your queries.

Thought Signatures

Gemini 3 Pro introduces thought signatures, which are opaque byte arrays that preserve the model’s reasoning context during function calling. When includeThoughts is enabled, the model returns thought signatures that must be passed back within the same turn during the internal tool execution loop.

When Thought Signatures Matter

IMPORTANT: Thought signature validation only applies to the current turn - specifically during the internal tool execution loop when the model makes function calls (both parallel and sequential). The API does not validate thought signatures for previous turns in conversation history.

Per Google’s documentation:

Validation is enforced for function calls within the current turn only
Previous turn signatures do not need to be preserved
Missing signatures in the current turn’s function calls result in HTTP 400 errors for Gemini 3 Pro
For parallel function calls, only the first functionCall part carries the signature

For Gemini 2.5 Pro and earlier models, thought signatures are optional and the API is lenient.
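
The rule that only the first functionCall part carries the signature can be sketched as a simple check over the parts of one model response. The Part type below is a minimal hypothetical stand-in, not the SDK's type, and the check is an illustration of the rule rather than the API's actual validation logic.

```java
import java.util.List;

// Hypothetical check, not the API's validation logic: verifies that the first
// functionCall part of a single model response carries a thought signature.
final class ThoughtSignatureCheck {

    // Minimal stand-in for a response part; the real API types are richer.
    static final class Part {
        final String functionName;     // null when the part is not a function call
        final byte[] thoughtSignature; // null when the part carries no signature

        Part(String functionName, byte[] thoughtSignature) {
            this.functionName = functionName;
            this.thoughtSignature = thoughtSignature;
        }

        boolean isFunctionCall() {
            return functionName != null;
        }
    }

    // Returns true when the response has no function calls, or when its first
    // functionCall part carries a non-empty thought signature.
    static boolean firstFunctionCallIsSigned(List<Part> responseParts) {
        for (Part part : responseParts) {
            if (part.isFunctionCall()) {
                return part.thoughtSignature != null && part.thoughtSignature.length > 0;
            }
        }
        return true;
    }
}
```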

Configuration

Enable thought signatures using configuration properties:

spring.ai.google.genai.chat.options.model=gemini-3-pro-preview
spring.ai.google.genai.chat.options.include-thoughts=true

Or programmatically at runtime:

ChatResponse response = chatModel.call(
    new Prompt(
        "Your question here",
        GoogleGenAiChatOptions.builder()
            .model("gemini-3-pro-preview")
            .includeThoughts(true)
            .toolCallbacks(callbacks)
            .build()
    ));

Automatic Handling

Spring AI automatically handles thought signatures during the internal tool execution loop. When internalToolExecutionEnabled is true (the default), Spring AI:

Extracts thought signatures from model responses
Attaches them to the correct functionCall parts when sending back function responses
Propagates them correctly during function calls within a single turn (both parallel and sequential)

You don’t need to manually manage thought signatures - Spring AI ensures they are properly attached to functionCall parts as required by the API specification.

Example with Function Calling

@Bean
@Description("Get the weather in a location")
public Function<WeatherRequest, WeatherResponse> weatherFunction() {
    return new WeatherService();
}

// Enable includeThoughts for Gemini 3 Pro with function calling
String response = ChatClient.create(this.chatModel)
        .prompt("What's the weather like in Boston?")
        .options(GoogleGenAiChatOptions.builder()
            .model("gemini-3-pro-preview")
            .includeThoughts(true)
            .build())
        .toolNames("weatherFunction")
        .call()
        .content();

Manual Tool Execution Mode

If you set internalToolExecutionEnabled=false to manually control the tool execution loop, you must handle thought signatures yourself when using Gemini 3 Pro with includeThoughts=true.

Requirements for manual tool execution with thought signatures:

Extract thought signatures from the response metadata:

AssistantMessage assistantMessage = response.getResult().getOutput();
Map<String, Object> metadata = assistantMessage.getMetadata();
List<byte[]> thoughtSignatures = (List<byte[]>) metadata.get("thoughtSignatures");

When sending back function responses, include the original AssistantMessage with its metadata intact in your message history. Spring AI will automatically attach the thought signatures to the correct functionCall parts.
For Gemini 3 Pro, failing to preserve thought signatures during the current turn will result in HTTP 400 errors from the API.

Only the current turn’s function calls require thought signatures. When starting a new conversation turn (after completing a function calling round), you do not need to preserve the previous turn’s signatures.

Enabling includeThoughts increases token usage as thought processes are included in responses. This impacts API costs but provides better reasoning transparency.

Multimodal

Multimodality refers to a model’s ability to simultaneously understand and process information from various (input) sources, including text, PDF, images, audio, and other data formats.

Image, Audio, Video

Google’s Gemini AI models support this capability by comprehending and integrating text, code, audio, images, and video. For more details, refer to the blog post Introducing Gemini.

Spring AI’s Message interface supports multimodal AI models by introducing the Media type. This type contains data and information about media attachments in messages, using Spring’s org.springframework.util.MimeType and a java.lang.Object for the raw media data.

Below is a simple code example extracted from GoogleGenAiChatModelIT.java, demonstrating the combination of user text with an image.

byte[] data = new ClassPathResource("/vertex-test.png").getContentAsByteArray();

var userMessage = UserMessage.builder()
        .text("Explain what you see in this picture.")
        .media(List.of(new Media(MimeTypeUtils.IMAGE_PNG, data)))
        .build();

ChatResponse response = chatModel.call(new Prompt(List.of(userMessage)));

PDF

Google GenAI provides support for PDF input types. Use the application/pdf media type to attach a PDF file to the message:

var pdfData = new ClassPathResource("/spring-ai-reference-overview.pdf");

var userMessage = UserMessage.builder()
			.text("You are a very professional document summarization specialist. Please summarize the given document.")
			.media(List.of(new Media(new MimeType("application", "pdf"), pdfData)))
			.build();

var response = this.chatModel.call(new Prompt(List.of(userMessage)));

Cached Content

Google GenAI’s Context Caching allows you to cache large amounts of content (such as long documents, code repositories, or media) and reuse it across multiple requests. This significantly reduces API costs and improves response latency for repeated queries on the same content.

Benefits

Cost Reduction: Cached tokens are billed at a much lower rate than regular input tokens (typically 75-90% cheaper)
Improved Performance: Reusing cached content reduces processing time for large contexts
Consistency: Same cached context ensures consistent responses across multiple requests

Cache Requirements

Minimum cache size: 32,768 tokens (approximately 25,000 words)
Default cache duration: 1 hour (configurable via TTL)
Cached content must include either system instructions or conversation history
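
To get a feel for the cost effect, the arithmetic can be sketched as follows. The prices and discount are placeholders chosen for the example, not actual Gemini pricing, and the sketch ignores cache storage fees and the initial cache creation request. CacheSavingsEstimate is a hypothetical helper.

```java
// Hypothetical estimate helper. Prices and discounts passed in are placeholders,
// not actual Gemini pricing; storage fees and cache creation are ignored.
final class CacheSavingsEstimate {

    static final int MIN_CACHE_TOKENS = 32_768; // minimum cache size stated in this section

    static boolean isCacheable(long contextTokens) {
        return contextTokens >= MIN_CACHE_TOKENS;
    }

    // Input cost of sending the same context with every request, without caching.
    static double costWithoutCache(long contextTokens, int requests, double pricePerMillionTokens) {
        return contextTokens / 1_000_000.0 * pricePerMillionTokens * requests;
    }

    // Input cost when the context is read from the cache at a discounted rate.
    // cachedDiscount is a fraction, for example 0.75 for "75% cheaper".
    static double costWithCache(long contextTokens, int requests, double pricePerMillionTokens,
            double cachedDiscount) {
        return costWithoutCache(contextTokens, requests, pricePerMillionTokens) * (1.0 - cachedDiscount);
    }
}
```

For example, a 1,000,000-token context sent 10 times at a placeholder price of 1.00 per million tokens costs 10.00 uncached; with a 75% discount, the cached reads cost 2.50.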

Using Cached Content Service

Spring AI provides GoogleGenAiCachedContentService for programmatic cache management. The service is automatically configured when using the Spring Boot auto-configuration.

Creating Cached Content

@Autowired
private GoogleGenAiCachedContentService cachedContentService;

// Create cached content with a large document
String largeDocument = "... your large context here (>32k tokens) ...";

CachedContentRequest request = CachedContentRequest.builder()
    .model("gemini-2.0-flash")
    .contents(List.of(
        Content.builder()
            .role("user")
            .parts(List.of(Part.fromText(largeDocument)))
            .build()
    ))
    .displayName("My Large Document Cache")
    .ttl(Duration.ofHours(1))
    .build();

GoogleGenAiCachedContent cachedContent = cachedContentService.create(request);
String cacheName = cachedContent.getName(); // Save this for reuse

Using Cached Content in Chat Requests

Once you’ve created cached content, reference it in your chat requests:

ChatResponse response = chatModel.call(
    new Prompt(
        "Summarize the key points from the document",
        GoogleGenAiChatOptions.builder()
            .useCachedContent(true)
            .cachedContentName(cacheName) // Use the cached content name
            .build()
    ));

Or via configuration properties:

spring.ai.google.genai.chat.options.use-cached-content=true
spring.ai.google.genai.chat.options.cached-content-name=cachedContent/your-cache-name

Managing Cached Content

The GoogleGenAiCachedContentService provides comprehensive cache management:

// Retrieve cached content
GoogleGenAiCachedContent content = cachedContentService.get(cacheName);

// Update cache TTL
CachedContentUpdateRequest updateRequest = CachedContentUpdateRequest.builder()
    .ttl(Duration.ofHours(2))
    .build();
GoogleGenAiCachedContent updated = cachedContentService.update(cacheName, updateRequest);

// List all cached content
List<GoogleGenAiCachedContent> allCaches = cachedContentService.listAll();

// Delete cached content
boolean deleted = cachedContentService.delete(cacheName);

// Extend cache TTL
GoogleGenAiCachedContent extended = cachedContentService.extendTtl(cacheName, Duration.ofMinutes(30));

// Cleanup expired caches
int removedCount = cachedContentService.cleanupExpired();

Asynchronous Operations

All operations have asynchronous variants:

CompletableFuture<GoogleGenAiCachedContent> futureCache =
    cachedContentService.createAsync(request);

CompletableFuture<GoogleGenAiCachedContent> futureGet =
    cachedContentService.getAsync(cacheName);

CompletableFuture<Boolean> futureDelete =
    cachedContentService.deleteAsync(cacheName);
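
Futures like these can be chained without blocking between steps. The sketch below uses an in-memory stand-in so that it runs on its own; the method names follow the snippet above, but the class, its String-based types, and its behavior are hypothetical and do not reflect the real service.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// In-memory stand-in for an asynchronous cache service. The method names follow
// the snippet above, but the types and behavior here are hypothetical.
final class AsyncCacheSketch {

    private final Map<String, String> store = new ConcurrentHashMap<>();

    CompletableFuture<String> createAsync(String name, String content) {
        return CompletableFuture.supplyAsync(() -> {
            store.put(name, content);
            return name;
        });
    }

    CompletableFuture<String> getAsync(String name) {
        return CompletableFuture.supplyAsync(() -> store.get(name));
    }

    CompletableFuture<Boolean> deleteAsync(String name) {
        return CompletableFuture.supplyAsync(() -> store.remove(name) != null);
    }

    // Chains create, get, and delete; each step starts when the previous one completes.
    CompletableFuture<Boolean> createReadDelete(String name, String content) {
        return createAsync(name, content)
                .thenCompose(this::getAsync)
                .thenCompose(found -> deleteAsync(name));
    }
}
```

Using thenCompose keeps the pipeline flat; a caller can attach further stages or call join() only at the very end.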

Auto-Caching

Spring AI can automatically cache large prompts when they exceed a specified token threshold:

# Automatically cache prompts larger than 100,000 tokens
spring.ai.google.genai.chat.options.auto-cache-threshold=100000
# Set auto-cache TTL to 1 hour
spring.ai.google.genai.chat.options.auto-cache-ttl=PT1H

Or programmatically:

ChatResponse response = chatModel.call(
    new Prompt(
        largePrompt,
        GoogleGenAiChatOptions.builder()
            .autoCacheThreshold(100000)
            .autoCacheTtl(Duration.ofHours(1))
            .build()
    ));

Auto-caching is useful for one-time large contexts. For repeated use of the same context, manually creating and referencing cached content is more efficient.
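
The two auto-cache settings can be read as a threshold test plus an ISO-8601 duration. The following sketch shows both with the standard library only; AutoCacheDecision is a hypothetical helper, and the strict "greater than" comparison is an assumption based on the wording "exceed this token threshold".

```java
import java.time.Duration;

// Hypothetical helper illustrating the auto-cache settings; not Spring AI code.
final class AutoCacheDecision {

    // threshold is the auto-cache-threshold value, or null when auto-caching is disabled.
    static boolean shouldAutoCache(int promptTokens, Integer threshold) {
        return threshold != null && promptTokens > threshold;
    }

    // Parses an ISO-8601 duration such as "PT1H" or "PT30M".
    static Duration parseTtl(String iso8601) {
        return Duration.parse(iso8601);
    }
}
```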

Monitoring Cache Usage

Cached content includes usage metadata accessible via the service:

GoogleGenAiCachedContent content = cachedContentService.get(cacheName);

// Check if cache is expired
boolean expired = content.isExpired();

// Get remaining TTL
Duration remaining = content.getRemainingTtl();

// Get usage metadata
CachedContentUsageMetadata metadata = content.getUsageMetadata();
if (metadata != null) {
    System.out.println("Total tokens: " + metadata.totalTokenCount().orElse(0));
}
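
How an expiry check and a remaining TTL could be derived from an expiration timestamp is sketched below. This is an assumption about the underlying arithmetic, not the actual GoogleGenAiCachedContent implementation; CacheExpiry is a hypothetical helper, and the current time is passed in so the logic is easy to test.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper showing one way to derive expiry information from an
// expiration timestamp. Not the GoogleGenAiCachedContent implementation.
final class CacheExpiry {

    // A cache counts as expired once the current time reaches the expiration time.
    static boolean isExpired(Instant expireTime, Instant now) {
        return !now.isBefore(expireTime);
    }

    // Remaining lifetime, never negative.
    static Duration remainingTtl(Instant expireTime, Instant now) {
        Duration remaining = Duration.between(now, expireTime);
        return remaining.isNegative() ? Duration.ZERO : remaining;
    }
}
```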

Best Practices

Cache Lifetime: Set appropriate TTL based on your use case. Shorter TTLs for frequently changing content, longer for static content.
Cache Naming: Use descriptive display names to identify cached content easily.
Cleanup: Periodically clean up expired caches to maintain organization.
Token Threshold: Only cache content that exceeds the minimum threshold (32,768 tokens).
Cost Optimization: Reuse cached content across multiple requests to maximize cost savings.

Configuration Example

Complete configuration example:

# Enable cached content service (enabled by default)
spring.ai.google.genai.chat.enable-cached-content=true

# Use a specific cached content
spring.ai.google.genai.chat.options.use-cached-content=true
spring.ai.google.genai.chat.options.cached-content-name=cachedContent/my-cache-123

# Auto-caching configuration
spring.ai.google.genai.chat.options.auto-cache-threshold=50000
spring.ai.google.genai.chat.options.auto-cache-ttl=PT30M

Sample Controller

Create a new Spring Boot project and add the spring-ai-starter-model-google-genai to your pom (or gradle) dependencies.

Add an application.properties file, under the src/main/resources directory, to enable and configure the Google GenAI chat model:

Using Gemini Developer API (API Key)

spring.ai.google.genai.api-key=YOUR_API_KEY
spring.ai.google.genai.chat.options.model=gemini-2.0-flash
spring.ai.google.genai.chat.options.temperature=0.5

Using Vertex AI

spring.ai.google.genai.project-id=PROJECT_ID
spring.ai.google.genai.location=LOCATION
spring.ai.google.genai.chat.options.model=gemini-2.0-flash
spring.ai.google.genai.chat.options.temperature=0.5

Replace PROJECT_ID with your Google Cloud project ID and LOCATION with a Google Cloud region such as us-central1 or europe-west1.

Each model has its own set of supported regions; you can find the list of supported regions on the model page.

This will create a GoogleGenAiChatModel implementation that you can inject into your class. Here is an example of a simple @RestController class that uses the chat model for text generation.

@RestController
public class ChatController {

    private final GoogleGenAiChatModel chatModel;

    @Autowired
    public ChatController(GoogleGenAiChatModel chatModel) {
        this.chatModel = chatModel;
    }

    @GetMapping("/ai/generate")
    public Map<String, String> generate(@RequestParam(value = "message", defaultValue = "Tell me a joke") String message) {
        return Map.of("generation", this.chatModel.call(message));
    }

    @GetMapping("/ai/generateStream")
    public Flux<ChatResponse> generateStream(@RequestParam(value = "message", defaultValue = "Tell me a joke") String message) {
        Prompt prompt = new Prompt(new UserMessage(message));
        return this.chatModel.stream(prompt);
    }
}

Manual Configuration

The GoogleGenAiChatModel implements the ChatModel and uses the com.google.genai.Client to connect to the Google GenAI service.

Add the spring-ai-google-genai dependency to your project’s Maven pom.xml file:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-google-genai</artifactId>
</dependency>

or to your Gradle build.gradle build file.

dependencies {
    implementation 'org.springframework.ai:spring-ai-google-genai'
}

Refer to the Dependency Management section to add the Spring AI BOM to your build file.

Next, create a GoogleGenAiChatModel and use it for text generation:

Using API Key

Client genAiClient = Client.builder()
    .apiKey(System.getenv("GOOGLE_API_KEY"))
    .build();

var chatModel = new GoogleGenAiChatModel(genAiClient,
    GoogleGenAiChatOptions.builder()
        .model(ChatModel.GEMINI_2_0_FLASH)
        .temperature(0.4)
    .build());

ChatResponse response = chatModel.call(
    new Prompt("Generate the names of 5 famous pirates."));

Using Vertex AI

Client genAiClient = Client.builder()
    .project(System.getenv("GOOGLE_CLOUD_PROJECT"))
    .location(System.getenv("GOOGLE_CLOUD_LOCATION"))
    .vertexAI(true)
    .build();

var chatModel = new GoogleGenAiChatModel(genAiClient,
    GoogleGenAiChatOptions.builder()
        .model(ChatModel.GEMINI_2_0_FLASH)
        .temperature(0.4)
    .build());

ChatResponse response = chatModel.call(
    new Prompt("Generate the names of 5 famous pirates."));

The GoogleGenAiChatOptions provides the configuration information for the chat requests. The GoogleGenAiChatOptions.Builder is a fluent options builder.

Migration from Vertex AI Gemini

If you’re currently using the Vertex AI Gemini implementation (spring-ai-vertex-ai-gemini), you can migrate to Google GenAI with minimal changes:

Key Differences

SDK: Google GenAI uses the new com.google.genai.Client instead of com.google.cloud.vertexai.VertexAI
Authentication: Supports both API key and Google Cloud credentials
Package Names: Classes are in org.springframework.ai.google.genai instead of org.springframework.ai.vertexai.gemini
Property Prefix: Uses spring.ai.google.genai instead of spring.ai.vertex.ai.gemini
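
The prefix change can be applied mechanically to an existing set of properties, as in the sketch below. PropertyPrefixMigration is a hypothetical helper; it only rewrites the prefix and assumes nothing about the rest of each key, so individual property names should still be checked against the property tables in this document, since they may differ between the two modules.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: rewrites only the property prefix. Property names after
// the prefix are copied as-is and may still need manual review.
final class PropertyPrefixMigration {

    static final String OLD_PREFIX = "spring.ai.vertex.ai.gemini.";
    static final String NEW_PREFIX = "spring.ai.google.genai.";

    static Map<String, String> migrate(Map<String, String> properties) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : properties.entrySet()) {
            String key = entry.getKey();
            if (key.startsWith(OLD_PREFIX)) {
                key = NEW_PREFIX + key.substring(OLD_PREFIX.length());
            }
            result.put(key, entry.getValue());
        }
        return result;
    }
}
```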

When to Use Google GenAI vs Vertex AI Gemini

Use Google GenAI when:

You want quick prototyping with API keys
You need the latest Gemini features from the Developer API
You want flexibility to switch between API key and Vertex AI modes

Use Vertex AI Gemini when:

You have existing Vertex AI infrastructure
You need specific Vertex AI enterprise features
Your organization requires Google Cloud-only deployment

Low-level Java Client

The Google GenAI implementation is built on the new Google GenAI Java SDK, which provides a modern, streamlined API for accessing Gemini models.