Google VertexAI Multimodal Embeddings

EXPERIMENTAL. For experimental use only. Not yet compatible with the VectorStores.

Vertex AI supports two types of embedding models: text and multimodal. This document describes how to create a multimodal embedding using the Vertex AI Multimodal embeddings API.

The multimodal embeddings model generates 1408-dimension vectors based on the input you provide, which can include a combination of image, text, and video data. The embedding vectors can then be used for subsequent tasks like image classification or video content moderation.

The image embedding vector and text embedding vector are in the same semantic space with the same dimensionality. Consequently, these vectors can be used interchangeably for use cases like searching images by text, or searching videos by image.
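
Because the vectors share one space, a plain cosine similarity is enough to compare a text embedding against an image embedding. The following is a minimal sketch; the cosineSimilarity helper and the float[] representation are illustrative assumptions, not part of the Spring AI API.

// Hypothetical helper: cosine similarity between two embedding vectors,
// e.g. one produced from text and one produced from an image.
// Both are 1408 floats by default, so the score is comparable across modalities.
static double cosineSimilarity(float[] textEmbedding, float[] imageEmbedding) {
    double dot = 0.0, normText = 0.0, normImage = 0.0;
    for (int i = 0; i < textEmbedding.length; i++) {
        dot += textEmbedding[i] * imageEmbedding[i];
        normText += textEmbedding[i] * textEmbedding[i];
        normImage += imageEmbedding[i] * imageEmbedding[i];
    }
    return dot / (Math.sqrt(normText) * Math.sqrt(normImage));
}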

The VertexAI Multimodal API imposes usage limits.
For text-only embedding use cases, we recommend using the Vertex AI text-embeddings model instead.

Prerequisites

  • Install the gcloud CLI appropriate for your OS.

  • Authenticate by running the following command. Replace PROJECT_ID with your Google Cloud project ID and ACCOUNT with your Google Cloud username.

gcloud config set project <PROJECT_ID> &&
gcloud auth application-default login <ACCOUNT>

Add Repositories and BOM

Spring AI artifacts are published in Spring Milestone and Snapshot repositories. Refer to the Repositories section to add these repositories to your build system.

To help with dependency management, Spring AI provides a BOM (bill of materials) to ensure that a consistent version of Spring AI is used throughout the entire project. Refer to the Dependency Management section to add the Spring AI BOM to your build system.
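
For reference, a typical Maven BOM import looks like the following sketch; the version property is a placeholder for the Spring AI release you are targeting:

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.ai</groupId>
            <artifactId>spring-ai-bom</artifactId>
            <version>${spring-ai.version}</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>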

Auto-configuration

Spring AI provides Spring Boot auto-configuration for the VertexAI Embedding Model. To enable it, add the following dependency to your project’s Maven pom.xml file:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-vertex-ai-embedding-spring-boot-starter</artifactId>
</dependency>

or to your Gradle build.gradle file:

dependencies {
    implementation 'org.springframework.ai:spring-ai-vertex-ai-embedding-spring-boot-starter'
}
Refer to the Dependency Management section to add the Spring AI BOM to your build file.

Embedding Properties

The prefix spring.ai.vertex.ai.embedding is the property prefix that lets you connect to the VertexAI Embedding API.

Property                                   | Description                       | Default
spring.ai.vertex.ai.embedding.project-id  | Google Cloud Platform project ID  | -
spring.ai.vertex.ai.embedding.location    | Region                            | -
spring.ai.vertex.ai.embedding.apiEndpoint | Vertex AI Embedding API endpoint  | -
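
For example, a minimal application.properties sketch for the connection settings; both values are placeholders you must replace with your own project ID and region:

spring.ai.vertex.ai.embedding.project-id=<your-gcp-project-id>
spring.ai.vertex.ai.embedding.location=us-central1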

The prefix spring.ai.vertex.ai.embedding.multimodal is the property prefix that lets you configure the embedding model implementation for VertexAI Multimodal Embedding.

Property | Description | Default
spring.ai.vertex.ai.embedding.multimodal.enabled | Enable the Vertex AI multimodal embedding model. | true
spring.ai.vertex.ai.embedding.multimodal.options.model | The model to use for multimodal embedding generation. | multimodalembedding@001
spring.ai.vertex.ai.embedding.multimodal.options.dimensions | By default, an embedding request returns a 1408-float vector. You can also request lower-dimension embeddings (128, 256, or 512 floats) for text and image data. | 1408
spring.ai.vertex.ai.embedding.multimodal.options.video-start-offset-sec | The start offset of the video segment in seconds. If not specified, it is calculated as max(0, endOffsetSec - 120). | -
spring.ai.vertex.ai.embedding.multimodal.options.video-end-offset-sec | The end offset of the video segment in seconds. If not specified, it is calculated as min(video length, startOffsetSec + 120). If both startOffsetSec and endOffsetSec are specified, endOffsetSec is adjusted to min(startOffsetSec + 120, endOffsetSec). | -
spring.ai.vertex.ai.embedding.multimodal.options.video-interval-sec | The interval of the video over which embeddings are generated. The minimum value is 4; a smaller interval returns an InvalidArgumentError. There is no upper limit, but an interval larger than min(video length, 120s) degrades the quality of the generated embeddings. | 16
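
For example, a sketch of the multimodal options in application.properties; the dimension and interval values below are illustrative choices, not recommendations:

spring.ai.vertex.ai.embedding.multimodal.options.model=multimodalembedding@001
spring.ai.vertex.ai.embedding.multimodal.options.dimensions=512
spring.ai.vertex.ai.embedding.multimodal.options.video-interval-sec=16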

Manual Configuration

The VertexAiMultimodalEmbeddingModel implements the DocumentEmbeddingModel interface.

Add the spring-ai-vertex-ai-embedding dependency to your project’s Maven pom.xml file:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-vertex-ai-embedding</artifactId>
</dependency>

or to your Gradle build.gradle file:

dependencies {
    implementation 'org.springframework.ai:spring-ai-vertex-ai-embedding'
}
Refer to the Dependency Management section to add the Spring AI BOM to your build file.

Next, create a VertexAiMultimodalEmbeddingModel and use it to generate embeddings:

VertexAiEmbeddingConnectionDetails connectionDetails =
    VertexAiEmbeddingConnectionDetails.builder()
        .withProjectId(System.getenv("VERTEX_AI_GEMINI_PROJECT_ID"))
        .withLocation(System.getenv("VERTEX_AI_GEMINI_LOCATION"))
        .build();

VertexAiMultimodalEmbeddingOptions options = VertexAiMultimodalEmbeddingOptions.builder()
    .withModel(VertexAiMultimodalEmbeddingOptions.DEFAULT_MODEL_NAME)
    .build();

var embeddingModel = new VertexAiMultimodalEmbeddingModel(connectionDetails, options);

// Attach an image and a video to the document alongside its text content.
Media imageMedia = new Media(MimeTypeUtils.IMAGE_PNG, new ClassPathResource("/test.image.png"));
Media videoMedia = new Media(new MimeType("video", "mp4"), new ClassPathResource("/test.video.mp4"));

var document = new Document("Explain what you see in this video", List.of(imageMedia, videoMedia), Map.of());

DocumentEmbeddingRequest embeddingRequest = new DocumentEmbeddingRequest(List.of(document),
        EmbeddingOptions.EMPTY);

EmbeddingResponse embeddingResponse = embeddingModel.call(embeddingRequest);

// One embedding is returned per modality: text, image, and video.
assertThat(embeddingResponse.getResults()).hasSize(3);
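
Note that a multimodal document produces a separate embedding for each modality it contains, which is why the single document above returns three results: one for the text, one for the image, and one for the video.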