Elasticsearch
This section walks you through setting up the Elasticsearch VectorStore to store document embeddings and perform similarity searches.
Elasticsearch is an open source search and analytics engine based on the Apache Lucene library.
Auto-configuration
Spring AI provides Spring Boot auto-configuration for the Elasticsearch Vector Store.
To enable it, add the following dependency to your project’s Maven pom.xml or Gradle build.gradle build file.
Maven:
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-elasticsearch-store-spring-boot-starter</artifactId>
</dependency>
Gradle:
dependencies {
    implementation 'org.springframework.ai:spring-ai-elasticsearch-store-spring-boot-starter'
}
For Spring Boot versions earlier than 3.3.0, you must explicitly add the elasticsearch-java dependency with a version later than 8.13.3. Otherwise, the older version that Spring Boot pulls in is incompatible with the queries the vector store performs.
Refer to the Dependency Management section to add the Spring AI BOM to your build file.
Refer to the Repositories section to add Milestone and/or Snapshot Repositories to your build file.
The vector store implementation can initialize the requisite schema for you, but you must opt in by specifying the initializeSchema boolean in the appropriate constructor or builder, or by setting …initialize-schema=true in the application.properties file.
Alternatively, you can opt out of the initialization and create the index manually using the Elasticsearch client. This is useful when the index needs advanced mappings or additional configuration.
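If you opt out, the following is a minimal sketch of creating the index yourself. To stay self-contained it swaps in the JDK's java.net.http API instead of the Elasticsearch client, and it assumes the vectors are stored in a dense_vector field named embedding. That field name, the ManualIndexSketch class, and its methods are illustrative assumptions; compare them with the mapping your Spring AI version actually expects before relying on them.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/**
 * Hypothetical helper that builds an index-creation request by hand.
 * Assumption: vectors live in a dense_vector field called "embedding".
 */
final class ManualIndexSketch {

    /** Returns a mapping body with a single dense_vector field (field name is an assumption). */
    static String mappingJson(int dimensions, String similarity) {
        return "{\"mappings\":{\"properties\":{\"embedding\":{"
                + "\"type\":\"dense_vector\","
                + "\"dims\":" + dimensions + ","
                + "\"index\":true,"
                + "\"similarity\":\"" + similarity + "\"}}}}";
    }

    /** Builds, but does not send, a PUT request that creates the index with that mapping. */
    static HttpRequest createIndexRequest(String baseUrl, String indexName, int dimensions, String similarity) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + indexName))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(mappingJson(dimensions, similarity)))
                .build();
    }
}
```

To actually create the index, you could pass the request to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()), adding an Authorization header first if your cluster has security enabled.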
This is a breaking change! In earlier versions of Spring AI, this schema initialization happened by default.
See the list of configuration parameters for the vector store to learn about the default values and configuration options. These properties can also be set by configuring the ElasticsearchVectorStoreOptions bean.
Additionally, you will need a configured EmbeddingModel bean. Refer to the EmbeddingModel section for more information.
Now you can auto-wire the ElasticsearchVectorStore as a vector store in your application.
@Autowired VectorStore vectorStore;

// ...

List<Document> documents = List.of(
        new Document("Spring AI rocks!! Spring AI rocks!! Spring AI rocks!! Spring AI rocks!! Spring AI rocks!!", Map.of("meta1", "meta1")),
        new Document("The World is Big and Salvation Lurks Around the Corner"),
        new Document("You walk forward facing the past and you turn back toward the future.", Map.of("meta2", "meta2")));

// Add the documents to Elasticsearch
vectorStore.add(documents);

// Retrieve documents similar to a query
List<Document> results = vectorStore.similaritySearch(SearchRequest.builder().query("Spring").topK(5).build());
Configuration Properties
To connect to Elasticsearch and use the ElasticsearchVectorStore, you need to provide access details for your instance. A simple configuration can be provided via Spring Boot’s application.yml:
spring:
  elasticsearch:
    uris: <elasticsearch instance URIs>
    username: <elasticsearch username>
    password: <elasticsearch password>
  ai:
    vectorstore:
      elasticsearch:
        initialize-schema: true
        index-name: custom-index
        dimensions: 1536
        similarity: cosine
        batching-strategy: TOKEN_COUNT # Optional: Controls how documents are batched for embedding
The Spring Boot properties starting with spring.elasticsearch.* are used to configure the Elasticsearch client:
| Property | Description | Default Value |
|---|---|---|
| | Connection timeout used when communicating with Elasticsearch. | |
| spring.elasticsearch.password | Password for authentication with Elasticsearch. | - |
| spring.elasticsearch.username | Username for authentication with Elasticsearch. | - |
| spring.elasticsearch.uris | Comma-separated list of the Elasticsearch instances to use. | |
| | Prefix added to the path of every request sent to Elasticsearch. | - |
| | Delay of a sniff execution scheduled after a failure. | |
| | Interval between consecutive ordinary sniff executions. | |
| | SSL bundle name. | - |
| | Whether to enable socket keep alive between client and Elasticsearch. | |
| | Socket timeout used when communicating with Elasticsearch. | |
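Properties starting with spring.ai.vectorstore.elasticsearch.* are used to configure the ElasticsearchVectorStore:
| Property | Description | Default Value |
|---|---|---|
| spring.ai.vectorstore.elasticsearch.initialize-schema | Whether to initialize the required schema | false |
| spring.ai.vectorstore.elasticsearch.index-name | The name of the index to store the vectors | spring-ai-document-index |
| spring.ai.vectorstore.elasticsearch.dimensions | The number of dimensions in the vector | 1536 |
| spring.ai.vectorstore.elasticsearch.similarity | The similarity function to use | cosine |
| spring.ai.vectorstore.elasticsearch.batching-strategy | Strategy for batching documents when calculating embeddings, for example TOKEN_COUNT | TOKEN_COUNT |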
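The TOKEN_COUNT strategy groups documents so that each embedding request stays within a token budget. The sketch below shows the general idea as simple greedy batching. It is not Spring AI's TokenCountBatchingStrategy: the four-characters-per-token estimate, the TokenBudgetBatcher class, and its methods are hypothetical stand-ins for a real tokenizer and implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of greedy batching by estimated token count. */
final class TokenBudgetBatcher {

    /** Crude estimate (assumption): about four characters per token, never less than one. */
    static int estimateTokens(String text) {
        return Math.max(1, (text.length() + 3) / 4);
    }

    /** Packs texts, in order, into batches whose estimated token total stays within maxTokens. */
    static List<List<String>> batch(List<String> texts, int maxTokens) {
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        int used = 0;
        for (String text : texts) {
            int tokens = estimateTokens(text);
            if (tokens > maxTokens) {
                // A single text that cannot fit in any batch is rejected outright.
                throw new IllegalArgumentException("Text exceeds the per-batch token budget");
            }
            if (used + tokens > maxTokens && !current.isEmpty()) {
                // Adding this text would exceed the budget: close the batch and start a new one.
                batches.add(current);
                current = new ArrayList<>();
                used = 0;
            }
            current.add(text);
            used += tokens;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```

For example, with a budget of 3 estimated tokens, the texts "aaaa", "aaaaaaaa", and "aaaa" (1, 2, and 1 estimated tokens) end up in two batches: the first two texts together, then the last one alone.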
The following similarity functions are available:
- cosine: Default, suitable for most use cases. Measures cosine similarity between vectors.
- l2_norm: Euclidean distance between vectors. Lower values indicate higher similarity.
- dot_product: Best performance for normalized vectors (e.g., OpenAI embeddings). Elasticsearch expects vectors to be unit length when this function is used.
More details about each are available in the Elasticsearch documentation on dense vectors.
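To make the three options concrete, here is a small sketch of the raw functions in plain Java. The VectorSimilarity class is hypothetical. The score conversions mirror the formulas published in the Elasticsearch dense_vector documentation (for example, a cosine score of (1 + cosine) / 2); treat them as an approximation to verify against that documentation rather than as a statement of exact scoring behavior.

```java
/** Hypothetical helpers mirroring the three dense_vector similarity options. */
final class VectorSimilarity {

    /** Sum of element-wise products. */
    static double dotProduct(float[] a, float[] b) {
        requireSameLength(a, b);
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += (double) a[i] * b[i];
        }
        return sum;
    }

    /** Dot product divided by the product of the vector lengths (NaN for a zero vector). */
    static double cosine(float[] a, float[] b) {
        return dotProduct(a, b) / (Math.sqrt(dotProduct(a, a)) * Math.sqrt(dotProduct(b, b)));
    }

    /** Euclidean distance: smaller means more similar. */
    static double l2Norm(float[] a, float[] b) {
        requireSameLength(a, b);
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Score conversions (assumed from the Elasticsearch docs): higher score = more similar.
    static double cosineScore(float[] a, float[] b) {
        return (1 + cosine(a, b)) / 2;
    }

    static double dotProductScore(float[] a, float[] b) {
        return (1 + dotProduct(a, b)) / 2;
    }

    static double l2NormScore(float[] a, float[] b) {
        double d = l2Norm(a, b);
        return 1 / (1 + d * d);
    }

    private static void requireSameLength(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same number of dimensions");
        }
    }
}
```

For unit-length vectors the dot product equals the cosine similarity, which is why dot_product suits normalized embeddings: it skips the two length computations that cosine needs.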
Metadata Filtering
You can leverage the generic, portable metadata filters with Elasticsearch as well.
For example, you can use either the text expression language:
vectorStore.similaritySearch(SearchRequest.builder()
        .query("The World")
        .topK(TOP_K)
        .similarityThreshold(SIMILARITY_THRESHOLD)
        .filterExpression("author in ['john', 'jill'] && 'article_type' == 'blog'").build());
or programmatically using the Filter.Expression DSL:
FilterExpressionBuilder b = new FilterExpressionBuilder();

vectorStore.similaritySearch(SearchRequest.builder()
        .query("The World")
        .topK(TOP_K)
        .similarityThreshold(SIMILARITY_THRESHOLD)
        .filterExpression(b.and(
                b.in("author", "john", "jill"),
                b.eq("article_type", "blog")).build()).build());
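Conceptually, the filter expression, similarityThreshold, and topK in the requests above narrow the results in three steps: documents whose metadata does not match are excluded, matches scoring below the threshold are dropped, and at most topK of the best remaining matches are returned. The brute-force sketch below illustrates only that selection logic on already-scored candidates. Elasticsearch evaluates the search inside the index rather than like this, and the SearchSelectionSketch class, Scored record, and select method are hypothetical.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical in-memory illustration of filter + threshold + top-K selection. */
final class SearchSelectionSketch {

    /** A candidate document with its metadata and a similarity score in [0, 1]. */
    record Scored(String id, Map<String, Object> metadata, double score) {}

    static List<Scored> select(List<Scored> candidates, Predicate<Map<String, Object>> filter,
                               double similarityThreshold, int topK) {
        return candidates.stream()
                .filter(c -> filter.test(c.metadata()))                        // metadata filter
                .filter(c -> c.score() >= similarityThreshold)                 // drop weak matches
                .sorted(Comparator.comparingDouble(Scored::score).reversed())  // best first
                .limit(topK)                                                   // keep at most topK
                .toList();
    }
}
```

In this sketch, a predicate such as m -> List.of("john", "jill").contains(m.get("author")) && "blog".equals(m.get("article_type")) plays the role of the example filter expression.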
These portable filter expressions are automatically converted into the proprietary Elasticsearch query string query syntax.
For example, this portable filter expression:
author in ['john', 'jill'] && 'article_type' == 'blog'
is converted into the proprietary Elasticsearch filter format:
(metadata.author:john OR jill) AND metadata.article_type:blog
Manual Configuration
Instead of using the Spring Boot auto-configuration, you can manually configure the Elasticsearch vector store. To do this, add the spring-ai-elasticsearch-store dependency to your project’s Maven pom.xml:
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-elasticsearch-store</artifactId>
</dependency>
or to your Gradle build.gradle build file:
dependencies {
    implementation 'org.springframework.ai:spring-ai-elasticsearch-store'
}
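Create an Elasticsearch RestClient bean. Read the Elasticsearch documentation for more in-depth information about configuring a custom RestClient.
import org.apache.http.Header;
import org.apache.http.HttpHost;
import org.apache.http.message.BasicHeader;
import org.elasticsearch.client.RestClient;
import org.springframework.context.annotation.Bean;

@Bean
public RestClient restClient() {
    return RestClient.builder(new HttpHost("<host>", 9200, "http"))
            .setDefaultHeaders(new Header[]{
                    new BasicHeader("Authorization", "Basic <encoded username and password>")
            })
            .build();
}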
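The <encoded username and password> placeholder stands for standard HTTP Basic credentials: the Base64 encoding of username:password (RFC 7617). A minimal sketch for computing the header value with the JDK follows; the BasicAuthHeader class is a hypothetical helper, not part of Spring AI or the Elasticsearch client.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper that builds the value of an HTTP Basic Authorization header. */
final class BasicAuthHeader {

    /** Returns "Basic " followed by the Base64 encoding of "username:password" (RFC 7617). */
    static String value(String username, String password) {
        String credentials = username + ":" + password;
        return "Basic " + Base64.getEncoder().encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }
}
```

You could then pass BasicAuthHeader.value(username, password) as the second argument of new BasicHeader("Authorization", ...), ideally reading the credentials from the environment rather than hard-coding them.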
Then create the ElasticsearchVectorStore bean using the builder pattern:
@Bean
public VectorStore vectorStore(RestClient restClient, EmbeddingModel embeddingModel) {
    ElasticsearchVectorStoreOptions options = new ElasticsearchVectorStoreOptions();
    options.setIndexName("custom-index");    // Optional: defaults to "spring-ai-document-index"
    options.setSimilarity(COSINE);           // Optional: defaults to COSINE
    options.setDimensions(1536);             // Optional: defaults to model dimensions or 1536

    return ElasticsearchVectorStore.builder(restClient, embeddingModel)
            .options(options)                // Optional: use custom options
            .initializeSchema(true)          // Optional: defaults to false
            .batchingStrategy(new TokenCountBatchingStrategy()) // Optional: defaults to TokenCountBatchingStrategy
            .build();
}

// This can be any EmbeddingModel implementation
@Bean
public EmbeddingModel embeddingModel() {
    return new OpenAiEmbeddingModel(new OpenAiApi(System.getenv("OPENAI_API_KEY")));
}