Elasticsearch
This section walks you through setting up the Elasticsearch VectorStore
to store document embeddings and perform similarity searches.
Elasticsearch is an open source search and analytics engine based on the Apache Lucene library.
Auto-configuration
Spring AI provides Spring Boot auto-configuration for the Elasticsearch Vector Store.
To enable it, add the following dependency to your project’s Maven pom.xml or Gradle build.gradle build file.
Maven:
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-elasticsearch-store-spring-boot-starter</artifactId>
</dependency>
Gradle:
dependencies {
    implementation 'org.springframework.ai:spring-ai-elasticsearch-store-spring-boot-starter'
}
For Spring Boot versions earlier than 3.3.0, you must explicitly add the elasticsearch-java dependency with a version greater than 8.13.3. Otherwise, the older version that gets pulled in is incompatible with the queries the vector store performs.
Refer to the Dependency Management section to add the Spring AI BOM to your build file.
Refer to the Repositories section to add Milestone and/or Snapshot Repositories to your build file.
The vector store implementation can initialize the requisite schema for you, but you must opt in by specifying the initializeSchema boolean in the appropriate constructor or by setting …initialize-schema=true in the application.properties file.
Alternatively, you can opt out of the initialization and create the index manually using the Elasticsearch client, which is useful if the index needs advanced mapping or additional configuration.
This is a breaking change! In earlier versions of Spring AI, schema initialization happened by default.
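If you opt out of schema initialization, the index has to exist before documents are added. The following is a minimal sketch, using only the JDK, of the kind of request that creates an index with a dense vector field. The class name, the helper names, and the field name `embedding` are assumptions made for illustration; check the mapping your version of the vector store expects before relying on it. The request is only built, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class ManualIndexSketch {

    // Builds a create-index body with a single dense_vector field.
    // The field name "embedding" is an assumption, not confirmed by this document.
    static String mappingJson(int dims, String similarity) {
        return String.format(
                "{\"mappings\": {\"properties\": {\"embedding\": {"
                        + "\"type\": \"dense_vector\", \"dims\": %d, "
                        + "\"index\": true, \"similarity\": \"%s\"}}}}",
                dims, similarity);
    }

    // Prepares (but does not send) the PUT /<index> request that creates the index.
    static HttpRequest createIndexRequest(String baseUri, String indexName, String body) {
        return HttpRequest.newBuilder(URI.create(baseUri + "/" + indexName))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) {
        String body = mappingJson(1536, "cosine");
        HttpRequest request =
                createIndexRequest("http://localhost:9200", "spring-ai-document-index", body);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(body);
    }
}
```

The dimension count and similarity function in the mapping should match the values configured for the vector store, since a mismatch between the index and the embedding model output would cause indexing to fail.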
See the list of configuration parameters for the vector store to learn about the default values and configuration options.
These properties can also be set by configuring the ElasticsearchVectorStoreOptions bean.
Additionally, you will need a configured EmbeddingModel bean. Refer to the EmbeddingModel section for more information.
Now you can auto-wire the ElasticsearchVectorStore as a vector store in your application.
@Autowired VectorStore vectorStore;
// ...
List<Document> documents = List.of(
    new Document("Spring AI rocks!! Spring AI rocks!! Spring AI rocks!! Spring AI rocks!! Spring AI rocks!!", Map.of("meta1", "meta1")),
    new Document("The World is Big and Salvation Lurks Around the Corner"),
    new Document("You walk forward facing the past and you turn back toward the future.", Map.of("meta2", "meta2")));
// Add the documents to Elasticsearch
vectorStore.add(documents);
// Retrieve documents similar to a query
List<Document> results = this.vectorStore.similaritySearch(SearchRequest.query("Spring").withTopK(5));
Configuration Properties
To connect to Elasticsearch and use the ElasticsearchVectorStore, you need to provide access details for your instance.
A simple configuration can be provided either via Spring Boot’s application.yml,
spring:
  elasticsearch:
    uris: <elasticsearch instance URIs>
    username: <elasticsearch username>
    password: <elasticsearch password>
  # API key if needed, e.g. OpenAI
  ai:
    openai:
      api-key: <api-key>
environment variables,
export SPRING_ELASTICSEARCH_URIS=<elasticsearch instance URIs>
export SPRING_ELASTICSEARCH_USERNAME=<elasticsearch username>
export SPRING_ELASTICSEARCH_PASSWORD=<elasticsearch password>
# API key if needed, e.g. OpenAI
export SPRING_AI_OPENAI_API_KEY=<api-key>
or a mix of both.
For example, you can store your password in an environment variable and keep the rest in the plain application.yml file.
If you choose to create a shell script for ease in future work, be sure to run it prior to starting your application by "sourcing" the file, i.e. source <your_script_name>.sh.
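The environment variable names used above follow from the property names. As a rough sketch of the rule Spring Boot documents for its relaxed binding (replace dots with underscores, remove dashes, convert to upper case), the mapping can be written as below. The class and method names are hypothetical, and the code only illustrates the naming rule; it is not Spring Boot's own implementation.

```java
import java.util.List;
import java.util.Locale;

public class EnvNameSketch {

    // Canonical property name -> environment variable name:
    // dots become underscores, dashes are removed, result is upper-cased.
    static String toEnvName(String property) {
        return property.replace(".", "_")
                .replace("-", "")
                .toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        List<String> properties = List.of(
                "spring.elasticsearch.uris",
                "spring.elasticsearch.username",
                "spring.elasticsearch.password");
        for (String property : properties) {
            System.out.println(property + " -> " + toEnvName(property));
        }
    }
}
```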
Spring Boot’s auto-configuration feature for the Elasticsearch RestClient will create a bean instance that will be used by the ElasticsearchVectorStore.
The Spring Boot properties starting with spring.elasticsearch.* are used to configure the Elasticsearch client:
| Property | Description | Default Value |
|---|---|---|
| | Connection timeout used when communicating with Elasticsearch. | |
| | Password for authentication with Elasticsearch. | - |
| | Username for authentication with Elasticsearch. | - |
| | Comma-separated list of the Elasticsearch instances to use. | |
| | Prefix added to the path of every request sent to Elasticsearch. | - |
| | Delay of a sniff execution scheduled after a failure. | |
| | Interval between consecutive ordinary sniff executions. | |
| | SSL bundle name. | - |
| | Whether to enable socket keep alive between client and Elasticsearch. | |
| | Socket timeout used when communicating with Elasticsearch. | |
Properties starting with the spring.ai.vectorstore.elasticsearch.* prefix are used to configure ElasticsearchVectorStore.
| Property | Description | Default Value |
|---|---|---|
| | Whether to initialize the required schema. | |
| | The name of the index to store the vectors. | spring-ai-document-index |
| | The number of dimensions in the vector. | 1536 |
| | The similarity function to use. | |
The following similarity functions are available:
- cosine
- l2_norm
- dot_product
See the Elasticsearch documentation on dense vectors for more details about each.
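To make the difference between the three options concrete, here is a small self-contained sketch of the underlying vector math. The class and method names are illustrative only. Elasticsearch computes these values internally and derives its own relevance score from them, so the numbers below are the raw measures, not the scores a search returns.

```java
public class SimilaritySketch {

    // dot_product: sum of element-wise products.
    static double dotProduct(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // cosine: dot product divided by the product of the vector lengths.
    static double cosine(double[] a, double[] b) {
        return dotProduct(a, b) / (Math.sqrt(dotProduct(a, a)) * Math.sqrt(dotProduct(b, b)));
    }

    // l2_norm: Euclidean distance between the two vectors (smaller means closer).
    static double l2Norm(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        double[] x = {1.0, 0.0};
        double[] y = {0.0, 1.0};
        System.out.println("dot_product = " + dotProduct(x, y));
        System.out.println("cosine      = " + cosine(x, y));
        System.out.println("l2_norm     = " + l2Norm(x, y));
    }
}
```

Cosine ignores vector length and compares direction only, while l2_norm and dot_product are both sensitive to magnitude. Which one fits best depends on how the embedding model was trained.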
Metadata Filtering
You can leverage the generic, portable metadata filters with Elasticsearch as well.
For example, you can use either the text expression language:
vectorStore.similaritySearch(SearchRequest.defaults()
.withQuery("The World")
.withTopK(TOP_K)
.withSimilarityThreshold(SIMILARITY_THRESHOLD)
.withFilterExpression("author in ['john', 'jill'] && 'article_type' == 'blog'"));
or programmatically using the Filter.Expression DSL:
FilterExpressionBuilder b = new FilterExpressionBuilder();
vectorStore.similaritySearch(SearchRequest.defaults()
    .withQuery("The World")
    .withTopK(TOP_K)
    .withSimilarityThreshold(SIMILARITY_THRESHOLD)
    .withFilterExpression(b.and(
        b.in("author", "john", "jill"),
        b.eq("article_type", "blog")).build()));
These portable filter expressions are automatically converted into the proprietary Elasticsearch query string query.
For example, this portable filter expression:
author in ['john', 'jill'] && 'article_type' == 'blog'
is converted into the proprietary Elasticsearch filter format:
(metadata.author:john OR jill) AND metadata.article_type:blog
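The shape of that conversion can be illustrated with a toy, JDK-only sketch. It is not the converter Spring AI uses; the class and the helpers `in`, `eq`, and `and` are hypothetical and only reproduce the output format shown above: keys are prefixed with `metadata.`, the values of an `in` are joined with `OR`, and `&&` becomes `AND`.

```java
import java.util.List;

public class FilterToQueryStringSketch {

    // key in [v1, v2, ...]  ->  (metadata.key:v1 OR v2 ...)
    static String in(String key, List<String> values) {
        return "(metadata." + key + ":" + String.join(" OR ", values) + ")";
    }

    // key == value  ->  metadata.key:value
    static String eq(String key, String value) {
        return "metadata." + key + ":" + value;
    }

    // left && right  ->  left AND right
    static String and(String left, String right) {
        return left + " AND " + right;
    }

    public static void main(String[] args) {
        String query = and(
                in("author", List.of("john", "jill")),
                eq("article_type", "blog"));
        // prints (metadata.author:john OR jill) AND metadata.article_type:blog
        System.out.println(query);
    }
}
```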
Manual Configuration
Instead of using the Spring Boot auto-configuration, you can manually configure the Elasticsearch vector store. To do so, add the spring-ai-elasticsearch-store dependency to your project’s Maven pom.xml file:
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-elasticsearch-store</artifactId>
</dependency>
or to your Gradle build.gradle build file:
dependencies {
    implementation 'org.springframework.ai:spring-ai-elasticsearch-store'
}
Create an Elasticsearch RestClient bean.
Read the Elasticsearch documentation for more in-depth information about the configuration of a custom RestClient.
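import org.apache.http.Header;
import org.apache.http.HttpHost;
import org.apache.http.message.BasicHeader;
import org.elasticsearch.client.RestClient;

@Bean
public RestClient restClient() {
    // Build a low-level client that sends a Basic Authorization header with every request.
    return RestClient.builder(new HttpHost("<host>", 9200, "http"))
        .setDefaultHeaders(new Header[]{
            new BasicHeader("Authorization", "Basic <encoded username and password>")
        })
        .build();
}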
and then create the ElasticsearchVectorStore bean:
@Bean
public ElasticsearchVectorStore vectorStore(EmbeddingModel embeddingModel, RestClient restClient) {
    return new ElasticsearchVectorStore(restClient, embeddingModel);
}
// This can be any EmbeddingModel implementation.
@Bean
public EmbeddingModel embeddingModel() {
    return new OpenAiEmbeddingModel(new OpenAiApi(System.getenv("OPENAI_API_KEY")));
}
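The Authorization header in the RestClient bean shown earlier in this section expects the Base64 encoding of username:password, as defined by the HTTP Basic scheme. A minimal sketch of producing that value with the JDK follows. The class name, the method name, and the credentials user and pass are placeholders chosen for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BasicAuthSketch {

    // Returns the full header value: "Basic " + base64("username:password").
    static String basicAuthHeader(String username, String password) {
        String credentials = username + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        // prints Basic dXNlcjpwYXNz
        System.out.println(basicAuthHeader("user", "pass"));
    }
}
```

Base64 is an encoding, not encryption, so credentials sent this way should travel over HTTPS and be read from the environment or a secret store rather than hard-coded.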