Elasticsearch

This section walks you through setting up the Elasticsearch VectorStore to store document embeddings and perform similarity searches.

Elasticsearch is an open source search and analytics engine based on the Apache Lucene library.

Prerequisites

A running Elasticsearch instance. The following options are available:

Auto-configuration

Spring AI provides Spring Boot auto-configuration for the Elasticsearch Vector Store. To enable it, add the following dependency to your project’s Maven pom.xml file:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-elasticsearch-store-spring-boot-starter</artifactId>
</dependency>

or to your Gradle build.gradle build file.

dependencies {
    implementation 'org.springframework.ai:spring-ai-elasticsearch-store-spring-boot-starter'
}
For Spring Boot versions earlier than 3.3.0, you must explicitly add the elasticsearch-java dependency at version 8.13.3 or later; otherwise, the older client version pulled in by default is incompatible with the queries performed:
<dependency>
    <groupId>co.elastic.clients</groupId>
    <artifactId>elasticsearch-java</artifactId>
    <version>8.13.3</version>
</dependency>
Refer to the Dependency Management section to add the Spring AI BOM to your build file.
Refer to the Repositories section to add Milestone and/or Snapshot Repositories to your build file.

The vector store implementation can initialize the requisite schema for you, but you must opt-in by specifying the initializeSchema boolean in the appropriate constructor or by setting …​initialize-schema=true in the application.properties file.

This is a breaking change! In earlier versions of Spring AI, schema initialization happened by default.

Please have a look at the list of configuration parameters for the vector store to learn about the default values and configuration options.

Additionally, you will need a configured EmbeddingModel bean. Refer to the EmbeddingModel section for more information.

Now you can auto-wire the ElasticsearchVectorStore as a vector store in your application.

@Autowired VectorStore vectorStore;

// ...

List<Document> documents = List.of(
    new Document("Spring AI rocks!! Spring AI rocks!! Spring AI rocks!! Spring AI rocks!! Spring AI rocks!!", Map.of("meta1", "meta1")),
    new Document("The World is Big and Salvation Lurks Around the Corner"),
    new Document("You walk forward facing the past and you turn back toward the future.", Map.of("meta2", "meta2")));

// Add the documents to Elasticsearch
vectorStore.add(documents);

// Retrieve documents similar to a query
List<Document> results = vectorStore.similaritySearch(SearchRequest.query("Spring").withTopK(5));

Configuration Properties

To connect to Elasticsearch and use the ElasticsearchVectorStore, you need to provide access details for your instance. A simple configuration can either be provided via Spring Boot’s application.yml,

spring:
  elasticsearch:
    uris: <elasticsearch instance URIs>
    username: <elasticsearch username>
    password: <elasticsearch password>
  # API key if needed, e.g. OpenAI
  ai:
    openai:
      api-key: <api-key>

environment variables,

export SPRING_ELASTICSEARCH_URIS=<elasticsearch instance URIs>
export SPRING_ELASTICSEARCH_USERNAME=<elasticsearch username>
export SPRING_ELASTICSEARCH_PASSWORD=<elasticsearch password>
# API key if needed, e.g. OpenAI
export SPRING_AI_OPENAI_API_KEY=<api-key>

or a mix of both. For example, you can store your password in an environment variable and keep the rest in the plain application.yml file.

If you choose to create a shell script for ease in future work, be sure to run it prior to starting your application by "sourcing" the file, i.e. source <your_script_name>.sh.

Spring Boot’s auto-configuration feature for the Elasticsearch RestClient will create a bean instance that will be used by the ElasticsearchVectorStore.

The Spring Boot properties starting with spring.elasticsearch.* are used to configure the Elasticsearch client:

Property | Description | Default Value
spring.elasticsearch.connection-timeout | Connection timeout used when communicating with Elasticsearch. | 1s
spring.elasticsearch.password | Password for authentication with Elasticsearch. | -
spring.elasticsearch.username | Username for authentication with Elasticsearch. | -
spring.elasticsearch.uris | Comma-separated list of the Elasticsearch instances to use. | localhost:9200
spring.elasticsearch.path-prefix | Prefix added to the path of every request sent to Elasticsearch. | -
spring.elasticsearch.restclient.sniffer.delay-after-failure | Delay of a sniff execution scheduled after a failure. | 1m
spring.elasticsearch.restclient.sniffer.interval | Interval between consecutive ordinary sniff executions. | 5m
spring.elasticsearch.restclient.ssl.bundle | SSL bundle name. | -
spring.elasticsearch.socket-keep-alive | Whether to enable socket keep alive between client and Elasticsearch. | false
spring.elasticsearch.socket-timeout | Socket timeout used when communicating with Elasticsearch. | 30s

Properties starting with the spring.ai.vectorstore.elasticsearch.* prefix are used to configure ElasticsearchVectorStore.

Property | Description | Default Value
spring.ai.vectorstore.elasticsearch.index-name | The name of the index to store the vectors. | spring-ai-document-index
spring.ai.vectorstore.elasticsearch.dimensions | The number of dimensions in the vector. | 1536
spring.ai.vectorstore.elasticsearch.similarity | The similarity function to use. | cosine
spring.ai.vectorstore.elasticsearch.initialize-schema | Whether to initialize the required schema. | false

The following similarity functions are available:

  • cosine

  • l2_norm

  • dot_product

More details about each function are available in the Elasticsearch Documentation on dense vectors.
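The three options correspond to standard vector measures. The following plain-Java sketch, which is independent of Spring AI and Elasticsearch, illustrates the underlying math only. The class and method names are hypothetical, and Elasticsearch derives its final relevance score from these raw values as described in its own documentation.

```java
// Illustrative only: hypothetical helper class, not part of Spring AI or Elasticsearch.
class VectorSimilarityDemo {

    // dot_product: sum of the element-wise products of the two vectors.
    static double dotProduct(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same number of dimensions");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += (double) a[i] * b[i];
        }
        return sum;
    }

    // l2_norm: Euclidean distance between the two vectors (smaller means closer).
    static double l2Norm(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same number of dimensions");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = (double) a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // cosine: dot product divided by the product of the vector lengths (1.0 means same direction).
    static double cosine(float[] a, float[] b) {
        double lengths = Math.sqrt(dotProduct(a, a)) * Math.sqrt(dotProduct(b, b));
        if (lengths == 0.0) {
            throw new IllegalArgumentException("Cosine similarity is undefined for zero-length vectors");
        }
        return dotProduct(a, b) / lengths;
    }

    public static void main(String[] args) {
        float[] query = {1f, 2f, 3f};
        float[] stored = {4f, 5f, 6f};
        System.out.println("dot_product = " + dotProduct(query, stored));
        System.out.println("l2_norm     = " + l2Norm(query, stored));
        System.out.println("cosine      = " + cosine(query, stored));
    }
}
```

Whichever function you choose, the configured dimensions value must match the length of the vectors produced by your EmbeddingModel.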

Metadata Filtering

You can leverage the generic, portable metadata filters with Elasticsearch as well.

For example, you can use either the text expression language:

vectorStore.similaritySearch(SearchRequest.defaults()
        .withQuery("The World")
        .withTopK(TOP_K)
        .withSimilarityThreshold(SIMILARITY_THRESHOLD)
        .withFilterExpression("author in ['john', 'jill'] && 'article_type' == 'blog'"));

or programmatically using the Filter.Expression DSL:

FilterExpressionBuilder b = new FilterExpressionBuilder();

vectorStore.similaritySearch(SearchRequest.defaults()
        .withQuery("The World")
        .withTopK(TOP_K)
        .withSimilarityThreshold(SIMILARITY_THRESHOLD)
        .withFilterExpression(b.and(
                b.in("author", "john", "jill"),
                b.eq("article_type", "blog")).build()));
Those (portable) filter expressions get automatically converted into the proprietary Elasticsearch Query string query.

For example, this portable filter expression:

author in ['john', 'jill'] && 'article_type' == 'blog'

is converted into the proprietary Elasticsearch filter format:

(metadata.author:john OR jill) AND metadata.article_type:blog
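To make the mapping concrete, here is a minimal plain-Java sketch that reproduces the string shown above from its parts. It is not Spring AI's converter: the class and its eq, in, and and helpers are hypothetical and cover only the three operators used in this example.

```java
import java.util.List;

// Illustrative only: a hypothetical helper that mimics the mapping shown above.
// It is not the converter that Spring AI uses internally.
class QueryStringSketch {

    // key == value  ->  metadata.key:value
    static String eq(String key, String value) {
        return "metadata." + key + ":" + value;
    }

    // key in [v1, v2, ...]  ->  (metadata.key:v1 OR v2 ...)
    static String in(String key, List<String> values) {
        return "(metadata." + key + ":" + String.join(" OR ", values) + ")";
    }

    // left && right  ->  left AND right
    static String and(String left, String right) {
        return left + " AND " + right;
    }

    public static void main(String[] args) {
        String query = and(in("author", List.of("john", "jill")), eq("article_type", "blog"));
        // Prints: (metadata.author:john OR jill) AND metadata.article_type:blog
        System.out.println(query);
    }
}
```

Note that every metadata key is addressed under the metadata. prefix in the generated query.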

Manual Configuration

Instead of using the Spring Boot auto-configuration, you can manually configure the Elasticsearch vector store. To do so, add the spring-ai-elasticsearch-store dependency to your project’s Maven pom.xml file:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-elasticsearch-store</artifactId>
</dependency>

or to your Gradle build.gradle build file.

dependencies {
    implementation 'org.springframework.ai:spring-ai-elasticsearch-store'
}

Create an Elasticsearch RestClient bean. Read the Elasticsearch Documentation for more in-depth information about the configuration of a custom RestClient.

@Bean
public RestClient restClient() {
    // Build a low-level client that sends a Basic authentication header with every request.
    return RestClient.builder(new HttpHost("<host>", 9200, "http"))
        .setDefaultHeaders(new Header[]{
            new BasicHeader("Authorization", "Basic <encoded username and password>")
        })
        .build();
}
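The encoded value in the Authorization header follows the standard HTTP Basic scheme: the username and password joined by a colon, then Base64-encoded. A minimal sketch using only the JDK is shown below; the class name, method name, and sample credentials are placeholders chosen for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Illustrative only: hypothetical helper for building an HTTP Basic Authorization header value.
class BasicAuthHeaderSketch {

    // Joins the credentials with a colon and Base64-encodes them, as the Basic scheme requires.
    static String basicAuthValue(String username, String password) {
        String credentials = username + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        // Placeholder credentials; read real ones from a secure source in practice.
        // Prints: Basic ZWxhc3RpYzpjaGFuZ2VtZQ==
        System.out.println(basicAuthValue("elastic", "changeme"));
    }
}
```

The returned string can be passed as the second argument of the BasicHeader in the restClient() bean. Avoid hard-coding real credentials in source code.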

and then create the ElasticsearchVectorStore bean:

@Bean
public ElasticsearchVectorStore vectorStore(EmbeddingModel embeddingModel, RestClient restClient) {
    return new ElasticsearchVectorStore(restClient, embeddingModel);
}

// This can be any EmbeddingModel implementation.
@Bean
public EmbeddingModel embeddingModel() {
    return new OpenAiEmbeddingModel(new OpenAiApi(System.getenv("OPENAI_API_KEY")));
}