Elasticsearch

This section walks you through setting up the Elasticsearch VectorStore to store document embeddings and perform similarity searches.

Elasticsearch is an open source search and analytics engine based on the Apache Lucene library.

Prerequisites

A running Elasticsearch instance.

Auto-configuration

Spring AI provides Spring Boot auto-configuration for the Elasticsearch Vector Store. To enable it, add the following dependency to your project’s Maven pom.xml or Gradle build.gradle build files:

Maven:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-elasticsearch-store-spring-boot-starter</artifactId>
</dependency>

Gradle:

dependencies {
    implementation 'org.springframework.ai:spring-ai-elasticsearch-store-spring-boot-starter'
}

For Spring Boot versions earlier than 3.3.0, you must explicitly add the elasticsearch-java dependency at version 8.13.3 or later. Otherwise, the older client version pulled in by Spring Boot is incompatible with the queries the vector store performs:

Maven:

<dependency>
    <groupId>co.elastic.clients</groupId>
    <artifactId>elasticsearch-java</artifactId>
    <version>8.13.3</version>
</dependency>

Gradle:

dependencies {
    implementation 'co.elastic.clients:elasticsearch-java:8.13.3'
}

Refer to the Dependency Management section to add the Spring AI BOM to your build file.

Refer to the Repositories section to add Milestone and/or Snapshot Repositories to your build file.

The vector store implementation can initialize the requisite schema for you, but you must opt in by specifying the initializeSchema boolean in the appropriate constructor or by setting …​initialize-schema=true in the application.properties file. Alternatively, you can opt out of the initialization and create the index manually using the Elasticsearch client, which is useful if the index needs advanced mapping or additional configuration.

This is a breaking change! In earlier versions of Spring AI, schema initialization happened by default.

See the list of configuration parameters for the vector store to learn about the default values and configuration options. These properties can also be set by configuring the ElasticsearchVectorStoreOptions bean.

Additionally, you will need a configured EmbeddingModel bean. Refer to the EmbeddingModel section for more information.

Now you can auto-wire the ElasticsearchVectorStore as a vector store in your application.

@Autowired VectorStore vectorStore;

// ...

List<Document> documents = List.of(
    new Document("Spring AI rocks!! Spring AI rocks!! Spring AI rocks!! Spring AI rocks!! Spring AI rocks!!", Map.of("meta1", "meta1")),
    new Document("The World is Big and Salvation Lurks Around the Corner"),
    new Document("You walk forward facing the past and you turn back toward the future.", Map.of("meta2", "meta2")));

// Add the documents to Elasticsearch
vectorStore.add(documents);

// Retrieve documents similar to a query
List<Document> results = this.vectorStore.similaritySearch(SearchRequest.builder().query("Spring").topK(5).build());

Configuration Properties

To connect to Elasticsearch and use the ElasticsearchVectorStore, you need to provide access details for your instance. A simple configuration can be provided via Spring Boot’s application.yml:

spring:
  elasticsearch:
    uris: <elasticsearch instance URIs>
    username: <elasticsearch username>
    password: <elasticsearch password>
  ai:
    vectorstore:
      elasticsearch:
        initialize-schema: true
        index-name: custom-index
        dimensions: 1536
        similarity: cosine
        batching-strategy: TOKEN_COUNT # Optional: Controls how documents are batched for embedding

The Spring Boot properties starting with spring.elasticsearch.* are used to configure the Elasticsearch client:

Property | Description | Default Value
spring.elasticsearch.connection-timeout | Connection timeout used when communicating with Elasticsearch. | 1s
spring.elasticsearch.password | Password for authentication with Elasticsearch. | -
spring.elasticsearch.username | Username for authentication with Elasticsearch. | -
spring.elasticsearch.uris | Comma-separated list of the Elasticsearch instances to use. | localhost:9200
spring.elasticsearch.path-prefix | Prefix added to the path of every request sent to Elasticsearch. | -
spring.elasticsearch.restclient.sniffer.delay-after-failure | Delay of a sniff execution scheduled after a failure. | 1m
spring.elasticsearch.restclient.sniffer.interval | Interval between consecutive ordinary sniff executions. | 5m
spring.elasticsearch.restclient.ssl.bundle | SSL bundle name. | -
spring.elasticsearch.socket-keep-alive | Whether to enable socket keep alive between client and Elasticsearch. | false
spring.elasticsearch.socket-timeout | Socket timeout used when communicating with Elasticsearch. | 30s

Properties starting with spring.ai.vectorstore.elasticsearch.* are used to configure the ElasticsearchVectorStore:

Property | Description | Default Value
spring.ai.vectorstore.elasticsearch.initialize-schema | Whether to initialize the required schema. | false
spring.ai.vectorstore.elasticsearch.index-name | The name of the index to store the vectors. | spring-ai-document-index
spring.ai.vectorstore.elasticsearch.dimensions | The number of dimensions in the vector. | 1536
spring.ai.vectorstore.elasticsearch.similarity | The similarity function to use. | cosine
spring.ai.vectorstore.elasticsearch.batching-strategy | Strategy for batching documents when calculating embeddings. Options are TOKEN_COUNT or FIXED_SIZE. | TOKEN_COUNT
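
To make the difference between the two batching-strategy options concrete, here is a minimal plain-Java sketch of the two ideas. It is not Spring AI's implementation: the class and method names are hypothetical, and the word-count "tokenizer" is a crude stand-in for a real one. It only illustrates that one approach groups documents under a token budget while the other groups a fixed number of documents per batch.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: hypothetical helpers, not part of Spring AI. */
final class BatchingSketch {

    /** Crude stand-in for a tokenizer: counts whitespace-separated words. */
    static int estimateTokens(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    /**
     * Token-count idea: start a new batch whenever adding the next text
     * would push the running total above maxTokens. In this sketch a single
     * oversized text simply ends up in a batch of its own.
     */
    static List<List<String>> batchByTokenCount(List<String> texts, int maxTokens) {
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        int currentTokens = 0;
        for (String text : texts) {
            int tokens = estimateTokens(text);
            if (!current.isEmpty() && currentTokens + tokens > maxTokens) {
                batches.add(current);
                current = new ArrayList<>();
                currentTokens = 0;
            }
            current.add(text);
            currentTokens += tokens;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    /** Fixed-size idea: every batch holds at most batchSize texts. */
    static List<List<String>> batchByFixedSize(List<String> texts, int batchSize) {
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < texts.size(); i += batchSize) {
            int end = Math.min(i + batchSize, texts.size());
            batches.add(new ArrayList<>(texts.subList(i, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> texts = List.of("a b c", "d e", "f g h i", "j");
        System.out.println(batchByTokenCount(texts, 5));
        System.out.println(batchByFixedSize(texts, 3));
    }
}
```

A token budget tends to suit embedding models that limit input tokens per request, while a fixed batch size is simpler to reason about when documents are of similar length.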

The following similarity functions are available:

  • cosine - Default, suitable for most use cases. Measures cosine similarity between vectors.

  • l2_norm - Euclidean distance between vectors. Lower values indicate higher similarity.

  • dot_product - Best performance for normalized vectors (e.g., OpenAI embeddings).

More details about each function are available in the Elasticsearch documentation on dense vectors.
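
As a rough illustration of what the three functions measure, the following plain-Java sketch computes the raw quantities for two small vectors. The class and method names are hypothetical and the code is independent of Spring AI and Elasticsearch; how Elasticsearch turns these quantities into relevance scores is described in its own documentation.

```java
import java.util.Arrays;

/** Illustrative only: plain-Java versions of the three similarity measures. */
final class SimilaritySketch {

    /** Sum of element-wise products. */
    static double dotProduct(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /** Euclidean (L2) distance; smaller values mean more similar vectors. */
    static double l2Distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Cosine of the angle between the vectors; independent of their magnitudes. */
    static double cosine(double[] a, double[] b) {
        return dotProduct(a, b) / (Math.sqrt(dotProduct(a, a)) * Math.sqrt(dotProduct(b, b)));
    }

    /** Scales a vector to unit length. */
    static double[] normalize(double[] v) {
        double norm = Math.sqrt(dotProduct(v, v));
        return Arrays.stream(v).map(x -> x / norm).toArray();
    }

    public static void main(String[] args) {
        double[] a = {3.0, 4.0};
        double[] b = {4.0, 3.0};
        System.out.println("cosine      = " + cosine(a, b));
        System.out.println("l2 distance = " + l2Distance(a, b));
        // For unit-length vectors the dot product equals the cosine similarity.
        System.out.println("dot (unit)  = " + dotProduct(normalize(a), normalize(b)));
    }
}
```

The last line of main shows why dot_product is attractive for normalized embeddings: it yields the same ranking as cosine without computing the vector norms at query time.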

Metadata Filtering

You can leverage the generic, portable metadata filters with Elasticsearch as well.

For example, you can use either the text expression language:

vectorStore.similaritySearch(SearchRequest.builder()
        .query("The World")
        .topK(TOP_K)
        .similarityThreshold(SIMILARITY_THRESHOLD)
        .filterExpression("author in ['john', 'jill'] && 'article_type' == 'blog'").build());

or programmatically using the Filter.Expression DSL:

FilterExpressionBuilder b = new FilterExpressionBuilder();

vectorStore.similaritySearch(SearchRequest.builder()
        .query("The World")
        .topK(TOP_K)
        .similarityThreshold(SIMILARITY_THRESHOLD)
        .filterExpression(b.and(
                b.in("author", "john", "jill"),
                b.eq("article_type", "blog")).build()).build());

These portable filter expressions are automatically converted into the proprietary Elasticsearch query string query.

For example, this portable filter expression:

author in ['john', 'jill'] && 'article_type' == 'blog'

is converted into the proprietary Elasticsearch filter format:

(metadata.author:john OR jill) AND metadata.article_type:blog
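
The shape of this mapping can be sketched in a few lines of plain Java. This is a toy illustration and not Spring AI's converter: the class and its in, eq, and and helpers are hypothetical, handle only the three operators used above, and do no escaping of special characters.

```java
import java.util.List;

/** Illustrative only: a toy mapping from filter operators to query string syntax. */
final class FilterToQueryStringSketch {

    /** "key in [v1, v2]" becomes "(metadata.key:v1 OR v2)". */
    static String in(String key, List<String> values) {
        return "(metadata." + key + ":" + String.join(" OR ", values) + ")";
    }

    /** "key == value" becomes "metadata.key:value". */
    static String eq(String key, String value) {
        return "metadata." + key + ":" + value;
    }

    /** "left && right" becomes "left AND right". */
    static String and(String left, String right) {
        return left + " AND " + right;
    }

    public static void main(String[] args) {
        String query = and(in("author", List.of("john", "jill")), eq("article_type", "blog"));
        System.out.println(query);
    }
}
```

Note how every metadata key is prefixed with metadata., which reflects that document metadata is addressed under that field in the converted query shown above.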

Manual Configuration

Instead of using the Spring Boot auto-configuration, you can manually configure the Elasticsearch vector store. To do so, add the spring-ai-elasticsearch-store dependency to your project’s Maven pom.xml file:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-elasticsearch-store</artifactId>
</dependency>

or to your Gradle build.gradle build file:

dependencies {
    implementation 'org.springframework.ai:spring-ai-elasticsearch-store'
}

Create an Elasticsearch RestClient bean. Read the Elasticsearch Documentation for more in-depth information about the configuration of a custom RestClient.

@Bean
public RestClient restClient() {
    return RestClient.builder(new HttpHost("<host>", 9200, "http"))
        .setDefaultHeaders(new Header[]{
            new BasicHeader("Authorization", "Basic <encoded username and password>")
        })
        .build();
}
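
The "Basic <encoded username and password>" placeholder above stands for the standard HTTP Basic scheme: the Base64 encoding of username:password. A minimal sketch of producing that header value with the JDK follows; the class name, method name, and credentials are hypothetical placeholders.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Illustrative only: builds the value of an HTTP Basic Authorization header. */
final class BasicAuthHeaderSketch {

    /** Returns "Basic " followed by Base64("username:password"). */
    static String basicAuthValue(String username, String password) {
        String credentials = username + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        // Placeholder credentials; load real ones from configuration or a secret store.
        System.out.println(basicAuthValue("user", "pass"));
    }
}
```

The returned string could be passed as the value of the Authorization header in the RestClient bean instead of a hard-coded literal, which keeps credentials out of source code.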

Then create the ElasticsearchVectorStore bean using the builder pattern:

@Bean
public VectorStore vectorStore(RestClient restClient, EmbeddingModel embeddingModel) {
    ElasticsearchVectorStoreOptions options = new ElasticsearchVectorStoreOptions();
    options.setIndexName("custom-index");    // Optional: defaults to "spring-ai-document-index"
    options.setSimilarity(COSINE);           // Optional: defaults to COSINE
    options.setDimensions(1536);             // Optional: defaults to model dimensions or 1536

    return ElasticsearchVectorStore.builder(restClient, embeddingModel)
        .options(options)                     // Optional: use custom options
        .initializeSchema(true)               // Optional: defaults to false
        .batchingStrategy(new TokenCountBatchingStrategy()) // Optional: defaults to TokenCountBatchingStrategy
        .build();
}

// This can be any EmbeddingModel implementation
@Bean
public EmbeddingModel embeddingModel() {
    return new OpenAiEmbeddingModel(new OpenAiApi(System.getenv("OPENAI_API_KEY")));
}