Weaviate

This section will walk you through setting up the Weaviate VectorStore to store document embeddings and perform similarity searches.

What is Weaviate?

Weaviate is an open-source vector database. It allows you to store data objects and vector embeddings from your favorite ML-models and scale seamlessly into billions of data objects. It provides tools to store document embeddings, content, and metadata and to search through those embeddings, including metadata filtering.

Prerequisites

  1. EmbeddingModel instance to compute the document embeddings. Several options are available:

    • Transformers Embedding - computes the embedding in your local environment. Follow the ONNX Transformers Embedding instructions.

    • OpenAI Embedding - uses the OpenAI embedding endpoint. You need to create an account at OpenAI Signup and generate the api-key token at API Keys.

    • You can also use the Azure OpenAI Embedding or the PostgresML Embedding Model.

  2. Weaviate cluster. You can set up a cluster locally in a Docker container or create a Weaviate Cloud Service. For the latter, you need to create a Weaviate account, set up a cluster, and get your access API key from the dashboard details.

On startup, the WeaviateVectorStore creates the required SpringAiWeaviate object schema if it’s not already provisioned.

Auto-configuration

Then add the WeaviateVectorStore boot starter dependency to your project:

<dependency>
	<groupId>org.springframework.ai</groupId>
	<artifactId>spring-ai-weaviate-store-spring-boot-starter</artifactId>
</dependency>

or to your Gradle build.gradle build file.

dependencies {
    implementation 'org.springframework.ai:spring-ai-weaviate-store-spring-boot-starter'
}

The vector store implementation can initialize the requisite schema for you, but you must opt-in by specifying the initializeSchema boolean in the appropriate constructor or by setting …​initialize-schema=true in the application.properties file.

this is a breaking change! In earlier versions of Spring AI, this schema initialization happened by default.

The Vector Store, also requires an EmbeddingModel instance to calculate embeddings for the documents. You can pick one of the available EmbeddingModel Implementations.

For example to use the OpenAI EmbeddingModel add the following dependency to your project:

<dependency>
	<groupId>org.springframework.ai</groupId>
	<artifactId>spring-ai-openai-spring-boot-starter</artifactId>
</dependency>

or to your Gradle build.gradle build file.

dependencies {
    implementation 'org.springframework.ai:spring-ai-openai-spring-boot-starter'
}
Refer to the Dependency Management section to add the Spring AI BOM to your build file. Refer to the Repositories section to add Milestone and/or Snapshot Repositories to your build file.

To connect to Weaviate and use the WeaviateVectorStore, you need to provide access details for your instance. A simple configuration can either be provided via Spring Boot’s application.properties,

spring.ai.vectorstore.weaviate.host=<host of your Weaviate instance>
spring.ai.vectorstore.weaviate.api-key=<your api key>
spring.ai.vectorstore.weaviate.scheme=http

# API key if needed, e.g. OpenAI
spring.ai.openai.api.key=<api-key>
Check the list of configuration parameters to learn about the default values and configuration options.

Now you can Auto-wire the Weaviate Vector Store in your application and use it

@Autowired VectorStore vectorStore;

// ...

List <Document> documents = List.of(
    new Document("Spring AI rocks!! Spring AI rocks!! Spring AI rocks!! Spring AI rocks!! Spring AI rocks!!", Map.of("meta1", "meta1")),
    new Document("The World is Big and Salvation Lurks Around the Corner"),
    new Document("You walk forward facing the past and you turn back toward the future.", Map.of("meta2", "meta2")));

// Add the documents
vectorStore.add(documents);

// Retrieve documents similar to a query
List<Document> results = this.vectorStore.similaritySearch(SearchRequest.query("Spring").withTopK(5));

Configuration properties

You can use the following properties in your Spring Boot configuration to customize the weaviate vector store.

Property Description Default value

spring.ai.vectorstore.weaviate.host

The host of the Weaviate server.

localhost:8080

spring.ai.vectorstore.weaviate.scheme

Connection schema.

http

spring.ai.vectorstore.weaviate.api-key

The API key to use for authentication with the Weaviate server.

-

spring.ai.vectorstore.weaviate.object-class

"SpringAiWeaviate"

spring.ai.vectorstore.weaviate.consistency-level

Desired tradeoff between consistency and speed

ConsistentLevel.ONE

spring.ai.vectorstore.weaviate.filter-field

spring.ai.vectorstore.weaviate.filter-field.<field-name>=<field-type>

-

spring.ai.vectorstore.weaviate.headers

-

spring.ai.vectorstore.weaviate.initialize-schema

Whether to initialize the required schema

false

Metadata filtering

You can leverage the generic, portable metadata filters with WeaviateVectorStore as well.

For example, you can use either the text expression language:

vectorStore.similaritySearch(
   SearchRequest
      .query("The World")
      .withTopK(TOP_K)
      .withSimilarityThreshold(SIMILARITY_THRESHOLD)
      .withFilterExpression("country in ['UK', 'NL'] && year >= 2020"));

or programmatically using the expression DSL:

FilterExpressionBuilder b = new FilterExpressionBuilder();

vectorStore.similaritySearch(
   SearchRequest
      .query("The World")
      .withTopK(TOP_K)
      .withSimilarityThreshold(SIMILARITY_THRESHOLD)
      .withFilterExpression(b.and(
         b.in("country", "UK", "NL"),
         b.gte("year", 2020)).build()));

The portable filter expressions get automatically converted into the proprietary Weaviate where filters. For example, the following portable filter expression:

country in ['UK', 'NL'] && year >= 2020

is converted into Weaviate GraphQL where filter expression:

operator:And
   operands:
      [{
         operator:Or
         operands:
            [{
               path:["meta_country"]
               operator:Equal
               valueText:"UK"
            },
            {
               path:["meta_country"]
               operator:Equal
               valueText:"NL"
            }]
      },
      {
         path:["meta_year"]
         operator:GreaterThanEqual
         valueNumber:2020
      }]

Manual Configuration

Instead of using the Spring Boot auto-configuration, you can manually configure the WeaviateVectorStore. For this you need to add the spring-ai-weaviate-store dependency to your project:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-weaviate-store</artifactId>
</dependency>

or to your Gradle build.gradle build file.

dependencies {
    implementation 'org.springframework.ai:spring-ai-weaviate-store'
}

To configure Weaviate in your application, you can create a WeaviateClient:

@Bean
public WeaviateClient weaviateClient() {
   try {
      return WeaviateAuthClient.apiKey(
            new Config(<YOUR SCHEME>, <YOUR HOST>, <YOUR HEADERS>),
            <YOUR API KEY>);
   }
   catch (AuthException e) {
      throw new IllegalArgumentException("WeaviateClient could not be created.", e);
   }
}

Integrate with OpenAI’s embeddings by adding the Spring Boot OpenAI starter to your project. This provides you with an implementation of the Embeddings client:

@Bean
public WeaviateVectorStore vectorStore(EmbeddingModel embeddingModel, WeaviateClient weaviateClient) {

   WeaviateVectorStoreConfig.Builder configBuilder = WeaviateVectorStore.WeaviateVectorStoreConfig.builder()
      .withObjectClass(<YOUR OBJECT CLASS>)
      .withConsistencyLevel(<YOUR CONSISTENCY LEVEL>);

   return new WeaviateVectorStore(configBuilder.build(), embeddingModel, weaviateClient);
}

Run Weaviate cluster in docker container

Start Weaviate in a docker container:

docker run -it --rm --name weaviate -e AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED=true -e PERSISTENCE_DATA_PATH=/var/lib/weaviate -e QUERY_DEFAULTS_LIMIT=25 -e DEFAULT_VECTORIZER_MODULE=none -e CLUSTER_HOSTNAME=node1 -p 8080:8080 semitechnologies/weaviate:1.22.4

Starts a Weaviate cluster at localhost:8080/v1 with scheme=http, host=localhost:8080, and apiKey="". Then follow the usage instructions.