Couchbase
This section will walk you through setting up the CouchbaseSearchVectorStore
to store document embeddings and perform similarity searches using Couchbase.
Couchbase is a distributed, JSON document database, with all the desired capabilities of a relational DBMS. Among other features, it allows users to query information using vector-based storage and retrieval.
Prerequisites
A running Couchbase instance. The following options are available:
* Docker
* Capella - Couchbase as a Service
* Install Couchbase locally
* Couchbase Kubernetes Operator
Auto-configuration
Spring AI provides Spring Boot auto-configuration for the Couchbase Vector Store.
To enable it, add the following dependency to your project’s Maven pom.xml
file:
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-couchbase-store-spring-boot-starter</artifactId>
</dependency>
or to your Gradle build.gradle
build file.
dependencies {
    implementation 'org.springframework.ai:spring-ai-couchbase-store-spring-boot-starter'
}
Couchbase vector search is only available starting with Couchbase Server version 7.6 and Java SDK version 3.6.0.
Refer to the Dependency Management section to add the Spring AI BOM to your build file.
Refer to the Repositories section to add Milestone and/or Snapshot Repositories to your build file.
The vector store implementation can initialize the configured bucket, scope, collection and search index for you, with default options, but you must opt in by specifying the initializeSchema
boolean in the appropriate constructor.
This is a breaking change! In earlier versions of Spring AI, this schema initialization happened by default.
Please have a look at the list of configuration parameters for the vector store to learn about the default values and configuration options.
Additionally, you will need a configured EmbeddingModel
bean. Refer to the EmbeddingModel section for more information.
Now you can auto-wire the CouchbaseSearchVectorStore
as a vector store in your application.
@Autowired VectorStore vectorStore;
// ...
List<Document> documents = List.of(
    new Document("Spring AI rocks!! Spring AI rocks!! Spring AI rocks!! Spring AI rocks!! Spring AI rocks!!", Map.of("meta1", "meta1")),
    new Document("The World is Big and Salvation Lurks Around the Corner"),
    new Document("You walk forward facing the past and you turn back toward the future.", Map.of("meta2", "meta2")));
// Add the documents to Couchbase
vectorStore.add(documents);
// Retrieve documents similar to a query
List<Document> results = vectorStore.similaritySearch(SearchRequest.builder().query("Spring").topK(5).build());
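To make the idea of a top-K similarity search more concrete, here is a minimal in-memory sketch in plain Java. It is not how Spring AI or Couchbase implement the search (Couchbase relies on a search index). The TopKSketch class, the Scored record and the toy two-dimensional vectors are hypothetical names and data introduced only for illustration.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: brute-force top-K ranking by dot product.
// A real vector store delegates this work to an index inside the database.
final class TopKSketch {

    // A document id paired with its similarity score for one query.
    record Scored(String id, double score) {}

    // Dot product of two vectors with the same number of dimensions.
    static double dot(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same number of dimensions");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Scores every stored vector against the query and keeps the k highest scores.
    static List<Scored> topK(Map<String, double[]> store, double[] query, int k) {
        return store.entrySet().stream()
                .map(e -> new Scored(e.getKey(), dot(e.getValue(), query)))
                .sorted(Comparator.comparingDouble(Scored::score).reversed())
                .limit(k)
                .toList();
    }

    public static void main(String[] args) {
        Map<String, double[]> store = Map.of(
                "doc-a", new double[] {1.0, 0.0},
                "doc-b", new double[] {0.0, 1.0},
                "doc-c", new double[] {0.5, 0.5});
        // Ask for the two stored vectors most similar to the query vector.
        for (Scored s : topK(store, new double[] {1.0, 0.0}, 2)) {
            System.out.println(s.id() + " " + s.score());
        }
    }
}
```

In the real store, the query text is first turned into a vector by the configured EmbeddingModel, and the ranking is performed by Couchbase rather than in application code.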
Configuration Properties
To connect to Couchbase and use the CouchbaseSearchVectorStore, you need to provide access details for your instance.
A simple configuration can be provided via Spring Boot’s application.properties,
spring.ai.openai.api-key=<key>
spring.couchbase.connection-string=<conn_string>
spring.couchbase.username=<username>
spring.couchbase.password=<password>
environment variables,
export SPRING_COUCHBASE_CONNECTION_STRING=<couchbase connection string like couchbase://localhost>
export SPRING_COUCHBASE_USERNAME=<couchbase username>
export SPRING_COUCHBASE_PASSWORD=<couchbase password>
# API key if needed, e.g. OpenAI
export SPRING_AI_OPENAI_API_KEY=<api-key>
or can be a mix of those.
For example, you can store your password as an environment variable but keep the rest in the plain application.yml file.
If you choose to create a shell script for ease in future work, be sure to run it prior to starting your application by "sourcing" the file, i.e. source <your_script_name>.sh.
Spring Boot’s auto-configuration for the Couchbase Cluster creates a bean instance that is used by the CouchbaseSearchVectorStore.
The Spring Boot properties starting with spring.couchbase.*
are used to configure the Couchbase cluster instance:
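Property | Description | Default Value |
---|---|---|
spring.couchbase.connection-string | A Couchbase connection string. | |
spring.couchbase.password | Password for authentication with Couchbase. | - |
spring.couchbase.username | Username for authentication with Couchbase. | - |
spring.couchbase.env.io.min-endpoints | Minimum number of sockets per node. | 1 |
spring.couchbase.env.io.max-endpoints | Maximum number of sockets per node. | 12 |
spring.couchbase.env.io.idle-http-connection-timeout | Length of time an HTTP connection may remain idle before it is closed and removed from the pool. | 1s |
spring.couchbase.env.ssl.enabled | Whether to enable SSL support. Enabled automatically if a "bundle" is provided unless specified otherwise. | - |
spring.couchbase.env.ssl.bundle | SSL bundle name. | - |
spring.couchbase.env.timeouts.connect | Bucket connect timeout. | 10s |
spring.couchbase.env.timeouts.disconnect | Bucket disconnect timeout. | 10s |
spring.couchbase.env.timeouts.key-value | Timeout for operations on a specific key-value. | 2500ms |
spring.couchbase.env.timeouts.key-value-durable | Timeout for operations on a specific key-value with a durability level. | 10s |
spring.couchbase.env.timeouts.query | SQL++ query operations timeout. | 75s |
spring.couchbase.env.timeouts.view | Regular and geospatial view operations timeout. | 75s |
spring.couchbase.env.timeouts.search | Timeout for the search service. | 75s |
spring.couchbase.env.timeouts.analytics | Timeout for the analytics service. | 75s |
spring.couchbase.env.timeouts.management | Timeout for the management operations. | 75s |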
Properties starting with the spring.ai.vectorstore.couchbase.* prefix are used to configure the CouchbaseSearchVectorStore.
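Property | Description | Default Value |
---|---|---|
spring.ai.vectorstore.couchbase.index-name | The name of the index to store the vectors. | spring-ai-document-index |
spring.ai.vectorstore.couchbase.bucket-name | The name of the Couchbase Bucket, parent of the scope. | default |
spring.ai.vectorstore.couchbase.scope-name | The name of the Couchbase scope, parent of the collection. Search queries will be executed in the scope context. | default |
spring.ai.vectorstore.couchbase.collection-name | The name of the Couchbase collection to store the Documents. | default |
spring.ai.vectorstore.couchbase.dimensions | The number of dimensions in the vector. | 1536 |
spring.ai.vectorstore.couchbase.similarity | The similarity function to use. | |
spring.ai.vectorstore.couchbase.optimization | The index optimization to use. | |
spring.ai.vectorstore.couchbase.initialize-schema | Whether to initialize the required schema. | |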
The following similarity functions are available:
- l2_norm
- dot_product
The following index optimizations are available:
- recall
- latency
See the Couchbase documentation on vector search for more details about each.
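As a rough intuition for the two similarity functions, the sketch below computes a dot product and a Euclidean (L2) distance on toy vectors in plain Java. The SimilaritySketch class and its method names are hypothetical, and the exact scoring formula Couchbase applies for l2_norm and dot_product may differ from this simplified version.

```java
// Hypothetical illustration of the two metrics on toy vectors.
// This is not Couchbase code; it only shows what each metric measures.
final class SimilaritySketch {

    // Dot product: larger values indicate more similar vectors
    // (most meaningful when the vectors are normalized).
    static double dotProduct(double[] a, double[] b) {
        requireSameLength(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Euclidean (L2) distance: smaller values indicate closer vectors.
    static double l2Distance(double[] a, double[] b) {
        requireSameLength(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    private static void requireSameLength(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same number of dimensions");
        }
    }

    public static void main(String[] args) {
        System.out.println(dotProduct(new double[] {1, 2, 3}, new double[] {4, 5, 6}));
        System.out.println(l2Distance(new double[] {0, 0}, new double[] {3, 4}));
    }
}
```

Note the opposite orientation of the two metrics: a higher dot product means a better match, while a lower L2 distance means a better match.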
Metadata Filtering
You can leverage the generic, portable metadata filters with the Couchbase store.
For example, you can use either the text expression language:
vectorStore.similaritySearch(
    SearchRequest.builder()
        .query("The World")
        .topK(TOP_K)
        .filterExpression("author in ['john', 'jill'] && article_type == 'blog'")
        .build());
or programmatically using the Filter.Expression
DSL:
FilterExpressionBuilder b = new FilterExpressionBuilder();
vectorStore.similaritySearch(SearchRequest.builder()
    .query("The World")
    .topK(TOP_K)
    .filterExpression(b.and(
        b.in("author", "john", "jill"),
        b.eq("article_type", "blog")).build())
    .build());
These filter expressions are converted into the equivalent Couchbase SQL++ filters.
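To illustrate which documents the example filter selects, the following plain-Java sketch evaluates the same condition against a metadata map. The FilterSketch class is hypothetical and does not use Spring AI or Couchbase. In the real store, the expression is translated to a Couchbase filter and evaluated by the database, not in application code.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of the semantics of:
//   author in ['john', 'jill'] && article_type == 'blog'
final class FilterSketch {

    private static final Set<String> AUTHORS = Set.of("john", "jill");

    // Returns true only when both conditions of the filter hold.
    static boolean matches(Map<String, Object> metadata) {
        Object author = metadata.get("author");
        // Guard against a missing key: immutable sets reject null lookups.
        boolean authorMatches = author != null && AUTHORS.contains(author);
        boolean typeMatches = "blog".equals(metadata.get("article_type"));
        return authorMatches && typeMatches;
    }

    public static void main(String[] args) {
        System.out.println(matches(Map.of("author", "john", "article_type", "blog")));
        System.out.println(matches(Map.of("author", "jane", "article_type", "blog")));
        System.out.println(matches(Map.of("article_type", "blog")));
    }
}
```

A document whose metadata lacks the author key, or whose article_type is not blog, is excluded from the results.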
Manual Configuration
Instead of using the Spring Boot auto-configuration, you can manually configure the Couchbase vector store. For this you need to add the spring-ai-couchbase-store dependency
to your project’s Maven pom.xml file:
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-couchbase-store</artifactId>
</dependency>
or to your Gradle build.gradle
build file.
dependencies {
    implementation 'org.springframework.ai:spring-ai-couchbase-store'
}
Create a Couchbase Cluster
bean.
Read the Couchbase Documentation for more in-depth information about the configuration of a custom Cluster instance.
@Bean
public Cluster cluster() {
    // Connect to the cluster and return it so Spring can register it as a bean.
    return Cluster.connect("couchbase://localhost",
            "username", "password");
}
and then create the CouchbaseSearchVectorStore
bean using the builder pattern:
@Bean
public VectorStore couchbaseSearchVectorStore(Cluster cluster,
        EmbeddingModel embeddingModel,
        Boolean initializeSchema) {
    return CouchbaseSearchVectorStore
        .builder(cluster, embeddingModel)
        .bucketName("test")
        .scopeName("test")
        .collectionName("test")
        .initializeSchema(initializeSchema)
        .build();
}
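// This can be any EmbeddingModel implementation.
// openaiKey is assumed to be a field of the enclosing configuration class that holds your OpenAI API key.
@Bean
public EmbeddingModel embeddingModel() {
    return new OpenAiEmbeddingModel(OpenAiApi.builder().apiKey(this.openaiKey).build());
}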