Chroma

This section will walk you through setting up the Chroma VectorStore to store document embeddings and perform similarity searches.

What is Chroma?

Chroma is an open-source embedding database. It provides the tools to store document embeddings, content, and metadata, and to search through those embeddings, including with metadata filtering.
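Conceptually, an embedding store keeps each document's content, metadata, and embedding vector. It narrows the candidates by metadata and ranks whatever remains by vector similarity. The plain-Java sketch below illustrates that idea under simplifying assumptions. It is not how Chroma is implemented. The names ToyEmbeddingStore, Entry, add, and search are hypothetical. Cosine similarity is used only for simplicity, and the hand-written 3-dimensional vectors stand in for real model embeddings.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical, in-memory illustration of an embedding store; NOT Chroma's implementation.
class ToyEmbeddingStore {

    // One stored item: the document content, its metadata, and its embedding vector.
    record Entry(String content, Map<String, Object> metadata, double[] embedding) {}

    private final List<Entry> entries = new ArrayList<>();

    void add(String content, Map<String, Object> metadata, double[] embedding) {
        entries.add(new Entry(content, metadata, embedding));
    }

    // Cosine similarity: dot(a, b) / (|a| * |b|). Chosen here for simplicity.
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Keep entries whose metadata passes the filter, then return the topK most similar ones.
    List<Entry> search(double[] query, int topK, Predicate<Map<String, Object>> metadataFilter) {
        return entries.stream()
                .filter(e -> metadataFilter.test(e.metadata()))
                .sorted(Comparator.comparingDouble((Entry e) -> cosine(query, e.embedding())).reversed())
                .limit(topK)
                .toList();
    }

    public static void main(String[] args) {
        ToyEmbeddingStore store = new ToyEmbeddingStore();
        // Made-up vectors standing in for real embeddings.
        store.add("Spring AI rocks!!", Map.of("meta1", "meta1"), new double[] {0.9, 0.1, 0.0});
        store.add("The World is Big", Map.of(), new double[] {0.1, 0.9, 0.1});
        store.add("You walk forward facing the past", Map.of("meta2", "meta2"), new double[] {0.0, 0.2, 0.9});

        List<Entry> hits = store.search(new double[] {1.0, 0.0, 0.0}, 1, m -> true);
        System.out.println(hits.get(0).content()); // prints "Spring AI rocks!!"
    }
}
```

A real store replaces the linear scan with an approximate nearest-neighbor index, but the inputs (content, metadata, embedding) and the output (a ranked, filtered list) have the same shape.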

Prerequisites

  1. OpenAI Account: Create an account at OpenAI Signup and generate an API key on the API Keys page.

  2. Access to ChromaDB. The Run Chroma Locally appendix shows how to set up a DB locally with a Docker container.

On startup, the ChromaVectorStore creates the required collection if one is not provisioned already.
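This get-or-create behavior is idempotent: the first startup creates the collection and later startups reuse it. The sketch below models that with an in-memory map. CollectionRegistry and getOrCreate are hypothetical names used only for illustration; they are not part of the ChromaVectorStore or ChromaApi API.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of "create the collection only if it does not exist yet".
class CollectionRegistry {

    // Collection name -> collection id.
    private final Map<String, String> collections = new ConcurrentHashMap<>();

    // Returns the existing id, or atomically creates and remembers a new one.
    String getOrCreate(String name) {
        return collections.computeIfAbsent(name, n -> UUID.randomUUID().toString());
    }

    int size() {
        return collections.size();
    }

    public static void main(String[] args) {
        CollectionRegistry registry = new CollectionRegistry();
        String first = registry.getOrCreate("TestCollection");  // first startup: created
        String second = registry.getOrCreate("TestCollection"); // later startup: reused
        System.out.println(first.equals(second)); // prints "true"
        System.out.println(registry.size());      // prints "1"
    }
}
```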

Configuration

To set up ChromaVectorStore, you’ll need to provide your OpenAI API Key. Set it as an environment variable like so:

export SPRING_AI_OPENAI_API_KEY='Your_OpenAI_API_Key'
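To catch a missing key early, you can check the variable yourself before the application starts. The following is a plain-Java helper, not part of Spring AI; the class name OpenAiKeyCheck is hypothetical. It takes the environment as a map so the logic is easy to test.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical fail-fast check that the OpenAI key is present; not part of Spring AI.
class OpenAiKeyCheck {

    static final String ENV_NAME = "SPRING_AI_OPENAI_API_KEY";

    // Returns the key if it is set and not blank.
    static Optional<String> apiKey(Map<String, String> env) {
        return Optional.ofNullable(env.get(ENV_NAME)).filter(v -> !v.isBlank());
    }

    public static void main(String[] args) {
        if (apiKey(System.getenv()).isEmpty()) {
            System.err.println(ENV_NAME + " is not set.");
            System.exit(1);
        }
        System.out.println(ENV_NAME + " is set.");
    }
}
```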

Dependencies

Add these dependencies to your project:

  • OpenAI: Required for calculating embeddings.

<dependency>
 <groupId>org.springframework.ai</groupId>
 <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
</dependency>
  • Chroma VectorStore.

<dependency>
  <groupId>org.springframework.ai</groupId>
  <artifactId>spring-ai-chroma-store</artifactId>
</dependency>
Refer to the Dependency Management section to add the Spring AI BOM to your build file.

Sample Code

Create a RestTemplate instance with the appropriate ChromaDB authorization configuration and use it to create a ChromaApi instance:

@Bean
public RestTemplate restTemplate() {
   return new RestTemplate();
}

@Bean
public ChromaApi chromaApi(RestTemplate restTemplate) {
   String chromaUrl = "http://localhost:8000";
   ChromaApi chromaApi = new ChromaApi(chromaUrl, restTemplate);
   return chromaApi;
}

For a ChromaDB secured with Static API Token Authentication, use the ChromaApi#withKeyToken(<Your Token Credentials>) method to set your credentials. See ChromaWhereIT for an example.

For a ChromaDB secured with Basic Authentication, use the ChromaApi#withBasicAuth(<your user>, <your password>) method to set your credentials. See BasicAuthChromaWhereIT for an example.
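For reference, this is roughly what the two schemes put on the wire. Basic Authentication follows RFC 7617: the header value is "Basic " followed by the Base64 encoding of "user:password". For the static token, this sketch assumes the server expects the token in an "Authorization: Bearer <token>" header; Chroma can be configured to read a different header, so check your server settings. ChromaAuthHeaders, basic, and bearer are hypothetical names, not the ChromaApi implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helpers showing the Authorization header values for each scheme.
class ChromaAuthHeaders {

    // HTTP Basic (RFC 7617): "Basic " + base64("user:password").
    static String basic(String user, String password) {
        String credentials = user + ":" + password;
        return "Basic " + Base64.getEncoder().encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    // Static token, ASSUMING the server reads it from "Authorization: Bearer <token>".
    static String bearer(String token) {
        return "Bearer " + token;
    }

    public static void main(String[] args) {
        System.out.println(basic("Aladdin", "open sesame")); // prints "Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ=="
        System.out.println(bearer("my-token"));               // prints "Bearer my-token"
    }
}
```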

Integrate with OpenAI’s embeddings by adding the Spring Boot OpenAI starter to your project. This provides an implementation of EmbeddingClient, which you pass to the ChromaVectorStore:

@Bean
public VectorStore chromaVectorStore(EmbeddingClient embeddingClient, ChromaApi chromaApi) {
 return new ChromaVectorStore(embeddingClient, chromaApi, "TestCollection");
}

In your main code, create some documents:

List<Document> documents = List.of(
 new Document("Spring AI rocks!! Spring AI rocks!! Spring AI rocks!! Spring AI rocks!! Spring AI rocks!!", Map.of("meta1", "meta1")),
 new Document("The World is Big and Salvation Lurks Around the Corner"),
 new Document("You walk forward facing the past and you turn back toward the future.", Map.of("meta2", "meta2")));

Add the documents to your vector store:

vectorStore.add(documents);

And finally, retrieve documents similar to a query:

List<Document> results = vectorStore.similaritySearch("Spring");

If all goes well, you should retrieve the document containing the text "Spring AI rocks!!".
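A similarity search scores each stored document against the query embedding and keeps only the best matches. The withTopK and withSimilarityThreshold options used in the metadata filtering examples control that cut-off. The plain-Java sketch below shows one plausible reading of their semantics: results scoring below the threshold are dropped (treating the threshold as inclusive is an assumption), and at most topK of the rest are kept, highest score first. The Scored record, the select method, and the scores themselves are made up for illustration; this is not Spring AI's implementation.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration of how topK and a similarity threshold narrow a result list.
class TopKThreshold {

    record Scored(String content, double score) {}

    // Drop results below the threshold (inclusive, by assumption), then keep the topK best.
    static List<Scored> select(List<Scored> candidates, int topK, double threshold) {
        return candidates.stream()
                .filter(s -> s.score() >= threshold)
                .sorted(Comparator.comparingDouble(Scored::score).reversed())
                .limit(topK)
                .toList();
    }

    public static void main(String[] args) {
        // Made-up scores, for illustration only.
        List<Scored> candidates = List.of(
                new Scored("The World is Big and Salvation Lurks Around the Corner", 0.82),
                new Scored("Spring AI rocks!!", 0.35),
                new Scored("You walk forward facing the past", 0.61));
        // With topK = 2 and threshold = 0.5, the 0.35 result is dropped.
        select(candidates, 2, 0.5).forEach(s -> System.out.println(s.content()));
    }
}
```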

Metadata filtering

You can use the generic, portable metadata filters with ChromaVectorStore as well.

For example, you can use either the text expression language:

vectorStore.similaritySearch(
                    SearchRequest.defaults()
                            .withQuery("The World")
                            .withTopK(TOP_K)
                            .withSimilarityThreshold(SIMILARITY_THRESHOLD)
                            .withFilterExpression("author in ['john', 'jill'] && article_type == 'blog'"));

or programmatically using the Filter.Expression DSL:

FilterExpressionBuilder b = new FilterExpressionBuilder();

vectorStore.similaritySearch(SearchRequest.defaults()
                    .withQuery("The World")
                    .withTopK(TOP_K)
                    .withSimilarityThreshold(SIMILARITY_THRESHOLD)
                    .withFilterExpression(b.and(
                            b.in("author", "john", "jill"),
                            b.eq("article_type", "blog")).build()));
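To make the semantics of author in ['john', 'jill'] && article_type == 'blog' concrete, the sketch below evaluates the same condition against metadata maps in memory, using java.util.function.Predicate. This is only an illustration: Chroma evaluates filters on the server, and MetadataFilterDemo, eq, and in are hypothetical names, not the FilterExpressionBuilder API.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Predicate;

// Hypothetical in-memory predicates mirroring the portable filter
// "author in ['john', 'jill'] && article_type == 'blog'". Chroma evaluates filters server-side.
class MetadataFilterDemo {

    // key == value
    static Predicate<Map<String, Object>> eq(String key, Object value) {
        return meta -> Objects.equals(meta.get(key), value);
    }

    // key in [values...]; a missing key never matches.
    static Predicate<Map<String, Object>> in(String key, Object... values) {
        List<Object> allowed = List.of(values);
        return meta -> {
            Object actual = meta.get(key);
            return actual != null && allowed.contains(actual);
        };
    }

    public static void main(String[] args) {
        Predicate<Map<String, Object>> filter = in("author", "john", "jill").and(eq("article_type", "blog"));

        System.out.println(filter.test(Map.of("author", "jill", "article_type", "blog")));  // true
        System.out.println(filter.test(Map.of("author", "bob", "article_type", "blog")));   // false
        System.out.println(filter.test(Map.of("author", "john", "article_type", "paper"))); // false
    }
}
```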
Portable filter expressions such as these are automatically converted into Chroma's proprietary where filter expressions.

For example, this portable filter expression:

author in ['john', 'jill'] && article_type == 'blog'

is converted into the proprietary Chroma format:

{"$and":[
	{"author": {"$in": ["john", "jill"]}},
	{"article_type":{"$eq":"blog"}}]
}
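Spring AI performs this conversion internally. The sketch below only illustrates the shape of the mapping with a hypothetical three-node expression tree (Eq, In, And) and a toWhere method that emits the compact form of the JSON above. None of these names are Spring AI types, and the string quoting is deliberately minimal (it escapes only backslashes and double quotes). It requires Java 17 or later for sealed interfaces.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical mini expression tree and converter to a Chroma-style "where" JSON string.
// Illustrates the mapping only; Spring AI ships its own converter.
class ChromaWhereSketch {

    sealed interface Expr permits Eq, In, And {}
    record Eq(String key, String value) implements Expr {}
    record In(String key, List<String> values) implements Expr {}
    record And(Expr left, Expr right) implements Expr {}

    // Minimal JSON string quoting: escapes backslash and double quote only.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    static String toWhere(Expr e) {
        if (e instanceof Eq eq) {
            return "{" + quote(eq.key()) + ":{\"$eq\":" + quote(eq.value()) + "}}";
        }
        if (e instanceof In in) {
            String values = in.values().stream().map(ChromaWhereSketch::quote).collect(Collectors.joining(","));
            return "{" + quote(in.key()) + ":{\"$in\":[" + values + "]}}";
        }
        And and = (And) e;
        return "{\"$and\":[" + toWhere(and.left()) + "," + toWhere(and.right()) + "]}";
    }

    public static void main(String[] args) {
        Expr expr = new And(new In("author", List.of("john", "jill")), new Eq("article_type", "blog"));
        System.out.println(toWhere(expr));
        // prints {"$and":[{"author":{"$in":["john","jill"]}},{"article_type":{"$eq":"blog"}}]}
    }
}
```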

Run Chroma Locally

docker run -it --rm --name chroma -p 8000:8000 ghcr.io/chroma-core/chroma:0.4.15

This starts a Chroma store at localhost:8000/api/v1.
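To confirm the container is reachable before starting your application, you can send a plain HTTP request with java.net.http.HttpClient. This is a hedged smoke test, not part of Spring AI. It assumes the 0.4.x REST API exposes GET /api/v1/heartbeat; ChromaHeartbeat and heartbeatUri are hypothetical names. If nothing is listening, client.send throws a connection error.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical smoke test for a local Chroma container.
// ASSUMES the 0.4.x REST API exposes GET /api/v1/heartbeat.
class ChromaHeartbeat {

    // Builds the heartbeat URL from a base URL such as "http://localhost:8000".
    static URI heartbeatUri(String baseUrl) {
        String trimmed = baseUrl.endsWith("/") ? baseUrl.substring(0, baseUrl.length() - 1) : baseUrl;
        return URI.create(trimmed + "/api/v1/heartbeat");
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder().connectTimeout(Duration.ofSeconds(2)).build();
        HttpRequest request = HttpRequest.newBuilder(heartbeatUri("http://localhost:8000")).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```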