Class CassandraVectorStore

All Implemented Interfaces:
AutoCloseable, Consumer<List<Document>>, DocumentWriter, VectorStore

public class CassandraVectorStore extends AbstractObservationVectorStore implements AutoCloseable
The CassandraVectorStore manages and queries vector data in an Apache Cassandra database. It supports adding, deleting, and running similarity searches over documents, and it uses CQL to index and search vector data. Custom metadata fields in documents can be stored alongside the vector and content data.

This class requires a CassandraVectorStoreConfig configuration object for initialization, which includes settings such as connection details, index name, and column names. It also requires an EmbeddingModel to convert documents into embeddings before storing them.

A schema matching the configuration is created automatically if it does not exist, and missing columns and indexes in existing tables are also created automatically. Disable this with CassandraVectorStoreConfig#disallowSchemaChanges(). This class works either with brand-new tables that it creates for you or on top of existing Cassandra tables. The latter is appropriate when you want to keep data in place, create embeddings next to it, and run vector similarity searches in situ.

Instances of this class do not adapt to server-side schema changes. If you change the schema server-side, you need a new CassandraVectorStore instance.

When documents are added with AbstractObservationVectorStore.add(List<Document>), the store first calls the EmbeddingModel to create their embeddings, which is slow. To improve performance, configure CassandraVectorStoreConfig.Builder.withFixedThreadPoolExecutorSize(int) so that embeddings are created and documents are added concurrently. The default concurrency is 16 (CassandraVectorStoreConfig.DEFAULT_ADD_CONCURRENCY). Remote embedding models (transformers) will probably benefit from higher concurrency, while local ones may need lower concurrency. This concurrency limit does not need to be higher than the maximum number of parallel calls to AbstractObservationVectorStore.add(List<Document>) multiplied by the number of documents in each list.
This setting also serves as a protective throttle against your embedding model.
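The effect of a fixed-size executor on a slow embedding step can be illustrated with plain java.util.concurrent. This is a minimal sketch, not the CassandraVectorStore implementation: the class BoundedEmbeddingSketch and its methods embed, embedAll, and usefulConcurrency are hypothetical names, and embed() is a fake stand-in for an EmbeddingModel call that just sleeps. It shows that a pool of size N never runs more than N embedding calls at once (the throttling effect), and computes the upper bound beyond which a larger pool cannot help.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative sketch only; hypothetical names, not the Spring AI implementation. */
public class BoundedEmbeddingSketch {

    // Number of embed() calls currently running, and the highest value observed.
    private static final AtomicInteger inFlight = new AtomicInteger();
    private static final AtomicInteger peak = new AtomicInteger();

    /** Hypothetical stand-in for a slow EmbeddingModel call; returns a fake vector. */
    static float[] embed(String text) throws InterruptedException {
        int now = inFlight.incrementAndGet();
        peak.accumulateAndGet(now, Math::max);
        try {
            Thread.sleep(20); // simulate model latency
            return new float[] { text.length(), 0f, 1f };
        } finally {
            inFlight.decrementAndGet();
        }
    }

    /** Embeds every text on a pool of poolSize threads; returns the peak concurrency seen. */
    static int embedAll(List<String> texts, int poolSize) {
        peak.set(0);
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<float[]>> futures = new ArrayList<>();
            for (String t : texts) {
                futures.add(pool.submit(() -> embed(t)));
            }
            for (Future<float[]> f : futures) {
                try {
                    f.get(); // a real store would write the row once its embedding is ready
                } catch (InterruptedException | ExecutionException e) {
                    throw new IllegalStateException(e);
                }
            }
        } finally {
            pool.shutdown();
        }
        return peak.get();
    }

    /** A pool larger than (parallel add calls x documents per call) can never be fully used. */
    static int usefulConcurrency(int configured, int parallelAddCalls, int docsPerCall) {
        return Math.min(configured, parallelAddCalls * docsPerCall);
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 40; i++) {
            docs.add("document " + i);
        }
        int observed = embedAll(docs, 4);
        System.out.println("peak concurrent embed calls: " + observed); // never more than 4
        System.out.println("useful pool size: " + usefulConcurrency(16, 1, 10)); // 10
    }
}
```

With the default of 16 and a single caller adding 10 documents at a time, only 10 threads can ever be busy, which is why the limit does not need to exceed parallel calls multiplied by list size.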
Since:
1.0.0
Author:
Mick Semb Wever, Christian Tzolov, Thomas Vitale, Soby Chacko
See Also: