Interface SemanticCache
- All Known Implementing Classes:
DefaultSemanticCache
public interface SemanticCache
Interface defining operations for a semantic cache implementation that stores and
retrieves chat responses based on semantic similarity of queries. This cache uses
vector embeddings to determine similarity between queries.
The semantic cache provides functionality to:
- Store chat responses with their associated queries
- Retrieve responses for semantically similar queries
- Support time-based expiration of cached entries
- Support context-based isolation (e.g., different system prompts)
- Clear the entire cache
Implementations should ensure thread-safety and proper resource management.
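The store and retrieve operations listed above are commonly combined in a cache-aside flow: look up the query first, and call the model only on a miss. The following is a minimal sketch of that flow, not the library's implementation. `MiniCache`, `ExactMatchCache`, and `CacheAsideSketch` are hypothetical names; responses are plain `String` values standing in for `ChatResponse`, and exact matching on normalized text stands in for embedding similarity.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical, simplified stand-in for SemanticCache (String responses only). */
interface MiniCache {
    void set(String query, String response);
    Optional<String> get(String query);
}

/** Toy implementation: exact match on normalized text instead of vector similarity. */
class ExactMatchCache implements MiniCache {
    private final Map<String, String> entries = new HashMap<>();

    private static String normalize(String query) {
        return query.trim().toLowerCase(Locale.ROOT);
    }

    @Override
    public void set(String query, String response) {
        entries.put(normalize(query), response);
    }

    @Override
    public Optional<String> get(String query) {
        return Optional.ofNullable(entries.get(normalize(query)));
    }
}

class CacheAsideSketch {
    /** Returns the cached response if present; otherwise calls the model and caches its answer. */
    static String answer(MiniCache cache, Function<String, String> model, String query) {
        Optional<String> cached = cache.get(query);
        if (cached.isPresent()) {
            return cached.get();
        }
        String fresh = model.apply(query);
        cache.set(query, fresh);
        return fresh;
    }

    public static void main(String[] args) {
        MiniCache cache = new ExactMatchCache();
        int[] modelCalls = {0};
        // Stand-in for a chat model call; counts how often it is invoked.
        Function<String, String> model = q -> {
            modelCalls[0]++;
            return "answer to: " + q;
        };
        System.out.println(answer(cache, model, "What is Redis?"));
        System.out.println(answer(cache, model, "  what is redis? "));
        System.out.println("model calls: " + modelCalls[0]);
    }
}
```

With a real semantic cache the second lookup would match on embedding similarity rather than on normalized text, so paraphrased queries could also hit the cache.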
- Author:
- Brian Sam-Bodden, Soby Chacko
-
Method Summary
| Modifier and Type | Method | Description |
| --- | --- | --- |
| void | clear() | Removes all entries from the cache. |
| Optional<ChatResponse> | get(String query) | Retrieves a cached response for a semantically similar query. |
| default Optional<ChatResponse> | get(String query, @Nullable String contextHash) | Retrieves a cached response for a semantically similar query, filtered by context. |
| VectorStore | getStore() | Returns the underlying vector store used by this cache implementation. |
| void | set(String query, ChatResponse response) | Stores a query and its corresponding chat response in the cache. |
| default void | set(String query, ChatResponse response, @Nullable String contextHash) | Stores a query and its corresponding chat response in the cache with an optional context identifier for isolation. |
| void | set(String query, ChatResponse response, Duration ttl) | Stores a query and response in the cache with a specified time-to-live duration. |
-
Method Details
-
set
void set(String query, ChatResponse response)
Stores a query and its corresponding chat response in the cache. Implementations should embed the query as a vector and store both the query embedding and the response.
- Parameters:
- query - The original query text to be cached
- response - The chat response associated with the query
-
set
default void set(String query, ChatResponse response, @Nullable String contextHash)
Stores a query and its corresponding chat response in the cache with an optional context identifier for isolation. The context hash ensures that cached responses are only returned for queries with matching context (e.g., same system prompt).
- Parameters:
- query - The original query text to be cached
- response - The chat response associated with the query
- contextHash - Optional hash identifier for context isolation (e.g., system prompt hash). If null, behaves the same as set(String, ChatResponse).
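The interface does not prescribe how a context hash is produced; any stable identifier should work. One possible approach, shown here as a sketch under that assumption, is to hash the system prompt with SHA-256 and use the hex digest. `ContextHashSketch` and `contextHash` are hypothetical names, not part of the library.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class ContextHashSketch {
    /** Returns the lowercase hex SHA-256 digest of the given context text (e.g. a system prompt). */
    static String contextHash(String context) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] bytes = digest.digest(context.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(bytes.length * 2);
            for (byte b : bytes) {
                // %02x formats each byte as two unsigned hex digits.
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Two different system prompts yield two different context identifiers.
        System.out.println(contextHash("You are a support agent."));
        System.out.println(contextHash("You are a legal assistant."));
    }
}
```

The resulting string can then be passed as the contextHash argument when storing an entry, and the same value must be supplied again on lookup for the entry to match.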
-
set
void set(String query, ChatResponse response, Duration ttl)
Stores a query and response in the cache with a specified time-to-live (TTL) duration. After the TTL expires, the entry should be automatically removed from the cache.
- Parameters:
- query - The original query text to be cached
- response - The chat response associated with the query
- ttl - The duration after which the cache entry should expire
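How expiry is enforced is left to the implementation; a backing store may expire keys natively. As an illustration only, the sketch below records an expiry instant per entry and evicts lazily on read. `TtlCacheSketch` is a hypothetical class using `String` responses and exact-match keys, and the injected time source is an assumption made so the behaviour can be demonstrated without waiting.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

class TtlCacheSketch {
    private static final class Entry {
        final String response;
        final Instant expiresAt;

        Entry(String response, Instant expiresAt) {
            this.response = response;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final Supplier<Instant> now; // injected time source

    TtlCacheSketch(Supplier<Instant> now) {
        this.now = now;
    }

    /** Stores the response together with the instant at which it expires. */
    void set(String query, String response, Duration ttl) {
        entries.put(query, new Entry(response, now.get().plus(ttl)));
    }

    /** Returns the response unless the entry is missing or its TTL has elapsed. */
    Optional<String> get(String query) {
        Entry entry = entries.get(query);
        if (entry == null) {
            return Optional.empty();
        }
        if (!now.get().isBefore(entry.expiresAt)) {
            // Lazy eviction; remove(key, value) leaves a concurrently refreshed entry in place.
            entries.remove(query, entry);
            return Optional.empty();
        }
        return Optional.of(entry.response);
    }

    int size() {
        return entries.size();
    }

    public static void main(String[] args) {
        Instant[] current = { Instant.parse("2024-01-01T00:00:00Z") };
        TtlCacheSketch cache = new TtlCacheSketch(() -> current[0]);
        cache.set("What is Redis?", "An in-memory data store.", Duration.ofMinutes(5));
        System.out.println(cache.get("What is Redis?").isPresent()); // true
        current[0] = current[0].plus(Duration.ofMinutes(5));
        System.out.println(cache.get("What is Redis?").isPresent()); // false
    }
}
```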
-
get
Optional<ChatResponse> get(String query)
Retrieves a cached response for a semantically similar query. The implementation should:
- Convert the input query to a vector embedding
- Search for similar query embeddings in the cache
- Return the response associated with the most similar query if it meets the similarity threshold
- Parameters:
- query - The query to find similar responses for
- Returns:
- An Optional containing the most similar cached response if one is found and it meets the similarity threshold; an empty Optional otherwise
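The search and threshold steps can be sketched with a brute-force scan using cosine similarity. This is an illustration, not the algorithm of any particular implementation: `SimilarityLookupSketch` is a hypothetical class, the embeddings are assumed to have been produced already by an embedding model, cosine similarity is one common choice of metric, and a real vector store would typically use an index rather than a linear scan.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

class SimilarityLookupSketch {
    private static final class Entry {
        final float[] embedding;
        final String response;

        Entry(float[] embedding, String response) {
            this.embedding = embedding;
            this.response = response;
        }
    }

    private final List<Entry> entries = new ArrayList<>();
    private final double threshold; // minimum cosine similarity for a hit

    SimilarityLookupSketch(double threshold) {
        this.threshold = threshold;
    }

    void add(float[] queryEmbedding, String response) {
        entries.add(new Entry(queryEmbedding, response));
    }

    /** Cosine similarity of two equal-length vectors; 0.0 if either has zero length. */
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns the response of the most similar entry, if its score reaches the threshold. */
    Optional<String> findMostSimilar(float[] queryEmbedding) {
        Entry best = null;
        double bestScore = threshold;
        for (Entry entry : entries) {
            double score = cosine(queryEmbedding, entry.embedding);
            if (score >= bestScore) {
                best = entry;
                bestScore = score;
            }
        }
        return best == null ? Optional.empty() : Optional.of(best.response);
    }

    public static void main(String[] args) {
        SimilarityLookupSketch cache = new SimilarityLookupSketch(0.9);
        cache.add(new float[] {1f, 0f}, "response A");
        cache.add(new float[] {0f, 1f}, "response B");
        System.out.println(cache.findMostSimilar(new float[] {0.9f, 0.1f})); // close to A
        System.out.println(cache.findMostSimilar(new float[] {1f, 1f}));     // below threshold
    }
}
```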
-
get
default Optional<ChatResponse> get(String query, @Nullable String contextHash)
Retrieves a cached response for a semantically similar query, filtered by context. Only returns responses that were stored with the same context hash, ensuring isolation between different contexts (e.g., different system prompts).
- Parameters:
- query - The query to find similar responses for
- contextHash - Optional hash identifier for context filtering. If null, behaves the same as get(String).
- Returns:
- An Optional containing the most similar cached response if one is found, matches the context, and meets the similarity threshold; an empty Optional otherwise
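Context filtering can be pictured as one more predicate applied to the similarity search results. The sketch below assumes the search has already produced scored candidates; `ContextFilterSketch` and `Candidate` are hypothetical names, and treating a null contextHash as "no filtering" is an assumption of this example, since the interface only states that null behaves like get(String).

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Objects;
import java.util.Optional;

class ContextFilterSketch {
    /** Hypothetical search hit: a cached response, its stored context hash, and its similarity score. */
    static final class Candidate {
        final String contextHash; // may be null if stored without a context
        final double score;
        final String response;

        Candidate(String contextHash, double score, String response) {
            this.contextHash = contextHash;
            this.score = score;
            this.response = response;
        }
    }

    /** Picks the best-scoring candidate that matches the context and reaches the threshold. */
    static Optional<String> select(List<Candidate> candidates, String contextHash, double threshold) {
        return candidates.stream()
                // Assumption: a null contextHash disables context filtering.
                .filter(c -> contextHash == null || Objects.equals(c.contextHash, contextHash))
                .filter(c -> c.score >= threshold)
                .max(Comparator.comparingDouble((Candidate c) -> c.score))
                .map(c -> c.response);
    }

    public static void main(String[] args) {
        List<Candidate> hits = Arrays.asList(
                new Candidate("ctxA", 0.95, "answer under prompt A"),
                new Candidate("ctxB", 0.99, "answer under prompt B"));
        // The higher-scoring hit is ignored because it belongs to another context.
        System.out.println(select(hits, "ctxA", 0.9));
    }
}
```

In practice a vector store would usually apply such a filter as part of the search itself rather than afterwards, so that a matching entry is not crowded out by higher-scoring entries from other contexts.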
-
clear
void clear()
Removes all entries from the cache. This operation should be atomic and thread-safe.
-
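One way to make a clear operation atomic with respect to concurrent readers and writers is to guard the entries with a read/write lock, so that no reader ever observes a partially cleared state. This is a sketch of that idea for an in-memory collection only; `AtomicClearSketch` is a hypothetical class, and an implementation backed by an external vector store would depend on the guarantees of that store instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class AtomicClearSketch {
    private final List<String> entries = new ArrayList<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    void add(String entry) {
        lock.writeLock().lock();
        try {
            entries.add(entry);
        } finally {
            lock.writeLock().unlock();
        }
    }

    int size() {
        lock.readLock().lock();
        try {
            return entries.size();
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Removes every entry while holding the write lock, excluding all readers and writers. */
    void clear() {
        lock.writeLock().lock();
        try {
            entries.clear();
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        AtomicClearSketch store = new AtomicClearSketch();
        store.add("first");
        store.add("second");
        store.clear();
        System.out.println(store.size()); // 0
    }
}
```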
getStore
VectorStore getStore()
Returns the underlying vector store used by this cache implementation. This allows access to lower-level vector operations if needed.
- Returns:
- The VectorStore instance used by this cache
-