Retrieval Augmented Generation

Retrieval Augmented Generation (RAG) is a technique for overcoming the limitations of large language models, which struggle with long-form content, factual accuracy, and context awareness.

Spring AI supports RAG through a modular architecture: you can build custom RAG flows yourself or use the out-of-the-box flows provided by the Advisor API.

Learn more about Retrieval Augmented Generation in the concepts section.


Spring AI provides out-of-the-box support for common RAG flows using the Advisor API.


A vector database stores data that the AI model is unaware of. When a user question is sent to the AI model, a QuestionAnswerAdvisor queries the vector database for documents related to the user question.

The documents returned by the vector database are appended to the user text, giving the AI model the context it needs to generate a response.

Assuming you have already loaded data into a VectorStore, you can perform Retrieval Augmented Generation (RAG) by providing an instance of QuestionAnswerAdvisor to the ChatClient.

ChatResponse response = ChatClient.builder(chatModel)
        .build().prompt()
        .advisors(new QuestionAnswerAdvisor(vectorStore))
        .user(userText).call().chatResponse(); // userText holds the user's question

In this example, the QuestionAnswerAdvisor performs a similarity search over all documents in the vector database. To restrict the types of documents that are searched, the SearchRequest accepts an SQL-like filter expression that is portable across all VectorStores.

This filter expression can be configured when creating the QuestionAnswerAdvisor, in which case it applies to all ChatClient requests, or it can be provided at runtime for each request.

Here is how to create an instance of QuestionAnswerAdvisor with a similarity threshold of 0.8 that returns the top 6 results.

var qaAdvisor = new QuestionAnswerAdvisor(this.vectorStore,
        SearchRequest.builder().similarityThreshold(0.8d).topK(6).build());
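To make the two settings concrete, the following minimal sketch shows what a similarity threshold and a top-k limit mean for a search over toy in-memory embeddings: score every document by cosine similarity, drop scores below the threshold, and keep the best k. The class and record names (`ToySimilaritySearch`, `Doc`, `Hit`) are hypothetical, and real vector stores compute and normalize scores in store-specific ways, so this illustrates the semantics rather than Spring AI's implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical in-memory stand-in for a vector store (illustration only, not Spring AI code).
class ToySimilaritySearch {

    record Doc(String id, double[] embedding) {}

    record Hit(String id, double score) {}

    // Cosine similarity of two equal-length vectors.
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Keep hits whose similarity is >= threshold, best first, at most topK of them.
    static List<Hit> search(List<Doc> docs, double[] query, double threshold, int topK) {
        List<Hit> hits = new ArrayList<>();
        for (Doc doc : docs) {
            double score = cosine(doc.embedding(), query);
            if (score >= threshold) {
                hits.add(new Hit(doc.id(), score));
            }
        }
        hits.sort(Comparator.comparingDouble(Hit::score).reversed());
        return List.copyOf(hits.subList(0, Math.min(topK, hits.size())));
    }
}
```

With the advisor above, the corresponding toy call would be `search(docs, queryEmbedding, 0.8, 6)`, where `queryEmbedding` is a hypothetical embedding of the user question.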

Dynamic Filter Expressions

Update the SearchRequest filter expression at runtime using the FILTER_EXPRESSION advisor context parameter:

ChatClient chatClient = ChatClient.builder(chatModel)
    .defaultAdvisors(new QuestionAnswerAdvisor(vectorStore, SearchRequest.builder().build()))
    .build();

// Update filter expression at runtime
String content = chatClient.prompt()
    .user("Please answer my question XYZ")
    .advisors(a -> a.param(QuestionAnswerAdvisor.FILTER_EXPRESSION, "type == 'Spring'"))
    .call()
    .content();

The FILTER_EXPRESSION parameter allows you to dynamically filter the search results based on the provided expression.
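Conceptually, a filter expression such as `type == 'Spring'` is a predicate over each document's metadata that restricts which documents the similarity search may return; each vector store translates the portable expression into its own native query language. The sketch below shows only that predicate idea for the single-equality case, using a hypothetical `EqualityFilter` class and plain metadata maps. It is not Spring AI's expression parser, which supports a richer grammar (for example, combining conditions with `&&` and `||`).

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, minimal filter: supports only expressions of the form key == 'value'.
class EqualityFilter {

    private static final Pattern EQUALS =
            Pattern.compile("\\s*(\\w+)\\s*==\\s*'([^']*)'\\s*");

    // Turn "type == 'Spring'" into a predicate over document metadata.
    static Predicate<Map<String, Object>> parse(String expression) {
        Matcher m = EQUALS.matcher(expression);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unsupported expression: " + expression);
        }
        String key = m.group(1);
        String value = m.group(2);
        return metadata -> value.equals(metadata.get(key));
    }

    // Keep only the documents (represented by their metadata) that satisfy the filter.
    static List<Map<String, Object>> apply(List<Map<String, Object>> docs, String expression) {
        Predicate<Map<String, Object>> filter = parse(expression);
        return docs.stream().filter(filter).toList();
    }
}
```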

RetrievalAugmentationAdvisor (Incubating)

Spring AI includes a library of RAG modules that you can use to build your own RAG flows. The RetrievalAugmentationAdvisor is an experimental Advisor providing an out-of-the-box implementation for the most common RAG flows, based on a modular architecture.

The RetrievalAugmentationAdvisor is an experimental feature and is subject to change in future releases.

Sequential RAG Flows

Naive RAG
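Advisor retrievalAugmentationAdvisor = RetrievalAugmentationAdvisor.builder()
        .documentRetriever(VectorStoreDocumentRetriever.builder()
                .similarityThreshold(0.50).vectorStore(vectorStore).build())
        .build();

String answer = chatClient.prompt().advisors(retrievalAugmentationAdvisor).user(question).call().content();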

By default, the RetrievalAugmentationAdvisor does not allow the retrieved context to be empty. When that happens, it instructs the model not to answer the user query. You can allow empty context as follows.

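Advisor retrievalAugmentationAdvisor = RetrievalAugmentationAdvisor.builder()
        .documentRetriever(VectorStoreDocumentRetriever.builder()
                .similarityThreshold(0.50).vectorStore(vectorStore).build())
        .queryAugmenter(ContextualQueryAugmenter.builder().allowEmptyContext(true).build())
        .build();

String answer = chatClient.prompt().advisors(retrievalAugmentationAdvisor).user(question).call().content();

Advanced RAG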
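Advisor retrievalAugmentationAdvisor = RetrievalAugmentationAdvisor.builder()
        .queryTransformers(RewriteQueryTransformer.builder()
                .chatClientBuilder(chatClientBuilder.build().mutate()).build())
        .documentRetriever(VectorStoreDocumentRetriever.builder()
                .similarityThreshold(0.50).vectorStore(vectorStore).build())
        .build();

String answer = chatClient.prompt().advisors(retrievalAugmentationAdvisor).user(question).call().content();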


Spring AI implements a Modular RAG architecture inspired by the concept of modularity detailed in the paper "Modular RAG: Transforming RAG Systems into LEGO-like Reconfigurable Frameworks".
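As a rough mental model of how these modules fit together, the sketch below chains the stages described in the rest of this section: query transformation and expansion (pre-retrieval), per-query search and joining (retrieval), document post-processing (post-retrieval), and query augmentation (generation). Every stage is a plain `java.util.function` type and all names are hypothetical; this is not the API of Spring AI's `RetrievalAugmentationAdvisor` or its module interfaces, and augmenting the original (untransformed) user query is an assumption of the sketch.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;
import java.util.function.UnaryOperator;

// Hypothetical wiring of the RAG stages; queries and documents are plain strings here.
record ToyRagPipeline(
        List<UnaryOperator<String>> queryTransformers,                     // pre-retrieval
        Function<String, List<String>> queryExpander,                     // pre-retrieval
        Function<String, List<String>> documentRetriever,                 // retrieval
        Function<Map<String, List<String>>, List<String>> documentJoiner, // retrieval
        UnaryOperator<List<String>> documentPostProcessor,                // post-retrieval
        BiFunction<String, List<String>, String> queryAugmenter) {       // generation

    String run(String userQuery) {
        // Pre-retrieval: apply each transformer in order, then expand into several queries.
        String query = userQuery;
        for (UnaryOperator<String> transformer : queryTransformers) {
            query = transformer.apply(query);
        }
        // Retrieval: search once per expanded query, then join all results.
        Map<String, List<String>> documentsPerQuery = new LinkedHashMap<>();
        for (String expanded : queryExpander.apply(query)) {
            documentsPerQuery.put(expanded, documentRetriever.apply(expanded));
        }
        List<String> documents = documentJoiner.apply(documentsPerQuery);
        // Post-retrieval, then generation: the user's query is augmented with the context.
        return queryAugmenter.apply(userQuery, documentPostProcessor.apply(documents));
    }
}
```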


Modular RAG is an experimental feature and is subject to change in future releases.


Pre-Retrieval modules are responsible for processing the user query to achieve the best possible retrieval results.

Query Transformation

A component for transforming the input query to make it more effective for retrieval tasks, addressing challenges such as poorly formed queries, ambiguous terms, complex vocabulary, or unsupported languages.


A CompressionQueryTransformer uses a large language model to compress a conversation history and a follow-up query into a standalone query that captures the essence of the conversation.

This transformer is useful when the conversation history is long and the follow-up query is related to the conversation context.

Query query = Query.builder()
        .text("And what is its second largest city?")
        .history(new UserMessage("What is the capital of Denmark?"),
                new AssistantMessage("Copenhagen is the capital of Denmark."))
        .build();

QueryTransformer queryTransformer = CompressionQueryTransformer.builder()
        .chatClientBuilder(chatClientBuilder).build();

Query transformedQuery = queryTransformer.transform(query);

The prompt used by this component can be customized via the promptTemplate() method available in the builder.
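What such a transformer sends to the model is, in essence, the conversation history and the follow-up query rendered into one prompt that asks for a standalone query. The sketch below shows that idea with a hypothetical `Turn` record and a `Function<String, String>` standing in for the chat model; the prompt wording is invented for illustration and is not the template Spring AI ships.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical sketch of history compression; not Spring AI's CompressionQueryTransformer.
class ToyCompression {

    record Turn(String role, String text) {}

    // Render the history and the follow-up query into a single instruction for the model.
    static String buildPrompt(List<Turn> history, String followUp) {
        String renderedHistory = history.stream()
                .map(turn -> turn.role() + ": " + turn.text())
                .collect(Collectors.joining("\n"));
        return """
                Rewrite the follow-up query as a standalone query, using the conversation for context.

                Conversation:
                %s

                Follow-up query: %s

                Standalone query:""".formatted(renderedHistory, followUp);
    }

    // "model" stands in for a chat model call that returns the generated text.
    static String compress(List<Turn> history, String followUp, Function<String, String> model) {
        return model.apply(buildPrompt(history, followUp)).trim();
    }
}
```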


A RewriteQueryTransformer uses a large language model to rewrite a user query to provide better results when querying a target system, such as a vector store or a web search engine.

This transformer is useful when the user query is verbose, ambiguous, or contains irrelevant information that may affect the quality of the search results.

Query query = new Query("I'm studying machine learning. What is an LLM?");

QueryTransformer queryTransformer = RewriteQueryTransformer.builder()
        .chatClientBuilder(chatClientBuilder).build();

Query transformedQuery = queryTransformer.transform(query);

The prompt used by this component can be customized via the promptTemplate() method available in the builder.


A TranslationQueryTransformer uses a large language model to translate a query to a target language that is supported by the embedding model used to generate the document embeddings. If the query is already in the target language, it is returned unchanged. If the language of the query is unknown, it is also returned unchanged.

This transformer is useful when the embedding model is trained on a specific language and the user query is in a different language.

Query query = new Query("Hvad er Danmarks hovedstad?");

QueryTransformer queryTransformer = TranslationQueryTransformer.builder()
        .chatClientBuilder(chatClientBuilder).targetLanguage("english").build();

Query transformedQuery = queryTransformer.transform(query);

The prompt used by this component can be customized via the promptTemplate() method available in the builder.

Query Expansion

A component for expanding the input query into a list of queries, addressing challenges such as poorly formed queries by providing alternative query formulations, or by breaking down complex problems into simpler sub-queries.


A MultiQueryExpander uses a large language model to expand a query into multiple semantically diverse variations to capture different perspectives, useful for retrieving additional contextual information and increasing the chances of finding relevant results.

MultiQueryExpander queryExpander = MultiQueryExpander.builder()
    .chatClientBuilder(chatClientBuilder).numberOfQueries(3).build();
List<Query> queries = queryExpander.expand(new Query("How to run a Spring Boot app?"));

By default, the MultiQueryExpander includes the original query in the list of expanded queries. You can disable this behavior via the includeOriginal method in the builder.

MultiQueryExpander queryExpander = MultiQueryExpander.builder()
    .chatClientBuilder(chatClientBuilder).includeOriginal(false).build();

The prompt used by this component can be customized via the promptTemplate() method available in the builder.
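An LLM-based expander asks the model for a number of alternative phrasings and turns the reply into a list of queries. The sketch below illustrates that post-processing step under one assumption: that the model answers with one query per line. The class and method names are hypothetical and the parsing is not taken from Spring AI's `MultiQueryExpander`; it only mirrors the documented options of a query count and of including the original query.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical post-processing for LLM-based query expansion.
class ToyQueryExpansion {

    // Assumes the model replied with one alternative query per line.
    static List<String> parse(String original, String modelReply,
                              int numberOfQueries, boolean includeOriginal) {
        List<String> queries = new ArrayList<>();
        if (includeOriginal) {
            queries.add(original);
        }
        modelReply.lines()
                .map(String::strip)
                .filter(line -> !line.isEmpty())   // ignore blank lines in the reply
                .limit(numberOfQueries)            // never exceed the requested count
                .forEach(queries::add);
        return queries;
    }
}
```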


Retrieval modules are responsible for querying data systems such as vector stores and retrieving the most relevant documents.

Document Search

A component responsible for retrieving Documents from an underlying data source, such as a search engine, a vector store, a database, or a knowledge graph.


A VectorStoreDocumentRetriever retrieves documents from a vector store that are semantically similar to the input query. It supports filtering based on metadata, similarity threshold, and top-k results.

DocumentRetriever retriever = VectorStoreDocumentRetriever.builder()
    .vectorStore(vectorStore).similarityThreshold(0.73).topK(5)
    .filterExpression(new FilterExpressionBuilder()
        .eq("genre", "fairytale")
        .build())
    .build();
List<Document> documents = retriever.retrieve(new Query("What is the main character of the story?"));

The filter expression can be static or dynamic. For dynamic filter expressions, you can pass a Supplier.

DocumentRetriever retriever = VectorStoreDocumentRetriever.builder()
    .vectorStore(vectorStore)
    .filterExpression(() -> new FilterExpressionBuilder()
        .eq("tenant", TenantContextHolder.getTenantIdentifier())
        .build())
    .build();
List<Document> documents = retriever.retrieve(new Query("What are the KPIs for the next semester?"));
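The point of passing a `Supplier` is that the filter is computed on every retrieval call rather than once when the retriever is built, so per-request state such as the current tenant is picked up. A minimal sketch of that difference, using a hypothetical `ThreadLocal`-backed `TenantContext` in place of the `TenantContextHolder` used in the snippet above, and metadata maps instead of Spring AI `Document`s:

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical tenant holder; stands in for whatever request-scoped context an app uses.
class TenantContext {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();
    static void set(String tenant) { CURRENT.set(tenant); }
    static String get() { return CURRENT.get(); }
}

// Hypothetical retriever that re-evaluates its filter supplier on every call.
class ToyFilteringRetriever {
    private final List<Map<String, String>> documents;
    private final Supplier<Predicate<Map<String, String>>> filterSupplier;

    ToyFilteringRetriever(List<Map<String, String>> documents,
                          Supplier<Predicate<Map<String, String>>> filterSupplier) {
        this.documents = documents;
        this.filterSupplier = filterSupplier;
    }

    List<Map<String, String>> retrieve() {
        // The supplier runs here, at retrieval time, so it sees the current tenant.
        Predicate<Map<String, String>> filter = filterSupplier.get();
        return documents.stream().filter(filter).toList();
    }
}
```

Calling `TenantContext.set(...)` before each `retrieve()` changes which documents come back without rebuilding the retriever.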

Document Join

A component for combining documents retrieved based on multiple queries and from multiple data sources into a single collection of documents. As part of the joining process, it can also handle duplicate documents and reciprocal ranking strategies.
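One widely used reciprocal ranking strategy is Reciprocal Rank Fusion (RRF): a document's fused score is the sum, over every result list it appears in, of 1 / (k + rank), with k commonly set to 60, so documents that rank well across several queries rise to the top. The sketch below is a generic RRF over document IDs; the class name `ReciprocalRankFusion` is hypothetical and it is not one of the Spring AI joiners described in this section.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Generic Reciprocal Rank Fusion over ranked lists of document IDs (hypothetical helper).
class ReciprocalRankFusion {

    static List<String> fuse(List<List<String>> rankedLists, int k) {
        Map<String, Double> scores = new HashMap<>();
        for (List<String> ranking : rankedLists) {
            for (int i = 0; i < ranking.size(); i++) {
                int rank = i + 1; // ranks are 1-based
                scores.merge(ranking.get(i), 1.0 / (k + rank), Double::sum);
            }
        }
        List<String> fused = new ArrayList<>(scores.keySet());
        // Highest fused score first; ties broken by ID for a deterministic order.
        fused.sort((a, b) -> {
            int byScore = Double.compare(scores.get(b), scores.get(a));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return fused;
    }
}
```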


A ConcatenationDocumentJoiner combines documents retrieved based on multiple queries and from multiple data sources by concatenating them into a single collection of documents. In case of duplicate documents, the first occurrence is kept. The score of each document is kept as is.

Map<Query, List<List<Document>>> documentsForQuery = ...
DocumentJoiner documentJoiner = new ConcatenationDocumentJoiner();
List<Document> documents = documentJoiner.join(documentsForQuery);
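The documented behavior (concatenate everything, keep the first occurrence of a duplicate, leave scores untouched) fits in a few lines. This sketch assumes duplicates are identified by document ID and uses a hypothetical `Doc` record instead of Spring AI's `Document`; preserving encounter order is a choice made for the sketch, not a documented guarantee.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of concatenation-based joining; not Spring AI's ConcatenationDocumentJoiner.
class ToyConcatenationJoiner {

    record Doc(String id, String text, double score) {}

    // documentsForQuery: for each query, one result list per data source.
    static List<Doc> join(Map<String, List<List<Doc>>> documentsForQuery) {
        Map<String, Doc> byId = new LinkedHashMap<>();
        documentsForQuery.values().stream()
                .flatMap(List::stream)   // all result lists for all queries
                .flatMap(List::stream)   // all documents, in encounter order
                .forEach(doc -> byId.putIfAbsent(doc.id(), doc)); // first occurrence wins
        return List.copyOf(byId.values()); // scores are left as they were
    }
}
```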


Post-Retrieval modules are responsible for processing the retrieved documents to achieve the best possible generation results.

Document Ranking

A component for ordering and ranking documents based on their relevance to a query to bring the most relevant documents to the top of the list, addressing challenges such as lost-in-the-middle.

Unlike DocumentSelector, this component does not remove entire documents from the list, but rather changes the order/score of the documents in the list. Unlike DocumentCompressor, this component does not alter the content of the documents.
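As an example of a component that honors this contract (same documents, same content, new order), the sketch below implements a simple "lost-in-the-middle" reordering: sort by score, then alternate documents between the front and the back of the list so the most relevant ones sit at the edges of the context, where long-context models have been observed to use information most reliably. The `LostInTheMiddleRanker` class and its `Doc` record are hypothetical illustrations, not a Spring AI component.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical ranker: reorders documents, never drops or edits them.
class LostInTheMiddleRanker {

    record Doc(String id, double score) {}

    static List<Doc> rank(List<Doc> documents) {
        List<Doc> sorted = new ArrayList<>(documents);
        sorted.sort(Comparator.comparingDouble(Doc::score).reversed());
        List<Doc> front = new ArrayList<>();
        List<Doc> back = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i++) {
            if (i % 2 == 0) {
                front.add(sorted.get(i)); // 1st, 3rd, 5th, ... most relevant
            } else {
                back.add(sorted.get(i));  // 2nd, 4th, ... most relevant
            }
        }
        Collections.reverse(back); // the 2nd most relevant ends up in the last position
        front.addAll(back);
        return front;
    }
}
```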

Document Selection

A component for removing irrelevant or redundant documents from a list of retrieved documents, addressing challenges such as lost-in-the-middle and context length restrictions from the model.

Unlike DocumentRanker, this component does not change the order/score of the documents in the list, but rather removes irrelevant or redundant documents. Unlike DocumentCompressor, this component does not alter the content of the documents, but rather removes entire documents.
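A selector, by contrast, only filters. The sketch below keeps documents at or above a minimum score, drops documents whose text duplicates one already kept, and caps the total count, all while preserving the incoming order and leaving each document untouched. The criteria and names (`ToyDocumentSelector`, `minScore`, `maxDocuments`) are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical selector: removes documents, never reorders or edits them.
class ToyDocumentSelector {

    record Doc(String id, String text, double score) {}

    static List<Doc> select(List<Doc> documents, double minScore, int maxDocuments) {
        List<Doc> selected = new ArrayList<>();
        Set<String> seenTexts = new HashSet<>();
        for (Doc doc : documents) {
            if (selected.size() == maxDocuments) {
                break; // context budget reached
            }
            if (doc.score() < minScore) {
                continue; // irrelevant
            }
            if (!seenTexts.add(doc.text())) {
                continue; // redundant: the same text was already kept
            }
            selected.add(doc);
        }
        return selected;
    }
}
```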

Document Compression

A component for compressing the content of each document to reduce noise and redundancy in the retrieved information, addressing challenges such as lost-in-the-middle and context length restrictions from the model.

Unlike DocumentSelector, this component does not remove entire documents from the list, but rather alters the content of the documents. Unlike DocumentRanker, this component does not change the order/score of the documents in the list.
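To show this contract (same documents, same order, shorter content), here is a deliberately naive extractive compressor that keeps only the sentences sharing at least one word with the query. Practical compressors usually rely on a model rather than word overlap; the class name, the sentence splitting on `.`, `!`, and `?`, and the fallback of keeping the original text when nothing matches are all assumptions of this sketch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical extractive compressor: edits document content, keeps count and order.
class ToyDocumentCompressor {

    record Doc(String id, String text) {}

    // Lower-cased word set of a piece of text.
    static Set<String> words(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("\\W+"))
                .filter(w -> !w.isEmpty())
                .collect(Collectors.toSet());
    }

    static List<Doc> compress(List<Doc> documents, String query) {
        Set<String> queryWords = words(query);
        return documents.stream().map(doc -> {
            // Split after sentence-ending punctuation followed by whitespace.
            String kept = Arrays.stream(doc.text().split("(?<=[.!?])\\s+"))
                    .filter(sentence -> words(sentence).stream().anyMatch(queryWords::contains))
                    .collect(Collectors.joining(" "));
            // Fall back to the original text rather than emptying the document.
            return new Doc(doc.id(), kept.isEmpty() ? doc.text() : kept);
        }).toList();
    }
}
```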


Generation modules are responsible for generating the final response based on the user query and retrieved documents.

Query Augmentation

A component for augmenting an input query with additional data, useful to provide a large language model with the necessary context to answer the user query.


The ContextualQueryAugmenter augments the user query with contextual data from the content of the provided documents.

QueryAugmenter queryAugmenter = ContextualQueryAugmenter.builder().build();

By default, the ContextualQueryAugmenter does not allow the retrieved context to be empty. When that happens, it instructs the model not to answer the user query.

You can enable the allowEmptyContext option to allow the model to generate a response even when the retrieved context is empty.

QueryAugmenter queryAugmenter = ContextualQueryAugmenter.builder()
        .allowEmptyContext(true).build();

The prompts used by this component can be customized via the promptTemplate() and emptyContextPromptTemplate() methods available in the builder.
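The effect of the empty-context option can be pictured with the sketch below: when documents are available, the query is wrapped together with their content; when none are, the result depends on whether empty context is allowed. The class name `ToyContextualAugmenter` and both prompt texts are illustrative only, and returning the query unchanged when empty context is allowed is an assumption of this sketch rather than a description of Spring AI's templates.

```java
import java.util.List;

// Hypothetical sketch of contextual query augmentation with an empty-context policy.
class ToyContextualAugmenter {

    private final boolean allowEmptyContext;

    ToyContextualAugmenter(boolean allowEmptyContext) {
        this.allowEmptyContext = allowEmptyContext;
    }

    String augment(String query, List<String> documents) {
        if (documents.isEmpty()) {
            // Either let the model answer on its own, or instruct it not to answer.
            return allowEmptyContext
                    ? query
                    : "The user query is outside your knowledge base. "
                      + "Politely tell the user that you cannot answer it.";
        }
        String context = String.join("\n", documents);
        return """
                Context information is below.
                ---------------------
                %s
                ---------------------
                Given the context information and no prior knowledge, answer the query.
                Query: %s""".formatted(context, query);
    }
}
```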