Recursive Advisors

What is a Recursive Advisor?

Advisors Recursive Recursive advisors are a special type of advisor that can loop through the downstream advisor chain multiple times. This pattern is useful when you need to repeatedly call the LLM until a certain condition is met, such as:

  • Executing tool calls in a loop until no more tools need to be called

  • Validating structured output and retrying if validation fails

  • Implementing Evaluation logic with modifications to the request

  • Implementing retry logic with modifications to the request

The CallAdvisorChain.copy(CallAdvisor after) method is the key utility that enables recursive advisor patterns. It creates a new advisor chain that contains only the advisors that come after the specified advisor in the original chain and allows the recursive advisor to call this sub-chain as needed. This approach ensures that:

  • The recursive advisor can loop through the remaining advisors in the chain

  • Other advisors in the chain can observe and intercept each iteration

  • The advisor chain maintains proper ordering and observability

  • The recursive advisor doesn’t re-execute advisors that came before it

Built-in Recursive Advisors

Spring AI provides two built-in recursive advisors that demonstrate this pattern:

ToolCallingAdvisor

The ToolCallingAdvisor implements the tool calling loop as part of the advisor chain, rather than relying on the model’s internal tool execution. This enables other advisors in the chain to intercept and observe the tool calling process.

Key features:

  • Loops through the advisor chain until the ToolExecutionEligibilityChecker reports no more tool calls

  • Supports "return direct" functionality - when a tool execution has returnDirect=true, it interrupts the tool calling loop and returns the tool execution result directly to the client application instead of sending it back to the LLM

  • Uses callAdvisorChain.copy(this) to create a sub-chain for recursive calls

  • Supports configurable conversation history management via conversationHistoryEnabled

  • Supports a pluggable ToolExecutionEligibilityChecker to customize when the loop iterates

Example usage:

var toolCallingAdvisor = ToolCallingAdvisor.builder()
    .toolCallingManager(toolCallingManager)
    .advisorOrder(BaseAdvisor.HIGHEST_PRECEDENCE + 300)
    .build();

var chatClient = ChatClient.builder(chatModel)
    .defaultAdvisors(toolCallingAdvisor)
    .build();

Conversation History Management

The ToolCallingAdvisor includes a conversationHistoryEnabled configuration option that controls how conversation history is managed during tool calling iterations.

By default (conversationHistoryEnabled=true), the advisor maintains the full conversation history internally during tool call iterations. This means each subsequent LLM call in the tool calling loop includes all previous messages (user message, assistant responses, tool responses).

By default, memory advisors (DEFAULT_CHAT_MEMORY_PRECEDENCE_ORDER = HIGHEST_PRECEDENCE + 200) are placed outside the tool-call loop. The ToolCallingAdvisor (at HIGHEST_PRECEDENCE + 300) manages conversation history internally across iterations. The memory advisor loads history once before the loop and persists only the final user/assistant exchange after it ends. This is the recommended setup because most ChatMemoryRepository implementations do not support tool-call message types.

// Default setup: memory advisor is outside the tool-call loop (no explicit order needed)
var chatMemoryAdvisor = MessageChatMemoryAdvisor.builder(chatMemory).build();
// DEFAULT_CHAT_MEMORY_PRECEDENCE_ORDER = HIGHEST_PRECEDENCE + 200 < ToolCallingAdvisor.DEFAULT_ORDER (+300)
// ToolCallingAdvisor manages its own conversation history across tool-call iterations

var chatClient = ChatClient.builder(chatModel)
    .defaultAdvisors(chatMemoryAdvisor)
    .build();

Use the .disableInternalConversationHistory() method only when placing a memory advisor inside the tool-call loop (order above ToolCallingAdvisor.DEFAULT_ORDER). The memory advisor then handles history on every iteration. Note that only InMemoryChatMemoryRepository supports persisting tool-call messages; other repositories should use the default outside-loop setup above.

The spring-ai-session community project provides a session-aware memory implementation that fully supports tool-call messages and can be safely used inside the tool-call loop with any backend. See the spring-ai-session documentation.
var toolCallingAdvisor = ToolCallingAdvisor.builder()
    .toolCallingManager(toolCallingManager)
    .disableInternalConversationHistory()  // Memory advisor inside the loop handles history
    .advisorOrder(BaseAdvisor.HIGHEST_PRECEDENCE + 300)
    .build();

var chatMemoryAdvisor = MessageChatMemoryAdvisor.builder(chatMemory)
    .advisorOrder(BaseAdvisor.HIGHEST_PRECEDENCE + 400)  // Inside (after) ToolCallingAdvisor
    .build();

var chatClient = ChatClient.builder(chatModel)
    .defaultAdvisors(chatMemoryAdvisor, toolCallingAdvisor)
    .build();

User-Controlled Tool Execution

By default, ToolCallingAdvisor is auto-registered and manages the entire tool-calling loop internally — the caller only receives the final LLM answer. When you need full control over the loop (for example, to stream intermediate progress to a UI, add custom observability, or apply conditional logic between iterations), you can opt out of the auto-registered advisor and drive the loop yourself.

Disable the auto-registration on a per-call basis using AdvisorParams.toolCallingAdvisorAutoRegister(false):

ToolCallingManager toolCallingManager = ToolCallingManager.builder().build();
ToolCallback[] tools = ToolCallbacks.from(new WeatherTools());
ChatOptions chatOptions = ToolCallingChatOptions.builder()
    .toolCallbacks(tools)
    .build();

String question = "What is the weather in Amsterdam and Paris?";

// ToolCallingAdvisor is disabled — no tool loop runs automatically
ChatClientResponse response = chatClient.prompt()
    .user(question)
    .options(chatOptions)
    .advisors(AdvisorParams.toolCallingAdvisorAutoRegister(false))
    .call()
    .chatClientResponse();

Prompt prompt = new Prompt(List.of(new UserMessage(question)), chatOptions);

// Drive the loop manually — each iteration can be forwarded to a UI as it happens
while (response.chatResponse() != null && response.chatResponse().hasToolCalls()) {
    ToolExecutionResult result = toolCallingManager.executeToolCalls(prompt, response.chatResponse());
    prompt = new Prompt(result.conversationHistory(), chatOptions);
    response = chatClient.prompt()
        .messages(result.conversationHistory())
        .options(chatOptions)
        .advisors(AdvisorParams.toolCallingAdvisorAutoRegister(false))
        .call()
        .chatClientResponse();
}

For a complete example including the streaming path, see User-Controlled Tool Execution — With ChatClient.

Observing the Tool-Calling Loop

As an alternative to driving the loop manually, you can place a custom advisor inside the ToolCallingAdvisor loop by giving it an order value greater than ToolCallingAdvisor.DEFAULT_ORDER (e.g. HIGHEST_PRECEDENCE + 400). Such an advisor is invoked on every iteration of the tool-calling loop, not just once at the end, which means it has access to all intermediate messages:

  • In streaming mode — it receives the model’s raw chunk stream for each iteration, including tool-call request chunks, before ToolCallingAdvisor filters them out of the outbound stream.

  • In the call path — the conversation history passed in each subsequent request includes the ToolResponseMessage(s) from the previous iteration, so the advisor observes both tool-call requests and their responses.

This pattern lets you forward intermediate chunks to a side channel (SSE, WebSocket, log) without disrupting the tool-calling loop:

public class ToolCallObservingAdvisor implements CallAdvisor, StreamAdvisor {

    private final Consumer<ChatClientResponse> observer;

    public ToolCallObservingAdvisor(Consumer<ChatClientResponse> observer) {
        this.observer = observer;
    }

    @Override
    public ChatClientResponse aroundCall(ChatClientRequest request, CallAdvisorChain chain) {
        // Inspect all messages on every iteration — subsequent requests include ToolResponseMessages
        request.prompt().getInstructions().forEach(msg -> log.debug("Message: {}", msg));
        ChatClientResponse response = chain.nextCall(request);
        observer.accept(response);
        return response;
    }

    @Override
    public Flux<ChatClientResponse> aroundStream(ChatClientRequest request, StreamAdvisorChain chain) {
        // Observe every chunk including tool-call request chunks that ToolCallingAdvisor will suppress upstream
        return chain.nextStream(request).doOnNext(observer);
    }

    @Override
    public int getOrder() {
        return Ordered.HIGHEST_PRECEDENCE + 400; // inside ToolCallingAdvisor (order 300)
    }
}

Register the observing advisor alongside the auto-registered ToolCallingAdvisor:

var chatClient = ChatClient.builder(chatModel)
    .defaultAdvisors(new ToolCallObservingAdvisor(chunk -> forwardToSse(chunk)))
    .build();

String response = chatClient.prompt()
    .user("What is the weather in Amsterdam and Paris?")
    .tools(new WeatherTools())
    .call()
    .content();

Because ToolCallObservingAdvisor is at order HIGHEST_PRECEDENCE + 400 it is inserted after the auto-registered ToolCallingAdvisor (order HIGHEST_PRECEDENCE + 300) in the chain, so it participates in every tool-call iteration. ToolCallingAdvisor still filters tool-call chunks from what it returns to the outer chain, so the main caller receives only the final answer — the observing advisor handles the side-channel emission.

This approach keeps the tool-calling loop fully managed by the framework while still giving you complete visibility into every intermediate step, and it pairs naturally with the conversationHistoryEnabled and memory advisor patterns described in Conversation History Management.

Return Direct Functionality

The "return direct" feature allows tools to bypass the LLM and return their results directly to the client application. This is useful when:

  • The tool’s output is the final answer and doesn’t need LLM processing

  • You want to reduce latency by avoiding an additional LLM call

  • The tool result should be returned as-is without interpretation

When a tool execution has returnDirect=true, the ToolCallingAdvisor will:

  1. Execute the tool call as normal

  2. Detect the returnDirect flag in the ToolExecutionResult

  3. Break out of the tool calling loop

  4. Return the tool execution result directly to the client application as a ChatResponse with the tool’s output as the generation content

StructuredOutputValidationAdvisor

The StructuredOutputValidationAdvisor validates the structured JSON output against a JSON schema and retries the call if validation fails, up to a specified number of attempts.

Key features:

  • Derives a JSON schema from the expected output type, or accepts a pre-supplied schema string

  • Validates the LLM response against the schema

  • Retries the call if validation fails, up to a configurable number of attempts (default: 3)

  • Augments the prompt with validation error messages on retry attempts to help the LLM correct its output

  • Uses callAdvisorChain.copy(this) to create a sub-chain for recursive calls

  • Optionally supports a custom JsonMapper for JSON processing

The advisor can be configured via outputType (schema derived automatically) or outputJsonSchema (pre-supplied schema string); the two options are mutually exclusive.

Example usage with outputType:

var validationAdvisor = StructuredOutputValidationAdvisor.builder()
    .outputType(MyResponseType.class)
    .maxRepeatAttempts(3)
    .build();

var chatClient = ChatClient.builder(chatModel)
    .defaultAdvisors(validationAdvisor)
    .build();

Example usage with a pre-supplied JSON schema:

var validationAdvisor = StructuredOutputValidationAdvisor.builder()
    .outputJsonSchema(myConverter.getJsonSchema())
    .build();

Alternatively, you can enable schema validation directly on an entity() call without configuring the advisor manually, using EntityParamSpec:

ActorFilms actorFilms = chatClient.prompt()
    .user("Generate the filmography for a random actor.")
    .call()
    .entity(ActorFilms.class, spec -> spec.schemaValidation());