This version is still in development and is not yet considered stable. For the latest stable version, please use Spring AI 1.1.6!

Observability

Spring AI builds upon the observability features in the Spring ecosystem to provide insights into AI-related operations. It provides metrics and tracing capabilities for its core components: ChatClient (including Advisor), ChatModel, EmbeddingModel, ImageModel, and VectorStore.

Refer to the Spring Boot Metrics and Spring Boot Tracing documentation to enable metrics and tracing support in your application.
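As a minimal sketch, assuming Spring Boot Actuator with a Prometheus registry and a Micrometer Tracing bridge are on the classpath, the following application.properties entries expose the metrics endpoint and enable trace sampling:

```properties
# Expose the Prometheus scrape endpoint via Actuator
management.endpoints.web.exposure.include=health,prometheus
# Sample every trace (tune down in production)
management.tracing.sampling.probability=1.0
```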

Low-cardinality keys will be added to metrics and traces, while high-cardinality keys will only be added to traces.

Chat Client

The spring.ai.chat.client observations are recorded when the ChatClient call() or stream() operations are invoked. They measure the time spent performing the invocation and propagate the related tracing information.

Table 1. Low Cardinality Keys
Name Description

gen_ai.operation.name

Always framework.

gen_ai.system

Always spring_ai.

spring.ai.chat.client.stream

Whether the chat model response is a stream: true or false.

spring.ai.kind

The kind of framework API in Spring AI: chat_client.

Table 2. High Cardinality Keys
Name Description

spring.ai.chat.client.advisors

List of configured chat client advisors.

spring.ai.chat.client.conversation.id

Identifier of the conversation when using the chat memory.

spring.ai.chat.client.tool.names

Names of the tools passed to the chat client.

Prompt and Completion Data

The ChatClient prompt and completion data is typically large and may contain sensitive information. For these reasons, it is not exported by default.

Spring AI supports logging the prompt and completion data to help with debugging and troubleshooting.

Property Description Default

spring.ai.chat.client.observations.log-prompt

Whether to log the chat client prompt content.

false

spring.ai.chat.client.observations.log-completion

Whether to log the chat client completion content.

false

If you enable logging of the chat client prompt and completion data, there’s a risk of exposing sensitive or private information. Please be careful!
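For example, both properties from the table above can be switched on in application.properties (keeping the sensitivity warning in mind):

```properties
# Log ChatClient prompt and completion content (may expose sensitive data)
spring.ai.chat.client.observations.log-prompt=true
spring.ai.chat.client.observations.log-completion=true
```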

Chat Client Advisors

The spring.ai.advisor observations are recorded when an advisor is executed. They measure the time spent in the advisor (including the time spent on the inner advisors) and propagate the related tracing information.

Table 3. Low Cardinality Keys
Name Description

gen_ai.operation.name

Always framework.

gen_ai.system

Always spring_ai.

spring.ai.advisor.name

Name of the advisor.

spring.ai.kind

The kind of framework API in Spring AI: advisor.

Table 4. High Cardinality Keys
Name Description

spring.ai.advisor.order

Advisor order in the advisor chain.

Chat Model

Anthropic and OpenAI: HTTP layer is not instrumented

The Anthropic (since 2.0.0-M3) and OpenAI (since 2.0.0-M5) chat models go through vendor-provided Java SDKs. Each SDK builds its own OkHttpClient and doesn’t expose an interceptor hook on it, so Spring AI can’t wire in the Micrometer or OpenTelemetry interceptors that would instrument the HTTP call. Only the model-level observations covered below are emitted, and traces won’t connect end to end through proxies, AI gateways, or OpenAI-compatible inference servers like vLLM and Ollama. See the Anthropic and OpenAI chat docs for details.

The gen_ai.client.operation observations are recorded when calling the ChatModel call or stream methods. They measure the time spent on method completion and propagate the related tracing information.

The gen_ai.client.token.usage metric measures the number of input and output tokens used by a single model call.
Table 5. Low Cardinality Keys
Name Description

gen_ai.operation.name

The name of the operation being performed.

gen_ai.system

The model provider as identified by the client instrumentation.

gen_ai.request.model

The name of the model a request is being made to.

gen_ai.response.model

The name of the model that generated the response.

Table 6. High Cardinality Keys
Name Description

gen_ai.request.frequency_penalty

The frequency penalty setting for the model request.

gen_ai.request.max_tokens

The maximum number of tokens the model generates for a request.

gen_ai.request.presence_penalty

The presence penalty setting for the model request.

gen_ai.request.stop_sequences

List of sequences that the model will use to stop generating further tokens.

gen_ai.request.stream

Whether the request was made in streaming mode. Only present when true.

gen_ai.request.temperature

The temperature setting for the model request.

gen_ai.request.top_k

The top_k sampling setting for the model request.

gen_ai.request.top_p

The top_p sampling setting for the model request.

gen_ai.response.finish_reasons

Reasons the model stopped generating tokens, corresponding to each generation received.

gen_ai.response.id

The unique identifier for the AI response.

gen_ai.usage.cache_creation.input_tokens

The number of input tokens written to a provider-managed cache.

gen_ai.usage.cache_read.input_tokens

The number of input tokens served from a provider-managed cache.

gen_ai.usage.input_tokens

The number of tokens used in the model input (prompt).

gen_ai.usage.output_tokens

The number of tokens used in the model output (completion).

gen_ai.usage.total_tokens

The total number of tokens used in the model exchange.

spring.ai.model.request.tool.names

List of tool definitions provided to the model in the request.

The previous table lists the token-usage values present in an observation trace. To measure token usage as a metric, use the gen_ai.client.token.usage metric provided by the ChatModel.

Chat Prompt and Completion Data

The chat prompt and completion data is typically large and may contain sensitive information. For these reasons, it is not exported by default.

Spring AI supports logging chat prompt and completion data, useful for troubleshooting scenarios. When tracing is available, the logs will include trace information for better correlation.

Property Description Default

spring.ai.chat.observations.log-prompt

Log the prompt content. true or false

false

spring.ai.chat.observations.log-completion

Log the completion content. true or false

false

spring.ai.chat.observations.include-error-logging

Include error logging in observations. true or false

false

If you enable logging of the chat prompt and completion data, there’s a risk of exposing sensitive or private information. Please be careful!
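For example, the three properties from the table above can be enabled together in application.properties (again, mind the sensitivity warning):

```properties
# Log chat model prompt and completion content (may expose sensitive data)
spring.ai.chat.observations.log-prompt=true
spring.ai.chat.observations.log-completion=true
# Also include error logging in observations
spring.ai.chat.observations.include-error-logging=true
```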

Tool Calling

The spring.ai.tool observations are recorded when performing tool calling in the context of a chat model interaction. They measure the time spent on tool call completion and propagate the related tracing information.

Table 7. Low Cardinality Keys
Name Description

gen_ai.operation.name

The name of the operation being performed. It’s always execute_tool.

gen_ai.system

The provider responsible for the operation. It’s always spring_ai.

spring.ai.kind

The kind of operation performed by Spring AI. It’s always tool_call.

spring.ai.tool.definition.name

The name of the tool.

spring.ai.tool.type

The type of the tool. By default, it’s function.

Table 8. High Cardinality Keys
Name Description

spring.ai.tool.definition.description

Description of the tool.

spring.ai.tool.definition.schema

Schema of the parameters used to call the tool.

spring.ai.tool.call.id

The ID of the tool call, as identified by the chat model.

spring.ai.tool.call.arguments

The input arguments to the tool call. (Only when enabled)

spring.ai.tool.call.result

The result of the tool call execution. (Only when enabled)

Tool Call Arguments and Result Data

The input arguments and result from the tool call are not exported by default, as they can be potentially sensitive.

Spring AI supports exporting tool call arguments and result data as span attributes.

Property Description Default

spring.ai.tools.observations.include-content

Include the tool call content in observations. true or false

false

If you enable the inclusion of the tool call arguments and result in the observations, there’s a risk of exposing sensitive or private information. Please be careful!
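For example, the property from the table above can be enabled in application.properties (keeping the sensitivity warning in mind):

```properties
# Export tool call arguments and results as span attributes (may expose sensitive data)
spring.ai.tools.observations.include-content=true
```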

EmbeddingModel

Observability features are currently supported only for EmbeddingModel implementations from the following AI model providers: Mistral AI, Ollama, and OpenAI. Additional AI model providers will be supported in a future release.

The gen_ai.client.operation observations are recorded on embedding model method calls. They measure the time spent on method completion and propagate the related tracing information.

The gen_ai.client.token.usage metric measures the number of input and output tokens used by a single model call.
Table 9. Low Cardinality Keys
Name Description

gen_ai.operation.name

The name of the operation being performed.

gen_ai.system

The model provider as identified by the client instrumentation.

gen_ai.request.model

The name of the model a request is being made to.

gen_ai.response.model

The name of the model that generated the response.

Table 10. High Cardinality Keys
Name Description

gen_ai.request.embedding.dimensions

The number of dimensions the resulting output embeddings have.

gen_ai.usage.input_tokens

The number of tokens used in the model input.

gen_ai.usage.total_tokens

The total number of tokens used in the model exchange.

The previous table lists the token-usage values present in an observation trace. To measure token usage as a metric, use the gen_ai.client.token.usage metric provided by the EmbeddingModel.

Image Model

Observability features are currently supported only for the ImageModel implementation from OpenAI. Additional AI model providers will be supported in a future release.

The gen_ai.client.operation observations are recorded on image model method calls. They measure the time spent on method completion and propagate the related tracing information.

Table 11. Low Cardinality Keys
Name Description

gen_ai.operation.name

The name of the operation being performed.

gen_ai.system

The model provider as identified by the client instrumentation.

gen_ai.request.model

The name of the model a request is being made to.

Table 12. High Cardinality Keys
Name Description

gen_ai.request.image.response_format

The format in which the generated image is returned.

gen_ai.request.image.size

The size of the image to generate (e.g. 1024x1024).

gen_ai.request.image.style

The style of the image to generate.

Image Prompt Data

The image prompt data is typically large and may contain sensitive information. For these reasons, it is not exported by default.

Spring AI supports logging image prompt data, useful for troubleshooting scenarios. When tracing is available, the logs will include trace information for better correlation.

Property Description Default

spring.ai.image.observations.log-prompt

Log the image prompt content. true or false

false

If you enable logging of the image prompt data, there’s a risk of exposing sensitive or private information. Please be careful!

Vector Stores

All vector store implementations in Spring AI are instrumented to provide metrics and distributed tracing data through Micrometer.

The db.vector.client.operation observations are recorded when interacting with the Vector Store. They measure the time spent on the query, add, and delete operations and propagate the related tracing information.

Table 13. Low Cardinality Keys
Name Description

db.operation.name

The name of the operation or command being executed. One of add, delete, or query.

db.system

The database management system (DBMS) product as identified by the client instrumentation. One of pg_vector, azure, cassandra, chroma, elasticsearch, milvus, neo4j, opensearch, qdrant, redis, typesense, weaviate, pinecone, oracle, mongodb, gemfire, simple.

spring.ai.kind

The kind of framework API in Spring AI: vector_store.

Table 14. High Cardinality Keys
Name Description

db.collection.name

The name of a collection (table, container) within the database.

db.namespace

The name of the database, fully qualified within the server address and port.

db.search.similarity_metric

The metric used in similarity search.

db.vector.dimension_count

The dimension of the vector.

db.vector.field_name

The name of the vector field (e.g. the field holding the embedding).

db.vector.query.content

The content of the search query being executed.

db.vector.query.filter

The metadata filters used in the search query.

db.vector.query.response.documents

Returned documents from a similarity search query. Optional.

db.vector.query.similarity_threshold

Similarity threshold below which search results are rejected. A threshold of 0.0 means any similarity is accepted (effectively disabling threshold filtering); a threshold of 1.0 means an exact match is required.

db.vector.query.top_k

The top-k most similar vectors returned by a query.

Response Data

The vector search response data is typically large and may contain sensitive information. For these reasons, it is not exported by default.

Spring AI supports logging vector search response data, useful for troubleshooting scenarios. When tracing is available, the logs will include trace information for better correlation.

Property Description Default

spring.ai.vectorstore.observations.log-query-response

Log the vector store query response content. true or false

false

If you enable logging of the vector search response data, there’s a risk of exposing sensitive or private information. Please be careful!

More Metrics Reference

This section documents the metrics emitted by Spring AI components as they appear in Prometheus.

Metric Naming Conventions

Spring AI uses Micrometer. Base metric names use dots (e.g., gen_ai.client.operation), which Prometheus exports with underscores and standard suffixes:

  • Timers: <base>_seconds_count, <base>_seconds_sum, <base>_seconds_max, and (when supported) <base>_active_count

  • Counters: <base>_total (monotonic)

The following shows how base metric names expand to Prometheus time series.

Base metric name Exported time series

gen_ai.client.operation

gen_ai_client_operation_seconds_count
gen_ai_client_operation_seconds_sum
gen_ai_client_operation_seconds_max
gen_ai_client_operation_active_count

db.vector.client.operation

db_vector_client_operation_seconds_count
db_vector_client_operation_seconds_sum
db_vector_client_operation_seconds_max
db_vector_client_operation_active_count

References

Chat Client Metrics

Metric Name Type Unit Description

gen_ai_chat_client_operation_seconds_sum

Timer

seconds

Total time spent in ChatClient operations (call/stream)

gen_ai_chat_client_operation_seconds_count

Counter

count

Number of completed ChatClient operations

gen_ai_chat_client_operation_seconds_max

Gauge

seconds

Maximum observed duration of ChatClient operations

gen_ai_chat_client_operation_active_count

Gauge

count

Number of ChatClient operations currently in flight

Active vs Completed: active_count shows in-flight calls; the _seconds series reflect only completed calls.

Chat Model Metrics (Model provider execution)

Metric Name Type Unit Description

gen_ai_client_operation_seconds_sum

Timer

seconds

Total time executing chat model operations

gen_ai_client_operation_seconds_count

Counter

count

Number of completed chat model operations

gen_ai_client_operation_seconds_max

Gauge

seconds

Maximum observed duration for chat model operations

gen_ai_client_operation_active_count

Gauge

count

Number of chat model operations currently in flight

Token Usage

Metric Name Type Unit Description

gen_ai_client_token_usage_total

Counter

tokens

Total tokens consumed, labeled by token type

Labels

Label Meaning

gen_ai_token_type=input

Prompt tokens sent to the model

gen_ai_token_type=output

Completion tokens returned by the model

gen_ai_token_type=total

Input + output
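As a sketch, the gen_ai_token_type label can be used in PromQL to break down token consumption by type; the 5-minute rate window is an arbitrary choice:

```promql
# Token throughput per second, split by type (input/output/total)
sum by (gen_ai_token_type) (rate(gen_ai_client_token_usage_total[5m]))
```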

Vector Store Metrics

Metric Name Type Unit Description

db_vector_client_operation_seconds_sum

Timer

seconds

Total time spent in vector store operations (add/delete/query)

db_vector_client_operation_seconds_count

Counter

count

Number of completed vector store operations

db_vector_client_operation_seconds_max

Gauge

seconds

Maximum observed duration for vector store operations

db_vector_client_operation_active_count

Gauge

count

Number of vector store operations currently in flight

Labels

Label Meaning

db_operation_name

Operation type (add, delete, query)

db_system

Vector DB/provider (redis, chroma, pg_vector, …)

spring_ai_kind

vector_store

Understanding Active vs Completed

  • Active (*_active_count) — instantaneous gauge of in-progress operations (concurrency/load).

  • Completed (*_seconds_sum|count|max) — statistics for operations that have finished:

      • _seconds_sum / _seconds_count → average latency

      • _seconds_max → maximum observed duration over a recent time window (decay behavior depends on the registry)
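The sum/count relationship above translates directly into PromQL; as a sketch (the 5-minute rate window is an arbitrary choice):

```promql
# Average chat model operation latency over the last 5 minutes
rate(gen_ai_client_operation_seconds_sum[5m])
  / rate(gen_ai_client_operation_seconds_count[5m])
```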