Metrics and Management

This section describes how to capture metrics for Spring Integration. In recent versions, we have relied more on Micrometer (see https://micrometer.io), and we plan to use Micrometer even more in future releases.

Configuring Metrics Capture

Prior to version 4.2, metrics were only available when JMX was enabled. See JMX Support.

To enable MessageSource, MessageChannel, and MessageHandler metrics, add an <int:management/> bean to the application context (in XML) or annotate one of your @Configuration classes with @EnableIntegrationManagement (in Java). MessageSource instances maintain only counts, MessageChannel instances and MessageHandler instances maintain duration statistics in addition to counts. See MessageChannel Metric Features and MessageHandler Metric Features, later in this chapter.

Doing so causes the automatic registration of the IntegrationManagementConfigurer bean in the application context. Only one such bean can exist in the context, and, if registered manually via a <bean/> definition, it must have the bean name set to integrationManagementConfigurer. This bean applies its configuration to beans after all beans in the context have been instantiated.

In addition to metrics, you can control debug logging in the main message flow. In very high volume applications, even calls to isDebugEnabled() can be quite expensive with some logging subsystems. You can disable all such logging to avoid this overhead. Exception logging (debug or otherwise) is not affected by this setting.

The following listing shows the available options for controlling logging:

<int:management
    default-logging-enabled="true" (1)
    default-counts-enabled="false" (2)
    default-stats-enabled="false" (3)
    counts-enabled-patterns="foo, !baz, ba*" (4)
    stats-enabled-patterns="fiz, buz" (5)
    metrics-factory="myMetricsFactory" /> (6)
@Configuration
@EnableIntegration
@EnableIntegrationManagement(
    defaultLoggingEnabled = "true", (1)
    defaultCountsEnabled = "false", (2)
    defaultStatsEnabled = "false", (3)
    countsEnabled = { "foo", "${count.patterns}" }, (4)
    statsEnabled = { "qux", "!*" }, (5)
    MetricsFactory = "myMetricsFactory") (6)
public static class ContextConfiguration {
...
}
1 Set to false to disable all logging in the main message flow, regardless of the log system category settings. Set to 'true' to enable debug logging (if also enabled by the logging subsystem). Only applied if you have not explicitly configured the setting in a bean definition. The default is true.
2 Enable or disable count metrics for components that do not match one of the patterns in <4>. Only applied if you have not explicitly configured the setting in a bean definition. The default is false.
3 Enable or disable statistical metrics for components that do not match one of the patterns in <5>. Only applied if you have not explicitly configured the setting in a bean definition. The default is 'false'.
4 A comma-delimited list of patterns for beans for which counts should be enabled. You can negate the pattern with !. First match (positive or negative) wins. In the unlikely event that you have a bean name starting with !, escape the ! in the pattern. For example, \!something positively matches a bean named !something.
5 A comma-delimited list of patterns for beans for which statistical metrics should be enabled. You can negate the pattern\ with !. First match (positive or negative) wins. In the unlikely event that you have a bean name starting with !, escape the ! in the pattern. \!something positively matches a bean named !something. The collection of statistics implies the collection of counts.
6 A reference to a MetricsFactory. See Metrics Factory.

At runtime, counts and statistics can be obtained by calling getChannelMetrics, getHandlerMetrics and getSourceMetrics (all from the IntegrationManagementConfigurer class), which return MessageChannelMetrics, MessageHandlerMetrics, and MessageSourceMetrics, respectively.

See the Javadoc for complete information about these classes.

When JMX is enabled (see JMX Support), IntegrationMBeanExporter also exposes these metrics.

IMPORTANT: defaultLoggingEnabled, defaultCountsEnabled, and defaultStatsEnabled are applied only if you have not explicitly configured the corresponding setting in a bean definition.

Starting with version 5.0.2, the framework automatically detects whether the application context has a single MetricsFactory bean and, if so, uses it instead of the default metrics factory.

These legacy metrics have been deprecated in favor of Micrometer metrics discussed below. Legacy metrics support will be removed in a future release.

Micrometer Integration

Starting with version 5.0.3, the presence of a Micrometer MeterRegistry in the application context triggers support for Micrometer metrics in addition to the built-in metrics (note that the legacy built-in metrics will be removed in a future release).

Micrometer was first supported in version 5.0.2, but changes were made to the Micrometer Meters in version 5.0.3 to make them more suitable for use in dimensional systems. Further changes were made in 5.0.4. If you use Micrometer, a minimum of version 5.0.4 is recommended, since some of the changes in 5.0.4 were breaking API changes.

To use Micrometer, add one of the MeterRegistry beans to the application context. If the IntegrationManagementConfigurer detects exactly one MeterRegistry bean, it configures a MicrometerMetricsCaptor bean with a name of integrationMicrometerMetricsCaptor.

For each MessageHandler and MessageChannel, timers are registered. For each MessageSource, a counter is registered.

This only applies to objects that extend AbstractMessageHandler, AbstractMessageChannel, and AbstractMessageSource (which is the case for most framework components).

With Micrometer metrics, the statsEnabled flag has no effect, since statistics capture is delegated to Micrometer. The countsEnabled flag controls whether the Micrometer Meter instances are updated when processing each message.

The Timer Meters for send operations on message channels have the following names or tags:

  • name: spring.integration.send

  • tag: type:channel

  • tag: name:<componentName>

  • tag: result:(success|failure)

  • tag: exception:(none|exception simple class name)

  • description: Send processing time

(A failure result with a none exception means the channel’s send() operation returned false.)

The Counter Meters for receive operations on pollable message channels have the following names or tags:

  • name: spring.integration.receive

  • tag: type:channel

  • tag: name:<componentName>

  • tag: result:(success|failure)

  • tag: exception:(none|exception simple class name)

  • description: Messages received

The Timer Meters for operations on message handlers have the following names or tags:

  • name: spring.integration.send

  • tag: type:handler

  • tag: name:<componentName>

  • tag: result:(success|failure)

  • tag: exception:(none|exception simple class name)

  • description: Send processing time

The Counter meters for message sources have the following names/tags:

  • name: spring.integration.receive

  • tag: type:source

  • tag: name:<componentName>

  • tag: result:success

  • tag: exception:none

  • description: Messages received

In addition, there are three Gauge Meters:

  • spring.integration.channels: The number of MessageChannels in the application.

  • spring.integration.handlers: The number of MessageHandlers in the application.

  • spring.integration.sources: The number of MessageSources in the application.

It is possible to customize the names and tags of Meters created by integration components by providing a subclass of MicrometerMetricsCaptor. The MicrometerCustomMetricsTests test case shows a simple example of how to do that. You can also further customize the meters by overloading the build() methods on builder subclasses.

Starting with version 5.1.13, the QueueChannel exposes Micrometer gauges for queue size and remaining capacity:

  • name: spring.integration.channel.queue.size

  • tag: type:channel

  • tag: name:<componentName>

  • description: The size of queue channel

and

  • name: spring.integration.channel.queue.remaining.capacity

  • tag: type:channel

  • tag: name:<componentName>

  • description: The remaining.capacity of queue channel

MessageChannel Metric Features

These legacy metrics will be removed in a future release. See Micrometer Integration.

Message channels report metrics according to their concrete type. If you are looking at a DirectChannel, you see statistics for the send operation. If it is a QueueChannel, you also see statistics for the receive operation as well as the count of messages that are currently buffered by this QueueChannel. In both cases, some metrics are simple counters (message count and error count), and some are estimates of averages of interesting quantities. The algorithms used to calculate these estimates are described briefly in the following table.

Table 1. MessageChannel Metrics
Metric Type Example Algorithm

Count

Send Count

Simple incrementer. Increases by one when an event occurs.

Error Count

Send Error Count

Simple incrementer. Increases by one when an send results in an error.

Duration

Send Duration (method execution time in milliseconds)

Exponential moving average with decay factor (ten by default). Average of the method execution time over roughly the last ten (by default) measurements.

Rate

Send Rate (number of operations per second)

Inverse of Exponential moving average of the interval between events with decay in time (lapsing over 60 seconds by default) and per measurement (last ten events by default).

Error Rate

Send Error Rate (number of errors per second)

Inverse of exponential moving average of the interval between error events with decay in time (lapsing over 60 seconds by default) and per measurement (last ten events by default).

Ratio

Send Success Ratio (ratio of successful to total sends)

Estimate the success ratio as the exponential moving average of the series composed of values (1 for success and 0 for failure, decaying as per the rate measurement over time and events by default). The error ratio is: 1 - success ratio.

MessageHandler Metric Features

These legacy metrics will be removed in a future release. See Micrometer Integration.

The following table shows the statistics maintained for message handlers. Some metrics are simple counters (message count and error count), and one is an estimate of averages of send duration. The algorithms used to calculate these estimates are described briefly in the following table:

Table 2. MessageHandlerMetrics
Metric Type Example Algorithm

Count

Handle Count

Simple incrementer. Increases by one when an event occurs.

Error Count

Handler Error Count

Simple incrementer. Increases by one when an invocation results in an error.

Active Count

Handler Active Count

Indicates the number of currently active threads currently invoking the handler (or any downstream synchronous flow).

Duration

Handle Duration (method execution time in milliseconds)

Exponential moving average with decay factor (ten by default). Average of the method execution time over roughly the last ten (default) measurements.

Time-Based Average Estimates

A feature of the time-based average estimates is that they decay with time if no new measurements arrive. To help interpret the behavior over time, the time (in seconds) since the last measurement is also exposed as a metric.

There are two basic exponential models: decay per measurement (appropriate for duration and anything where the number of measurements is part of the metric) and decay per time unit (more suitable for rate measurements where the time in between measurements is part of the metric). Both models depend on the fact that S(n) = sum(i=0,i=n) w(i) x(i) has a special form when w(i) = r^i, with r=constant: S(n) = x(n) + r S(n-1) (so you only have to store S(n-1) (not the whole series x(i)) to generate a new metric estimate from the last measurement). The algorithms used in the duration metrics use r=exp(-1/M) with M=10. The net effect is that the estimate, S(n), is more heavily weighted to recent measurements and is composed roughly of the last M measurements. So M is the “window” or lapse rate of the estimate. For the vanilla moving average, i is a counter over the number of measurements. For the rate, we interpret i as the elapsed time or a combination of elapsed time and a counter (so the metric estimate contains contributions roughly from the last M measurements and the last T seconds).

Metrics Factory

A strategy interface MetricsFactory has been introduced to let you provide custom channel metrics for your MessageChannel instances and MessageHandler instances. By default, a DefaultMetricsFactory provides a default implementation of MessageChannelMetrics and MessageHandlerMetrics, described earlier. To override the default MetricsFactory, configure it as described earlier, by providing a reference to your MetricsFactory bean instance. You can either customize the default implementations, as described in the next section, or provide completely different implementations by extending AbstractMessageChannelMetrics or AbstractMessageHandlerMetrics.

In addition to the default metrics factory described earlier, the framework provides the AggregatingMetricsFactory. This factory creates AggregatingMessageChannelMetrics and AggregatingMessageHandlerMetrics instances. In very high volume scenarios, the cost of capturing statistics can be prohibitive (the time to make two calls to the system and store the data for each message). The aggregating metrics aggregate the response time over a sample of messages. This can save significant CPU time.

The statistics are likely to be skewed if messages arrive in bursts. These metrics are intended for use with high, constant-volume, message rates.

The following example shows how to define an aggregrating metrics factory:

<bean id="aggregatingMetricsFactory"
            class="org.springframework.integration.support.management.AggregatingMetricsFactory">
    <constructor-arg value="1000" /> <!-- sample size -->
</bean>

The preceding configuration aggregates the duration over 1000 messages. Counts (send and error) are maintained per-message, but the statistics are per 1000 messages.

Customizing the Default Channel and Handler Statistics

See Time-Based Average Estimates and the Javadoc for the ExponentialMovingAverage* classes for more information about these values.

By default, the DefaultMessageChannelMetrics and DefaultMessageHandlerMetrics use a “window” of ten measurements, a rate period of one second (meaning rate per second) and a decay lapse period of one minute.

If you wish to override these defaults, you can provide a custom MetricsFactory that returns appropriately configured metrics and provide a reference to it in the MBean exporter, as described earlier.

The following example shows how to do so:

public static class CustomMetrics implements MetricsFactory {

    @Override
    public AbstractMessageChannelMetrics createChannelMetrics(String name) {
        return new DefaultMessageChannelMetrics(name,
                new ExponentialMovingAverage(20, 1000000.),
                new ExponentialMovingAverageRate(2000, 120000, 30, true),
                new ExponentialMovingAverageRatio(130000, 40, true),
                new ExponentialMovingAverageRate(3000, 140000, 50, true));
    }

    @Override
    public AbstractMessageHandlerMetrics createHandlerMetrics(String name) {
        return new DefaultMessageHandlerMetrics(name, new ExponentialMovingAverage(20, 1000000.));
    }

}
Advanced Customization

The customizations described earlier are wholesale and apply to all appropriate beans exported by the MBean exporter. This is the extent of customization available when you use XML configuration.

Individual beans can be provided with different implementations using by Java @Configuration or programmatically at runtime (after the application context has been refreshed) by invoking the configureMetrics methods on AbstractMessageChannel and AbstractMessageHandler.

Performance Improvement

Previously, the time-based metrics (see Time-Based Average Estimates) were calculated in real time. The statistics are now calculated when retrieved instead. This resulted in a significant performance improvement, at the expense of a small amount of additional memory for each statistic. As discussed earlier, you can disable the statistics altogether while retaining the MBean that allows the invocation of Lifecycle methods.