13. Spring Boot Actuator

Spring Boot for Apache Geode and Pivotal GemFire (SBDG) adds Spring Boot Actuator support and dedicated HealthIndicators for Apache Geode and Pivotal GemFire. Equally, the provided HealthIndicators will even work with Pivotal Cloud Cache, which is backed by Pivotal GemFire, when pushing your Spring Boot applications to Pivotal CloudFoundry (PCC).

Spring Boot HealthIndicators provide details about the runtime operation and behavior of your Apache Geode or Pivotal GemFire based Spring Boot applications. For instance, by querying the right HealthIndicator endpoint, you would be able to get the current hit/miss count for your Region.get(key) data access operations.

In addition to vital health information, SBDG provides basic, pre-runtime configuration meta-data about the Apache Geode / Pivotal GemFire components that are monitored by Spring Boot Actuator. This makes it easier to see how the application was configured all in one place, rather than in properties files, Spring config, XML, etc.

The provided Spring Boot HealthIndicators fall under one of three categories:

The following sections give a brief overview of all the available Spring Boot HealthIndicators provided for Apache Geode/Pivotal GemFire, out-of-the-box.

13.1 Base HealthIndicators

The following section covers Spring Boot HealthIndicators that apply to both peer Cache and ClientCache, Spring Boot applications. That is, these HealthIndicators are not specific to the cache type.

In both Apache Geode and Pivotal GemFire, the cache instance is either a peer Cache instance, which makes your Spring Boot application part of a GemFire/Geode cluster, or more commonly, a ClientCache instance that talks to an existing cluster. Your Spring Boot application can only be one cache type or the other and can only have a single instance of that cache type.

13.1.1 GeodeCacheHealthIndicator

The GeodeCacheHealthIndicator provides essential details about the (single) cache instance (Client or Peer) along with the underlying DistributedSystem, the DistributedMember and configuration details of the ResourceManager.

When your Spring Boot application creates an instance of a peer Cache, the DistributedMember object represents your application as a peer member/node of the DistributedSystem formed from a collection of connected peers (i.e. the cluster), to which your application also has access, indirectly via the cache instance.

This is no different for a ClientCache even though the client is technically not part of the peer/server cluster. But, it still creates instances of the DistributedSystem and DistributedMember objects, respectively.

The following configuration meta-data and health details about each object is covered:

Table 13.1. Cache Details

NameDescription

geode.cache.name

Name of the member in the distributed system.

geode.cache.closed

Determines whether the cache has been closed.

geode.cache.cancel-in-progress

Cancellation of operations in progress.


Table 13.2. DistributedMember Details

NameDescription

geode.distributed-member.id

DistributedMember identifier (used in logs internally).

geode.distributed-member.name

Name of the member in the distributed system.

geode.distributed-members.groups

Configured groups to which the member belongs.

geode.distributed-members.host

Name of the machine on which the member is running.

geode.distributed-members.process-id

Identifier of the JVM process (PID).


Table 13.3. DistributedSystem Details

NameDescription

geode.distributed-system.member-count

Total number of members in the cluster (1 for clients).

geode.distributed-system.connected

Indicates whether the member is currently connected to the cluster.

geode.distributed-system.reconnecting

Indicates whether the member is in a reconnecting state, which happens when a network partition occurs and the member gets disconnected from the cluster.

geode.distributed-system.properties-location

Location of the standard configuration properties.

geode.distributed-system.security-properties-location

Location of the security configuration properties.


Table 13.4. ResourceManager Details

NameDescription

geode.resource-manager.critical-heap-percentage

Percentage of heap at which the cache is in danger of becoming inoperable.

geode.resource-manager.critical-off-heap-percentage

Percentage of off-heap at which the cache is in danger of becoming inoperable.

geode.resource-manager.eviction-heap-percentage

Percentage of heap at which eviction begins on Regions configured with a Heap LRU Eviction policy.

geode.resource-manager.eviction-off-heap-percentage

Percentage of off-heap at which eviction begins on Regions configured with a Heap LRU Eviction policy.


13.1.2 GeodeRegionsHealthIndicator

The GeodeRegionsHealthIndicator provides details about all the configured and known Regions in the cache. If the cache is a client, then details will include all LOCAL, PROXY and CACHING_PROXY Regions. If the cache is a peer, then the details will include all LOCAL, PARTITION and REPLICATE Regions.

While the configuration meta-data details are not exhaustive, essential details along with basic performance metrics are covered:

Table 13.5. Region Details

NameDescription

geode.cache.regions.<name>.cloning-enabled

Whether Region values are cloned on read (e.g. cloning-enabled is true when cache transactions are used to prevent in-place modifications).

geode.cache.regions.<name>.data-policy

Policy used to manage the data in the Region (e.g. PARTITION, REPLICATE, etc).

geode.cache.regions.<name>.initial-capacity

Initial number of entries that can be held by a Region before it needs to be resized.

geode.cache.regions.<name>.load-factor

Load factor used to determine when to resize the Region when it nears capacity.

geode.cache.regions.<name>.key-constraint

Type constraint for Region keys.

geode.cache.regions.<name>.off-heap

Determines whether this Region will store values in off-heap memory (NOTE: Keys are always kept on Heap).

geode.cache.regions.<name>.pool-name

If this Region is a client Region, then this property determines the configured connection Pool (NOTE: Regions can have and use dedicated Pools for their data access operations.)

geode.cache.regions.<name>.pool-name

Determines the Scope of the Region, which plays a factor in the Regions consistency-level, as it pertains to acknowledgements for writes.

geode.cache.regions.<name>.value-constraint

Type constraint for Region values.


Additionally, when the Region is a peer Cache PARTITION Region, then the following details are also covered:

Table 13.6. Partition Region Details

NameDescription

geode.cache.regions.<name>.partition.collocated-with

Indicates this Region is collocated with another PARTITION Region, which is necessary when performing equi-joins queries (NOTE: distributed joins are not supported).

geode.cache.regions.<name>.partition.local-max-memory

Total amount of Heap memory allowed to be used by this Region on this node.

geode.cache.regions.<name>.partition.redundant-copies

Number of replicas for this PARTITION Region, which is useful in High Availability (HA) use cases.

geode.cache.regions.<name>.partition.total-max-memory

Total amount of Heap memory allowed to be used by this Region across all nodes in the cluster hosting this Region.

geode.cache.regions.<name>.partition.total-number-of-buckets

Total number of buckets (shards) that this Region is divided up into (NOTE: defaults to 113).


Finally, when statistics are enabled (e.g. using @EnableStatistics, (see here for more details), the following details are available:

Table 13.7. Region Statistic Details

NameDescription

geode.cache.regions.<name>.statistics.hit-count

Number of hits for a Region entry.

geode.cache.regions.<name>.statistics.hit-ratio

Ratio of hits to the number of Region.get(key) calls.

geode.cache.regions.<name>.statistics.last-accessed-time

For an entry, determines the last time it was accessed with Region.get(key).

geode.cache.regions.<name>.statistics.last-modified-time

For an entry, determines the time a Region’s entry value was last modified.

geode.cache.regions.<name>.statistics.miss-count

Returns the number of times that a Region.get was performed and no value was found locally.


13.1.3 GeodeIndexesHealthIndicator

The GeodeIndexesHealthIndicator provides details about the configured Region Indexes used in OQL query data access operations.

The following details are covered:

Table 13.8. Index Details

NameDescription

geode.index.<name>.from-clause

Region from which data is selected.

geode.index.<name>.indexed-expression

The Region value fields/properties used in the Index expression.

geode.index.<name>.projection-attributes

For all other Indexes, returns "", but for Map Indexes, returns either "" or the specific Map keys that were indexed.

geode.index.<name>.region

Region to which the Index is applied.


Additionally, when statistics are enabled (e.g. using @EnableStatistics; (see here for more details), the following details are available:

Table 13.9. Index Statistic Details

NameDescription

geode.index.<name>.statistics.number-of-bucket-indexes

Number of bucket Indexes created in a Partitioned Region.

geode.index.<name>.statistics.number-of-keys

Number of keys in this Index.

geode.index.<name>.statistics.number-of-map-indexed-keys

Number of keys in this Index at the highest-level.

geode.index.<name>.statistics.number-of-values

Number of values in this Index.

geode.index.<name>.statistics.number-of-updates

Number of times this Index has been updated.

geode.index.<name>.statistics.read-lock-count

Number of read locks taken on this Index.

geode.index.<name>.statistics.total-update-time

Total amount of time (ns) spent updating this Index.

geode.index.<name>.statistics.total-uses

Total number of times this Index has been accessed by an OQL query.


13.1.4 GeodeDiskStoresHealthIndicator

The GeodeDiskStoresHealthIndicator provides details about the configured DiskStores in the system/application. Remember, DiskStores are used to overflow and persist data to disk, including type meta-data tracked by PDX when the values in the Region(s) have been serialized with PDX and the Region(s) are persistent.

Most of the tracked health information pertains to configuration:

Table 13.10. DiskStore Details

NameDescription

geode.disk-store.<name>.allow-force-compaction

Indicates whether manual compaction of the DiskStore is allowed.

geode.disk-store.<name>.auto-compact

Indicates if compaction occurs automatically.

geode.disk-store.<name>.compaction-threshold

Percentage at which the oplog will become compactable.

geode.disk-store.<name>.disk-directories

Location of the oplog disk files.

geode.disk-store.<name>.disk-directory-sizes

Configured and allowed sizes (MB) for the disk directory storing the disk files.

geode.disk-store.<name>.disk-usage-critical-percentage

Critical threshold of disk usage proportional to the total disk volume.

geode.disk-store.<name>.disk-usage-warning-percentage

Warning threshold of disk usage proportional to the total disk volume.

geode.disk-store.<name>.max-oplog-size

Maximum size (MB) allowed for a single oplog file.

geode.disk-store.<name>.queue-size

Size of the queue used to batch writes flushed to disk.

geode.disk-store.<name>.time-interval

Time to wait (ms) before writes are flushed to disk from the queue if the size limit has not be reached.

geode.disk-store.<name>.uuid

Universally Unique Identifier for the DiskStore across Distributed System.

geode.disk-store.<name>.write-buffer-size

Size the of write buffer the DiskStore uses to write data to disk.


13.2 ClientCache HealthIndicators

The ClientCache based HealthIndicators provide additional details specifically for Spring Boot, cache client applications. These HealthIndicators are only available when the Spring Boot application creates a ClientCache instance (i.e. is a cache client), which is the default.

13.2.1 GeodeContinuousQueriesHealthIndicator

The GeodeContinuousQueriesHealthIndicator provides details about registered client Continuous Queries (CQ). CQs enable client applications to receive automatic notification about events that satisfy some criteria. That criteria can be easily expressed using the predicate of an OQL query (e.g. “SELECT * FROM /Customers c WHERE c.age > 21”). Anytime data of interests is inserted or updated, and matches the criteria specified in the OQL query predicate, an event is sent to the registered client.

The following details are covered for CQs by name:

Table 13.11. Continuous Query(CQ) Details

NameDescription

geode.continuous-query.<name>.oql-query-string

OQL query constituting the CQ.

geode.continuous-query.<name>.closed

Indicates whether the CQ has been closed.

geode.continuous-query.<name>.closing

Indicates whether the CQ is the process of closing.

geode.continuous-query.<name>.durable

Indicates whether the CQ events will be remembered between client sessions.

geode.continuous-query.<name>.running

Indicates whether the CQ is currently running.

geode.continuous-query.<name>.stopped

Indicates whether the CQ has been stopped.


In addition, the following CQ query and statistical data is covered:

Table 13.12. Continuous Query(CQ), Query Details

NameDescription

geode.continuous-query.<name>.query.number-of-executions

Total number of times the query has been executed.

geode.continuous-query.<name>.query.total-execution-time

Total amount of time (ns) spent executing the query.

geode.continuous-query.<name>.statistics.number-of-deletes

 

Table 13.13. Continuous Query(CQ), Statistic Details

NameDescription

geode.continuous-query.<name>.statistics.number-of-deletes

Number of Delete events qualified by this CQ.

geode.continuous-query.<name>.statistics.number-of-events

Total number of events qualified by this CQ.

geode.continuous-query.<name>.statistics.number-of-inserts

Number of Insert events qualified by this CQ.

geode.continuous-query.<name>.statistics.number-of-updates

Number of Update events qualified by this CQ.


In a more general sense, the GemFire/Geode Continuous Query system is tracked with the following, additional details on the client:

Table 13.14. Continuous Query(CQ), Statistic Details

NameDescription

geode.continuous-query.count

Total count of CQs.

geode.continuous-query.number-of-active

Number of currently active CQs (if available).

geode.continuous-query.number-of-closed

Total number of closed CQs (if available).

geode.continuous-query.number-of-created

Total number of created CQs (if available).

geode.continuous-query.number-of-stopped

Number of currently stopped CQs (if available).

geode.continuous-query.number-on-client

Number of CQs that are currently active or stopped (if available).


13.2.2 GeodePoolsHealthIndicator

The GeodePoolsHealthIndicator provide details about all the configured client connection Pools. This HealthIndicator primarily provides configuration meta-data for all the configured Pools.

The following details are covered:

Table 13.15. Pool Details

NameDescription

geode.pool.count

Total number of client connection Pools.

geode.pool.<name>.destroyed

Indicates whether the Pool has been destroyed.

geode.pool.<name>.free-connection-timeout

Configured amount of time to wait for a free connection from the Pool.

geode.pool.<name>.idle-timeout

The amount of time to wait before closing unused, idle connections not exceeding the configured number of minimum required connections.

geode.pool.<name>.load-conditioning-interval

Controls how frequently the Pool will check to see if a connection to a given server should be moved to a different server to improve the load balance.

geode.pool.<name>.locators

List of configured Locators.

geode.pool.<name>.max-connections

Maximum number of connections obtainable from the Pool.

geode.pool.<name>.min-connections

Minimum number of connections contained by the Pool.

geode.pool.<name>.multi-user-authentication

Determines whether the Pool can be used by multiple authenticated users.

geode.pool.<name>.online-locators

Returns a list of living Locators.

geode.pool.<name>.pending-event-count

Approximate number of pending subscription events maintained at server for this durable client Pool at the time it (re)connected to the server.

geode.pool.<name>.ping-interval

How often to ping the servers to verify they are still alive.

geode.pool.<name>.pr-single-hop-enabled

Whether the client will acquire a direct connection to the server containing the data of interests.

geode.pool.<name>.read-timeout

Number of milliseconds to wait for a response from a server before timing out the operation and trying another server (if any are available).

geode.pool.<name>.retry-attempts

Number of times to retry a request after timeout/exception.

geode.pool.<name>.server-group

Configures the group in which all servers this Pool connects to must belong.

geode.pool.<name>.servers

List of configured servers.

geode.pool.<name>.socket-buffer-size

Socket buffer size for each connection made in this Pool.

geode.pool.<name>.statistic-interval

How often to send client statistics to the server.

geode.pool.<name>.subscription-ack-interval

Interval in milliseconds to wait before sending acknowledgements to the cache server for events received from the server subscriptions.

geode.pool.<name>.subscription-enabled

Enabled server-to-client subscriptions.

geode.pool.<name>.subscription-message-tracking-timeout

Time-to-Live period (ms), for subscription events the client has received from the server.

geode.pool.<name>.subscription-redundancy

Redundancy level for this Pools server-to-client subscriptions, which is used to ensure clients will not miss potentially important events.

geode.pool.<name>.thread-local-connections

Thread local connection policy for this Pool.


13.3 Peer Cache HealthIndicators

The peer Cache based HealthIndicators provide additional details specifically for Spring Boot, peer cache member applications. These HealthIndicators are only available when the Spring Boot application creates a peer Cache instance.

[Note]Note

The default cache instance created by Spring Boot for Apache Geode/Pivotal GemFire is a ClientCache instance.

[Tip]Tip

To control what type of cache instance is created, such as a "peer", then you can explicitly declare either the @PeerCacheApplication, or alternatively, the @CacheServerApplication, annotation on your @SpringBootApplication annotated class.

13.3.1 GeodeCacheServersHealthIndicator

The GeodeCacheServersHealthIndicator provides details about the configured Apache Geode/Pivotal GemFire CacheServers. CacheServer instances are required to enable clients to connect to the servers in the cluster.

This HealthIndicator captures basic configuration meta-data and runtime behavior/characteristics of the configured CacheServers:

Table 13.16. CacheServer Details

NameDescription

geode.cache.server.count

Total number of configured CacheServer instances on this peer member.

geode.cache.server.<index>.bind-address

IP address of the NIC to which the CacheServer ServerSocket is bound (useful when the system contains multiple NICs).

geode.cache.server.<index>.hostname-for-clients

Name of the host used by clients to connect to the CacheServer (useful with DNS).

geode.cache.server.<index>.load-poll-interval

How often (ms) to query the load probe on the CacheServer.

geode.cache.server.<index>.max-connections

Maximum number of connections allowed to this CacheServer.

geode.cache.server.<index>.max-message-count

Maximum number of messages that can be enqueued in a client queue.

geode.cache.server.<index>.max-threads

Maximum number of Threads allowed in this CacheServer to service client requests.

geode.cache.server.<index>.max-time-between-pings

Maximum time between client pings.

geode.cache.server.<index>.message-time-to-live

Time (seconds) in which the client queue will expire.

geode.cache.server.<index>.port

Network port to which the CacheServer ServerSocket is bound and listening for the client connections.

geode.cache.server.<index>.running

Determines whether this CacheServer is currently running and accepting client connections.

geode.cache.server.<index>.socket-buffer-size

Configured buffer size of the Socket connection used by this CacheServer.

geode.cache.server.<index>.tcp-no-delay

Configures the TCP/IP TCP_NO_DELAY setting on outgoing Sockets.


In addition to the configuration settings shown above, the CacheServer’s ServerLoadProbe tracks additional details about the runtime characteristics of the CacheServer, as follows:

Table 13.17. CacheServer Metrics and Load Details

NameDescription

geode.cache.server.<index>.load.connection-load

Load on the server due to client to server connections.

geode.cache.server.<index>.load.load-per-connection

Estimate of the how much load each new connection will add to this server.

geode.cache.server.<index>.load.subscription-connection-load

Load on the server due to subscription connections.

geode.cache.server.<index>.load.load-per-subscription-connection

Estimate of the how much load each new subscriber will add to this server.

geode.cache.server.<index>.metrics.client-count

Number of connected clients.

geode.cache.server.<index>.metrics.max-connection-count

Maximum number of connections made to this CacheServer.

geode.cache.server.<index>.metrics.open-connection-count

Number of open connections to this CacheServer.

geode.cache.server.<index>.metrics.subscription-connection-count

Number of subscription connections to this CacheServer.


13.3.2 GeodeAsyncEventQueuesHealthIndicator

The GeodeAsyncEventQueuesHealthIndicator provides details about the configured AsyncEventQueues. AEQs can be attached to Regions to configure asynchronous, write-behind behavior.

This HealthIndicator captures configuration meta-data and runtime characteristics for all AEQs, as follows:

Table 13.18. AsyncEventQueue Details

NameDescription

geode.async-event-queue.count

Total number of configured AEQs.

geode.async-event-queue.<id>.batch-conflation-enabled

Indicates whether batch events are conflated when sent.

geode.async-event-queue.<id>.batch-size

Size of the batch that gets delivered over this AEQ.

geode.async-event-queue.<id>.batch-time-interval

Max time interval that can elapse before a batch is sent.

geode.async-event-queue.<id>.disk-store-name

Name of the disk store used to overflow & persist events.

geode.async-event-queue.<id>.disk-synchronous

Indicates whether disk writes are sync or async.

geode.async-event-queue.<id>.dispatcher-threads

Number of Threads used to dispatch events.

geode.async-event-queue.<id>.forward-expiration-destroy

Indicates whether expiration destroy operations are forwarded to AsyncEventListener.

geode.async-event-queue.<id>.max-queue-memory

Maximum memory used before data needs to be overflowed to disk.

geode.async-event-queue.<id>.order-policy

Order policy followed while dispatching the events to AsyncEventListeners.

geode.async-event-queue.<id>.parallel

Indicates whether this queue is parallel (higher throughput) or serial.

geode.async-event-queue.<id>.persistent

Indicates whether this queue stores events to disk.

geode.async-event-queue.<id>.primary

Indicates whether this queue is primary or secondary.

geode.async-event-queue.<id>.size

Number of entries in this queue.


13.3.3 GeodeGatewayReceiversHealthIndicator

The GeodeGatewayReceiversHealthIndicator provide details about the configured (WAN) GatewayReceivers, which are capable of receiving events from remote clusters when using Apache Geode/Pivotal GemFire’s multi-site, WAN topology.

This HealthIndicator captures configuration meta-data along with the running state for each GatewayReceiver:

Table 13.19. GatewayReceiver Details

NameDescription

geode.gateway-receiver.count

Total number of configured GatewayReceivers.

geode.gateway-receiver.<index>.bind-address

IP address of the NIC to which the GatewayReceiver ServerSocket is bound (useful when the system contains multiple NICs).

geode.gateway-receiver.<index>.end-port

End value of the port range from which the GatewayReceiver’s port will be chosen.

geode.gateway-receiver.<index>.host

IP address or hostname that Locators will tell clients (i.e. GatewaySenders) that this GatewayReceiver is listening on.

geode.gateway-receiver.<index>.max-time-between-pings

Maximum amount of time between client pings.

geode.gateway-receiver.<index>.port

Port on which this GatewayReceiver listens for clients (i.e. GatewaySenders).

geode.gateway-receiver.<index>.running

Indicates whether this GatewayReceiver is running and accepting client connections (from GatewaySenders).

geode.gateway-receiver.<index>.socket-buffer-size

Configured buffer size for the Socket connections used by this GatewayReceiver.

geode.gateway-receiver.<index>.start-port

Start value of the port range from which the GatewayReceiver’s port will be chosen.


13.3.4 GeodeGatewaySendersHealthIndicator

The GeodeGatewaySendersHealthIndicator provides details about the configured GatewaySenders. GatewaySenders are attached to Regions in order to send Region events to remote clusters in Apache Geode/Pivotal GemFire’s multi-site, WAN topology.

This HealthIndicator captures essential configuration meta-data and runtime characteristics for each GatewaySender:

Table 13.20. GatewaySender Details

NameDescription

geode.gateway-sender.count

Total number of configured GatewaySenders.

geode.gateway-sender.<id>.alert-threshold

Alert threshold (ms) for entries in this GatewaySender’s queue.

geode.gateway-sender.<id>.batch-conflation-enabled

Indicates whether batch events are conflated when sent.

geode.gateway-sender.<id>.batch-size

Size of the batches sent.

geode.gateway-sender.<id>.batch-time-interval

Max time interval that can elapse before a batch is sent.

geode.gateway-sender.<id>.disk-store-name

Name of the DiskStore used to overflow and persist queue events.

geode.gateway-sender.<id>.disk-synchronous

Indicates whether disk writes are sync or async.

geode.gateway-sender.<id>.dispatcher-threads

Number of Threads used to dispatch events.

geode.gateway-sender.<id>.max-queue-memory

Maximum amount of memory (MB) usable for this GatewaySender’s queue.

geode.gateway-sender.<id>.max-parallelism-for-replicated-region

 

geode.gateway-sender.<id>.order-policy

Order policy followed while dispatching the events to GatewayReceivers.

geode.gateway-sender.<id>.parallel

Indicates whether this GatewaySender is parallel (higher throughput) or serial.

geode.gateway-sender.<id>.paused

Indicates whether this GatewaySender is paused.

geode.gateway-sender.<id>.persistent

Indicates whether this GatewaySender persists queue events to disk.

geode.gateway-sender.<id>.remote-distributed-system-id

Identifier for the remote distributed system.

geode.gateway-sender.<id>.running

Indicates whether this GatewaySender is currently running.

geode.gateway-sender.<id>.socket-buffer-size

Configured buffer size for the Socket connections between this GatewaySender and its receiving GatewayReceiver.

geode.gateway-sender.<id>.socket-read-timeout

Amount of time (ms) that a Socket read between this sending GatewaySender and its receiving GatewayReceiver will block.