Asynchronous Inline Caching with Spring

This guide walks you through building a simple Spring Boot application using Spring’s Cache Abstraction backed by Apache Geode as the caching provider for Asynchronous Inline Caching.

It is assumed that the reader is familiar with the Spring programming model. No prior knowledge of Spring’s Cache Abstraction or Apache Geode is required to utilize caching in your Spring Boot applications.

Additionally, this Sample builds on the concepts from the Inline Caching with Spring and Look-Aside Caching with Spring guides. Therefore, it would be helpful to have read those guides before proceeding through this guide.

Let’s begin.

Refer to the Inline Caching section, and specifically, Asynchronous Inline Caching, in the Caching with Apache Geode chapter of the reference documentation for more information.

1. Background

In Synchronous Inline Caching, data is immediately read from or written to the primary data source (a.k.a. the System of Record (SOR)) before the cache is modified, thereby guaranteeing a degree of consistency between the cache and the backend data source. The "synchronous" arrangement of the Inline Caching pattern is commonly referred to as "Read/Write-Through".

With Asynchronous Inline Caching, data changes are written to the primary data source asynchronously, after the cache has already been modified. The "asynchronous" arrangement of the Inline Caching pattern is commonly referred to as "Write-Behind". The cache entry is modified first, and only sometime later does the primary data source reflect the change.

Due to the asynchronous nature of Asynchronous Inline Caching, it is possible for the primary data source (i.e. the System of Record (SOR)) and the cache to be out-of-sync. The primary data source may contain information that the cache does not; for example, another application may be updating the primary data source without using the cache. Conversely, a cache entry change is not written to the primary data source until the "Write-Behind" operation is triggered, and when that happens is often implementation dependent. A data change could also violate a database constraint, fail to commit, and be rolled back. There are all sorts of reasons why the primary data source and the cache can get out-of-sync, or become inconsistent.

For this reason, throughput and latency are the primary application concerns and motivation, rather than consistency, when using the Asynchronous Inline Caching pattern.
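The window of inconsistency described above can be illustrated with a minimal, plain-Java sketch. The `WriteBehindSketch` class and its two in-memory maps are hypothetical stand-ins for the cache and the System of Record; they are not Apache Geode or SBDG APIs, and a real implementation would flush from a background thread rather than on demand.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Plain-Java model of Write-Behind. These types are hypothetical, not Apache Geode or SBDG APIs.
class WriteBehindSketch {

    private final Map<String, Integer> cache = new HashMap<>();
    private final Map<String, Integer> systemOfRecord = new HashMap<>();
    private final Queue<Map.Entry<String, Integer>> pendingWrites = new ArrayDeque<>();

    // The cache is modified immediately; the change is only queued for the backend.
    void put(String key, Integer value) {
        cache.put(key, value);
        pendingWrites.add(Map.entry(key, value));
    }

    // Invoked "sometime later" to drain the queue into the System of Record.
    // Returns the number of queued writes that were applied.
    int flush() {
        int written = 0;
        Map.Entry<String, Integer> write;
        while ((write = pendingWrites.poll()) != null) {
            systemOfRecord.put(write.getKey(), write.getValue());
            written++;
        }
        return written;
    }

    Integer readFromCache(String key) {
        return cache.get(key);
    }

    Integer readFromSystemOfRecord(String key) {
        return systemOfRecord.get(key);
    }

    public static void main(String[] args) {
        WriteBehindSketch sketch = new WriteBehindSketch();
        sketch.put("golfer", 3);
        // Between put(..) and flush() the cache and the System of Record disagree.
        System.out.println("cache=" + sketch.readFromCache("golfer")
            + ", sor=" + sketch.readFromSystemOfRecord("golfer"));
        sketch.flush();
        System.out.println("cache=" + sketch.readFromCache("golfer")
            + ", sor=" + sketch.readFromSystemOfRecord("golfer"));
    }
}
```

Between `put(..)` and `flush()`, a reader of the System of Record sees nothing for the key, which is exactly the trade-off made in exchange for lower write latency.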

The general pattern of Inline Caching is depicted as follows:

The layer in the application/system architecture involving the Inline Caching logic sits between the cache and the primary data source:

In Synchronous, Read/Write-Through, Inline Caching, the system/application architecture appears as follows:

With Asynchronous, Write-Behind, Inline Caching, the system/application architecture would instead appear as:

IMPLEMENTATION

As readers should know or will learn, the application cache is backed by an Apache Geode Region.

In Synchronous, Read-Through and/or Write-Through, Inline Caching, a CacheLoader is configured for the Region and used to "Read-Through" to the backend/primary data source on a cache miss. When a cache entry is written, a configured CacheWriter for the Region is invoked to "Write-Through" to the backend/primary data source. The cache is only modified if the CacheWriter was successful in modifying the backend/primary data source.

Both the CacheLoader and CacheWriter are optional. That is, you can configure just one side of Synchronous Inline Caching or the other, either "Read-Through" or "Write-Through", both, or neither.
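To make the synchronous contract concrete, here is a small plain-Java sketch of a cache with an optional loader and an optional writer. `InlineCacheSketch` is a hypothetical type; the `Function` and `BiConsumer` only stand in for the roles played by a CacheLoader and a CacheWriter and do not reproduce the Apache Geode interfaces.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.Function;

// Hypothetical model of Read-Through / Write-Through; not the Apache Geode API.
class InlineCacheSketch<K, V> {

    private final Map<K, V> entries = new HashMap<>();
    private final Function<K, V> loader;   // plays the CacheLoader role; may be null
    private final BiConsumer<K, V> writer; // plays the CacheWriter role; may be null

    InlineCacheSketch(Function<K, V> loader, BiConsumer<K, V> writer) {
        this.loader = loader;
        this.writer = writer;
    }

    // Read-Through: on a cache miss, consult the backend and cache the result.
    V get(K key) {
        V value = entries.get(key);
        if (value == null && loader != null) {
            value = loader.apply(key);
            if (value != null) {
                entries.put(key, value);
            }
        }
        return value;
    }

    // Write-Through: the backend is written first; if the writer throws,
    // the exception propagates and the cache is left unmodified.
    void put(K key, V value) {
        if (writer != null) {
            writer.accept(key, value);
        }
        entries.put(key, value);
    }

    boolean isCached(K key) {
        return entries.containsKey(key);
    }
}
```

Passing `null` for either argument models configuring only one side of the pattern, and passing `null` for both degrades to a plain cache.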

With Asynchronous, Write-Behind, Inline Caching, you configure the Region with an associated AsyncEventQueue (AEQ) and a registered AsyncEventListener. When the cache is written to, the entry event is forwarded to and stored on the AEQ. At some later time, the AsyncEventListener registered on the AEQ is invoked to process the (batch of) AsyncEvents, and can then asynchronously modify the backend/primary data source.

Unlike Synchronous Inline Caching, Asynchronous Inline Caching does not have an equivalent for "Read-Through", such as "Read-Behind", particularly in a Reactive sense.

At some point later, we may consider the development of "Read-Behind" with the use of Reactive Programming and the Reactive Spring Data Repository abstraction.

It should also be intuitive that the listener registered on the AEQ attached to the (cache) Region does not have to process the events by writing to a backend data store. It could write to a message queue, to the file system, or do just about anything a user desires. However, out of the box, SBDG provides support to inject a Spring Data Repository into an AEQ listener to write to any backend data store supported by the Spring Data Repository abstraction.

2. Example

For our example, we have built a Golf Tournament application that runs a simulation with a set of professional golfers playing at The Masters. The 12 golfers play 18 holes of golf in pairs and proceed from hole 1 to hole 18 in under a minute. For each hole played, their score for the hole is calculated. At the end of the round, each golfer’s final score is calculated relative to par for the golf course (72).
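As a quick illustration of scoring relative to par, the following sketch sums the difference between strokes and par on each hole. The `ScoreSketch` class is hypothetical and is not part of the Sample; the par values in `main` are illustrative and simply add up to 72.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the Sample code.
class ScoreSketch {

    // Sums (strokes - par) over every hole; negative means under par.
    static int relativeToPar(List<Integer> strokesPerHole, List<Integer> parPerHole) {
        if (strokesPerHole.size() != parPerHole.size()) {
            throw new IllegalArgumentException("Expected one stroke count per hole");
        }
        int score = 0;
        for (int hole = 0; hole < parPerHole.size(); hole++) {
            score += strokesPerHole.get(hole) - parPerHole.get(hole);
        }
        return score;
    }

    public static void main(String[] args) {
        // Illustrative pars for 18 holes that total 72.
        List<Integer> pars = List.of(4, 5, 4, 3, 4, 3, 4, 5, 4, 4, 4, 3, 5, 4, 5, 3, 4, 4);
        List<Integer> strokes = new ArrayList<>(pars);
        strokes.set(1, 4);  // one under par on hole 2
        strokes.set(11, 4); // one over par on hole 12
        strokes.set(17, 5); // one over par on hole 18
        System.out.println(relativeToPar(strokes, pars));
    }
}
```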

The Golf Tournament application is a Spring Boot application using Apache Geode to cache each golfer’s score in real time as the players complete each hole. However, to make the play "official", the golfer’s score is recorded to a backend database (RDBMS) asynchronously, using Asynchronous, Write-Behind, Inline Caching. It is assumed that there is additional validation required (e.g. signing scorecards) before the final score is accepted and recorded to the System of Record (SOR), in the "history books", so to speak.

Now that the problem context has been established, let’s review a few of the application classes.

The application domain classes shown below are abbreviated previews of the actual classes, not the complete code. See the actual Sample code for more detail.

We start by defining our Golf Tournament application domain model types, starting with the Golfer class. Essentially, the Golfer class models a person who plays golf and is defined as:

Golfer class.

@Entity
@Table(name = "golfers")
public class Golfer implements Comparable<Golfer> {

    @javax.persistence.Id @Id
    private String name;

    private Integer hole = 0;

    private Integer score = 0;

}

The Golfer class has been annotated with JPA’s @Entity annotation making it a proper (persistent) entity class. The Golfer class is also annotated with @Table to persist instances of Golfer into the "golfers" table of the database.

The application also defines a non-entity, GolfCourse class to model the golf course, which requires a name and List of pars for each hole (all 18 holes) of the golf course:

GolfCourse class

class GolfCourse {

    private final String name;

    private final List<Integer> parForHole = new ArrayList<>(18);

}

Next, a non-entity, GolfTournament class has been defined to model the golf tournament being played. It expects a name for the tournament, the GolfCourse where the tournament is held and played, and a Set of Golfers (players) registered to play.

Additionally, the GolfTournament class contains an inner class, the Pairing class, to group the registered players into pairs to play a round.

GolfTournament class

class GolfTournament implements Iterable<Pairing> {

    private final String name;

    private GolfCourse golfCourse;

    private final List<Pairing> pairings = new ArrayList<>();

    private final Set<Golfer> players = new HashSet<>();

    public static class Pairing {

        private final Golfer playerOne;
        private final Golfer playerTwo;

    }
}

The GolfTournament.Pairing class serves as a composite acting on both players in the pair, such as to advance the hole of play.

The GolfTournament class has additional builder methods to register players, build pairings, enable the tournament to be played and determine when the tournament is finished (i.e. when all pairs complete all 18 holes of play).
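The Sample's actual pairing logic is not shown in this guide. As a rough sketch only, grouping registered players into pairs and checking whether every pair has completed 18 holes might look like the following. `PairingSketch` and its methods are hypothetical, and players are represented by plain names instead of Golfer instances.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; the Sample's GolfTournament may build pairings differently.
class PairingSketch {

    // Groups consecutive players into pairs, in registration order.
    static List<List<String>> buildPairings(List<String> players) {
        if (players.size() % 2 != 0) {
            throw new IllegalArgumentException("An even number of players is required");
        }
        List<List<String>> pairings = new ArrayList<>();
        for (int index = 0; index < players.size(); index += 2) {
            pairings.add(List.of(players.get(index), players.get(index + 1)));
        }
        return pairings;
    }

    // The tournament is finished once every pair has completed all 18 holes.
    static boolean isFinished(List<Integer> holesCompletedPerPairing) {
        return holesCompletedPerPairing.stream().allMatch(holes -> holes >= 18);
    }

    public static void main(String[] args) {
        System.out.println(buildPairings(List.of("A", "B", "C", "D")));
        System.out.println(isFinished(List.of(18, 17)));
    }
}
```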

There is a GolferRepository interface extending the JpaRepository interface to persist the state of each Golfer to the backend database:

GolferRepository interface

interface GolferRepository extends JpaRepository<Golfer, String> { }

While GolferRepository extends the JpaRepository interface directly, it is generally recommended to extend the CrudRepository interface instead, keeping your application's Spring Data (SD) Repositories agnostic of the underlying data store. GolferRepository extends JpaRepository directly only to make it absolutely clear that the Golfer state will be persisted to a backend database (RDBMS) with JPA, using Hibernate as the provider.

The GolferRepository will be used by SBDG’s Asynchronous Inline Caching framework and infrastructure components.

The Repository is injected into and used by the AsyncEventListener registered on the AEQ attached to the "Golfers" Region to perform Asynchronous, Write-Behind, Inline Caching, operations to the backend database and System of Record (SOR).

We’ll see in a moment how this association is made and how Asynchronous Inline Caching is set up, made simple by SBDG.

To encapsulate the application logic and provide a (possibly transactional) facade to the Golfer’s state, a GolferService class has been defined:

GolferService class

@Service
class GolferService {

	@CachePut(cacheNames = "Golfers", key = "#golfer.name")
	public Golfer update(Golfer golfer) {
		return golfer;
	}

    public List<Golfer> getAllGolfersFromCache() {
        // Use SDG GemfireTemplate to access the "Golfers" Region
    }

    public List<Golfer> getAllGolfersFromDatabase() {
        // Use the GolferRepository to access the Golfers stored in the database.
    }
}

The GolferService class has been marked as an application service using Spring’s @Service stereotype annotation.

Along with the GolferService, the application uses a PgaTourService class to manage and run a (single) GolfTournament. Its primary method for running a GolfTournament is the play() method:

PgaTourService class, play() method

	@Scheduled(initialDelay = 5000L, fixedDelay = 2500L)
	public void play() {

		GolfTournament golfTournament = this.golfTournament;

		if (isNotFinished(golfTournament)) {
			playHole(golfTournament);
			finish(golfTournament);
		}
	}

This is a Spring @Scheduled service method called every 2.5 seconds after an initial delay of 5 seconds. Over successive invocations, the service method iterates through the pairings until each Golfer has played all 18 holes. Scores are calculated and recorded for each hole until the round is completed, at which point each player’s score is calculated relative to par for the golf course and recorded to the cache, which eventually updates the database.

To get everything started, a Spring Boot application class (i.e. a class annotated with the @SpringBootApplication annotation) is used to bootstrap the Golf Tournament application.

BootGeodeAsyncInlineCachingClientApplication class

@SpringBootApplication
@SuppressWarnings("unused")
public class BootGeodeAsyncInlineCachingClientApplication {

	private static final String APPLICATION_NAME = "GolfClientApplication";

	public static void main(String[] args) {
		SpringApplication.run(BootGeodeAsyncInlineCachingClientApplication.class, args);
	}

	@Configuration
	@EnableScheduling
	static class GolfApplicationConfiguration {

		@Bean
		ApplicationRunner runGolfTournament(PgaTourService pgaTourService) {

			return args -> {

				GolfTournament golfTournament = GolfTournament.newGolfTournament("The Masters")
					.at(GolfCourseBuilder.buildAugustaNational())
					.register(GolferBuilder.buildGolfers(GolferBuilder.FAVORITE_GOLFER_NAMES))
					.buildPairings()
					.play();

				pgaTourService.manage(golfTournament);

			};
		}
	}

	@Configuration
	@UseMemberName(APPLICATION_NAME)
	@EnableCachingDefinedRegions(serverRegionShortcut = RegionShortcut.REPLICATE)
	static class GeodeConfiguration { }

	@PeerCacheApplication
	@Profile("peer-cache")
	@Import({ AsyncInlineCachingConfiguration.class, AsyncInlineCachingRegionConfiguration.class })
	static class PeerCacheApplicationConfiguration { }

}

The GolfTournament is kicked off in the ApplicationRunner.

ApplicationRunner bean in the GolfApplicationConfiguration class

	@Configuration
	@EnableScheduling
	static class GolfApplicationConfiguration {

		@Bean
		ApplicationRunner runGolfTournament(PgaTourService pgaTourService) {

			return args -> {

				GolfTournament golfTournament = GolfTournament.newGolfTournament("The Masters")
					.at(GolfCourseBuilder.buildAugustaNational())
					.register(GolferBuilder.buildGolfers(GolferBuilder.FAVORITE_GOLFER_NAMES))
					.buildPairings()
					.play();

				pgaTourService.manage(golfTournament);

			};
		}
	}

As the golf tournament progresses (in the @Scheduled, PgaTourService.play() service method), updates to the Golfers in the pairs are written to the "Golfers" cache (i.e. "Golfers" Region) by calling the GolferService.update(:Golfer) service method:

GolferService class, update(:Golfer) method

	@CachePut(cacheNames = "Golfers", key = "#golfer.name")
	public Golfer update(Golfer golfer) {
		return golfer;
	}

This service method simply "puts" the Golfer in the cache (i.e. "Golfers" Region) mapped to the Golfer’s name (as a key/value cache entry).

The cache/Region entry put operation results in a cache event being added to the AEQ, which will eventually trigger the SBDG framework-provided AsyncEventListener with our injected GolferRepository to write the Golfer’s state to the backend database.

The configuration of the "Golfers" Region (cache) with an AEQ and listener using the GolferRepository is defined as follows:

AsyncInlineCachingConfiguration class

@Configuration
@SuppressWarnings("unused")
public class AsyncInlineCachingConfiguration {

	protected static final String GOLFERS_REGION_NAME = "Golfers";

	@Bean
	@Profile("queue-batch-size")
	AsyncInlineCachingRegionConfigurer<Golfer, String> batchSizeAsyncInlineCachingConfigurer(
			@Value("${spring.geode.sample.async-inline-caching.queue.batch-size:25}") int queueBatchSize,
			GolferRepository golferRepository) {

		return AsyncInlineCachingRegionConfigurer.create(golferRepository, GOLFERS_REGION_NAME)
			.withQueueBatchConflationEnabled()
			.withQueueBatchSize(queueBatchSize)
			.withQueueBatchTimeInterval(Duration.ofMinutes(15))
			.withQueueDispatcherThreadCount(1);
	}

	@Bean
	@Profile("queue-batch-time-interval")
	AsyncInlineCachingRegionConfigurer<Golfer, String> batchTimeIntervalAsyncInlineCachingConfigurer(
			@Value("${spring.geode.sample.async-inline-caching.queue.batch-time-interval-ms:5000}") int queueBatchTimeIntervalMilliseconds,
			GolferRepository golferRepository) {

		return AsyncInlineCachingRegionConfigurer.create(golferRepository, GOLFERS_REGION_NAME)
			.withQueueBatchSize(1000000)
			.withQueueBatchTimeInterval(Duration.ofMillis(queueBatchTimeIntervalMilliseconds))
			.withQueueDispatcherThreadCount(1);
	}
}

The Spring @Configuration class used to enable Async Inline Caching consists of 2 different AEQ configurations and bean definitions.

The first is an AEQ configured with a "preference" for being triggered on the batch size, i.e. the number of events present in the AEQ:

AEQ batch size configuration

	@Bean
	@Profile("queue-batch-size")
	AsyncInlineCachingRegionConfigurer<Golfer, String> batchSizeAsyncInlineCachingConfigurer(
			@Value("${spring.geode.sample.async-inline-caching.queue.batch-size:25}") int queueBatchSize,
			GolferRepository golferRepository) {

		return AsyncInlineCachingRegionConfigurer.create(golferRepository, GOLFERS_REGION_NAME)
			.withQueueBatchConflationEnabled()
			.withQueueBatchSize(queueBatchSize)
			.withQueueBatchTimeInterval(Duration.ofMinutes(15))
			.withQueueDispatcherThreadCount(1);
	}

The second AEQ configuration uses a "preference" for being triggered based on a batch time interval, i.e. after a period of time has elapsed, such as 5 seconds.

AEQ batch time interval configuration

	@Bean
	@Profile("queue-batch-time-interval")
	AsyncInlineCachingRegionConfigurer<Golfer, String> batchTimeIntervalAsyncInlineCachingConfigurer(
			@Value("${spring.geode.sample.async-inline-caching.queue.batch-time-interval-ms:5000}") int queueBatchTimeIntervalMilliseconds,
			GolferRepository golferRepository) {

		return AsyncInlineCachingRegionConfigurer.create(golferRepository, GOLFERS_REGION_NAME)
			.withQueueBatchSize(1000000)
			.withQueueBatchTimeInterval(Duration.ofMillis(queueBatchTimeIntervalMilliseconds))
			.withQueueDispatcherThreadCount(1);
	}

The default AEQ batch time interval in Apache Geode is 5 milliseconds (5 ms). However, to demonstrate the asynchronous nature of the cache to database updates, a much longer delay was used. Likewise, the default AEQ batch size in Apache Geode is 100.

In both AEQ configurations and bean definitions, the batch size and batch time interval have been set (overriding the Apache Geode defaults) in order to show the effects of each AEQ configuration independently. As you can imagine, particularly in a highly concurrent and transactional application with frequent updates, it would be hard to determine whether the AEQ event processing (via the listener) was triggered by the batch time interval or the batch size. And, with a default 5 millisecond batch time interval, it is hard to witness the asynchronous nature of the cache to database updates to begin with.
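To see why each profile isolates one trigger, consider a deliberately simplified model in which a batch is dispatched when either threshold is reached. The `BatchTriggerSketch` class is hypothetical and is not how Apache Geode implements its dispatcher; it only captures the "whichever comes first" behavior assumed here, using the values from the two bean definitions above.

```java
// Simplified, hypothetical model of an AEQ dispatch decision; not Apache Geode's implementation.
class BatchTriggerSketch {

    private final int batchSize;
    private final long batchTimeIntervalMillis;

    BatchTriggerSketch(int batchSize, long batchTimeIntervalMillis) {
        this.batchSize = batchSize;
        this.batchTimeIntervalMillis = batchTimeIntervalMillis;
    }

    // A batch is dispatched when the queue is non-empty and either threshold has been reached.
    boolean shouldDispatch(int queuedEvents, long millisSinceLastDispatch) {
        if (queuedEvents == 0) {
            return false;
        }
        return queuedEvents >= batchSize || millisSinceLastDispatch >= batchTimeIntervalMillis;
    }

    public static void main(String[] args) {
        // "queue-batch-size" profile: 25 events, 15 minute interval.
        BatchTriggerSketch bySize = new BatchTriggerSketch(25, 15 * 60 * 1000L);
        // "queue-batch-time-interval" profile: 1,000,000 events, 5 second interval.
        BatchTriggerSketch byTime = new BatchTriggerSketch(1_000_000, 5_000L);

        System.out.println(bySize.shouldDispatch(24, 5_000L));
        System.out.println(bySize.shouldDispatch(25, 5_000L));
        System.out.println(byTime.shouldDispatch(12, 4_999L));
        System.out.println(byTime.shouldDispatch(12, 5_000L));
    }
}
```

With a 15 minute interval, the size threshold is effectively the only trigger during a simulation lasting under a minute; with a batch size of 1,000,000, the time interval is effectively the only trigger.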

We will have more to say on the AEQ configuration below, in the conclusion.

The final class in the golf application is a GolferController class annotated with Spring’s @RestController annotation in order to expose our golf application functionality as an API in a REST-ful interface:

GolferController class

@RestController
@RequestMapping("/api/golf/tournament")
@SuppressWarnings("unused")
public class GolferController {

	private final GolferService golferService;

	public GolferController(@NonNull GolferService golferService) {

		Assert.notNull(golferService, "GolferService must not be null");

		this.golferService = golferService;
	}

	protected @NonNull GolferService getGolferService() {
		return this.golferService;
	}

	@GetMapping("/cache")
	public List<Golfer> getGolfersFromCache() {
		return getGolferService().getAllGolfersFromCache();
	}

	@GetMapping("/database")
	public List<Golfer> getGolfersFromDatabase() {
		return getGolferService().getAllGolfersFromDatabase();
	}
}

The Spring Web MVC @RestController class exposes two REST-ful API web service endpoints returning JSON data:

http://localhost:8080/api/golf/tournament/cache - used to get the current state of the Golfers from the cache
http://localhost:8080/api/golf/tournament/database - used to get the current state of the Golfers from the database

Both web service endpoints are consumed by the golf-tournament-view.html page, which uses jQuery and AJAX to make periodic HTTP requests to refresh the page.

3. Run the Example

To run the example, there are a few more configuration details we need to cover.

While it is possible to run this example using an Apache Geode client/server topology, we keep things simple by running the example using a single Spring Boot application class, namely the BootGeodeAsyncInlineCachingClientApplication along with a peer cache configuration.

That is, in our BootGeodeAsyncInlineCachingClientApplication class, we also apply the PeerCacheApplicationConfiguration by enabling the Spring Profile, "peer-cache":

PeerCacheApplicationConfiguration class

	@PeerCacheApplication
	@Profile("peer-cache")
	@Import({ AsyncInlineCachingConfiguration.class, AsyncInlineCachingRegionConfiguration.class })
	static class PeerCacheApplicationConfiguration { }

It should be noted that AEQs can only be created and registered on Regions existing on the server-side of an Apache Geode system. That is, you cannot add an AEQ to a client-side Region. Therefore, in all your Asynchronous Inline Caching use cases, it will be the servers in an Apache Geode cluster that are responsible for the Write-Behind functionality to the backend data store, not a Spring Boot, Apache Geode client application.

However, for demonstration purposes, we override SBDG’s auto-configuration providing a ClientCache instance by default simply by enabling the "peer-cache" Spring Profile, which replaces the ClientCache instance with a peer Cache instance instead.

Finally, when running this application, you must decide on your AEQ management strategy.

For example, do you want the AEQ listener to be triggered by batch size (i.e. the number of cache events) or by the batch time interval? Each strategy can be enabled using a Spring Profile, either "queue-batch-size" or "queue-batch-time-interval". This allows you to experiment with different AEQ management strategies and observe the effects.

In total, the Spring Profiles you need to enable would appear as follows:

Spring Profiles to enable when running the application.

-Dspring.profiles.active=peer-cache,queue-batch-size,server

Of course, you can replace "queue-batch-size" with "queue-batch-time-interval".

The final run configuration of the Spring Boot application, as seen in IntelliJ IDEA is:

BootGeodeAsyncInlineCachingClientApplication IntelliJ IDEA Run Configuration

To access the golf application, simply navigate to:

http://localhost:8080/golf-tournament-view.html

You should see a web page similar to:

You can also run this example using the SBDG Gradle build from the command-line like so:

Run the example using Gradle

$ gradlew --no-daemon :spring-geode-sample-caching-inline-async:bootRun

This is convenient since the Spring Profiles are already configured for you.

When you switch to the "queue-batch-time-interval" Spring Profile, you will see similar behavior, but on a slightly different schedule for the database updates, i.e. at a fixed 5 second interval.

4. Conclusion

Asynchronous Inline Caching can be a powerful pattern of caching applied to your Spring Boot application workflows depending on the use case and requirements.

If throughput and latency are absolutely critical to your application design in order to achieve the responsiveness and quality of experience your users expect, and consistency (i.e. between the cache and the backend System of Record (SOR), or database) is less of a concern, then you might want to consider Asynchronous Inline Caching.

There are many factors to consider in the configuration of the AEQ that is at the heart of the Asynchronous Inline Caching pattern, such as the appropriate batch size and batch time interval. Neither setting is exclusive from the other, in fact. Both settings are considered when Apache Geode makes a decision of when to trigger the listener registered on the AEQ to process the events for operations originating from the Region to which the AEQ is attached.

You must decide on the batch size, based on how many events might occur in a given period of time. If the frequency is quite high, then you might need a smaller batch size, for instance. The AEQ is in-memory after all, therefore you must be conscious of memory constraints on your system, especially during peak loads. Of course, the AEQ can be configured to overflow events to disk and even persist events between restarts, but ideally you want these events to be processed in as near realtime as possible.

However, when the load on your application is low and events occur sporadically, you must also be mindful that the events do not sit in the AEQ for too long. If you have a batch size of 1000, and there are currently only 20 events (or any number of events fewer than 1000) sitting in the AEQ waiting to be processed, then the batch time interval becomes important, so that these remaining events don’t wait in the queue indefinitely. The configured Queue Dispatcher Thread count plays into this as well.

Another factor to consider is whether you can conflate the events in the AEQ. Conflation reduces the events for a single logical Object to just the latest update. Additionally, do you need to overflow events to disk after the configured maximum queue memory is reached, or should events simply be discarded? Do you need to maintain the events in the queue between restarts (i.e. configure the AEQ to be persistent)? Do the disk writes for overflow and/or persistence need to be synchronous? Do the events in the queue need to be ordered based on some OrderPolicy? Do the events need to be filtered? How many dispatcher threads do you require? And so on.
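The effect of conflation can be sketched in plain Java by keeping only the most recent value per key. The `ConflationSketch` class is hypothetical; it models the idea only and makes no claim about how Apache Geode orders or stores conflated events internally.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of batch conflation; not Apache Geode's implementation.
class ConflationSketch {

    // Collapses a sequence of (key, value) events to the latest value per key.
    // Keys keep the position at which they were first seen.
    static Map<String, Integer> conflate(List<Map.Entry<String, Integer>> events) {
        Map<String, Integer> latest = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> event : events) {
            latest.put(event.getKey(), event.getValue());
        }
        return latest;
    }

    public static void main(String[] args) {
        // Three updates for two golfers; only two writes reach the backend after conflation.
        List<Map.Entry<String, Integer>> events = List.of(
            Map.entry("golferOne", 1),
            Map.entry("golferTwo", 2),
            Map.entry("golferOne", 3));
        System.out.println(conflate(events).size());
    }
}
```

In the golf application, conflation would mean that several per-hole updates for the same Golfer waiting in one batch result in a single database write carrying the latest state.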

There are many important things to consider in the configuration of the AEQ when using Asynchronous Inline Caching for Write-Behind capabilities. Usually, it is safe to start with the defaults and adjust as needed, as your measurements and tests dictate.

We hope that you found this guide useful and informative when tackling difficult problems, the kind of application problems where the Asynchronous Inline Caching pattern can be applied with immediate benefits.
