Chapter 6. Retry

6.1. RetryTemplate

To make processing more robust and less prone to failure, sometimes it helps to automatically retry a failed operation in case it might succeed on a subsequent attempt. Errors that are susceptible to this kind of treatment are transient in nature, for example a remote call to a web service or RMI service that fails because of a network glitch, or a DeadlockLoserDataAccessException in a database update. To automate the retry of such operations, Spring Batch has the RetryOperations strategy. The RetryOperations interface looks like this:

public interface RetryOperations {

    Object execute(RetryCallback retryCallback) throws Exception;

}

where the callback is a simple interface that allows you to insert some business logic to be retried:

public interface RetryCallback {

    Object doWithRetry(RetryContext context) throws Throwable;

}

The callback is executed and if it fails (by throwing an Exception), it will be retried until either it is successful, or the implementation decides to abort.

The simplest general purpose implementation of RetryOperations is RetryTemplate. It could be used like this:

RetryTemplate template = new RetryTemplate();

template.setRetryPolicy(new TimeoutRetryPolicy(30000L));

Object result = template.execute(new RetryCallback() {

    public Object doWithRetry(RetryContext context) throws Throwable {
        // Do stuff that might fail, e.g. webservice operation.
        // remoteService is a placeholder for your own client object.
        return remoteService.call();
    }

});

In the example we execute a web service call and return the result to the user. If that call fails then it is retried until a timeout is reached.

6.1.1. RetryContext

The method parameter for the RetryCallback is a RetryContext. Many callbacks will simply ignore the context, but if necessary it can be used as an attribute bag to store data for the duration of the iteration.

A RetryContext will have a parent context if there is a nested retry in progress in the same thread. The parent context is occasionally useful for storing data that need to be shared between calls to execute.

6.2. Retry Policies

Inside a RetryTemplate the decision to retry or fail in the execute method is determined by a RetryPolicy, which is also a factory for the RetryContext. The RetryTemplate has the responsibility to use the current policy to create a RetryContext and pass it to the RetryCallback at every attempt. After a callback fails the RetryTemplate has to make a call to the RetryPolicy to ask it to update its state (which will be stored in the RetryContext), and then it asks the policy if another attempt can be made. If another attempt cannot be made (e.g. a limit is reached or a timeout is detected) then the policy is also responsible for handling the exhausted state. Simple implementations will just throw RetryExhaustedException, and any enclosing transaction will be rolled back. More sophisticated implementations might attempt to take some recovery action, in which case the transaction can remain intact.

Tip

Failures are inherently either retryable or not - if the same exception is always going to be thrown from the business logic, it doesn't help to retry it. So don't retry on all exception types - try to focus on only those exceptions that you expect to be retryable. It's not usually harmful to the business logic to retry more aggressively, but it's wasteful, because if a failure is deterministic there could be a very tight loop retrying something that you know in advance is fatal.

6.2.1. Stateless Retry

In the simplest case a retry is just a while loop - the RetryTemplate can just keep trying until it either succeeds or fails. The RetryContext contains some state to determine whether to retry or abort, but this state is on the stack and there is no need to store it anywhere globally, so we call this stateless retry. The distinction between stateless and stateful retry is contained in the implementation of the RetryPolicy (the RetryTemplate can handle both). In a stateless retry, the callback is always executed in the same thread on retry as when it failed.
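The while loop described above can be sketched in plain Java. This is only an illustration under stated assumptions, not the RetryTemplate source: StatelessRetrySketch and maxAttempts are hypothetical names, a plain counter stands in for the RetryContext and RetryPolicy, and only unchecked exceptions are handled to keep the sketch short.

```java
import java.util.function.Supplier;

// Hypothetical sketch, not Spring Batch code: the retry state is a local
// variable, so it lives only on the stack of the calling thread.
class StatelessRetrySketch {

    static <T> T execute(Supplier<T> callback, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        int attempts = 0; // stands in for the state held in a RetryContext
        while (true) {
            try {
                return callback.get(); // success ends the loop
            } catch (RuntimeException e) {
                attempts++;
                if (attempts >= maxAttempts) {
                    throw e; // exhausted: give up with the last failure
                }
                // otherwise go round again, in the same thread
            }
        }
    }
}
```

Because nothing is stored outside the method, the state is lost as soon as the exception leaves execute, which is why this shape is not sufficient for the stateful case discussed later in this chapter.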

Spring Batch provides some simple general purpose implementations of stateless RetryPolicy, for example a SimpleRetryPolicy, and the TimeoutRetryPolicy used in the example above.

The SimpleRetryPolicy just allows a retry on any of a named list of exception types, up to a fixed number of times. It also has a list of "fatal" exceptions that should never be retried, and this list overrides the retryable list, so it can be used to give finer control over the retry behaviour, e.g.

SimpleRetryPolicy policy = new SimpleRetryPolicy(5);
// Retry on all exceptions (this is the default)
policy.setRetryableExceptions(new Class[] {Exception.class});
// ... but never retry IllegalStateException
policy.setFatalExceptions(new Class[] {IllegalStateException.class});

// Use the policy...
RetryTemplate template = new RetryTemplate();
template.setRetryPolicy(policy);
template.execute(new RetryCallback() {
    public Object doWithRetry(RetryContext context) {
        // business logic here
        return null; // or whatever the business logic produces
    }
});
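The rule that the fatal list overrides the retryable list can be illustrated with a small standalone check. This is a sketch of the rule as described in the text, not the SimpleRetryPolicy implementation; RetryableCheckSketch and canRetry are hypothetical names, and the attempt limit is left out.

```java
// Hypothetical sketch, not Spring Batch code: decides whether a failure
// may be retried, given a retryable list and an overriding fatal list.
class RetryableCheckSketch {

    static boolean canRetry(Throwable failure, Class<?>[] retryable, Class<?>[] fatal) {
        // The fatal list is consulted first, so it always wins.
        for (Class<?> type : fatal) {
            if (type.isInstance(failure)) {
                return false;
            }
        }
        // Subclasses of a listed type match as well as the type itself.
        for (Class<?> type : retryable) {
            if (type.isInstance(failure)) {
                return true;
            }
        }
        return false;
    }
}
```

With Exception in the retryable list and IllegalStateException in the fatal list, as in the configuration above, an IllegalStateException is rejected even though it is also an Exception.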

There is also a more flexible implementation called ExceptionClassifierRetryPolicy, which allows the user to configure different retry behaviour for an arbitrary set of exception types through the ExceptionClassifier abstraction. The policy works by calling on the classifier to convert an exception into a delegate RetryPolicy, so for example, one exception type can be retried more times before failure than another by mapping it to a different policy.
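The idea of classifying an exception and delegating the decision can be sketched as follows. This is an assumption-laden illustration rather than the ExceptionClassifierRetryPolicy API: ClassifierSketch, map and maxAttemptsFor are hypothetical names, and a plain attempt limit stands in for each delegate RetryPolicy.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not Spring Batch code: maps exception types to an
// attempt limit, which stands in for a delegate retry policy.
class ClassifierSketch {

    // Insertion-ordered, so register more specific exception types first.
    private final Map<Class<? extends Throwable>, Integer> limits = new LinkedHashMap<>();
    private final int defaultLimit;

    ClassifierSketch(int defaultLimit) {
        this.defaultLimit = defaultLimit;
    }

    ClassifierSketch map(Class<? extends Throwable> type, int maxAttempts) {
        limits.put(type, maxAttempts);
        return this;
    }

    // Classify the failure and return the limit of the first matching entry.
    int maxAttemptsFor(Throwable failure) {
        for (Map.Entry<Class<? extends Throwable>, Integer> entry : limits.entrySet()) {
            if (entry.getKey().isInstance(failure)) {
                return entry.getValue();
            }
        }
        return defaultLimit;
    }
}
```

For instance, mapping IOException to 5 and IllegalArgumentException to 2 lets I/O failures be retried more often than argument errors, while anything unmapped falls back to the default.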

Users might need to implement their own retry policies for more customized decisions, e.g. if there is a well-known solution-specific classification of exceptions into retryable and not retryable.

6.2.2. Stateful Retry

Where the failure has caused a transactional resource to become invalid there are some special considerations. This does not apply to a simple remote call because there was no transactional resource (usually), but it does sometimes apply to a database update, especially when using Hibernate. In this case it only makes sense to rethrow the exception that caused the failure immediately, so that the transaction can roll back, and we can start a new valid one.

In these cases a stateless retry is not good enough because the re-throw and roll back necessarily involve leaving the RetryOperations.execute() method and potentially losing the context that was on the stack. To avoid losing it we have to introduce a storage strategy to lift it off the stack and put it (at a minimum) in heap storage. For this purpose Spring Batch provides a storage strategy RetryContextCache. The default implementation of the RetryContextCache is in memory, using a simple Map. Advanced usage with multiple processes in a clustered environment might also consider implementing the RetryContextCache with a cluster cache of some sort, though even in a clustered environment this might be overkill.

6.2.2.1. Item processing and stateful retry

Part of the responsibility of a stateful retry policy is to recognise the failed operations when they come back in a new transaction. To facilitate this in the commonest case where an object (like a message or message payload) is being processed, Spring Batch provides the ItemWriterRetryPolicy. This works in conjunction with a special RetryCallback implementation ItemWriterRetryCallback, which in turn relies on the user providing an ItemWriter. This callback implements the common pattern where it passes the item to a writer.

The way the failed operations are recognised in this implementation is by identifying the item across multiple invocations of the retry. To identify the item the user can provide an ItemKeyGenerator strategy, and this is responsible for returning a unique key identifying the item. The identifier is used as a key in the RetryContextCache. An ItemKeyGenerator can be provided either by injecting it directly into the ItemWriterRetryCallback, or by implementing the interface in the ItemWriter, or by accepting the default which is to simply use the item itself as a key.

Warning

If you use the default item key generation strategy be very careful with the implementation of Object.equals() and Object.hashCode() in your item class. In particular, if the ItemWriter is going to insert the item into a database and update a primary key field, it is not a good idea to use the primary key in the equals and hashCode implementations, because their values will differ before and after the call to the ItemWriter. The best advice is to use a business key to identify the items.
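As an illustration of this advice, here is a minimal item class whose identity depends only on a business key. The class name Invoice and the field invoiceNumber are hypothetical; substitute whatever uniquely identifies an item in your domain.

```java
import java.util.Objects;

// Hypothetical item class: equality is based on the business key only,
// so it is unaffected when the writer assigns the primary key.
class Invoice {

    private Long id;                    // primary key, null until inserted
    private final String invoiceNumber; // business key, fixed at creation

    Invoice(String invoiceNumber) {
        this.invoiceNumber = Objects.requireNonNull(invoiceNumber);
    }

    Long getId() {
        return id;
    }

    void setId(Long id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof Invoice)) {
            return false;
        }
        return invoiceNumber.equals(((Invoice) other).invoiceNumber);
    }

    @Override
    public int hashCode() {
        return invoiceNumber.hashCode();
    }
}
```

Because id plays no part in equals or hashCode, an item used as a key in a Map can still be found after the primary key has been set.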

When the retry is exhausted, because a stateful retry is always in a fresh transaction, there is also the option to handle the failed item in a different way, instead of calling the RetryCallback (which is presumed now to be likely to fail). This option is provided by the ItemRecoverer strategy. Like the key generator, it can be directly injected or provided by implementing the interface in the ItemWriter.

The decision to retry or not is actually delegated to a regular stateless retry policy, so the usual concerns about limits and timeouts can be injected into the ItemWriterRetryPolicy through the delegate property.
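The interplay between the item key, the cache and the recovery path might be sketched like this. It is a simplified illustration, not the ItemWriterRetryPolicy implementation: StatefulRetrySketch and handle are hypothetical names, a Map of failure counts stands in for the RetryContextCache, and two Consumer instances stand in for the ItemWriter and ItemRecoverer. Each call to handle is assumed to happen in its own transaction.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch, not Spring Batch code: retry state is kept on the
// heap, keyed by item, so it survives the exception leaving the method.
class StatefulRetrySketch<T> {

    private final Map<Object, Integer> failureCounts = new HashMap<>();
    private final int maxAttempts;
    private final Consumer<T> writer;
    private final Consumer<T> recoverer;

    StatefulRetrySketch(int maxAttempts, Consumer<T> writer, Consumer<T> recoverer) {
        this.maxAttempts = maxAttempts;
        this.writer = writer;
        this.recoverer = recoverer;
    }

    // Called once per transaction with the (possibly redelivered) item.
    void handle(T item) {
        Object key = item; // default key: the item itself
        int failures = failureCounts.getOrDefault(key, 0);
        if (failures >= maxAttempts) {
            failureCounts.remove(key);
            recoverer.accept(item); // exhausted: recover instead of writing
            return;
        }
        try {
            writer.accept(item);
            failureCounts.remove(key); // success: forget the item
        } catch (RuntimeException e) {
            failureCounts.put(key, failures + 1);
            throw e; // rethrow at once so the transaction can roll back
        }
    }
}
```

Note that the lookup depends on the key comparing equal across transactions, which is why the warning above about equals and hashCode matters when the item itself is the key.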

6.3. Backoff Policies

When retrying after a transient failure it often helps to wait a bit before trying again, because usually the failure is caused by some problem that will only be resolved by waiting. If a RetryCallback fails, the RetryTemplate can pause execution according to the BackOffPolicy in place.

public interface BackOffPolicy {

    BackOffContext start(RetryContext context);

    void backOff(BackOffContext backOffContext) 
        throws BackOffInterruptedException;

}

A BackOffPolicy is free to implement the backOff in any way it chooses. The policies provided by Spring Batch out of the box all use Object.wait(). A common use case is to back off with an exponentially increasing wait period, to avoid two retries getting into lock step and both failing - this is a lesson learned from Ethernet. For this purpose Spring Batch provides the ExponentialBackOffPolicy.
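To make "exponentially increasing wait period" concrete, the following sketch computes a sequence of wait periods. It is not the Spring Batch implementation: ExponentialBackOffSketch, nextInterval and the three constructor parameters are hypothetical names, and the cap on the interval is an assumption added so that the wait cannot grow without bound.

```java
// Hypothetical sketch, not Spring Batch code: produces wait periods that
// grow by a constant factor until they reach an upper limit.
class ExponentialBackOffSketch {

    private final double multiplier;
    private final long maxInterval;
    private long interval;

    ExponentialBackOffSketch(long initialInterval, double multiplier, long maxInterval) {
        this.interval = initialInterval;
        this.multiplier = multiplier;
        this.maxInterval = maxInterval;
    }

    // Returns the period to wait now (in milliseconds) and grows it for next time.
    long nextInterval() {
        long current = interval;
        interval = Math.min(maxInterval, (long) (interval * multiplier));
        return current;
    }
}
```

Starting from 100 ms with a multiplier of 2.0 and a cap of 500 ms, successive calls yield 100, 200, 400, 500, 500; a caller would pause for that long between attempts.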

6.4. Listeners

Often it is useful to be able to receive additional callbacks for cross cutting concerns across a number of different retries. For this purpose Spring Batch provides the RetryListener interface. The RetryTemplate allows users to register RetryListeners, and they will be given callbacks with the RetryContext and Throwable where available during the iteration.

The interface looks like this:

public interface RetryListener {

    void open(RetryContext context, RetryCallback callback);

    void onError(RetryContext context, RetryCallback callback, Throwable e);

    void close(RetryContext context, RetryCallback callback, Throwable e);
}

The open and close callbacks come before and after the entire retry in the simplest case, and onError applies to the individual RetryCallback calls. The close method might also receive a Throwable; if there has been an error, it is the last one thrown by the RetryCallback.

Note that when there is more than one listener, they are in a list, so there is an order. In this case open is called in the same order, and onError and close are called in reverse order.
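The ordering rule can be seen in a small standalone simulation. This is a sketch of the behaviour described in the text, not the RetryTemplate source: ListenerOrderSketch, its nested Listener interface and the recording helper are hypothetical, and for brevity the sketch does not rethrow the last failure when attempts run out.

```java
import java.util.List;

// Hypothetical sketch, not Spring Batch code: open runs in list order,
// onError and close run in reverse list order.
class ListenerOrderSketch {

    interface Listener {
        void open();
        void onError(Throwable error);
        void close(Throwable lastError);
    }

    static void run(List<Listener> listeners, Runnable callback, int maxAttempts) {
        for (Listener listener : listeners) {
            listener.open();
        }
        Throwable last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                callback.run();
                last = null; // success: no error to report on close
                break;
            } catch (RuntimeException e) {
                last = e;
                for (int i = listeners.size() - 1; i >= 0; i--) {
                    listeners.get(i).onError(e);
                }
            }
        }
        for (int i = listeners.size() - 1; i >= 0; i--) {
            listeners.get(i).close(last);
        }
    }

    // Helper that records each callback as "<name>.<event>" in the given log.
    static Listener recording(final String name, final List<String> log) {
        return new Listener() {
            public void open() { log.add(name + ".open"); }
            public void onError(Throwable error) { log.add(name + ".onError"); }
            public void close(Throwable lastError) { log.add(name + ".close"); }
        };
    }
}
```

With two listeners A and B and a callback that fails once before succeeding, the recorded sequence is A.open, B.open, B.onError, A.onError, B.close, A.close.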

6.5. Declarative Retry

Sometimes there is some business processing that you know you want to retry every time it happens. The classic example of this is the remote service call. Spring Batch provides an AOP interceptor that wraps a method call in a RetryOperations for just this purpose. The RetryOperationsInterceptor executes the intercepted method and retries on failure according to the RetryPolicy in the provided RetryTemplate.

Here is an example of declarative retry using the Spring AOP namespace to retry a service call to a method called remoteCall (for more detail on how to configure AOP interceptors see the Spring User Guide):

<aop:config>
    <aop:pointcut id="transactional"
        expression="execution(* com...*Service.remoteCall(..))" />
    <aop:advisor pointcut-ref="transactional"
        advice-ref="retryAdvice" order="-1"/>
</aop:config>

<bean id="retryAdvice"
    class="org.springframework.batch.retry.interceptor.RetryOperationsInterceptor"/>

The example above uses a default RetryTemplate inside the interceptor. To change the policies, listeners etc. you only need to inject an instance of RetryTemplate into the interceptor.