Spring Batch - Frequently Asked Questions

Frequently Asked Questions

What's the current release and what are the plans for future releases?
Is it possible to execute jobs in multiple threads or multiple processes?
How can I make an item reader thread safe?
What is the Spring Batch philosophy on the use of flexible strategies and default implementations? Can you add a public getter for this or that property?
How does Spring Batch differ from Quartz? Is there a place for them both in a solution?
How do I schedule a job with Spring Batch?
How does Spring Batch allow a project to optimize for performance and scalability (through parallel processing or other)?
How can messaging be used to scale batch architectures?
How can I contribute to Spring Batch?

What's the current release and what are the plans for future releases?

You can track the progress and planning in JIRA (http://opensource.atlassian.com/projects/spring/browse/BATCH).

Is it possible to execute jobs in multiple threads or multiple processes?

There are three ways to approach this, but we recommend analysing such requirements with caution first: is parallel execution really necessary?

1. Add a TaskExecutor to the repeatTemplate used to control step execution (the outer step operations). The FactoryBeans provided for configuring Steps (e.g. FaultTolerantStepFactoryBean) have a "taskExecutor" property you can set. This works as long as the step is intrinsically restartable (effectively idempotent). The parallel job sample shows how it might work in practice: it uses a "process indicator" pattern to mark input records as complete, inside the business transaction.
2. Use the PartitionStep to split your step execution explicitly amongst several Step instances. Spring Batch has a local multi-threaded implementation of the main strategy for this (PartitionHandler), which makes it a great choice for IO intensive jobs. Remember to use scope="step" for the stateful components in a step executing in this fashion, so that separate instances are created per step execution and there is no cross talk between threads. See below for more details.
3. Use the Remote Chunking approach as implemented in the spring-batch-integration subproject. This requires some durable middleware (e.g. JMS) for reliable communication between the driving step and the remote workers. The basic idea is to use a special ItemWriter on the driving process, and a listener pattern on the worker processes (via a ChunkProcessor). See below for more details.
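
The "process indicator" idea behind the first option can be sketched in plain Java. This is not Spring Batch code: the names InputRecord and ProcessIndicatorSketch are hypothetical, and an in-memory flag stands in for a status column that a real job would update inside the business transaction. Each worker claims a record by flipping its flag, so a rerun of the step skips records that are already complete.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical input record; the flag stands in for a "processed" column in a table.
class InputRecord {
    final int id;
    final AtomicBoolean processed = new AtomicBoolean(false);

    InputRecord(int id) {
        this.id = id;
    }
}

class ProcessIndicatorSketch {
    // Counts business-logic invocations so that a rerun can be checked for duplicates.
    static final AtomicInteger invocations = new AtomicInteger();

    // Builds records with ids 0..count-1, none of them processed yet.
    static List<InputRecord> sampleRecords(int count) {
        List<InputRecord> records = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            records.add(new InputRecord(i));
        }
        return records;
    }

    // Processes every record not yet marked complete, using the given number of threads.
    static void runStep(List<InputRecord> records, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (InputRecord record : records) {
            pool.submit(() -> {
                // compareAndSet claims the record: only one thread (or one run) wins.
                if (record.processed.compareAndSet(false, true)) {
                    invocations.incrementAndGet(); // business processing would go here
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

Calling runStep twice on the same list processes each record once, which is the property that makes a multi-threaded step safe to restart.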

How can I make an item reader thread safe?

You can synchronize the read() method, e.g. by wrapping the reader in a delegator that does the synchronization. Remember that you will lose restartability, so best practice is to mark the step as not restartable. To be safe (and efficient) you can also set saveState=false on the reader.
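
A minimal sketch of such a delegator follows. To keep it self-contained it uses SimpleItemReader, a hypothetical stand-in for the Spring Batch ItemReader contract (read() returns null when the input is exhausted). ListReader, SynchronizedReader and SynchronizedReaderDemo are illustrative names as well, not framework classes.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the ItemReader contract: null signals end of input.
interface SimpleItemReader<T> {
    T read() throws Exception;
}

// Not thread safe on its own: a plain Iterator gives no guarantees under concurrent use.
class ListReader<T> implements SimpleItemReader<T> {
    private final Iterator<T> iterator;

    ListReader(List<T> items) {
        this.iterator = items.iterator();
    }

    public T read() {
        return iterator.hasNext() ? iterator.next() : null;
    }
}

// Delegator that serializes all calls to read() on the wrapped reader.
class SynchronizedReader<T> implements SimpleItemReader<T> {
    private final SimpleItemReader<T> delegate;

    SynchronizedReader(SimpleItemReader<T> delegate) {
        this.delegate = delegate;
    }

    public synchronized T read() throws Exception {
        return delegate.read();
    }
}

class SynchronizedReaderDemo {
    // Drains one shared reader from several threads; returns the number of distinct items seen.
    static int countDistinctReads(int itemCount, int threads) {
        List<Integer> items = new ArrayList<>();
        for (int i = 0; i < itemCount; i++) {
            items.add(i);
        }
        SimpleItemReader<Integer> reader = new SynchronizedReader<>(new ListReader<>(items));
        Set<Integer> seen = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                Integer item;
                while ((item = reader.read()) != null) {
                    seen.add(item);
                }
                return null; // a Callable, so the checked exception from read() is allowed
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen.size();
    }
}
```

The wrapper only makes read() safe to call concurrently. It does not record which thread consumed which item, which is why restart state cannot be relied on.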

What is the Spring Batch philosophy on the use of flexible strategies and default implementations? Can you add a public getter for this or that property?

There are a great many extension points in Spring Batch for the framework developer (as opposed to the implementor of business logic). We expect clients to create their own more specific strategies that can be plugged in to control things like commit intervals (CompletionPolicy), rules about how to deal with exceptions (ExceptionHandler), and many others.

In general we try to dissuade users from extending framework classes. The Java language doesn't give us as much flexibility as we would like to mark classes and interfaces as internal. Generally you can expect anything at the top level of the source tree in packages org.springframework.batch.* to be public, but not necessarily sub-classable. Extending our concrete implementations of most strategies is discouraged in favour of a composition or forking approach. If your code uses only the interfaces from Spring Batch, that gives you the greatest possible portability.
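
As an illustration of the composition approach, the sketch below adds behaviour around a strategy by wrapping its interface rather than subclassing a concrete class. SimpleItemWriter is a simplified, hypothetical stand-in for a framework strategy interface such as ItemWriter. ListCollectingWriter and CountingWriter are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical stand-in for a framework strategy interface.
interface SimpleItemWriter<T> {
    void write(List<? extends T> items);
}

// Plays the role of a concrete framework implementation that is not meant to be subclassed.
final class ListCollectingWriter<T> implements SimpleItemWriter<T> {
    final List<T> written = new ArrayList<>();

    public void write(List<? extends T> items) {
        written.addAll(items);
    }
}

// Composition: depends only on the interface and delegates the real work.
class CountingWriter<T> implements SimpleItemWriter<T> {
    private final SimpleItemWriter<T> delegate;
    private int itemCount;

    CountingWriter(SimpleItemWriter<T> delegate) {
        this.delegate = delegate;
    }

    public void write(List<? extends T> items) {
        itemCount += items.size();
        delegate.write(items);
    }

    int getItemCount() {
        return itemCount;
    }
}
```

Because CountingWriter knows only the interface, the wrapped implementation can be swapped without touching the wrapper.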

How does Spring Batch differ from Quartz? Is there a place for them both in a solution?

Spring Batch and Quartz have different goals. Spring Batch provides functionality for processing large volumes of data and Quartz provides functionality for scheduling tasks. Quartz can therefore complement Spring Batch; the two are not mutually exclusive. A common combination is to use Quartz as a trigger for a Spring Batch job, using a Cron expression and the Spring Core convenience class SchedulerFactoryBean.

How do I schedule a job with Spring Batch?

Use a scheduling tool. There are plenty of them out there. Examples: Quartz, Control-M, Autosys. Quartz doesn't have all the features of Control-M or Autosys - it is supposed to be lightweight. If you want something even more lightweight you can just use the OS (cron, at, etc.).

Simple sequential dependencies can be implemented using the job-steps model of Spring Batch, and the non-sequential features in Spring Batch 2.0. We think this is quite common. In fact it makes it easier to correct a common misuse of schedulers: having hundreds of jobs configured, many of which are not independent, but depend only on one other job.

How does Spring Batch allow a project to optimize for performance and scalability (through parallel processing or other)?

We see this as one of the roles of the Job or Step. A specific implementation of the Step deals with the concern of breaking apart the business logic and sharing it efficiently between parallel processes or processors (see PartitionStep). There are a number of technologies that could play a role here. The essence is just a set of concurrent remote calls to distributed agents that can handle some business processing. Since the business processing is already typically modularised (e.g. input an item, process it), Spring Batch can strategise the distribution in a number of ways. One implementation that we have had some experience with is a set of remote web services handling the business processing: we send a specific range of primary keys for the inputs to each of a number of remote calls. The same basic strategy would work with any of the Spring Remoting protocols (plain RMI, HttpInvoker, JMS, Hessian etc.) with a change of little more than a couple of lines in the execution layer configuration.
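
The key-range strategy can be sketched in plain Java as follows. This is an illustration under stated assumptions rather than the PartitionStep implementation: KeyRange and KeyRangePartitionSketch are hypothetical names, local threads stand in for the remote calls, and each "worker" simply sums the keys in its range in place of real business processing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical value object: an inclusive range of primary keys for one worker.
class KeyRange {
    final long min;
    final long max;

    KeyRange(long min, long max) {
        this.min = min;
        this.max = max;
    }
}

class KeyRangePartitionSketch {
    // Splits [minKey, maxKey] into at most gridSize contiguous ranges of near-equal size.
    static List<KeyRange> partition(long minKey, long maxKey, int gridSize) {
        List<KeyRange> ranges = new ArrayList<>();
        long total = maxKey - minKey + 1;
        long size = (total + gridSize - 1) / gridSize; // ceiling division
        for (long start = minKey; start <= maxKey; start += size) {
            ranges.add(new KeyRange(start, Math.min(start + size - 1, maxKey)));
        }
        return ranges;
    }

    // Runs one task per range concurrently and combines the per-range results.
    static long processAll(List<KeyRange> ranges) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, ranges.size()));
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (KeyRange range : ranges) {
                Callable<Long> worker = () -> {
                    long sum = 0;
                    for (long key = range.min; key <= range.max; key++) {
                        sum += key; // stand-in for loading and processing the item with this key
                    }
                    return sum;
                };
                results.add(pool.submit(worker));
            }
            long total = 0;
            for (Future<Long> result : results) {
                total += result.get();
            }
            return total;
        } catch (Exception e) {
            throw new IllegalStateException("worker failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Since each worker receives only a pair of key bounds, the same split would apply whether the workers are local threads or remote services.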

How can messaging be used to scale batch architectures?

There is a good deal of practical evidence from existing projects that a pipeline approach to batch processing is highly beneficial, leading to resilience and high throughput. We are often faced with mission-critical applications where audit trails are essential, and guaranteed processing is demanded, but where there are extremely tight limits on performance under load, or where high throughput gives a competitive advantage.

Matt Welsh's work shows that a Staged Event Driven Architecture (SEDA) has enormous benefits over more rigid processing architectures, and message-oriented middleware (JMS, AQ, MQ, Tibco etc.) gives us a lot of resilience out of the box. There are particular benefits in a system where there is feedback between downstream and upstream stages, so the number of consumers can be adjusted to account for the amount of demand.

So how does this fit into Spring Batch? The spring-batch-integration project has this pattern implemented in Spring Integration, and can be used to scale up the remote processing of any step with many items to process. See in particular the "chunk" package, and the ItemWriter and ChunkHandler implementations in there.
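
The staged idea can be illustrated with a small plain-Java sketch. It does not use the spring-batch-integration API: StagedPipelineSketch is a hypothetical name, a bounded in-memory BlockingQueue stands in for the messaging middleware (and so offers none of its durability), and summing the items stands in for business processing. The bounded queue gives a simple form of feedback, because the producer blocks when the consumers fall behind.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

class StagedPipelineSketch {
    // Hypothetical end-of-input marker; one is sent per consumer.
    private static final int POISON = -1;

    // Feeds items 1..itemCount through a bounded queue to the given number of consumer threads.
    static long run(int itemCount, int consumers) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(16);
        AtomicLong processedSum = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(consumers);
        for (int c = 0; c < consumers; c++) {
            pool.submit(() -> {
                int item;
                while ((item = queue.take()) != POISON) {
                    processedSum.addAndGet(item); // downstream processing stage
                }
                return null; // a Callable, so take() may throw InterruptedException
            });
        }
        try {
            for (int i = 1; i <= itemCount; i++) {
                queue.put(i); // blocks while the queue is full: back-pressure on the producer
            }
            for (int c = 0; c < consumers; c++) {
                queue.put(POISON);
            }
            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return processedSum.get();
    }
}
```

Changing the consumers argument is the in-memory analogue of adjusting the number of message consumers to match demand.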

How can I contribute to Spring Batch?

Use the community forum to get involved in discussions about the product and its design. There is a process for contributions and eventually becoming a committer. The process is pretty standard for all Apache-licensed projects. You make contributions through JIRA (so sign up now); you assign the copyright of any contributions using a standard Apache-like CLA (see the Apache one for example - ours might be slightly different); when the contributions reach a certain level, or you convince us otherwise that you are going to be committed long term, even if part time, then you can become a committer.
