-
When will Batch 2.0 be out? My understanding is that the
current 1.0 and 1.1 releases can be used on projects, is that
correct?
-
We expect Batch 2.0 to be released in the Fall of 2008.
You can track the progress and planning in JIRA
(http://opensource.atlassian.com/projects/spring/browse/BATCH).
Batch 1.0 and 1.1 are stable and very much usable. There will be
some migration work if you decide to upgrade to 2.0 at a later point.
It's perfectly reasonable to not upgrade an existing project and
continue to use the Batch 1.1.x releases. This is especially true
if you use Java 1.4, since Batch 2.0 requires Java 5 or later.
-
When will support for the more complex job execution
classes appear? Our client has a number of
multi-threaded batch jobs.
-
Multi-threaded execution in a single VM is perfectly
possible with 1.0, but we recommend exercising
caution in the analysis of such requirements (is it
really necessary?). Several people have tried it and
it works as long as the step is intrinsically
restartable (effectively idempotent). The parallel
job sample shows how it might work in practice: it
uses a "process indicator" pattern to mark input
records as complete, inside the business transaction.
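As a rough illustration of the process indicator idea, the sketch below runs records on a thread pool and marks each one as processed together with the business work, so that a restart skips anything already completed. It is not the parallel job sample itself: the class and method names (ProcessIndicatorSketch, InputRecord, run) are invented for this example, and a synchronized block stands in for the database transaction that a real job would use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; none of these names are Spring Batch API.
final class ProcessIndicatorSketch {

    /** One input row; the flag stands in for a "processed" column in the input table. */
    static final class InputRecord {
        final int id;
        boolean processed; // guarded by the record's own monitor

        InputRecord(int id) {
            this.id = id;
        }
    }

    /**
     * Processes the records on a thread pool and returns how many were handled in this run.
     * Records already marked as processed are skipped, which is what makes a restart safe.
     */
    static int run(List<InputRecord> records, int threads, AtomicInteger businessCalls) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger handled = new AtomicInteger();
        for (InputRecord record : records) {
            pool.execute(() -> {
                // The synchronized block stands in for the business transaction that
                // would cover both the work and the indicator update.
                synchronized (record) {
                    if (record.processed) {
                        return; // completed by an earlier run
                    }
                    businessCalls.incrementAndGet(); // placeholder for real business logic
                    record.processed = true;
                    handled.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return handled.get();
    }

    public static void main(String[] args) {
        List<InputRecord> records = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            records.add(new InputRecord(i));
        }
        AtomicInteger businessCalls = new AtomicInteger();
        int firstRun = run(records.subList(0, 40), 4, businessCalls); // pretend the job died here
        int restart = run(records, 4, businessCalls);                 // restart over the full input
        System.out.println(firstRun + " handled, then " + restart
                + " on restart; business logic ran " + businessCalls.get() + " times");
    }
}
```

The restart passes the full input again, yet the business logic runs only once per record because the indicator is checked and set in the same guarded section.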
-
There are 4 main layers of the architecture
(application, core, execution, and infrastructure), what
is the vision for how the execution layer might be used
in future? What happened to the "container" layer?
-
The "layers" described are nicely segregated in
terms of dependency. Each layer only depends (at
compile time) on layers below it.
We recognised that what we used to call the
container layer actually is composed of two distinct
contexts, "Core" and "Execution". So the full
catalogue of contexts is:
-
Application
is the business logic. It is written by the
application developer (the client of Spring
Batch) and depends only on the Core
interfaces for compilation and
configuration.
-
Core
is the public API of Spring Batch, including
the core batch domain of Job, Step,
configuration and Executor interfaces.
-
Execution
is the deployment, execution and management
concerns. Different execution environments
(e.g. in a JEE container, out of container)
are configured differently, but can execute
the same application business logic.
-
Infrastructure
is a set of low-level tools that are used
to implement the execution and parts of the
core layers.
The "execution" layer is fertile ground for
collaboration and contributions from the community
and from projects in the field. The central
interface is JobLauncher,
with methods for starting and stopping jobs. The
vision for this is that there can be multiple
implementations of JobLauncher
providing different architectural patterns, and
delivering different levels of scalability and
robustness, without changing either the business
logic or the job configuration.
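To make the idea of interchangeable launchers concrete, here is a minimal sketch in plain Java. It does not use the real JobLauncher interface: SketchLauncher, SynchronousLauncher, PooledLauncher and the IntSupplier "job" are all assumptions made up for this example. The point it tries to show is only that the same job definition can be handed to different launcher implementations unchanged.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntSupplier;

// Hypothetical illustration only; SketchLauncher is not the Spring Batch JobLauncher API.
final class LauncherSketch {

    /** Starts a job and hands back a handle that can be waited on or cancelled. */
    interface SketchLauncher {
        Future<Integer> start(IntSupplier job);
    }

    /** Runs the job in the caller's thread, e.g. for a command-line deployment. */
    static final class SynchronousLauncher implements SketchLauncher {
        @Override
        public Future<Integer> start(IntSupplier job) {
            return CompletableFuture.completedFuture(job.getAsInt());
        }
    }

    /** Runs the job on a pool thread so the caller can carry on or cancel it. */
    static final class PooledLauncher implements SketchLauncher {
        private final ExecutorService pool = Executors.newSingleThreadExecutor();

        @Override
        public Future<Integer> start(IntSupplier job) {
            Callable<Integer> task = job::getAsInt;
            return pool.submit(task);
        }

        void shutdown() {
            pool.shutdown();
        }
    }

    /** Waits for the result, converting checked exceptions for brevity. */
    static int await(Future<Integer> handle) {
        try {
            return handle.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    public static void main(String[] args) {
        // The "job" is defined once and knows nothing about how it is launched.
        IntSupplier job = () -> {
            int itemsProcessed = 0;
            for (int i = 0; i < 1_000; i++) {
                itemsProcessed++;
            }
            return itemsProcessed;
        };
        PooledLauncher pooled = new PooledLauncher();
        try {
            System.out.println("synchronous: " + await(new SynchronousLauncher().start(job)));
            System.out.println("pooled:      " + await(pooled.start(job)));
        } finally {
            pooled.shutdown();
        }
    }
}
```

Stopping a job maps onto Future.cancel in this sketch; a real execution layer would need a more careful notion of stopping a running step.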
-
What is the Spring Batch philosophy on the use of
flexible strategies and default implementations?
-
There are a great many extension points in Spring Batch
for the framework developer (as opposed to the
implementor of business logic). We expect clients to
create their own more specific strategies that can be
plugged in to control things like commit intervals
(CompletionPolicy), rules about how to deal with
exceptions (ExceptionHandler), and many others.
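The following sketch shows, in plain Java, what plugging in such strategies might look like. CommitRule and FailureRule are invented stand-ins, not the real CompletionPolicy and ExceptionHandler interfaces, and the processing loop is an assumption about how a framework could consult them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration only; CommitRule and FailureRule are invented names,
// not the Spring Batch CompletionPolicy and ExceptionHandler interfaces.
final class StrategySketch {

    /** Decides when the current chunk of items should be committed. */
    interface CommitRule {
        boolean isComplete(int itemsInChunk);
    }

    /** Decides what to do with a failed item; rethrow to abort, return to skip. */
    interface FailureRule {
        void handle(String item, RuntimeException failure);
    }

    /** Runs the business logic over the items and returns the size of each committed chunk. */
    static List<Integer> process(List<String> items, Consumer<String> businessLogic,
                                 CommitRule commitRule, FailureRule failureRule) {
        List<Integer> commits = new ArrayList<>();
        int itemsInChunk = 0;
        for (String item : items) {
            try {
                businessLogic.accept(item);
                itemsInChunk++;
            } catch (RuntimeException failure) {
                failureRule.handle(item, failure);
            }
            if (commitRule.isComplete(itemsInChunk)) {
                commits.add(itemsInChunk); // a real framework would commit a transaction here
                itemsInChunk = 0;
            }
        }
        if (itemsInChunk > 0) {
            commits.add(itemsInChunk); // final partial chunk
        }
        return commits;
    }

    public static void main(String[] args) {
        Consumer<String> businessLogic = item -> {
            if (item.isEmpty()) {
                throw new IllegalArgumentException("empty item");
            }
        };
        List<String> items = Arrays.asList("a", "b", "", "c", "d", "e");
        CommitRule everyThreeItems = count -> count >= 3;
        FailureRule skipAndLog =
                (item, failure) -> System.out.println("skipped: " + failure.getMessage());
        System.out.println(process(items, businessLogic, everyThreeItems, skipAndLog));
    }
}
```

The business logic stays the same while the commit interval and the failure handling are swapped by passing different rule implementations.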
-
How does Spring Batch differ from Quartz? Is there a
place for them both in a solution?
-
Spring Batch and Quartz have different goals. Spring
Batch provides functionality for processing large
volumes of data and Quartz provides functionality
for scheduling tasks. So Quartz can complement
Spring Batch; the two are not mutually exclusive
technologies. A common combination would be to use
Quartz as a trigger for a Spring Batch job using a
Cron expression and the Spring Core convenience
SchedulerFactoryBean.
-
How do I schedule a job with Spring Batch?
-
Use a scheduling tool. There are plenty of them out
there. Examples: Quartz, Control-M, Autosys. Quartz
doesn't have all the features of Control-M or
Autosys - it is supposed to be lightweight. If you
want something even more lightweight you can just
use the OS (cron, at, etc.).
Simple sequential dependencies can be implemented
using the job-steps model of Spring Batch. We think
this is quite common. In fact, it makes it easier
to correct a common misuse of schedulers: having
hundreds of jobs configured, many of which are not
independent, but depend only on one other job.
-
How will Spring Batch allow a project to optimize for
performance and scalability (through parallel processing
or other means)?
-
We see this as one of the roles of the Execution layer.
A specific implementation (or implementations) of the
StepExecutor can deal with the concern of breaking apart the business
logic and sharing it efficiently between parallel
processes or processors. There are a number of
technologies that could play a role here. The essence is
just a set of concurrent remote calls to distributed
agents that can handle some business processing. Since
the business processing is already typically modularised
- e.g. input an item, process it - Spring Batch can
strategise the distribution in a number of ways. One
implementation that we have had some experience with
(and have a prototype for) is a set of remote EJBs
handling the business processing. We switch off Home
caching in the container and then send a specific range
of primary keys for the inputs to each of a number of
remote calls. The same basic strategy would work with
any of the Spring Remoting protocols (plain RMI,
HttpInvoker, JMS, Hessian etc.) with little more than a
couple of lines change in the execution layer
configuration.
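A minimal sketch of the key-range idea follows, assuming contiguous numeric primary keys. It is not the EJB prototype mentioned above: KeyRangePartitionSketch, KeyRange, split and processAll are hypothetical names, and a local thread pool stands in for the concurrent remote calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.ToLongFunction;

// Hypothetical illustration only; none of these names are Spring Batch API.
final class KeyRangePartitionSketch {

    /** An inclusive range of primary keys handed to one worker. */
    static final class KeyRange {
        final long from;
        final long to;

        KeyRange(long from, long to) {
            this.from = from;
            this.to = to;
        }

        long size() {
            return to - from + 1;
        }
    }

    /** Splits [minKey, maxKey] into at most {@code partitions} contiguous, near-equal ranges. */
    static List<KeyRange> split(long minKey, long maxKey, int partitions) {
        if (maxKey < minKey || partitions < 1) {
            throw new IllegalArgumentException("need minKey <= maxKey and partitions >= 1");
        }
        long total = maxKey - minKey + 1;
        int count = (int) Math.min(partitions, total);
        long base = total / count;
        long extra = total % count; // the first 'extra' ranges get one more key
        List<KeyRange> ranges = new ArrayList<>();
        long start = minKey;
        for (int i = 0; i < count; i++) {
            long size = base + (i < extra ? 1 : 0);
            ranges.add(new KeyRange(start, start + size - 1));
            start += size;
        }
        return ranges;
    }

    /** Sends each range to the worker concurrently and sums the reported item counts. */
    static long processAll(List<KeyRange> ranges, ToLongFunction<KeyRange> worker) {
        // The pool is a local stand-in for a set of concurrent remote calls.
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, ranges.size()));
        try {
            List<Future<Long>> calls = new ArrayList<>();
            for (KeyRange range : ranges) {
                Callable<Long> call = () -> worker.applyAsLong(range);
                calls.add(pool.submit(call));
            }
            long processed = 0;
            for (Future<Long> call : calls) {
                processed += call.get();
            }
            return processed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<KeyRange> ranges = split(1, 1000, 4);
        for (KeyRange range : ranges) {
            System.out.println("range " + range.from + ".." + range.to);
        }
        // The worker here only counts keys; a real one would load and process the rows.
        System.out.println("processed: " + processAll(ranges, KeyRange::size));
    }
}
```

Swapping the pool for a remoting client would change only processAll, which mirrors the claim that the distribution strategy lives in the execution layer rather than in the business logic.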
-
What are the key concepts in the Spring Batch core
domain?
-
In a nutshell: A JobConfiguration with a list of
StepConfigurations is passed to a JobExecutor.
From this a Job is constructed consisting of a
series of Steps, each of which is executed by a
StepExecutor. The StepExecutor contains all the
strategies for deciding when to complete, when to
commit, when to abort and when to continue.
Many Jobs in practice consist of a single Step. Still,
Steps are a useful, best-practice way of breaking a Job
down into logical units, rather than having to
execute separate Jobs (potentially in separate OS
processes) which have no obvious logical connection.
Jobs can be executed once, or many times with
different logical identifiers (JobIdentifier). It is
also possible to restart a failed Job with the same
or a modified input source, and identify the
resulting JobExecution as a separate entity. In this
way the progress of a Job and its history of
successful and failed executions can easily be
tracked. The same argument applies to Steps, which
have their corresponding StepExecution entity.
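The relationship between a job, its steps and its execution history can be sketched in a few lines of plain Java. This is a toy model under stated assumptions, not the Spring Batch domain classes: JobHistorySketch, ExecutionRecord and Status are invented names, a step is just a Runnable, and history is kept in memory rather than in a repository.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names are Spring Batch API.
final class JobHistorySketch {

    enum Status { COMPLETED, FAILED }

    /** One attempt at running a job; every restart produces a new record. */
    static final class ExecutionRecord {
        final String jobIdentifier;
        final int attempt;
        final Status status;
        final int stepsCompleted;

        ExecutionRecord(String jobIdentifier, int attempt, Status status, int stepsCompleted) {
            this.jobIdentifier = jobIdentifier;
            this.attempt = attempt;
            this.status = status;
            this.stepsCompleted = stepsCompleted;
        }
    }

    private final Map<String, List<ExecutionRecord>> history = new LinkedHashMap<>();

    /** Runs the steps in order, stops at the first failure, and records the attempt. */
    ExecutionRecord run(String jobIdentifier, List<Runnable> steps) {
        List<ExecutionRecord> attempts =
                history.computeIfAbsent(jobIdentifier, key -> new ArrayList<>());
        int stepsCompleted = 0;
        Status status = Status.COMPLETED;
        for (Runnable step : steps) {
            try {
                step.run();
                stepsCompleted++;
            } catch (RuntimeException failure) {
                status = Status.FAILED;
                break;
            }
        }
        ExecutionRecord record =
                new ExecutionRecord(jobIdentifier, attempts.size() + 1, status, stepsCompleted);
        attempts.add(record);
        return record;
    }

    List<ExecutionRecord> historyOf(String jobIdentifier) {
        return history.getOrDefault(jobIdentifier, Collections.emptyList());
    }

    public static void main(String[] args) {
        boolean[] inputReady = {false};
        Runnable load = () -> {
            if (!inputReady[0]) {
                throw new IllegalStateException("input not ready");
            }
        };
        Runnable report = () -> { };
        JobHistorySketch jobs = new JobHistorySketch();
        jobs.run("nightly-2008-01-01", Arrays.asList(load, report)); // fails in the first step
        inputReady[0] = true;                                        // fix the input source
        jobs.run("nightly-2008-01-01", Arrays.asList(load, report)); // restart, same identifier
        for (ExecutionRecord record : jobs.historyOf("nightly-2008-01-01")) {
            System.out.println("attempt " + record.attempt + ": " + record.status
                    + " after " + record.stepsCompleted + " step(s)");
        }
    }
}
```

Both attempts share one logical identifier but are recorded as separate executions, which is what lets the history of failures and successes be tracked.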
-
How can messaging be used to scale batch architectures?
-
There is a good deal of practical evidence from existing
projects that a pipeline approach to batch processing is
highly beneficial, leading to resilience and high
throughput. We are often faced with mission-critical
applications where audit trails are essential, and
guaranteed processing is demanded, but where there are
extremely tight limits on performance under load, or
where high throughput gives a competitive advantage.
Matt Welsh's work shows that a Staged Event Driven
Architecture (SEDA) has enormous benefits over more
rigid processing architectures, and message-oriented
middleware (JMS, AQ, MQ, Tibco etc.) gives us a lot of
resilience out of the box. There are particular benefits
in a system where there is feedback between downstream
and upstream stages, so the number of consumers can be
adjusted to account for the amount of demand. So how
does this fit into Spring Batch? It is a good
example of a StepExecutor
or (more broadly) an execution runtime, if the deployment is
grid- or cluster-based, or in any way involves multiple
OS processes.
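As a small in-process illustration of the staged approach, the sketch below connects a producer stage to a configurable number of consumers through a bounded queue. It is only an analogy for message-oriented middleware: StagedPipelineSketch and its run method are invented for this example, the queue is a java.util.concurrent.ArrayBlockingQueue rather than JMS, and an empty Optional is used as a shutdown signal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; an in-memory queue stands in for real middleware.
final class StagedPipelineSketch {

    /** Feeds the input through a bounded queue to the consumers and returns the sorted output. */
    static List<String> run(List<String> input, int consumers, int queueCapacity) {
        BlockingQueue<Optional<String>> queue = new ArrayBlockingQueue<>(queueCapacity);
        Queue<String> output = new ConcurrentLinkedQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(consumers);
        for (int i = 0; i < consumers; i++) {
            pool.execute(() -> {
                try {
                    while (true) {
                        Optional<String> message = queue.take();
                        if (!message.isPresent()) {
                            return; // shutdown signal for this consumer
                        }
                        // Placeholder for the downstream business processing.
                        output.add(message.get().toUpperCase(Locale.ROOT));
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        try {
            for (String item : input) {
                queue.put(Optional.of(item)); // blocks when the queue is full (back-pressure)
            }
            for (int i = 0; i < consumers; i++) {
                queue.put(Optional.empty()); // one shutdown signal per consumer
            }
            pool.shutdown();
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("consumers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow(); // no effect after a clean shutdown
        }
        List<String> result = new ArrayList<>(output);
        Collections.sort(result); // consumers finish in no particular order
        return result;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("invoice", "payment", "refund", "statement");
        System.out.println(run(input, 2, 1));
        System.out.println(run(input, 4, 2));
    }
}
```

The consumer count is a plain parameter here; in a deployed system it would be the knob that feedback between stages adjusts according to demand.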
-
How can I contribute to Spring Batch?
-
Use JIRA and the forum to get involved in discussions
about the product and its design. There is a process for
contributions and eventually becoming a committer. The
process is pretty standard for all Apache-licensed
projects. You make contributions through JIRA (so sign
up now), and you assign the copyright of any contributions
using a standard Apache-like CLA (see the Apache one for
an example; ours might be slightly different). When your
contributions reach a certain level, or you otherwise
convince us that you are going to be committed long term,
even if part time, you can become a committer.