Overview of the Spring Batch Core

The Spring Batch Core Domain consists of an API for launching, monitoring and managing batch jobs.

The Spring Batch Core Domain with dependencies to infrastructure indicated schematically.

The figure above shows the central parts of the core domain and its main touch points with the batch application develepor (Job and Step). To launch a job there is a JobLauncher interface that can be used to simplify the launching for dumb clients like JMX or a command line.

A Job is composed of a list of Steps, each of which is executed in turn by the Job. The Step is a central strategy in the Spring Batch Core. Implementations of Step are responsible for sharing the work out, but in ways that the configuration doesn't need to be aware of. For instance, the same or very similar Step configuration might be used in a simple in-process sequential executor, or in a multi-threaded implementation, or one that delegates to remote calls to a distributed system.

The Spring Batch Core Domain extended to include the datababase entities and identifier strategy.

A Job can be re-used to create multiple job instances and this is reflected in the figure above showing an extended picture of the core domain. When a Job is launched it first checks to see if a job with the same JobParameters was already executed. We expect one of the following outcomes, depending on the Job:

  • If the job was not previously launched then it can be created and executed. A new JobInstance is created and stored in a repository (usually a database). A new JobExecution is also created to track the progress of this particular execution.
  • If the job was previously launched and failed the Job is responsible for indicating whether it believes it is restartable (is a restart legal and expected). There is a flag for this purpose on the Job. If the there was a previous failure - maybe the operator has fixed some bad input and wants to run it again - then we might want to restart the previous job.

    If the job was previously launched with teh same JobParameters and completed successfully, then it is an error to restart it. An ad-hoc request needs to be distinguished from previous runs by adding a unique job parameter. In either case a new JobExecution is created and stored to monitor this execution of the JobInstance.

Overview of the Spring Batch Launch Environment

The diagram below provides an overview of the high level components, technical services, and basic operations required by a batch architecture. This architecture framework is a blueprint that has been proven through decades of implementations on the last several generations of platforms (COBOL/Mainframe, C++/Unix, and now Java/anywhere). Simple Batch provides a physical implementation of the layers, components and technical services commonly found in robust, maintainable systems used to address the creation of simple to complex batch applications, with the infrastructure and extensions to address very complex processing needs. The materials below will walk through the details of the diagram.

Simple Batch Launch Environment high level flow and interaction of the architecture.

Tiers

The application style is organized into four logical tiers, which include Run, Job, Application, and Data tiers. The primary goal for organizing an application according to the tiers is to embed what is known as "separation of concerns" within the system. Effective separation of concerns results in reducing the impact of change to the system.

  • Run Tier: The Run Tier is concerned with the scheduling and launching of the application. A vendor product is typically used in this tier to allow time-based and interdependent scheduling of batch jobs as well as providing parallel processing capabilities.
  • Job Tier: The Job Tier is responsible for the overall execution of a batch job. It sequentially executes batch steps, ensuring that all steps are in the correct state and all appropriate policies are enforced.
  • Application Tier: The Application Tier contains components required to execute the program. It contains specific modules that address the required batch functionality and enforces policies around a module execution (e.g., commit intervals, capture of statistics, etc.)
  • Data Tier: The Data Tier provides the integration with the physical data sources that might include databases, files, or queues. Note: In some cases the Job tier can be completely missing and in other cases one job script can start several batch job instances.