To any experienced batch architect, the overall concepts of batch processing used in Spring Batch should be familiar and comfortable. There are “Jobs” and “Steps” and developer supplied processing units called ItemReaders and ItemWriters. However, because of the Spring patterns, operations, templates, callbacks, and idioms, there are opportunities for the following:
significant improvement in adherence to a clear separation of concerns
clearly delineated architectural layers and services provided as interfaces
simple and default implementations that allow for quick adoption and ease of use out of the box
significantly enhanced extensibility
The diagram below is only a slight variation of the batch reference architecture that has been used for decades. It provides an overview of the high level components, technical services, and basic operations required by a batch architecture. This architecture framework is a blueprint that has been proven through decades of implementations on the last several generations of platforms (COBOL/Mainframe, C++/Unix, and now Java/anywhere). JCL and COBOL developers are likely to be as comfortable with the concepts as C++, C# and Java developers. Spring Batch provides a physical implementation of the layers, components, and technical services commonly found in robust, maintainable systems, used to build everything from simple to complex batch applications, along with the infrastructure and extensions needed to address very complex processing needs.

Figure 2.1: Batch Stereotypes
The above diagram highlights the interactions and key services
    provided by the Spring Batch framework. The colors used are important to
    understanding the responsibilities of a developer in Spring Batch. Grey
    represents an external application such as an enterprise scheduler or a
    database. It's important to note that scheduling is grey, and should thus
    be considered separate from Spring Batch. Blue represents application
    architecture services. In most cases these are provided by Spring Batch
    with out-of-the-box implementations, but an architecture team may create
    its own implementations that better address its specific needs. Yellow
    represents the pieces that must be configured by a developer. For example,
    a job schedule needs to be configured so that the job is kicked off at the
    appropriate time. A job configuration file also needs to be created, which
    defines how a job will be run. It is also worth noting that the
    ItemReader and ItemWriter
    used by an application may just as easily be a custom one made by a
    developer for their specific batch job, rather than one provided by Spring
    Batch or an architecture team.
The Batch Application Style is organized into four logical tiers: Run, Job, Application, and Data. The primary goal of organizing an application according to these tiers is to embed what is known as "separation of concerns" within the system. The tiers are conceptual, but they can also prove effective in mapping the deployment of the artifacts onto physical components such as Java runtimes and integrations with data sources and targets. Effective separation of concerns reduces the impact of change to the system. The four conceptual tiers containing batch artifacts are:
Run Tier: The Run Tier is concerned with the scheduling and launching of the application. A vendor product is typically used in this tier to allow time-based and interdependent scheduling of batch jobs as well as providing parallel processing capabilities.
Job Tier: The Job Tier is responsible for the overall execution of a batch job. It sequentially executes batch steps, ensuring that all steps are in the correct state and all appropriate policies are enforced.
Application Tier: The Application Tier contains components required to execute the program. It contains specific tasks that address required batch functionality and enforces policies around execution (e.g., commit intervals, capture of statistics, etc.).
Data Tier: The Data Tier provides integration with the physical data sources that might include databases, files, or queues.
This section describes stereotypes relating to the concept of a
    batch job. A Job is an entity that encapsulates an
    entire batch process. As is common with other Spring projects, a
    Job will be wired together via an XML configuration
    file. This file may be referred to as the "job configuration". However,
    Job is just the top of an overall hierarchy:

A job is represented by a Spring bean that implements the
      Job interface and contains all of the information
      necessary to define the operations performed by a job. A job
      configuration is typically contained within a Spring XML configuration
      file and the job's name is determined by the "id" attribute associated
      with the job configuration bean. The job configuration contains
The simple name of the job
Definition and ordering of Steps
Whether or not the job is restartable
A default simple implementation of the Job
      interface is provided by Spring Batch in the form of the
      SimpleJob class, which adds some standard
      functionality on top of Job, namely the standard
      execution logic that all jobs should utilize. In general, all jobs
      should be defined using a bean of type
      SimpleJob:
  <bean id="footballJob"
        class="org.springframework.batch.core.job.SimpleJob">
    <property name="steps">
      <list>
        <!-- Step Bean details omitted for clarity -->
        <bean id="playerload" parent="simpleStep" />
        <bean id="gameLoad" parent="simpleStep" />
        <bean id="playerSummarization" parent="simpleStep" />
      </list>
    </property>
    <property name="restartable" value="true" />
  </bean>

A JobInstance refers to the concept of a
      logical job run. Let's consider a batch job that should be run once at
      the end of the day, such as the 'EndOfDay' job from the diagram above.
      There is one 'EndOfDay' Job, but each individual
      run of the Job must be tracked separately. In the
      case of this job, there will be one logical
      JobInstance per day. For example, there will be a
      January 1st run, and a January 2nd run. If the January 1st run fails the
      first time and is run again the next day, it is still the January 1st
      run. (Usually this corresponds with the data it is processing as well,
      meaning the January 1st run processes data for January 1st, and so on.) That is
      to say, each JobInstance can have multiple
      executions (JobExecution is discussed in more
      detail below), and only one JobInstance
      corresponding to a particular Job can be running
      at a given time. The definition of a JobInstance
      has absolutely no bearing on the data that will be loaded. It is entirely
      up to the ItemReader implementation used to
      determine how data will be loaded. For example, in the EndOfDay
      scenario, there may be a column on the data that indicates the
      'effective date' or 'schedule date' to which the data belongs. So, the
      January 1st run would only load data from the 1st, and the January 2nd
      run would only use data from the 2nd. Because this determination will
      likely be a business decision, it is left up to the
      ItemReader to decide. What using the same
      JobInstance will determine, however, is whether
      or not the 'state' (i.e. the ExecutionContext, which is discussed below)
      from previous executions will be used. Using a new
      JobInstance will mean 'start from the beginning'
      and using an existing instance will generally mean 'start from where you
      left off'.
Having discussed JobInstance and how it
      differs from Job, the natural question to ask is:
      "how is one JobInstance distinguished from
      another?" The answer is: JobParameters.
      JobParameters are any set of parameters used to
      start a batch job, which can be used for identification or even as
      reference data during the run. In the example above, where there are two
      instances, one for January 1st and another for January 2nd, there is
      really only one Job: one instance was started with a job parameter of
      01-01-2008 and the other with a parameter of 01-02-2008.
      Thus, the contract can be defined as: JobInstance
      = Job + JobParameters.
      This allows a developer to effectively control how a
      JobInstance is defined, since they control what
      parameters are passed in.
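A minimal sketch may make this concrete. JobParametersBuilder is the standard helper for assembling parameters, while the jobLauncher and endOfDayJob references here are assumed to be wired elsewhere:

  // A minimal sketch: one Job, two parameter sets, two JobInstances.
  // The 'schedule.Date' key matches the meta data tables shown below.
  DateFormat dateFormat = new SimpleDateFormat("MM-dd-yyyy");

  JobParameters jan1 = new JobParametersBuilder()
      .addDate("schedule.Date", dateFormat.parse("01-01-2008")).toJobParameters();
  JobParameters jan2 = new JobParametersBuilder()
      .addDate("schedule.Date", dateFormat.parse("01-02-2008")).toJobParameters();

  jobLauncher.run(endOfDayJob, jan1); // creates JobInstance one
  jobLauncher.run(endOfDayJob, jan2); // creates a second, distinct JobInstance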
A JobExecution refers to the technical
      concept of a single attempt to run a Job. An
      execution may end in failure or success, but the
      JobInstance corresponding to a given execution
      will not be considered complete unless the execution completes
      successfully. Using the EndOfDay Job described
      above as an example, consider a JobInstance for 01-01-2008 that failed
      the first time it was run. If it is run again, with the same job
      parameters as the first run (01-01-2008), a new JobExecution will be
      created. However, there will still be only one
      JobInstance.
A Job defines what a job is and how it is
      to be executed, and JobInstance is a purely
      organizational object to group executions together, primarily to enable
      correct restart semantics. A JobExecution,
      however, is the primary storage mechanism for what actually happened
      during a run, and as such contains many more properties that must be
      controlled and persisted:
Table 2.1. JobExecution properties
| Property | Definition |
| status | A BatchStatus object that indicates the status of the execution. While it is running, the status is BatchStatus.STARTED; if it fails, BatchStatus.FAILED; and if it finishes successfully, BatchStatus.COMPLETED. |
| startTime | A java.util.Date representing the current system time when the execution was started. |
| endTime | A java.util.Date representing the current system time when the execution finished, regardless of whether or not it was successful. |
| exitStatus | The ExitStatus indicating the result of the run. It is most important because it contains an exit code that will be returned to the caller. See chapter 5 for more details. |
| createTime | A java.util.Date representing the current system time when the JobExecution was first persisted. The job may not have been started yet (and thus has no start time), but it will always have a createTime, which is required by the framework for managing job-level ExecutionContexts. |
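As a hedged sketch (standard JavaBean accessors assumed), a caller holding the JobExecution returned by the launcher can interrogate these properties directly:

  JobExecution execution = jobLauncher.run(endOfDayJob, jobParameters);

  // The properties from Table 2.1 are available once the run finishes.
  if (execution.getStatus().equals(BatchStatus.FAILED)) {
      System.err.println("Run failed between " + execution.getStartTime()
          + " and " + execution.getEndTime() + "; exit code: "
          + execution.getExitStatus().getExitCode());
  }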
These properties are important because they will be persisted and can be used to completely determine the status of an execution. For example, if the EndOfDay job for 01-01 is executed at 9:00 PM, and fails at 9:30, the following entries will be made in the batch meta data tables:
Table 2.3. BATCH_JOB_PARAMS
| JOB_INSTANCE_ID | TYPE_CD | KEY_NAME | DATE_VAL | 
| 1 | DATE | schedule.Date | 2008-01-01 00:00:00 | 
Table 2.4. BATCH_JOB_EXECUTION
| JOB_EXECUTION_ID | JOB_INSTANCE_ID | START_TIME | END_TIME | STATUS | 
| 1 | 1 | 2008-01-01 21:00:23.571 | 2008-01-01 21:30:17.132 | FAILED | 
Extra columns in the tables have been removed for clarity.
Now that the job has failed, let's assume that it took the entire
      course of the night for the problem to be determined, so that the 'batch
      window' is now closed. Assuming the window starts at 9:00 PM, the job
      will be kicked off again for 01-01, starting where it left off and
      completing successfully at 9:30. Because it's now the next day, the
      01-02 job must be run as well, which is kicked off just afterwards at
      9:31, and completes in its normal one-hour run time at 10:30. There is no
      requirement that one JobInstance be kicked off
      after another, unless there is potential for the two jobs to attempt to
      access the same data, causing issues with locking at the database level.
      It is entirely up to the scheduler to determine when a
      Job should be run. Since they're separate
      JobInstances, Spring Batch will make no attempt to stop them from being
      run concurrently. (Attempting to run the same
      JobInstance while another execution is already running will
      result in a JobExecutionAlreadyRunningException
      being thrown; a sketch of this guard follows the tables below.) There should now be an extra entry in both the
      JobInstance and
      JobParameters tables, and two extra entries in
      the JobExecution table:
Table 2.6. BATCH_JOB_PARAMS
| JOB_INSTANCE_ID | TYPE_CD | KEY_NAME | DATE_VAL | 
| 1 | DATE | schedule.Date | 2008-01-01 00:00:00 | 
| 2 | DATE | schedule.Date | 2008-01-02 00:00:00 | 
Table 2.7. BATCH_JOB_EXECUTION
| JOB_EXECUTION_ID | JOB_INSTANCE_ID | START_TIME | END_TIME | STATUS | 
| 1 | 1 | 2008-01-01 21:00 | 2008-01-01 21:30 | FAILED | 
| 2 | 1 | 2008-01-02 21:00 | 2008-01-02 21:30 | COMPLETED | 
| 3 | 2 | 2008-01-02 21:31 | 2008-01-02 22:29 | COMPLETED | 
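As noted above, attempting to launch a JobInstance that is already running is rejected. A hedged sketch of that guard, reusing the launcher and parameters from the earlier sketches:

  try {
      // Same Job + same JobParameters while an execution is still in
      // flight: the framework refuses to create a second execution.
      jobLauncher.run(endOfDayJob, jan1);
  }
  catch (JobExecutionAlreadyRunningException e) {
      // The 01-01 JobInstance already has a running JobExecution;
      // wait for it to finish (or fail) before trying again.
  }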
A Step is a domain object that encapsulates
    an independent, sequential phase of a batch job. Therefore, every
    Job is composed entirely of one or more steps. A
    Step should be thought of as a unique processing
    stream that will be executed in sequence. For example, consider a job with one
    step that loads a file into a database, another that reads from the
    database, validates the data, performs processing, and then writes to
    another table, and a third that reads from that table and writes out to a
    file. Each of these steps will be performed completely before moving on to
    the next step. The file will be completely read into the database before
    step 2 can begin. As with Job, a
    Step has an individual
    StepExecution that corresponds with a unique
    JobExecution:

A Step contains all of the information
      necessary to define and control the actual batch processing. This is a
      necessarily vague description because the contents of any given
      Step are at the discretion of the developer
      writing a Job. A Step can be as simple or complex
      as the developer desires. A simple Step might
      load data from a file into the database, requiring little or no code
      (depending upon the implementations used). A more complex
      Step may have complicated business rules that are
      applied as part of the processing.
Steps are defined by instantiating implementations of the
      Step interface. Two step implementation classes
      are available in the Spring Batch framework, and they are each discussed
      in detail in Chapter 4 of this guide. For most situations, the
      ItemOrientedStep implementation is sufficient,
      but for situations where only one call is needed, such as a stored
      procedure call or a wrapper around an existing script, a
      TaskletStep may be a better option.
A StepExecution represents a single attempt
      to execute a Step. Using the example from
      JobExecution, if there is a
      JobInstance for the "EndOfDayJob", with
      JobParameters of "01-01-2008" that fails to
      successfully complete its work the first time it is run, when it is
      executed again, a new StepExecution will be
      created. Each of these step executions may represent a different
      invocation of the batch framework, but they will all correspond to the
      same JobInstance, just as multiple
      JobExecutions belong to the same
      JobInstance. However, if a step fails to execute
      because the step before it fails, there will be no execution persisted
      for it. An execution will only be created when the
      Step is actually started.
Step executions are represented by objects of the
      StepExecution class. Each execution contains a
      reference to its corresponding step and
      JobExecution, and transaction related data such
      as commit and rollback count and start and end times. Additionally, each
      step execution will contain an ExecutionContext,
      which contains any data a developer needs persisted across batch runs,
      such as statistics or state information needed to restart. The following
      is a listing of the properties for
      StepExecution:
Table 2.8. StepExecution properties
| Property | Definition |
| status | A BatchStatus object that indicates the status of the execution. While it is running, the status is BatchStatus.STARTED; if it fails, the status is BatchStatus.FAILED; and if it finishes successfully, the status is BatchStatus.COMPLETED. |
| startTime | A java.util.Date representing the current system time when the execution was started. |
| endTime | A java.util.Date representing the current system time when the execution finished, regardless of whether or not it was successful. |
| exitStatus | The ExitStatus indicating the result of the execution. It is most important because it contains an exit code that will be returned to the caller. See chapter 5 for more details. |
| executionContext | The 'property bag' containing any user data that needs to be persisted between executions. |
| commitCount | The number of transactions that have been committed for this execution. |
| itemCount | The number of items that have been processed for this execution. |
| rollbackCount | The number of times the business transaction controlled by the Step has been rolled back. |
| readSkipCount | The number of times read has failed, resulting in a skipped item. |
| writeSkipCount | The number of times write has failed, resulting in a skipped item. |
An ExecutionContext represents a collection
      of key/value pairs that are persisted and controlled by the framework in
      order to allow developers a place to store persistent state that is
      scoped to a StepExecution or
      JobExecution. For those familiar with Quartz, it
      is very similar to JobDataMap. The best usage
      example is restart. Using flat file input as an example, while
      processing individual lines, the framework periodically persists the
      ExecutionContext at commit points. This allows
      the ItemReader to store its state in case a fatal
      error occurs during the run, or even if the power goes out. All that is
      needed is to put the current number of lines read into the context, and
      the framework will do the rest:
executionContext.putLong(getKey(LINES_READ_COUNT), reader.getPosition());
Using the EndOfDay example from the Job Stereotypes section as an example, assume there's one step: 'loadData', that loads a file into the database. After the first failed run, the meta data tables would look like the following:
Table 2.10. BATCH_JOB_PARAMS
| JOB_INSTANCE_ID | TYPE_CD | KEY_NAME | DATE_VAL | 
| 1 | DATE | schedule.Date | 2008-01-01 00:00:00 | 
Table 2.11. BATCH_JOB_EXECUTION
| JOB_EXECUTION_ID | JOB_INSTANCE_ID | START_TIME | END_TIME | STATUS | 
| 1 | 1 | 2008-01-01 21:00:23.571 | 2008-01-01 21:30:17.132 | FAILED | 
Table 2.12. BATCH_STEP_EXECUTION
| STEP_EXECUTION_ID | JOB_EXECUTION_ID | STEP_NAME | START_TIME | END_TIME | STATUS | 
| 1 | 1 | loadData | 2008-01-01 21:00:23.571 | 2008-01-01 21:30:17.132 | FAILED | 
In this case, the Step ran for 30
      minutes and processed 40,321 'pieces', which would represent lines in a
      file in this scenario. This value is updated by the framework just before
      each commit, and the corresponding meta data table can contain multiple
      rows, one per entry within the ExecutionContext. Being
      notified before a commit requires one of the various StepListeners, or
      an ItemStream, which are discussed in more detail
      later in this guide. As with the previous example, it is assumed that
      the Job is restarted the next day. When it is restarted, the values from
      the ExecutionContext of the last run are
      reconstituted from the database, and when the
      ItemReader is opened, it can check to see if it
      has any stored state in the context, and initialize itself from
      there:
  if (executionContext.containsKey(getKey(LINES_READ_COUNT))) {
    log.debug("Initializing for restart. Restart data is: " + executionContext);

    // Recover the line count persisted by the previous execution and
    // wind the reader forward until that position is reached.
    long lineCount = executionContext.getLong(getKey(LINES_READ_COUNT));
    LineReader reader = getReader();
    Object record = "";
    while (reader.getPosition() < lineCount && record != null) {
       record = readLine();
    }
  }

In this case, after the above code is executed, the current line
      will be 40,322, allowing the Step to start again
      from where it left off. The ExecutionContext can
      also be used for statistics that need to be persisted about the run
      itself. For example, if a flat file contains orders for processing that
      exist across multiple lines, it may be necessary to store how many
      orders have been processed (which is quite different from the number
      of lines read) so that an email can be sent at the end of the
      Step with the total orders processed in the body.
      The framework handles storing this for the developer, in order to
      correctly scope it with an individual
      JobInstance. It can be very difficult to know
      whether an existing ExecutionContext should be
      used or not. For example, using the 'EndOfDay' example from above, when
      the 01-01 run starts again for the second time, the framework recognizes
      that it is the same JobInstance and on an
      individual Step basis, pulls the
      ExecutionContext out of the database and hands it
      as part of the StepExecution to the
      Step itself. Conversely, for the 01-02 run the
      framework recognizes that it is a different instance, so an empty
      context must be handed to the Step. There are
      many of these types of determinations that the framework makes for the
      developer to ensure the state is given to them at the correct time. It
      is also important to note that exactly one
      ExecutionContext exists per
      StepExecution at any given time. Clients of the
      ExecutionContext should be careful because this
      creates a shared keyspace, so care should be taken when putting values
      in to ensure no data is overwritten. The
      Step itself, however, stores absolutely no data in the context, so
      there is no way to adversely affect the framework.
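The getKey(...) calls in the snippets above hint at one common convention for working in that shared keyspace: prefixing each entry with a component-specific namespace. A hypothetical helper might look like this (the naming scheme is illustrative, not framework-provided):

  // Hypothetical helper: namespace ExecutionContext keys per component
  // so that two components sharing the context cannot collide. It assumes
  // the component exposes a getName() method.
  private String getKey(String key) {
      return getName() + "." + key; // e.g. "playerFileReader.lines.read.count"
  }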
JobRepository is the persistence mechanism
    for all of the Stereotypes mentioned above. When a job is first launched,
    a JobExecution is obtained by calling the
    repository's createJobExecution method, and
    during the course of execution, StepExecution and
    JobExecution are persisted by passing them to the
    repository:
  public interface JobRepository {
    public JobExecution createJobExecution(Job job, JobParameters jobParameters)
         throws JobExecutionAlreadyRunningException, JobRestartException;
    void saveOrUpdate(JobExecution jobExecution);
    void saveOrUpdate(StepExecution stepExecution);
    void saveOrUpdateExecutionContext(StepExecution stepExecution);
    StepExecution getLastStepExecution(JobInstance jobInstance, Step step);
    int getStepExecutionCount(JobInstance jobInstance, Step step);
  }
JobLauncher represents a simple interface for
    launching a Job with a given set of
    JobParameters:
  public interface JobLauncher {
    public JobExecution run(Job job, JobParameters jobParameters) throws JobExecutionAlreadyRunningException,
        JobRestartException;
  }
It is expected that implementations will obtain a valid
    JobExecution from the
    JobRepository and execute the
    Job.
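A much-simplified, hedged sketch of such an implementation (assuming the Job exposes an execute method that takes the JobExecution; all error handling is omitted):

  public JobExecution run(Job job, JobParameters jobParameters)
          throws JobExecutionAlreadyRunningException, JobRestartException {
      // Obtain a valid JobExecution from the repository, then delegate
      // the actual processing to the Job itself.
      JobExecution jobExecution = jobRepository.createJobExecution(job, jobParameters);
      job.execute(jobExecution);
      return jobExecution;
  }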
JobLocator represents an interface for
    locating a Job:
  public interface JobLocator {
    Job getJob(String name) throws NoSuchJobException;
  }

This interface is necessary because of the nature of Spring itself.
    Because it can't be guaranteed that one
    ApplicationContext equals one
    Job, an abstraction is needed to obtain a
    Job for a given name. It becomes especially useful
    when launching jobs from within a Java EE application server.
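As a hedged illustration (the bean names and context file here are hypothetical), a launching utility might combine JobLocator and JobLauncher as follows:

  // Hypothetical wiring: bean names and the context file are examples only.
  ApplicationContext context =
      new ClassPathXmlApplicationContext("batch-context.xml");
  JobLocator jobLocator = (JobLocator) context.getBean("jobLocator");
  JobLauncher jobLauncher = (JobLauncher) context.getBean("jobLauncher");

  // Resolve the Job by name, then launch it with the given parameters.
  Job job = jobLocator.getJob("footballJob");
  jobLauncher.run(job, new JobParameters());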
ItemReader is an abstraction that represents
    the retrieval of input for a Step, one item at a
    time. When the ItemReader has exhausted the items
    it can provide, it will indicate this by returning null. More details
    about the ItemReader interface and its various
    implementations can be found in Chapter 3.
ItemWriter is an abstraction that represents
    the output of a Step, one item at a time.
    Generally, an item writer has no knowledge of the input it will receive
    next, only the item that was passed in its current invocation. More
    details about the ItemWriter interface and its
    various implementations can be found in Chapter 3.
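As a minimal sketch of both contracts (hedged: the real ItemReader and ItemWriter interfaces declare additional lifecycle methods, covered in Chapter 3), a custom reader returns items until it signals exhaustion with null, and a writer handles one item per invocation:

  // Sketch of the read contract: return the next item, or null when
  // the input is exhausted. (The full interface adds lifecycle methods.)
  public class StringListItemReader {
      private final java.util.Iterator<String> items;

      public StringListItemReader(java.util.List<String> items) {
          this.items = items.iterator();
      }

      public Object read() {
          return items.hasNext() ? items.next() : null;
      }
  }

  // Sketch of the write contract: one item per invocation, with no
  // knowledge of the items that will follow.
  public class LoggingItemWriter {
      public void write(Object item) {
          System.out.println("writing item: " + item);
      }
  }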
A Tasklet represents the execution of a
    logical unit of work, as defined by its implementation of the Spring Batch
    provided Tasklet interface. A
    Tasklet is useful for encapsulating processing
    logic that is not natural to split into read-(transform)-write phases,
    such as invoking a system command or a stored procedure.
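A hedged sketch of such a Tasklet, wrapping a system command (the execute signature shown here assumes the single-method contract described above, the script name is hypothetical, and Chapter 4 covers the exact interface):

  public class SystemCommandTasklet implements Tasklet {

      private String command = "backup.sh"; // hypothetical system command

      public ExitStatus execute() throws Exception {
          // Run the command and map its result onto a batch exit status;
          // the ExitStatus constants used here are illustrative.
          Process process = Runtime.getRuntime().exec(command);
          int result = process.waitFor();
          return result == 0 ? ExitStatus.FINISHED : ExitStatus.FAILED;
      }
  }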