Batch

This section goes into more detail about Spring Cloud Task’s integration with Spring Batch. Tracking the association between a job execution and the task in which it was executed as well as remote partitioning through Spring Cloud Deployer are covered in this section.

Associating a Job Execution to the Task in which It Was Executed

Spring Boot provides facilities for the execution of batch jobs within a Spring Boot Uber-jar. Spring Boot’s support of this functionality lets a developer execute multiple batch jobs within that execution. Spring Cloud Task provides the ability to associate the execution of a job (a job execution) with a task’s execution so that one can be traced back to the other.

Spring Cloud Task achieves this functionality by using the TaskBatchExecutionListener. By default, this listener is auto configured in any context that has both a Spring Batch Job configured (by having a bean of type Job defined in the context) and the spring-cloud-task-batch jar on the classpath. The listener is injected into all jobs that meet those conditions.

Overriding the TaskBatchExecutionListener

To prevent the listener from being injected into any batch jobs within the current context, you can disable the autoconfiguration by using standard Spring Boot mechanisms.

To only have the listener injected into particular jobs within the context, override the batchTaskExecutionListenerBeanPostProcessor and provide a list of job bean IDs, as shown in the following example:

public static TaskBatchExecutionListenerBeanPostProcessor batchTaskExecutionListenerBeanPostProcessor() {
	TaskBatchExecutionListenerBeanPostProcessor postProcessor =
		new TaskBatchExecutionListenerBeanPostProcessor();

	postProcessor.setJobNames(Arrays.asList(new String[] {"job1", "job2"}));

	return postProcessor;
}
You can find a sample batch application in the samples module of the Spring Cloud Task Project, here.

Remote Partitioning

Spring Cloud Deployer provides facilities for launching Spring Boot-based applications on most cloud infrastructures. The DeployerPartitionHandler and DeployerStepExecutionHandler delegate the launching of worker step executions to Spring Cloud Deployer.

To configure the DeployerStepExecutionHandler, you must provide a Resource representing the Spring Boot Uber-jar to be executed, a TaskLauncherHandler, and a JobExplorer. You can configure any environment properties as well as the max number of workers to be executing at once, the interval to poll for the results (defaults to 10 seconds), and a timeout (defaults to -1 or no timeout). The following example shows how configuring this PartitionHandler might look:

@Bean
public PartitionHandler partitionHandler(TaskLauncher taskLauncher,
		JobExplorer jobExplorer) throws Exception {

	MavenProperties mavenProperties = new MavenProperties();
	mavenProperties.setRemoteRepositories(new HashMap<>(Collections.singletonMap("springRepo",
		new MavenProperties.RemoteRepository(repository))));

 	Resource resource =
		MavenResource.parse(String.format("%s:%s:%s",
				"io.spring.cloud",
				"partitioned-batch-job",
				"1.1.0.RELEASE"), mavenProperties);

	DeployerPartitionHandler partitionHandler =
		new DeployerPartitionHandler(taskLauncher, jobExplorer, resource, "workerStep");

	List<String> commandLineArgs = new ArrayList<>(3);
	commandLineArgs.add("--spring.profiles.active=worker");
	commandLineArgs.add("--spring.cloud.task.initialize.enable=false");
	commandLineArgs.add("--spring.batch.initializer.enabled=false");

	partitionHandler.setCommandLineArgsProvider(
		new PassThroughCommandLineArgsProvider(commandLineArgs));
	partitionHandler.setEnvironmentVariablesProvider(new NoOpEnvironmentVariablesProvider());
	partitionHandler.setMaxWorkers(2);
	partitionHandler.setApplicationName("PartitionedBatchJobTask");

	return partitionHandler;
}
When passing environment variables to partitions, each partition may be on a different machine with different environment settings. Consequently, you should pass only those environment variables that are required.

Notice in the example above that we have set the maximum number of workers to 2. Setting the maximum of workers establishes the maximum number of partitions that should be running at one time.

The Resource to be executed is expected to be a Spring Boot Uber-jar with a DeployerStepExecutionHandler configured as a CommandLineRunner in the current context. The repository enumerated in the preceding example should be the remote repository in which the Spring Boot Uber-jar is located. Both the manager and worker are expected to have visibility into the same data store being used as the job repository and task repository. Once the underlying infrastructure has bootstrapped the Spring Boot jar and Spring Boot has launched the DeployerStepExecutionHandler, the step handler executes the requested Step. The following example shows how to configure the DeployerStepExecutionHandler:

@Bean
public DeployerStepExecutionHandler stepExecutionHandler(JobExplorer jobExplorer) {
	DeployerStepExecutionHandler handler =
		new DeployerStepExecutionHandler(this.context, jobExplorer, this.jobRepository);

	return handler;
}
You can find a sample remote partition application in the samples module of the Spring Cloud Task project, here.

Asynchronously launch remote batch partitions

By default batch partitions are launched sequentially. However, in some cases this may affect performance as each launch will block until the resource (For example: provisioning a pod in Kubernetes) is provisioned. In these cases you can provide a ThreadPoolTaskExecutor to the DeployerPartitionHandler. This will launch the remote batch partitions based on the configuration of the ThreadPoolTaskExecutor. For example:

	@Bean
	public ThreadPoolTaskExecutor threadPoolTaskExecutor() {
		ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
		executor.setCorePoolSize(4);
		executor.setThreadNamePrefix("default_task_executor_thread");
		executor.setWaitForTasksToCompleteOnShutdown(true);
		executor.initialize();
		return executor;
	}

	@Bean
	public PartitionHandler partitionHandler(TaskLauncher taskLauncher, JobExplorer jobExplorer,
		TaskRepository taskRepository, ThreadPoolTaskExecutor executor) throws Exception {
		Resource resource = this.resourceLoader
			.getResource("maven://io.spring.cloud:partitioned-batch-job:2.2.0.BUILD-SNAPSHOT");

		DeployerPartitionHandler partitionHandler =
			new DeployerPartitionHandler(taskLauncher, jobExplorer, resource,
				"workerStep", taskRepository, executor);
	...
	}
We need to close the context since the use of ThreadPoolTaskExecutor leaves a thread active thus the app will not terminate. To close the application appropriately, we will need to set spring.cloud.task.closecontextEnabled property to true.

Notes on Developing a Batch-partitioned application for the Kubernetes Platform

  • When deploying partitioned apps on the Kubernetes platform, you must use the following dependency for the Spring Cloud Kubernetes Deployer:

    <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-starter-deployer-kubernetes</artifactId>
    </dependency>
  • The application name for the task application and its partitions need to follow the following regex pattern: [a-z0-9]([-a-z0-9]*[a-z0-9]). Otherwise, an exception is thrown.

Batch Informational Messages

Spring Cloud Task provides the ability for batch jobs to emit informational messages. The “Spring Batch Events” section covers this feature in detail.

Batch Job Exit Codes

As discussed earlier, Spring Cloud Task applications support the ability to record the exit code of a task execution. However, in cases where you run a Spring Batch Job within a task, regardless of how the Batch Job Execution completes, the result of the task is always zero when using the default Batch/Boot behavior. Keep in mind that a task is a boot application and that the exit code returned from the task is the same as a boot application. To override this behavior and allow the task to return an exit code other than zero when a batch job returns an BatchStatus of FAILED, set spring.cloud.task.batch.fail-on-job-failure to true. Then the exit code can be 1 (the default) or be based on the specified ExitCodeGenerator)

This functionality uses a new ApplicationRunner that replaces the one provided by Spring Boot. By default, it is configured with the same order. However, if you want to customize the order in which the ApplicationRunner is run, you can set its order by setting the spring.cloud.task.batch.applicationRunnerOrder property. To have your task return the exit code based on the result of the batch job execution, you need to write your own CommandLineRunner.