In this demonstration, you will learn how to orchestrate short-lived data processing applications (e.g., Spring Batch jobs) using Spring Cloud Task and Spring Cloud Data Flow on Cloud Foundry.
The Spring Cloud Data Flow Shell is available for download or you can build it yourself.
NOTE: The Spring Cloud Data Flow Shell and Local server implementation are in the same repository and are both built by running …
To run the Shell, open a new terminal session:

```
$ cd <PATH/TO/SPRING-CLOUD-DATAFLOW-SHELL-JAR>
$ java -jar spring-cloud-dataflow-shell-<VERSION>.jar

  ____                              ____ _                __
 / ___| _ __  _ __(_)_ __   __ _  / ___| | ___  _   _  __| |
 \___ \| '_ \| '__| | '_ \ / _` | | |   | |/ _ \| | | |/ _` |
  ___) | |_) | |  | | | | | (_| | | |___| | (_) | |_| | (_| |
 |____/| .__/|_|  |_|_| |_|\__, |  \____|_|\___/ \__,_|\__,_|
  ____ |_|    _          __|___/    __________
 |  _ \  __ _| |_ __ _  |  ___| | _____      __  \ \ \ \ \ \
 | | | |/ _` | __/ _` | | |_  | |/ _ \ \ /\ / /   \ \ \ \ \ \
 | |_| | (_| | || (_| | |  _| | | (_) \ V  V /    / / / / / /
 |____/ \__,_|\__\__,_| |_|   |_|\___/ \_/\_/    /_/_/_/_/_/

Welcome to the Spring Cloud Data Flow shell. For assistance hit TAB or type "help".
dataflow:>
```
NOTE: The Spring Cloud Data Flow Shell is a Spring Boot application that connects to the Data Flow Server's REST API and supports a DSL that simplifies the process of defining a stream or task and managing its lifecycle. Most of these samples use the shell. If you prefer, you can use the Data Flow UI at localhost:9393/dashboard (or wherever the server is hosted) to perform equivalent operations.
NOTE: PCF 1.7.12 or greater is required to run Tasks on Spring Cloud Data Flow. As of this writing, both PCFDev and PWS build upon this version.
Task support needs to be enabled on PCFDev. While logged in as admin, issue the following command:

```
cf enable-feature-flag task_creation
Setting status of task_creation as admin...

OK

Feature task_creation Enabled.
```
NOTE: For this sample, all you need is the mysql service.
NOTE: All the apps deployed to PCFDev start with low memory by default. It is recommended to change it to at least 768MB for the dataflow-server.
Tasks in Spring Cloud Data Flow require an RDBMS to host the task repository (see here for more details), so let's instruct the Spring Cloud Data Flow server to bind the mysql service to each deployed task:

```
$ cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_TASK_SERVICES mysql
$ cf restage dataflow-server
```
NOTE: We only need …
As a recap, here is what you should see as configuration for the Spring Cloud Data Flow server:

```
cf env dataflow-server
....
User-Provided:
SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_DOMAIN: local.pcfdev.io
SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_MEMORY: 512
SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_ORG: pcfdev-org
SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_PASSWORD: pass
SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_SKIP_SSL_VALIDATION: false
SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_SPACE: pcfdev-space
SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_TASK_SERVICES: mysql
SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_URL: https://api.local.pcfdev.io
SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_USERNAME: user

No running env variables have been set

No staging env variables have been set
```
The dataflow-server application is now started and ready for interaction via the dataflow-server.local.pcfdev.io endpoint.

Build and register the batch-job example from Spring Cloud Task samples. For convenience, the final uber-jar artifact is provided with this sample.
```
dataflow:>app register --type task --name simple_batch_job --uri https://github.com/spring-cloud/spring-cloud-dataflow-samples/raw/master/src/main/asciidoc/tasks/simple-batch-job/batch-job-1.3.0.BUILD-SNAPSHOT.jar
```
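Under the covers, the batch-job sample is a regular Spring Boot application whose Spring Batch jobs are made launchable as a short-lived task by Spring Cloud Task. As a rough orientation only (the package and class names below are illustrative, not the sample's actual source), the main class of such an application looks like this:

```java
package com.example.batchjob; // hypothetical package, for illustration only

import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.task.configuration.EnableTask;

// @EnableTask records each run in the task repository (start time, end time,
// exit code), which is what Data Flow later shows in `task execution list`.
// @EnableBatchProcessing bootstraps the Spring Batch infrastructure.
@EnableTask
@EnableBatchProcessing
@SpringBootApplication
public class BatchJobApplication {

    public static void main(String[] args) {
        SpringApplication.run(BatchJobApplication.class, args);
    }
}
```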
Create the task with the simple_batch_job application:

```
dataflow:>task create foo --definition "simple_batch_job"
```
NOTE: Unlike Streams, Task definitions don't require explicit deployment. They can be launched on demand, scheduled, or triggered by streams.
Verify that there are no Task applications running on PCFDev yet - they are listed only after the initial launch/staging attempt on PCF:

```
$ cf apps
Getting apps in org pcfdev-org / space pcfdev-space as user...
OK

name              requested state   instances   memory   disk   urls
dataflow-server   started           1/1         768M     512M   dataflow-server.local.pcfdev.io
```
Let’s launch foo
```
dataflow:>task launch foo
```
Verify the execution of foo by tailing the logs:

```
$ cf logs foo
Retrieving logs for app foo in org pcfdev-org / space pcfdev-space as user...

2016-08-14T18:48:54.22-0700 [APP/TASK/foo/0]OUT Creating container
2016-08-14T18:48:55.47-0700 [APP/TASK/foo/0]OUT
2016-08-14T18:49:06.59-0700 [APP/TASK/foo/0]OUT 2016-08-15 01:49:06.598  INFO 14 --- [main] o.s.b.c.l.support.SimpleJobLauncher : Job: [SimpleJob: [name=job1]] launched with the following parameters: [{}]
...
2016-08-14T18:49:06.78-0700 [APP/TASK/foo/0]OUT 2016-08-15 01:49:06.785  INFO 14 --- [main] o.s.b.c.l.support.SimpleJobLauncher : Job: [SimpleJob: [name=job1]] completed with the following parameters: [{}] and the following status: [COMPLETED]
...
2016-08-14T18:49:07.36-0700 [APP/TASK/foo/0]OUT 2016-08-15 01:49:07.363  INFO 14 --- [main] o.s.b.c.l.support.SimpleJobLauncher : Job: [SimpleJob: [name=job2]] launched with the following parameters: [{}]
...
2016-08-14T18:49:07.53-0700 [APP/TASK/foo/0]OUT 2016-08-15 01:49:07.536  INFO 14 --- [main] o.s.b.c.l.support.SimpleJobLauncher : Job: [SimpleJob: [name=job2]] completed with the following parameters: [{}] and the following status: [COMPLETED]
...
2016-08-14T18:49:07.71-0700 [APP/TASK/foo/0]OUT Exit status 0
2016-08-14T18:49:07.78-0700 [APP/TASK/foo/0]OUT Destroying container
2016-08-14T18:49:08.47-0700 [APP/TASK/foo/0]OUT Successfully destroyed container
```
NOTE: Verify …
NOTE: Unlike LRPs in Cloud Foundry, tasks are short-lived, so the logs aren't always available. They are generated only while the Task application runs; at the end of the task operation, the container that ran the Task application is destroyed to free up resources.
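If you need details of a run to outlive the container, rely on the task repository rather than the logs. Spring Cloud Task also lets you hook into the task lifecycle through its TaskExecutionListener interface; the sketch below is illustrative (the class name and log messages are ours, not part of this sample):

```java
package com.example.batchjob; // hypothetical package, for illustration only

import org.springframework.cloud.task.listener.TaskExecutionListener;
import org.springframework.cloud.task.repository.TaskExecution;
import org.springframework.stereotype.Component;

// These callbacks fire during the task's lifecycle, before the container is destroyed.
@Component
public class LoggingTaskListener implements TaskExecutionListener {

    @Override
    public void onTaskStartup(TaskExecution taskExecution) {
        System.out.println("Task started: " + taskExecution.getTaskName());
    }

    @Override
    public void onTaskEnd(TaskExecution taskExecution) {
        // The same exit code is persisted in the task repository.
        System.out.println("Task ended, exit code: " + taskExecution.getExitCode());
    }

    @Override
    public void onTaskFailed(TaskExecution taskExecution, Throwable throwable) {
        System.err.println("Task failed: " + throwable.getMessage());
    }
}
```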
List Tasks in Cloud Foundry
```
$ cf apps
Getting apps in org pcfdev-org / space pcfdev-space as user...
OK

name              requested state   instances   memory   disk   urls
dataflow-server   started           1/1         768M     512M   dataflow-server.local.pcfdev.io
foo               stopped           0/1         1G       1G
```
Verify Task execution details
```
dataflow:>task execution list
╔═════════╤══╤════════════════════════════╤════════════════════════════╤═════════╗
║Task Name│ID│         Start Time         │          End Time          │Exit Code║
╠═════════╪══╪════════════════════════════╪════════════════════════════╪═════════╣
║foo      │1 │Sun Aug 14 18:49:05 PDT 2016│Sun Aug 14 18:49:07 PDT 2016│0        ║
╚═════════╧══╧════════════════════════════╧════════════════════════════╧═════════╝
```
Verify Job execution details
```
dataflow:>job execution list
╔══╤═══════╤════════╤════════════════════════════╤════════════════════╤═════════════════╗
║ID│Task ID│Job Name│         Start Time         │Step Execution Count│Definition Status║
╠══╪═══════╪════════╪════════════════════════════╪════════════════════╪═════════════════╣
║2 │1      │job2    │Sun Aug 14 18:49:07 PDT 2016│1                   │Destroyed        ║
║1 │1      │job1    │Sun Aug 14 18:49:06 PDT 2016│1                   │Destroyed        ║
╚══╧═══════╧════════╧════════════════════════════╧════════════════════╧═════════════════╝
```
In this demonstration, you will learn how to create a data processing application using Spring Batch, which will then be run within Spring Cloud Data Flow.
The Spring Cloud Data Flow Shell is available for download or you can build it yourself.
NOTE: The Spring Cloud Data Flow Shell and Local server implementation are in the same repository and are both built by running …
To run the Shell, open a new terminal session:

```
$ cd <PATH/TO/SPRING-CLOUD-DATAFLOW-SHELL-JAR>
$ java -jar spring-cloud-dataflow-shell-<VERSION>.jar

  ____                              ____ _                __
 / ___| _ __  _ __(_)_ __   __ _  / ___| | ___  _   _  __| |
 \___ \| '_ \| '__| | '_ \ / _` | | |   | |/ _ \| | | |/ _` |
  ___) | |_) | |  | | | | | (_| | | |___| | (_) | |_| | (_| |
 |____/| .__/|_|  |_|_| |_|\__, |  \____|_|\___/ \__,_|\__,_|
  ____ |_|    _          __|___/    __________
 |  _ \  __ _| |_ __ _  |  ___| | _____      __  \ \ \ \ \ \
 | | | |/ _` | __/ _` | | |_  | |/ _ \ \ /\ / /   \ \ \ \ \ \
 | |_| | (_| | || (_| | |  _| | | (_) \ V  V /    / / / / / /
 |____/ \__,_|\__\__,_| |_|   |_|\___/ \_/\_/    /_/_/_/_/_/

Welcome to the Spring Cloud Data Flow shell. For assistance hit TAB or type "help".
dataflow:>
```
NOTE: The Spring Cloud Data Flow Shell is a Spring Boot application that connects to the Data Flow Server's REST API and supports a DSL that simplifies the process of defining a stream or task and managing its lifecycle. Most of these samples use the shell. If you prefer, you can use the Data Flow UI at localhost:9393/dashboard (or wherever the server is hosted) to perform equivalent operations.
The source for the demo project is located here. The sample is a Spring Boot application that demonstrates how to read data from a flat file, process the records, and store the transformed data in a database using Spring Batch.
The key classes for creating the batch job are:
* BatchConfiguration.java - this is where we define our batch job, the step, and the components that are used to read, process, and write our data. In the sample we use a FlatFileItemReader, which reads a delimited file, a custom PersonItemProcessor to transform the data, and a JdbcBatchItemWriter to write our data to a database.
* Person.java - the domain object representing the data we are reading and processing in our batch job. The sample data contains records made up of a person's first and last name.
* PersonItemProcessor.java - this class is an ItemProcessor implementation which receives records after they have been read and before they are written. This allows us to transform the data between these two steps. In our sample ItemProcessor implementation, we simply transform the first and last name of each Person to uppercase characters (a minimal sketch appears below).
* Application.java - the main entry point into the Spring Boot application, used to launch the batch job.

Resource files are included to set up the database and provide sample data:

* schema-all.sql - the database schema that will be created when the application starts up. In this sample, an in-memory database is created on start-up and destroyed when the application exits.
* data.csv - a sample data file containing the person records used in the demo.

NOTE: This example expects to use the Spring Cloud Data Flow Server's embedded H2 database. If you wish to use another repository, be sure to add the correct dependencies to the pom.xml and update the schema-all.sql.
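For reference, a minimal sketch of the processor described above might look like the following, assuming a Person class with a two-argument constructor and the usual getters (see the sample source for the real implementation):

```java
package com.example.ingest; // hypothetical package, for illustration only

import org.springframework.batch.item.ItemProcessor;

// Receives each Person after it is read and before it is written,
// returning the transformed instance (returning null would filter the record out).
public class PersonItemProcessor implements ItemProcessor<Person, Person> {

    @Override
    public Person process(Person person) {
        // Upper-case both names, as described above.
        return new Person(person.getFirstName().toUpperCase(),
                person.getLastName().toUpperCase());
    }
}
```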
Build the demo JAR
```
$ mvn clean package
```
Register the task
```
dataflow:>app register --name fileIngest --type task --uri file:///path/to/target/ingest-X.X.X.jar
Successfully registered application 'task:fileIngest'
dataflow:>
```
Create the task
```
dataflow:>task create fileIngestTask --definition fileIngest
Created new task 'fileIngestTask'
dataflow:>
```
Launch the task
```
dataflow:>task launch fileIngestTask --arguments "localFilePath=classpath:data.csv"
Launched task 'fileIngestTask'
dataflow:>
```
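The localFilePath=classpath:data.csv argument is passed to the application as a job parameter. A common Spring Batch way to consume such a parameter is a step-scoped reader with late binding; the following sketch shows the idea (the bean names and builder usage are our assumptions, not necessarily the sample's actual configuration):

```java
package com.example.ingest; // hypothetical package, for illustration only

import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.ResourceLoader;

@Configuration
public class ReaderConfiguration {

    // Step-scoped so the job parameter is resolved at step execution time.
    @Bean
    @StepScope
    public FlatFileItemReader<Person> reader(
            @Value("#{jobParameters['localFilePath']}") String localFilePath,
            ResourceLoader resourceLoader) {
        return new FlatFileItemReaderBuilder<Person>()
                .name("personItemReader")
                // ResourceLoader resolves prefixes such as "classpath:data.csv".
                .resource(resourceLoader.getResource(localFilePath))
                .delimited()
                .names(new String[] { "firstName", "lastName" })
                .targetType(Person.class)
                .build();
    }
}
```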
Inspect logs
The log file path for the launched task can be found in the local server output, for example:
```
2017-10-27 14:58:18.112  INFO 19485 --- [nio-9393-exec-6] o.s.c.d.spi.local.LocalTaskLauncher : launching task fileIngestTask-8932f73d-f17a-4bba-b44d-3fd9df042ac0
   Logs will be in /var/folders/6x/tgtx9xbn0x16xq2sx1j2rld80000gn/T/spring-cloud-dataflow-983191515779755562/fileIngestTask-1509130698071/fileIngestTask-8932f73d-f17a-4bba-b44d-3fd9df042ac0
```
Verify Task execution details
```
dataflow:>task execution list
╔══════════════╤══╤════════════════════════════╤════════════════════════════╤═════════╗
║  Task Name   │ID│         Start Time         │          End Time          │Exit Code║
╠══════════════╪══╪════════════════════════════╪════════════════════════════╪═════════╣
║fileIngestTask│1 │Fri Oct 27 14:58:20 EDT 2017│Fri Oct 27 14:58:20 EDT 2017│0        ║
╚══════════════╧══╧════════════════════════════╧════════════════════════════╧═════════╝
```
Verify Job execution details
```
dataflow:>job execution list
╔══╤═══════╤═════════╤════════════════════════════╤════════════════════╤═════════════════╗
║ID│Task ID│Job Name │         Start Time         │Step Execution Count│Definition Status║
╠══╪═══════╪═════════╪════════════════════════════╪════════════════════╪═════════════════╣
║1 │1      │ingestJob│Fri Oct 27 14:58:20 EDT 2017│1                   │Created          ║
╚══╧═══════╧═════════╧════════════════════════════╧════════════════════╧═════════════════╝
```