In this demonstration, you will learn how to orchestrate short-lived data processing applications (e.g., Spring Batch jobs) using Spring Cloud Task and Spring Cloud Data Flow on Cloud Foundry.
The Spring Cloud Data Flow Shell is available for download or you can build it yourself.
NOTE: The Spring Cloud Data Flow Shell and Local server implementation are in the same repository and are both built by running …
To run the Shell, open a new terminal session:

```
$ cd <PATH/TO/SPRING-CLOUD-DATAFLOW-SHELL-JAR>
$ java -jar spring-cloud-dataflow-shell-<VERSION>.jar

  ____                              ____ _                __
 / ___| _ __  _ __(_)_ __   __ _  / ___| | ___  _   _  __| |
 \___ \| '_ \| '__| | '_ \ / _` | | |   | |/ _ \| | | |/ _` |
  ___) | |_) | |  | | | | | (_| | | |___| | (_) | |_| | (_| |
 |____/| .__/|_|  |_|_| |_|\__, |  \____|_|\___/ \__,_|\__,_|
  ____ |_|    _          __|___/    __________
 |  _ \  __ _| |_ __ _  |  ___| | _____      __  \ \ \ \ \ \
 | | | |/ _` | __/ _` | | |_  | |/ _ \ \ /\ / /   \ \ \ \ \ \
 | |_| | (_| | || (_| | |  _| | | (_) \ V  V /    / / / / / /
 |____/ \__,_|\__\__,_| |_|   |_|\___/ \_/\_/    /_/_/_/_/_/

Welcome to the Spring Cloud Data Flow shell. For assistance hit TAB or type "help".
dataflow:>
```
NOTE: The Spring Cloud Data Flow Shell is a Spring Boot application that connects to the Data Flow Server's REST API and supports a DSL that simplifies the process of defining a stream or task and managing its lifecycle. Most of these samples use the shell. If you prefer, you can use the Data Flow UI at localhost:9393/dashboard (or wherever the server is hosted) to perform equivalent operations.
NOTE: PCF 1.7.12 or greater is required to run Tasks on Spring Cloud Data Flow. As of this writing, both PCFDev and PWS build upon this version.
Task support needs to be enabled on PCFDev. While logged in as admin, issue the following command:

```
cf enable-feature-flag task_creation
Setting status of task_creation as admin...

OK

Feature task_creation Enabled.
```
NOTE: For this sample, all you need is the mysql service.
NOTE: All the apps deployed to PCFDev start with low memory by default. It is recommended to change it to at least 768MB for the dataflow-server.
Tasks in Spring Cloud Data Flow require an RDBMS to host the task repository (see here for more details), so let's instruct the Spring Cloud Data Flow server to bind the mysql service to each deployed task:

```
$ cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_TASK_SERVICES mysql
$ cf restage dataflow-server
```
NOTE: We only need …
As a recap, here is what you should see as configuration for the Spring Cloud Data Flow server:

```
cf env dataflow-server
....
User-Provided:
SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_DOMAIN: local.pcfdev.io
SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_MEMORY: 512
SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_ORG: pcfdev-org
SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_PASSWORD: pass
SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_SKIP_SSL_VALIDATION: false
SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_SPACE: pcfdev-space
SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_TASK_SERVICES: mysql
SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_URL: https://api.local.pcfdev.io
SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_USERNAME: user

No running env variables have been set

No staging env variables have been set
```
The dataflow-server application is now started and ready for interaction via the dataflow-server.local.pcfdev.io endpoint.

Build and register the batch-job example from Spring Cloud Task samples. For convenience, the final uber-jar artifact is provided with this sample.
```
dataflow:>app register --type task --name simple_batch_job --uri https://github.com/spring-cloud/spring-cloud-dataflow-samples/raw/master/src/main/asciidoc/tasks/simple-batch-job/batch-job-1.3.0.BUILD-SNAPSHOT.jar
```
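Under the covers, the batch-job sample is a regular Spring Boot application whose Spring Batch jobs are made launchable as a short-lived task by Spring Cloud Task. As a rough orientation only (the package and class names below are illustrative, not the sample's actual source), the main class of such an application looks like this:

```java
package com.example.batchjob; // hypothetical package, for illustration only

import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.task.configuration.EnableTask;

// @EnableTask records each run in the task repository (start time, end time,
// exit code), which is what Data Flow later shows in `task execution list`.
// @EnableBatchProcessing bootstraps the Spring Batch infrastructure.
@EnableTask
@EnableBatchProcessing
@SpringBootApplication
public class BatchJobApplication {

    public static void main(String[] args) {
        SpringApplication.run(BatchJobApplication.class, args);
    }
}
```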
Create the task with the simple_batch_job application:

```
dataflow:>task create foo --definition "simple_batch_job"
```
NOTE: Unlike Streams, Task definitions don't require explicit deployment. They can be launched on demand, scheduled, or triggered by streams.
Verify that there are no Task applications running on PCFDev yet - they are listed only after the initial launch/staging attempt on PCF:

```
$ cf apps
Getting apps in org pcfdev-org / space pcfdev-space as user...
OK

name              requested state   instances   memory   disk   urls
dataflow-server   started           1/1         768M     512M   dataflow-server.local.pcfdev.io
```
Let’s launch foo
```
dataflow:>task launch foo
```
Verify the execution of foo by tailing the logs:

```
$ cf logs foo
Retrieving logs for app foo in org pcfdev-org / space pcfdev-space as user...

2016-08-14T18:48:54.22-0700 [APP/TASK/foo/0]OUT Creating container
2016-08-14T18:48:55.47-0700 [APP/TASK/foo/0]OUT
2016-08-14T18:49:06.59-0700 [APP/TASK/foo/0]OUT 2016-08-15 01:49:06.598  INFO 14 --- [main] o.s.b.c.l.support.SimpleJobLauncher : Job: [SimpleJob: [name=job1]] launched with the following parameters: [{}]
...
2016-08-14T18:49:06.78-0700 [APP/TASK/foo/0]OUT 2016-08-15 01:49:06.785  INFO 14 --- [main] o.s.b.c.l.support.SimpleJobLauncher : Job: [SimpleJob: [name=job1]] completed with the following parameters: [{}] and the following status: [COMPLETED]
...
2016-08-14T18:49:07.36-0700 [APP/TASK/foo/0]OUT 2016-08-15 01:49:07.363  INFO 14 --- [main] o.s.b.c.l.support.SimpleJobLauncher : Job: [SimpleJob: [name=job2]] launched with the following parameters: [{}]
...
2016-08-14T18:49:07.53-0700 [APP/TASK/foo/0]OUT 2016-08-15 01:49:07.536  INFO 14 --- [main] o.s.b.c.l.support.SimpleJobLauncher : Job: [SimpleJob: [name=job2]] completed with the following parameters: [{}] and the following status: [COMPLETED]
...
2016-08-14T18:49:07.71-0700 [APP/TASK/foo/0]OUT Exit status 0
2016-08-14T18:49:07.78-0700 [APP/TASK/foo/0]OUT Destroying container
2016-08-14T18:49:08.47-0700 [APP/TASK/foo/0]OUT Successfully destroyed container
```
NOTE: Verify …
NOTE: Unlike LRPs in Cloud Foundry, tasks are short-lived, so the logs aren't always available. They are generated only while the Task application runs; at the end of the task operation, the container that ran the Task application is destroyed to free up resources.
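If you need details of a run to outlive the container, rely on the task repository rather than the logs. Spring Cloud Task also lets you hook into the task lifecycle through its TaskExecutionListener interface; the sketch below is illustrative (the class name and log messages are ours, not part of this sample):

```java
package com.example.batchjob; // hypothetical package, for illustration only

import org.springframework.cloud.task.listener.TaskExecutionListener;
import org.springframework.cloud.task.repository.TaskExecution;
import org.springframework.stereotype.Component;

// These callbacks fire during the task's lifecycle, before the container is destroyed.
@Component
public class LoggingTaskListener implements TaskExecutionListener {

    @Override
    public void onTaskStartup(TaskExecution taskExecution) {
        System.out.println("Task started: " + taskExecution.getTaskName());
    }

    @Override
    public void onTaskEnd(TaskExecution taskExecution) {
        // The same exit code is persisted in the task repository.
        System.out.println("Task ended, exit code: " + taskExecution.getExitCode());
    }

    @Override
    public void onTaskFailed(TaskExecution taskExecution, Throwable throwable) {
        System.err.println("Task failed: " + throwable.getMessage());
    }
}
```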
List Tasks in Cloud Foundry
```
$ cf apps
Getting apps in org pcfdev-org / space pcfdev-space as user...
OK

name              requested state   instances   memory   disk   urls
dataflow-server   started           1/1         768M     512M   dataflow-server.local.pcfdev.io
foo               stopped           0/1         1G       1G
```
Verify Task execution details
```
dataflow:>task execution list
╔═════════╤══╤════════════════════════════╤════════════════════════════╤═════════╗
║Task Name│ID│         Start Time         │          End Time          │Exit Code║
╠═════════╪══╪════════════════════════════╪════════════════════════════╪═════════╣
║foo      │1 │Sun Aug 14 18:49:05 PDT 2016│Sun Aug 14 18:49:07 PDT 2016│0        ║
╚═════════╧══╧════════════════════════════╧════════════════════════════╧═════════╝
```
Verify Job execution details
```
dataflow:>job execution list
╔══╤═══════╤════════╤════════════════════════════╤════════════════════╤═════════════════╗
║ID│Task ID│Job Name│         Start Time         │Step Execution Count│Definition Status║
╠══╪═══════╪════════╪════════════════════════════╪════════════════════╪═════════════════╣
║2 │1      │job2    │Sun Aug 14 18:49:07 PDT 2016│1                   │Destroyed        ║
║1 │1      │job1    │Sun Aug 14 18:49:06 PDT 2016│1                   │Destroyed        ║
╚══╧═══════╧════════╧════════════════════════════╧════════════════════╧═════════════════╝
```
In this demonstration, you will learn how to create a data processing application using Spring Batch, which will then be run within Spring Cloud Data Flow.
The Spring Cloud Data Flow Shell is available for download or you can build it yourself.
NOTE: The Spring Cloud Data Flow Shell and Local server implementation are in the same repository and are both built by running …
To run the Shell, open a new terminal session:

```
$ cd <PATH/TO/SPRING-CLOUD-DATAFLOW-SHELL-JAR>
$ java -jar spring-cloud-dataflow-shell-<VERSION>.jar

  ____                              ____ _                __
 / ___| _ __  _ __(_)_ __   __ _  / ___| | ___  _   _  __| |
 \___ \| '_ \| '__| | '_ \ / _` | | |   | |/ _ \| | | |/ _` |
  ___) | |_) | |  | | | | | (_| | | |___| | (_) | |_| | (_| |
 |____/| .__/|_|  |_|_| |_|\__, |  \____|_|\___/ \__,_|\__,_|
  ____ |_|    _          __|___/    __________
 |  _ \  __ _| |_ __ _  |  ___| | _____      __  \ \ \ \ \ \
 | | | |/ _` | __/ _` | | |_  | |/ _ \ \ /\ / /   \ \ \ \ \ \
 | |_| | (_| | || (_| | |  _| | | (_) \ V  V /    / / / / / /
 |____/ \__,_|\__\__,_| |_|   |_|\___/ \_/\_/    /_/_/_/_/_/

Welcome to the Spring Cloud Data Flow shell. For assistance hit TAB or type "help".
dataflow:>
```
NOTE: The Spring Cloud Data Flow Shell is a Spring Boot application that connects to the Data Flow Server's REST API and supports a DSL that simplifies the process of defining a stream or task and managing its lifecycle. Most of these samples use the shell. If you prefer, you can use the Data Flow UI at localhost:9393/dashboard (or wherever the server is hosted) to perform equivalent operations.
The source for the demo project is located here. The sample is a Spring Boot application that demonstrates how to read data from a flat file, process the records, and store the transformed data in a database using Spring Batch.
The key classes for creating the batch job are:
* BatchConfiguration.java - this is where we define our batch job, the step, and the components that are used to read, process, and write our data. In the sample we use a FlatFileItemReader, which reads a delimited file, a custom PersonItemProcessor to transform the data, and a JdbcBatchItemWriter to write our data to a database.
* Person.java - the domain object representing the data we are reading and processing in our batch job. The sample data contains records made up of a person's first and last name.
* PersonItemProcessor.java - this class is an ItemProcessor implementation which receives records after they have been read and before they are written. This allows us to transform the data between these two steps. In our sample ItemProcessor implementation, we simply transform the first and last name of each Person to uppercase characters (a minimal sketch appears below).
* Application.java - the main entry point into the Spring Boot application, used to launch the batch job.

Resource files are included to set up the database and provide sample data:

* schema-all.sql - the database schema that will be created when the application starts up. In this sample, an in-memory database is created on start-up and destroyed when the application exits.
* data.csv - a sample data file containing the person records used in the demo.

NOTE: This example expects to use the Spring Cloud Data Flow Server's embedded H2 database. If you wish to use another repository, be sure to add the correct dependencies to the pom.xml and update the schema-all.sql.
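For reference, a minimal sketch of the processor described above might look like the following, assuming a Person class with a two-argument constructor and the usual getters (see the sample source for the real implementation):

```java
package com.example.ingest; // hypothetical package, for illustration only

import org.springframework.batch.item.ItemProcessor;

// Receives each Person after it is read and before it is written,
// returning the transformed instance (returning null would filter the record out).
public class PersonItemProcessor implements ItemProcessor<Person, Person> {

    @Override
    public Person process(Person person) {
        // Upper-case both names, as described above.
        return new Person(person.getFirstName().toUpperCase(),
                person.getLastName().toUpperCase());
    }
}
```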
Build the demo JAR
```
$ mvn clean package
```
Register the task
```
dataflow:>app register --name fileIngest --type task --uri file:///path/to/target/ingest-X.X.X.jar
Successfully registered application 'task:fileIngest'
dataflow:>
```
Create the task
```
dataflow:>task create fileIngestTask --definition fileIngest
Created new task 'fileIngestTask'
dataflow:>
```
Launch the task
```
dataflow:>task launch fileIngestTask --arguments "localFilePath=classpath:data.csv"
Launched task 'fileIngestTask'
dataflow:>
```
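The localFilePath=classpath:data.csv argument is passed to the application as a job parameter. A common Spring Batch way to consume such a parameter is a step-scoped reader with late binding; the following sketch shows the idea (the bean names and builder usage are our assumptions, not necessarily the sample's actual configuration):

```java
package com.example.ingest; // hypothetical package, for illustration only

import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.ResourceLoader;

@Configuration
public class ReaderConfiguration {

    // Step-scoped so the job parameter is resolved at step execution time.
    @Bean
    @StepScope
    public FlatFileItemReader<Person> reader(
            @Value("#{jobParameters['localFilePath']}") String localFilePath,
            ResourceLoader resourceLoader) {
        return new FlatFileItemReaderBuilder<Person>()
                .name("personItemReader")
                // ResourceLoader resolves prefixes such as "classpath:data.csv".
                .resource(resourceLoader.getResource(localFilePath))
                .delimited()
                .names(new String[] { "firstName", "lastName" })
                .targetType(Person.class)
                .build();
    }
}
```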
Inspect logs
The log file path for the launched task can be found in the local server output, for example:
```
2017-10-27 14:58:18.112  INFO 19485 --- [nio-9393-exec-6] o.s.c.d.spi.local.LocalTaskLauncher : launching task fileIngestTask-8932f73d-f17a-4bba-b44d-3fd9df042ac0
   Logs will be in /var/folders/6x/tgtx9xbn0x16xq2sx1j2rld80000gn/T/spring-cloud-dataflow-983191515779755562/fileIngestTask-1509130698071/fileIngestTask-8932f73d-f17a-4bba-b44d-3fd9df042ac0
```
Verify Task execution details
```
dataflow:>task execution list
╔══════════════╤══╤════════════════════════════╤════════════════════════════╤═════════╗
║  Task Name   │ID│         Start Time         │          End Time          │Exit Code║
╠══════════════╪══╪════════════════════════════╪════════════════════════════╪═════════╣
║fileIngestTask│1 │Fri Oct 27 14:58:20 EDT 2017│Fri Oct 27 14:58:20 EDT 2017│0        ║
╚══════════════╧══╧════════════════════════════╧════════════════════════════╧═════════╝
```
Verify Job execution details
```
dataflow:>job execution list
╔══╤═══════╤═════════╤════════════════════════════╤════════════════════╤═════════════════╗
║ID│Task ID│Job Name │         Start Time         │Step Execution Count│Definition Status║
╠══╪═══════╪═════════╪════════════════════════════╪════════════════════╪═════════════════╣
║1 │1      │ingestJob│Fri Oct 27 14:58:20 EDT 2017│1                   │Created          ║
╚══╧═══════╧═════════╧════════════════════════════╧════════════════════╧═════════════════╝
```