The server application is run as a standalone application. All applications used for streams and tasks will be deployed on the YARN cluster that is targeted by the server.
These requirements are not specific to the YARN runtime; they are general requirements of Data Flow core.
Download the Spring Cloud Data Flow YARN distribution ZIP file, which includes the Server and the Shell apps:
$ wget http://repo.spring.io/snapshot/org/springframework/cloud/dist/spring-cloud-dataflow-server-yarn-dist/1.0.0.BUILD-SNAPSHOT/spring-cloud-dataflow-server-yarn-dist-1.0.0.BUILD-SNAPSHOT.zip
Unzip the distribution ZIP file and change to the directory containing the deployment files:
$ unzip spring-cloud-dataflow-server-yarn-dist-1.0.0.BUILD-SNAPSHOT.zip
$ cd spring-cloud-dataflow-server-yarn-1.0.0.BUILD-SNAPSHOT
Generic runtime settings can be changed in config/servers.yml. The dedicated section, Chapter 17, Configuring Runtime Settings and Environment, contains detailed information about configuration. The servers.yml file is a central place to share common configuration, as it is added to Boot-based JVM processes via the option -Dspring.config.location=servers.yml.
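The sharing mechanism described above, one configuration file handed to every Boot-based JVM through a system property, can be illustrated with a small launcher helper. This is a minimal sketch, not how Data Flow actually spawns its processes: the function build_boot_command and the jar name are hypothetical; only the -Dspring.config.location=servers.yml option comes from the text above.

```python
import shlex
from typing import List, Optional


def build_boot_command(jar: str, config_location: str = "servers.yml",
                       extra_args: Optional[List[str]] = None) -> List[str]:
    """Hypothetical helper: build a java command line for a Boot-based app
    that points it at the shared configuration file.

    The -D system property must come before -jar; anything after the jar
    name is passed to the application itself.
    """
    cmd = ["java", f"-Dspring.config.location={config_location}", "-jar", jar]
    if extra_args:
        cmd.extend(extra_args)
    return cmd


if __name__ == "__main__":
    # 'some-boot-app.jar' is a placeholder, not a file from the distribution.
    print(shlex.join(build_boot_command("some-boot-app.jar")))
```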
If this is the first deployment, make sure the user that runs the Server app has rights to create and write to the /dataflow directory in HDFS. If there is an existing deployment in HDFS, remove it using:
$ hdfs dfs -rm -R /dataflow
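If you script this cleanup, it helps to check for the directory first so the removal step does not fail on a fresh cluster. A minimal sketch, assuming the hdfs client is on the PATH: it uses hdfs dfs -test -d (exit code 0 when the path is a directory) followed by the same -rm -R command as above. The helper names are hypothetical, and the command runner is injectable so the logic can be exercised without a cluster.

```python
import subprocess
from typing import Callable, List

# A runner executes an argv list and returns its exit code.
Runner = Callable[[List[str]], int]


def run_command(argv: List[str]) -> int:
    return subprocess.run(argv).returncode


def reset_dataflow_dir(path: str = "/dataflow", run: Runner = run_command) -> bool:
    """Remove an existing Data Flow deployment directory from HDFS.

    Returns True if the directory existed and was removed, False if there
    was nothing to remove. Raises RuntimeError if the removal fails.
    """
    # 'hdfs dfs -test -d PATH' exits with 0 when PATH is an existing directory.
    if run(["hdfs", "dfs", "-test", "-d", path]) != 0:
        return False
    # Recursive delete, as in the command shown above.
    if run(["hdfs", "dfs", "-rm", "-R", path]) != 0:
        raise RuntimeError(f"failed to remove {path} from HDFS")
    return True
```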
Start the Spring Cloud Data Flow Server app for YARN:
$ ./bin/dataflow-server-yarn
By default, the application registry is empty. If you would like to register all out-of-the-box stream applications built with the RabbitMQ binder in bulk, you can do so with the following command. For more details, review how to register applications.
dataflow:>app import --uri http://bit.ly/stream-applications-rabbit-maven
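The URI passed to app import points at a properties-style file listing many apps at once. Assuming the <type>.<name>=<uri> line format used by Data Flow bulk-import files, the sketch below shows how such a file maps onto individual registrations. The parser and the sample entries are illustrative only; the Maven coordinates are placeholders, not the contents of the linked file.

```python
from typing import List, Tuple

# Illustrative input in the assumed '<type>.<name>=<uri>' format;
# the coordinates are placeholders.
SAMPLE = """\
# comments and blank lines are ignored
source.time=maven://org.example:time-source-rabbit:1.0.0

sink.log=maven://org.example:log-sink-rabbit:1.0.0
"""


def parse_app_import(text: str) -> List[Tuple[str, str, str]]:
    """Return (type, name, uri) triples from a bulk-import style file."""
    apps = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so URIs containing '=' stay intact.
        key, sep, uri = line.partition("=")
        app_type, dot, name = key.strip().partition(".")
        if not sep or not dot or not name:
            raise ValueError(f"malformed entry: {raw!r}")
        apps.append((app_type, name, uri.strip()))
    return apps


if __name__ == "__main__":
    # Print the roughly equivalent one-by-one registration commands.
    for app_type, name, uri in parse_app_import(SAMPLE):
        print(f"app register --name {name} --type {app_type} --uri {uri}")
```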
The YARN integration also allows you to store registered applications directly in HDFS instead of relying on Maven or any other resolution mechanism. The only thing that changes during registration is using an hdfs address, as shown below:
dataflow:>app register --name ftp --type sink --uri hdfs:/dataflow/artifacts/repo/ftp-sink-kafka-1.0.0.RC1.jar
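The registration above assumes the jar is already in HDFS. One way to put it there is the standard hdfs dfs -put command. The sketch below builds both commands for a given local jar; hdfs_registration is a hypothetical helper, it assumes the /dataflow/artifacts/repo directory (taken from the example above) already exists, and it does not shell-quote paths.

```python
import posixpath
from typing import Tuple

# HDFS directory taken from the registration example above.
HDFS_REPO = "/dataflow/artifacts/repo"


def hdfs_registration(local_jar: str, name: str, app_type: str,
                      repo: str = HDFS_REPO) -> Tuple[str, str]:
    """Return (upload command, Data Flow shell register command) for one jar."""
    target = posixpath.join(repo, posixpath.basename(local_jar))
    # Copy the local jar into the HDFS repository directory (paths not quoted).
    upload = f"hdfs dfs -put {local_jar} {target}"
    # Register it using the hdfs: URI form shown above.
    register = f"app register --name {name} --type {app_type} --uri hdfs:{target}"
    return upload, register
```

Called with ("ftp-sink-kafka-1.0.0.RC1.jar", "ftp", "sink"), the second element reproduces the register command from the example.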
Create a stream:
dataflow:>stream create --name foostream --definition "time|log" --deploy
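The definition time|log describes a linear pipeline in which the time source feeds the log sink. The toy function below only illustrates that reading of the | operator; stream_apps is a hypothetical helper, not Data Flow's DSL parser, and it simply drops any app options.

```python
from typing import List


def stream_apps(definition: str) -> List[str]:
    """Toy reading of a linear stream definition such as "time|log".

    Returns app names in pipeline order: the first is the source, the last
    the sink, anything in between a processor. Only the first word of each
    segment is kept, so options (e.g. a hypothetical --some.option=1) are
    ignored.
    """
    apps = []
    for segment in definition.split("|"):
        words = segment.split()
        if not words:
            raise ValueError(f"empty app in definition: {definition!r}")
        apps.append(words[0])
    return apps
```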
List streams:
dataflow:>stream list
╔═══════════╤═════════════════╤════════╗
║Stream Name│Stream Definition│ Status ║
╠═══════════╪═════════════════╪════════╣
║foostream  │time|log         │deployed║
╚═══════════╧═════════════════╧════════╝
After some time, destroy the stream:
dataflow:>stream destroy --name foostream
The YARN application is pushed and started automatically during the stream deployment process. Once all streams are destroyed, the YARN application exits.
Create and launch a task:
dataflow:>task create --name footask --definition "timestamp"
Created new task 'footask'
dataflow:>task launch --name footask
Launched task 'footask'
Overall app status can be seen in the YARN Resource Manager UI or with the Spring YARN CLI, which gives more information about the running containers within an app:
$ ./bin/dataflow-server-yarn-cli shell
When a stream has been submitted, YARN shows it in the ACCEPTED state before it turns to the RUNNING state:
$ submitted
APPLICATION ID                  USER          NAME                     QUEUE    TYPE      STARTTIME       FINISHTIME  STATE     FINALSTATUS  ORIGINAL TRACKING URL
------------------------------  ------------  -----------------------  -------  --------  --------------  ----------  --------  -----------  ---------------------
application_1461658614481_0001  jvalkealahti  scdstream:app:foostream  default  DATAFLOW  26/04/16 16:27  N/A         ACCEPTED  UNDEFINED

$ submitted
APPLICATION ID                  USER          NAME                     QUEUE    TYPE      STARTTIME       FINISHTIME  STATE    FINALSTATUS  ORIGINAL TRACKING URL
------------------------------  ------------  -----------------------  -------  --------  --------------  ----------  -------  -----------  -------------------------
application_1461658614481_0001  jvalkealahti  scdstream:app:foostream  default  DATAFLOW  26/04/16 16:27  N/A         RUNNING  UNDEFINED    http://192.168.1.96:58580
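To script the wait for the ACCEPTED to RUNNING transition, the STATE column can be pulled out of the submitted listing. A minimal sketch, assuming the whitespace-separated row layout shown above; app_state and YARN_STATES are hypothetical names, and the state set mirrors Hadoop's YarnApplicationState values. Because STARTTIME and FINISHTIME contain spaces and the tracking URL may be absent, the code searches for the first state token instead of counting columns.

```python
from typing import Optional

# YarnApplicationState values as they appear in the STATE column.
YARN_STATES = {"NEW", "NEW_SAVING", "SUBMITTED", "ACCEPTED",
               "RUNNING", "FINISHED", "FAILED", "KILLED"}


def app_state(submitted_output: str, app_id: str) -> Optional[str]:
    """Return the STATE value for app_id from 'submitted' output, or None.

    Takes the first token after the application ID that is a YARN state.
    STATE precedes FINALSTATUS, so FAILED/KILLED in FINALSTATUS are not
    picked up by mistake. Assumes user, name, queue and type values are
    never state names themselves.
    """
    for line in submitted_output.splitlines():
        tokens = line.split()
        if tokens and tokens[0] == app_id:
            for token in tokens[1:]:
                if token in YARN_STATES:
                    return token
    return None
```

Fed the data row of the second listing above, app_state returns "RUNNING" for application_1461658614481_0001.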
More information about the internals of stream apps can be queried with the clustersinfo and clusterinfo commands:
$ clustersinfo -a application_1461658614481_0001
CLUSTER ID
--------------
foostream:log
foostream:time

$ clusterinfo -a application_1461658614481_0001 -c foostream:time
CLUSTER STATE  MEMBER COUNT
-------------  ------------
RUNNING        1
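The clustersinfo listing has one ID per stream app, and in this example each ID has the form <stream>:<app> (foostream:time, foostream:log). The sketch below parses that listing; the function names are hypothetical, and the <stream>:<app> convention is inferred from this single example rather than documented.

```python
from typing import List


def cluster_ids(clustersinfo_output: str) -> List[str]:
    """Extract the IDs listed under the dashed separator of 'clustersinfo' output."""
    ids: List[str] = []
    seen_separator = False
    for line in clustersinfo_output.splitlines():
        stripped = line.strip()
        if not seen_separator:
            # Skip the command echo and the CLUSTER ID header up to the dashes.
            seen_separator = bool(stripped) and set(stripped) == {"-"}
            continue
        if not stripped:
            break  # a blank line ends the listing
        ids.append(stripped)
    return ids


def apps_of_stream(ids: List[str], stream: str) -> List[str]:
    """Return the app part of every ID of the inferred '<stream>:<app>' form."""
    prefix = stream + ":"
    return [cid[len(prefix):] for cid in ids if cid.startswith(prefix)]
```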
After the stream is undeployed, the YARN app should close itself automatically:
$ submitted -v
APPLICATION ID                  USER          NAME                     QUEUE    TYPE      STARTTIME       FINISHTIME      STATE     FINALSTATUS  ORIGINAL TRACKING URL
------------------------------  ------------  -----------------------  -------  --------  --------------  --------------  --------  -----------  ---------------------
application_1461658614481_0001  jvalkealahti  scdstream:app:foostream  default  DATAFLOW  26/04/16 16:27  26/04/16 16:28  FINISHED  SUCCEEDED
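Undeploying a stream, or a task completing, eventually moves the YARN application to a terminal state, FINISHED in the listings here. A polling loop around that can be sketched as follows, assuming you supply a fetch_state callable (hypothetical) that, for example, runs submitted -v and extracts the STATE column. FAILED and KILLED are treated as terminal too, since those are YARN's other final application states. The sleep and clock functions are injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable, Optional

TERMINAL_STATES = {"FINISHED", "FAILED", "KILLED"}


def wait_for_terminal_state(fetch_state: Callable[[], Optional[str]],
                            timeout_s: float = 300.0,
                            interval_s: float = 5.0,
                            sleep: Callable[[float], None] = time.sleep,
                            clock: Callable[[], float] = time.monotonic) -> str:
    """Poll fetch_state() until it reports FINISHED, FAILED or KILLED.

    Raises TimeoutError if no terminal state is seen within timeout_s.
    """
    deadline = clock() + timeout_s
    while True:
        state = fetch_state()
        if state in TERMINAL_STATES:
            return state
        if clock() >= deadline:
            raise TimeoutError(f"last observed state: {state}")
        sleep(interval_s)
```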
A launched task is shown in the RUNNING state while the app is executing its batch jobs:
$ submitted -v
APPLICATION ID                  USER          NAME                     QUEUE    TYPE      STARTTIME       FINISHTIME      STATE     FINALSTATUS  ORIGINAL TRACKING URL
------------------------------  ------------  -----------------------  -------  --------  --------------  --------------  --------  -----------  -------------------------
application_1461658614481_0002  jvalkealahti  scdtask:timestamp        default  DATAFLOW  26/04/16 16:29  N/A             RUNNING   UNDEFINED    http://192.168.1.96:39561
application_1461658614481_0001  jvalkealahti  scdstream:app:foostream  default  DATAFLOW  26/04/16 16:27  26/04/16 16:28  FINISHED  SUCCEEDED

$ submitted -v
APPLICATION ID                  USER          NAME                     QUEUE    TYPE      STARTTIME       FINISHTIME      STATE     FINALSTATUS  ORIGINAL TRACKING URL
------------------------------  ------------  -----------------------  -------  --------  --------------  --------------  --------  -----------  ---------------------
application_1461658614481_0002  jvalkealahti  scdtask:timestamp        default  DATAFLOW  26/04/16 16:29  26/04/16 16:29  FINISHED  SUCCEEDED
application_1461658614481_0001  jvalkealahti  scdstream:app:foostream  default  DATAFLOW  26/04/16 16:27  26/04/16 16:28  FINISHED  SUCCEEDED
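These listings suggest a naming convention for the YARN applications that Data Flow submits: streams appear as scdstream:app:<stream name> and tasks as scdtask:<task app>. Note that the task entry is named after the app (timestamp), not after the task definition name footask. The helper below classifies names under that inferred convention; dataflow_kind is hypothetical and based only on this sample output, not on a documented contract.

```python
from typing import Optional, Tuple

# Prefixes observed in the NAME column of the listings above.
STREAM_PREFIX = "scdstream:app:"
TASK_PREFIX = "scdtask:"


def dataflow_kind(yarn_app_name: str) -> Optional[Tuple[str, str]]:
    """Return ('stream', <stream>) or ('task', <task app>), or None if unknown."""
    if yarn_app_name.startswith(STREAM_PREFIX):
        return ("stream", yarn_app_name[len(STREAM_PREFIX):])
    if yarn_app_name.startswith(TASK_PREFIX):
        return ("task", yarn_app_name[len(TASK_PREFIX):])
    return None
```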