1.1.0.M1
Copyright © 2013-2016 Pivotal Software, Inc.
Table of Contents
This project provides support for orchestrating the deployment of Spring Cloud Stream applications to Cloud Foundry.
Spring Cloud Data Flow is a cloud-native programming and operating model for composable data microservices on a structured platform. With Spring Cloud Data Flow, developers can create and orchestrate data pipelines for common use cases such as data ingest, real-time analytics, and data import/export.
The Spring Cloud Data Flow architecture consists of a server that deploys Streams. A future release will also support deploying Tasks. Streams are defined using a DSL or visually through the browser based designer UI. Streams are based on the Spring Cloud Stream programming model. The sections below describe more information about creating your own custom Streams.
For more details about the core architecture components and the supported features, please review Spring Cloud Data Flow’s core reference guide. There’re several samples available for reference.
Spring Cloud Stream is a framework for building message-driven microservice applications. Spring Cloud Stream builds upon Spring Boot to create standalone, production-grade Spring applications, and uses Spring Integration to provide connectivity to message brokers. It provides opinionated configuration of middleware from several vendors, introducing the concepts of persistent publish-subscribe semantics, consumer groups, and partitions.
For more details about the core framework components and the supported features, please review Spring Cloud Stream’s reference guide.
There’s a rich ecosystem of Spring Cloud Stream Application-Starters that can be used either as standalone data microservice applications or in Spring Cloud Data Flow. For convenience, we have generated RabbitMQ and Apache Kafka variants of these application-starters that are available for use from Maven Repo and Docker Hub as maven artifacts and docker images, respectively.
Do you have a requirement to develop custom applications? No problem. Refer to this guide to create custom stream applications. There’re several samples available for reference.
Spring Cloud Task makes it easy to create short-lived microservices. We provide capabilities that allow short-lived JVM processes to be executed on demand in a production environment.
For more details about the core framework components and the supported features, please review Spring Cloud Task’s reference guide.
There’s a rich ecosystem of Spring Cloud Task Application-Starters that can be used either as standalone data microservice applications or in Spring Cloud Data Flow. For convenience, the generated application-starters are available for use from Maven Repo. There are several samples available for reference.
Unresolved directive in index.adoc - include::https://raw.githubusercontent.com/spring-cloud/spring-cloud-dataflow/$/spring-cloud-dataflow-docs/src/main/asciidoc/architecture.adoc[]
Spring Cloud Data Flow can be used to deploy modules in a Cloud Foundry environment. When doing so, the server application can either run itself on Cloud Foundry, or on another installation (e.g. a simple laptop).
The required configuration amounts to the same in either case, and is merely related to providing credentials to the Cloud Foundry instance so that the server can spawn applications itself. Any Spring Boot compatible configuration mechanism can be used (passing program arguments, editing configuration files before building the application, using Spring Cloud Config, using environment variables, etc.), although some may prove more practicable than others when running on Cloud Foundry.
Note | |
---|---|
By default, the application registry in Spring Cloud Data Flow’s Cloud Foundry server is empty. It is intentionally designed to allow users to have the flexibility of choosing and registering applications, as they find appropriate for the given use-case requirement. Depending on the message-binder of choice, users can register between RabbitMQ or Apache Kafka based maven artifacts. |
Use cf marketplace
to discover which plans are available to you, depending on the details of your Cloud Foundry setup.
For example when using Pivotal Web Services:
cf create-service rediscloud 30mb redis
A redis instance is required for analytics apps, and would typically be bound to such apps when you create an analytics stream using the per-app-binding feature.
Use cf marketplace
to discover which plans are available to you, depending on the details of your Cloud Foundry setup.
For example when using Pivotal Web Services:
cf create-service cloudamqp lemur rabbit
Rabbit is typically used as a messaging middleware between streaming apps and would be bound to each deployed app
thanks to the SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_STREAM_SERVICES
setting (see below).
Use cf marketplace
to discover which plans are available to you, depending on the details of your Cloud Foundry setup.
For example when using Pivotal Web Services:
cf create-service p_mysql 100mb my_mysql
An RDBMS is used to persist Data Flow state, such as stream definitions and deployment ids. It can also be used for tasks to persist execution history.
wget http://repo.spring.io/milestone/org/springframework/cloud/spring-cloud-dataflow-server-cloudfoundry/1.1.0.M1/spring-cloud-dataflow-server-cloudfoundry-1.1.0.M1.jar wget http://repo.spring.io/milestone/org/springframework/cloud/spring-cloud-dataflow-shell/1.1.0.M1/spring-cloud-dataflow-shell-1.1.0.M1.jar
You can either deploy the server application on Cloud Foundry itself or on your local machine. The following two sections explain each way of running the server.
Push the server application on Cloud Foundry, configure it (see below) and start it.
Note | |
---|---|
You must use a unique name for your app; an app with the same name in the same organization will cause your deployment to fail |
cf push dataflow-server -m 1G --no-start -p spring-cloud-dataflow-server-cloudfoundry-1.1.0.M1.jar cf bind-service dataflow-server redis cf bind-service dataflow-server my_mysql
Important | |
---|---|
The recommended minimal memory setting for the server is 1G. Also, to obtain extra runtime information about which properties are available for apps, the server currently downloads those apps (typically Spring Boot uber-jars) in a local Maven repository. As such, you may want to increase the allocated disk size as well. |
Note | |
---|---|
If you are pushing to a space with multiple users, for example on PWS, there may already be a route taken for the
applicaiton name you have chosen. You can use the options |
Now we can configure the app. The following configuration is for Pivotal Web Services. You need to fill in {org}, {space}, {email} and {password} before running these commands.
cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_URL https://api.run.pivotal.io cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_ORG {org} cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_SPACE {space} cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_DOMAIN cfapps.io cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_STREAM_SERVICES rabbit cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_USERNAME {email} cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_PASSWORD {password} cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_SKIP_SSL_VALIDATION false
Warning | |
---|---|
Only set 'Skip SSL Validation' to true if you’re running on a Cloud Foundry instance using self-signed certs (e.g. in development). Do not use for production. |
Note | |
---|---|
If you are deploying in an environment that requires you to sign on using the Pivotal Single Sign-On Service, refer to the section Chapter 7, Authentication and Cloud Foundry for information on how to configure the server. |
Spring Cloud Data Flow server implementations (be it for Cloud Foundry, Mesos, YARN, or Kubernetes) do not have any default remote maven repository configured. This is intentionally designed to provide the flexibility for the users, so they can override and point to a remote repository of their choice. The out-of-the-box applications that are supported by Spring Cloud Data Flow are available in Spring’s repository, so if you want to use them, you must set it as the remote repository as listed below.
cf set-env dataflow-server MAVEN_REMOTE_REPOSITORIES_REPO1_URL https://repo.spring.io/libs-snapshot
where repo1
is an alias name for the remote repository.
You can also set other optional properties that alter the way Spring Cloud Data Flow will deploy stream and task apps:
The default memory and disk sizes for a deployed application can be configured. By default they are 1024 MB memory and 1024 MB disk. To change these, as an example to 512 and 2048 respectively, use
cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_STREAM_MEMORY 512 cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_STREAM_DISK 2048
The default number of instances to deploy is set to 1, but can be overridden using
cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_STREAM_INSTANCES 1
You can set the buildpack that will be used to deploy each application. For example, to use the Java offline buildback, set the following environment variable
cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_STREAM_BUILDPACK java_buildpack_offline
The health check mechanism used by Cloud Foundry to assert if apps are running can be customized. Current supported options
are port
(the default) and none
. Change the default like so:
cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_STREAM_HEALTH_CHECK none
Note | |
---|---|
These settings can be configured separately for stream and task apps. To alter settings for tasks, simply
substitute cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_TASK_MEMORY 512 |
Tip | |
---|---|
All the properties mentioned above are |
If you’d like to use config-server
to manage centralized configurations for all the applications orchestrated by
Spring Cloud Data Flow, you can set it up like the following.
cf set-env dataflow-server SPRING_APPLICATION_JSON '{"spring.cloud.dataflow.applicationProperties.stream.spring.cloud.config.uri": "http://<CONFIG_SERVER_URI>"}'
We are now ready to start the app.
cf start dataflow-server
Alternatively, you can run the Admin application locally on your machine which is described in the next section.
To run the server application locally, targeting your Cloud Foundry installation, you you need to configure the application either by passing in command line arguments (see below) or setting a number of environment variables.
To use environment variables set the following:
export SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_URL=https://api.run.pivotal.io export SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_ORG={org} export SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_SPACE={space} export SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_DOMAIN=cfapps.io export SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_USERNAME={email} export SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_PASSWORD={password} export SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_SKIP_SSL_VALIDATION=false export SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_STREAM_SERVICES=rabbit
You need to fill in {org}, {space}, {email} and {password} before running these commands.
Warning | |
---|---|
Only set 'Skip SSL Validation' to true if you’re running on a Cloud Foundry instance using self-signed certs (e.g. in development). Do not use for production. |
Now we are ready to start the server application:
java -jar spring-cloud-dataflow-server-cloudfoundry-1.1.0.M1.jar [--option1=value1] [--option2=value2] [etc.]
Tip | |
---|---|
Of course, all other parameterization options that were available when running the server on Cloud Foundry are
still available. This is particularly true for configuring defaults for applications. Just
substitute |
Tasks are enabled as an experimental feature in Spring Cloud Data Flow Cloud Foundry server. To enable running tasks, you can set the environment variable
export SPRING_CLOUD_DATAFLOW_FEATURES_EXPERIMENTAL_TASKSENABLED=true
or, as a command line argument when starting the data flow server --spring.cloud.dataflow.features.experimental.tasksEnabled=true
Run the shell and optionally target the Admin application if not running on the same host (will typically be the case if deployed on Cloud Foundry as explained here)
$ java -jar spring-cloud-dataflow-shell-1.1.0.M1.jar
server-unknown:>dataflow config server http://dataflow-server.cfapps.io Successfully targeted http://dataflow-server.cfapps.io dataflow:>
By default, the application registry will be empty. If you would like to register all out-of-the-box stream applications built with the RabbitMQ binder in bulk, you can with the following command. For more details, review how to register applications.
dataflow:>app import --uri http://bit.ly/stream-applications-rabbit-maven
You can now use the shell commands to list available applications (source/processors/sink) and create streams. For example:
dataflow:> stream create --name httptest --definition "http | log" --deploy
Note | |
---|---|
You will need to wait a little while until the apps are actually deployed successfully before posting data. Tail the log file for each application to verify the application has started. |
Now post some data. The URL will be unique to your deployment, the following is just an example
dataflow:> http post --target http://dataflow-nonconcentrative-knar-httptest-http.cfapps.io --data "hello world"
Look to see if hello world
ended up in log files for the log
application.
By default, the Data Flow server is unsecured and runs on an unencrypted HTTP connection. You can secure your REST endpoints,
as well as the Data Flow Dashboard by enabling HTTPS and requiring clients to authenticate. More details about securing the
REST endpoints and configuring to authenticate against an OAUTH backend (i.e: UAA/SSO running on Cloud Foundry), please
review the security section from the core reference guide. The security configurations can be configured in dataflow-server.yml
or passed as environment variables through cf set-env
commands.
To help avoid clashes with routes across spaces in Cloud Foundry, a naming strategy to provide a random prefix to a
deployed application is available and is enabled by default. The default configurations
are overridable and the respective properties can be set via cf set-env
commands.
For instance, if you’d like to disable the randmoization, you can override it through:
cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_STREAM_ENABLE_RANDOM_APP_NAME_PREFIX false
When deploying Spring Cloud Data Flow to Cloud Foundry, you can take advantage of the Spring Cloud Single Sign-On Connector, which provides Cloud Foundry specific auto-configuration support for OAuth 2.0, when used in conjunction with the Pivotal Single Sign-On Service.
Simply set security.basic.enabled
to true
and in Cloud Foundry bind the SSO
service to your Data Flow Server app and SSO will be enabled.
The following pieces of configuration must be provided. These are Spring Boot @ConfigurationProperties
so you can set
them as environment variables or by any other means that Spring Boot supports. Here is a listing in environment
variable format as that is an easy way to get started configuring Boot applications in Cloud Foundry.
# Default values cited after the equal sign. # Example values, typical for Pivotal Web Services, cited as a comment # url of the CF API (used when using cf login -a for example), e.g. https://api.run.pivotal.io # (for setting env var use SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_URL) spring.cloud.deployer.cloudfoundry.url= # name of the organization that owns the space above, e.g. youruser-org # (For Setting Env var use SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_ORG) spring.cloud.deployer.cloudfoundry.org= # name of the space into which modules will be deployed, e.g. development # (for setting env var use SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_SPACE) spring.cloud.deployer.cloudfoundry.space= # the root domain to use when mapping routes, e.g. cfapps.io # (for setting env var use SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_DOMAIN) spring.cloud.deployer.cloudfoundry.domain= # username and password of the user to use to create apps # (for setting env var use SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_USERNAME and SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_PASSWORD) spring.cloud.deployer.cloudfoundry.username= spring.cloud.deployer.cloudfoundry.password= # Whether to allow self-signed certificates during SSL validation # (for setting env var use SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_SKIP_SSL_VALIDATION) spring.cloud.deployer.cloudfoundry.skipSslValidation=false # Comma separated set of service instance names to bind to every stream app deployed. # Amongst other things, this should include a service that will be used # for Spring Cloud Stream binding, e.g. rabbit # (for setting env var use SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_STREAM_SERVICES) spring.cloud.deployer.cloudfoundry.stream.services= # Health check type to use for stream apps. Accepts 'none' and 'port' spring.cloud.deployer.cloudfoundry.stream.health-check= # Comma separated set of service instance names to bind to every task app deployed. # Amongst other things, this should include an RDBMS service that will be used # for Spring Cloud Task execution reporting, e.g. my_mysql # (for setting env var use SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_TASK_SERVICES) spring.cloud.deployer.cloudfoundry.task.services= # Timeout to use, in seconds, when doing task related deployments. # (for setting env var use SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_TASK_TASK_TIMEOUT) spring.cloud.deployer.cloudfoundry.task.taskTimeout=360
Note that you can set the following properties spring.cloud.deployer.cloudfoundry.services
,
spring.cloud.deployer.cloudfoundry.memory
, and spring.cloud.deployer.cloudfoundry.disk
as part of an individual
deployment request prefixed by the app.<name of application>
. For example
>stream create --name ticktock --definition "time | log" >stream deploy --name ticktock --properties "app.time.spring.cloud.deployer.cloudfoundry.memory=2048"
will deploy the time source with 2048MB of memory, while the log sink will use the default 1024MB.
If using Spring Cloud Config Server as a Cloud Foundry service, the easiest way to externalize the above configuration
and consume it from the Data Flow server is to use the spring-cloud-services-starter-config-client
dependency.
As this support is specific to Pivotal Cloud Foundry, it is not included by default. But building a Data Flow server
that embeds it is as simple as adding a dependency similar to
<dependency> <groupId>io.pivotal.spring.cloud</groupId> <artifactId>spring-cloud-services-starter-config-client</artifactId> <version>1.1.0.RELEASE</version> <scope>runtime</scope> </dependency>
to your own version of the Data Flow server, and building it yourself. Then follow the documentation for Config Server for Pivotal Cloud Foundry. For more details, please refer to Spring Cloud Services client-dependencies documentation.
When deploying streams in Cloud Foundry, you can take advantage of application specific service bindings, so not all services are globally configured for all the apps orchestrated by Spring Cloud Data Flow.
For instance, if you’d like to provide mysql
service binding only for the jdbc
application in the following stream
definition, you can pass the service binding as a deployment property.
dataflow:>stream create --name httptojdbc --definition "http | jdbc" dataflow:>stream deploy --name httptojdbc --properties "app.jdbc.spring.cloud.deployer.cloudfoundry.services=mysqlService"
Where, mysqlService
is the name of the service specifically only bound to jdbc
application and the http
application wouldn’t get the binding by this method. If you have more than one service to bind, they can be passed as comma separated items (eg: app.jdbc.spring.cloud.deployer.cloudfoundry.services=mysqlService,someService).
In addition to marketplace services, Cloud Foundry supports User Provided Services. Throughout this reference manual, regular services have been mentioned, but there is nothing precluding the use of UPSs as well, whether for use as the messaging middleware (e.g. if you’d like to use an external Apache Kafka installation) or for ad hoc usage by some of the stream apps (e.g. an Oracle Database).
Similar to Cloud Foundry’s blue-green deployments, you can perform rolling upgrades on the applications orchestrated by Spring Cloud Data Flow.
Let’s start with the following simple stream definition.
dataflow:>stream create --name foo --definition "time | log" --deploy
List Apps.
→ cf apps Getting apps in org test-org / space development as test@pivotal.io... OK name requested state instances memory disk urls foo-log started 1/1 1G 1G foo-log.cfapps.io foo-time started 1/1 1G 1G foo-time.cfapps.io
Let’s assume you’ve to make an enhancement to update the "logger" to append extra text in every log statement.
Log Sink
application starter with "Rabbit binder starter" from start-scs.cfapps.io/LogSinkConfiguration.class
loggingHandler.setLoggerName("TEST [" + this.properties.getName() + "]");
@SpringBootApplication @Import(LogSinkConfiguration.class) public class DemoApplication { @Autowired private LogSinkProperties properties; public static void main(String[] args) { SpringApplication.run(DemoApplication.class, args); } @Bean @ServiceActivator(inputChannel = Sink.INPUT) public LoggingHandler logSinkHandler() { LoggingHandler loggingHandler = new LoggingHandler(this.properties.getLevel().name()); loggingHandler.setExpression(this.properties.getExpression()); loggingHandler.setLoggerName("TEST [" + this.properties.getName() + "]"); return loggingHandler; } }
Let’s deploy the locally built application to Cloud Foundry
→ cf push foo-log-v2 -p demo-0.0.1-SNAPSHOT.jar -n foo-log-v2 --no-start
List Apps.
→ cf apps Getting apps in org test-org / space development as test@pivotal.io... OK name requested state instances memory disk urls foo-log started 1/1 1G 1G foo-log.cfapps.io foo-time started 1/1 1G 1G foo-time.cfapps.io foo-log-v2 stopped 1/1 1G 1G foo-log-v2.cfapps.io
The stream applications do not communicate via (Go)Router, so they aren’t generating HTTP traffic. Instead, they
communicate via the underlying messaging middleware such as Kafka or RabbitMQ. In order to rolling upgrade to route the
payload from old to the new version of the application, you’d have to replicate the SPRING_APPLICATION_JSON
environment
variable from the old application that includes spring.cloud.stream.bindings.input.destination
and spring.cloud.stream.bindings.input.group
credentials.
Note | |
---|---|
You can find the |
cf set-env foo-log-v2 SPRING_APPLICATION_JSON '{"spring.cloud.stream.bindings.input.destination":"foo.time","spring.cloud.stream.bindings.input.group":"foo"}'
Let’s start foo-log-v2
application.
cf start foo-log-v2
As soon as the application bootstraps, you’d now notice the payload being load balanced between two log application instances running on Cloud Foundry. Since they both share the same "destination" and "consumer group", they are now acting as competing consumers.
Old App Logs:
2016-08-08T17:11:08.94-0700 [APP/0] OUT 2016-08-09 00:11:08.942 INFO 19 --- [ foo.time.foo-1] log.sink : 08/09/16 00:11:08 2016-08-08T17:11:10.95-0700 [APP/0] OUT 2016-08-09 00:11:10.954 INFO 19 --- [ foo.time.foo-1] log.sink : 08/09/16 00:11:10 2016-08-08T17:11:12.94-0700 [APP/0] OUT 2016-08-09 00:11:12.944 INFO 19 --- [ foo.time.foo-1] log.sink : 08/09/16 00:11:12
New App Logs:
2016-08-08T17:11:07.94-0700 [APP/0] OUT 2016-08-09 00:11:07.945 INFO 26 --- [ foo.time.foo-1] TEST [log.sink : 08/09/16 00:11:07] 2016-08-08T17:11:09.92-0700 [APP/0] OUT 2016-08-09 00:11:09.925 INFO 26 --- [ foo.time.foo-1] TEST [log.sink : 08/09/16 00:11:09] 2016-08-08T17:11:11.94-0700 [APP/0] OUT 2016-08-09 00:11:11.941 INFO 26 --- [ foo.time.foo-1] TEST [log.sink : 08/09/16 00:11:11]
Deleting the old version foo-log
from the CF CLI would make all the payload consumed by the foo-log-v2
application. Now,
you’ve successfully upgraded an application in the streaming pipeline without bringing it down in entirety to do
an adjustment in it.
List Apps.
→ cf apps Getting apps in org test-org / space development as test@pivotal.io... OK name requested state instances memory disk urls foo-time started 1/1 1G 1G foo-time.cfapps.io foo-log-v2 started 1/1 1G 1G foo-log-v2.cfapps.io
Note | |
---|---|
A comprehensive canary analysis along with rolling upgrades will be supported via Spinnaker in future releases. |
Unresolved directive in index.adoc - include::https://raw.githubusercontent.com/spring-cloud/spring-cloud-dataflow/$/spring-cloud-dataflow-docs/src/main/asciidoc/streams.adoc[] Unresolved directive in index.adoc - include::https://raw.githubusercontent.com/spring-cloud/spring-cloud-dataflow/$/spring-cloud-dataflow-docs/src/main/asciidoc/tasks.adoc[]
Spring Cloud Data Flow’s task functionality exposes new experimental capabilities within the Pivotal Cloud Foundry runtime. It’s important to note that the current underlying PCF capabilities are considered experimental, and therefore this functionality within Spring Cloud Data Flow is also considered experimental.
The task functionality depends on the latest versions of PCF for runtime support. This release requires PCF version 1.7.12 or higher to run tasks.
Because the task functionality is currently considered experimental within PCF, the tooling around it within the CF ecosystem is not complete. In order to interact with tasks via the PCF command line interface (CLI), you need to install a plugin: v3-cli-plugin. It’s important to note that this plugin is only compatible with the PCF CLI version 6.17.0+5d0be0a-2016-04-15. You can read more about the functionality the plugin provides in its README.
It’s also important to note that there is no Apps Manager support for tasks as of this release. When running applications as tasks through Spring Cloud Data Flow, the only way to view them within the context of CF is via the plugin mentioned above.
Running a task application within Spring Cloud Data Flow goes through a slightly different lifecycle than running a stream application. Both types of applications need to be registered with the appropriate artifact coordinates. Both need a definition created via the SCDF DSL. However, that’s where the similarities end.
With stream based applications, you "deploy" them with the intent that they run until they are undeployed. A stream definition is only deployed once (it can be scaled, but only deployed as one instance of the stream as a whole). However, tasks are launched. A single task definition can be launched many times. With each launch, they will start, execute, and shut down with PCF cleaning up the resources once the shutdown has occurred. The following sections outline the process of creating, launching, destroying, and viewing tasks.
Similar to streams, creating a task application is done via the SCDF DSL or through the dashboard. To create a task definition in SCDF, you’ve to either develop a task application or use one of the out-of-the-box task app-starters. The maven coordinates of the task application should be registered in SCDF. For more details on how to register task applications, review register task applications section from the core docs.
Let’s see an example that uses the out-of-the-box timestamp
task application.
dataflow:>task create --name foo --definition "timestamp" Created new task 'foo'
Note | |
---|---|
Tasks in SCDF do not require explicit deployment. They are required to be launched and with that there are different ways to launch them - refer to this section for more details. |
Unlike streams, tasks in SCDF requires an explicit launch trigger or it can be manually kicked-off.
dataflow:>task launch foo Launched task 'foo'
As previously mentioned, the v3-cli-plugin is the way to interact with tasks on PCF,
including viewing the logs. In order to view the logs as a task is executing use the
following command where foo
is the name of the task you are executing:
cf v3-logs foo Tailing logs for app foo... .... .... .... .... 2016-08-19T09:44:49.11-0700 [APP/TASK/bar1/0]OUT 2016-08-19 16:44:49.111 INFO 7 --- [ main] o.s.c.t.a.t.TimestampTaskApplication : Started TimestampTaskApplication in 2.734 seconds (JVM running for 3.288) 2016-08-19T09:44:49.13-0700 [APP/TASK/bar1/0]OUT Exit status 0 2016-08-19T09:44:49.19-0700 [APP/TASK/bar1/0]OUT Destroying container 2016-08-19T09:44:50.41-0700 [APP/TASK/bar1/0]OUT Successfully destroyed container
Note | |
---|---|
Logs are only viewable through the v3-cli-plugin as the app is running. Historic logs are not available. |
Listing tasks is as simple as:
dataflow:>task list ╔══════════════════════╤═════════════════════════╤═══════════╗ ║ Task Name │ Task Definition │Task Status║ ╠══════════════════════╪═════════════════════════╪═══════════╣ ║foo │timestamp │complete ║ ╚══════════════════════╧═════════════════════════╧═══════════╝
If you’d like to view the execution details of the launched task, you could do the following.
dataflow:>task execution list ╔════════════════════════╤══╤═════════════════════════╤═════════════════════════╤════════╗ ║ Task Name │ID│ Start Time │ End Time │ Exit ║ ║ │ │ │ │ Code ║ ╠════════════════════════╪══╪═════════════════════════╪═════════════════════════╪════════╣ ║foo:cloud: │1 │ Fri Aug 19 09:44:49 PDT │Fri Aug 19 09:44:49 PDT │0 ║ ╚════════════════════════╧══╧═════════════════════════╧═════════════════════════╧════════╝
Destroying the task application from SCDF removes the task definition from task repository.
dataflow:>task destroy foo Destroyed task 'foo' dataflow:>task list ╔═════════╤═══════════════╤═══════════╗ ║Task Name│Task Definition│Task Status║ ╚═════════╧═══════════════╧═══════════╝
Unresolved directive in index.adoc - include::https://raw.githubusercontent.com/spring-cloud/spring-cloud-dataflow/$/spring-cloud-dataflow-docs/src/main/asciidoc/dashboard.adoc[] Unresolved directive in index.adoc - include::https://raw.githubusercontent.com/spring-cloud/spring-cloud-dataflow/$/spring-cloud-dataflow-docs/src/main/asciidoc/howto.adoc[]