Spring Cloud Data Flow for Cloud Foundry
1. Spring Cloud Data Flow
Spring Cloud Data Flow is a cloud-native programming and operating model for composable data microservices on a structured platform. With Spring Cloud Data Flow, developers can create and orchestrate data pipelines for common use cases such as data ingest, real-time analytics, and data import/export.
The Spring Cloud Data Flow architecture consists of a server that deploys Streams and Tasks. Streams are defined using a DSL or visually through the browser-based designer UI. Streams are based on the Spring Cloud Stream programming model. The sections below provide more information about creating your own custom Streams.
For more details about the core architecture components and the supported features, please review Spring Cloud Data Flow’s core reference guide. There are several samples available for reference.
2. Spring Cloud Stream
Spring Cloud Stream is a framework for building message-driven microservice applications. Spring Cloud Stream builds upon Spring Boot to create standalone, production-grade Spring applications, and uses Spring Integration to provide connectivity to message brokers. It provides opinionated configuration of middleware from several vendors, introducing the concepts of persistent publish-subscribe semantics, consumer groups, and partitions.
For more details about the core framework components and the supported features, please review Spring Cloud Stream’s reference guide.
There’s a rich ecosystem of Spring Cloud Stream Application-Starters that can be used either as standalone data microservice applications or in Spring Cloud Data Flow. For convenience, we have generated RabbitMQ and Apache Kafka variants of these application-starters that are available for use from Maven Repo and Docker Hub as maven artifacts and docker images, respectively.
Do you have a requirement to develop custom applications? No problem. Refer to this guide to create custom stream applications. There are several samples available for reference.
3. Spring Cloud Task
Spring Cloud Task makes it easy to create short-lived microservices. We provide capabilities that allow short-lived JVM processes to be executed on demand in a production environment.
For more details about the core framework components and the supported features, please review Spring Cloud Task’s reference guide.
There’s a rich ecosystem of Spring Cloud Task Application-Starters that can be used either as standalone data microservice applications or in Spring Cloud Data Flow. For convenience, the generated application-starters are available for use from Maven Repo. There are several samples available for reference.
Getting started
4. Deploying on Cloud Foundry
Spring Cloud Data Flow can be used to deploy modules in a Cloud Foundry environment. When doing so, the server application can either run itself on Cloud Foundry, or on another installation (e.g. a simple laptop).
The required configuration amounts to the same in either case, and is merely related to providing credentials to the Cloud Foundry instance so that the server can spawn applications itself. Any Spring Boot compatible configuration mechanism can be used (passing program arguments, editing configuration files before building the application, using Spring Cloud Config, using environment variables, etc.), although some may prove more practicable than others when running on Cloud Foundry.
By default, the application registry in Spring Cloud Data Flow’s Cloud Foundry server is empty. It is intentionally designed to give users the flexibility of choosing and registering applications as they find appropriate for the given use-case requirement. Depending on the message binder of choice, users can register RabbitMQ- or Apache Kafka-based maven artifacts.
4.1. Provision a Redis service instance on Cloud Foundry
Use cf marketplace to discover which plans are available to you, depending on the details of your Cloud Foundry setup.
For example when using Pivotal Web Services:
cf create-service rediscloud 30mb redis
A redis instance is required for analytics apps, and would typically be bound to such apps when you create an analytics stream using the per-app-binding feature.
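As a sketch of that per-app binding (the stream name myanalytics and the counter app below are placeholders; the deployment-property syntax is described in the Application Level Service Bindings section later in this guide):
dataflow:>stream deploy --name myanalytics --properties "app.counter.spring.cloud.deployer.cloudfoundry.services=redis"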
4.2. Provision a Rabbit service instance on Cloud Foundry
Use cf marketplace to discover which plans are available to you, depending on the details of your Cloud Foundry setup.
For example when using Pivotal Web Services:
cf create-service cloudamqp lemur rabbit
Rabbit is typically used as messaging middleware between streaming apps and would be bound to each deployed app thanks to the SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_STREAM_SERVICES setting (see below).
4.3. Provision a MySQL service instance on Cloud Foundry
Use cf marketplace to discover which plans are available to you, depending on the details of your Cloud Foundry setup.
For example when using Pivotal Web Services:
cf create-service cleardb spark my_mysql
An RDBMS is used to persist Data Flow state, such as stream definitions and deployment ids. It can also be used for tasks to persist execution history.
4.4. Download the Spring Cloud Data Flow Server and Shell apps
wget http://repo.spring.io/release/org/springframework/cloud/spring-cloud-dataflow-server-cloudfoundry/1.1.0.RELEASE/spring-cloud-dataflow-server-cloudfoundry-1.1.0.RELEASE.jar
wget http://repo.spring.io/release/org/springframework/cloud/spring-cloud-dataflow-shell/1.1.2.RELEASE/spring-cloud-dataflow-shell-1.1.2.RELEASE.jar
4.5. Running the Server
You can either deploy the server application on Cloud Foundry itself or on your local machine. The following two sections explain each way of running the server.
4.5.1. Deploying and Running the Server app on Cloud Foundry
Push the server application on Cloud Foundry, configure it (see below) and start it.
You must use a unique name for your app; an app with the same name in the same organization will cause your deployment to fail.
cf push dataflow-server -m 2G -k 2G --no-start -p spring-cloud-dataflow-server-cloudfoundry-1.1.0.RELEASE.jar
cf bind-service dataflow-server redis
cf bind-service dataflow-server my_mysql
The recommended minimal memory setting for the server is 2G. Also, to push apps to PCF and obtain application property metadata, the server downloads applications to a Maven repository hosted on the local disk. While you can specify up to 2G as a typical maximum value for disk space on a PCF installation, this can be increased to 10G. Read the maximum disk quota section for information on how to configure this PCF property. Also, the Data Flow server itself implements a Least Recently Used algorithm to free disk space when it falls below a low water mark value.
If you are pushing to a space with multiple users, for example on PWS, there may already be a route taken for the application name you have chosen. You can use the option --random-route to avoid this when pushing the app.
Now we can configure the app. The following configuration is for Pivotal Web Services. You need to fill in {org}, {space}, {email} and {password} before running these commands.
cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_URL https://api.run.pivotal.io
cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_ORG {org}
cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_SPACE {space}
cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_DOMAIN cfapps.io
cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_STREAM_SERVICES rabbit
cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_TASK_SERVICES my_mysql
cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_USERNAME {email}
cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_PASSWORD {password}
cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_SKIP_SSL_VALIDATION false
Only set 'Skip SSL Validation' to true if you’re running on a Cloud Foundry instance using self-signed certs (e.g. in development). Do not use for production.
If you are deploying in an environment that requires you to sign on using the Pivotal Single Sign-On Service, refer to the section Authentication and Cloud Foundry for information on how to configure the server.
Spring Cloud Data Flow server implementations (be it for Cloud Foundry, Mesos, YARN, or Kubernetes) do not have any default remote maven repository configured. This is intentional, giving users the flexibility to override and point to a remote repository of their choice. The out-of-the-box applications supported by Spring Cloud Data Flow are available in Spring’s repository, so if you want to use them, you must set it as the remote repository, as shown below.
cf set-env dataflow-server MAVEN_REMOTE_REPOSITORIES_REPO1_URL https://repo.spring.io/libs-snapshot
where repo1 is the alias name for the remote repository.
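If you need an additional remote repository, for example for release artifacts, you can register it under another alias following the same naming convention (the alias repo2 below is just an example):
cf set-env dataflow-server MAVEN_REMOTE_REPOSITORIES_REPO2_URL https://repo.spring.io/release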
Configuring Defaults for Deployed Apps
You can also set other optional properties that alter the way Spring Cloud Data Flow will deploy stream and task apps:
- The default memory and disk sizes for a deployed application can be configured. By default they are 1024 MB memory and 1024 MB disk. To change these, for example to 512 MB and 2048 MB respectively, use:
cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_STREAM_MEMORY 512
cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_STREAM_DISK 2048
- The default number of instances to deploy is set to 1, but can be overridden using:
cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_STREAM_INSTANCES 1
- You can set the buildpack that will be used to deploy each application. For example, to use the Java offline buildpack, set the following environment variable:
cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_STREAM_BUILDPACK java_buildpack_offline
- The health check mechanism used by Cloud Foundry to assert whether apps are running can be customized. Currently supported options are port (the default) and none. Change the default like so:
cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_STREAM_HEALTH_CHECK none
These settings can be configured separately for stream and task apps. To alter the settings for tasks, simply substitute STREAM with TASK in the property name (see the example after this list).
All the properties mentioned above are @ConfigurationProperties of the Cloud Foundry deployer. See CloudFoundryDeploymentProperties.java for more information.
- If you’d like to use config-server to manage centralized configuration for all the applications orchestrated by Spring Cloud Data Flow, you can set it up like the following:
cf set-env dataflow-server SPRING_APPLICATION_JSON '{"spring.cloud.dataflow.applicationProperties.stream.spring.cloud.config.uri": "http://<CONFIG_SERVER_URI>"}'
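For example, to give task apps the same non-default memory and disk settings as in the stream example above, the equivalent task-level variables would be (a sketch; adjust the values to your needs):
cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_TASK_MEMORY 512
cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_TASK_DISK 2048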
We are now ready to start the app.
cf start dataflow-server
Alternatively, you can run the server application locally on your machine, as described in the next section.
4.5.2. Running the Server app locally
To run the server application locally, targeting your Cloud Foundry installation, you need to configure the application either by passing in command line arguments (see below) or by setting a number of environment variables.
To use environment variables set the following:
export SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_URL=https://api.run.pivotal.io
export SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_ORG={org}
export SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_SPACE={space}
export SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_DOMAIN=cfapps.io
export SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_USERNAME={email}
export SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_PASSWORD={password}
export SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_SKIP_SSL_VALIDATION=false
export SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_STREAM_SERVICES=rabbit
# The following is for letting task apps write to their database.
# Note, however, that when the *server* is running locally, it cannot access that database,
# so task-related commands that show executions will not work.
export SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_TASK_SERVICES=my_mysql
You need to fill in {org}, {space}, {email} and {password} before running these commands.
Only set 'Skip SSL Validation' to true if you’re running on a Cloud Foundry instance using self-signed certs (e.g. in development). Do not use for production.
Now we are ready to start the server application:
java -jar spring-cloud-dataflow-server-cloudfoundry-1.1.0.RELEASE.jar [--option1=value1] [--option2=value2] [etc.]
Of course, all other parameterization options that were available when running the server on Cloud Foundry are still available. This is particularly true for configuring defaults for applications. Just substitute the cf set-env syntax with export.
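For example, a minimal sketch of overriding the default memory for deployed stream apps when running the server locally:
export SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_STREAM_MEMORY=512
java -jar spring-cloud-dataflow-server-cloudfoundry-1.1.0.RELEASE.jar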
The current underlying PCF task capabilities are considered experimental for PCF versions less than 1.9. See Feature Togglers for how to disable task support in Data Flow.
4.6. Running Spring Cloud Data Flow Shell locally
Run the shell and optionally target the Data Flow server if it is not running on the same host (this will typically be the case if the server is deployed on Cloud Foundry, as explained here):
$ java -jar spring-cloud-dataflow-shell-1.1.2.RELEASE.jar
server-unknown:>dataflow config server http://dataflow-server.cfapps.io
Successfully targeted http://dataflow-server.cfapps.io
dataflow:>
By default, the application registry will be empty. If you would like to register all out-of-the-box stream applications built with the RabbitMQ binder in bulk, you can do so with the following command. For more details, review how to register applications.
dataflow:>app import --uri http://bit.ly/Avogadro-SR1-stream-applications-rabbit-maven
You can now use the shell commands to list available applications (source/processors/sink) and create streams. For example:
dataflow:> stream create --name httptest --definition "http --security.basic.enabled=false | log" --deploy
You will need to wait a little while until the apps are actually deployed successfully before posting data. Tail the log file for each application to verify the application has started.
Now post some data. The URL will be unique to your deployment; the following is just an example:
dataflow:> http post --target http://dataflow-AxwwAhK-httptest-http.cfapps.io --data "hello world"
Look to see if hello world ended up in the log files for the log application.
To run a simple task application, you can register all the out-of-the-box task applications with the following command.
dataflow:>app import --uri http://bit.ly/Addison-GA-task-applications-maven
Now create a simple timestamp task.
dataflow:>task create mytask --definition "timestamp --format='yyyy'"
Tail the logs, e.g. cf logs mytask, and then launch the task in the UI or in the Data Flow Shell:
dataflow:>task launch mytask
You will see the year 2016 printed in the logs. The execution status of the task is stored in the database, and you can retrieve information about the task execution using the shell commands task execution list and task execution status --id <ID_OF_TASK>, or through the Data Flow UI.
5. Security
By default, the Data Flow server is unsecured and runs on an unencrypted HTTP connection. You can secure your REST endpoints, as well as the Data Flow Dashboard, by enabling HTTPS and requiring clients to authenticate. For more details about securing the REST endpoints and configuring authentication against an OAuth backend (i.e. UAA/SSO running on Cloud Foundry), please review the security section of the core reference guide. The security configurations can be configured in dataflow-server.yml or passed as environment variables through cf set-env commands.
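For example, a single property such as security.basic.enabled could be supplied through SPRING_APPLICATION_JSON, followed by a restart of the server (a minimal sketch; the complete set of security properties is described in the core reference guide):
cf set-env dataflow-server SPRING_APPLICATION_JSON '{"security.basic.enabled": true}'
cf restart dataflow-server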
6. Application Names and Prefixes
To help avoid clashes with routes across spaces in Cloud Foundry, a naming strategy that provides a random prefix to a deployed application is available and is enabled by default. The default configurations are overridable and the respective properties can be set via cf set-env commands.
For instance, if you’d like to disable the randomization, you can override it through:
cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_STREAM_ENABLE_RANDOM_APP_NAME_PREFIX false
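The deployer applies the same naming strategy to task apps; following the STREAM/TASK property convention used elsewhere in this guide, the task-level override would look like the following (an assumption; verify the property against CloudFoundryDeploymentProperties.java):
cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_TASK_ENABLE_RANDOM_APP_NAME_PREFIX false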
6.1. Using Custom Routes
As an alternative to a random name, or to get even more control over the hostname used by the deployed apps, one can use custom deployment properties, as follows:
dataflow:>stream create foo --definition "http | log"
dataflow:>stream deploy foo --properties "app.http.spring.cloud.deployer.cloudfoundry.domain=mydomain.com, app.http.spring.cloud.deployer.cloudfoundry.host=myhost, app.http.spring.cloud.deployer.cloudfoundry.route-path=my-path"
This would result in the http app being bound to the URL myhost.mydomain.com/my-path. Note that this is an example showing all customization options available. One can of course only leverage one or two out of the three.
7. Authentication and Cloud Foundry
When deploying Spring Cloud Data Flow to Cloud Foundry, you can take advantage of the Spring Cloud Single Sign-On Connector, which provides Cloud Foundry specific auto-configuration support for OAuth 2.0, when used in conjunction with the Pivotal Single Sign-On Service.
Simply set security.basic.enabled to true and, in Cloud Foundry, bind the SSO service to your Data Flow Server app, and SSO will be enabled.
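A sketch of the corresponding commands, where my-sso-instance is a placeholder for the name of your Single Sign-On service instance:
cf set-env dataflow-server SECURITY_BASIC_ENABLED true
cf bind-service dataflow-server my-sso-instance
cf restage dataflow-server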
8. Configuration Reference
The following pieces of configuration must be provided. These are Spring Boot @ConfigurationProperties
so you can set
them as environment variables or by any other means that Spring Boot supports. Here is a listing in environment
variable format as that is an easy way to get started configuring Boot applications in Cloud Foundry.
# Default values cited after the equal sign.
# Example values, typical for Pivotal Web Services, cited as a comment
# url of the CF API (used when using cf login -a for example), e.g. https://api.run.pivotal.io
# (for setting env var use SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_URL)
spring.cloud.deployer.cloudfoundry.url=
# name of the organization that owns the space above, e.g. youruser-org
# (for setting env var use SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_ORG)
spring.cloud.deployer.cloudfoundry.org=
# name of the space into which modules will be deployed, e.g. development
# (for setting env var use SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_SPACE)
spring.cloud.deployer.cloudfoundry.space=
# the root domain to use when mapping routes, e.g. cfapps.io
# (for setting env var use SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_DOMAIN)
spring.cloud.deployer.cloudfoundry.domain=
# username and password of the user to use to create apps
# (for setting env var use SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_USERNAME and SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_PASSWORD)
spring.cloud.deployer.cloudfoundry.username=
spring.cloud.deployer.cloudfoundry.password=
# Whether to allow self-signed certificates during SSL validation
# (for setting env var use SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_SKIP_SSL_VALIDATION)
spring.cloud.deployer.cloudfoundry.skipSslValidation=false
# Comma separated set of service instance names to bind to every stream app deployed.
# Amongst other things, this should include a service that will be used
# for Spring Cloud Stream binding, e.g. rabbit
# (for setting env var use SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_STREAM_SERVICES)
spring.cloud.deployer.cloudfoundry.stream.services=
# Health check type to use for stream apps. Accepts 'none' and 'port'
spring.cloud.deployer.cloudfoundry.stream.health-check=
# Comma separated set of service instance names to bind to every task app deployed.
# Amongst other things, this should include an RDBMS service that will be used
# for Spring Cloud Task execution reporting, e.g. my_mysql
# (for setting env var use SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_TASK_SERVICES)
spring.cloud.deployer.cloudfoundry.task.services=
# Timeout to use, in seconds, when doing blocking API calls to Cloud Foundry.
# (for setting env var use SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_TASK_API_TIMEOUT
# and SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_STREAM_API_TIMEOUT)
spring.cloud.deployer.cloudfoundry.stream.apiTimeout=360
spring.cloud.deployer.cloudfoundry.task.apiTimeout=360
Note that you can set the properties spring.cloud.deployer.cloudfoundry.services, spring.cloud.deployer.cloudfoundry.buildpack or the Spring Cloud Deployer standard spring.cloud.deployer.memory and spring.cloud.deployer.disk as part of an individual deployment request, prefixed by app.<name of application>. For example:
>stream create --name ticktock --definition "time | log"
>stream deploy --name ticktock --properties "app.time.spring.cloud.deployer.memory=2g"
will deploy the time source with 2048MB of memory, while the log sink will use the default 1024MB.
8.1. Using Spring Cloud Config Server
If using Spring Cloud Config Server as a Cloud Foundry service, the easiest way to externalize the above configuration and consume it from the Data Flow server is to use the spring-cloud-services-starter-config-client dependency, which is included in the standard distribution of the Spring Cloud Data Flow server for Cloud Foundry.
Follow the documentation for Config Server for Pivotal Cloud Foundry. For more details, please refer to Spring Cloud Services client-dependencies documentation.
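As a sketch, assuming the Spring Cloud Services Config Server appears in your marketplace as p-config-server with a standard plan (service and plan names vary per installation), provisioning and binding it to the Data Flow server could look like:
cf create-service p-config-server standard my-config-server
cf bind-service dataflow-server my-config-server
cf restage dataflow-server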
9. Application Level Service Bindings
When deploying streams in Cloud Foundry, you can take advantage of application-specific service bindings, so that not every service has to be configured globally for all the apps orchestrated by Spring Cloud Data Flow.
For instance, if you’d like to provide a mysql service binding only for the jdbc application in the following stream definition, you can pass the service binding as a deployment property.
dataflow:>stream create --name httptojdbc --definition "http | jdbc"
dataflow:>stream deploy --name httptojdbc --properties "app.jdbc.spring.cloud.deployer.cloudfoundry.services=mysqlService"
Here, mysqlService is the name of the service bound only to the jdbc application; the http application does not get the binding by this method. If you have more than one service to bind, they can be passed as comma-separated items (e.g. app.jdbc.spring.cloud.deployer.cloudfoundry.services=mysqlService,someService).
10. A Note About User Provided Services
In addition to marketplace services, Cloud Foundry supports User Provided Services. Throughout this reference manual, regular services have been mentioned, but there is nothing precluding the use of UPSs as well, whether for use as the messaging middleware (e.g. if you’d like to use an external Apache Kafka installation) or for ad hoc usage by some of the stream apps (e.g. an Oracle Database).
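For example, an external Apache Kafka installation could be exposed to stream apps through a user provided service; the instance name and credential keys below are placeholders only, and which keys your apps actually consume depends on their configuration:
cf create-user-provided-service my-kafka-cups -p '{"brokers":"kafka.example.com:9092","zkNodes":"zookeeper.example.com:2181"}'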
11. Application Rolling Upgrades
Similar to Cloud Foundry’s blue-green deployments, you can perform rolling upgrades on the applications orchestrated by Spring Cloud Data Flow.
Let’s start with the following simple stream definition.
dataflow:>stream create --name foo --definition "time | log" --deploy
List Apps.
→ cf apps
Getting apps in org test-org / space development as [email protected]...
OK
name requested state instances memory disk urls
foo-log started 1/1 1G 1G foo-log.cfapps.io
foo-time started 1/1 1G 1G foo-time.cfapps.io
Let’s assume you need to make an enhancement that updates the "logger" to append extra text to every log statement.
- Download the Log Sink application starter with the "Rabbit binder starter" from start-scs.cfapps.io/
- Load the downloaded project in an IDE
- Import the LogSinkConfiguration.class
- Adapt the handler to add extra text: loggingHandler.setLoggerName("TEST [" + this.properties.getName() + "]");
- Build the application locally
// Imports shown for completeness; the LogSinkConfiguration/LogSinkProperties package may differ by app-starter version.
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.stream.app.log.sink.LogSinkConfiguration;
import org.springframework.cloud.stream.app.log.sink.LogSinkProperties;
import org.springframework.cloud.stream.messaging.Sink;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Import;
import org.springframework.integration.annotation.ServiceActivator;
import org.springframework.integration.handler.LoggingHandler;

@SpringBootApplication
@Import(LogSinkConfiguration.class)
public class DemoApplication {

    @Autowired
    private LogSinkProperties properties;

    public static void main(String[] args) {
        SpringApplication.run(DemoApplication.class, args);
    }

    // Overrides the default handler so the logger name is prefixed with "TEST".
    @Bean
    @ServiceActivator(inputChannel = Sink.INPUT)
    public LoggingHandler logSinkHandler() {
        LoggingHandler loggingHandler = new LoggingHandler(this.properties.getLevel().name());
        loggingHandler.setExpression(this.properties.getExpression());
        loggingHandler.setLoggerName("TEST [" + this.properties.getName() + "]");
        return loggingHandler;
    }
}
Let’s deploy the locally built application to Cloud Foundry
→ cf push foo-log-v2 -p demo-0.0.1-SNAPSHOT.jar -n foo-log-v2 --no-start
List Apps.
→ cf apps
Getting apps in org test-org / space development as [email protected]...
OK
name requested state instances memory disk urls
foo-log started 1/1 1G 1G foo-log.cfapps.io
foo-time started 1/1 1G 1G foo-time.cfapps.io
foo-log-v2 stopped 1/1 1G 1G foo-log-v2.cfapps.io
The stream applications do not communicate via (Go)Router, so they aren’t generating HTTP traffic. Instead, they communicate via the underlying messaging middleware such as Kafka or RabbitMQ. In order to perform a rolling upgrade that routes the payload from the old to the new version of the application, you have to replicate the SPRING_APPLICATION_JSON environment variable from the old application, which includes the spring.cloud.stream.bindings.input.destination and spring.cloud.stream.bindings.input.group properties.
You can find the SPRING_APPLICATION_JSON of the old application via cf env foo-log.
cf set-env foo-log-v2 SPRING_APPLICATION_JSON '{"spring.cloud.stream.bindings.input.destination":"foo.time","spring.cloud.stream.bindings.input.group":"foo"}'
Let’s start the foo-log-v2 application.
cf start foo-log-v2
As soon as the application bootstraps, you will notice the payload being load-balanced between the two log application instances running on Cloud Foundry. Since they both share the same "destination" and "consumer group", they are now acting as competing consumers.
Old App Logs:
2016-08-08T17:11:08.94-0700 [APP/0] OUT 2016-08-09 00:11:08.942 INFO 19 --- [ foo.time.foo-1] log.sink : 08/09/16 00:11:08
2016-08-08T17:11:10.95-0700 [APP/0] OUT 2016-08-09 00:11:10.954 INFO 19 --- [ foo.time.foo-1] log.sink : 08/09/16 00:11:10
2016-08-08T17:11:12.94-0700 [APP/0] OUT 2016-08-09 00:11:12.944 INFO 19 --- [ foo.time.foo-1] log.sink : 08/09/16 00:11:12
New App Logs:
2016-08-08T17:11:07.94-0700 [APP/0] OUT 2016-08-09 00:11:07.945 INFO 26 --- [ foo.time.foo-1] TEST [log.sink : 08/09/16 00:11:07]
2016-08-08T17:11:09.92-0700 [APP/0] OUT 2016-08-09 00:11:09.925 INFO 26 --- [ foo.time.foo-1] TEST [log.sink : 08/09/16 00:11:09]
2016-08-08T17:11:11.94-0700 [APP/0] OUT 2016-08-09 00:11:11.941 INFO 26 --- [ foo.time.foo-1] TEST [log.sink : 08/09/16 00:11:11]
Deleting the old version foo-log from the CF CLI ensures that all the payload is consumed by the foo-log-v2 application. You have now successfully upgraded an application in the streaming pipeline without bringing it down entirely to make the change.
List Apps.
→ cf apps
Getting apps in org test-org / space development as [email protected]...
OK
name requested state instances memory disk urls
foo-time started 1/1 1G 1G foo-time.cfapps.io
foo-log-v2 started 1/1 1G 1G foo-log-v2.cfapps.io
A comprehensive canary analysis along with rolling upgrades will be supported via Spinnaker in future releases.
12. Maximum Disk Quota Configuration
By default, every application in Cloud Foundry starts with 1G disk quota and this can be adjusted to a default maximum of 2G. The default maximum can also be overridden up to 10G via Pivotal Cloud Foundry’s (PCF) Ops Manager GUI.
This configuration is relevant for Spring Cloud Data Flow because every stream and task deployment is composed of applications (typically Spring Boot uber-jars), and those applications are resolved from a remote maven repository. After resolution, the application artifacts are downloaded to the local Maven repository for caching/reuse. With this happening in the background, the default disk quota (1G) can fill up rapidly, especially when we are experimenting with streams that are made up of unique applications. In order to overcome this disk limitation, and depending on your scaling requirements, you may want to change the default maximum from 2G to 10G. Let’s review the steps to change the default maximum disk quota allocation.
12.1. PCF’s Operations Manager Configuration
From PCF’s Ops Manager, Select "Pivotal Elastic Runtime" tile and navigate to "Application Developer Controls" tab. Change the "Maximum Disk Quota per App (MB)" setting from 2048 to 10240 (10G). Save the disk quota update and hit "Apply Changes" to complete the configuration override.
12.2. Scale Application
Once the disk quota change has been applied successfully, and assuming you have a running application, you may scale the application with a new disk_limit through the CF CLI.
→ cf scale dataflow-server -k 10GB
Scaling app dataflow-server in org ORG / space SPACE as user...
OK
....
....
....
....
state since cpu memory disk details
#0 running 2016-10-31 03:07:23 PM 1.8% 497.9M of 1.1G 193.9M of 10G
→ cf apps
Getting apps in org ORG / space SPACE as user...
OK
name requested state instances memory disk urls
dataflow-server started 1/1 1.1G 10G dataflow-server.apps.io
12.3. Configuring target free disk percentage
Even when configuring the Data Flow server to use 10G of space, there is the possibility of exhausting the available space on the local disk. The server implements a least recently used (LRU) algorithm that removes maven artifacts from the local maven repository. This is configured using the following configuration property; the default value is 25.
# The low water mark percentage, expressed as an integer between 0 and 100, that triggers cleanup of
# the local maven repository
# (for setting env var use SPRING_CLOUD_DATAFLOW_SERVER_CLOUDFOUNDRY_FREE_DISK_SPACE_PERCENTAGE)
spring.cloud.dataflow.server.cloudfoundry.freeDiskSpacePercentage=25
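For example, to trigger cleanup when free disk space drops below 10 percent instead of the default 25:
cf set-env dataflow-server SPRING_CLOUD_DATAFLOW_SERVER_CLOUDFOUNDRY_FREE_DISK_SPACE_PERCENTAGE 10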
Tasks on Cloud Foundry
13. Version Compatibility
The task functionality depends on the latest versions of PCF for runtime support. This release requires PCF version 1.7.12 or higher to run tasks. Tasks are an experimental feature in PCF 1.7 and 1.8 and a GA feature in PCF 1.9.
14. Tooling
Because the task functionality is experimental in PCF for versions less than 1.9, the tooling around it within the CF ecosystem is not complete. In order to interact with tasks via the PCF command line interface (CLI) for versions less than 1.9, you need to install a plugin: v3-cli-plugin. It’s important to note that this plugin is only compatible with the PCF CLI version 6.17.0+5d0be0a-2016-04-15. You can read more about the functionality the plugin provides in its README.
It’s also important to note that there is no Apps Manager support for tasks as of this release. When running applications as tasks through Spring Cloud Data Flow, the only way to view them within the context of CF is via the plugin mentioned above.
15. Task Database Schema
The database schema for Task applications was changed slightly from the 1.0.x to the 1.1.x version of Spring Cloud Task. Since Spring Cloud Data Flow automatically creates the database schema if it is not present upon server startup, you may need to update the schema if you ran a 1.0.x version of the Data Flow server and are now upgrading to the 1.1.x version. You can find the migration scripts here in the Spring Cloud Task Github repository. The documentation for Accessing Services with Diego SSH and this blog entry for connecting GUI tools to the MySQL Service in PCF should help you to update the schema.
16. Running Task Applications
Running a task application within Spring Cloud Data Flow goes through a slightly different lifecycle than running a stream application. Both types of applications need to be registered with the appropriate artifact coordinates. Both need a definition created via the SCDF DSL. However, that’s where the similarities end.
With stream based applications, you "deploy" them with the intent that they run until they are undeployed. A stream definition is only deployed once (it can be scaled, but only deployed as one instance of the stream as a whole). However, tasks are launched. A single task definition can be launched many times. With each launch, they will start, execute, and shut down with PCF cleaning up the resources once the shutdown has occurred. The following sections outline the process of creating, launching, destroying, and viewing tasks.
16.1. Create a Task
Similar to streams, creating a task application is done via the SCDF DSL or through the dashboard. To create a task definition in SCDF, you have to either develop a task application or use one of the out-of-the-box task app-starters. The maven coordinates of the task application should be registered in SCDF. For more details on how to register task applications, review the register task applications section in the core docs.
Let’s see an example that uses the out-of-the-box timestamp task application.
dataflow:>task create --name foo --definition "timestamp"
Created new task 'foo'
Tasks in SCDF do not require explicit deployment. They need to be launched, and there are different ways to launch them - refer to this section for more details.
16.2. Launch a Task
Unlike streams, tasks in SCDF require an explicit launch trigger, or they can be manually kicked off.
dataflow:>task launch foo
Launched task 'foo'
16.3. View Task Logs
As previously mentioned, the v3-cli-plugin is the way to interact with tasks on PCF, including viewing the logs. In order to view the logs as a task is executing, use the following command, where foo is the name of the task you are executing:
cf v3-logs foo
Tailing logs for app foo...
....
....
....
....
2016-08-19T09:44:49.11-0700 [APP/TASK/bar1/0]OUT 2016-08-19 16:44:49.111 INFO 7 --- [ main] o.s.c.t.a.t.TimestampTaskApplication : Started TimestampTaskApplication in 2.734 seconds (JVM running for 3.288)
2016-08-19T09:44:49.13-0700 [APP/TASK/bar1/0]OUT Exit status 0
2016-08-19T09:44:49.19-0700 [APP/TASK/bar1/0]OUT Destroying container
2016-08-19T09:44:50.41-0700 [APP/TASK/bar1/0]OUT Successfully destroyed container
Logs are only viewable through the v3-cli-plugin as the app is running. Historic logs are not available.
16.4. List Tasks
Listing tasks is as simple as:
dataflow:>task list
╔══════════════════════╤═════════════════════════╤═══════════╗
║ Task Name │ Task Definition │Task Status║
╠══════════════════════╪═════════════════════════╪═══════════╣
║foo │timestamp │complete ║
╚══════════════════════╧═════════════════════════╧═══════════╝
16.5. List Task Executions
If you’d like to view the execution details of the launched task, you could do the following.
dataflow:>task execution list
╔════════════════════════╤══╤═════════════════════════╤═════════════════════════╤════════╗
║ Task Name │ID│ Start Time │ End Time │ Exit ║
║ │ │ │ │ Code ║
╠════════════════════════╪══╪═════════════════════════╪═════════════════════════╪════════╣
║foo:cloud: │1 │ Fri Aug 19 09:44:49 PDT │Fri Aug 19 09:44:49 PDT │0 ║
╚════════════════════════╧══╧═════════════════════════╧═════════════════════════╧════════╝
16.6. Destroy a Task
Destroying the task application from SCDF removes the task definition from the task repository.
dataflow:>task destroy foo
Destroyed task 'foo'
dataflow:>task list
╔═════════╤═══════════════╤═══════════╗
║Task Name│Task Definition│Task Status║
╚═════════╧═══════════════╧═══════════╝
16.7. Deleting Task From Cloud Foundry
Currently, Spring Cloud Data Flow does not delete tasks deployed on a Cloud Foundry instance once they have been pushed. The only way to do this now is via the CLI on a Cloud Foundry instance version 1.9 or above. This is done in two steps:
- Obtain a list of the apps via the cf apps command.
- Identify the task app to be deleted and execute the cf delete <task-name> command.
The task destroy <task-name> command only deletes the definition and not the task deployed on Cloud Foundry.
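For example, for the foo task created earlier (a sketch; the -f flag skips the confirmation prompt):
cf apps
cf delete foo -f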