Spring Cloud Data Flow can be used to deploy modules in a Cloud Foundry environment. When doing so, the server application can either run itself on Cloud Foundry, or on another installation (e.g. a simple laptop).
The required configuration amounts to the same in either case, and is merely related to providing credentials to the Cloud Foundry instance so that the server can spawn applications itself. Any Spring Boot compatible configuration mechanism can be used (passing program arguments, editing configuration files before building the application, using Spring Cloud Config, using environment variables, etc.), although some may prove more practicable than others when running on Cloud Foundry.
Note | |
---|---|
By default, the application registry in Spring Cloud Data Flow’s Cloud Foundry server is empty. This is intentional: it gives users the flexibility to choose and register the applications that fit their use case. Depending on the message binder of choice, users can register either RabbitMQ or Apache Kafka based Maven artifacts. |
Use cf marketplace to discover which plans are available to you, depending on the details of your Cloud Foundry setup.
For example, when using Pivotal Web Services:
cf create-service rediscloud 30mb redis
A Redis instance is required for analytics apps and is typically bound to such apps when you create an analytics stream using the per-app-binding feature.
Use cf marketplace to discover which plans are available to you, depending on the details of your Cloud Foundry setup.
For example, when using Pivotal Web Services:
cf create-service cloudamqp lemur rabbit
RabbitMQ is typically used as the messaging middleware between streaming apps and is bound to each deployed app through the SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_STREAM_SERVICES setting (see below).
Use cf marketplace to discover which plans are available to you, depending on the details of your Cloud Foundry setup.
For example, when using Pivotal Web Services:
cf create-service cleardb spark my_mysql
An RDBMS is used to persist Data Flow state, such as stream definitions and deployment ids. It can also be used for tasks to persist execution history.
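The three service-creation commands above can be collected into one script. The following is a minimal sketch, assuming the Pivotal Web Services service and plan names used in this guide. The CF_CMD variable and the create_services function are hypothetical helpers introduced here for illustration; they are not part of the cf CLI. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: create the three backing services used in this guide.
# CF_CMD (hypothetical) defaults to "echo cf", so nothing is executed;
# set CF_CMD=cf to run against a logged-in Cloud Foundry target.
CF_CMD="${CF_CMD:-echo cf}"

create_services() {
  # Redis for analytics apps
  $CF_CMD create-service rediscloud 30mb redis
  # RabbitMQ as the messaging middleware for stream apps
  $CF_CMD create-service cloudamqp lemur rabbit
  # MySQL for Data Flow state and task execution history
  $CF_CMD create-service cleardb spark my_mysql
}

create_services
```

Service and plan names differ between Cloud Foundry installations, so check the output of cf marketplace before switching off the dry run.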
wget http://repo.spring.io/milestone/org/springframework/cloud/spring-cloud-dataflow-server-cloudfoundry/1.3.0.M1/spring-cloud-dataflow-server-cloudfoundry-1.3.0.M1.jar
wget http://repo.spring.io/milestone/org/springframework/cloud/spring-cloud-dataflow-shell/1.3.0.M1/spring-cloud-dataflow-shell-1.3.0.M1.jar
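Both download URLs follow the same Maven repository layout (artifact, version, then artifact-version.jar). As a small sketch, the URLs can be derived from the artifact name and version. The artifact_url function and the VERSION and BASE_URL variables are hypothetical names introduced here, and the milestone repository path is assumed to apply only to milestone versions such as 1.3.0.M1.

```shell
#!/bin/sh
# Sketch: derive the download URLs from the artifact name and version.
VERSION="1.3.0.M1"
BASE_URL="http://repo.spring.io/milestone/org/springframework/cloud"

# Print the jar URL for one artifact id (hypothetical helper).
artifact_url() {
  printf '%s/%s/%s/%s-%s.jar\n' "$BASE_URL" "$1" "$VERSION" "$1" "$VERSION"
}

# Print the wget commands instead of running them.
for artifact in spring-cloud-dataflow-server-cloudfoundry spring-cloud-dataflow-shell; do
  echo "wget $(artifact_url "$artifact")"
done
```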
You can either deploy the server application on Cloud Foundry itself or on your local machine. The following two sections explain each way of running the server.
Push the server application on Cloud Foundry, configure it (see below) and start it.
Note | |
---|---|
You must use a unique name for your app; an app with the same name in the same organization will cause your deployment to fail |
cf push dataflow-server -b java_buildpack -m 2G -k 2G --no-start -p spring-cloud-dataflow-server-cloudfoundry-1.3.0.M1.jar
cf bind-service dataflow-server redis
cf bind-service dataflow-server my_mysql
Important | |
---|---|
The recommended minimal memory setting for the server is 2G. Also, to push apps to PCF and obtain application property metadata, the server downloads applications to a Maven repository hosted on the local disk. While 2G is a typical maximum value for disk space on a PCF installation, this can be increased to 10G. Read the maximum disk quota section for information on how to configure this PCF property. Also, the Data Flow server itself implements a Least Recently Used algorithm to free disk space when it falls below a low water mark value. |
Note | |
---|---|
If you are pushing to a space with multiple users, for example on PWS, there may already be a route taken for the
application name you have chosen. You can use the options |
Now we can configure the app. The following configuration is for Pivotal Web Services. You need to fill in {org}, {space}, {email} and {password} before running these commands.
cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_URL https://api.run.pivotal.io
cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_ORG {org}
cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_SPACE {space}
cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_DOMAIN cfapps.io
cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_STREAM_SERVICES rabbit
cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_TASK_SERVICES my_mysql
cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_USERNAME {email}
cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_PASSWORD {password}
cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_SKIP_SSL_VALIDATION false
Warning | |
---|---|
Only set 'Skip SSL Validation' to true if you’re running on a Cloud Foundry instance using self-signed certs (e.g. in development). Do not use for production. |
Note | |
---|---|
If you are deploying in an environment that requires you to sign on using the Pivotal Single Sign-On Service, refer to the section ??? for information on how to configure the server. |
Spring Cloud Data Flow server implementations (be it for Cloud Foundry, Mesos, YARN, or Kubernetes) do not have any default remote maven repository configured. This is intentionally designed to provide the flexibility for the users, so they can override and point to a remote repository of their choice. The out-of-the-box applications that are supported by Spring Cloud Data Flow are available in Spring’s repository, so if you want to use them, set it as the remote repository as listed below.
cf set-env dataflow-server SPRING_APPLICATION_JSON '{"maven": { "remote-repositories": { "repo1": { "url": "https://repo.spring.io/libs-release" } } } }'
where repo1 is the alias name for the remote repository.
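To make the role of the alias and URL explicit, the JSON value can be assembled from two variables. This is a minimal sketch: the maven_repo_json function and the REPO_ALIAS and REPO_URL variables are hypothetical names introduced here, and the script prints the resulting cf set-env command rather than running it.

```shell
#!/bin/sh
# Sketch: build the SPRING_APPLICATION_JSON value for one remote Maven repository.
REPO_ALIAS="repo1"
REPO_URL="https://repo.spring.io/libs-release"

# Print the JSON document for a single repository alias and URL (hypothetical helper).
maven_repo_json() {
  printf '{"maven":{"remote-repositories":{"%s":{"url":"%s"}}}}' "$1" "$2"
}

SPRING_APPLICATION_JSON="$(maven_repo_json "$REPO_ALIAS" "$REPO_URL")"

# Show the command; the single quotes keep the JSON intact when it is pasted into a shell.
echo "cf set-env dataflow-server SPRING_APPLICATION_JSON '$SPRING_APPLICATION_JSON'"
```

The helper does no JSON escaping, so it is only suitable for aliases and URLs that contain no double quotes or backslashes.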
Note | |
---|---|
If you need to configure multiple Maven repositories, a proxy, or authorization for a private repository, see Maven Configuration. |
You can also set other optional properties that alter the way Spring Cloud Data Flow will deploy stream and task apps:
The default memory and disk sizes for a deployed application can be configured. By default they are 1024 MB memory and 1024 MB disk. To change them, for example to 512 MB and 2048 MB respectively, use:
cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_STREAM_MEMORY 512
cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_STREAM_DISK 2048
The default number of instances to deploy is set to 1, but can be overridden using
cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_STREAM_INSTANCES 1
You can set the buildpack that will be used to deploy each application. For example, to use the Java offline buildpack, set the following environment variable:
cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_STREAM_BUILDPACK java_buildpack_offline
The health check mechanism used by Cloud Foundry to assert whether apps are running can be customized. The currently supported options are port (the default) and none. Change the default like so:
cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_STREAM_HEALTH_CHECK none
Note | |
---|---|
These settings can be configured separately for stream and task apps. To alter settings for tasks, simply
substitute TASK for STREAM in the property name, for example: cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_TASK_MEMORY 512 |
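Because the stream and task variants of a setting differ only in one segment of the variable name, the same value can be applied to both in a loop. The following is a hedged sketch: set_for_stream_and_task, CF_CMD, and APP_NAME are hypothetical names introduced here, and only the MEMORY setting is shown because it is the one this guide gives for both stream and task apps. By default the commands are printed, not executed.

```shell
#!/bin/sh
# Sketch: apply one deployer default to both stream and task apps.
# CF_CMD (hypothetical) defaults to "echo cf" for a dry run; set CF_CMD=cf to execute.
CF_CMD="${CF_CMD:-echo cf}"
APP_NAME="dataflow-server"

# Usage: set_for_stream_and_task SETTING VALUE
set_for_stream_and_task() {
  setting="$1"
  value="$2"
  for kind in STREAM TASK; do
    $CF_CMD set-env "$APP_NAME" "SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_${kind}_${setting}" "$value"
  done
}

set_for_stream_and_task MEMORY 512
```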
Tip | |
---|---|
All the properties mentioned above are |
We are now ready to start the app.
cf start dataflow-server
Alternatively, you can run the server application locally on your machine, as described in the next section.
To run the server application locally, targeting your Cloud Foundry installation, you need to configure the application either by passing in command line arguments (see below) or by setting a number of environment variables.
To use environment variables set the following:
export SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_URL=https://api.run.pivotal.io
export SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_ORG={org}
export SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_SPACE={space}
export SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_DOMAIN=cfapps.io
export SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_USERNAME={email}
export SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_PASSWORD={password}
export SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_SKIP_SSL_VALIDATION=false
export SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_STREAM_SERVICES=rabbit
# The following is for letting task apps write to their db.
# Note however that when the *server* is running locally, it can't access that db
# task related commands that show executions won't work then
export SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_TASK_SERVICES=my_mysql
You need to fill in {org}, {space}, {email} and {password} before running these commands.
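Before starting the server locally, it can help to check that none of these variables is empty or still holds an unfilled placeholder such as {org}. The following is a minimal sketch of such a check. The missing_vars function and the REQUIRED_VARS list are hypothetical helpers introduced here, and the choice of which variables count as required is an assumption based on the connection settings shown above.

```shell
#!/bin/sh
# Sketch: report connection variables that are unset, empty, or still a {placeholder}.
# The list below is an assumption drawn from the variables this guide sets.
REQUIRED_VARS="SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_URL
SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_ORG
SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_SPACE
SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_DOMAIN
SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_USERNAME
SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_PASSWORD"

# Print the name of each variable that still needs a real value.
missing_vars() {
  for name in $REQUIRED_VARS; do
    # Indirect lookup of the variable called $name (POSIX sh has no ${!name}).
    eval "value=\${$name:-}"
    case "$value" in
      "" | "{"*"}") echo "$name" ;;
    esac
  done
}

missing="$(missing_vars)"
if [ -n "$missing" ]; then
  echo "Not set, or still a placeholder:"
  echo "$missing"
else
  echo "All required variables are set."
fi
```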
Warning | |
---|---|
Only set 'Skip SSL Validation' to true if you’re running on a Cloud Foundry instance using self-signed certs (e.g. in development). Do not use for production. |
Now we are ready to start the server application:
java -jar spring-cloud-dataflow-server-cloudfoundry-1.3.0.M1.jar [--option1=value1] [--option2=value2] [etc.]
Tip | |
---|---|
Of course, all other parameterization options that were available when running the server on Cloud Foundry are
still available. This is particularly true for configuring defaults for applications. Just
substitute |
Note | |
---|---|
The current underlying PCF task capabilities are considered experimental for PCF versions earlier than 1.9. See Feature Togglers for how to disable task support in Data Flow. |
As an alternative to setting environment variables with the cf set-env command, you can collect all the relevant environment variables in a manifest.yml file and use the cf push command to provision the server.
The following is a sample template to provision the server on PCFDev.
---
applications:
- name: data-flow-server
  host: data-flow-server
  memory: 2G
  disk_quota: 2G
  instances: 1
  path: {PATH TO SERVER UBER-JAR}
  env:
    SPRING_APPLICATION_NAME: data-flow-server
    SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_URL: https://api.local.pcfdev.io
    SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_ORG: pcfdev-org
    SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_SPACE: pcfdev-space
    SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_DOMAIN: local.pcfdev.io
    SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_USERNAME: admin
    SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_PASSWORD: admin
    SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_STREAM_SERVICES: rabbit
    SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_TASK_SERVICES: mysql
    SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_SKIP_SSL_VALIDATION: true
    SPRING_APPLICATION_JSON: '{"maven": { "remote-repositories": { "repo1": { "url": "https://repo.spring.io/libs-release"} } } }'
  services:
  - mysql
Once you have set the relevant properties in this file, you can issue the cf push command from the directory where this file is stored.
Run the shell and, if the server is not running on the same host (which is typically the case when it is deployed on Cloud Foundry as explained here), target the server application:
$ java -jar spring-cloud-dataflow-shell-1.3.0.M1.jar
server-unknown:>dataflow config server http://dataflow-server.cfapps.io
Successfully targeted http://dataflow-server.cfapps.io
dataflow:>
By default, the application registry is empty. To register all out-of-the-box stream applications built with the RabbitMQ binder in bulk, use the following command. For more details, review how to register applications.
dataflow:>app import --uri http://bit.ly/Avogadro-SR1-stream-applications-rabbit-maven
A Note about application URIs | |
---|---|
While Spring Cloud Data Flow for Cloud Foundry leverages the core Data Flow project, and as such theoretically supports
registering apps using any scheme, the use of When deploying apps using Data Flow for Cloud Foundry, a typical choice is to use |
You can now use the shell commands to list the available applications (sources, processors, and sinks) and create streams. For example:
dataflow:> stream create --name httptest --definition "http | log" --deploy
Note | |
---|---|
You will need to wait a little while until the apps are actually deployed successfully before posting data. Tail the log file for each application to verify the application has started. |
Now post some data. The URL will be unique to your deployment; the following is just an example:
dataflow:> http post --target http://dataflow-AxwwAhK-httptest-http.cfapps.io --data "hello world"
Look to see if hello world ended up in the log files for the log application.
To run a simple task application, you can register all the out-of-the-box task applications with the following command.
dataflow:>app import --uri http://bit.ly/Addison-GA-task-applications-maven
Now create a simple timestamp task.
dataflow:>task create mytask --definition "timestamp --format='yyyy'"
Tail the logs, e.g. cf logs mytask, and then launch the task in the UI or in the Data Flow Shell:
dataflow:>task launch mytask
You will see the current year (2017 at the time of writing) printed in the logs. The execution status of the task is stored in the database, and you can retrieve information about the task execution using the shell commands task execution list and task execution status --id <ID_OF_TASK>, or through the Data Flow UI.