12. Deploying on Cloud Foundry

Spring Cloud Data Flow can be used to deploy modules in a Cloud Foundry environment. When doing so, the server application can either run itself on Cloud Foundry, or on another installation (e.g. a simple laptop).

The required configuration amounts to the same in either case, and is merely related to providing credentials to the Cloud Foundry instance so that the server can spawn applications itself. Any Spring Boot compatible configuration mechanism can be used (passing program arguments, editing configuration files before building the application, using Spring Cloud Config, using environment variables, etc.), although some may prove more practicable than others when running on Cloud Foundry.

[Note]Note

By default, the application registry in Spring Cloud Data Flow’s Cloud Foundry server is empty. It is intentionally designed to allow users to have the flexibility of choosing and registering applications, as they find appropriate for the given use-case requirement. Depending on the message-binder of choice, users can register between RabbitMQ or Apache Kafka based maven artifacts.

12.1 Provision a Redis service instance on Cloud Foundry.

Use cf marketplace to discover which plans are available to you, depending on the details of your Cloud Foundry setup. For example when using Pivotal Web Services:

cf create-service rediscloud 30mb redis

A redis instance is required for analytics apps, and would typically be bound to such apps when you create an analytics stream using the per-app-binding feature.

12.2 Provision a Rabbit service instance on Cloud Foundry.

Use cf marketplace to discover which plans are available to you, depending on the details of your Cloud Foundry setup. For example when using Pivotal Web Services:

cf create-service cloudamqp lemur rabbit

Rabbit is typically used as a messaging middleware between streaming apps and would be bound to each deployed app thanks to the SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_STREAM_SERVICES setting (see below).

12.3 Provision a MySQL service instance on Cloud Foundry.

Use cf marketplace to discover which plans are available to you, depending on the details of your Cloud Foundry setup. For example when using Pivotal Web Services:

cf create-service p_mysql 100mb my_mysql

An RDBMS is used to persist Data Flow state, such as stream definitions and deployment ids. It can also be used for tasks to persist execution history.

12.4 Download the Spring Cloud Data Flow Server and Shell apps:

wget http://repo.spring.io/release/org/springframework/cloud/spring-cloud-dataflow-server-cloudfoundry/1.0.0.RELEASE/spring-cloud-dataflow-server-cloudfoundry-1.0.0.RELEASE.jar
wget http://repo.spring.io/release/org/springframework/cloud/spring-cloud-dataflow-shell/1.0.0.RELEASE/spring-cloud-dataflow-shell-1.0.0.RELEASE.jar

You can either deploy the server application on Cloud Foundry itself or on your local machine. The following two sections explain each way of running the server.

12.5 Deploying the Server app on Cloud Foundry

Push the server application on Cloud Foundry, configure it (see below) and start it.

[Note]Note

You must use a unique name for your app; an app with the same name in the same organization will cause your deployment to fail

cf push dataflow-server --no-start -p spring-cloud-dataflow-server-cloudfoundry-1.0.0.RELEASE.jar
cf bind-service dataflow-server redis
cf bind-service dataflow-server my_mysql
[Note]Note

If you are pushing to a space with multiple users, for example on PWS, there may already be a route taken for the applicaiton name you have chosen. You can use the options --random-route to avoid this when pushing the app.

Now we can configure the app. The following configuration is for Pivotal Web Services. You need to fill in {org}, {space}, {email} and {password} before running these commands.

[Note]Note

Only set 'Skip SSL Validation' to true if you’re running on a Cloud Foundry instance using self-signed certs (e.g. in development). Do not use for production.

[Note]Note

If you are deploying in an environment that requires you to sign on using the Pivotal Single Sign-On Service, refer to the section Chapter 15, Authentication and Cloud Foundry for information on how to configure the server.

cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_URL https://api.run.pivotal.io
cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_ORG {org}
cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_SPACE {space}
cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_DOMAIN cfapps.io
cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_STREAM_SERVICES rabbit
cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_USERNAME {email}
cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_PASSWORD {password}
cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_SKIP_SSL_VALIDATION false

Spring Cloud Data Flow server implementations (cf, mesos, yarn, or kubernetes) do not have 'any' default remote maven repository configured. This is intentionally designed to provide the flexibility for the users, so they can override and point to a remote repository of their choice. The out-of-the-box applications that are supported by Spring Cloud Data Flow are available in Spring’s repository, so if you want to use them, you must set it as the remote repository as listed below.

cf set-env dataflow-server MAVEN_REMOTE_REPOSITORIES_REPO1_URL https://repo.spring.io/libs-snapshot

where repo1 is the alias name for the remote repository.

You can also set other optional properties for deployment to Cloud Foundry.

  • You can set the buildpack that will be used to deploy the application. For example, to use the Java offline buildback, set the following environment variable
cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_STREAM_BUILDPACK java_buildpack_offline
  • If you’d like to use config-server to manage centralized configurations for all the applications orchestrated by Spring Cloud Data Flow, you can set it up like the following.
cf set-env dataflow-server SPRING_APPLICATION_JSON '{"spring.cloud.dataflow.applicationProperties.stream.spring.cloud.config.uri": "http://<CONFIG_SERVER_URI>"}'
  • The default memory and disk sizes for a deployed application can also be configured. By default they are 1024 MB memory and 1024 MB disk. Thse are controlled by setting an integer value, representing the number of MB, to the following properties, spring.cloud.deployer.cloudfoundry.stream.memory and spring.cloud.deployer.cloudfoundry.stream.disk. The default number of instances to deploy is set to 1, but can be overridden using with the spring.cloud.deployer.cloudfoundry.stream.instances property. All these properties are @ConfigurationProperties of the Cloud Foundry deployer. See CloudFoundryDeploymentProperties.java for more information.

We are now ready to start the app.

cf start dataflow-server

Alternatively, you can run the Admin application locally on your machine which is described in the next section.

12.6 Running the Server app locally

To run the server application locally, targeting your Cloud Foundry installation, you you need to configure the application either by passing in command line arguments (see below) or setting a number of environment variables.

To use environment variables set the following:

export SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_URL=https://api.run.pivotal.io
export SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_ORG={org}
export SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_SPACE={space}
export SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_DOMAIN=cfapps.io
export SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_USERNAME={email}
export SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_PASSWORD={password}
export SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_SKIP_SSL_VALIDATION=false

export SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_STREAM_SERVICES=rabbit

You need to fill in {org}, {space}, {email} and {password} before running these commands.

[Note]Note

Only set 'Skip SSL Validation' to true if you’re running on a Cloud Foundry instance using self-signed certs (e.g. in development). Do not use for production.

Now we are ready to start the server application:

java -jar spring-cloud-dataflow-server-cloudfoundry-1.0.0.RELEASE.jar [--option1=value1] [--option2=value2] [etc.]

12.7 Running Tasks

Tasks are enabled as experimental feature in Spring Cloud Data Flow Cloud Foundry server. To enable running tasks, you can set the environment variable,

export SPRING_CLOUD_DATAFLOW_FEATURES_EXPERIMENTAL_TASKSENABLED=true

or, as a command line argument when starting the data flow server --spring.cloud.dataflow.features.experimental.tasksEnabled=true

12.8 Running Spring Cloud Data Flow Shell locally

Run the shell and optionally target the Admin application if not running on the same host (will typically be the case if deployed on Cloud Foundry as explained here)

$ java -jar spring-cloud-dataflow-shell-1.0.0.RELEASE.jar
server-unknown:>dataflow config server http://dataflow-server.cfapps.io
Successfully targeted http://dataflow-server.cfapps.io
dataflow:>

By default, the application registry will be empty. If you would like to register all out-of-the-box stream applications built with the RabbitMQ binder in bulk, you can with the following command. For more details, review how to register applications.

dataflow:>app import --uri http://bit.ly/stream-applications-rabbit-maven

You can now use the shell commands to list available applications (source/processors/sink) and create streams. For example:

dataflow:> stream create --name httptest --definition "http | log" --deploy
[Note]Note

You will need to wait a little while until the apps are actually deployed successfully before posting data. Tail the log file for each application to verify the application has started.

Now post some data. The URL will be unique to your deployment, the following is just an example

dataflow:> http post --target http://dataflow-nonconcentrative-knar-httptest-http.cfapps.io --data "hello world"

Look to see if hello world ended up in log files for the log application.