Spring Cloud Data Flow can be used to deploy modules in a Cloud Foundry environment. When doing so, the server application can run either on Cloud Foundry itself or on another installation (e.g. a simple laptop).
The required configuration is the same in either case and amounts to providing credentials for the Cloud Foundry instance so that the server can spawn applications itself. Any Spring Boot compatible configuration mechanism can be used (passing program arguments, editing configuration files before building the application, using Spring Cloud Config, using environment variables, etc.), although some may prove more practicable than others when running on Cloud Foundry.
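As a quick sketch of the two most common mechanisms, the Cloud Foundry API URL can be supplied either way; both commands below (using the Pivotal Web Services endpoint as a placeholder) bind to the same spring.cloud.deployer.cloudfoundry.url property through Spring Boot's relaxed binding:

# As an environment variable, e.g. when the server itself runs on Cloud Foundry:
cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_URL https://api.run.pivotal.io

# As a program argument, e.g. when running the server jar locally:
java -jar spring-cloud-dataflow-server-cloudfoundry-1.0.0.RELEASE.jar --spring.cloud.deployer.cloudfoundry.url=https://api.run.pivotal.io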
Note: By default, the application registry in Spring Cloud Data Flow's Cloud Foundry server is empty. This is intentional, giving users the flexibility to choose and register the applications appropriate for a given use case. Depending on the message binder of choice, users can register either RabbitMQ-based or Apache Kafka-based Maven artifacts.
Use cf marketplace to discover which plans are available to you, depending on the details of your Cloud Foundry setup. For example, when using Pivotal Web Services:
cf create-service rediscloud 30mb redis
A Redis instance is required for analytics apps and is typically bound to such apps when you create an analytics stream, using the per-app-binding feature (see the sketch below).
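As a hedged illustration (the stream name wordcounts is hypothetical; counter is one of the out-of-the-box sinks that stores its state in Redis), such an analytics stream created later from the shell might look like:

dataflow:> stream create --name wordcounts --definition "http | counter" --deploy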
Use cf marketplace to discover which plans are available to you, depending on the details of your Cloud Foundry setup. For example, when using Pivotal Web Services:
cf create-service cloudamqp lemur rabbit
RabbitMQ is typically used as messaging middleware between streaming apps and is bound to each deployed app via the SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_STREAM_SERVICES setting (see below).
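That setting accepts a comma-separated list, so if your stream apps need additional services bound to them you can list several at once; here my_mysql is just the service instance created below:

cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_STREAM_SERVICES rabbit,my_mysql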
Use cf marketplace to discover which plans are available to you, depending on the details of your Cloud Foundry setup. For example, when using Pivotal Web Services:
cf create-service p_mysql 100mb my_mysql
An RDBMS is used to persist Data Flow state, such as stream definitions and deployment IDs. It can also be used by tasks to persist execution history.
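After provisioning, you can double-check that all three service instances exist and note their exact names, which must match the names used in the bind and configuration commands below:

cf services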
wget http://repo.spring.io/release/org/springframework/cloud/spring-cloud-dataflow-server-cloudfoundry/1.0.0.RELEASE/spring-cloud-dataflow-server-cloudfoundry-1.0.0.RELEASE.jar
wget http://repo.spring.io/release/org/springframework/cloud/spring-cloud-dataflow-shell/1.0.0.RELEASE/spring-cloud-dataflow-shell-1.0.0.RELEASE.jar
You can either deploy the server application on Cloud Foundry itself or on your local machine. The following two sections explain each way of running the server.
Push the server application to Cloud Foundry, configure it (see below), and start it.
Note: You must use a unique name for your app; an app with the same name in the same organization will cause your deployment to fail.
cf push dataflow-server --no-start -p spring-cloud-dataflow-server-cloudfoundry-1.0.0.RELEASE.jar
cf bind-service dataflow-server redis
cf bind-service dataflow-server my_mysql
Note: If you are pushing to a space with multiple users, for example on PWS, the route for the application name you have chosen may already be taken. You can use cf push routing options, such as --random-route, to avoid this (see the sketch below).
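For example, assuming a route collision, you could let Cloud Foundry pick a non-colliding route (--random-route is a standard cf push option):

cf push dataflow-server --no-start -p spring-cloud-dataflow-server-cloudfoundry-1.0.0.RELEASE.jar --random-route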
Now we can configure the app. The following configuration is for Pivotal Web Services. You need to fill in {org}, {space}, {email} and {password} before running these commands.
Note: Only set 'Skip SSL Validation' to true if you're running on a Cloud Foundry instance using self-signed certificates (e.g. in development). Do not use it in production.
Note: If you are deploying in an environment that requires you to sign on using the Pivotal Single Sign-On Service, refer to Chapter 15, Authentication and Cloud Foundry for information on how to configure the server.
cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_URL https://api.run.pivotal.io
cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_ORG {org}
cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_SPACE {space}
cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_DOMAIN cfapps.io
cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_STREAM_SERVICES rabbit
cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_USERNAME {email}
cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_PASSWORD {password}
cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_SKIP_SSL_VALIDATION false
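At this point you can sanity-check the configuration; cf env prints the environment variables you have just set, along with any bound services:

cf env dataflow-server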
Spring Cloud Data Flow server implementations (for Cloud Foundry, Mesos, YARN, and Kubernetes) do not have any default remote Maven repository configured. This is intentional, giving users the flexibility to point to a remote repository of their choice. The out-of-the-box applications supported by Spring Cloud Data Flow are available in Spring's repository, so if you want to use them, you must set it as a remote repository, as shown below.
cf set-env dataflow-server MAVEN_REMOTE_REPOSITORIES_REPO1_URL https://repo.spring.io/libs-snapshot
where repo1 is the alias name for the remote repository.
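Multiple remote repositories can be configured by repeating the pattern with a different alias; repo2 and its URL below are placeholders for a repository of your own:

cf set-env dataflow-server MAVEN_REMOTE_REPOSITORIES_REPO2_URL https://repo.example.com/maven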
You can also set other optional properties for deployment to Cloud Foundry. For instance, you can change the buildpack used to deploy the stream apps; for example, to use the offline Java buildpack:

cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_STREAM_BUILDPACK java_buildpack_offline
In case you want to use a config-server to manage centralized configurations for all the applications orchestrated by Spring Cloud Data Flow, you can set it up like the following:

cf set-env dataflow-server SPRING_APPLICATION_JSON '{"spring.cloud.dataflow.applicationProperties.stream.spring.cloud.config.uri": "http://<CONFIG_SERVER_URI>"}'
The default memory and disk sizes for a deployed application can also be customized, through the spring.cloud.deployer.cloudfoundry.stream.memory and spring.cloud.deployer.cloudfoundry.stream.disk properties (see the sketch below).
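For example, to give each deployed stream app 512 MB of memory and 2048 MB of disk, relying on the same relaxed binding between properties and environment variables used above (the values are illustrative):

cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_STREAM_MEMORY 512
cf set-env dataflow-server SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_STREAM_DISK 2048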
The default number of instances to deploy is set to 1, but it can be overridden using the spring.cloud.deployer.cloudfoundry.stream.instances property. All these properties are @ConfigurationProperties of the Cloud Foundry deployer; see CloudFoundryDeploymentProperties.java for more information.

We are now ready to start the app.
cf start dataflow-server
Alternatively, you can run the server application locally on your machine, which is described in the next section.
To run the server application locally, targeting your Cloud Foundry installation, you need to configure the application either by passing in command line arguments (see below) or by setting a number of environment variables.
To use environment variables, set the following:
export SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_URL=https://api.run.pivotal.io
export SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_ORG={org}
export SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_SPACE={space}
export SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_DOMAIN=cfapps.io
export SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_USERNAME={email}
export SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_PASSWORD={password}
export SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_SKIP_SSL_VALIDATION=false
export SPRING_CLOUD_DEPLOYER_CLOUDFOUNDRY_STREAM_SERVICES=rabbit
You need to fill in {org}, {space}, {email} and {password} before running these commands.
Note: Only set 'Skip SSL Validation' to true if you're running on a Cloud Foundry instance using self-signed certificates (e.g. in development). Do not use it in production.
Now we are ready to start the server application:
java -jar spring-cloud-dataflow-server-cloudfoundry-1.0.0.RELEASE.jar [--option1=value1] [--option2=value2] [etc.]
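For example, the same settings shown above as environment variables can be passed as command line arguments instead (fill in {org}, {space}, {email} and {password} as before; the property names follow from Spring Boot's relaxed binding):

java -jar spring-cloud-dataflow-server-cloudfoundry-1.0.0.RELEASE.jar \
  --spring.cloud.deployer.cloudfoundry.url=https://api.run.pivotal.io \
  --spring.cloud.deployer.cloudfoundry.org={org} \
  --spring.cloud.deployer.cloudfoundry.space={space} \
  --spring.cloud.deployer.cloudfoundry.domain=cfapps.io \
  --spring.cloud.deployer.cloudfoundry.username={email} \
  --spring.cloud.deployer.cloudfoundry.password={password} \
  --spring.cloud.deployer.cloudfoundry.stream.services=rabbit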
Tasks are enabled as an experimental feature in the Spring Cloud Data Flow Cloud Foundry server. To enable running tasks, you can set the environment variable:

export SPRING_CLOUD_DATAFLOW_FEATURES_EXPERIMENTAL_TASKSENABLED=true

or pass a command line argument when starting the Data Flow server:

--spring.cloud.dataflow.features.experimental.tasksEnabled=true
Run the shell and, optionally, target the server application if it is not running on the same host (which will typically be the case if it is deployed on Cloud Foundry, as explained above):
$ java -jar spring-cloud-dataflow-shell-1.0.0.RELEASE.jar
server-unknown:>dataflow config server http://dataflow-server.cfapps.io
Successfully targeted http://dataflow-server.cfapps.io
dataflow:>
By default, the application registry will be empty. If you would like to register all out-of-the-box stream applications built with the RabbitMQ binder in bulk, you can do so with the following command. For more details, review how to register applications.
dataflow:>app import --uri http://bit.ly/stream-applications-rabbit-maven
You can now use the shell commands to list the available applications (sources, processors, sinks) and create streams. For example:
dataflow:> stream create --name httptest --definition "http | log" --deploy
Note: You will need to wait a little while until the apps are actually deployed successfully before posting data. Tail the log file for each application to verify that the application has started (a sketch using cf logs follows).
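To tail an app's log from the command line, you can use cf logs with the generated application name; the name below mirrors the example URL later in this section and will differ in your deployment (use cf apps to list the actual names):

cf logs dataflow-nonconcentrative-knar-httptest-log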
Now post some data. The URL will be unique to your deployment; the following is just an example:
dataflow:> http post --target http://dataflow-nonconcentrative-knar-httptest-http.cfapps.io --data "hello world"
Look to see if hello world ended up in the log files of the log application.