In this getting started guide, the Data Flow Server is deployed to the Kubernetes cluster. This means that we need to provide an RDBMS service for the stream and task repositories and the app registry, plus a transport option of either Kafka or RabbitMQ.
Deploy a Kubernetes cluster.
The Kubernetes Getting Started guide lets you choose among many deployment options, so you can pick the one you are most comfortable using. We have successfully used the Vagrant option from a downloaded Kubernetes release.
We have also used the Minikube project to run a local Kubernetes cluster for testing.
The rest of this getting started guide assumes that you have a working Kubernetes cluster and the kubectl command line tool installed.
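You can verify that kubectl can reach your cluster with, for example:

$ kubectl cluster-info
$ kubectl get nodes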
Create a Kafka service on the Kubernetes cluster.
The Kafka service will be used for messaging between the apps in the stream. You could use RabbitMQ instead but, to keep things simple, we only show the Kafka configuration in this guide. There are sample replication controller and service YAML files in the spring-cloud-dataflow-server-kubernetes repository that you can use as a starting point, as they have the required metadata set for service discovery by the apps.
$ git clone https://github.com/spring-cloud/spring-cloud-dataflow-server-kubernetes
$ cd spring-cloud-dataflow-server-kubernetes
$ kubectl create -f src/etc/kubernetes/kafka-controller.yml
$ kubectl create -f src/etc/kubernetes/kafka-service.yml
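For reference, the service definition is roughly of the following shape. This is a minimal sketch: the label names and selector are assumptions, and the actual files in the repository are authoritative.

apiVersion: v1
kind: Service
metadata:
  name: kafka              # the name the stream apps use to discover the broker
  labels:
    app: kafka             # assumed label; check kafka-service.yml for the real metadata
spec:
  ports:
  - port: 9092             # standard Kafka broker port
  selector:
    app: kafka             # routes traffic to the pods started by kafka-controller.yml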
You can use the command kubectl get pods to verify that the controller and service are running, and kubectl get services to check on the state of the service. Use the commands kubectl delete svc kafka and kubectl delete rc kafka to clean up afterwards.
Create a MySQL service on the Kubernetes cluster.
We are using MySQL for this guide, but you could use a Postgres or H2 database instead. JDBC drivers for all three databases are included; you would just have to adjust the database URL and driver class name settings.
Before creating the MySQL service we need to create a persistent disk and modify the password in the config file. To create a persistent disk on Google Cloud Platform, you can use the following command:
$ gcloud compute disks create mysql-disk --size 200 --type pd-standard
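The controller YAML then references this disk by name via a gcePersistentDisk volume. As a minimal sketch (the volume name is illustrative; apart from the disk name, check mysql-controller.yml for the values it actually uses):

volumes:
- name: mysql-persistent-storage   # illustrative volume name
  gcePersistentDisk:
    pdName: mysql-disk             # must match the disk name created above
    fsType: ext4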
Modify the password in the src/etc/kubernetes/mysql-controller.yml file inside the spring-cloud-dataflow-server-kubernetes repository. Then run the following commands to start the database service:
$ kubectl create -f src/etc/kubernetes/mysql-controller.yml
$ kubectl create -f src/etc/kubernetes/mysql-service.yml
Again, you can use the command kubectl get pods to verify that the controller is running. Note that it can take a minute or so until there is an external IP address for the MySQL server. Use the command kubectl get services to check on the state of the service and look for a value in the EXTERNAL-IP column. Use the commands kubectl delete svc mysql and kubectl delete rc mysql to clean up afterwards. Use the EXTERNAL-IP address to connect to the database and create a test database that we can use for our testing. Use your favorite SQL developer tool for this:
CREATE DATABASE test;
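For example, with the mysql command-line client, using the EXTERNAL-IP reported for the mysql service (the user name and password depend on what you configured in mysql-controller.yml):

$ mysql -h 104.154.246.220 -u root -p   # host is the mysql service's EXTERNAL-IP
mysql> CREATE DATABASE test;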
Update configuration files with values needed to connect to Kubernetes and MySQL.
The Data Flow Server uses the fabric8 Java client library to connect to the Kubernetes cluster. We use environment variables to set the values needed when deploying the Data Flow Server to Kubernetes. The settings are specified in the src/etc/kubernetes/scdf-controller.yml file. Modify <<mysql-username>>, <<mysql-password>>, and the database schema name to match what you used when creating the service.
This approach supports using one Data Flow Server instance per Kubernetes namespace.
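As a sketch, database settings of this kind are typically supplied through Spring Boot's relaxed binding of environment variables. The entries below are illustrative assumptions; the variable names and values in scdf-controller.yml itself are authoritative.

env:
- name: SPRING_DATASOURCE_URL
  value: jdbc:mysql://mysql:3306/test        # assumed URL pointing at the mysql service and test database
- name: SPRING_DATASOURCE_USERNAME
  value: <<mysql-username>>
- name: SPRING_DATASOURCE_PASSWORD
  value: <<mysql-password>>
- name: SPRING_DATASOURCE_DRIVER_CLASS_NAME
  value: org.mariadb.jdbc.Driver             # adjust the driver class to match your database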
Deploy the Spring Cloud Data Flow Server for Kubernetes using the Docker image and the configuration settings you just modified.
$ kubectl create -f src/etc/kubernetes/scdf-controller.yml
$ kubectl create -f src/etc/kubernetes/scdf-service.yml
Note: We haven't tuned the memory use of the out-of-the-box apps yet, so to be on the safe side we are increasing the memory for the pods by providing a memory property for the deployer.
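A minimal sketch of such a setting, as an environment entry in scdf-controller.yml, assuming the Kubernetes deployer's memory property; the exact property and value used by the guide were not preserved here:

env:
- name: SPRING_CLOUD_DEPLOYER_KUBERNETES_MEMORY
  value: 640Mi   # illustrative value; an assumption, not taken from the original note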
Use the kubectl get svc command to locate the EXTERNAL-IP address assigned to scdf; we use that to connect from the shell.
$ kubectl get svc
NAME         CLUSTER-IP       EXTERNAL-IP       PORT(S)    AGE
kafka        10.103.248.211   <none>            9092/TCP   14d
kubernetes   10.103.240.1     <none>            443/TCP    16d
mysql        10.103.251.179   104.154.246.220   3306/TCP   10d
scdf         10.103.246.82    130.211.203.246   9393/TCP   4m
zk           10.103.243.29    <none>            2181/TCP   14d
Download and run the Spring Cloud Data Flow shell.
$ wget http://repo.spring.io/release/org/springframework/cloud/spring-cloud-dataflow-shell/1.0.1.RELEASE/spring-cloud-dataflow-shell-1.0.1.RELEASE.jar
$ java -jar spring-cloud-dataflow-shell-1.0.1.RELEASE.jar
Configure the Data Flow Server URI with the following command, using the EXTERNAL-IP address from the previous step and port 9393:
(Spring Cloud Data Flow ASCII-art banner)
1.0.1.RELEASE

Welcome to the Spring Cloud Data Flow shell. For assistance hit TAB or type "help".
server-unknown:>dataflow config server --uri http://130.211.203.246:9393
Successfully targeted http://130.211.203.246:9393
dataflow:>
Register the Kafka version of the time and log apps using the shell, and also register the timestamp app.
dataflow:>app register --type source --name time --uri docker:springcloudstream/time-source-kafka:latest
dataflow:>app register --type sink --name log --uri docker:springcloudstream/log-sink-kafka:latest
dataflow:>app register --type task --name timestamp --uri docker:springcloudtask/timestamp-task:latest
Alternatively, if you would like to register all out-of-the-box stream applications built with the Kafka binder in bulk, you can do so with the following command. For more details, review how to register applications.
dataflow:>app import --uri http://bit.ly/stream-applications-kafka-docker
Deploy a simple stream in the shell.
dataflow:>stream create --name ticktock --definition "time | log" --deploy
You can use the command kubectl get pods to check on the state of the pods corresponding to this stream. You can run this from the Data Flow shell as an OS command by adding a "!" before the command.
dataflow:>! kubectl get pods
command is:kubectl get pods
NAME                  READY     STATUS    RESTARTS   AGE
kafka-d207a           1/1       Running   0          50m
ticktock-log-qnk72    1/1       Running   0          2m
ticktock-time-r65cn   1/1       Running   0          2m
Look at the logs for the pod deployed for the log sink.
$ kubectl logs -f ticktock-log-qnk72
...
2015-12-28 18:50:02.897  INFO 1 --- [           main] o.s.c.s.module.log.LogSinkApplication    : Started LogSinkApplication in 10.973 seconds (JVM running for 50.055)
2015-12-28 18:50:08.561  INFO 1 --- [hannel-adapter1] log.sink                                 : 2015-12-28 18:50:08
2015-12-28 18:50:09.556  INFO 1 --- [hannel-adapter1] log.sink                                 : 2015-12-28 18:50:09
2015-12-28 18:50:10.557  INFO 1 --- [hannel-adapter1] log.sink                                 : 2015-12-28 18:50:10
2015-12-28 18:50:11.558  INFO 1 --- [hannel-adapter1] log.sink                                 : 2015-12-28 18:50:11
Note: If you need to specify any app-specific configuration properties, you must use the "long form" of them, including the app-specific prefix.
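For instance, with the time source you would include the app's own prefix in the property name. The trigger.fixed-delay property here is an assumption for illustration:

dataflow:>stream create --name ticktock5 --definition "time --trigger.fixed-delay=5 | log" --deploy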
Note: If you need to be able to connect from outside of the Kubernetes cluster to an app that you deploy, like the http-source below, you need to provide a deployment property that creates an external load balancer for it.
To register the http-source app and use it in a stream where you can post data to it, you can use the following commands:
dataflow:>app register --type source --name http --uri docker:springcloudstream/http-source-kafka:latest
dataflow:>stream create --name test --definition "http | log"
dataflow:>stream deploy test --properties "app.http.spring.cloud.deployer.kubernetes.createLoadBalancer=true"
Now, look up the external IP address for the http app (it can sometimes take a minute or two for the external IP to get assigned):
dataflow:>! kubectl get service
command is:kubectl get service
NAME         CLUSTER-IP       EXTERNAL-IP      PORT(S)    AGE
kafka        10.103.240.92    <none>           9092/TCP   7m
kubernetes   10.103.240.1     <none>           443/TCP    4h
test-http    10.103.251.157   130.211.200.96   8080/TCP   58s
test-log     10.103.240.28    <none>           8080/TCP   59s
zk           10.103.247.25    <none>           2181/TCP   7m
Next, post some data to the test-http app:
dataflow:>http post --target http://130.211.200.96:8080 --data "Hello"
Finally, look at the logs for the test-log pod:
dataflow:>! kubectl get pods
command is:kubectl get pods
NAME              READY     STATUS    RESTARTS   AGE
kafka-o20qq       1/1       Running   0          9m
test-http-9obkq   1/1       Running   0          2m
test-log-ysiz3    1/1       Running   0          2m
dataflow:>! kubectl logs test-log-ysiz3
command is:kubectl logs test-log-ysiz3
...
2016-04-27 16:54:29.789  INFO 1 --- [           main] o.s.c.s.b.k.KafkaMessageChannelBinder$3  : started inbound.test.http.test
2016-04-27 16:54:29.799  INFO 1 --- [           main] o.s.c.support.DefaultLifecycleProcessor  : Starting beans in phase 0
2016-04-27 16:54:29.799  INFO 1 --- [           main] o.s.c.support.DefaultLifecycleProcessor  : Starting beans in phase 2147482647
2016-04-27 16:54:29.895  INFO 1 --- [           main] s.b.c.e.t.TomcatEmbeddedServletContainer : Tomcat started on port(s): 8080 (http)
2016-04-27 16:54:29.896  INFO 1 --- [ kafka-binder-] log.sink                                 : Hello
A useful option when troubleshooting issues, such as a container that has a fatal error starting up, is to add --previous to the kubectl logs command to view the log of the last terminated container. You can also get more detailed information about a pod by using kubectl describe, like:
kubectl describe pods/ticktock-log-qnk72
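For example, to view the log of the last terminated container for that pod:

$ kubectl logs --previous ticktock-log-qnk72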
Destroy the stream.
dataflow:>stream destroy --name ticktock
Create a task and launch it.
Let’s create a simple task definition and launch it.
dataflow:>task create task1 --definition "timestamp" dataflow:>task launch task1
We can now list the tasks and executions using these commands:
dataflow:>task list
╔═════════╤═══════════════╤═══════════╗
║Task Name│Task Definition│Task Status║
╠═════════╪═══════════════╪═══════════╣
║task1    │timestamp      │running    ║
╚═════════╧═══════════════╧═══════════╝
dataflow:>task execution list
╔═════════╤══╤════════════════════════════╤════════════════════════════╤═════════╗
║Task Name│ID│         Start Time         │          End Time          │Exit Code║
╠═════════╪══╪════════════════════════════╪════════════════════════════╪═════════╣
║task1    │1 │Fri Jun 03 18:12:05 EDT 2016│Fri Jun 03 18:12:05 EDT 2016│0        ║
╚═════════╧══╧════════════════════════════╧════════════════════════════╧═════════╝
Destroy the task.
dataflow:>task destroy --name task1