1. Installation

In this section we will install the Spring Cloud Data Flow Server on a Kubernetes cluster. Spring Cloud Data Flow depends on a few backing services being available. For example, we need an RDBMS service for the app registry, stream/task repositories, and task management. For streaming pipelines, we also need a transport option such as Apache Kafka or RabbitMQ. In addition, we need a Redis service if the analytics features are in use.

[Important]

This guide describes setting up an environment for testing Spring Cloud Data Flow on Google Kubernetes Engine and is not meant to be a definitive guide for setting up a production environment. Feel free to adjust the suggestions to fit your test set-up. Please remember that a production environment requires much more consideration for persistent storage of message queues, high availability, security etc.

[Note]

Currently, only apps registered with a --uri property pointing to a Docker resource are supported by the Data Flow Server for Kubernetes.

Note that we do support Maven resources for the --metadata-uri property.

For example, the following app registration is valid:

dataflow:>app register --type source --name time --uri docker://springcloudstream/time-source-rabbit:1.3.0.RELEASE --metadata-uri maven://org.springframework.cloud.stream.app:time-source-rabbit:jar:metadata:1.3.0.RELEASE

but any app registered with a Maven, HTTP or File resource for the executable jar (using a --uri property prefixed with maven://, http:// or file://) is not supported.
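
By contrast, a registration such as the following would not be supported (a hypothetical example that reuses the Maven coordinates from the metadata example above for the executable jar):

dataflow:>app register --type source --name time --uri maven://org.springframework.cloud.stream.app:time-source-rabbit:1.3.0.RELEASE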

1.1 Create a Kubernetes cluster

The Kubernetes Picking the Right Solution guide describes many options, so you can pick the one you are most comfortable using.

All our testing is done using the Google Kubernetes Engine, which is part of the Google Cloud Platform and is also the target platform for this section. We have also successfully deployed using Minikube, and we will note where you need to adjust for deploying on Minikube.

[Note]

When starting Minikube you should allocate some extra resources since we will be deploying several services. We have used minikube start --cpus=4 --memory=4096.

The rest of this getting started guide assumes that you have a working Kubernetes cluster and a kubectl command line utility. See the docs for installation instructions: Installing and Setting up kubectl.
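
For example, you can verify that kubectl is installed and can reach your cluster before proceeding (standard kubectl commands; the output will vary with your cluster):

$ kubectl cluster-info
$ kubectl get nodes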

1.2 Deploying using kubectl

  1. Get the Kubernetes configuration files.

    There are sample deployment and service YAML files in the https://github.com/spring-cloud/spring-cloud-dataflow-server-kubernetes repository that you can use as a starting point. They have the required metadata set for service discovery by the different apps and services deployed. To check out the code enter the following commands:

    $ git clone https://github.com/spring-cloud/spring-cloud-dataflow-server-kubernetes
    $ cd spring-cloud-dataflow-server-kubernetes
    $ git checkout master
  2. Deploy RabbitMQ.

    The RabbitMQ service will be used for messaging between the apps in the stream. You could also use Kafka, but to keep things simple we show only the RabbitMQ configuration in this guide.

    Run the following command to start the RabbitMQ service:

    $ kubectl create -f src/kubernetes/rabbitmq/

    You can use the command kubectl get all -l app=rabbitmq to verify that the deployment, pod and service resources are running. Use the command kubectl delete all -l app=rabbitmq to clean up afterwards.
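
    If you prefer to block until the rollout has finished, kubectl can do that as well (this assumes the deployment created from those files is named rabbitmq, matching its app label):

    $ kubectl rollout status deployment/rabbitmq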

  3. Deploy MySQL.

    We are using MySQL for this guide, but you could use Postgres or the H2 database instead. We include JDBC drivers for all three of these databases; you would only need to adjust the database URL and driver class name settings.

    [Important]

    You can modify the password in the src/kubernetes/mysql/mysql-deployment.yaml file if you prefer something more secure. If you do modify the password, you must also provide it, base64 encoded, in the src/kubernetes/mysql/mysql-secrets.yaml file.
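
    For example, you can generate the base64-encoded value for a new password as follows (yourpassword is a placeholder; put the output under whatever key the mysql-secrets.yaml file already defines):

    $ echo -n 'yourpassword' | base64
    eW91cnBhc3N3b3Jk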

    Run the following command to start the MySQL service:

    $ kubectl create -f src/kubernetes/mysql/

    You can use the command kubectl get all -l app=mysql to verify that the deployment, pod and service resources are running. Use the command kubectl delete all,pvc,secrets -l app=mysql to clean up afterwards.

  4. Deploy Redis.

    The Redis service will be used for the analytics functionality. Run the following command to start the Redis service:

    $ kubectl create -f src/kubernetes/redis/
    [Note]

    If you don’t need the analytics functionality, you can turn this feature off by changing SPRING_CLOUD_DATAFLOW_FEATURES_ANALYTICS_ENABLED to false in the src/kubernetes/server/server-deployment.yaml file, as sketched below. If you don’t install the Redis service, you should also remove the Redis configuration settings from the src/kubernetes/server/server-config-rabbit.yaml file mentioned below.
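
    A minimal sketch of that change in the container’s env section of the server deployment (surrounding entries omitted):

        env:
        - name: SPRING_CLOUD_DATAFLOW_FEATURES_ANALYTICS_ENABLED
          value: 'false'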

    You can use the command kubectl get all -l app=redis to verify that the deployment, pod and service resources are running. Use the command kubectl delete all -l app=redis to clean up afterwards.

  5. Deploy the Metrics Collector.

    The Metrics Collector will provide message rates for all deployed stream apps. These message rates will be visible in the Dashboard UI. Run the following commands to start the Metrics Collector:

    $ kubectl create -f src/kubernetes/metrics/metrics-deployment-rabbit.yaml
    $ kubectl create -f src/kubernetes/metrics/metrics-svc.yaml

    You can use the command kubectl get all -l app=metrics to verify that the deployment, pod and service resources are running. Use the command kubectl delete all -l app=metrics to clean up afterwards.
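
    If the Metrics Collector pod does not start cleanly, its logs are the first place to look (substitute the pod name reported by the first command):

    $ kubectl get pods -l app=metrics
    $ kubectl logs <metrics-pod-name>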

  6. Deploy Skipper

    This is an optional step. Deploy Skipper if you want the added features of upgrading and rolling back Streams, since Data Flow delegates to Skipper for those features. For a complete overview of the feature capabilities, review the Spring Cloud Skipper reference guide. See Section 1.3, “Deploy Skipper” for deployment details.

  7. Deploy the Data Flow Server.

    [Important]

    You should specify the version of the Spring Cloud Data Flow server that you want to deploy.

    The deployment is defined in the src/kubernetes/server/server-deployment.yaml file. To control which version of the Spring Cloud Data Flow server gets deployed, modify the tag used for the Docker image in the container spec:

        spec:
          containers:
          - name: scdf-server
            image: springcloud/spring-cloud-dataflow-server-kubernetes:latest    1
            imagePullPolicy: Always

    1   Change latest to the version you would like to deploy. This document is based on the 1.3.0.M3 version, so the recommended image tag to use is 1.3.0.M3.
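
    For example, pinning to this document’s version would look like the following (this assumes a 1.3.0.M3 image tag has been published to Docker Hub):

        spec:
          containers:
          - name: scdf-server
            image: springcloud/spring-cloud-dataflow-server-kubernetes:1.3.0.M3
            imagePullPolicy: Always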

    The Data Flow Server uses the Fabric8 Java client library to connect to the Kubernetes cluster. We use environment variables to set the values needed when deploying the Data Flow server to Kubernetes. We also use the Fabric8 Spring Cloud integration with Kubernetes library to access the Kubernetes ConfigMap and Secrets settings. The ConfigMap settings are specified in the src/kubernetes/server/server-config-rabbit.yaml file and the secrets are in the src/kubernetes/mysql/mysql-secrets.yaml file. If you modified the MySQL password, you need to change it in the src/kubernetes/mysql/mysql-secrets.yaml file as well. Any secrets have to be provided base64 encoded.

    [Note]

    We are now configuring the Data Flow server with file-based security, and the default user is 'user' with a password of 'password'. Feel free to change this in the src/kubernetes/server/server-config-rabbit.yaml file.

    [Note]

    The default memory for the pods is set to 1024Mi. Update the value in the src/kubernetes/server/server-deployment.yaml file if you expect most of your apps to require more memory.
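
    A minimal sketch of the relevant fragment, following the same resources layout used by the Skipper manifest later in this guide (whether server-deployment.yaml uses exactly this structure is an assumption; adjust the value as needed):

        resources:
          limits:
            memory: 1024Mi
          requests:
            memory: 1024Mi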

  8. Deploy the Spring Cloud Data Flow Server for Kubernetes using the Docker image and the configuration settings.

    $ kubectl create -f src/kubernetes/server/server-config-rabbit.yaml
    $ kubectl create -f src/kubernetes/server/server-svc.yaml
    $ kubectl create -f src/kubernetes/server/server-deployment.yaml

    You can use the command kubectl get all -l app=scdf-server to verify that the deployment, pod and service resources are running. Use the command kubectl delete all,cm -l app=scdf-server to clean up afterwards.
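
    While the server starts up, you can watch its pod with a standard watch (press Ctrl+C to stop):

    $ kubectl get pods -w -l app=scdf-server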

    Use the kubectl get svc scdf-server command to locate the EXTERNAL-IP address assigned to scdf-server; we will use that later to connect from the shell.

    $ kubectl get svc
    NAME         CLUSTER-IP       EXTERNAL-IP       PORT(S)    AGE
    scdf-server  10.103.246.82    130.211.203.246   80/TCP     4m

    So, in this case, the URL you need to use is 130.211.203.246.

    If you are using Minikube, then you don’t have an external load balancer and the EXTERNAL-IP shows as <pending>. You need to use the NodePort assigned to the scdf-server service. Use this command to look up the URL to use:

    $ minikube service --url scdf-server
    http://192.168.99.100:31991
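
    Once you have the URL, you can point the Data Flow shell at the server; a sketch, assuming you have downloaded the matching Data Flow shell jar (the jar name and address below are illustrative):

    $ java -jar spring-cloud-dataflow-shell-1.3.0.M3.jar
    server-unknown:>dataflow config server --uri http://130.211.203.246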

1.3 Deploy Skipper

This is an optional step. Deploy Skipper if you want the added features of upgrading and rolling back Streams since Data Flow delegates to Skipper for those features.

The Deployment resource for Skipper is shown below:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: skipper
  labels:
    app: skipper
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: skipper
    spec:
      containers:
      - name: skipper
        image: springcloud/spring-cloud-skipper-server:1.0.0.BUILD-SNAPSHOT
        imagePullPolicy: Always
        ports:
        - containerPort: 80
        resources:
          limits:
            cpu: 1.0
            memory: 1024Mi
          requests:
            cpu: 0.5
            memory: 640Mi
        env:
        - name: SPRING_APPLICATION_JSON
          value: "{\"spring.cloud.skipper.server.enable.local.platform\" : false, \"spring.cloud.skipper.server.platform.kubernetes.accounts.minikube.environmentVariables\" : \"SPRING_RABBITMQ_HOST=${RABBITMQ_SERVICE_HOST},SPRING_RABBITMQ_PORT=${RABBITMQ_SERVICE_PORT}\",\"spring.cloud.skipper.server.platform.kubernetes.accounts.minikube.memory\" : \"1024Mi\",\"spring.cloud.skipper.server.platform.kubernetes.accounts.minikube.createDeployment\" : true}"
[Note]

Skipper includes the concept of platforms, so it is important to define the accounts according to your project’s preferences. In the above YAML file, the account maps to minikube as the platform. This can be modified, and you can have any number of platform definitions. More details are in the Spring Cloud Skipper reference guide.

[Note]

If you’d like to change the version of the Skipper server, you can do so by updating the image from springcloud/spring-cloud-skipper-server:1.0.0.BUILD-SNAPSHOT to the desired Docker tag.

[Note]

If you’d like to orchestrate stream processing pipelines with Apache Kafka as the messaging middleware, you must change the environmentVariables value in SPRING_APPLICATION_JSON to the following:

"{\"spring.cloud.skipper.server.platform.kubernetes.accounts.minikube.environmentVariables\" :
\"SPRING_CLOUD_STREAM_KAFKA_BINDER_BROKERS=${KAFKA_SERVICE_HOST}:${KAFKA_SERVICE_PORT},
SPRING_CLOUD_STREAM_KAFKA_BINDER_ZK_NODES=${KAFKA_ZK_SERVICE_HOST}:${KAFKA_ZK_SERVICE_PORT}\"}"

The resource for the Skipper service is shown below:

apiVersion: v1
kind: Service
metadata:
  name: skipper
  labels:
    app: skipper
spec:
  # If you are running k8s on a local dev box or using minikube, you can use type NodePort instead
  type: LoadBalancer
  ports:
  - port: 80
    targetPort: 7577 # the port the Skipper server listens on (7577)
  selector:
    app: skipper

Run the following commands to start Skipper as the companion server for Spring Cloud Data Flow:

$ kubectl create -f src/kubernetes/skipper/skipper-deployment.yaml
$ kubectl create -f src/kubernetes/skipper/skipper-svc.yaml

You can use the command kubectl get all -l app=skipper to verify that the deployment, pod and service resources are running. Use the command kubectl delete all -l app=skipper to clean up afterwards.
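
As with the other services, you can block until the rollout finishes (the deployment name skipper comes from the manifest above):

$ kubectl rollout status deployment/skipper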

Use the kubectl get svc skipper command to locate the EXTERNAL-IP address assigned to skipper; we will use that later to connect from the shell.

$ kubectl get svc
NAME         CLUSTER-IP       EXTERNAL-IP       PORT(S)    AGE
skipper      10.103.246.83    130.211.203.247   80/TCP     4m

So, in this case, the URL you need to use is 130.211.203.247.

If you are using Minikube, then you don’t have an external load balancer and the EXTERNAL-IP shows as <pending>. You need to use the NodePort assigned to the skipper service. Use this command to look up the URL to use:

$ minikube service --url skipper
http://192.168.99.100:32060