5. Deploying Streams on Mesos and Marathon

In this getting started guide, the Data Flow Server is run as a standalone application outside the Mesos cluster. A future version will provide support for the Data Flow Server itself to run on Mesos.

  1. Deploy a Mesos and Marathon cluster.

    The Mesosphere getting started guide provides a number of options for you to deploy a cluster. Many of the options listed there need some additional work to get going. For example, many Vagrant provisioned VMs are using deprecated versions of the Docker client. We have included some brief instructions for setting up a single-node cluster with Vagrant in Appendix A, Test Cluster. In addition to this we have also used the Playa Mesos Vagrant setup. For those that want to setup a distributed cluster quickly, there is also an option to spin up a cluster on AWS using Mesosphere’s Datacenter Operation System on Amazon Web Services.

    The rest of this getting started guide assumes that you have a working Mesos and Marathon cluster and know the Marathon endpoint URL.

  2. Create a Rabbit MQ service on the Mesos cluster.

    The rabbitmq service will be used for messaging between applications in the stream. There is a sample application JSON file for Rabbit MQ in the spring-cloud-dataflow-server-mesos repository that you can use as a starting point. The service discovery mechanism is currently disabled so you need to look up the host and port to use for the connection. Depending on how large your cluster is, you way want to tweek the CPU and/or memory values.

    Using the above JSON file and an Mesos and Marathon cluster installed you can deploy a Rabbit MQ application instance by issuing the following command

    curl -X POST http://192.168.33.10:8080/v2/apps -d @rabbitmq.json -H "Content-type: application/json"

    Note the @ symbol to reference a file and that we are using the Marathon endpoint URL of 192.168.33.10:8080. Your endpoint might be different based on the configuration used for your installation of Mesos and Marathon. Using the Marathon and Mesos UIs you can verify that rabbitmq service is running on the cluster.

  3. Download the Spring Cloud Data Flow Server for Mesos and Marathon.

    $ wget http://repo.spring.io/milestone/org/springframework/cloud/spring-cloud-dataflow-server-mesos/1.0.0.RC1/spring-cloud-dataflow-server-mesos-1.0.0.RC1.jar
  4. Using the Marathon GUI, look up the host and port for the rabbitmq application. In our case it was 192.168.33.10:31916. For the deployed apps to be able to connect to Rabbit MQ we need to provide the following property when we start the server:

    --spring.cloud.deployer.mesos.marathon.environmentVariables='SPRING_RABBITMQ_HOST=192.168.33.10,SPRING_RABBITMQ_PORT=31916'
  5. Now, run the Spring Cloud Data Flow Server for Mesos and Marathon passing in this host/port configuration.

    $ java -jar spring-cloud-dataflow-server-mesos-1.0.0.RC1.jar --spring.cloud.deployer.mesos.marathon.apiEndpoint=http://192.168.33.10:8080 --spring.cloud.deployer.mesos.marathon.memory=768 --spring.cloud.deployer.mesos.marathon.environmentVariables='SPRING_RABBITMQ_HOST=192.168.33.10,SPRING_RABBITMQ_PORT=31916'

    You can pass in properties to set default values for memory and cpu resource request. For example --spring.cloud.deployer.mesos.marathon.memory=768 will by default allocate additional memory for the application vs. the default value of 512. You can see all the available options in the MarathonAppDeployerProperties.java file.

  6. Download and run the Spring Cloud Data Flow shell.

    $ wget http://repo.spring.io/milestone/org/springframework/cloud/spring-cloud-dataflow-shell/1.0.0.RC1/spring-cloud-dataflow-shell-1.0.0.RC1.jar
    
    $ java -jar spring-cloud-dataflow-shell-1.0.0.RC1.jar
  7. By default, the application registry will be empty. If you would like to register all out-of-the-box stream applications built with the RabbitMQ binder in bulk, you can with the following command. For more details, review how to register applications.

    dataflow:>app import --uri http://bit.ly/stream-applications-rabbit-docker
  8. Deploy a simple stream in the shell

    [Note]Note

    If you need to specify any of the app specific configuration properties then you must use "long-form" of them including the app specific prefix like --jdbc.tableName=TEST_DATA. This is due to the server not being able to access the metadata for the Docker based starter apps. You will also not see the configuration properties listed when using the app info command or in the Dashboard GUI.

    dataflow:>stream create --name ticktock --definition "time | log" --deploy

    In the Mesos UI you can then look at the logs for the log sink.

    2016-04-26 18:13:03.001  INFO 1 --- [           main] s.b.c.e.t.TomcatEmbeddedServletContainer : Tomcat started on port(s): 8080 (http)
    2016-04-26 18:13:03.004  INFO 1 --- [           main] o.s.c.s.a.l.s.r.LogSinkRabbitApplication : Started LogSinkRabbitApplication in 7.766 seconds (JVM running for 8.24)
    2016-04-26 18:13:54.443  INFO 1 --- [nio-8080-exec-1] o.a.c.c.C.[Tomcat].[localhost].[/]       : Initializing Spring FrameworkServlet 'dispatcherServlet'
    2016-04-26 18:13:54.445  INFO 1 --- [nio-8080-exec-1] o.s.web.servlet.DispatcherServlet        : FrameworkServlet 'dispatcherServlet': initialization started
    2016-04-26 18:13:54.459  INFO 1 --- [nio-8080-exec-1] o.s.web.servlet.DispatcherServlet        : FrameworkServlet 'dispatcherServlet': initialization completed in 14 ms
    2016-04-26 18:14:09.088  INFO 1 --- [time.ticktock-1] log.sink                                 : 04/26/16 18:14:09
    2016-04-26 18:14:10.077  INFO 1 --- [time.ticktock-1] log.sink                                 : 04/26/16 18:14:10
    2016-04-26 18:14:11.080  INFO 1 --- [time.ticktock-1] log.sink                                 : 04/26/16 18:14:11
    2016-04-26 18:14:12.083  INFO 1 --- [time.ticktock-1] log.sink                                 : 04/26/16 18:14:12
    2016-04-26 18:14:13.090  INFO 1 --- [time.ticktock-1] log.sink                                 : 04/26/16 18:14:13
    2016-04-26 18:14:14.091  INFO 1 --- [time.ticktock-1] log.sink                                 : 04/26/16 18:14:14
    2016-04-26 18:14:15.093  INFO 1 --- [time.ticktock-1] log.sink                                 : 04/26/16 18:14:15
    2016-04-26 18:14:16.095  INFO 1 --- [time.ticktock-1] log.sink                                 : 04/26/16 18:14:16
  9. Destroy the stream

    dataflow:>stream destroy --name ticktock