16. Deploying on AMBARI

Ambari automates the YARN installation instead of requiring the user to do it manually. Many other configuration steps are also automated as much as possible to ease the overall installation process.

The components deployed through Ambari are the same as those used manually with a separate YARN cluster. With Ambari we simply package the needed Data Flow components into an RPM package so that they can be managed as an Ambari service. After that, Ambari only manages the runtime configuration of those components.

16.1 Install Ambari Server

Generally, all that is needed is to install the scdf-plugin-hdp plugin into the Ambari server, which adds the needed service definitions.

[root@ambari-1 ~]# yum -y install ambari-server
[root@ambari-1 ~]# ambari-server setup -s
[root@ambari-1 ~]# wget -nv http://repo.spring.io/yum-snapshot-local/scdf/1.0/scdf-snapshot-1.0.repo -O /etc/yum.repos.d/scdf-snapshot-1.0.repo
[root@ambari-1 ~]# yum -y install scdf-plugin-hdp
[root@ambari-1 ~]# ambari-server start
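Once the server is running, an optional sanity check is to query the Ambari REST API before continuing. This sketch assumes the Ambari defaults of port 8080 and admin/admin credentials; adjust both for your setup.

```shell
# Optional sanity check: query the Ambari REST API to confirm the server is up.
# Port 8080 and admin/admin are the Ambari defaults; change them if customized.
curl -s -u admin:admin http://localhost:8080/api/v1/stacks \
  || echo "ambari-server not reachable"
```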
[Note]Note

For now, the Ambari plugin only works on redhat6/redhat7 and related CentOS-based systems.

16.2 Deploy Data Flow

When you create your cluster and choose a stack, make sure that the redhat6 and/or redhat7 sections contain a repository named SCDF-1.0 and that it points to repo.spring.io/yum-snapshot-local/scdf/1.0.
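To verify that the repository is actually available on a given host, a quick check with yum works; the repo id SCDF-1.0 comes from the stack's repository configuration described above.

```shell
# List enabled repositories and look for the SCDF repo configured in the stack.
yum repolist enabled 2>/dev/null | grep -i scdf \
  || echo "SCDF repo not enabled on this host"
```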

From the services list, choose Spring Cloud Data Flow and Kafka. HDFS, YARN and ZooKeeper are forced dependencies.

[Note]Note

With Kafka you can do a "one-click" installation, while with Rabbit you need to provide the appropriate connection settings, as Rabbit is not an Ambari-managed service.

Then, in Customize Services, all that is left for the user is to customize settings if needed; everything else is configured automatically. Technically this also allows you to switch to Rabbit by leaving Kafka out and defining the Rabbit settings there, but Kafka is generally a good choice.
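For the Rabbit case, the connection settings could be entered as properties under Custom scdf-site. The property names below are the standard Spring Boot RabbitMQ settings, and the host and credentials are hypothetical placeholders; this is a sketch, assuming the generated configuration passes these properties through to the deployed applications.

```
spring.rabbitmq.host=rabbit-1.example.org
spring.rabbitmq.port=5672
spring.rabbitmq.username=dataflow
spring.rabbitmq.password=secret
```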

[Note]Note

We also install the H2 database as a service so that it can be accessed from every node.

16.3 Using Configuration

The servers.yml file is also used to store common configuration with Ambari. Settings in Advanced scdf-site and Custom scdf-site are used to dynamically create this file, which is then copied over to HDFS when the needed application files are deployed.
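As an illustration, the generated servers.yml might contain entries such as the following; the exact keys and hostnames here are hypothetical and depend on your cluster settings.

```yaml
spring:
  hadoop:
    fsUri: hdfs://ambari-1.example.org:8020
    resourceManagerHost: ambari-2.example.org
```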

[Important]Important

If the Ambari configuration is modified, you need to delete the /dataflow/apps/stream/app and /dataflow/apps/task/app directories from HDFS for the new settings to be applied.
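The deletion described above can be done with the HDFS shell; run it as a user with write access to /dataflow. The -f flag keeps the command from failing if a directory does not exist yet.

```shell
# Remove the cached app directories so the new Ambari settings are re-applied
# the next time the application files are deployed.
for dir in /dataflow/apps/stream/app /dataflow/apps/task/app; do
  echo "removing $dir from hdfs"
  hdfs dfs -rm -r -f "$dir" 2>/dev/null || true
done
```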