18. How YARN Deployment Works

When a YARN application is deployed into a YARN cluster, it consists of two parts: the Application Master and the containers. The Application Master is a control program responsible for handling the application's lifecycle and for allocating containers. The containers are where the real heavy lifting is done. For a stream there is always a minimum of three containers: one for the Application Master, one for the sink and one for the source. When running a task there is always one Application Master and one container running that particular task.
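Because both the Application Master and its containers run as a normal YARN application, they can be inspected with the standard `yarn` CLI. A minimal sketch, assuming the CLI is on the path of a cluster node (application ids and names depend on your deployment):

```shell
# List running YARN applications; a deployed stream or task
# shows up as one application here
yarn application -list

# List the attempts for a particular application
# (substitute an id taken from the listing above)
yarn applicationattempt -list <applicationId>
```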

The needed application files are pushed into HDFS automatically when required. After a stream and a task have each been used once, the HDFS directory structure looks like the listing shown below.

/dataflow/apps
/dataflow/apps/stream
/dataflow/apps/stream/app
/dataflow/apps/stream/app/application.properties
/dataflow/apps/stream/app/servers.yml
/dataflow/apps/stream/app/spring-cloud-deployer-yarn-appdeployerappmaster-1.0.0.BUILD-SNAPSHOT.jar
/dataflow/apps/task
/dataflow/apps/task/app
/dataflow/apps/task/app/application.properties
/dataflow/apps/task/app/servers.yml
/dataflow/apps/task/app/spring-cloud-deployer-yarn-tasklauncherappmaster-1.0.0.BUILD-SNAPSHOT.jar
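The layout above can be verified with the standard `hdfs` file system shell. A sketch, assuming the `hdfs` command is available on a cluster node:

```shell
# Recursively list everything the server has pushed for streams and tasks
hdfs dfs -ls -R /dataflow/apps
```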
[Note]

/dataflow/apps can be deleted if the application version is changed or the configuration in servers.yml is modified. Once created, these files are not overwritten.
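Since the files are never overwritten in place, a re-push has to be forced by removing the directory. A minimal sketch using the standard `hdfs` CLI:

```shell
# Remove the pushed application files; the server pushes fresh copies
# (including an updated servers.yml) the next time a stream or task is deployed
hdfs dfs -rm -r /dataflow/apps
```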

Application artifacts are cached under /dataflow/artifacts/cache directory.

/dataflow/artifacts
/dataflow/artifacts/cache
/dataflow/artifacts/cache/hdfs-sink-rabbit-1.0.0.RC1.jar
/dataflow/artifacts/cache/time-source-rabbit-1.0.0.RC1.jar
/dataflow/artifacts/cache/timestamp-task-1.0.0.RC1.jar
[Important]

Artifact caching happens on two levels: first on the local disk where the server is running, and second in the HDFS cache directory. When working with snapshots or your own development versions, it may be necessary to wipe out the /dataflow/artifacts/cache directory and restart the server.
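A sketch of clearing the HDFS-level cache with the standard `hdfs` CLI (the local disk cache is rebuilt by the server restart, so only the HDFS side needs to be removed by hand):

```shell
# Wipe the HDFS artifact cache so snapshot artifacts are re-resolved
hdfs dfs -rm -r /dataflow/artifacts/cache

# Then restart the Data Flow server so both cache levels are repopulated
```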