When a YARN application is deployed into a YARN cluster, it consists of two parts: an Application Master and Containers. The Application Master is a control program responsible for handling the application's lifecycle and the allocation of containers. Containers are where the real heavy lifting is done. A stream always uses a minimum of three containers: one for the application master, one for the sink, and one for the source. When running tasks, there is always one application master and one container running a particular task.
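The container counts described above can be sketched as a small calculation. This is only an illustration: the function name `expected_containers` and its parameters are hypothetical and not part of any Spring Cloud Data Flow or YARN API, and the sketch assumes each stream app runs as a single instance in its own container.

```python
def expected_containers(kind: str, app_count: int = 1) -> int:
    """Estimate the minimum number of YARN containers for a deployment.

    Hypothetical helper for illustration only. Assumes every stream app
    (for example a source or a sink) runs as one instance in one container.
    """
    app_master = 1  # the application master always takes one container

    if kind == "stream":
        # A stream needs at least a source and a sink.
        if app_count < 2:
            raise ValueError("a stream needs at least a source and a sink")
        return app_master + app_count

    if kind == "task":
        # One application master plus one container running the task.
        return app_master + 1

    raise ValueError(f"unknown deployment kind: {kind!r}")


if __name__ == "__main__":
    print(expected_containers("stream", app_count=2))
    print(expected_containers("task"))
```

Under these assumptions, a source-to-sink stream yields three containers and a task yields two, which matches the minimums stated in the text.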
Required application files are pushed into HDFS automatically when needed. After a stream and a task have each been used once, the HDFS directory structure looks like the following:

    /dataflow/apps
    /dataflow/apps/stream
    /dataflow/apps/stream/app
    /dataflow/apps/stream/app/application.properties
    /dataflow/apps/stream/app/servers.yml
    /dataflow/apps/stream/app/spring-cloud-deployer-yarn-appdeployerappmaster-1.0.0.BUILD-SNAPSHOT.jar
    /dataflow/apps/task
    /dataflow/apps/task/app
    /dataflow/apps/task/app/application.properties
    /dataflow/apps/task/app/servers.yml
    /dataflow/apps/task/app/spring-cloud-deployer-yarn-tasklauncherappmaster-1.0.0.BUILD-SNAPSHOT.jar

Note: Application artifacts are cached under the /dataflow/artifacts/cache directory.

    /dataflow/artifacts
    /dataflow/artifacts/cache
    /dataflow/artifacts/cache/hdfs-sink-rabbit-1.0.0.RC1.jar
    /dataflow/artifacts/cache/time-source-rabbit-1.0.0.RC1.jar
    /dataflow/artifacts/cache/timestamp-task-1.0.0.RC1.jar