This section describes how settings related to running YARN applications can be modified.
Stream and task processes for the application master and containers can be further tuned with memory and CPU settings, and Java options allow defining the actual JVM options.
```yaml
spring:
  cloud:
    deployer:
      yarn:
        app:
          streamappmaster:
            memory: 512m
            virtualCores: 1
            javaOpts: "-Xms512m -Xmx512m"
          streamcontainer:
            priority: 5
            memory: 256m
            virtualCores: 1
            javaOpts: "-Xms64m -Xmx256m"
          taskappmaster:
            memory: 512m
            virtualCores: 1
            javaOpts: "-Xms512m -Xmx512m"
          taskcontainer:
            priority: 10
            memory: 256m
            virtualCores: 1
            javaOpts: "-Xms64m -Xmx256m"
```
The base directory where all needed files are kept defaults to `/dataflow` and can be changed using the `baseDir` property.

```yaml
spring:
  cloud:
    deployer:
      yarn:
        app:
          baseDir: /dataflow
```
Spring Cloud Data Flow app registration is based on URIs with various different endpoints. As mentioned in Chapter 18, How YARN Deployment Works, all applications are first stored into HDFS before the application container is launched. The server can use `http`, `file` and `maven` based URIs, as well as direct `hdfs` URIs. It is also possible to place these applications directly into HDFS and register an application based on that URI.
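As a sketch of what such a registration could look like from the Data Flow shell, an application jar already placed into HDFS might be registered along these lines (the app name and HDFS path here are purely illustrative, not from this document):

```
dataflow:>app register --name http --type source --uri hdfs:///dataflow/apps/stream/http-source-kafka-1.0.0.jar
```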
Logging for all components is done centrally via the `servers.yml` file using normal Spring Boot properties.

```yaml
logging:
  level:
    org.apache.hadoop: INFO
    org.springframework.yarn: INFO
```
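Because these are standard Spring Boot logging properties, the same mechanism can be used to raise individual packages to `DEBUG` while troubleshooting a deployment. The package choices below are just an illustration:

```yaml
logging:
  level:
    org.apache.hadoop.yarn: DEBUG
    org.springframework.yarn: DEBUG
```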
The YARN NodeManager continuously tracks how much memory is used by individual YARN containers. If a container uses more memory than its configuration allows, it is simply killed by the NodeManager. The application master controlling the app lifecycle is given a little more freedom, meaning that the NodeManager is not as aggressive when deciding whether a container should be killed.
Important: These are global cluster settings and cannot be changed during an application deployment.
Let's take a quick look at memory-related settings in a YARN cluster and in YARN applications. The config below reflects what default vanilla Apache Hadoop uses for memory-related settings; other distributions may have different defaults.
The ratio of virtual to physical memory defaults to 2.1; this headroom exists because bugs in an OS can cause wrong calculation of used virtual memory. A separate setting (`yarn.scheduler.minimum-allocation-mb` in vanilla Hadoop) defines the minimum allocated memory for a container.
Note: This setting also indirectly defines the actual physical memory limit requested during a container allocation. The actual physical memory limit is always going to be a multiple of this setting, rounded to the upper bound. For example, if this setting is left to its vanilla Hadoop default of 1024 MB, a container requesting 512m of memory is actually allocated a full 1024 MB.
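The rounding and virtual memory rules above can be sketched as follows. The two constants are assumptions taken from vanilla Apache Hadoop defaults (`yarn.scheduler.minimum-allocation-mb` of 1024 MB and a vmem-pmem ratio of 2.1); check your own cluster configuration, as distributions may differ:

```python
import math

# Sketch of how YARN derives container memory limits, assuming vanilla
# Apache Hadoop defaults; real clusters may be configured differently.
MIN_ALLOCATION_MB = 1024   # yarn.scheduler.minimum-allocation-mb (assumed default)
VMEM_PMEM_RATIO = 2.1      # yarn.nodemanager.vmem-pmem-ratio (assumed default)

def actual_allocation_mb(requested_mb: int) -> int:
    """A memory request is rounded up to a multiple of the minimum allocation."""
    return math.ceil(requested_mb / MIN_ALLOCATION_MB) * MIN_ALLOCATION_MB

def vmem_limit_mb(pmem_limit_mb: int) -> float:
    """The virtual memory limit is the physical limit scaled by the ratio."""
    return pmem_limit_mb * VMEM_PMEM_RATIO

print(actual_allocation_mb(512))   # -> 1024 (a 512m container really reserves 1024 MB)
print(actual_allocation_mb(1500))  # -> 2048
print(vmem_limit_mb(1024))         # -> 2150.4
```

This also makes it clear why a container's `javaOpts` heap limits should stay comfortably below the requested container memory: the NodeManager kills containers against the allocated limit, not the requested value.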