17. Configuring Runtime Settings and Environment

This section describes how settings related to running YARN applications can be modified.

17.1 Configuring Application Resources

Stream and task processes for the application master and containers can be tuned further with memory and CPU settings, and Java options allow you to define the actual JVM options.

spring:
  cloud:
    deployer:
      yarn:
        app:
          streamappmaster:
            memory: 512m
            virtualCores: 1
            javaOpts: "-Xms512m -Xmx512m"
          streamcontainer:
            priority: 5
            memory: 256m
            virtualCores: 1
            javaOpts: "-Xms64m -Xmx256m"
          taskappmaster:
            memory: 512m
            virtualCores: 1
            javaOpts: "-Xms512m -Xmx512m"
          taskcontainer:
            priority: 10
            memory: 256m
            virtualCores: 1
            javaOpts: "-Xms64m -Xmx256m"

17.2 Configure Base Directory

The base directory where all needed files are kept defaults to /dataflow and can be changed using the baseDir property.

spring:
  cloud:
    deployer:
      yarn:
        app:
          baseDir: /dataflow

17.3 Pre-populate Applications

Spring Cloud Data Flow app registration is based on URIs with various different endpoints. As mentioned in Chapter 18, How YARN Deployment Works, all applications are first stored in HDFS before an application container is launched. The server can use http, file, and maven based URIs, as well as direct hdfs URIs.

It is possible to place these applications directly into HDFS and register an application based on that URI.
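For example, assuming the default /dataflow base directory, an artifact could be copied into HDFS with standard Hadoop tooling and then registered from the Data Flow shell. The directory layout and artifact name below are purely hypothetical:

hdfs dfs -mkdir -p /dataflow/apps/stream
hdfs dfs -copyFromLocal http-source-kafka.jar /dataflow/apps/stream/http-source-kafka.jar

dataflow:>app register --name http --type source --uri hdfs:/dataflow/apps/stream/http-source-kafka.jar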

17.4 Configure Logging

Logging for all components is done centrally via the servers.yml file using normal Spring Boot properties.

logging:
  level:
    org.apache.hadoop: INFO
    org.springframework.yarn: INFO

17.5 Global YARN Memory Settings

The YARN NodeManager continuously tracks how much memory is used by individual YARN containers. If a container uses more memory than the configuration allows, it is simply killed by the NodeManager. The application master controlling the app lifecycle is given a little more freedom, meaning that the NodeManager is not as aggressive when deciding whether a container should be killed.

Important

These are global cluster settings and cannot be changed during an application deployment.

Let's take a quick look at the memory related settings in a YARN cluster and in YARN applications. The settings described below are what a default vanilla Apache Hadoop uses; other distributions may have different defaults.
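As a point of reference, the corresponding yarn-site.xml entries would look roughly like the sketch below. The values shown are assumed from stock Apache Hadoop 2.x defaults, not recommendations, so check your distribution before relying on them:

<configuration>
  <!-- enforce the physical memory limit of each container -->
  <property>
    <name>yarn.nodemanager.pmem-check-enabled</name>
    <value>true</value>
  </property>
  <!-- enforce the virtual memory limit of each container -->
  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>true</value>
  </property>
  <!-- virtual memory allowed per unit of physical memory -->
  <property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>2.1</value>
  </property>
  <!-- allocations are rounded up to a multiple of this value -->
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>8192</value>
  </property>
  <!-- total memory this node offers to YARN containers -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8192</value>
  </property>
</configuration>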

yarn.nodemanager.pmem-check-enabled
Enables a check for the physical memory of a process. If enabled, this check directly tracks the amount of memory requested for a YARN container.
yarn.nodemanager.vmem-check-enabled
Enables a check for the virtual memory of a process. This setting is usually the one causing containers of custom YARN applications to get killed by a NodeManager, either because the actual ratio between physical and virtual memory is higher than the default 2.1 or because bugs in an OS cause a wrong calculation of used virtual memory.
yarn.nodemanager.vmem-pmem-ratio
Defines the ratio of allowed virtual memory compared to physical memory. This ratio simply defines how much virtual memory a process can use, but the actual tracked size is always calculated from the physical memory limit.
yarn.scheduler.minimum-allocation-mb
Defines the minimum memory allocation for a container.

Note

This setting also indirectly defines the actual physical memory limit requested during a container allocation. The actual physical memory limit is always a multiple of this setting, rounded to the upper bound. For example, if this setting is left at the default 1024 and a container is requested with 512M, 1024M is going to be used. However, if the requested size is 1100M, the actual size is set to 2048M.
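
To make the rounding concrete, the hypothetical deployer configuration below, deployed against a scheduler left at the default minimum-allocation-mb of 1024, would result in a 2048M container allocation:

spring:
  cloud:
    deployer:
      yarn:
        app:
          streamcontainer:
            # 1100m is not a multiple of yarn.scheduler.minimum-allocation-mb (1024),
            # so YARN rounds the actual allocation up to 2048M
            memory: 1100m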

yarn.scheduler.maximum-allocation-mb
Defines the maximum memory allocation for a container.
yarn.nodemanager.resource.memory-mb
Defines how much memory a node controlled by a NodeManager is allowed to allocate. This should be set to the amount of memory the OS is able to give to YARN managed processes without causing the OS to swap.