Factory to create client IPC classes.
yarn.ipc.client.factory.class
Factory to create server IPC classes.
yarn.ipc.server.factory.class
Factory to create serializable records.
yarn.ipc.record.factory.class
RPC class implementation
yarn.ipc.rpc.class
org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC
The hostname of the RM.
yarn.resourcemanager.hostname
0.0.0.0
The address of the applications manager interface in the RM.
yarn.resourcemanager.address
${yarn.resourcemanager.hostname}:8032
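Several defaults in this listing, such as ${yarn.resourcemanager.hostname}:8032, refer to other properties. The following is a minimal sketch of how such a reference resolves, assuming the properties are held in a plain dictionary. The helper name expand, the depth limit, and the example host name are illustrative assumptions; the actual substitution is done by Hadoop's own configuration loader.

```python
import re

# Matches ${property.name} references inside a value.
_VAR = re.compile(r"\$\{([^}]+)\}")

def expand(value, props, max_depth=20):
    """Resolve ${name} references against props (hypothetical helper).

    Unknown names are left untouched; max_depth guards against cycles.
    """
    for _ in range(max_depth):
        expanded = _VAR.sub(lambda m: props.get(m.group(1), m.group(0)), value)
        if expanded == value:
            return expanded
        value = expanded
    return value

props = {
    # Example host name, not a shipped default.
    "yarn.resourcemanager.hostname": "rm1.example.com",
    "yarn.resourcemanager.address": "${yarn.resourcemanager.hostname}:8032",
}
print(expand(props["yarn.resourcemanager.address"], props))
```

Setting only yarn.resourcemanager.hostname is therefore enough to move every address that is derived from it.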
The actual address the server will bind to. If this optional address is
set, the RPC and webapp servers will bind to this address and the port specified in
yarn.resourcemanager.address and yarn.resourcemanager.webapp.address, respectively. This
is most useful for making RM listen to all interfaces by setting to 0.0.0.0.
yarn.resourcemanager.bind-host
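As a rough illustration of the bind-host description above, the sketch below combines an optional bind host with the port taken from a configured address. The function name effective_bind_address and the example host are assumptions made for this example only.

```python
def effective_bind_address(configured_address, bind_host=None):
    """Return the host:port a server would bind to (illustrative only).

    If bind_host is set, it replaces the host part while the port still
    comes from the configured address.
    """
    _host, _sep, port = configured_address.rpartition(":")
    if bind_host:
        return f"{bind_host}:{port}"
    return configured_address

# Listen on all interfaces while clients keep using the advertised address.
print(effective_bind_address("rm1.example.com:8032", "0.0.0.0"))
```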
If set to true, then ALL container updates will be automatically sent to
the NM in the next heartbeat
yarn.resourcemanager.auto-update.containers
false
The number of threads used to handle applications manager requests.
yarn.resourcemanager.client.thread-count
50
Number of threads used to launch/cleanup AM.
yarn.resourcemanager.amlauncher.thread-count
50
Retry times to connect with NM.
yarn.resourcemanager.nodemanager-connect-retries
10
Timeout in milliseconds for the YARN dispatcher to drain its
events. Typically, this happens when a service is stopping, e.g. the RM drains
the ATS events dispatcher when stopping.
yarn.dispatcher.drain-events.timeout
300000
The threshold used to trigger the logging of event types
and counts in RM's main event dispatcher. The default is 5000,
which means RM will print event info each time the queue size cumulatively
reaches 5000. This info reveals which kinds of events
the RM spends most of its time processing, which can help to
narrow down certain performance issues.
yarn.dispatcher.print-events-info.threshold
5000
Resource manager dispatcher thread cpu monitor sampling rate.
Units are samples per minute. This controls how often to sample
the cpu utilization of the resource manager dispatcher thread.
The cpu utilization is displayed on the RM UI as scheduler busy %.
Set this to zero to disable the dispatcher thread monitor. Defaults
to 60 samples per minute.
yarn.dispatcher.cpu-monitor.samples-per-min
60
The expiry interval for application master reporting.
yarn.am.liveness-monitor.expiry-interval-ms
600000
The Kerberos principal for the resource manager.
yarn.resourcemanager.principal
The address of the scheduler interface.
yarn.resourcemanager.scheduler.address
${yarn.resourcemanager.hostname}:8030
Number of threads to handle scheduler interface.
yarn.resourcemanager.scheduler.client.thread-count
50
Specify which handler will be used to process PlacementConstraints.
Acceptable values are: `placement-processor`, `scheduler` and `disabled`.
For a detailed explanation of these values, please refer to documentation.
yarn.resourcemanager.placement-constraints.handler
disabled
Number of times to retry placing of rejected SchedulingRequests
yarn.resourcemanager.placement-constraints.retry-attempts
3
Constraint Placement Algorithm to be used.
yarn.resourcemanager.placement-constraints.algorithm.class
org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.algorithm.DefaultPlacementAlgorithm
Placement Algorithm Requests Iterator to be used.
yarn.resourcemanager.placement-constraints.algorithm.iterator
SERIAL
Threadpool size for the Algorithm used for placement constraint processing.
yarn.resourcemanager.placement-constraints.algorithm.pool-size
1
Threadpool size for the Scheduler invocation phase of placement constraint processing.
yarn.resourcemanager.placement-constraints.scheduler.pool-size
1
Comma separated class names of ApplicationMasterServiceProcessor
implementations. The processors will be applied in the order
they are specified.
yarn.resourcemanager.application-master-service.processors
This configures the HTTP endpoint for YARN Daemons. The following
values are supported:
- HTTP_ONLY : Service is provided only on http
- HTTPS_ONLY : Service is provided only on https
yarn.http.policy
HTTP_ONLY
The http address of the RM web application.
If only a host is provided as the value,
the webapp will be served on a random port.
yarn.resourcemanager.webapp.address
${yarn.resourcemanager.hostname}:8088
The https address of the RM web application.
If only a host is provided as the value,
the webapp will be served on a random port.
yarn.resourcemanager.webapp.https.address
${yarn.resourcemanager.hostname}:8090
The Kerberos keytab file to be used for spnego filter for the RM web
interface.
yarn.resourcemanager.webapp.spnego-keytab-file
The Kerberos principal to be used for spnego filter for the RM web
interface.
yarn.resourcemanager.webapp.spnego-principal
Add button to kill application in the RM Application view.
yarn.resourcemanager.webapp.ui-actions.enabled
true
Enable the RM web ui2 application.
yarn.webapp.ui2.enable
false
Enable tools section in all ui1 webapp.
yarn.webapp.ui1.tools.enable
true
Explicitly provide WAR file path for ui2 if needed.
yarn.webapp.ui2.war-file-path
Enable services rest api on ResourceManager.
yarn.webapp.api-service.enable
false
yarn.resourcemanager.resource-tracker.address
${yarn.resourcemanager.hostname}:8031
yarn.resourcemanager.resource-tracker.nm.ip-hostname-check
false
Are ACLs enabled?
yarn.acl.enable
false
Are reservation ACLs enabled?
yarn.acl.reservation-enable
false
ACL of who can be admin of the YARN cluster.
yarn.admin.acl
*
The address of the RM admin interface.
yarn.resourcemanager.admin.address
${yarn.resourcemanager.hostname}:8033
Number of threads used to handle RM admin interface.
yarn.resourcemanager.admin.client.thread-count
1
Maximum time to wait to establish connection to
ResourceManager.
yarn.resourcemanager.connect.max-wait.ms
900000
How often to try connecting to the
ResourceManager.
yarn.resourcemanager.connect.retry-interval.ms
30000
The default maximum number of application attempts, if unset by
the user. Each application master can specify its individual maximum number of application
attempts via the API, but the individual number cannot be more than the global upper bound in
yarn.resourcemanager.am.global.max-attempts. The default number is set to 2, to
allow at least one retry for AM.
yarn.resourcemanager.am.max-attempts
2
How often to check that containers are still alive.
yarn.resourcemanager.container.liveness-monitor.interval-ms
600000
The keytab for the resource manager.
yarn.resourcemanager.keytab
/etc/krb5.keytab
Flag to enable override of the default Kerberos authentication
filter with the RM authentication filter to allow authentication using
delegation tokens (falling back to Kerberos if the tokens are missing). Only
applicable when the http authentication type is kerberos.
yarn.resourcemanager.webapp.delegation-token-auth-filter.enabled
true
Flag to enable cross-origin (CORS) support in the RM. This flag
requires the CORS filter initializer to be added to the filter initializers
list in core-site.xml.
yarn.resourcemanager.webapp.cross-origin.enabled
false
How long to wait until a node manager is considered dead.
yarn.nm.liveness-monitor.expiry-interval-ms
600000
Path to file with nodes to include.
yarn.resourcemanager.nodes.include-path
Path to file with nodes to exclude.
yarn.resourcemanager.nodes.exclude-path
The expiry interval for node IP caching. -1 disables the caching
yarn.resourcemanager.node-ip-cache.expiry-interval-secs
-1
Number of threads to handle resource tracker calls.
yarn.resourcemanager.resource-tracker.client.thread-count
50
The class to use as the resource scheduler.
yarn.resourcemanager.scheduler.class
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
The minimum allocation for every container request at the RM
in MBs. Memory requests lower than this will be set to the value of this
property. Additionally, a node manager that is configured to have less memory
than this value will be shut down by the resource manager.
yarn.scheduler.minimum-allocation-mb
1024
The maximum allocation for every container request at the RM
in MBs. Memory requests higher than this will throw an
InvalidResourceRequestException.
yarn.scheduler.maximum-allocation-mb
8192
The minimum allocation for every container request at the RM
in terms of virtual CPU cores. Requests lower than this will be set to the
value of this property. Additionally, a node manager that is configured to
have fewer virtual cores than this value will be shut down by the resource
manager.
yarn.scheduler.minimum-allocation-vcores
1
The maximum allocation for every container request at the RM
in terms of virtual CPU cores. Requests higher than this will throw an
InvalidResourceRequestException.
yarn.scheduler.maximum-allocation-vcores
4
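The minimum and maximum allocation rules above can be sketched as a single check. This is a simplified illustration: the exception class is a stand-in for YARN's InvalidResourceRequestException, the function name normalize is hypothetical, and any additional rounding that a particular scheduler may apply is ignored.

```python
class InvalidResourceRequest(Exception):
    """Stand-in for YARN's InvalidResourceRequestException."""

def normalize(requested, minimum, maximum):
    """Apply the min/max allocation rules to one resource dimension.

    Requests above the maximum are rejected; requests below the minimum
    are raised to the minimum.
    """
    if requested > maximum:
        raise InvalidResourceRequest(
            f"requested {requested} exceeds maximum {maximum}")
    return max(requested, minimum)

# Memory, using the defaults listed above (1024 MB minimum, 8192 MB maximum).
print(normalize(512, 1024, 8192))   # raised to the minimum
print(normalize(2048, 1024, 8192))  # unchanged
# Virtual cores, using the defaults listed above (1 minimum, 4 maximum).
print(normalize(2, 1, 4))           # unchanged
```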
Used by node labels. If set to true, the port should be included in the
node name. Only usable if your scheduler supports node labels.
yarn.scheduler.include-port-in-node-name
false
Enable RM to recover state after starting. If true, then
yarn.resourcemanager.store.class must be specified.
yarn.resourcemanager.recovery.enabled
false
Should RM fail fast if it encounters any errors. By default, it
points to ${yarn.fail-fast}. Errors include:
1) exceptions when state-store write/read operations fail.
yarn.resourcemanager.fail-fast
${yarn.fail-fast}
Should YARN fail fast if it encounters any errors.
This is a global config for all other components, including RM, NM, etc.
If no value is set for a component-specific config (e.g. yarn.resourcemanager.fail-fast),
this value will be the default.
yarn.fail-fast
false
Enable RM work preserving recovery. This configuration is private
to YARN for experimenting with the feature.
yarn.resourcemanager.work-preserving-recovery.enabled
true
Set the amount of time RM waits before allocating new
containers on work-preserving-recovery. Such wait period gives RM a chance
to settle down resyncing with NMs in the cluster on recovery, before assigning
new containers to applications.
yarn.resourcemanager.work-preserving-recovery.scheduling-wait-ms
10000
The class to use as the persistent store.
If org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore
is used, the store is implicitly fenced, meaning that only a single ResourceManager
is able to use the store at any point in time. More details on this
implicit fencing, along with setting up appropriate ACLs, are discussed
under yarn.resourcemanager.zk-state-store.root-node.acl.
yarn.resourcemanager.store.class
org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore
When automatic failover is enabled, number of zookeeper
operation retry times in ActiveStandbyElector
yarn.resourcemanager.ha.failover-controller.active-standby-elector.zk.retries
The maximum number of completed applications the RM state
store keeps, less than or equal to ${yarn.resourcemanager.max-completed-applications}.
By default, it equals ${yarn.resourcemanager.max-completed-applications}.
This ensures that the applications kept in the state store are consistent with
the applications remembered in RM memory.
Any value larger than ${yarn.resourcemanager.max-completed-applications} will
be reset to ${yarn.resourcemanager.max-completed-applications}.
Note that this value impacts the RM recovery performance. Typically,
a smaller value indicates better performance on RM recovery.
yarn.resourcemanager.state-store.max-completed-applications
${yarn.resourcemanager.max-completed-applications}
Full path of the ZooKeeper znode where RM state will be
stored. This must be supplied when using
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore
as the value for yarn.resourcemanager.store.class
yarn.resourcemanager.zk-state-store.parent-path
/rmstore
ACLs to be used for the root znode when using ZKRMStateStore in an HA
scenario for fencing.
ZKRMStateStore supports implicit fencing to allow a single
ResourceManager write-access to the store. For fencing, the
ResourceManagers in the cluster share read-write-admin privileges on the
root node, but the Active ResourceManager claims exclusive create-delete
permissions.
By default, when this property is not set, we use the ACLs from
yarn.resourcemanager.zk-acl for shared admin access and
rm-address:random-number for username-based exclusive create-delete
access.
This property allows users to set ACLs of their choice instead of using
the default mechanism. For fencing to work, the ACLs should be
carefully set differently on each ResourceManager such that all the
ResourceManagers have shared admin access and the Active ResourceManager
takes over (exclusively) the create-delete access.
yarn.resourcemanager.zk-state-store.root-node.acl
URI pointing to the location of the FileSystem path where
RM state will be stored. This must be supplied when using
org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore
as the value for yarn.resourcemanager.store.class
yarn.resourcemanager.fs.state-store.uri
${hadoop.tmp.dir}/yarn/system/rmstore
The number of retries to recover from IOException in
FileSystemRMStateStore.
yarn.resourcemanager.fs.state-store.num-retries
0
Retry interval in milliseconds in FileSystemRMStateStore.
yarn.resourcemanager.fs.state-store.retry-interval-ms
1000
Local path where the RM state will be stored when using
org.apache.hadoop.yarn.server.resourcemanager.recovery.LeveldbRMStateStore
as the value for yarn.resourcemanager.store.class
yarn.resourcemanager.leveldb-state-store.path
${hadoop.tmp.dir}/yarn/system/rmstore
The time in seconds between full compactions of the leveldb
database. Setting the interval to zero disables the full compaction
cycles.
yarn.resourcemanager.leveldb-state-store.compaction-interval-secs
3600
Enable RM high-availability. When enabled,
(1) The RM starts in the Standby mode by default, and transitions to
the Active mode when prompted to.
(2) The nodes in the RM ensemble are listed in
yarn.resourcemanager.ha.rm-ids
(3) The id of each RM either comes from yarn.resourcemanager.ha.id
if yarn.resourcemanager.ha.id is explicitly specified or can be
figured out by matching yarn.resourcemanager.address.{id} with local address
(4) The actual physical addresses come from the configs of the pattern
- {rpc-config}.{id}
yarn.resourcemanager.ha.enabled
false
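Points (2) to (4) above can be sketched as a lookup over per-RM keys. This is a simplified illustration: it assumes yarn.resourcemanager.ha.rm-ids is a comma-separated list and compares addresses as plain strings, whereas a real deployment matches against the local network address. The helper names and the host names are hypothetical.

```python
def per_rm_keys(rm_ids, base_key="yarn.resourcemanager.address"):
    """Map each RM id to its per-RM key, following {rpc-config}.{id}."""
    ids = [i.strip() for i in rm_ids.split(",") if i.strip()]
    return {rm_id: f"{base_key}.{rm_id}" for rm_id in ids}

def find_local_id(conf, local_address):
    """Return the explicit ha.id if set, else the id whose address matches."""
    explicit = conf.get("yarn.resourcemanager.ha.id")
    if explicit:
        return explicit
    rm_ids = conf.get("yarn.resourcemanager.ha.rm-ids", "")
    for rm_id, key in per_rm_keys(rm_ids).items():
        if conf.get(key) == local_address:
            return rm_id
    return None

conf = {
    "yarn.resourcemanager.ha.enabled": "true",
    "yarn.resourcemanager.ha.rm-ids": "rm1,rm2",
    "yarn.resourcemanager.address.rm1": "host1.example.com:8032",
    "yarn.resourcemanager.address.rm2": "host2.example.com:8032",
}
print(find_local_id(conf, "host2.example.com:8032"))
```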
Enable automatic failover.
By default, it is enabled only when HA is enabled
yarn.resourcemanager.ha.automatic-failover.enabled
true
Enable embedded automatic failover.
By default, it is enabled only when HA is enabled.
The embedded elector relies on the RM state store to handle fencing,
and is primarily intended to be used in conjunction with ZKRMStateStore.
yarn.resourcemanager.ha.automatic-failover.embedded
true
The base znode path to use for storing leader information,
when using ZooKeeper based leader election.
yarn.resourcemanager.ha.automatic-failover.zk-base-path
/yarn-leader-election
Index at which last section of application id (with each section
separated by _ in application id) will be split so that application znode
stored in zookeeper RM state store will be stored as two different znodes
(parent-child). Split is done from the end.
For instance, with no split, appid znode will be of the form
application_1352994193343_0001. If the value of this config is 1, the
appid znode will be broken into two parts application_1352994193343_000
and 1 respectively with former being the parent node.
application_1352994193343_0002 will then be stored as 2 under the parent
node application_1352994193343_000. This config can take values from 0 to 4.
0 means there will be no split. If configuration value is outside this
range, it will be treated as a config value of 0 (i.e. no split). A value
larger than 0 (up to 4) should be configured if you are storing a large number
of apps in ZK based RM state store and state store operations are failing due to
LenError in Zookeeper.
yarn.resourcemanager.zk-appid-node.split-index
0
Index at which the RM Delegation Token ids will be split so
that the delegation token znodes stored in the zookeeper RM state store
will be stored as two different znodes (parent-child). The split is done
from the end. For instance, with no split, a delegation token znode will
be of the form RMDelegationToken_123456789. If the value of this config is
1, the delegation token znode will be broken into two parts:
RMDelegationToken_12345678 and 9 respectively with former being the parent
node. This config can take values from 0 to 4. 0 means there will be no
split. If the value is outside this range, it will be treated as 0 (i.e.
no split). A value larger than 0 (up to 4) should be configured if you are
running a large number of applications, with long-lived delegation tokens
and state store operations (e.g. failover) are failing due to LenError in
Zookeeper.
yarn.resourcemanager.zk-delegation-token-node.split-index
0
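The application id and delegation token split-index settings above both cut a znode name from the end. A minimal sketch of that rule, using the example names from the descriptions; the function split_znode is hypothetical and does not handle names whose last section is shorter than the split index.

```python
def split_znode(name, split_index):
    """Split a znode name into (parent, child), counting from the end.

    Values outside 0..4 are treated as 0, which means no split; in that
    case the child is None.
    """
    if not 0 <= split_index <= 4:
        split_index = 0
    if split_index == 0:
        return name, None
    return name[:-split_index], name[-split_index:]

print(split_znode("application_1352994193343_0001", 1))
print(split_znode("RMDelegationToken_123456789", 1))
print(split_znode("application_1352994193343_0001", 7))  # out of range: no split
```

With a split index of 1, up to ten names share a parent node, so each parent holds far fewer children than a single flat node would.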
Specifies the maximum size of the data that can be stored
in a znode. Value should be same or less than jute.maxbuffer configured
in zookeeper. Default value configured is 1MB.
yarn.resourcemanager.zk-max-znode-size.bytes
1048576
Name of the cluster. In an HA setting,
this is used to ensure the RM participates in leader
election for this cluster and does not affect
other clusters.
yarn.resourcemanager.cluster-id
The range of values above base epoch that the RM will use before
wrapping around
yarn.resourcemanager.epoch.range
0
The list of RM nodes in the cluster when HA is
enabled. See description of yarn.resourcemanager.ha.enabled
for full details on how this is used.
yarn.resourcemanager.ha.rm-ids
The id (string) of the current RM. When HA is enabled, this
is an optional config. The id of the current RM can be set by explicitly
specifying yarn.resourcemanager.ha.id or figured out by matching
yarn.resourcemanager.address.{id} with the local address.
See description of yarn.resourcemanager.ha.enabled
for full details on how this is used.
yarn.resourcemanager.ha.id
When HA is enabled, the class to be used by Clients, AMs and
NMs to failover to the Active RM. It should extend
org.apache.hadoop.yarn.client.RMFailoverProxyProvider
yarn.client.failover-proxy-provider
org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider
When HA is not enabled, the class to be used by Clients, AMs and
NMs to retry connecting to the Active RM. It should extend
org.apache.hadoop.yarn.client.RMFailoverProxyProvider
yarn.client.failover-no-ha-proxy-provider
org.apache.hadoop.yarn.client.DefaultNoHARMFailoverProxyProvider
When HA is enabled, the max number of times
FailoverProxyProvider should attempt failover. When set,
this overrides the yarn.resourcemanager.connect.max-wait.ms. When
not set, this is inferred from
yarn.resourcemanager.connect.max-wait.ms.
yarn.client.failover-max-attempts
When HA is enabled, the sleep base (in milliseconds) to be
used for calculating the exponential delay between failovers. When set,
this overrides the yarn.resourcemanager.connect.* settings. When
not set, yarn.resourcemanager.connect.retry-interval.ms is used instead.
yarn.client.failover-sleep-base-ms
When HA is enabled, the maximum sleep time (in milliseconds)
between failovers. When set, this overrides the
yarn.resourcemanager.connect.* settings. When not set,
yarn.resourcemanager.connect.retry-interval.ms is used instead.
yarn.client.failover-sleep-max-ms
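The sleep base and maximum sleep time above bound an exponential delay between failovers. A minimal sketch of such a capped exponential delay follows. The doubling rule and the function name are assumptions for illustration, the example values are not shipped defaults, and any randomization the client may add is left out.

```python
def failover_delay_ms(attempt, sleep_base_ms, sleep_max_ms):
    """Delay before the given failover attempt (1-based), capped at the maximum.

    Assumes the delay doubles with each attempt, starting from the base.
    """
    return min(sleep_max_ms, sleep_base_ms * (2 ** (attempt - 1)))

# Example values chosen for illustration only.
for attempt in range(1, 7):
    print(attempt, failover_delay_ms(attempt, sleep_base_ms=100, sleep_max_ms=2000))
```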
When HA is enabled, the number of retries per
attempt to connect to a ResourceManager. In other words,
it is the ipc.client.connect.max.retries to be used during
failover attempts
yarn.client.failover-retries
0
When HA is enabled, the number of retries per
attempt to connect to a ResourceManager on socket timeouts. In other
words, it is the ipc.client.connect.max.retries.on.timeouts to be used
during failover attempts
yarn.client.failover-retries-on-socket-timeouts
0
The maximum number of completed applications RM keeps.
yarn.resourcemanager.max-completed-applications
1000
Interval at which the delayed token removal thread runs
yarn.resourcemanager.delayed.delegation-token.removal-interval-ms
30000
Maximum size in bytes for configurations that can be provided
by an application to RM for delegation token renewal.
By experiment, it is roughly 128 bytes per key-value pair.
The default value of 12800 allows roughly 100 configs, possibly fewer.
yarn.resourcemanager.delegation-token.max-conf-size-bytes
12800
If true, ResourceManager will always try to cancel delegation
tokens after the application completes, even if the client sets
shouldCancelAtEnd false. References to delegation tokens are tracked,
so they will not be canceled until all sub-tasks are done using them.
yarn.resourcemanager.delegation-token.always-cancel
false
If true, ResourceManager will have proxy-user privileges.
Use case: In a secure cluster, YARN requires the user's HDFS delegation tokens to
do localization and log-aggregation on behalf of the user. If this is set to true,
ResourceManager is able to request new HDFS delegation tokens on behalf of
the user. This is needed by long-running services, because the HDFS tokens
will eventually expire and YARN requires new valid tokens to do localization
and log-aggregation. Note that to enable this use case, the corresponding
HDFS NameNode has to configure ResourceManager as the proxy-user so that
ResourceManager can itself ask for new tokens on behalf of the user when
tokens are past their max-life-time.
yarn.resourcemanager.proxy-user-privileges.enabled
false
Interval for the roll over for the master key used to generate
application tokens
yarn.resourcemanager.am-rm-tokens.master-key-rolling-interval-secs
86400
Interval for the roll over for the master key used to generate
container tokens. It is expected to be much greater than
yarn.nm.liveness-monitor.expiry-interval-ms and
yarn.resourcemanager.rm.container-allocation.expiry-interval-ms. Otherwise the
behavior is undefined.
yarn.resourcemanager.container-tokens.master-key-rolling-interval-secs
86400
The heart-beat interval in milliseconds for every NodeManager in the cluster.
yarn.resourcemanager.nodemanagers.heartbeat-interval-ms
1000
Enables heart-beat interval scaling. The NodeManager
heart-beat interval will scale based on the difference between the CPU
utilization on the node and the cluster-wide average CPU utilization.
yarn.resourcemanager.nodemanagers.heartbeat-interval-scaling-enable
false
If heart-beat interval scaling is enabled, this is the
minimum heart-beat interval in milliseconds
yarn.resourcemanager.nodemanagers.heartbeat-interval-min-ms
1000
If heart-beat interval scaling is enabled, this is the
maximum heart-beat interval in milliseconds
yarn.resourcemanager.nodemanagers.heartbeat-interval-max-ms
1000
If heart-beat interval scaling is enabled, this controls
the degree of adjustment when speeding up heartbeat intervals.
At 1.0, 20% less than average CPU utilization will result in a 20%
decrease in heartbeat interval.
yarn.resourcemanager.nodemanagers.heartbeat-interval-speedup-factor
1.0
If heart-beat interval scaling is enabled, this controls
the degree of adjustment when slowing down heartbeat intervals.
At 1.0, 20% greater than average CPU utilization will result in a 20%
increase in heartbeat interval.
yarn.resourcemanager.nodemanagers.heartbeat-interval-slowdown-factor
1.0
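The heart-beat scaling settings above can be illustrated with a small calculation. This sketch assumes that the adjustment is proportional to the relative difference between node and cluster-average CPU utilization and that the result is clamped to the configured minimum and maximum; the exact formula used by the ResourceManager is not given in this listing, and the function name is hypothetical.

```python
def scaled_heartbeat_ms(node_cpu, avg_cpu, base_ms=1000, min_ms=1000,
                        max_ms=1000, speedup=1.0, slowdown=1.0):
    """Sketch of a scaled heart-beat interval in milliseconds.

    node_cpu and avg_cpu are utilizations in the same unit (e.g. 0.0-1.0).
    Keyword defaults mirror the default values listed above.
    """
    if avg_cpu <= 0:
        return base_ms
    diff = (node_cpu - avg_cpu) / avg_cpu  # relative difference (assumption)
    factor = slowdown if diff > 0 else speedup
    interval = base_ms * (1.0 + diff * factor)
    return int(round(min(max_ms, max(min_ms, interval))))

# A node 20% below the average heartbeats 20% faster, within the bounds.
print(scaled_heartbeat_ms(0.4, 0.5, min_ms=500, max_ms=2000))
# A node 20% above the average heartbeats 20% slower.
print(scaled_heartbeat_ms(0.6, 0.5, min_ms=500, max_ms=2000))
# With the default minimum and maximum both at 1000, nothing changes.
print(scaled_heartbeat_ms(0.1, 0.5))
```

Under these assumptions, enabling scaling has no visible effect until the minimum and maximum intervals are set apart from each other.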
The number of consecutive missed heartbeats after which a node will be
skipped from scheduling.
yarn.scheduler.skip.node.multiplier
2
The minimum allowed version of a connecting nodemanager. The valid values are
NONE (no version checking), EqualToRM (the nodemanager's version is equal to
or greater than the RM version), or a Version String.
yarn.resourcemanager.nodemanager.minimum.version
NONE
Enable a set of periodic monitors (specified in
yarn.resourcemanager.scheduler.monitor.policies) that affect the
scheduler.
yarn.resourcemanager.scheduler.monitor.enable
false
The list of SchedulingEditPolicy classes that interact with
the scheduler. A particular module may be incompatible with the
scheduler, other policies, or a configuration of either.
yarn.resourcemanager.scheduler.monitor.policies
org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy
The class to use as the configuration provider.
If org.apache.hadoop.yarn.LocalConfigurationProvider is used,
the local configuration will be loaded.
If org.apache.hadoop.yarn.FileSystemBasedConfigurationProvider is used,
the configuration which will be loaded should be uploaded to remote File system first.
yarn.resourcemanager.configuration.provider-class
org.apache.hadoop.yarn.LocalConfigurationProvider
The value specifies the file system (e.g. HDFS) path where ResourceManager
loads configuration if yarn.resourcemanager.configuration.provider-class
is set to org.apache.hadoop.yarn.FileSystemBasedConfigurationProvider.
yarn.resourcemanager.configuration.file-system-based-store
/yarn/conf
The setting that controls whether YARN system metrics are
published to the Timeline server (version one) or not, by RM.
This configuration is now deprecated in favor of
yarn.system-metrics-publisher.enabled.
yarn.resourcemanager.system-metrics-publisher.enabled
false
The setting that controls whether YARN system metrics are
published on the Timeline service or not by RM and NM.
yarn.system-metrics-publisher.enabled
false
The setting that controls whether yarn container events are
published to the timeline service or not by RM. This configuration setting
is for ATS V2.
yarn.rm.system-metrics-publisher.emit-container-events
false
Number of worker threads that send the yarn system metrics
data.
yarn.resourcemanager.system-metrics-publisher.dispatcher.pool-size
10
This setting enables/disables timeline server v1 publisher to publish timeline events in batch.
yarn.resourcemanager.system-metrics-publisher.timeline-server-v1.enable-batch
false
The size of timeline server v1 publisher sending events in one request.
yarn.resourcemanager.system-metrics-publisher.timeline-server-v1.batch-size
1000
When batch publishing is enabled in timeline server v1, the publisher must not
wait for a batch to fill up while holding events in the buffer for a long
time. A separate thread therefore sends the events in the buffer periodically.
This config sets the interval of that periodic sending thread.
yarn.resourcemanager.system-metrics-publisher.timeline-server-v1.interval-seconds
60
Number of diagnostics/failure messages that can be saved in RM for
log aggregation. It also defines the number of diagnostics/failure
messages that can be shown in the log aggregation web ui.
yarn.resourcemanager.max-log-aggregation-diagnostics-in-memory
10
RM DelegationTokenRenewer thread count
yarn.resourcemanager.delegation-token-renewer.thread-count
50
RM secret key update interval in ms
yarn.resourcemanager.delegation.key.update-interval
86400000
RM delegation token maximum lifetime in ms
yarn.resourcemanager.delegation.token.max-lifetime
604800000
RM delegation token update interval in ms
yarn.resourcemanager.delegation.token.renew-interval
86400000
RM DelegationTokenRenewer thread timeout
yarn.resourcemanager.delegation-token-renewer.thread-timeout
60s
Default maximum number of retries for each RM DelegationTokenRenewer thread
yarn.resourcemanager.delegation-token-renewer.thread-retry-max-attempts
10
Time interval between each RM DelegationTokenRenewer thread retry attempt
yarn.resourcemanager.delegation-token-renewer.thread-retry-interval
60s
Thread pool size for RMApplicationHistoryWriter.
yarn.resourcemanager.history-writer.multi-threaded-dispatcher.pool-size
10
Comma-separated list of values (in minutes) for schedule queue related
metrics.
yarn.resourcemanager.metrics.runtime.buckets
60,300,1440
Interval for the roll over for the master key used to generate
NodeManager tokens. It is expected to be set to a value much larger
than yarn.nm.liveness-monitor.expiry-interval-ms.
yarn.resourcemanager.nm-tokens.master-key-rolling-interval-secs
86400
Flag to enable the ResourceManager reservation system.
yarn.resourcemanager.reservation-system.enable
false
The Java class to use as the ResourceManager reservation system.
By default, it is set to
org.apache.hadoop.yarn.server.resourcemanager.reservation.CapacityReservationSystem
when using CapacityScheduler and to
org.apache.hadoop.yarn.server.resourcemanager.reservation.FairReservationSystem
when using FairScheduler.
yarn.resourcemanager.reservation-system.class
The plan follower policy class name to use for the ResourceManager
reservation system.
By default, it is set to
org.apache.hadoop.yarn.server.resourcemanager.reservation.CapacitySchedulerPlanFollower
when using CapacityScheduler, and to
org.apache.hadoop.yarn.server.resourcemanager.reservation.FairSchedulerPlanFollower
when using FairScheduler.
yarn.resourcemanager.reservation-system.plan.follower
Step size of the reservation system in ms
yarn.resourcemanager.reservation-system.planfollower.time-step
1000
The expiry interval for a container
yarn.resourcemanager.rm.container-allocation.expiry-interval-ms
600000
Flag to enable/disable resource profiles
yarn.resourcemanager.resource-profiles.enabled
false
If resource profiles are enabled, the source file for the profiles.
yarn.resourcemanager.resource-profiles.source-file
resource-profiles.json
The hostname of the NM.
yarn.nodemanager.hostname
0.0.0.0
The address of the container manager in the NM.
yarn.nodemanager.address
${yarn.nodemanager.hostname}:0
The actual address the server will bind to. If this optional address is
set, the RPC and webapp servers will bind to this address and the port specified in
yarn.nodemanager.address and yarn.nodemanager.webapp.address, respectively. This is
most useful for making NM listen to all interfaces by setting to 0.0.0.0.
yarn.nodemanager.bind-host
Environment variables that should be forwarded from the NodeManager's
environment to the container's, specified as a comma separated list of
VARNAME=value pairs.
To define environment variables individually, you can specify
multiple properties of the form yarn.nodemanager.admin-env.VARNAME,
where VARNAME is the name of the environment variable. This is the only
way to add a variable when its value contains commas.
yarn.nodemanager.admin-env
MALLOC_ARENA_MAX=$MALLOC_ARENA_MAX
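The two ways of defining admin environment variables described above can be sketched as follows. The function parse_admin_env is hypothetical, the variable names and values in the example are made up, and letting the per-variable properties override the list entries is an assumption of this sketch rather than documented behavior.

```python
def parse_admin_env(conf, prefix="yarn.nodemanager.admin-env"):
    """Collect admin environment variables from a configuration dictionary."""
    env = {}
    # Comma separated VARNAME=value pairs; values here cannot contain commas.
    for pair in conf.get(prefix, "").split(","):
        name, sep, value = pair.strip().partition("=")
        if sep and name:
            env[name] = value
    # Individual properties of the form <prefix>.VARNAME keep commas intact.
    for key, value in conf.items():
        if key.startswith(prefix + "."):
            env[key[len(prefix) + 1:]] = value
    return env

conf = {
    "yarn.nodemanager.admin-env": "MALLOC_ARENA_MAX=4,EXAMPLE_FLAG=on",
    # Hypothetical variable whose value contains a comma.
    "yarn.nodemanager.admin-env.EXAMPLE_OPTS": "-Dx=1,-Dy=2",
}
print(parse_admin_env(conf))
```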
PATH components that will be prepended to the user's path.
If this is defined and the user does not define PATH, NM will also
append ":$PATH" to prevent this from eclipsing the PATH defined in
the container. This feature is only available for Linux.
yarn.nodemanager.force.path
Environment variables that containers may override rather than use NodeManager's default.
yarn.nodemanager.env-whitelist
JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ
The class that will execute (launch) the containers.
yarn.nodemanager.container-executor.class
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor
Comma separated list of container state transition listeners.
yarn.nodemanager.container-state-transition-listener.classes
Number of threads container manager uses.
yarn.nodemanager.container-manager.thread-count
20
Number of threads collector service uses.
yarn.nodemanager.collector-service.thread-count
5
Number of threads used in cleanup.
yarn.nodemanager.delete.thread-count
4
How long the container executor should wait for the exit code file to
appear after a reacquired container has exited.
yarn.nodemanager.container-executor.exit-code-file.timeout-ms
2000
At the NM, the policy to determine whether to queue an
OPPORTUNISTIC container or not.
If set to BY_QUEUE_LEN, uses the queue capacity, as set by
yarn.nodemanager.opportunistic-containers-max-queue-length
to limit how many containers to accept/queue.
If set to BY_RESOURCES, limits the number of containers
accepted based on the resource capacity of the node.
yarn.nodemanager.opportunistic-containers-queue-policy
BY_QUEUE_LEN
Max number of OPPORTUNISTIC containers to queue at the
nodemanager (NM). If the value is 0 or negative,
NMs do not allow any OPPORTUNISTIC containers.
If the value is positive, the NM caps the number of OPPORTUNISTIC
containers that can be queued at the NM.
yarn.nodemanager.opportunistic-containers-max-queue-length
0
Number of seconds after an application finishes before the nodemanager's
DeletionService will delete the application's localized file directory
and log directory.
To diagnose YARN application problems, set this property's value large
enough (for example, to 600 = 10 minutes) to permit examination of these
directories. After changing the property's value, you must restart the
nodemanager in order for it to have an effect.
The roots of YARN applications' work directories are configurable with
the yarn.nodemanager.local-dirs property (see below), and the roots
of the YARN applications' log directories are configurable with the
yarn.nodemanager.log-dirs property (see also below).
yarn.nodemanager.delete.debug-delay-sec
0
Keytab for NM.
yarn.nodemanager.keytab
/etc/krb5.keytab
List of directories to store localized files in. An
application's localized file directory will be found in:
${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}.
Individual containers' work directories, called container_${contid}, will
be subdirectories of this.
yarn.nodemanager.local-dirs
${hadoop.tmp.dir}/nm-local-dir
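The directory layout described above can be sketched as follows. This is only an illustration of the path template in the description, not NodeManager code; the function names and the sample user and IDs are hypothetical.

```python
import posixpath


def app_local_dir(local_dir: str, user: str, app_id: str) -> str:
    """Build ${local-dir}/usercache/${user}/appcache/application_${appid}."""
    return posixpath.join(
        local_dir, "usercache", user, "appcache", f"application_{app_id}"
    )


def container_work_dir(local_dir: str, user: str, app_id: str, cont_id: str) -> str:
    """Container work directories are subdirectories named container_${contid}."""
    return posixpath.join(
        app_local_dir(local_dir, user, app_id), f"container_{cont_id}"
    )


if __name__ == "__main__":
    # Hypothetical local dir, user, and IDs, used only for illustration.
    print(container_work_dir("/data/nm-local-dir", "alice", "1_0001", "1_0001_01_000002"))
```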
Limits the maximum number of files that will be localized
in a single local directory. If the limit is reached, sub-directories
will be created and new files will be localized in them. If it is set to
a value less than or equal to 36 (the number of sub-directories, 0-9 and
then a-z), the NodeManager will fail to start. For example, for the public
cache, if this is configured with a value of 40 (4 files +
36 sub-directories) and the local-dir is "/tmp/local-dir1", then it will
allow 4 files to be created directly inside "/tmp/local-dir1/filecache".
For files that are localized further, it will create a sub-directory "0"
inside "/tmp/local-dir1/filecache" and will localize files inside it
until it becomes full. If a file is removed from a sub-directory that
is marked full, then that sub-directory will be used again to
localize files.
yarn.nodemanager.local-cache.max-files-per-directory
8192
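The arithmetic in the example above (40 = 4 files + 36 sub-directories) can be checked with a small sketch. The helper name is hypothetical, and raising an error stands in for the NodeManager failing to start.

```python
# 36 slots are taken by the sub-directories named 0-9 and a-z.
SUBDIR_SLOTS = 36


def direct_file_slots(max_files_per_directory: int) -> int:
    """Return how many files fit directly in one directory before sub-directories are used."""
    if max_files_per_directory <= SUBDIR_SLOTS:
        # The description says the NodeManager fails to start in this case.
        raise ValueError("value must be greater than 36")
    return max_files_per_directory - SUBDIR_SLOTS


if __name__ == "__main__":
    print(direct_file_slots(40))    # the example from the description
    print(direct_file_slots(8192))  # the default value
```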
Address where the localizer IPC is.
yarn.nodemanager.localizer.address
${yarn.nodemanager.hostname}:8040
Address where the collector service IPC is.
yarn.nodemanager.collector-service.address
${yarn.nodemanager.hostname}:8048
The setting that controls whether yarn container events are
published to the timeline service or not by NM. This configuration setting
is for ATS V2.
yarn.nodemanager.emit-container-events
true
Interval in between cache cleanups.
yarn.nodemanager.localizer.cache.cleanup.interval-ms
600000
Target size of localizer cache in MB, per nodemanager. It is
a target retention size that only includes resources with PUBLIC and
PRIVATE visibility and excludes resources with APPLICATION visibility
yarn.nodemanager.localizer.cache.target-size-mb
10240
Number of threads to handle localization requests.
yarn.nodemanager.localizer.client.thread-count
5
Number of threads to use for localization fetching.
yarn.nodemanager.localizer.fetch.thread-count
4
yarn.nodemanager.container-localizer.java.opts
-Xmx256m
The log level for container localizer while it is an independent process.
yarn.nodemanager.container-localizer.log.level
INFO
Where to store container logs. An application's localized log directory
will be found in ${yarn.nodemanager.log-dirs}/application_${appid}.
Individual containers' log directories will be below this, in directories
named container_${contid}. Each container directory will contain the files
stderr, stdout, and syslog generated by that container.
yarn.nodemanager.log-dirs
${yarn.log.dir}/userlogs
The permissions settings used for the creation of container
directories when using DefaultContainerExecutor. This follows
standard user/group/all permissions format.
yarn.nodemanager.default-container-executor.log-dirs.permissions
710
Whether to enable log aggregation. Log aggregation collects
each container's logs and moves these logs onto a file system, e.g.
HDFS, after the application completes. Users can configure the
"yarn.nodemanager.remote-app-log-dir" and
"yarn.nodemanager.remote-app-log-dir-suffix" properties to determine
where these logs are moved to. Users can access the logs via the
Application Timeline Server.
yarn.log-aggregation-enable
false
Whether to enable application placement based on user ID passed via
application tags. When it is enabled, userid=<userId>
pattern will be checked and if found, the application will be placed
onto the found user's queue,
if the original user has enough rights on the passed user's queue.
yarn.resourcemanager.application-tag-based-placement.enable
false
Comma separated list of users who can use the application tag based
placement, if it is enabled.
yarn.resourcemanager.application-tag-based-placement.username.whitelist
How long to keep aggregation logs before deleting them. -1 disables.
Be careful: if this is set too small, you will spam the name node.
yarn.log-aggregation.retain-seconds
-1
How long to wait between aggregated log retention checks.
If set to 0 or a negative value, then the value is computed as one-tenth
of the aggregated log retention time. Be careful: if this is set too small,
you will spam the name node.
yarn.log-aggregation.retain-check-interval-seconds
-1
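A minimal sketch of the fallback rule described above, assuming integer division for "one-tenth" (the exact rounding is not stated in the description). The function name is hypothetical, and the case where retention itself is disabled (-1) is not modeled.

```python
def effective_check_interval(retain_seconds: int, check_interval_seconds: int) -> int:
    """Return the interval between retention checks, in seconds."""
    if check_interval_seconds <= 0:
        # 0 or negative: derive the interval as one-tenth of the retention time.
        return retain_seconds // 10
    return check_interval_seconds


if __name__ == "__main__":
    one_week = 7 * 24 * 3600
    print(effective_check_interval(one_week, -1))
    print(effective_check_interval(one_week, 3600))
```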
Log files created under the NM local directories
will be logged if they exceed the configured size in bytes. This
only takes effect if the log4j level is at least DEBUG.
yarn.log-aggregation.debug.filesize
104857600
Specify which log file controllers we will support. The first
file controller we add will be used to write the aggregated logs.
This comma separated configuration will work with the configuration:
yarn.log-aggregation.file-controller.%s.class which defines the supported
file controller's class. By default, the TFile controller would be used.
The user could override this configuration by adding more file controllers.
To support backward compatibility, make sure that the TFile file
controller is always included.
yarn.log-aggregation.file-formats
TFile
Class that supports TFile read and write operations.
yarn.log-aggregation.file-controller.TFile.class
org.apache.hadoop.yarn.logaggregation.filecontroller.tfile.LogAggregationTFileController
How long the ResourceManager waits for a NodeManager to report its
log aggregation status. If the NodeManager does not report its log
aggregation status within the configured time, the RM
will report the log aggregation status for this NodeManager as TIME_OUT.
This configuration will be used in NodeManager as well to decide
whether and when to delete the cached log aggregation status.
yarn.log-aggregation-status.time-out.ms
600000
Time in seconds to retain user logs. Only applicable if
log aggregation is disabled
yarn.nodemanager.log.retain-seconds
10800
Where to aggregate logs to.
yarn.nodemanager.remote-app-log-dir
/tmp/logs
The remote log dir will be created at
{yarn.nodemanager.remote-app-log-dir}/${user}/{thisParam}
yarn.nodemanager.remote-app-log-dir-suffix
logs
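As a rough illustration of the template above, the remote log directory for a user can be composed like this. The function name and sample user are hypothetical, and actual Hadoop releases may add further path components not described here.

```python
import posixpath


def remote_user_log_dir(remote_app_log_dir: str, user: str, suffix: str) -> str:
    """Compose {remote-app-log-dir}/${user}/{suffix} as given in the description."""
    return posixpath.join(remote_app_log_dir, user, suffix)


if __name__ == "__main__":
    # Defaults from this listing: /tmp/logs and logs; "alice" is a made-up user.
    print(remote_user_log_dir("/tmp/logs", "alice", "logs"))
```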
If set to true, the older application log directory
will be considered while fetching application logs.
yarn.nodemanager.remote-app-log-dir-include-older
true
If the NodeManager creates the remote-app-log-dir folder,
it will be created with this groupname.
yarn.nodemanager.remote-app-log-dir.groupname
Generate additional logs about container launches.
Currently, this creates a copy of the launch script and lists the
directory contents of the container work dir. When listing directory
contents, we follow symlinks to a max-depth of 5 (including symlinks
which point to outside the container work dir), which may slow down
container launches.
yarn.nodemanager.log-container-debug-info.enabled
true
Amount of physical memory, in MB, that can be allocated
for containers. If set to -1 and
yarn.nodemanager.resource.detect-hardware-capabilities is true, it is
automatically calculated (in the case of Windows and Linux).
In other cases, the default is 8192MB.
yarn.nodemanager.resource.memory-mb
-1
Amount of physical memory, in MB, that is reserved
for non-YARN processes. This configuration is only used if
yarn.nodemanager.resource.detect-hardware-capabilities is set
to true and yarn.nodemanager.resource.memory-mb is -1. If set
to -1, this amount is calculated as
20% of (system memory - 2*HADOOP_HEAPSIZE)
yarn.nodemanager.resource.system-reserved-memory-mb
-1
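The formula above, 20% of (system memory - 2*HADOOP_HEAPSIZE), can be worked through with a short sketch. The function name is hypothetical, and integer arithmetic with rounding down is an assumption; the description does not say how the result is rounded.

```python
def system_reserved_memory_mb(system_memory_mb: int, hadoop_heapsize_mb: int) -> int:
    """Return 20% of (system memory - 2 * HADOOP_HEAPSIZE), in MB, rounded down."""
    remaining = system_memory_mb - 2 * hadoop_heapsize_mb
    return remaining * 20 // 100


if __name__ == "__main__":
    # Made-up node: 64000 MB of RAM and a 2000 MB Hadoop heap.
    print(system_reserved_memory_mb(64000, 2000))
```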
Whether YARN CGroups memory tracking is enabled.
yarn.nodemanager.resource.memory.enabled
false
Whether YARN CGroups strict memory enforcement is enabled.
yarn.nodemanager.resource.memory.enforced
true
If the memory limit is enforced, this is the percentage of the soft limit
compared to the memory assigned to the container. If there is memory
pressure, container memory usage will be pushed back to its soft limit
by swapping out memory.
yarn.nodemanager.resource.memory.cgroups.soft-limit-percentage
90.0
Container swappiness is the likelihood a page will be swapped
out compared to being kept in memory. The value is between 0 and 100.
yarn.nodemanager.resource.memory.cgroups.swappiness
0
Whether physical memory limits will be enforced for
containers.
yarn.nodemanager.pmem-check-enabled
true
Whether virtual memory limits will be enforced for
containers.
yarn.nodemanager.vmem-check-enabled
true
Ratio of virtual memory to physical memory when
setting memory limits for containers. Container allocations are
expressed in terms of physical memory, and virtual memory usage
is allowed to exceed this allocation by this ratio.
yarn.nodemanager.vmem-pmem-ratio
2.1
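To make the ratio concrete, the sketch below derives a virtual memory limit from a container's physical allocation. The function names are hypothetical, and the strict "greater than" comparison is an assumption rather than the NodeManager's documented behaviour.

```python
def vmem_limit_mb(pmem_allocation_mb: float, vmem_pmem_ratio: float = 2.1) -> float:
    """Virtual memory limit implied by the physical allocation and the ratio."""
    return pmem_allocation_mb * vmem_pmem_ratio


def exceeds_vmem_limit(vmem_used_mb: float, pmem_allocation_mb: float,
                       vmem_pmem_ratio: float = 2.1) -> bool:
    """True if the container's virtual memory use is above the derived limit."""
    return vmem_used_mb > vmem_limit_mb(pmem_allocation_mb, vmem_pmem_ratio)


if __name__ == "__main__":
    # A 1024 MB container with the default ratio of 2.1.
    print(vmem_limit_mb(1024))
    print(exceeds_vmem_limit(2200, 1024))
```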
Number of vcores that can be allocated
for containers. This is used by the RM scheduler when allocating
resources for containers. This is not used to limit the number of
CPUs used by YARN containers. If it is set to -1 and
yarn.nodemanager.resource.detect-hardware-capabilities is true, it is
automatically determined from the hardware in case of Windows and Linux.
In other cases, number of vcores is 8 by default.
yarn.nodemanager.resource.cpu-vcores
-1
Flag to determine if logical processors (such as
hyperthreads) should be counted as cores. Only applicable on Linux
when yarn.nodemanager.resource.cpu-vcores is set to -1 and
yarn.nodemanager.resource.detect-hardware-capabilities is true.
yarn.nodemanager.resource.count-logical-processors-as-cores
false
Multiplier to determine how to convert physical cores to
vcores. This value is used if yarn.nodemanager.resource.cpu-vcores
is set to -1 (which implies auto-calculate vcores) and
yarn.nodemanager.resource.detect-hardware-capabilities is set to true. The
number of vcores will be calculated as
number of CPUs * multiplier.
yarn.nodemanager.resource.pcores-vcores-multiplier
1.0
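The calculation above (number of CPUs * multiplier) is sketched below. The function name is hypothetical, and truncating the product to an integer is an assumption.

```python
def auto_vcores(num_cpus: int, pcores_vcores_multiplier: float = 1.0) -> int:
    """Return the auto-calculated vcore count: number of CPUs * multiplier."""
    return int(num_cpus * pcores_vcores_multiplier)


if __name__ == "__main__":
    print(auto_vcores(8))        # default multiplier of 1.0
    print(auto_vcores(8, 2.0))   # advertise two vcores per CPU
```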
Thread pool size for LogAggregationService in Node Manager.
yarn.nodemanager.logaggregation.threadpool-size-max
100
Percentage of CPU that can be allocated
for containers. This setting allows users to limit the amount of
CPU that YARN containers use. Currently functional only
on Linux using cgroups. The default is to use 100% of CPU.
yarn.nodemanager.resource.percentage-physical-cpu-limit
100
Enable auto-detection of node capabilities such as
memory and CPU.
yarn.nodemanager.resource.detect-hardware-capabilities
false
NM Webapp address.
yarn.nodemanager.webapp.address
${yarn.nodemanager.hostname}:8042
The https address of the NM web application.
yarn.nodemanager.webapp.https.address
0.0.0.0:8044
The Kerberos keytab file to be used for spnego filter for the NM web
interface.
yarn.nodemanager.webapp.spnego-keytab-file
The Kerberos principal to be used for spnego filter for the NM web
interface.
yarn.nodemanager.webapp.spnego-principal
How often to monitor the node and the containers.
If 0 or negative, monitoring is disabled.
yarn.nodemanager.resource-monitor.interval-ms
3000
Class that calculates current resource utilization.
yarn.nodemanager.resource-calculator.class
Enable container monitor
yarn.nodemanager.container-monitor.enabled
true
How often to monitor containers. If not set, the value for
yarn.nodemanager.resource-monitor.interval-ms will be used.
If 0 or negative, container monitoring is disabled.
yarn.nodemanager.container-monitor.interval-ms
Flag to enable the container log monitor which enforces
container log directory size limits.
yarn.nodemanager.container-log-monitor.enable
false
How often to check the usage of a container's log directories
in milliseconds
yarn.nodemanager.container-log-monitor.interval-ms
60000
The disk space limit, in bytes, for a single
container log directory
yarn.nodemanager.container-log-monitor.dir-size-limit-bytes
1000000000
The disk space limit, in bytes, for all of a container's
logs
yarn.nodemanager.container-log-monitor.total-size-limit-bytes
10000000000
Class that calculates containers' current resource utilization.
If not set, the value for yarn.nodemanager.resource-calculator.class will
be used.
yarn.nodemanager.container-monitor.resource-calculator.class
The nodemanager health check scripts to run.
yarn.nodemanager.health-checker.scripts
script
Health check script time out period.
yarn.nodemanager.health-checker.timeout-ms
1200000
Whether or not to run the node health script
before the NM starts up.
yarn.nodemanager.health-checker.run-before-startup
false
Frequency of running node health scripts.
yarn.nodemanager.health-checker.interval-ms
600000
Frequency of running disk health checker code.
yarn.nodemanager.disk-health-checker.interval-ms
120000
The minimum fraction of disks that must be healthy for the
nodemanager to launch new containers. This corresponds to both
yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs, i.e., if
fewer healthy local-dirs (or log-dirs) are available, then
new containers will not be launched on this node.
yarn.nodemanager.disk-health-checker.min-healthy-disks
0.25
Enable/Disable the disk utilisation percentage
threshold for disk health checker.
yarn.nodemanager.disk-health-checker.disk-utilization-threshold.enabled
true
Enable/Disable the minimum disk free
space threshold for disk health checker.
yarn.nodemanager.disk-health-checker.disk-free-space-threshold.enabled
true
The maximum percentage of disk space utilization allowed after
which a disk is marked as bad. Values can range from 0.0 to 100.0.
If the value is greater than or equal to 100, the nodemanager will check
for full disk. This applies to yarn.nodemanager.local-dirs and
yarn.nodemanager.log-dirs when
yarn.nodemanager.disk-health-checker.disk-utilization-threshold.enabled is true.
yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage
90.0
The low threshold percentage of disk space used when a bad disk is
marked as good. Values can range from 0.0 to 100.0. This applies to
yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs.
Note that if its value is more than
yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage
or not set, it will be set to the same value as
yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage.
yarn.nodemanager.disk-health-checker.disk-utilization-watermark-low-per-disk-percentage
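The fallback rule for the low watermark can be written out as a small sketch. The function name is hypothetical, and None stands in for the property not being set.

```python
from typing import Optional


def effective_low_watermark(max_utilization_pct: float,
                            low_watermark_pct: Optional[float]) -> float:
    """Return the low watermark actually used, per the rule in the description."""
    if low_watermark_pct is None or low_watermark_pct > max_utilization_pct:
        # Unset or above the maximum: fall back to the maximum utilization value.
        return max_utilization_pct
    return low_watermark_pct


if __name__ == "__main__":
    print(effective_low_watermark(90.0, None))
    print(effective_low_watermark(90.0, 80.0))
```

With a low watermark below the maximum, a disk marked bad at 90% utilization would only be marked good again once its usage drops to the lower value.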
The minimum space in megabytes that must be available on a disk for
it to be used. If space on a disk falls below this threshold, it will be marked
as bad. This applies to yarn.nodemanager.local-dirs and
yarn.nodemanager.log-dirs when
yarn.nodemanager.disk-health-checker.disk-free-space-threshold.enabled is true.
yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb
0
The minimum space in megabytes that must be available on a bad
disk for it to be marked as good. This value should not be less
than yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb.
If it is less than yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb,
or it is not set, it will be set to the
same value as yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb.
This applies to yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs.
yarn.nodemanager.disk-health-checker.min-free-space-per-disk-watermark-high-mb
0
The path to the Linux container executor.
yarn.nodemanager.linux-container-executor.path
The class which should help the LCE handle resources.
yarn.nodemanager.linux-container-executor.resources-handler.class
org.apache.hadoop.yarn.server.nodemanager.util.DefaultLCEResourcesHandler
The cgroups hierarchy under which to place YARN processes (cannot contain commas).
If yarn.nodemanager.linux-container-executor.cgroups.mount is false
(that is, if cgroups have been pre-configured) and the YARN user has write
access to the parent directory, then the directory will be created.
If the directory already exists, the administrator has to give YARN
write permissions to it recursively.
This property only applies when the LCE resources handler is set to
CgroupsLCEResourcesHandler.
yarn.nodemanager.linux-container-executor.cgroups.hierarchy
/hadoop-yarn
Whether the LCE should attempt to mount cgroups if not found.
This property only applies when the LCE resources handler is set to
CgroupsLCEResourcesHandler.
yarn.nodemanager.linux-container-executor.cgroups.mount
false
This property sets the path from which YARN will read the
CGroups configuration. YARN has built-in functionality to discover the
system CGroup mount paths, so use this property only if YARN's automatic
mount path discovery does not work.
The path specified by this property must exist before the NodeManager is
launched.
If yarn.nodemanager.linux-container-executor.cgroups.mount is set to true,
YARN will first try to mount the CGroups at the specified path before
reading them.
If yarn.nodemanager.linux-container-executor.cgroups.mount is set to
false, YARN will read the CGroups at the specified path.
If this property is empty, YARN tries to detect the CGroups location.
Please refer to NodeManagerCgroups.html in the documentation for further
details.
This property only applies when the LCE resources handler is set to
CgroupsLCEResourcesHandler.
yarn.nodemanager.linux-container-executor.cgroups.mount-path
Delay in ms between attempts to remove linux cgroup
yarn.nodemanager.linux-container-executor.cgroups.delete-delay-ms
20
This determines which of the two modes the LCE should use on
a non-secure cluster. If this value is set to true, then all containers
will be launched as the user specified in
yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user. If
this value is set to false, then containers will run as the user who
submitted the application.
yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users
true
The UNIX user that containers will run as when
Linux-container-executor is used in nonsecure mode (a use case for this
is using cgroups) if the
yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users is
set to true.
yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user
nobody
The allowed pattern for UNIX user names enforced by
Linux-container-executor when used in nonsecure mode (use case for this
is using cgroups). The default value is taken from /usr/sbin/adduser
yarn.nodemanager.linux-container-executor.nonsecure-mode.user-pattern
^[_.A-Za-z0-9][-@_.A-Za-z0-9]{0,255}?[$]?$
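To see which names the default pattern above accepts, it can be tried out with Python's re module. This is only an approximation, since the NodeManager evaluates the pattern with Java's regex engine; the helper name and sample user names are made up.

```python
import re

# Default value of the user-pattern property, copied from the listing above.
USER_PATTERN = re.compile(r"^[_.A-Za-z0-9][-@_.A-Za-z0-9]{0,255}?[$]?$")


def is_allowed_user(name: str) -> bool:
    """True if the whole user name matches the default pattern."""
    return USER_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["alice", "svc_user$", "-bad", "has space"]:
        print(candidate, is_allowed_user(candidate))
```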
This flag determines whether apps should run with strict resource limits
or be allowed to consume spare resources if they need them. For example, turning the
flag on will restrict apps to use only their share of CPU, even if the node has spare
CPU cycles. The default value is false i.e. use available resources. Please note that
turning this flag on may reduce job throughput on the cluster. This setting does
not apply to other subsystems like memory.
yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage
false
Comma separated list of runtimes that are allowed when using
LinuxContainerExecutor. The allowed values are default, docker, runc, and
javasandbox.
yarn.nodemanager.runtime.linux.allowed-runtimes
default
Default container runtime to use.
yarn.nodemanager.runtime.linux.type
This configuration setting determines the capabilities
assigned to docker containers when they are launched. While these may not
be case-sensitive from a docker perspective, it is best to keep these
uppercase. To run without any capabilities, set this value to
"none" or "NONE"
yarn.nodemanager.runtime.linux.docker.capabilities
CHOWN,DAC_OVERRIDE,FSETID,FOWNER,MKNOD,NET_RAW,SETGID,SETUID,SETFCAP,SETPCAP,NET_BIND_SERVICE,SYS_CHROOT,KILL,AUDIT_WRITE
Default docker image to be used when the docker runtime is
selected.
yarn.nodemanager.runtime.linux.docker.image-name
Default option to decide whether to pull the latest image
or not.
yarn.nodemanager.runtime.linux.docker.image-update
false
This configuration setting determines if privileged docker
containers are allowed on this cluster. Privileged containers are granted
the complete set of capabilities and are not subject to the limitations
imposed by the device cgroup controller. In other words, privileged
containers can do almost everything that the host can do. Use with
extreme care.
yarn.nodemanager.runtime.linux.docker.privileged-containers.allowed
false
This configuration setting determines who is allowed to run
privileged docker containers on this cluster. Use with extreme care.
yarn.nodemanager.runtime.linux.docker.privileged-containers.acl
The set of networks allowed when launching containers using the
DockerContainerRuntime.
yarn.nodemanager.runtime.linux.docker.allowed-container-networks
host,none,bridge
The network used when launching containers using the
DockerContainerRuntime when no network is specified in the request.
This network must be one of the (configurable) set of allowed container
networks.
yarn.nodemanager.runtime.linux.docker.default-container-network
host
The set of runtimes allowed when launching containers using the
DockerContainerRuntime.
yarn.nodemanager.runtime.linux.docker.allowed-container-runtimes
runc
This configuration setting determines whether the host's PID
namespace is allowed for docker containers on this cluster.
Use with care.
yarn.nodemanager.runtime.linux.docker.host-pid-namespace.allowed
false
Property to enable docker user remapping
yarn.nodemanager.runtime.linux.docker.enable-userremapping.allowed
true
Lower limit for acceptable UIDs of a user-remapped user.
yarn.nodemanager.runtime.linux.docker.userremapping-uid-threshold
1
Lower limit for acceptable GIDs of a user-remapped user.
yarn.nodemanager.runtime.linux.docker.userremapping-gid-threshold
1
Whether or not users are allowed to request that Docker
containers honor the debug deletion delay. This is useful for
troubleshooting Docker container related launch failures.
yarn.nodemanager.runtime.linux.docker.delayed-removal.allowed
false
A configurable value to pass to the Docker Stop command. This value
defines the number of seconds between the docker stop command sending
a SIGTERM and a SIGKILL.
yarn.nodemanager.runtime.linux.docker.stop.grace-period
10
The default list of read-only mounts to be bind-mounted
into all Docker containers that use DockerContainerRuntime.
yarn.nodemanager.runtime.linux.docker.default-ro-mounts
The default list of read-write mounts to be bind-mounted
into all Docker containers that use DockerContainerRuntime.
yarn.nodemanager.runtime.linux.docker.default-rw-mounts
The default list of tmpfs mounts to be mounted into all Docker
containers that use DockerContainerRuntime.
yarn.nodemanager.runtime.linux.docker.default-tmpfs-mounts
The runC image tag to manifest plugin
class to be used.
yarn.nodemanager.runtime.linux.runc.image-tag-to-manifest-plugin
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.runc.ImageTagToManifestPlugin
The runC manifest to resources plugin class to
be used.
yarn.nodemanager.runtime.linux.runc.manifest-to-resources-plugin
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.runc.HdfsManifestToResourcesPlugin
The HDFS location under which the oci image manifests, layers,
and configs directories exist.
yarn.nodemanager.runtime.linux.runc.image-toplevel-dir
/runc-root
Target count of layer mounts that we should keep on disk
at one time.
yarn.nodemanager.runtime.linux.runc.layer-mounts-to-keep
100
The interval in seconds between executions of
reaping layer mounts.
yarn.nodemanager.runtime.linux.runc.layer-mounts-interval-secs
600
Image to be used if no other image is specified.
yarn.nodemanager.runtime.linux.runc.image-name
Allow or disallow privileged containers.
yarn.nodemanager.runtime.linux.runc.privileged-containers.allowed
false
The set of networks allowed when launching containers
using the RuncContainerRuntime.
yarn.nodemanager.runtime.linux.runc.allowed-container-networks
host,none,bridge
The set of runtimes allowed when launching containers
using the RuncContainerRuntime.
yarn.nodemanager.runtime.linux.runc.allowed-container-runtimes
runc
ACL list for users allowed to run privileged
containers.
yarn.nodemanager.runtime.linux.runc.privileged-containers.acl
Allow host pid namespace for runC containers.
Use with care.
yarn.nodemanager.runtime.linux.runc.host-pid-namespace.allowed
false
The default list of read-only mounts to be bind-mounted
into all runC containers that use RuncContainerRuntime.
yarn.nodemanager.runtime.linux.runc.default-ro-mounts
The default list of read-write mounts to be bind-mounted
into all runC containers that use RuncContainerRuntime.
yarn.nodemanager.runtime.linux.runc.default-rw-mounts
Path to the seccomp profile to use with runC
containers
yarn.nodemanager.runtime.linux.runc.seccomp-profile
The HDFS location where the runC image tag to hash
file exists.
yarn.nodemanager.runtime.linux.runc.image-tag-to-manifest-plugin.hdfs-hash-file
/runc-root/image-tag-to-hash
The local file system location where the runC image tag
to hash file exists.
yarn.nodemanager.runtime.linux.runc.image-tag-to-manifest-plugin.local-hash-file
The interval in seconds between refreshing the hdfs image tag
to hash cache.
yarn.nodemanager.runtime.linux.runc.image-tag-to-manifest-plugin.cache-refresh-interval-secs
60
The number of manifests to cache in the image tag
to hash cache.
yarn.nodemanager.runtime.linux.runc.image-tag-to-manifest-plugin.num-manifests-to-cache
10
The timeout value in seconds for the values in
the stat cache.
yarn.nodemanager.runtime.linux.runc.hdfs-manifest-to-resources-plugin.stat-cache-timeout-interval-secs
360
The size of the stat cache which stores stats of the
layers and config.
yarn.nodemanager.runtime.linux.runc.hdfs-manifest-to-resources-plugin.stat-cache-size
500
The mode in which the Java Container Sandbox should run detailed by
the JavaSandboxLinuxContainerRuntime.
yarn.nodemanager.runtime.linux.sandbox-mode
disabled
Permissions for application local directories.
yarn.nodemanager.runtime.linux.sandbox-mode.local-dirs.permissions
read
Location for non-default java policy file.
yarn.nodemanager.runtime.linux.sandbox-mode.policy
The group which will run by default without the java security
manager.
yarn.nodemanager.runtime.linux.sandbox-mode.whitelist-group
This flag determines whether memory limit will be set for the Windows Job
Object of the containers launched by the default container executor.
yarn.nodemanager.windows-container.memory-limit.enabled
false
This flag determines whether CPU limit will be set for the Windows Job
Object of the containers launched by the default container executor.
yarn.nodemanager.windows-container.cpu-limit.enabled
false
Interval of time the linux container executor should try cleaning up
cgroups entry when cleaning up a container.
yarn.nodemanager.linux-container-executor.cgroups.delete-timeout-ms
1000
T-file compression types used to compress aggregated logs.
yarn.nodemanager.log-aggregation.compression-type
none
The kerberos principal for the node manager.
yarn.nodemanager.principal
A comma-separated list of services, where each service name should only
contain a-zA-Z0-9_ and cannot start with numbers.
yarn.nodemanager.aux-services
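The naming rule above can be expressed as a simple validator. This is a sketch of the stated rule only; the function name is hypothetical, and trimming whitespace around names is an assumption.

```python
import re
from typing import List

# Only a-zA-Z0-9_ are allowed, and the name must not start with a digit.
_SERVICE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def parse_aux_services(value: str) -> List[str]:
    """Split a comma-separated aux-services value and validate each name."""
    names = [part.strip() for part in value.split(",") if part.strip()]
    for name in names:
        if _SERVICE_NAME.fullmatch(name) is None:
            raise ValueError(f"invalid aux service name: {name!r}")
    return names


if __name__ == "__main__":
    print(parse_aux_services("mapreduce_shuffle, spark_shuffle"))
```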
Boolean indicating whether loading aux services from a manifest
is enabled. If enabled, aux services may be dynamically modified through
reloading the manifest via filesystem changes or a REST API. When
enabled, aux services configuration properties unrelated to the manifest
will be ignored.
yarn.nodemanager.aux-services.manifest.enabled
false
A file containing auxiliary service specifications.
yarn.nodemanager.aux-services.manifest
Length of time in ms to wait between reloading aux services
manifest. If 0 or less, manifest will not be reloaded.
yarn.nodemanager.aux-services.manifest.reload-ms
0
Number of ms to wait between sending a SIGTERM and a SIGKILL to a container.
yarn.nodemanager.sleep-delay-before-sigkill.ms
250
Max time to wait for a process to come up when trying to cleanup a container
yarn.nodemanager.process-kill-wait.ms
5000
The minimum allowed version of a resourcemanager that a nodemanager will connect to.
The valid values are NONE (no version checking), EqualToNM (the resourcemanager's version is
equal to or greater than the NM version), or a Version String.
yarn.nodemanager.resourcemanager.minimum.version
NONE
Maximum size of a container's diagnostics to keep when relaunching
a container.
yarn.nodemanager.container-diagnostics-maximum-size
10000
Minimum container restart interval in milliseconds.
yarn.nodemanager.container-retry-minimum-interval-ms
1000
Max number of threads in NMClientAsync to process container
management events
yarn.client.nodemanager-client-async.thread-pool-max-size
500
Max time to wait to establish a connection to NM
yarn.client.nodemanager-connect.max-wait-ms
180000
Time interval between each attempt to connect to NM
yarn.client.nodemanager-connect.retry-interval-ms
10000
Max time to wait for NM to connect to RM.
When not set, proxy will fall back to use value of
yarn.resourcemanager.connect.max-wait.ms.
yarn.nodemanager.resourcemanager.connect.max-wait.ms
Time interval between each NM attempt to connect to RM.
When not set, proxy will fall back to use value of
yarn.resourcemanager.connect.retry-interval.ms.
yarn.nodemanager.resourcemanager.connect.retry-interval.ms
Maximum number of proxy connections to cache for node managers. If set
to a value greater than zero then the cache is enabled and the NMClient
and MRAppMaster will cache the specified number of node manager proxies.
There will be at max one proxy per node manager. Ex. configuring it to a
value of 5 will make sure that client will at max have 5 proxies cached
with 5 different node managers. These connections for these proxies will
be timed out if idle for more than the system wide idle timeout period.
Note that this could cause issues on large clusters as many connections
could linger simultaneously and lead to a large number of connection
threads. The token used for authentication will be used only at
connection creation time. If a new token is received then the earlier
connection should be closed in order to use the new token. This and
(yarn.client.nodemanager-client-async.thread-pool-max-size) are related
and should be in sync (no need for them to be equal).
If the value of this property is zero then the connection cache is
disabled and connections will use a zero idle timeout to prevent too
many connection threads on large clusters.
yarn.client.max-cached-nodemanagers-proxies
0
Enable the node manager to recover after starting
yarn.nodemanager.recovery.enabled
false
The local filesystem directory in which the node manager will
store state when recovery is enabled.
yarn.nodemanager.recovery.dir
${hadoop.tmp.dir}/yarn-nm-recovery
The time in seconds between full compactions of the NM state
database. Setting the interval to zero disables the full compaction
cycles.
yarn.nodemanager.recovery.compaction-interval-secs
3600
Whether the nodemanager is running under supervision. A
nodemanager that supports recovery and is running under supervision
will not try to clean up containers as it exits, on the assumption that
it will be restarted immediately and recover containers.
yarn.nodemanager.recovery.supervised
false
Adjustment to the container OS scheduling priority. In Linux, passed
directly to the nice command. If unspecified then containers are launched
without any explicit OS priority.
yarn.nodemanager.container-executor.os.sched.priority.adjustment
Flag to enable container metrics
yarn.nodemanager.container-metrics.enable
true
Container metrics flush period in ms. Set to -1 for flush on completion.
yarn.nodemanager.container-metrics.period-ms
-1
The delay, in ms, before container metrics are unregistered after completion.
yarn.nodemanager.container-metrics.unregister-delay-ms
10000
Class used to calculate current container resource utilization.
yarn.nodemanager.container-monitor.process-tree.class
Flag to enable NodeManager disk health checker
yarn.nodemanager.disk-health-checker.enable
true
Number of threads to use in NM log cleanup. Used when log aggregation
is disabled.
yarn.nodemanager.log.deletion-threads-count
4
The Windows group that the windows-container-executor should run as.
yarn.nodemanager.windows-secure-container-executor.group
yarn.nodemanager.aux-services.mapreduce_shuffle.class
org.apache.hadoop.mapred.ShuffleHandler
The kerberos principal for the proxy, if the proxy is not
running as part of the RM.
yarn.web-proxy.principal
Keytab for WebAppProxy, if the proxy is not running as part of
the RM.
yarn.web-proxy.keytab
The address for the web proxy as HOST:PORT. If this is not
given, the proxy will run as part of the RM.
yarn.web-proxy.address
The actual address the web proxy will bind to. If this optional
address is set, it overrides only the hostname portion of yarn.web-proxy.address.
This is useful for making the web proxy server listen on all interfaces by setting
it to 0.0.0.0
yarn.web-proxy.bind-host
Enable the web proxy connection timeout, default is enabled.
yarn.resourcemanager.proxy.timeout.enabled
true
The web proxy connection timeout.
yarn.resourcemanager.proxy.connection.timeout
60000
CLASSPATH for YARN applications. A comma-separated list
of CLASSPATH entries. When this value is empty, the following default
CLASSPATH for YARN applications would be used.
For Linux:
$HADOOP_CONF_DIR,
$HADOOP_COMMON_HOME/share/hadoop/common/*,
$HADOOP_COMMON_HOME/share/hadoop/common/lib/*,
$HADOOP_HDFS_HOME/share/hadoop/hdfs/*,
$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,
$HADOOP_YARN_HOME/share/hadoop/yarn/*,
$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*
For Windows:
%HADOOP_CONF_DIR%,
%HADOOP_COMMON_HOME%/share/hadoop/common/*,
%HADOOP_COMMON_HOME%/share/hadoop/common/lib/*,
%HADOOP_HDFS_HOME%/share/hadoop/hdfs/*,
%HADOOP_HDFS_HOME%/share/hadoop/hdfs/lib/*,
%HADOOP_YARN_HOME%/share/hadoop/yarn/*,
%HADOOP_YARN_HOME%/share/hadoop/yarn/lib/*
yarn.application.classpath
Indicate what is the current version of the running
timeline service. For example, if "yarn.timeline-service.version" is 1.5,
and "yarn.timeline-service.enabled" is true, it means the cluster will and
should bring up the timeline service v.1.5 (and nothing else).
On the client side, if the client uses the same version of timeline service,
it should succeed. If the client chooses to use a smaller version in spite of this,
then depending on how robust the compatibility story is between versions,
the results may vary.
yarn.timeline-service.version
1.0f
On the server side, this indicates whether the timeline service is enabled or not.
On the client side, users can enable it to indicate whether the client wants
to use the timeline service. If it is enabled on the client side along with
security, then the yarn client tries to fetch the delegation tokens for the
timeline server.
yarn.timeline-service.enabled
false
The hostname of the timeline service web application.
yarn.timeline-service.hostname
0.0.0.0
This is the default address for the timeline server to start the
RPC server.
yarn.timeline-service.address
${yarn.timeline-service.hostname}:10200
The http address of the timeline service web application.
yarn.timeline-service.webapp.address
${yarn.timeline-service.hostname}:8188
The https address of the timeline service web application.
yarn.timeline-service.webapp.https.address
${yarn.timeline-service.hostname}:8190
The actual address the server will bind to. If this optional address is
set, the RPC and webapp servers will bind to this address and the port specified in
yarn.timeline-service.address and yarn.timeline-service.webapp.address, respectively.
This is most useful for making the service listen to all interfaces by setting to
0.0.0.0.
yarn.timeline-service.bind-host
Defines the maximum number of applications that can be fetched using the REST API or
the application history protocol and shown in the timeline server web UI.
yarn.timeline-service.generic-application-history.max-applications
10000
Store class name for timeline store.
yarn.timeline-service.store-class
org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore
Enable age off of timeline store data.
yarn.timeline-service.ttl-enable
true
Time to live for timeline store data in milliseconds.
yarn.timeline-service.ttl-ms
604800000
Store file name for leveldb timeline store.
yarn.timeline-service.leveldb-timeline-store.path
${hadoop.tmp.dir}/yarn/timeline
Length of time to wait between deletion cycles of leveldb timeline store in milliseconds.
yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms
300000
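The millisecond defaults of the two TTL properties above are easier to sanity-check once converted to larger units. A minimal Python sketch, using only the default values listed above (the variable names are illustrative):

```python
from datetime import timedelta

# Default values copied from the two properties above.
ttl_ms = 604800000         # yarn.timeline-service.ttl-ms
ttl_interval_ms = 300000   # yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms

ttl = timedelta(milliseconds=ttl_ms)
ttl_interval = timedelta(milliseconds=ttl_interval_ms)

# Days of timeline data retained, and minutes between deletion cycles.
print(ttl.days)
print(ttl_interval.total_seconds() / 60)
```

With the defaults, data is kept for 7 days and deletion cycles run 5 minutes apart.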
Size of read cache for uncompressed blocks for leveldb timeline store in bytes.
yarn.timeline-service.leveldb-timeline-store.read-cache-size
104857600
Size of cache for recently read entity start times for leveldb timeline store in number of entities.
yarn.timeline-service.leveldb-timeline-store.start-time-read-cache-size
10000
Size of cache for recently written entity start times for leveldb timeline store in number of entities.
yarn.timeline-service.leveldb-timeline-store.start-time-write-cache-size
10000
Handler thread count to serve the client RPC requests.
yarn.timeline-service.handler-thread-count
10
yarn.timeline-service.http-authentication.type
simple
Defines authentication used for the timeline server HTTP endpoint.
Supported values are: simple | kerberos | #AUTHENTICATION_HANDLER_CLASSNAME#
yarn.timeline-service.http-authentication.simple.anonymous.allowed
true
Indicates if anonymous requests are allowed by the timeline server when using
'simple' authentication.
The Kerberos principal for the timeline server.
yarn.timeline-service.principal
The Kerberos keytab for the timeline server.
yarn.timeline-service.keytab
/etc/krb5.keytab
Comma separated list of UIs that will be hosted
yarn.timeline-service.ui-names
Default maximum number of retries for the timeline service client.
A value of -1 means no limit.
yarn.timeline-service.client.max-retries
30
Client policy for whether timeline operations are non-fatal.
Should the failure to obtain a delegation token be considered an application
failure (option = false), or should the client attempt to continue to
publish information without it (option=true)
yarn.timeline-service.client.best-effort
false
Default retry time interval for timeline service client.
yarn.timeline-service.client.retry-interval-ms
1000
The time period for which timeline v2 client will wait for draining
leftover entities after stop.
yarn.timeline-service.client.drain-entities.timeout.ms
2000
Enable timeline server to recover state after starting. If
true, then yarn.timeline-service.state-store-class must be specified.
yarn.timeline-service.recovery.enabled
false
Store class name for timeline state store.
yarn.timeline-service.state-store-class
org.apache.hadoop.yarn.server.timeline.recovery.LeveldbTimelineStateStore
Store file name for leveldb state store.
yarn.timeline-service.leveldb-state-store.path
${hadoop.tmp.dir}/yarn/timeline
yarn.timeline-service.entity-group-fs-store.cache-store-class
org.apache.hadoop.yarn.server.timeline.MemoryTimelineStore
Caching storage timeline server v1.5 is using.
yarn.timeline-service.entity-group-fs-store.active-dir
/tmp/entity-file-history/active
HDFS path to store active application’s timeline data
yarn.timeline-service.entity-group-fs-store.done-dir
/tmp/entity-file-history/done/
HDFS path to store done application’s timeline data
yarn.timeline-service.entity-group-fs-store.group-id-plugin-classes
Plugins that can translate a timeline entity read request into
a list of timeline entity group ids, separated by commas.
yarn.timeline-service.entity-group-fs-store.group-id-plugin-classpath
Classpath for all plugins defined in
yarn.timeline-service.entity-group-fs-store.group-id-plugin-classes.
yarn.timeline-service.entity-group-fs-store.summary-store
Summary storage for ATS v1.5
org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore
yarn.timeline-service.entity-group-fs-store.scan-interval-seconds
Scan interval for ATS v1.5 entity group file system storage reader. This
value controls how frequently the reader will scan the HDFS active directory
for application status.
60
yarn.timeline-service.entity-group-fs-store.cleaner-interval-seconds
Scan interval for ATS v1.5 entity group file system storage cleaner. This
value controls how frequently the cleaner will scan the HDFS done directory
for stale application data.
3600
yarn.timeline-service.entity-group-fs-store.retain-seconds
How long the ATS v1.5 entity group file system storage will keep an
application's data in the done directory.
604800
yarn.timeline-service.entity-group-fs-store.leveldb-cache-read-cache-size
Read cache size for the leveldb cache storage in ATS v1.5 plugin storage.
10485760
yarn.timeline-service.entity-group-fs-store.app-cache-size
Size of the reader cache for ATS v1.5 reader. This value controls how many
entity groups the ATS v1.5 server should cache. If the number of active
read entity groups is greater than the number of cached items, some reads
may return empty data. This value must be greater than 0.
10
yarn.timeline-service.client.fd-flush-interval-secs
Flush interval for ATS v1.5 writer. This value controls how frequently
the writer will flush the HDFS FSStream for the entity/domain.
10
yarn.timeline-service.client.fd-clean-interval-secs
Scan interval for ATS v1.5 writer. This value controls how frequently
the writer will scan the HDFS FSStream for the entity/domain.
If the FSStream is stale for a long time, this FSStream will be closed.
60
yarn.timeline-service.client.fd-retain-secs
How long the ATS v1.5 writer will keep an FSStream open.
If this FSStream does not write anything for this configured time,
it will be closed.
300
yarn.timeline-service.writer.class
Storage implementation ATS v2 will use for the TimelineWriter service.
org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineWriterImpl
yarn.timeline-service.reader.class
Storage implementation ATS v2 will use for the TimelineReader service.
org.apache.hadoop.yarn.server.timelineservice.storage.HBaseTimelineReaderImpl
yarn.timeline-service.client.internal-timers-ttl-secs
How long the internal timer tasks can stay alive in the writer. If there is no
write operation for this configured time, the internal timer tasks will
be closed.
420
The setting that controls how often the timeline collector
flushes the timeline writer.
yarn.timeline-service.writer.flush-interval-seconds
60
The setting that decides the capacity of the queue to hold
asynchronous timeline entities.
yarn.timeline-service.writer.async.queue.capacity
100
Time period till which the application collector will be alive
in NM, after the application master container finishes.
yarn.timeline-service.app-collector.linger-period.ms
60000
The timeline v2 client tries to merge this many
async entities (if available) and then calls the ATS v2 REST API to submit them.
yarn.timeline-service.timeline-client.number-of-async-entities-to-merge
10
The setting that controls how long the final value
of a metric of a completed app is retained before merging into
the flow sum. Up to this time after an application is completed
out-of-order values that arrive can be recognized and discarded at the
cost of increased storage.
yarn.timeline-service.hbase.coprocessor.app-final-value-retention-milliseconds
259200000
The setting that controls how often in-memory app level
aggregation is kicked off in timeline collector.
yarn.timeline-service.app-aggregation-interval-secs
15
The default hdfs location for flowrun coprocessor jar.
yarn.timeline-service.hbase.coprocessor.jar.hdfs.location
/hbase/coprocessor/hadoop-yarn-server-timelineservice.jar
The value of this parameter sets the prefix for all tables that are part of
timeline service in the hbase storage schema. It can be set to "dev."
or "staging." if it is to be used for development or staging instances.
This way the data in production tables stays in a separate set of tables
prefixed by "prod.".
yarn.timeline-service.hbase-schema.prefix
prod.
Optional URL to an hbase-site.xml configuration file to be
used to connect to the timeline-service hbase cluster. If empty or not
specified, then the HBase configuration will be loaded from the classpath.
When specified the values in the specified configuration file will override
those from the ones that are present on the classpath.
yarn.timeline-service.hbase.configuration.file
Removes the UUID if present and limits the flow name length to
the given value for ATSv2. If the value is negative or 0,
it only removes the UUID and does not limit the flow name length.
yarn.timeline-service.flowname.max-size
0
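The flow name handling described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the timeline service's own code: the function name `shorten_flow_name`, the UUID regular expression, and the sample flow name are all hypothetical.

```python
import re

# Assumption: a UUID is the usual 8-4-4-4-12 hexadecimal form.
UUID_PATTERN = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)

def shorten_flow_name(flow_name, max_size):
    """Strip any UUID, then truncate only when max_size is positive."""
    shortened = UUID_PATTERN.sub("", flow_name)
    if max_size > 0:
        shortened = shortened[:max_size]
    return shortened

name = "etl-job-123e4567-e89b-12d3-a456-426614174000"
print(shorten_flow_name(name, 0))   # UUID removed, no length limit
print(shorten_flow_name(name, 3))   # UUID removed, then cut to 3 characters
```

With the default value of 0, only the UUID removal applies.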
Whether the shared cache is enabled
yarn.sharedcache.enabled
false
The root directory for the shared cache
yarn.sharedcache.root-dir
/sharedcache
The level of nested directories before getting to the checksum
directories. It must be non-negative.
yarn.sharedcache.nested-level
3
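To make the nested-level setting concrete, the sketch below builds a checksum directory path under the shared cache root. It assumes that each nesting level is named after one leading character of the checksum; that layout, the function name `cache_entry_dir`, and the sample checksum are assumptions for illustration, not a statement of how the SCM lays out its directories.

```python
import posixpath

def cache_entry_dir(root_dir, checksum, nested_level):
    """Return root/<c0>/<c1>/.../<checksum> with nested_level single-character levels."""
    if nested_level < 0:
        raise ValueError("nested level must be non-negative")
    if len(checksum) < nested_level:
        raise ValueError("checksum is shorter than the nested level")
    # Assumption: level i is named after the i-th character of the checksum.
    levels = [checksum[i] for i in range(nested_level)]
    return posixpath.join(root_dir, *levels, checksum)

print(cache_entry_dir("/sharedcache", "9c5a1f", 3))
```

Under this assumed layout, a nested level of 0 places the checksum directories directly under the root.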
The implementation to be used for the SCM store
yarn.sharedcache.store.class
org.apache.hadoop.yarn.server.sharedcachemanager.store.InMemorySCMStore
The implementation to be used for the SCM app-checker
yarn.sharedcache.app-checker.class
org.apache.hadoop.yarn.server.sharedcachemanager.RemoteAppChecker
A resource in the in-memory store is considered stale
if the time since the last reference exceeds the staleness period.
This value is specified in minutes.
yarn.sharedcache.store.in-memory.staleness-period-mins
10080
Initial delay before the in-memory store runs its first check
to remove dead initial applications. Specified in minutes.
yarn.sharedcache.store.in-memory.initial-delay-mins
10
The frequency at which the in-memory store checks to remove
dead initial applications. Specified in minutes.
yarn.sharedcache.store.in-memory.check-period-mins
720
The address of the admin interface in the SCM (shared cache manager)
yarn.sharedcache.admin.address
0.0.0.0:8047
The number of threads used to handle SCM admin interface (1 by default)
yarn.sharedcache.admin.thread-count
1
The address of the web application in the SCM (shared cache manager)
yarn.sharedcache.webapp.address
0.0.0.0:8788
The frequency at which a cleaner task runs.
Specified in minutes.
yarn.sharedcache.cleaner.period-mins
1440
Initial delay before the first cleaner task is scheduled.
Specified in minutes.
yarn.sharedcache.cleaner.initial-delay-mins
10
The time to sleep between processing each shared cache
resource. Specified in milliseconds.
yarn.sharedcache.cleaner.resource-sleep-ms
0
The address of the node manager interface in the SCM
(shared cache manager)
yarn.sharedcache.uploader.server.address
0.0.0.0:8046
The number of threads used to handle shared cache manager
requests from the node manager (50 by default)
yarn.sharedcache.uploader.server.thread-count
50
The address of the client interface in the SCM
(shared cache manager)
yarn.sharedcache.client-server.address
0.0.0.0:8045
The number of threads used to handle shared cache manager
requests from clients (50 by default)
yarn.sharedcache.client-server.thread-count
50
The algorithm used to compute checksums of files (SHA-256 by
default)
yarn.sharedcache.checksum.algo.impl
org.apache.hadoop.yarn.sharedcache.ChecksumSHA256Impl
The replication factor for the node manager uploader for the
shared cache (10 by default)
yarn.sharedcache.nm.uploader.replication.factor
10
The number of threads used to upload files from a node manager
instance (20 by default)
yarn.sharedcache.nm.uploader.thread-count
20
ACL protocol for use in the Timeline server.
security.applicationhistory.protocol.acl
Set to true for MiniYARNCluster unit tests
yarn.is.minicluster
false
Set for MiniYARNCluster unit tests to control resource monitoring
yarn.minicluster.control-resource-monitoring
false
Set to false in order to allow MiniYARNCluster to run tests without
port conflicts.
yarn.minicluster.fixed.ports
false
Set to false in order to allow the NodeManager in MiniYARNCluster to
use RPC to talk to the RM.
yarn.minicluster.use-rpc
false
Same as the yarn.nodemanager.resource.memory-mb property, but for the NodeManager
in a MiniYARNCluster.
yarn.minicluster.yarn.nodemanager.resource.memory-mb
4096
Enable node labels feature
yarn.node-labels.enabled
false
URI for NodeLabelManager. The default value is
/tmp/hadoop-yarn-${user}/node-labels/ in the local filesystem.
yarn.node-labels.fs-store.root-dir
Set configuration type for node labels. Administrators can specify
"centralized", "delegated-centralized" or "distributed".
yarn.node-labels.configuration-type
centralized
When "yarn.node-labels.configuration-type" is configured with "distributed"
in RM, Administrators can configure in NM the provider for the
node labels by configuring this parameter. Administrators can
configure "config", "script" or the class name of the provider. Configured
class needs to extend
org.apache.hadoop.yarn.server.nodemanager.nodelabels.NodeLabelsProvider.
If "config" is configured, then "ConfigurationNodeLabelsProvider" and if
"script" is configured, then "ScriptNodeLabelsProvider" will be used.
yarn.nodemanager.node-labels.provider
When "yarn.nodemanager.node-labels.provider" is configured with "config",
"script" or a class that extends AbstractNodeLabelsProvider, node labels
are retrieved periodically from the node labels provider. This
configuration defines the interval.
If -1 is configured then node labels are retrieved from provider only
during initialization. Defaults to 10 mins.
yarn.nodemanager.node-labels.provider.fetch-interval-ms
600000
Interval at which NM syncs its node labels with RM. NM sends its loaded
labels to RM, along with the heartbeat, once per configured interval.
yarn.nodemanager.node-labels.resync-interval-ms
120000
When "yarn.nodemanager.node-labels.provider" is configured with "config"
then ConfigurationNodeLabelsProvider fetches the partition label from this
parameter.
yarn.nodemanager.node-labels.provider.configured-node-partition
When "yarn.nodemanager.node-labels.provider" is configured with "Script"
then this configuration provides the timeout period after which it will
interrupt the script which queries the Node labels. Defaults to 20 mins.
yarn.nodemanager.node-labels.provider.fetch-timeout-ms
1200000
When node labels "yarn.node-labels.configuration-type" is
of type "delegated-centralized", administrators should configure
the class for fetching node labels by ResourceManager. Configured
class needs to extend
org.apache.hadoop.yarn.server.resourcemanager.nodelabels.RMNodeLabelsMappingProvider.
yarn.resourcemanager.node-labels.provider
When "yarn.node-labels.configuration-type" is configured with
"delegated-centralized", then node labels of all nodes
are updated by periodically retrieving node labels from the
provider. If -1 is configured then node labels are retrieved
from provider only once for each node after it registers.
Defaults to 30 mins.
yarn.resourcemanager.node-labels.provider.fetch-interval-ms
1800000
When "yarn.node-labels.configuration-type" is configured with
"delegated-centralized", then node labels of newly registered
nodes are updated by periodically retrieving node labels from
the provider. Defaults to 30 secs.
yarn.resourcemanager.node-labels.provider.update-newly-registered-nodes-interval-ms
30000
Overwrites default-node-label-expression only for the ApplicationMaster
container. It is disabled by default.
yarn.resourcemanager.node-labels.am.default-node-label-expression
Flag to indicate whether the AM can be allocated to non-exclusive nodes or not.
Default is false.
yarn.resourcemanager.node-labels.am.allow-non-exclusive-allocation
false
This property determines which provider will be plugged by the
node manager to collect node-attributes. Administrators can
configure "config", "script" or the class name of the provider.
Configured class needs to extend
org.apache.hadoop.yarn.server.nodemanager.nodelabels.NodeAttributesProvider.
If "config" is configured, then "ConfigurationNodeAttributesProvider" and if
"script" is configured, then "ScriptBasedNodeAttributesProvider"
will be used.
yarn.nodemanager.node-attributes.provider
The node attribute script NM runs to collect node attributes.
Each script output line starting with "NODE_ATTRIBUTE:" is
treated as a node attribute record; the attribute name, type
and value should be delimited by commas. Each such line
is parsed into a node attribute.
yarn.nodemanager.node-attributes.provider.script.path
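The record format described above can be illustrated with a small Python parser. This is a sketch, not the NodeManager's parser: the function name, the sample attribute names and types, and the choice to silently skip malformed records are all assumptions.

```python
ATTRIBUTE_PREFIX = "NODE_ATTRIBUTE:"

def parse_node_attributes(script_output):
    """Return (name, type, value) tuples for each NODE_ATTRIBUTE: line."""
    records = []
    for line in script_output.splitlines():
        line = line.strip()
        if not line.startswith(ATTRIBUTE_PREFIX):
            continue  # any other script output is ignored
        fields = [f.strip() for f in line[len(ATTRIBUTE_PREFIX):].split(",")]
        if len(fields) != 3:
            continue  # assumption: malformed records are skipped
        records.append(tuple(fields))
    return records

# Hypothetical script output; the attribute names and types are made up.
output = """collecting attributes
NODE_ATTRIBUTE:os,STRING,linux
NODE_ATTRIBUTE:gpu,STRING,true
NODE_ATTRIBUTE:incomplete
"""
for record in parse_node_attributes(output):
    print(record)
```

A script pointed to by this property only needs to print such lines to standard output; how the NM treats a malformed line may differ from the skip shown here.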
Command arguments passed to the node attribute script.
yarn.nodemanager.node-attributes.provider.script.opts
Time interval that determines how often NM fetches node attributes
from a given provider. If -1 is configured then node attributes are
retrieved from provider only during initialization. Defaults to 10 mins.
yarn.nodemanager.node-attributes.provider.fetch-interval-ms
600000
Timeout period after which NM will interrupt the node attribute
provider script which queries node attributes. Defaults to 20 mins.
yarn.nodemanager.node-attributes.provider.fetch-timeout-ms
1200000
When "yarn.nodemanager.node-attributes.provider" is configured with
"config" then ConfigurationNodeAttributesProvider fetches node attributes
from this parameter.
yarn.nodemanager.node-attributes.provider.configured-node-attributes
Interval at which NM syncs its node attributes with RM. NM sends its loaded
attributes to RM, along with the heartbeat, once per configured interval.
yarn.nodemanager.node-attributes.resync-interval-ms
120000
Timeout in seconds for YARN node graceful decommission.
This is the maximum time to wait for running containers and applications to complete
before transitioning a DECOMMISSIONING node into DECOMMISSIONED.
yarn.resourcemanager.nodemanager-graceful-decommission-timeout-secs
3600
Interval in seconds of DecommissioningNodesWatcher internal polling.
yarn.resourcemanager.decommissioning-nodes-watcher.poll-interval-secs
20
Used to specify custom web services for Resourcemanager. Value can be
classnames separated by comma.
Ex: org.apache.hadoop.yarn.server.resourcemanager.webapp.RMWebServices,
org.apache.hadoop.yarn.server.resourcemanager.webapp.DummyClass
yarn.http.rmwebapp.external.classes
Used to specify custom scheduler page
yarn.http.rmwebapp.scheduler.page.class
Used to specify custom DAO classes used by custom web services.
yarn.http.rmwebapp.custom.dao.classes
Used to specify custom DAO classes used by custom web services which requires
root unwrapping.
yarn.http.rmwebapp.custom.unwrapped.dao.classes
Used to specify custom WebServices class to bind with RMWebApp overriding
the default RMWebServices.
yarn.webapp.custom.webservice.class
The node label script to run. A script output line starting with
"NODE_PARTITION:" is treated as the node label partition. If
multiple lines match this pattern, the last one is used.
yarn.nodemanager.node-labels.provider.script.path
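The "last matching line wins" rule above can be sketched in Python as follows. The function name and the sample output are hypothetical, and this is not the NodeManager's own parsing code.

```python
PARTITION_PREFIX = "NODE_PARTITION:"

def parse_node_partition(script_output):
    """Return the partition from the last NODE_PARTITION: line, or None."""
    partition = None
    for line in script_output.splitlines():
        line = line.strip()
        if line.startswith(PARTITION_PREFIX):
            # A later matching line overrides an earlier one.
            partition = line[len(PARTITION_PREFIX):].strip()
    return partition

# Hypothetical script output with two matching lines.
output = "NODE_PARTITION:cpu\nsome other output\nNODE_PARTITION:gpu\n"
print(parse_node_partition(output))
```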
The arguments to pass to the Node label script.
yarn.nodemanager.node-labels.provider.script.opts
Flag to indicate whether the RM is participating in Federation or not.
yarn.federation.enabled
false
Initial delay for federation state-store heartbeat service. The value is followed by a unit
specifier: ns, us, ms, s, m, h, d for nanoseconds, microseconds, milliseconds, seconds,
minutes, hours, days respectively. Values should provide units,
but seconds are assumed if the unit is omitted.
yarn.federation.state-store.heartbeat.initial-delay
30s
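The unit suffixes listed above can be illustrated with a small Python parser that falls back to seconds when no unit is given. It is a sketch of the value format only; the function name and the strictness of the accepted syntax are assumptions, and Hadoop's own parsing may accept or reject other forms.

```python
import re

# Nanoseconds per unit, for the suffixes listed in the description above.
UNIT_NANOS = {
    "ns": 1,
    "us": 1_000,
    "ms": 1_000_000,
    "s": 1_000_000_000,
    "m": 60 * 1_000_000_000,
    "h": 3_600 * 1_000_000_000,
    "d": 86_400 * 1_000_000_000,
}

def duration_to_nanos(value):
    """Parse values such as '30s' or '500ms'; a bare number means seconds."""
    match = re.fullmatch(r"\s*(\d+)\s*(ns|us|ms|s|m|h|d)?\s*", value)
    if match is None:
        raise ValueError(f"not a duration: {value!r}")
    number, unit = match.groups()
    return int(number) * UNIT_NANOS[unit or "s"]

print(duration_to_nanos("30s") == duration_to_nanos("30"))
```

Under this reading, the default of 30s and a bare 30 describe the same delay.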
Machine list file to be loaded by the FederationSubCluster Resolver
yarn.federation.machine-list
Class name for SubClusterResolver
yarn.federation.subcluster-resolver.class
org.apache.hadoop.yarn.server.federation.resolver.DefaultSubClusterResolverImpl
Store class name for federation state store
yarn.federation.state-store.class
org.apache.hadoop.yarn.server.federation.store.impl.MemoryFederationStateStore
The time in seconds after which the federation state store local cache
will be refreshed periodically
yarn.federation.cache-ttl.secs
300
The registry base directory for federation.
yarn.federation.registry.base-dir
yarnfederation/
The registry implementation to use.
yarn.registry.class
org.apache.hadoop.registry.client.impl.FSRegistryOperationsService
The interval that the yarn client library uses to poll the
completion status of the asynchronous API of application client protocol.
yarn.client.application-client-protocol.poll-interval-ms
200
The duration (in ms) the YARN client waits for an expected state change
to occur. -1 means unlimited wait time.
yarn.client.application-client-protocol.poll-timeout-ms
-1
RSS usage of a process computed via
/proc/pid/stat is not very accurate as it includes shared pages of a
process. /proc/pid/smaps provides useful information like
Private_Dirty, Private_Clean, Shared_Dirty, Shared_Clean which can be used
for computing more accurate RSS. When this flag is enabled, RSS is computed
as Min(Shared_Dirty, Pss) + Private_Clean + Private_Dirty. It excludes
read-only shared mappings in RSS computation.
yarn.nodemanager.container-monitor.procfs-tree.smaps-based-rss.enabled
false
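The formula above can be written out as a short Python sketch. It applies the formula to each memory mapping and sums the results, which is an assumption about granularity; the dict layout and the sample numbers are hypothetical, and reading /proc/pid/smaps itself is left out.

```python
def smaps_based_rss_kb(mappings):
    """Sum Min(Shared_Dirty, Pss) + Private_Clean + Private_Dirty over mappings.

    Each mapping is a dict of smaps fields in kB (hypothetical layout).
    """
    total = 0
    for m in mappings:
        total += min(m["Shared_Dirty"], m["Pss"]) + m["Private_Clean"] + m["Private_Dirty"]
    return total

# Two made-up mappings, values in kB.
example = [
    {"Shared_Dirty": 8, "Pss": 4, "Private_Clean": 16, "Private_Dirty": 32},
    {"Shared_Dirty": 0, "Pss": 12, "Private_Clean": 12, "Private_Dirty": 0},
]
print(smaps_based_rss_kb(example))
```

Shared_Clean does not appear in the sum, which is how clean shared pages are left out of the estimate.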
URL for log aggregation server
yarn.log.server.url
URL for log aggregation server web service
yarn.log.server.web-service.url
RM Application Tracking URL
yarn.tracking.url.generator
Class to be used for YarnAuthorizationProvider
yarn.authorization-provider
Defines how often NMs wake up to upload log files.
The default value is -1. By default, the logs will be uploaded when
the application is finished. By setting this configuration logs can
be uploaded periodically while the application is running.
The minimum positive accepted value can be configured by the setting
"yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds.min".
yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds
-1
Defines the positive minimum hard limit for
"yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds".
If this configuration has been set less than its default value (3600)
the NodeManager may raise a warning.
yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds.min
3600
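One way to read the interaction between the interval and its minimum is sketched below in Python. It assumes that non-positive values disable periodic upload and that positive values below the minimum are raised to the minimum; both points are assumptions drawn from the descriptions above, and the function name is hypothetical.

```python
def effective_roll_interval(configured_secs, min_secs=3600):
    """Return the interval NMs would use, or -1 when periodic upload is off."""
    if configured_secs <= 0:
        # Assumption: logs are then uploaded only when the application finishes.
        return -1
    # Assumption: values below the hard limit are raised to it.
    return max(configured_secs, min_secs)

print(effective_roll_interval(-1))
print(effective_roll_interval(600))
print(effective_roll_interval(7200))
```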
Define how many aggregated log files per application per NM
we can have in remote file system. By default, the total number of
aggregated log files per application per NM is 30.
yarn.nodemanager.log-aggregation.num-log-files-per-app
30
Enable/disable intermediate-data encryption at YARN level. For now,
this is only used by the FileSystemRMStateStore to set up the right
file-system security attributes.
yarn.intermediate-data-encryption.enable
false
Flag to enable cross-origin (CORS) support in the NM. This flag
requires the CORS filter initializer to be added to the filter initializers
list in core-site.xml.
yarn.nodemanager.webapp.cross-origin.enabled
false
Defines maximum application priority in a cluster.
If an application is submitted with a priority higher than this value, it will be
reset to this maximum value.
yarn.cluster.max-application-priority
0
The default log aggregation policy class. Applications can
override it via LogAggregationContext. This configuration can provide
some cluster-side default behavior so that if the application doesn't
specify any policy via LogAggregationContext administrators of the cluster
can adjust the policy globally.
yarn.nodemanager.log-aggregation.policy.class
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AllContainerLogAggregationPolicy
The default parameters for the log aggregation policy. Applications can
override it via LogAggregationContext. This configuration can provide
some cluster-side default behavior so that if the application doesn't
specify any policy via LogAggregationContext administrators of the cluster
can adjust the policy globally.
yarn.nodemanager.log-aggregation.policy.parameters
Enable/Disable AMRMProxyService in the node manager. This service is used to
intercept calls from the application masters to the resource manager.
yarn.nodemanager.amrmproxy.enabled
false
The address of the AMRMProxyService listener.
yarn.nodemanager.amrmproxy.address
0.0.0.0:8049
The number of threads used to handle requests by the AMRMProxyService.
yarn.nodemanager.amrmproxy.client.thread-count
25
The comma separated list of class names that implement the
RequestInterceptor interface. This is used by the AMRMProxyService to create
the request processing pipeline for applications.
yarn.nodemanager.amrmproxy.interceptor-class.pipeline
org.apache.hadoop.yarn.server.nodemanager.amrmproxy.DefaultRequestInterceptor
Whether AMRMProxy HA is enabled.
yarn.nodemanager.amrmproxy.ha.enable
false
Setting that controls whether distributed scheduling is enabled.
yarn.nodemanager.distributed-scheduling.enabled
false
Setting that controls whether opportunistic container allocation
is enabled.
yarn.resourcemanager.opportunistic-container-allocation.enabled
false
Maximum number of opportunistic containers to be allocated per
Application Master heartbeat.
yarn.resourcemanager.opportunistic.max.container-allocation.per.am.heartbeat
-1
Number of nodes to be used by the Opportunistic Container Allocator for
dispatching containers during container allocation.
yarn.resourcemanager.opportunistic-container-allocation.nodes-used
10
Frequency for computing least loaded NMs.
yarn.resourcemanager.nm-container-queuing.sorting-nodes-interval-ms
1000
Comparator for determining node load for Distributed Scheduling.
yarn.resourcemanager.nm-container-queuing.load-comparator
QUEUE_LENGTH
Value of standard deviation used for calculation of queue limit thresholds.
yarn.resourcemanager.nm-container-queuing.queue-limit-stdev
1.0f
Min length of container queue at NodeManager.
yarn.resourcemanager.nm-container-queuing.min-queue-length
5
Max length of container queue at NodeManager.
yarn.resourcemanager.nm-container-queuing.max-queue-length
15
Min queue wait time for a container at a NodeManager.
yarn.resourcemanager.nm-container-queuing.min-queue-wait-time-ms
10
Max queue wait time for a container queue at a NodeManager.
yarn.resourcemanager.nm-container-queuing.max-queue-wait-time-ms
100
Use container pause as the preemption policy over kill in the container
queue at a NodeManager.
yarn.nodemanager.opportunistic-containers-use-pause-for-preemption
false
Error filename pattern, used to identify the file in the container's
log directory that contains the container's error log. Because error file
redirection is done by the client/AM, YARN is not aware of the error
file name. YARN uses this pattern to identify the error file and tail
the error log as diagnostics when the container execution returns a non-zero
value. Filename patterns are case sensitive and should match the
specifications of the FileSystem.globStatus(Path) API. If multiple filenames
match the pattern, the first file matching the pattern will be picked.
yarn.nodemanager.container.stderr.pattern
{*stderr*,*STDERR*}
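The default pattern above can be approximated in Python to see which file names it would select. This sketch expands the brace alternatives by hand and uses the standard `fnmatch` module for the wildcard part; it only approximates the glob semantics of FileSystem.globStatus(Path), does not handle nested braces, and the helper names and file names are hypothetical.

```python
from fnmatch import fnmatchcase

def expand_braces(pattern):
    """Expand one or more non-nested {a,b} groups into plain glob patterns."""
    start = pattern.find("{")
    end = pattern.find("}", start)
    if start == -1 or end == -1:
        return [pattern]
    head, body, tail = pattern[:start], pattern[start + 1:end], pattern[end + 1:]
    return [alt for option in body.split(",") for alt in expand_braces(head + option + tail)]

def find_error_file(filenames, pattern="{*stderr*,*STDERR*}"):
    """Return the first file name that matches the pattern, case sensitively."""
    alternatives = expand_braces(pattern)
    for name in filenames:
        if any(fnmatchcase(name, alt) for alt in alternatives):
            return name
    return None

print(find_error_file(["stdout", "syslog", "app.stderr.log", "STDERR"]))
```

Because matching is case sensitive, a file named `Stderr` would not be selected by the default pattern.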
Size of the container error file which needs to be tailed, in bytes.
yarn.nodemanager.container.stderr.tail.bytes
4096
Choose a different implementation of the node label store.
yarn.node-labels.fs-store.impl.class
org.apache.hadoop.yarn.nodelabels.FileSystemNodeLabelsStore
The replication factor for the FS store
files. The default value is 0, which means the file system's
default replication is used.
yarn.fs-store.file.replication
0
Enable the CSRF filter for the RM web app
yarn.resourcemanager.webapp.rest-csrf.enabled
false
Optional parameter that indicates the custom header name to use for CSRF
protection.
yarn.resourcemanager.webapp.rest-csrf.custom-header
X-XSRF-Header
Optional parameter that indicates the list of HTTP methods that do not
require CSRF protection
yarn.resourcemanager.webapp.rest-csrf.methods-to-ignore
GET,OPTIONS,HEAD
Enable the CSRF filter for the NM web app
yarn.nodemanager.webapp.rest-csrf.enabled
false
Optional parameter that indicates the custom header name to use for CSRF
protection.
yarn.nodemanager.webapp.rest-csrf.custom-header
X-XSRF-Header
Optional parameter that indicates the list of HTTP methods that do not
require CSRF protection
yarn.nodemanager.webapp.rest-csrf.methods-to-ignore
GET,OPTIONS,HEAD
The name of disk validator.
yarn.nodemanager.disk-validator
basic
Enable the CSRF filter for the timeline service web app
yarn.timeline-service.webapp.rest-csrf.enabled
false
Optional parameter that indicates the custom header name to use for CSRF
protection.
yarn.timeline-service.webapp.rest-csrf.custom-header
X-XSRF-Header
Optional parameter that indicates the list of HTTP methods that do not
require CSRF protection
yarn.timeline-service.webapp.rest-csrf.methods-to-ignore
GET,OPTIONS,HEAD
Enable the XFS filter for YARN
yarn.webapp.xfs-filter.enabled
true
Property specifying the xframe options value.
yarn.resourcemanager.webapp.xfs-filter.xframe-options
SAMEORIGIN
Property specifying the xframe options value.
yarn.nodemanager.webapp.xfs-filter.xframe-options
SAMEORIGIN
Property specifying the xframe options value.
yarn.timeline-service.webapp.xfs-filter.xframe-options
SAMEORIGIN
The minimum time, in milliseconds, that an inactive (decommissioned or shutdown) node can
stay in the nodes list of the resourcemanager after being declared untracked.
A node is marked untracked if and only if it is absent from both the include and
exclude nodemanager lists on the RM. All inactive nodes are checked twice per
timeout interval or every 10 minutes, whichever is less, and marked appropriately.
The same is done when the refreshNodes command (graceful or otherwise) is invoked.
yarn.resourcemanager.node-removal-untracked.timeout-ms
60000
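The check cadence described for yarn.resourcemanager.node-removal-untracked.timeout-ms ("twice per timeout interval or every 10 minutes, whichever is less") amounts to a small piece of arithmetic. The sketch below is a reading of that sentence, not the ResourceManager's code, and the function name is hypothetical.

```python
TEN_MINUTES_MS = 10 * 60 * 1000


def untracked_check_interval_ms(timeout_ms):
    """Interval between inactive-node checks, as described in the text.

    Twice per timeout interval means every timeout_ms / 2, capped at
    10 minutes. Integer division is an assumption of this sketch.
    """
    return min(timeout_ms // 2, TEN_MINUTES_MS)


if __name__ == "__main__":
    print(untracked_check_interval_ms(60000))          # default timeout
    print(untracked_check_interval_ms(60 * 60 * 1000)) # one-hour timeout
```

With the default of 60000 ms, the check would run every 30 seconds; the 10-minute cap only applies once the timeout exceeds 20 minutes.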
The RMAppLifetimeMonitor Service uses this value as monitor interval
yarn.resourcemanager.application-timeouts.monitor.interval-ms
3000
Specifies what the RM does regarding HTTPS enforcement for communication
with AM Web Servers, as well as generating and providing certificates.
Possible values are:
- NONE - the RM will do nothing special.
- LENIENT - the RM will generate and provide a keystore and truststore
to the AM, which it is free to use for HTTPS in its tracking URL web
server. The RM proxy will still allow HTTP connections to AMs that opt
not to use HTTPS.
- STRICT - this is the same as LENIENT, except that the RM proxy will
only allow HTTPS connections to AMs; HTTP connections will be blocked
and result in a warning page to the user.
yarn.resourcemanager.application-https.policy
NONE
Defines the limit of the diagnostics message of an application
attempt, in kilo characters (character count * 1024).
When using ZooKeeper to store application state, it is
important to limit the size of the diagnostic messages to
prevent YARN from overwhelming ZooKeeper. In cases where
yarn.resourcemanager.state-store.max-completed-applications is set to
a large number, it may be desirable to reduce the value of this property
to limit the total data stored.
yarn.app.attempt.diagnostics.limit.kc
64
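To make the "kilo characters" unit of yarn.app.attempt.diagnostics.limit.kc concrete, the following sketch converts the setting to a character count and gives a rough upper bound on stored diagnostics. The bound is only an illustration: the function names and the attempts_per_app parameter are hypothetical, and the real state store holds more than diagnostics text.

```python
def diagnostics_char_limit(limit_kc=64):
    """Characters allowed per application attempt (limit_kc * 1024)."""
    return limit_kc * 1024


def worst_case_stored_chars(max_completed_apps, attempts_per_app, limit_kc=64):
    """Rough upper bound on diagnostics characters kept in the state store.

    Assumes every attempt of every retained application fills its limit.
    """
    return max_completed_apps * attempts_per_app * diagnostics_char_limit(limit_kc)


if __name__ == "__main__":
    print(diagnostics_char_limit())                 # default limit in characters
    print(worst_case_stored_chars(10000, 2, 64))    # hypothetical cluster sizing
```

Lowering the limit reduces the bound linearly, which is why the description suggests reducing it when many completed applications are retained.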
Flag to enable cross-origin (CORS) support for timeline service v1.x or
Timeline Reader in timeline service v2. For timeline service v2, also add
org.apache.hadoop.security.HttpCrossOriginFilterInitializer to the
configuration hadoop.http.filter.initializers in core-site.xml.
yarn.timeline-service.http-cross-origin.enabled
false
The comma separated list of class names that implement the
RequestInterceptor interface. This is used by the RouterClientRMService
to create the request processing pipeline for users.
yarn.router.clientrm.interceptor-class.pipeline
org.apache.hadoop.yarn.server.router.clientrm.DefaultClientRequestInterceptor
The per-user thread pool executor size in the Router ClientRM Service FederationClientInterceptor.
yarn.router.interceptor.user.threadpool-size
5
Size of LRU cache for Router ClientRM Service and RMAdmin Service.
yarn.router.pipeline.cache-max-size
25
The comma separated list of class names that implement the
RequestInterceptor interface. This is used by the RouterRMAdminService
to create the request processing pipeline for users.
yarn.router.rmadmin.interceptor-class.pipeline
org.apache.hadoop.yarn.server.router.rmadmin.DefaultRMAdminRequestInterceptor
The actual address the server will bind to. If this optional address is
set, the RPC and webapp servers will bind to this address and the port specified in
yarn.router.address and yarn.router.webapp.address, respectively. This is
most useful for making Router listen to all interfaces by setting to 0.0.0.0.
yarn.router.bind-host
Comma-separated list of PlacementRules to determine how applications
submitted by certain users get mapped to certain queues. Default is
user-group, which corresponds to UserGroupMappingPlacementRule.
yarn.scheduler.queue-placement-rules
user-group
The comma separated list of class names that implement the
RequestInterceptor interface. This is used by the RouterWebServices
to create the request processing pipeline for users.
yarn.router.webapp.interceptor-class.pipeline
org.apache.hadoop.yarn.server.router.webapp.DefaultRequestInterceptorREST
The http address of the Router web application.
If only a host is provided as the value,
the webapp will be served on a random port.
yarn.router.webapp.address
0.0.0.0:8089
The https address of the Router web application.
If only a host is provided as the value,
the webapp will be served on a random port.
yarn.router.webapp.https.address
0.0.0.0:8091
TimelineClient 1.5 configuration that controls whether to store an active
application's timeline data within a user directory, i.e.
${yarn.timeline-service.entity-group-fs-store.active-dir}/${user.name}
yarn.timeline-service.entity-group-fs-store.with-user-dir
false
yarn.resource-types
The resource types to be used for scheduling. Use resource-types.xml
to specify details about the individual resource types.
yarn.webapp.filter-entity-list-by-user
false
Flag to enable display of applications per user as an admin
configuration.
yarn.webapp.filter-invalid-xml-chars
false
Flag to enable filter of invalid xml 1.0 characters present in the
value of diagnostics field of apps output from RM WebService.
The type of configuration store to use for scheduler configurations.
Default is "file", which uses file based capacity-scheduler.xml to
retrieve and change scheduler configuration. To enable API based
scheduler configuration, use either "memory" (in memory storage, no
persistence across restarts), "leveldb" (leveldb based storage), or
"zk" (zookeeper based storage). API based configuration is only useful
when using a scheduler which supports mutable configuration. Currently
only capacity scheduler supports this.
yarn.scheduler.configuration.store.class
file
The class to use for configuration mutation ACL policy if using a mutable
configuration provider. Controls whether a mutation request is allowed.
The DefaultConfigurationMutationACLPolicy checks if the requestor is a
YARN admin.
yarn.scheduler.configuration.mutation.acl-policy.class
org.apache.hadoop.yarn.server.resourcemanager.scheduler.DefaultConfigurationMutationACLPolicy
The storage path for LevelDB implementation of configuration store,
when yarn.scheduler.configuration.store.class is configured to be
"leveldb".
yarn.scheduler.configuration.leveldb-store.path
${hadoop.tmp.dir}/yarn/system/confstore
The compaction interval for LevelDB configuration store in secs,
when yarn.scheduler.configuration.store.class is configured to be
"leveldb". Default is one day.
yarn.scheduler.configuration.leveldb-store.compaction-interval-secs
86400
The max number of configuration change log entries kept in config
store, when yarn.scheduler.configuration.store.class is configured to be
"leveldb" or "zk". Default is 1000 for either.
yarn.scheduler.configuration.store.max-logs
1000
The file system directory to store the configuration files. The path
can be in any format as long as it follows a Hadoop-compatible scheme;
for example, the value "file:///path/to/dir" stores files on the local
file system, and the value "hdfs:///path/to/dir" stores files on HDFS.
If resource manager HA is enabled, using the hdfs scheme is recommended so
that it works in a fail-over scenario.
yarn.scheduler.configuration.fs.path
file://${hadoop.tmp.dir}/yarn/system/schedconf
The max number of configuration files kept in the file system.
Default is 100.
yarn.scheduler.configuration.max.version
100
ZK root node path for configuration store when using zookeeper-based
configuration store.
yarn.scheduler.configuration.zk-store.parent-path
/confstore
Provides an option for client to load supported resource types from RM
instead of depending on local resource-types.xml file.
yarn.client.load.resource-types.from-server
false
This setting controls if pluggable device framework is enabled.
Disabled by default
yarn.nodemanager.pluggable-device-framework.enabled
false
Configure vendor device plugin class name here. Comma separated.
The class must be found in CLASSPATH. The pluggable device framework will
load these classes.
yarn.nodemanager.pluggable-device-framework.device-classes
When yarn.nodemanager.resource-plugins.gpu.allowed-gpu-devices=auto is specified,
the YARN NodeManager needs to run a GPU discovery binary (currently only
nvidia-smi is supported) to get GPU-related information.
When the value is empty (default), the YARN NodeManager will try to locate the
discovery executable itself.
An example of the config value is: /usr/local/bin/nvidia-smi
yarn.nodemanager.resource-plugins.gpu.path-to-discovery-executables
Enable additional discovery/isolation of resources on the NodeManager,
split by comma. By default, this is empty.
Acceptable values: { "yarn.io/gpu", "yarn.io/fpga"}.
yarn.nodemanager.resource-plugins
Specifies whether the initialization of the Node Manager should continue
if a certain device (GPU, FPGA, etc) was not found in the system. If set
to "true", then an exception will be thrown if a device is missing or
an error occurred during discovery.
yarn.nodemanager.resource-plugins.fail-fast
Specify the GPU devices that can be managed by the YARN NodeManager, separated by commas.
The number of GPU devices will be reported to the RM to make scheduling decisions.
Set to auto (default) to let YARN automatically discover GPU resources from the
system.
Manually specify GPU devices if automatic detection fails or if the admin
only wants a subset of the GPU devices managed by YARN. A GPU device is identified
by its minor device number and index. A common approach to get the minor
device number of GPUs is to run "nvidia-smi -q" and search for "Minor Number"
in the output.
When manually specifying minor numbers, the admin needs to include the indices of the GPUs
as well; the format is index:minor_number[,index:minor_number...]. An example
of a manual specification is "0:0,1:1,2:2,3:4", which allows the YARN NodeManager to
manage GPU devices with indices 0/1/2/3 and minor numbers 0/1/2/4.
yarn.nodemanager.resource-plugins.gpu.allowed-gpu-devices
auto
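The index:minor_number format accepted by yarn.nodemanager.resource-plugins.gpu.allowed-gpu-devices can be illustrated with a small parser. This is a sketch of the documented value format only; the function name is hypothetical and the NodeManager's own parsing and validation may differ.

```python
def parse_allowed_gpu_devices(value):
    """Parse the allowed-gpu-devices value.

    Returns None for "auto" (automatic discovery), otherwise a list of
    (index, minor_number) tuples in the order given.
    """
    value = value.strip()
    if value == "auto":
        return None
    devices = []
    for entry in value.split(","):
        index, minor_number = entry.strip().split(":")
        devices.append((int(index), int(minor_number)))
    return devices


if __name__ == "__main__":
    # The manual specification used as the example in the description.
    print(parse_allowed_gpu_devices("0:0,1:1,2:2,3:4"))
```

In the example value, the GPU with index 3 has minor number 4, which shows why both numbers have to be given explicitly.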
Specify docker command plugin for GPU. By default uses Nvidia docker V1.
yarn.nodemanager.resource-plugins.gpu.docker-plugin
nvidia-docker-v1
Specify the endpoint of nvidia-docker-plugin.
For more details, see the documentation: https://github.com/NVIDIA/nvidia-docker/wiki
yarn.nodemanager.resource-plugins.gpu.docker-plugin.nvidia-docker-v1.endpoint
http://localhost:3476/v1.0/docker/cli
Specify one vendor plugin to handle FPGA device discovery, IP download, and configuration.
Only IntelFpgaOpenclPlugin is supported by default.
Currently, an NM can be configured with only one vendor FPGA plugin, since the end user can put the same
vendor's cards in one host. This also simplifies the design.
yarn.nodemanager.resource-plugins.fpga.vendor-plugin.class
org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.fpga.IntelFpgaOpenclPlugin
When yarn.nodemanager.resource-plugins.fpga.allowed-fpga-devices=auto is specified,
the YARN NodeManager needs to run an FPGA discovery binary (currently only
IntelFpgaOpenclPlugin is supported) to get FPGA information.
When the value is empty (default), the YARN NodeManager will try to locate the
discovery executable according to the vendor plugin's preference.
yarn.nodemanager.resource-plugins.fpga.path-to-discovery-executables
Specify the FPGA devices that can be managed by the YARN NodeManager, separated by commas.
The number of FPGA devices will be reported to the RM to make scheduling decisions.
Set to auto (default) to let YARN automatically discover FPGA resources from the
system.
Manually specify FPGA devices if the admin only wants a subset of the FPGA devices managed by YARN.
At present, since only one major number can be configured in c-e.cfg, an FPGA device is
identified by its minor device number. A common approach to get the minor
device number of an FPGA is to run "aocl diagnose" and check uevent with the device name.
A sample manual value is "0,1".
yarn.nodemanager.resource-plugins.fpga.allowed-fpga-devices
auto
Absolute path to a script or executable that returns the available FPGA cards.
The returned string must be a single line and follow the format:
"deviceA/N:M,deviceB/X:Y". Example: "acl0/243:0,acl1/243:1". The numbers after
the "/" character are the device major and minor numbers.
When the script is enabled, auto-discovery is disabled and the "aocl" command is not
invoked to verify the available cards.
yarn.nodemanager.resource-plugins.fpga.device-discovery-script
List of FPGA available devices in the given node.
The value must follow the format: "deviceA/N:M,deviceB/X:Y".
Example: "acl0/243:0,acl1/243:1". The numbers after
the "/" character are the device major and minor numbers.
When this property is used, both auto-discovery and external script are ignored.
yarn.nodemanager.resource-plugins.fpga.available-devices
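Both the discovery script output and yarn.nodemanager.resource-plugins.fpga.available-devices use the "deviceA/N:M,deviceB/X:Y" format. The sketch below parses that format into name, major, and minor fields; the function name and the dictionary layout are hypothetical and not taken from the NodeManager.

```python
def parse_fpga_devices(value):
    """Parse a single-line FPGA device list such as "acl0/243:0,acl1/243:1".

    The part before "/" is the device name; the numbers after it are the
    device major and minor numbers.
    """
    devices = []
    for entry in value.strip().split(","):
        name, numbers = entry.strip().split("/")
        major, minor = numbers.split(":")
        devices.append({"name": name, "major": int(major), "minor": int(minor)})
    return devices


if __name__ == "__main__":
    for device in parse_fpga_devices("acl0/243:0,acl1/243:1"):
        print(device)
```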
The http address of the timeline reader web application.
yarn.timeline-service.reader.webapp.address
${yarn.timeline-service.webapp.address}
The https address of the timeline reader web application.
yarn.timeline-service.reader.webapp.https.address
${yarn.timeline-service.webapp.https.address}
The actual address timeline reader will bind to. If this optional address is
set, the reader server will bind to this address and the port specified in
yarn.timeline-service.reader.webapp.address.
This is most useful for making the service listen to all interfaces by setting to
0.0.0.0.
yarn.timeline-service.reader.bind-host
Whether to enable the NUMA awareness for containers in Node Manager.
yarn.nodemanager.numa-awareness.enabled
false
Whether to read the NUMA topology from the system or from the
configurations. If the value is true then NM reads the NUMA topology from
system using the command 'numactl --hardware'. If the value is false then NM
reads the topology from the configurations
'yarn.nodemanager.numa-awareness.node-ids'(for node id's),
'yarn.nodemanager.numa-awareness.<NODE_ID>.memory'(for each node memory),
'yarn.nodemanager.numa-awareness.<NODE_ID>.cpus'(for each node cpus).
yarn.nodemanager.numa-awareness.read-topology
false
NUMA node IDs as a comma-separated list. The memory and number of CPUs
will be read using the properties
'yarn.nodemanager.numa-awareness.<NODE_ID>.memory' and
'yarn.nodemanager.numa-awareness.<NODE_ID>.cpus' for each ID specified
in this value. This property value will be read only when
'yarn.nodemanager.numa-awareness.read-topology=false'.
For example, if yarn.nodemanager.numa-awareness.node-ids=0,1,
then memory and cpus must be specified for node IDs '0' and '1' as below:
yarn.nodemanager.numa-awareness.0.memory=73717
yarn.nodemanager.numa-awareness.0.cpus=4
yarn.nodemanager.numa-awareness.1.memory=73727
yarn.nodemanager.numa-awareness.1.cpus=4
yarn.nodemanager.numa-awareness.node-ids
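How the per-node properties hang off yarn.nodemanager.numa-awareness.node-ids can be sketched by reading a flat key/value map. This is an illustration of the naming scheme only; the function name is hypothetical, the configuration is modelled as a plain dict, and no units are assumed for the memory value.

```python
PREFIX = "yarn.nodemanager.numa-awareness"


def read_numa_topology(conf):
    """Build {node_id: {"memory": int, "cpus": int}} from flat properties.

    conf is a dict of property name to string value, standing in for the
    NodeManager configuration when read-topology=false.
    """
    raw_ids = conf.get(PREFIX + ".node-ids", "")
    node_ids = [node_id.strip() for node_id in raw_ids.split(",") if node_id.strip()]
    topology = {}
    for node_id in node_ids:
        topology[node_id] = {
            "memory": int(conf["%s.%s.memory" % (PREFIX, node_id)]),
            "cpus": int(conf["%s.%s.cpus" % (PREFIX, node_id)]),
        }
    return topology


if __name__ == "__main__":
    # Values copied from the example in the property description.
    example = {
        PREFIX + ".node-ids": "0,1",
        PREFIX + ".0.memory": "73717",
        PREFIX + ".0.cpus": "4",
        PREFIX + ".1.memory": "73727",
        PREFIX + ".1.cpus": "4",
    }
    print(read_numa_topology(example))
```

A node ID listed in node-ids without matching memory and cpus properties raises a KeyError in this sketch, which mirrors the requirement that both be specified for every ID.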
The numactl command path which controls NUMA policy for processes or
shared memory.
yarn.nodemanager.numa-awareness.numactl.cmd
/usr/bin/numactl
Enable elastic memory control. This is a Linux only feature.
When enabled, the node manager adds a listener to receive an
event, if all the containers exceeded a limit.
The limit is specified by yarn.nodemanager.resource.memory-mb.
If this is not set, the limit is set based on the capabilities.
See yarn.nodemanager.resource.detect-hardware-capabilities
for details.
The limit applies to the physical or virtual (rss+swap) memory
depending on whether yarn.nodemanager.pmem-check-enabled or
yarn.nodemanager.vmem-check-enabled is set.
yarn.nodemanager.elastic-memory-control.enabled
false
The name of a JVM class. The class must implement the Runnable
interface. It is called,
if yarn.nodemanager.elastic-memory-control.enabled
is set and the system reaches its memory limit.
When called the handler must preempt a container,
since all containers are frozen by cgroups.
Once preempted some memory is released, so that the
kernel can resume all containers. Because of this the
handler has to act quickly.
yarn.nodemanager.elastic-memory-control.oom-handler
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.DefaultOOMHandler
The path to the oom-listener tool. Elastic memory control is only
supported on Linux. It relies on kernel events. The tool forwards
these kernel events to the standard input, so that the node manager
can preempt containers in an out-of-memory scenario.
You rarely need to update this setting.
yarn.nodemanager.elastic-memory-control.oom-listener.path
Maximum time to wait for an OOM situation to get resolved before
bringing down the node.
yarn.nodemanager.elastic-memory-control.timeout-sec
5
URI for NodeAttributeManager. The default value is
/tmp/hadoop-yarn-${user}/node-attribute/ in the local filesystem.
yarn.node-attribute.fs-store.root-dir
The implementation class used for node attribute storage.
yarn.node-attribute.fs-store.impl.class
org.apache.hadoop.yarn.server.resourcemanager.nodelabels.FileSystemNodeAttributeStore
CSI driver adaptor addresses on a node manager.
This configuration will be loaded by the resource manager to initiate
a client for each adaptor in order to communicate with CSI drivers.
Note, these addresses should be mapped to the adaptor addresses which
run the controller plugin.
yarn.nodemanager.csi-driver-adaptor.addresses
CSI driver names running on this node; multiple driver names need to
be delimited by commas. The driver name should be the same value returned
by the getPluginInfo call. For each CSI driver name, the following
two corresponding properties must be defined:
"yarn.nodemanager.csi-driver.${NAME}.endpoint"
"yarn.nodemanager.csi-driver-adaptor.${NAME}.address"
The first property defines where the driver's endpoint is;
the second property defines where the corresponding csi-driver-adaptor's address is.
In addition, an optional csi-driver-adaptor class can be defined
for each csi-driver:
"yarn.nodemanager.csi-driver.${NAME}.class"
Once given, the adaptor will be initiated with the given class instead
of the default implementation
org.apache.hadoop.yarn.csi.adaptor.DefaultCsiAdaptorImpl. Users can plug in
customized adaptor code for a csi-driver with this configuration
if necessary.
yarn.nodemanager.csi-driver.names
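The per-driver property names derived from yarn.nodemanager.csi-driver.names follow a fixed template, which the sketch below expands. The function name and the driver name "example-driver" are hypothetical; only the three property templates come from the description.

```python
def csi_driver_property_names(names_value):
    """Map each CSI driver name to the property names it requires.

    names_value is the comma-delimited value of
    yarn.nodemanager.csi-driver.names.
    """
    properties = {}
    for name in (part.strip() for part in names_value.split(",")):
        if not name:
            continue
        properties[name] = {
            # Required: where the driver's endpoint is.
            "endpoint": "yarn.nodemanager.csi-driver.%s.endpoint" % name,
            # Required: address of the corresponding csi-driver-adaptor.
            "adaptor_address": "yarn.nodemanager.csi-driver-adaptor.%s.address" % name,
            # Optional: custom adaptor class replacing the default one.
            "adaptor_class": "yarn.nodemanager.csi-driver.%s.class" % name,
        }
    return properties


if __name__ == "__main__":
    for driver, props in csi_driver_property_names("example-driver").items():
        print(driver, props)
```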
The cleanup interval for activities in milliseconds.
yarn.resourcemanager.activities-manager.cleanup-interval-ms
5000
Time to live for scheduler activities in milliseconds.
yarn.resourcemanager.activities-manager.scheduler-activities.ttl-ms
600000
Time to live for app activities in milliseconds.
yarn.resourcemanager.activities-manager.app-activities.ttl-ms
600000
Max queue length for app activities.
yarn.resourcemanager.activities-manager.app-activities.max-queue-length
100
Containers launcher implementation for determining how containers
are launched within NodeManagers.
yarn.nodemanager.containers-launcher.class
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher
Enable preprocessing of the application submission context with server-side configuration.
yarn.resourcemanager.submission-preprocessor.enabled
false
Path to file with hosts for the submission processor to handle.
yarn.resourcemanager.submission-preprocessor.file-path
Submission processor refresh interval
yarn.resourcemanager.submission-preprocessor.file-refresh-interval-ms
60000
Comma-separated list of partitions. If a label P is in this list,
then the RM will enforce that an app has resource requests with label
P iff that app's node label expression is P.
yarn.node-labels.exclusive-enforced-partitions
Prefix used to identify the YARN tag which contains workflow ID. If a tag coming in application
submission context has this prefix, whatever follows the prefix will be considered as workflow ID
associated with the application. This configuration is used by features such as workflow priority
for identifying the workflow associated with an application.
yarn.workflow-id.tag-prefix
workflowid:
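The prefix rule for yarn.workflow-id.tag-prefix ("whatever follows the prefix will be considered as workflow ID") can be sketched as follows. The function name and the sample tags are hypothetical, and taking the first matching tag when several carry the prefix is an assumption of this sketch.

```python
def workflow_id_from_tags(tags, prefix="workflowid:"):
    """Return the workflow ID carried by the application tags, or None.

    A tag qualifies if it starts with the prefix; the remainder of the
    tag is the workflow ID.
    """
    for tag in tags:
        if tag.startswith(prefix):
            return tag[len(prefix):]
    return None


if __name__ == "__main__":
    # Hypothetical tags from an application submission context.
    print(workflow_id_from_tags(["team:etl", "workflowid:nightly-load-42"]))
```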
Whether or not to allow application submissions via REST. Default is true.
yarn.webapp.enable-rest-app-submissions
true
The maximum number of application attempts. It's a global
setting for all application masters. Each application master can specify
its individual maximum number of application attempts via the API, but the
individual number cannot be more than the global upper bound. If it is,
the resourcemanager will override it. The default number value is set to
yarn.resourcemanager.am.max-attempts.
yarn.resourcemanager.am.global.max-attempts
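The upper-bound rule for yarn.resourcemanager.am.global.max-attempts reduces to taking the smaller of the two numbers. The sketch below only restates that sentence; the function name is hypothetical and the handling of unset or non-positive values in the ResourceManager is not modelled.

```python
def effective_max_attempts(requested_attempts, global_max_attempts):
    """Attempts an application master gets after the global cap is applied.

    A request above the global upper bound is overridden by the bound.
    """
    return min(requested_attempts, global_max_attempts)


if __name__ == "__main__":
    print(effective_max_attempts(5, 2))  # request exceeds the cap
    print(effective_max_attempts(1, 2))  # request within the cap
```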
Max number of application tags set by user in ApplicationSubmissionContext
while submitting application
yarn.resourcemanager.application.max-tags
10
Max length of each application tag set by user in ApplicationSubmissionContext
while submitting application.
yarn.resourcemanager.application.max-tag.length
100
Specifies whether application tags should be converted to lowercase or not.
yarn.resourcemanager.application-tag-based-placement.force-lowercase
true
Whether to enable the RM to mark inactive nodes as untracked after the timeout
specified by yarn.resourcemanager.node-removal-untracked.timeout-ms and
then remove them from the nodes list, for a YARN cluster without a configured
include path. When enabled, the RM can periodically clear inactive nodes to
avoid growing the memory needed to store this data, which is most useful in
elastic cloud environments with frequent auto-scaling operations.
It works only when the YARN cluster doesn't use an include file. The key
configurations are as follows:
yarn.resourcemanager.nodes.exclude-path=/path-to-exclude-file
yarn.resourcemanager.nodes.include-path=
yarn.resourcemanager.node-removal-untracked.timeout-ms=60000
In this situation, the inactive nodes will never be marked as untracked
and removed from the nodes list unless this configuration is enabled:
yarn.resourcemanager.enable-node-untracked-without-include-path=true
yarn.resourcemanager.enable-node-untracked-without-include-path
false
yarn.scheduler.app-placement-allocator.class
In the absence of APPLICATION_PLACEMENT_TYPE_CLASS from the RM
application scheduling environments, the value of this config
is used to determine the default implementation of AppPlacementAllocator.
If APPLICATION_PLACEMENT_TYPE_CLASS is absent from the application
scheduling env and this config also has no value present, then
default implementation LocalityAppPlacementAllocator is used.
yarn.router.keytab.file
The keytab file used by router to login as its
service principal. The principal name is configured with
yarn.router.kerberos.principal.
yarn.router.kerberos.principal
The Router service principal. This is typically set to
router/_HOST@REALM.TLD. Each Router will substitute _HOST with its
own fully qualified hostname at startup. The _HOST placeholder
allows using the same configuration setting on every Router.
yarn.router.kerberos.principal.hostname
Optional.
The hostname for the Router containing this
configuration file. Will be different for each machine.
Defaults to current hostname.