12 KiB
Hadoop: YARN Resource Configuration
Overview
YARN supports an extensible resource model. By default YARN tracks CPU and memory for all nodes, applications, and queues, but the resource definition can be extended to include arbitrary "countable" resources. A countable resource is a resource that is consumed while a container is running, but is released afterwards. CPU and memory are both countable resources. Other examples include GPU resources and software licenses.
In addition, YARN also supports the use of "resource profiles", which allow a user to specify multiple resource requests through a single profile, similar to Amazon Web Services Elastic Compute Cluster instance types. For example, "large" might mean 8 virtual cores and 16GB RAM.
Configuration
The following configuration properties are supported. See below for details.
yarn-site.xml
Configuration Property | Description |
---|---|
yarn.resourcemanager.resource-profiles.enabled |
Indicates whether resource profiles support is enabled. Defaults to false . |
resource-types.xml
Configuration Property | Description |
---|---|
yarn.resource-types |
Comma-separated list of additional resources. May not include memory , memory-mb , or vcores |
yarn.resource-types.<resource>.units |
Default unit for the specified resource type |
yarn.resource-types.<resource>.minimum-allocation |
The minimum request for the specified resource type |
yarn.resource-types.<resource>.maximum-allocation |
The maximum request for the specified resource type |
node-resources.xml
Configuration Property | Description |
---|---|
yarn.nodemanager.resource-type.<resource> |
The count of the specified resource available from the node manager |
Please note that the resource-types.xml
and node-resources.xml
files
also need to be placed in the same configuration directory as yarn-site.xml
if
they are used. Alternatively, the properties may be placed into the
yarn-site.xml
file instead.
YARN Resource Model
Resource Manager
The resource manager is the final arbiter of what resources in the cluster are tracked. The resource manager loads its resource definition from XML configuration files. For example, to define a new resource in addition to CPU and memory, the following property should be configured:
<configuration>
<property>
<name>yarn.resource-types</name>
<value>resource1,resource2</value>
<description>
The resources to be used for scheduling. Use resource-types.xml
to specify details about the individual resource types.
</description>
</property>
</configuration>
A valid resource name must begin with a letter and contain only letters, numbers, and any of: '.', '_', or '-'. A valid resource name may also be optionally preceded by a name space followed by a slash. A valid name space consists of period-separated groups of letters, numbers, and dashes. For example, the following are valid resource names:
- myresource
- my_resource
- My-Resource01
- com.acme/myresource
The following are examples of invalid resource names:
- 10myresource
- my resource
- com/acme/myresource
- $NS/myresource
- -none-/myresource
For each new resource type defined an optional unit property can be added to set the default unit for the resource type. Valid values are:
Unit Name | Meaning |
---|---|
p | pico |
n | nano |
u | micro |
m | milli |
default, i.e. no unit | |
k | kilo |
M | mega |
G | giga |
T | tera |
P | peta |
Ki | binary kilo, i.e. 1024 |
Mi | binary mega, i.e. 1024^2 |
Gi | binary giga, i.e. 1024^3 |
Ti | binary tera, i.e. 1024^4 |
Pi | binary peta, i.e. 1024^5 |
The property must be named yarn.resource-types.<resource>.units
. Each defined
resource may also have optional minimum and maximum properties. The properties
must be named yarn.resource-types.<resource>.minimum-allocation
and
yarn.resource-types.<resource>.maximum-allocation
.
The yarn.resource-types
property and any unit, mimimum, or maximum properties
may be defined in either the usual yarn-site.xml
file or in a file named
resource-types.xml
. For example, the following could appear in either file:
<configuration>
<property>
<name>yarn.resource-types</name>
<value>resource1, resource2</value>
</property>
<property>
<name>yarn.resource-types.resource1.units</name>
<value>G</value>
</property>
<property>
<name>yarn.resource-types.resource2.minimum-allocation</name>
<value>1</value>
</property>
<property>
<name>yarn.resource-types.resource2.maximum-allocation</name>
<value>1024</value>
</property>
</configuration>
Node Manager
Each node manager independently defines the resources that are available from
that node. The resource definition is done through setting a property for each
available resource. The property must be named
yarn.nodemanager.resource-type.<resource>
and may be placed in the usual
yarn-site.xml
file or in a file named noderesources.xml
. The value of the
property should be the amount of that resource offered by the node. For
example:
<configuration>
<property>
<name>yarn.nodemanager.resource-type.resource1</name>
<value>5G</value>
</property>
<property>
<name>yarn.nodemanager.resource-type.resource2</name>
<value>2m</value>
</property>
</configuration>
Note that the units used for these resources need not match the definition held by the resource manager. If the units do not match, the resource manager will automatically do a conversion.
Using Resources With MapReduce
MapReduce requests three different kinds of containers from YARN: the application master container, map containers, and reduce containers. For each container type, there is a corresponding set of properties that can be used to set the resources requested.
The properties for setting resource requests in MapReduce are:
Property | Description |
---|---|
yarn.app.mapreduce.am.resource.mb |
Sets the memory requested for the application master container to the value in MB. No longer preferred. Use yarn.app.mapreduce.am.resource.memory-mb instead. Defaults to 1536. |
yarn.app.mapreduce.am.resource.memory |
Sets the memory requested for the application master container to the value in MB. No longer preferred. Use yarn.app.mapreduce.am.resource.memory-mb instead. Defaults to 1536. |
yarn.app.mapreduce.am.resource.memory-mb |
Sets the memory requested for the application master container to the value in MB. Defaults to 1536. |
yarn.app.mapreduce.am.resource.cpu-vcores |
Sets the CPU requested for the application master container to the value. No longer preferred. Use yarn.app.mapreduce.am.resource.vcores instead. Defaults to 1. |
yarn.app.mapreduce.am.resource.vcores |
Sets the CPU requested for the application master container to the value. Defaults to 1. |
yarn.app.mapreduce.am.resource.<resource> |
Sets the quantity requested of <resource> for the application master container to the value. If no unit is specified, the default unit for the resource is assumed. See the section on units above. |
mapreduce.map.memory.mb |
Sets the memory requested for the all map task containers to the value in MB. No longer preferred. Use mapreduce.map.resource.memory-mb instead. Defaults to 1024. |
mapreduce.map.resource.memory |
Sets the memory requested for the all map task containers to the value in MB. No longer preferred. Use mapreduce.map.resource.memory-mb instead. Defaults to 1024. |
mapreduce.map.resource.memory-mb |
Sets the memory requested for the all map task containers to the value in MB. Defaults to 1024. |
mapreduce.map.cpu.vcores |
Sets the CPU requested for the all map task containers to the value. No longer preferred. Use mapreduce.map.resource.vcores instead. Defaults to 1. |
mapreduce.map.resource.vcores |
Sets the CPU requested for the all map task containers to the value. Defaults to 1. |
mapreduce.map.resource.<resource> |
Sets the quantity requested of <resource> for the all map task containers to the value. If no unit is specified, the default unit for the resource is assumed. See the section on units above. |
mapreduce.reduce.memory.mb |
Sets the memory requested for the all reduce task containers to the value in MB. No longer preferred. Use mapreduce.reduce.resource.memory-mb instead. Defaults to 1024. |
mapreduce.reduce.resource.memory |
Sets the memory requested for the all reduce task containers to the value in MB. No longer preferred. Use mapreduce.reduce.resource.memory-mb instead. Defaults to 1024. |
mapreduce.reduce.resource.memory-mb |
Sets the memory requested for the all reduce task containers to the value in MB. Defaults to 1024. |
mapreduce.reduce.cpu.vcores |
Sets the CPU requested for the all reduce task containers to the value. No longer preferred. Use mapreduce.reduce.resource.vcores instead. Defaults to 1. |
mapreduce.reduce.resource.vcores |
Sets the CPU requested for the all reduce task containers to the value. Defaults to 1. |
mapreduce.reduce.resource.<resource> |
Sets the quantity requested of <resource> for the all reduce task containers to the value. If no unit is specified, the default unit for the resource is assumed. See the section on units above. |
Note that these resource requests may be modified by YARN to meet the configured
minimum and maximum resource values or to be a multiple of the configured
increment. See the yarn.scheduler.maximum-allocation-mb
,
yarn.scheduler.minimum-allocation-mb
,
yarn.scheduler.increment-allocation-mb
,
yarn.scheduler.maximum-allocation-vcores
,
yarn.scheduler.minimum-allocation-vcores
, and
yarn.scheduler.increment-allocation-vcores
properties in the YARN scheduler
configuration.
Resource Profiles
Resource profiles provides an easy way for users to request a set of resources with a single profile and a means for administrators to regulate how resources are consumed.
To configure resource types, the administrator must set
yarn.resourcemanager.resource-profiles.enabled
to true
in the resource
manager's yarn-site.xml
file. This file defines the supported profiles.
For example:
{
"small": {
"memory-mb" : 1024,
"vcores" : 1
},
"default" : {
"memory-mb" : 2048,
"vcores" : 2
},
"large" : {
"memory-mb": 4096,
"vcores" : 4
},
"compute" : {
"memory-mb" : 2048,
"vcores" : 2,
"gpu" : 1
}
}
In this example, users have access to four profiles with different resource
settings. Note that in the compute
profile, the administrator has configured
an additional resource as described above.
Requesting Profiles
The distributed shell is currently the only client that supports resource profiles. Using the distributed shell, the user can specify a resource profile name which will automatically be translated into an appropriate set of resource requests.
For example:
hadoop job $DISTSHELL -jar $DISTSHELL -shell_command run.sh -container_resource_profile small