diff --git a/CHANGES.txt b/CHANGES.txt index bc2000db2c..1f5243d0d7 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -958,6 +958,9 @@ Release 0.21.0 - Unreleased HADOOP-6668. Apply audience and stability annotations to classes in common. (tomwhite) + HADOOP-6821. Document changes to memory monitoring. (Hemanth Yamijala + via tomwhite) + OPTIMIZATIONS HADOOP-5595. NameNode does not need to run a replicator to choose a diff --git a/src/docs/src/documentation/content/xdocs/cluster_setup.xml b/src/docs/src/documentation/content/xdocs/cluster_setup.xml index 7c52eebfc0..e2dbc6485f 100644 --- a/src/docs/src/documentation/content/xdocs/cluster_setup.xml +++ b/src/docs/src/documentation/content/xdocs/cluster_setup.xml @@ -739,147 +739,251 @@ -
- Memory management -

Users/admins can also specify the maximum virtual memory - of the launched child-task, and any sub-process it launches - recursively, using mapred.{map|reduce}.child.ulimit. Note - that the value set here is a per process limit. - The value for mapred.{map|reduce}.child.ulimit should be - specified in kilo bytes (KB). And also the value must be greater than - or equal to the -Xmx passed to JavaVM, else the VM might not start. +

+ Configuring Memory Parameters for MapReduce Jobs +

As MapReduce jobs can use varying amounts of memory, Hadoop + provides various configuration options to users and administrators + for managing memory effectively. Some of these options are job + specific and can be set by users. While setting up a cluster, + administrators can configure appropriate default values for these + options so that users' jobs run out of the box. Other options are + cluster specific and can be used by administrators to enforce + limits and prevent misconfigured or memory-intensive jobs from + causing undesired side effects on the cluster.

+

The values configured should + take into account the hardware resources of the cluster, such as the + amount of physical and virtual memory available for tasks, + the number of slots configured on the slaves, and the requirements + of other processes running on the slaves. If the right values are not + set, jobs are likely to start failing with memory-related + errors or, in the worst case, to affect other tasks or + the slaves themselves.

+ +
+ Monitoring Task Memory Usage +

Before describing the memory options, it is + useful to look at a feature Hadoop provides to monitor the + memory usage of the MapReduce tasks it runs. The basic objective + of this feature is to prevent MapReduce tasks from consuming + memory beyond a limit, which would cause them to affect + other processes running on the slave, including other tasks + and daemons like the DataNode or TaskTracker.

-

Note: mapred.{map|reduce}.child.java.opts are used only for - configuring the launched child tasks from task tracker. Configuring - the memory options for daemons is documented in - - cluster_setup.html

- -

The memory available to some parts of the framework is also - configurable. In map and reduce tasks, performance may be influenced - by adjusting parameters influencing the concurrency of operations and - the frequency with which data will hit disk. Monitoring the filesystem - counters for a job- particularly relative to byte counts from the map - and into the reduce- is invaluable to the tuning of these - parameters.

+

Note: For the time being, this feature is available + only on the Linux platform.

+ +

Hadoop allows monitoring of both the virtual + and the physical memory usage of tasks. The two kinds of monitoring + can be done independently of each other, and therefore the + corresponding options can be configured independently as well. It + has been found in some environments, particularly with + streaming, that the virtual memory recorded for tasks is high + because of libraries loaded by the programs used to run + the tasks. However, this memory is largely unused and does + not affect the slave's memory itself. In such cases, + monitoring based on physical memory can provide a more + accurate picture of memory usage.

+ +

This feature assumes that there is a limit on + the amount of virtual or physical memory on the slaves + that can be used by + the running MapReduce tasks. The rest of the memory is + assumed to be required for the system and other processes. + Since some jobs may require more memory for their + tasks than others, Hadoop allows jobs to specify the maximum + amount of memory they expect to use. Then, by using + resource-aware scheduling and monitoring, Hadoop tries to + ensure that at any time only as many tasks are running on + the slaves as can meet the dual constraints of an individual + job's memory requirements and the total amount of memory + available for all MapReduce tasks.
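As a sketch of how a job states its expected usage, the following illustrative fragment uses the job-level options mapreduce.{map|reduce}.memory.mb (described later in this document); the values shown are hypothetical examples, not recommendations:

```xml
<!-- Illustrative job configuration; the values below are examples only. -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1536</value> <!-- each map task expects up to 1.5GB of virtual memory -->
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>3072</value> <!-- each reduce task expects up to 3GB of virtual memory -->
</property>
```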

+ +

The TaskTracker monitors tasks at regular intervals. On each pass, + it operates in two steps:

+ +
    + +
  • + In the first step, it + checks that a job's task and any child processes it + launches are not cumulatively using more virtual or physical + memory than specified. If both virtual and physical memory + monitoring are enabled, then virtual memory usage is checked + first, followed by physical memory usage. + Any task that is found to + use more memory than its limit is killed along with any child + processes it might have launched, and the task status is marked + failed. Repeated failures such as this will terminate + the job.
  • + +
  • + In the next step, it checks that the cumulative virtual and + physical memory + used by all running tasks and their child processes + does not exceed the total virtual and physical memory limit, + respectively. Again, the virtual memory limit is checked first, + followed by the physical memory limit. In this case, it kills + enough tasks, along with any child processes they + might have launched, until the cumulative memory usage + is brought under the limit. When the virtual memory limit + is exceeded, the tasks chosen for killing are + the ones that have made the least progress. When the + physical memory limit is exceeded, the tasks chosen + for killing are the ones that have used the most + physical memory. Also, the status + of these tasks is marked as killed, and hence repeated + occurrences of this will not result in a job failure.
  • + +
+ +

+ In either case, the task's diagnostic message will indicate the + reason why the task was terminated. +

+ +

+ Resource aware scheduling can ensure that tasks are scheduled + on a slave only if their memory requirement can be satisfied + by the slave. The Capacity Scheduler, for example, + takes virtual memory requirements into account while + scheduling tasks, as described in the section on + + memory based scheduling. +

+ +

+ Memory monitoring is enabled when certain configuration + variables are defined with non-zero values, as described below. +

+
- Memory monitoring -

A TaskTracker(TT) can be configured to monitor memory - usage of tasks it spawns, so that badly-behaved jobs do not bring - down a machine due to excess memory consumption. With monitoring - enabled, every task is assigned a task-limit for virtual memory (VMEM). - In addition, every node is assigned a node-limit for VMEM usage. - A TT ensures that a task is killed if it, and - its descendants, use VMEM over the task's per-task limit. It also - ensures that one or more tasks are killed if the sum total of VMEM - usage by all tasks, and their descendents, cross the node-limit.

- -

Users can, optionally, specify the VMEM task-limit per job. If no - such limit is provided, a default limit is used. A node-limit can be - set per node.

-

Currently the memory monitoring and management is only supported - in Linux platform.

-

To enable monitoring for a TT, the - following parameters all need to be set:

- - - - - - - - - -
NameTypeDescription
mapred.tasktracker.vmem.reservedlongA number, in bytes, that represents an offset. The total VMEM on - the machine, minus this offset, is the VMEM node-limit for all - tasks, and their descendants, spawned by the TT. -
mapred.task.default.maxvmemlongA number, in bytes, that represents the default VMEM task-limit - associated with a task. Unless overridden by a job's setting, - this number defines the VMEM task-limit. -
mapred.task.limit.maxvmemlongA number, in bytes, that represents the upper VMEM task-limit - associated with a task. Users, when specifying a VMEM task-limit - for their tasks, should not specify a limit which exceeds this amount. -
- -

In addition, the following parameters can also be configured.

- - - - - - -
NameTypeDescription
mapreduce.tasktracker.taskmemorymanager.monitoringintervallongThe time interval, in milliseconds, between which the TT - checks for any memory violation. The default value is 5000 msec - (5 seconds). -
- -

Here's how the memory monitoring works for a TT.

-
    -
  1. If one or more of the configuration parameters described - above are missing or -1 is specified , memory monitoring is - disabled for the TT. -
  2. -
  3. In addition, monitoring is disabled if - mapred.task.default.maxvmem is greater than - mapred.task.limit.maxvmem. -
  4. -
  5. If a TT receives a task whose task-limit is set by the user - to a value larger than mapred.task.limit.maxvmem, it - logs a warning but executes the task. -
  6. -
  7. Periodically, the TT checks the following: -
      -
    • If any task's current VMEM usage is greater than that task's - VMEM task-limit, the task is killed and reason for killing - the task is logged in task diagonistics . Such a task is considered - failed, i.e., the killing counts towards the task's failure count. -
    • -
    • If the sum total of VMEM used by all tasks and descendants is - greater than the node-limit, the TT kills enough tasks, in the - order of least progress made, till the overall VMEM usage falls - below the node-limt. Such killed tasks are not considered failed - and their killing does not count towards the tasks' failure counts. -
    • -
    -
  8. -
- -

Schedulers can choose to ease the monitoring pressure on the TT by - preventing too many tasks from running on a node and by scheduling - tasks only if the TT has enough VMEM free. In addition, Schedulers may - choose to consider the physical memory (RAM) available on the node - as well. To enable Scheduler support, TTs report their memory settings - to the JobTracker in every heartbeat. Before getting into details, - consider the following additional memory-related parameters than can be - configured to enable better scheduling:

- - - - - -
NameTypeDescription
mapred.tasktracker.pmem.reservedintA number, in bytes, that represents an offset. The total - physical memory (RAM) on the machine, minus this offset, is the - recommended RAM node-limit. The RAM node-limit is a hint to a - Scheduler to scheduler only so many tasks such that the sum - total of their RAM requirements does not exceed this limit. - RAM usage is not monitored by a TT. -
- -

A TT reports the following memory-related numbers in every - heartbeat:

-
    -
  • The total VMEM available on the node.
  • -
  • The value of mapred.tasktracker.vmem.reserved, - if set.
  • -
  • The total RAM available on the node.
  • -
  • The value of mapred.tasktracker.pmem.reserved, - if set.
  • -
+ Job Specific Options +

Memory related options that can be configured individually per + job are described in detail in the section on + + Configuring Memory Requirements For A Job in the MapReduce + tutorial. While setting up + the cluster, the Hadoop defaults for these options can be reviewed + and changed to better suit the job profiles expected to run on + the cluster, as well as its hardware configuration.

+

As with any other configuration option in Hadoop, if + administrators desire to prevent users from overriding these + options in the jobs they submit, the values can be marked as + final in the cluster configuration.
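For instance, a hypothetical cluster configuration entry that fixes the map task memory and prevents jobs from overriding it might look like the following (the value shown is illustrative):

```xml
<!-- Marking the option final prevents job-level overrides; value is an example. -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1024</value>
  <final>true</final>
</property>
```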

+ +
+ Cluster Specific Options + +

+ This section describes the memory related options that are + used by the JobTracker and TaskTrackers, and cannot be changed + by jobs. The values set for these options should be the same + for all the slave nodes in a cluster. +

+ +
    + +
  • + mapreduce.cluster.{map|reduce}memory.mb: These + options define the default amount of virtual memory that should be + allocated for MapReduce tasks running in the cluster. They + typically match the default values set for the options + mapreduce.{map|reduce}.memory.mb. They help in the + calculation of the total amount of virtual memory available for + MapReduce tasks on a slave, using the following equation:
    + Total virtual memory for all MapReduce tasks = + (mapreduce.cluster.mapmemory.mb * + mapreduce.tasktracker.map.tasks.maximum) + + (mapreduce.cluster.reducememory.mb * + mapreduce.tasktracker.reduce.tasks.maximum)
    + Typically, reduce tasks require more memory than map tasks. + Hence a higher value is recommended for + mapreduce.cluster.reducememory.mb. The value is + specified in MB. To set a value of 2GB for reduce tasks, set + mapreduce.cluster.reducememory.mb to 2048. +
  • + +
  • + mapreduce.jobtracker.max{map|reduce}memory.mb: + These options define the maximum amount of virtual memory that + can be requested by jobs using the parameters + mapreduce.{map|reduce}.memory.mb. The system + will reject any job submitted requesting more + memory than these limits. Typically, the values for these + options should be set to satisfy the following constraint:
    + mapreduce.jobtracker.maxmapmemory.mb = + mapreduce.cluster.mapmemory.mb * + mapreduce.tasktracker.map.tasks.maximum
    + mapreduce.jobtracker.maxreducememory.mb = + mapreduce.cluster.reducememory.mb * + mapreduce.tasktracker.reduce.tasks.maximum

    + The value is specified in MB. If + mapreduce.cluster.reducememory.mb is set to 2GB and + there are 2 reduce slots configured in the slaves, the value + for mapreduce.jobtracker.maxreducememory.mb should + be set to 4096. +
  • + +
  • + mapreduce.tasktracker.reserved.physicalmemory.mb: + This option defines the amount of physical memory that is + reserved for system and daemon processes. Using this, the amount + of physical memory available for MapReduce tasks is calculated + using the following equation:
    + Total physical memory for all MapReduce tasks = + Total physical memory available on the system - + mapreduce.tasktracker.reserved.physicalmemory.mb
    + The value is specified in MB. To set this value to 2GB, + specify the value as 2048. +
  • + +
  • + mapreduce.tasktracker.taskmemorymanager.monitoringinterval: + This option defines the time the TaskTracker waits between + two cycles of memory monitoring. The value is specified in + milliseconds. +
  • + +
+ +

+ Note: The virtual memory monitoring function is only + enabled if + the variables mapreduce.cluster.{map|reduce}memory.mb + and mapreduce.jobtracker.max{map|reduce}memory.mb + are set to values greater than zero. Likewise, the physical + memory monitoring function is only enabled if the variable + mapreduce.tasktracker.reserved.physicalmemory.mb + is set to a value greater than zero. +
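Putting these options together, a hypothetical mapred-site.xml fragment for slaves configured with, say, 4 map slots and 2 reduce slots might look like the following (all values are illustrative, not recommendations). With these settings, the equation above gives 1024 * 4 + 2048 * 2 = 8192MB of total virtual memory for all MapReduce tasks:

```xml
<!-- Illustrative cluster configuration; values are examples, not recommendations. -->
<property>
  <name>mapreduce.cluster.mapmemory.mb</name>
  <value>1024</value> <!-- default virtual memory per map task -->
</property>
<property>
  <name>mapreduce.cluster.reducememory.mb</name>
  <value>2048</value> <!-- default virtual memory per reduce task -->
</property>
<property>
  <name>mapreduce.jobtracker.maxmapmemory.mb</name>
  <value>4096</value> <!-- 1024 * 4 map slots -->
</property>
<property>
  <name>mapreduce.jobtracker.maxreducememory.mb</name>
  <value>4096</value> <!-- 2048 * 2 reduce slots -->
</property>
<property>
  <name>mapreduce.tasktracker.reserved.physicalmemory.mb</name>
  <value>2048</value> <!-- physical memory reserved for system and daemon processes -->
</property>
```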

+
+
+ +
Task Controllers

Task controllers are classes in the Hadoop Map/Reduce diff --git a/src/docs/src/documentation/content/xdocs/site.xml b/src/docs/src/documentation/content/xdocs/site.xml index 1004832499..e71e8a5cca 100644 --- a/src/docs/src/documentation/content/xdocs/site.xml +++ b/src/docs/src/documentation/content/xdocs/site.xml @@ -72,9 +72,12 @@ See http://forrest.apache.org/docs/linking.html for more info. - + + + +