MAPREDUCE-3136. Added documentation for setting up Hadoop clusters in both non-secure and secure mode for both HDFS & YARN.
git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/trunk@1184977 13f79535-47bb-0310-9956-ffa450edef68
@@ -392,6 +392,9 @@ Release 0.23.0 - Unreleased

    MAPREDUCE-3187. Add names for various unnamed threads in MR2.
    (Todd Lipcon and Siddharth Seth via mahadev)

    MAPREDUCE-3136. Added documentation for setting up Hadoop clusters in both
    non-secure and secure mode for both HDFS & YARN. (acmurthy)

  OPTIMIZATIONS

    MAPREDUCE-2026. Make JobTracker.getJobCounters() and
@@ -141,7 +141,7 @@ Hadoop MapReduce Next Generation - Capacity Scheduler

*--------------------------------------+--------------------------------------+
| <<<yarn.resourcemanager.scheduler.class>>> | |
| | <<<org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler>>> |
*--------------------------------------+--------------------------------------+

* Setting up <queues>

@@ -0,0 +1,990 @@
~~ Licensed under the Apache License, Version 2.0 (the "License");
~~ you may not use this file except in compliance with the License.
~~ You may obtain a copy of the License at
~~
~~   http://www.apache.org/licenses/LICENSE-2.0
~~
~~ Unless required by applicable law or agreed to in writing, software
~~ distributed under the License is distributed on an "AS IS" BASIS,
~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~~ See the License for the specific language governing permissions and
~~ limitations under the License. See accompanying LICENSE file.

  ---
  Hadoop Map Reduce Next Generation-${project.version} - Cluster Setup
  ---
  ---
  ${maven.build.timestamp}

Hadoop MapReduce Next Generation - Cluster Setup

\[ {{{./index.html}Go Back}} \]

%{toc|section=1|fromDepth=0}

* {Purpose}

This document describes how to install, configure and manage non-trivial
Hadoop clusters ranging from a few nodes to extremely large clusters
with thousands of nodes.

To play with Hadoop, you may first want to install it on a single
machine (see {{{SingleCluster}Single Node Setup}}).

* {Prerequisites}

Download a stable version of Hadoop from Apache mirrors.

* {Installation}

Installing a Hadoop cluster typically involves unpacking the software on all
the machines in the cluster or installing RPMs.

Typically one machine in the cluster is designated as the NameNode and
another machine as the ResourceManager, exclusively. These are the masters.

The rest of the machines in the cluster act as both DataNode and NodeManager.
These are the slaves.

* {Running Hadoop in Non-Secure Mode}

The following sections describe how to configure a Hadoop cluster.

* {Configuration Files}

Hadoop configuration is driven by two types of important configuration files:

* Read-only default configuration - <<<core-default.xml>>>,
<<<hdfs-default.xml>>>, <<<yarn-default.xml>>> and
<<<mapred-default.xml>>>.

* Site-specific configuration - <<conf/core-site.xml>>,
<<conf/hdfs-site.xml>>, <<conf/yarn-site.xml>> and
<<conf/mapred-site.xml>>.

Additionally, you can control the Hadoop scripts found in the bin/
directory of the distribution by setting site-specific values via
<<conf/hadoop-env.sh>> and <<conf/yarn-env.sh>>.

* {Site Configuration}

To configure the Hadoop cluster you will need to configure the
<<<environment>>> in which the Hadoop daemons execute as well as the
<<<configuration parameters>>> for the Hadoop daemons.

The Hadoop daemons are NameNode/DataNode and ResourceManager/NodeManager.

* {Configuring Environment of Hadoop Daemons}

Administrators should use the <<conf/hadoop-env.sh>> and
<<conf/yarn-env.sh>> scripts to do site-specific customization of the
Hadoop daemons' process environment.

At the very least you should specify <<<JAVA_HOME>>> so that it is
correctly defined on each remote node.

Administrators can configure individual daemons using the configuration
options shown in the table below:

*--------------------------------------+--------------------------------------+
|| Daemon || Environment Variable |
*--------------------------------------+--------------------------------------+
| NameNode | HADOOP_NAMENODE_OPTS |
*--------------------------------------+--------------------------------------+
| DataNode | HADOOP_DATANODE_OPTS |
*--------------------------------------+--------------------------------------+
| Secondary NameNode | HADOOP_SECONDARYNAMENODE_OPTS |
*--------------------------------------+--------------------------------------+
| ResourceManager | YARN_RESOURCEMANAGER_OPTS |
*--------------------------------------+--------------------------------------+
| NodeManager | YARN_NODEMANAGER_OPTS |
*--------------------------------------+--------------------------------------+

For example, to configure the NameNode to use parallelGC, the following
statement should be added in hadoop-env.sh:

----
export HADOOP_NAMENODE_OPTS="-XX:+UseParallelGC ${HADOOP_NAMENODE_OPTS}"
----

Other useful configuration parameters that you can customize include:

* <<<HADOOP_LOG_DIR>>> / <<<YARN_LOG_DIR>>> - The directory where the
daemons' log files are stored. They are automatically created if they
don't exist.

* <<<HADOOP_HEAPSIZE>>> / <<<YARN_HEAPSIZE>>> - The maximum heap size to
use, in MB. This is used to configure the heap size for the daemon.
By default, the value is 1000 (i.e. 1000MB).

* {Configuring the Hadoop Daemons in Non-Secure Mode}

This section deals with important parameters to be specified in
the given configuration files:

* <<<conf/core-site.xml>>>

*-------------------------+-------------------------+------------------------+
|| Parameter || Value || Notes |
*-------------------------+-------------------------+------------------------+
| <<<fs.defaultFS>>> | NameNode URI | <hdfs://host:port/> |
*-------------------------+-------------------------+------------------------+
| <<<io.file.buffer.size>>> | 131072 | |
| | | Size of read/write buffer used in SequenceFiles. |
*-------------------------+-------------------------+------------------------+

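For illustration, a minimal <<<conf/core-site.xml>>> built from the table
above might look like the following; the NameNode hostname and port are
placeholder values, not defaults:

----
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <!-- Placeholder host:port; substitute your NameNode's address. -->
    <value>hdfs://namenode.example.com:9000/</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
</configuration>
----
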
* <<<conf/hdfs-site.xml>>>

* Configurations for NameNode:

*-------------------------+-------------------------+------------------------+
|| Parameter || Value || Notes |
*-------------------------+-------------------------+------------------------+
| <<<dfs.namenode.name.dir>>> | | |
| | Path on the local filesystem where the NameNode stores the namespace | |
| | and transaction logs persistently. | |
| | | If this is a comma-delimited list of directories then the name table is |
| | | replicated in all of the directories, for redundancy. |
*-------------------------+-------------------------+------------------------+
| <<<dfs.namenode.hosts>>> / <<<dfs.namenode.hosts.exclude>>> | | |
| | List of permitted/excluded DataNodes. | |
| | | If necessary, use these files to control the list of allowable |
| | | DataNodes. |
*-------------------------+-------------------------+------------------------+
| <<<dfs.blocksize>>> | 268435456 | |
| | | HDFS blocksize of 256MB for large file-systems. |
*-------------------------+-------------------------+------------------------+
| <<<dfs.namenode.handler.count>>> | 100 | |
| | | More NameNode server threads to handle RPCs from large number of |
| | | DataNodes. |
*-------------------------+-------------------------+------------------------+

* Configurations for DataNode:

*-------------------------+-------------------------+------------------------+
|| Parameter || Value || Notes |
*-------------------------+-------------------------+------------------------+
| <<<dfs.datanode.data.dir>>> | | |
| | Comma separated list of paths on the local filesystem of a | |
| | <<<DataNode>>> where it should store its blocks. | |
| | | If this is a comma-delimited list of directories, then data will be |
| | | stored in all named directories, typically on different devices. |
*-------------------------+-------------------------+------------------------+

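A sketch of <<<conf/hdfs-site.xml>>> combining the NameNode and DataNode
settings above; all directory paths here are illustrative:

----
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <!-- Illustrative paths; a comma-delimited list replicates the name table. -->
    <value>/srv/hadoop/name1,/srv/hadoop/name2</value>
  </property>
  <property>
    <name>dfs.blocksize</name>
    <value>268435456</value>
  </property>
  <property>
    <name>dfs.namenode.handler.count</name>
    <value>100</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <!-- Illustrative paths, typically one per physical disk. -->
    <value>/srv/hadoop/data1,/srv/hadoop/data2</value>
  </property>
</configuration>
----
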
* <<<conf/yarn-site.xml>>>

* Configurations for ResourceManager:

*-------------------------+-------------------------+------------------------+
|| Parameter || Value || Notes |
*-------------------------+-------------------------+------------------------+
| <<<yarn.resourcemanager.address>>> | | |
| | <<<ResourceManager>>> host:port for clients to submit jobs. | |
| | | <host:port> |
*-------------------------+-------------------------+------------------------+
| <<<yarn.resourcemanager.scheduler.address>>> | | |
| | <<<ResourceManager>>> host:port for ApplicationMasters to talk to | |
| | Scheduler to obtain resources. | |
| | | <host:port> |
*-------------------------+-------------------------+------------------------+
| <<<yarn.resourcemanager.resource-tracker.address>>> | | |
| | <<<ResourceManager>>> host:port for NodeManagers. | |
| | | <host:port> |
*-------------------------+-------------------------+------------------------+
| <<<yarn.resourcemanager.scheduler.class>>> | | |
| | <<<ResourceManager>>> Scheduler class. | |
| | | <<<CapacityScheduler>>> (recommended) or <<<FifoScheduler>>> |
*-------------------------+-------------------------+------------------------+
| <<<yarn.resourcemanager.acl.enable>>> | | |
| | <<<true>>> / <<<false>>> | |
| | | Enable ACLs? |
*-------------------------+-------------------------+------------------------+
| <<<yarn.resourcemanager.admin.acl>>> | | |
| | Admin ACL | |
| | | ACL to set admins on the cluster. |
*-------------------------+-------------------------+------------------------+
| <<<yarn.nodemanager.remote-app-log-dir>>> | | |
| | </logs> | |
| | | HDFS directory where the application logs are moved on application |
| | | completion. Appropriate permissions need to be set. |
*-------------------------+-------------------------+------------------------+
| <<<yarn.resourcemanager.nodes.include-path>>> / | | |
| <<<yarn.resourcemanager.nodes.exclude-path>>> | | |
| | List of permitted/excluded NodeManagers. | |
| | | If necessary, use these files to control the list of allowable |
| | | NodeManagers. |
*-------------------------+-------------------------+------------------------+

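For illustration, the ResourceManager addresses might be configured as
follows; the hostname and ports are placeholders:

----
<configuration>
  <property>
    <name>yarn.resourcemanager.address</name>
    <!-- Placeholder host:port for client job submission. -->
    <value>rm.example.com:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <!-- Placeholder host:port for ApplicationMasters. -->
    <value>rm.example.com:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <!-- Placeholder host:port for NodeManagers. -->
    <value>rm.example.com:8031</value>
  </property>
</configuration>
----
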
* Configurations for NodeManager:

*-------------------------+-------------------------+------------------------+
|| Parameter || Value || Notes |
*-------------------------+-------------------------+------------------------+
| <<<yarn.nodemanager.resource.memory-gb>>> | | |
| | Available memory, in GB, for the given <<<NodeManager>>>. | |
| | | Defines available resources on the <<<NodeManager>>>. |
*-------------------------+-------------------------+------------------------+
| <<<yarn.nodemanager.local-dirs>>> | | |
| | Comma-separated list of paths on the local filesystem where | |
| | intermediate data is written. | |
| | | Multiple paths help spread disk i/o. |
*-------------------------+-------------------------+------------------------+
| <<<yarn.nodemanager.log-dirs>>> | | |
| | Comma-separated list of paths on the local filesystem where logs | |
| | are written. | |
| | | Multiple paths help spread disk i/o. |
*-------------------------+-------------------------+------------------------+
| <<<yarn.nodemanager.aux-services>>> | | |
| | mapreduce.shuffle | |
| | | Shuffle service that needs to be set for MapReduce applications. |
*-------------------------+-------------------------+------------------------+

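As a sketch, the NodeManager side of <<<conf/yarn-site.xml>>> could look
like this; the memory value and directory paths are illustrative:

----
<configuration>
  <property>
    <name>yarn.nodemanager.resource.memory-gb</name>
    <!-- Illustrative value: 8 GB available for containers on this node. -->
    <value>8</value>
  </property>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <!-- Illustrative paths; spread across disks to balance i/o. -->
    <value>/srv/hadoop/nm-local1,/srv/hadoop/nm-local2</value>
  </property>
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/srv/hadoop/nm-log1,/srv/hadoop/nm-log2</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce.shuffle</value>
  </property>
</configuration>
----
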
* <<<conf/mapred-site.xml>>>

* Configurations for MapReduce Applications:

*-------------------------+-------------------------+------------------------+
|| Parameter || Value || Notes |
*-------------------------+-------------------------+------------------------+
| <<<mapreduce.framework.name>>> | | |
| | yarn | |
| | | Execution framework set to Hadoop YARN. |
*-------------------------+-------------------------+------------------------+
| <<<mapreduce.map.memory.mb>>> | 1536 | |
| | | Larger resource limit for maps. |
*-------------------------+-------------------------+------------------------+
| <<<mapreduce.map.java.opts>>> | -Xmx1024M | |
| | | Larger heap-size for child jvms of maps. |
*-------------------------+-------------------------+------------------------+
| <<<mapreduce.reduce.memory.mb>>> | 3072 | |
| | | Larger resource limit for reduces. |
*-------------------------+-------------------------+------------------------+
| <<<mapreduce.reduce.java.opts>>> | -Xmx2560M | |
| | | Larger heap-size for child jvms of reduces. |
*-------------------------+-------------------------+------------------------+
| <<<mapreduce.task.io.sort.mb>>> | 512 | |
| | | Higher memory-limit while sorting data for efficiency. |
*-------------------------+-------------------------+------------------------+
| <<<mapreduce.task.io.sort.factor>>> | 100 | |
| | | More streams merged at once while sorting files. |
*-------------------------+-------------------------+------------------------+
| <<<mapreduce.reduce.shuffle.parallelcopies>>> | 50 | |
| | | Higher number of parallel copies run by reduces to fetch outputs |
| | | from very large number of maps. |
*-------------------------+-------------------------+------------------------+

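Putting the application settings above into <<<conf/mapred-site.xml>>>
would look like this sketch; the values are the suggested starting points
from the table, not defaults:

----
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>1536</value>
  </property>
  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx1024M</value>
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>3072</value>
  </property>
  <property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx2560M</value>
  </property>
</configuration>
----
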
* Configurations for MapReduce JobHistory Server:

*-------------------------+-------------------------+------------------------+
|| Parameter || Value || Notes |
*-------------------------+-------------------------+------------------------+
| <<<mapreduce.jobhistory.address>>> | | |
| | MapReduce JobHistory Server <host:port> | Default port is 10020. |
*-------------------------+-------------------------+------------------------+
| <<<mapreduce.jobhistory.webapp.address>>> | | |
| | MapReduce JobHistory Server Web UI <host:port> | Default port is 19888. |
*-------------------------+-------------------------+------------------------+
| <<<mapreduce.jobhistory.intermediate-done-dir>>> | /mr-history/tmp | |
| | | Directory where history files are written by MapReduce jobs. |
*-------------------------+-------------------------+------------------------+
| <<<mapreduce.jobhistory.done-dir>>> | /mr-history/done | |
| | | Directory where history files are managed by the MR JobHistory Server. |
*-------------------------+-------------------------+------------------------+

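For example, the JobHistory Server settings above could be expressed as
follows; the hostname is a placeholder:

----
<configuration>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <!-- Placeholder host; 10020 is the default port. -->
    <value>jhs.example.com:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.intermediate-done-dir</name>
    <value>/mr-history/tmp</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.done-dir</name>
    <value>/mr-history/done</value>
  </property>
</configuration>
----
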
* Hadoop Rack Awareness

The HDFS and the YARN components are rack-aware.

The NameNode and the ResourceManager obtain the rack information of the
slaves in the cluster by invoking an API <resolve> in an
administrator-configured module.

The API resolves the DNS name (or IP address) to a rack id.

The site-specific module to use can be configured using the configuration
item <<<topology.node.switch.mapping.impl>>>. The default implementation
of the same runs a script/command configured using
<<<topology.script.file.name>>>. If <<<topology.script.file.name>>> is
not set, the rack id </default-rack> is returned for any passed IP address.

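As an illustrative sketch (not part of the distribution), a topology
script wired in via <<<topology.script.file.name>>> could map addresses
to racks from a flat lookup file; the file name and format here are
assumptions:

----
#!/bin/bash
# Hypothetical rack-mapping script: reads "host rack" pairs from a
# site-maintained file and prints one rack id per input argument.
MAP_FILE=/etc/hadoop/topology.data   # assumed location
for host in "$@"; do
  rack=$(awk -v h="$host" '$1 == h { print $2 }' "$MAP_FILE")
  echo "${rack:-/default-rack}"
done
----
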
* Monitoring Health of NodeManagers

Hadoop provides a mechanism by which administrators can configure the
NodeManager to run an administrator-supplied script periodically to
determine if a node is healthy or not.

Administrators can determine if the node is in a healthy state by
performing any checks of their choice in the script. If the script
detects the node to be in an unhealthy state, it must print a line to
standard output beginning with the string ERROR. The NodeManager spawns
the script periodically and checks its output. If the script's output
contains the string ERROR, as described above, the node's status is
reported as <<<unhealthy>>> and the node is black-listed by the
ResourceManager. No further tasks will be assigned to this node.
However, the NodeManager continues to run the script, so that if the
node becomes healthy again, it will be removed from the blacklisted nodes
on the ResourceManager automatically. The node's health, along with the
output of the script if it is unhealthy, is available to the
administrator in the ResourceManager web interface. The time since the
node was healthy is also displayed on the web interface.

The following parameters can be used to control the node health
monitoring script in <<<conf/yarn-site.xml>>>; an example script follows
the table.

*-------------------------+-------------------------+------------------------+
|| Parameter || Value || Notes |
*-------------------------+-------------------------+------------------------+
| <<<yarn.nodemanager.health-checker.script.path>>> | | |
| | Node health script | |
| | | Script to check for node's health status. |
*-------------------------+-------------------------+------------------------+
| <<<yarn.nodemanager.health-checker.script.opts>>> | | |
| | Node health script options | |
| | | Options for script to check for node's health status. |
*-------------------------+-------------------------+------------------------+
| <<<yarn.nodemanager.health-checker.script.interval-ms>>> | | |
| | Node health script interval | |
| | | Time interval for running health script. |
*-------------------------+-------------------------+------------------------+
| <<<yarn.nodemanager.health-checker.script.timeout-ms>>> | | |
| | Node health script timeout interval | |
| | | Timeout for health script execution. |
*-------------------------+-------------------------+------------------------+

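A minimal health-check sketch, assuming a simple disk-fullness test; the
threshold is illustrative and any check of your choice can be substituted:

----
#!/bin/bash
# Hypothetical node-health script: print an ERROR line if any local
# filesystem is more than 90% full; print nothing when healthy.
THRESHOLD=90
df -P | awk -v t="$THRESHOLD" 'NR > 1 { sub(/%/, "", $5); if ($5 > t) bad = bad " " $6 }
  END { if (bad != "") print "ERROR: disks nearly full:" bad }'
----
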
* {Slaves file}

Typically you choose one machine in the cluster to act as the NameNode and
one machine to act as the ResourceManager, exclusively. The rest of the
machines act as both a DataNode and a NodeManager and are referred to as
<slaves>.

List all slave hostnames or IP addresses in your <<<conf/slaves>>> file,
one per line.

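For instance, a three-slave cluster's <<<conf/slaves>>> might contain the
following; the hostnames are placeholders:

----
slave1.example.com
slave2.example.com
slave3.example.com
----
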
* {Logging}

Hadoop uses Apache log4j via the Apache Commons Logging framework for
logging. Edit the <<<conf/log4j.properties>>> file to customize the
Hadoop daemons' logging configuration (log-formats and so on).

* {Operating the Hadoop Cluster}

Once all the necessary configuration is complete, distribute the files to the
<<<HADOOP_CONF_DIR>>> directory on all the machines.

* Hadoop Startup

To start a Hadoop cluster you will need to start both the HDFS and YARN
clusters.

Format a new distributed filesystem:

----
$ $HADOOP_HDFS_HOME/bin/hdfs namenode -format <cluster_name>
----

Start HDFS with the following command, run on the designated NameNode:

----
$ $HADOOP_HDFS_HOME/bin/hdfs start namenode --config $HADOOP_CONF_DIR
----

Run a script to start DataNodes on all slaves:

----
$ $HADOOP_HDFS_HOME/bin/hdfs start datanode --config $HADOOP_CONF_DIR
----

Start YARN with the following command, run on the designated
ResourceManager:

----
$ $YARN_HOME/bin/yarn start resourcemanager --config $HADOOP_CONF_DIR
----

Run a script to start NodeManagers on all slaves:

----
$ $YARN_HOME/bin/yarn start nodemanager --config $HADOOP_CONF_DIR
----

Start the MapReduce JobHistory Server with the following command, run on the
designated server:

----
$ $YARN_HOME/bin/yarn start historyserver --config $HADOOP_CONF_DIR
----

* Hadoop Shutdown

Stop the NameNode with the following command, run on the designated
NameNode:

----
$ $HADOOP_HDFS_HOME/bin/hdfs stop namenode --config $HADOOP_CONF_DIR
----

Run a script to stop DataNodes on all slaves:

----
$ $HADOOP_HDFS_HOME/bin/hdfs stop datanode --config $HADOOP_CONF_DIR
----

Stop the ResourceManager with the following command, run on the designated
ResourceManager:

----
$ $YARN_HOME/bin/yarn stop resourcemanager --config $HADOOP_CONF_DIR
----

Run a script to stop NodeManagers on all slaves:

----
$ $YARN_HOME/bin/yarn stop nodemanager --config $HADOOP_CONF_DIR
----

Stop the MapReduce JobHistory Server with the following command, run on the
designated server:

----
$ $YARN_HOME/bin/yarn stop historyserver --config $HADOOP_CONF_DIR
----

* {Running Hadoop in Secure Mode}

This section deals with important parameters to be specified
to run Hadoop in <<secure mode>> with strong, Kerberos-based
authentication.

* <<<User Accounts for Hadoop Daemons>>>

Ensure that the HDFS and YARN daemons run as different Unix users, e.g.
<<<hdfs>>> and <<<yarn>>>. Also, ensure that the MapReduce JobHistory
server runs as user <<<mapred>>>.

It's recommended to have them share a Unix group, e.g. <<<hadoop>>>.

*--------------------------------------+--------------------------------------+
|| User:Group || Daemons |
*--------------------------------------+--------------------------------------+
| hdfs:hadoop | NameNode, Secondary NameNode, DataNode |
*--------------------------------------+--------------------------------------+
| yarn:hadoop | ResourceManager, NodeManager |
*--------------------------------------+--------------------------------------+
| mapred:hadoop | MapReduce JobHistory Server |
*--------------------------------------+--------------------------------------+

* <<<Permissions for both HDFS and local filesystem paths>>>

The following table lists various paths on HDFS and local filesystems (on
all nodes) and recommended permissions:

*-------------------+-------------------+------------------+------------------+
|| Filesystem || Path || User:Group || Permissions |
*-------------------+-------------------+------------------+------------------+
| local | <<<dfs.namenode.name.dir>>> | hdfs:hadoop | drwx------ |
*-------------------+-------------------+------------------+------------------+
| local | <<<dfs.datanode.data.dir>>> | hdfs:hadoop | drwx------ |
*-------------------+-------------------+------------------+------------------+
| local | $HADOOP_LOG_DIR | hdfs:hadoop | drwxrwxr-x |
*-------------------+-------------------+------------------+------------------+
| local | $YARN_LOG_DIR | yarn:hadoop | drwxrwxr-x |
*-------------------+-------------------+------------------+------------------+
| local | <<<yarn.nodemanager.local-dirs>>> | yarn:hadoop | drwxr-xr-x |
*-------------------+-------------------+------------------+------------------+
| local | <<<yarn.nodemanager.log-dirs>>> | yarn:hadoop | drwxr-xr-x |
*-------------------+-------------------+------------------+------------------+
| local | container-executor | root:hadoop | --Sr-s--- |
*-------------------+-------------------+------------------+------------------+
| local | <<<conf/container-executor.cfg>>> | root:hadoop | r-------- |
*-------------------+-------------------+------------------+------------------+
| hdfs | / | hdfs:hadoop | drwxr-xr-x |
*-------------------+-------------------+------------------+------------------+
| hdfs | /tmp | hdfs:hadoop | drwxrwxrwt |
*-------------------+-------------------+------------------+------------------+
| hdfs | /user | hdfs:hadoop | drwxr-xr-x |
*-------------------+-------------------+------------------+------------------+
| hdfs | <<<yarn.nodemanager.remote-app-log-dir>>> | yarn:hadoop | drwxrwxrwt |
*-------------------+-------------------+------------------+------------------+
| hdfs | <<<mapreduce.jobhistory.intermediate-done-dir>>> | mapred:hadoop | |
| | | | drwxrwxrwt |
*-------------------+-------------------+------------------+------------------+
| hdfs | <<<mapreduce.jobhistory.done-dir>>> | mapred:hadoop | |
| | | | drwxr-x--- |
*-------------------+-------------------+------------------+------------------+

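For instance, the HDFS-side directories could be prepared with commands
along these lines, run as the <<<hdfs>>> superuser; this is a hedged
sketch, and the paths must match your configuration:

----
[hdfs]$ $HADOOP_HDFS_HOME/bin/hdfs dfs -mkdir /tmp
[hdfs]$ $HADOOP_HDFS_HOME/bin/hdfs dfs -chmod 1777 /tmp
[hdfs]$ $HADOOP_HDFS_HOME/bin/hdfs dfs -mkdir /user
[hdfs]$ $HADOOP_HDFS_HOME/bin/hdfs dfs -mkdir /mr-history
[hdfs]$ $HADOOP_HDFS_HOME/bin/hdfs dfs -mkdir /mr-history/tmp
[hdfs]$ $HADOOP_HDFS_HOME/bin/hdfs dfs -mkdir /mr-history/done
[hdfs]$ $HADOOP_HDFS_HOME/bin/hdfs dfs -chown -R mapred:hadoop /mr-history
[hdfs]$ $HADOOP_HDFS_HOME/bin/hdfs dfs -chmod 1777 /mr-history/tmp
[hdfs]$ $HADOOP_HDFS_HOME/bin/hdfs dfs -chmod 750 /mr-history/done
----
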
* Kerberos Keytab files

* HDFS

The NameNode keytab file, on the NameNode host, should look like the
following:

----
$ /usr/kerberos/bin/klist -e -k -t /etc/security/keytab/nn.service.keytab
Keytab name: FILE:/etc/security/keytab/nn.service.keytab
KVNO Timestamp         Principal
   4 07/18/11 21:08:09 nn/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 nn/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 nn/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
----

The Secondary NameNode keytab file, on that host, should look like the
following:

----
$ /usr/kerberos/bin/klist -e -k -t /etc/security/keytab/sn.service.keytab
Keytab name: FILE:/etc/security/keytab/sn.service.keytab
KVNO Timestamp         Principal
   4 07/18/11 21:08:09 sn/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 sn/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 sn/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
----

The DataNode keytab file, on each host, should look like the following:

----
$ /usr/kerberos/bin/klist -e -k -t /etc/security/keytab/dn.service.keytab
Keytab name: FILE:/etc/security/keytab/dn.service.keytab
KVNO Timestamp         Principal
   4 07/18/11 21:08:09 dn/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 dn/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 dn/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
----

* YARN

The ResourceManager keytab file, on the ResourceManager host, should look
like the following:

----
$ /usr/kerberos/bin/klist -e -k -t /etc/security/keytab/rm.service.keytab
Keytab name: FILE:/etc/security/keytab/rm.service.keytab
KVNO Timestamp         Principal
   4 07/18/11 21:08:09 rm/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 rm/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 rm/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
----

The NodeManager keytab file, on each host, should look like the following:

----
$ /usr/kerberos/bin/klist -e -k -t /etc/security/keytab/nm.service.keytab
Keytab name: FILE:/etc/security/keytab/nm.service.keytab
KVNO Timestamp         Principal
   4 07/18/11 21:08:09 nm/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 nm/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 nm/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
----

* MapReduce JobHistory Server

The MapReduce JobHistory Server keytab file, on that host, should look
like the following:

----
$ /usr/kerberos/bin/klist -e -k -t /etc/security/keytab/jhs.service.keytab
Keytab name: FILE:/etc/security/keytab/jhs.service.keytab
KVNO Timestamp         Principal
   4 07/18/11 21:08:09 jhs/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 jhs/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 jhs/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
----

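How such keytabs are created depends on your KDC. With MIT Kerberos, for
example, the principals and keytab might be generated along these lines;
this is a sketch, and the realm, hostname, and path are placeholders:

----
kadmin: addprinc -randkey nn/full.qualified.domain.name@REALM.TLD
kadmin: addprinc -randkey host/full.qualified.domain.name@REALM.TLD
kadmin: ktadd -k /etc/security/keytab/nn.service.keytab \
    nn/full.qualified.domain.name@REALM.TLD \
    host/full.qualified.domain.name@REALM.TLD
----
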
* Configuration in Secure Mode

* <<<conf/core-site.xml>>>

*-------------------------+-------------------------+------------------------+
|| Parameter || Value || Notes |
*-------------------------+-------------------------+------------------------+
| <<<hadoop.security.authentication>>> | <kerberos> | <simple> is non-secure. |
*-------------------------+-------------------------+------------------------+
| <<<hadoop.security.authorization>>> | <true> | |
| | | Enable RPC service-level authorization. |
*-------------------------+-------------------------+------------------------+

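In XML form, the two settings above would look like:

----
<configuration>
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
  <property>
    <name>hadoop.security.authorization</name>
    <value>true</value>
  </property>
</configuration>
----
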
* <<<conf/hdfs-site.xml>>>

* Configurations for NameNode:

*-------------------------+-------------------------+------------------------+
|| Parameter || Value || Notes |
*-------------------------+-------------------------+------------------------+
| <<<dfs.block.access.token.enable>>> | <true> | |
| | | Enable HDFS block access tokens for secure operations. |
*-------------------------+-------------------------+------------------------+
| <<<dfs.https.enable>>> | <true> | |
*-------------------------+-------------------------+------------------------+
| <<<dfs.namenode.https-address>>> | <nn_host_fqdn:50470> | |
*-------------------------+-------------------------+------------------------+
| <<<dfs.https.port>>> | <50470> | |
*-------------------------+-------------------------+------------------------+
| <<<dfs.namenode.keytab.file>>> | </etc/security/keytab/nn.service.keytab> | |
| | | Kerberos keytab file for the NameNode. |
*-------------------------+-------------------------+------------------------+
| <<<dfs.namenode.kerberos.principal>>> | nn/_HOST@REALM.TLD | |
| | | Kerberos principal name for the NameNode. |
*-------------------------+-------------------------+------------------------+
| <<<dfs.namenode.kerberos.https.principal>>> | host/_HOST@REALM.TLD | |
| | | HTTPS Kerberos principal name for the NameNode. |
*-------------------------+-------------------------+------------------------+

* Configurations for Secondary NameNode:

*-------------------------+-------------------------+------------------------+
|| Parameter || Value || Notes |
*-------------------------+-------------------------+------------------------+
| <<<dfs.namenode.secondary.https-address>>> | <c_nn_host_fqdn:50090> | |
*-------------------------+-------------------------+------------------------+
| <<<dfs.secondary.https.port>>> | <50090> | |
*-------------------------+-------------------------+------------------------+
| <<<dfs.namenode.secondary.keytab.file>>> | | |
| | </etc/security/keytab/sn.service.keytab> | |
| | | Kerberos keytab file for the Secondary NameNode. |
*-------------------------+-------------------------+------------------------+
| <<<dfs.namenode.secondary.kerberos.principal>>> | sn/_HOST@REALM.TLD | |
| | | Kerberos principal name for the Secondary NameNode. |
*-------------------------+-------------------------+------------------------+
| <<<dfs.namenode.secondary.kerberos.https.principal>>> | | |
| | host/_HOST@REALM.TLD | |
| | | HTTPS Kerberos principal name for the Secondary NameNode. |
*-------------------------+-------------------------+------------------------+

* Configurations for DataNode:

*-------------------------+-------------------------+------------------------+
|| Parameter || Value || Notes |
*-------------------------+-------------------------+------------------------+
| <<<dfs.datanode.data.dir.perm>>> | 700 | |
*-------------------------+-------------------------+------------------------+
| <<<dfs.datanode.address>>> | <0.0.0.0:2003> | |
*-------------------------+-------------------------+------------------------+
| <<<dfs.datanode.https.address>>> | <0.0.0.0:2005> | |
*-------------------------+-------------------------+------------------------+
| <<<dfs.datanode.keytab.file>>> | </etc/security/keytab/dn.service.keytab> | |
| | | Kerberos keytab file for the DataNode. |
*-------------------------+-------------------------+------------------------+
| <<<dfs.datanode.kerberos.principal>>> | dn/_HOST@REALM.TLD | |
| | | Kerberos principal name for the DataNode. |
*-------------------------+-------------------------+------------------------+
| <<<dfs.datanode.kerberos.https.principal>>> | | |
| | host/_HOST@REALM.TLD | |
| | | HTTPS Kerberos principal name for the DataNode. |
*-------------------------+-------------------------+------------------------+

* <<<conf/yarn-site.xml>>>

* LinuxContainerExecutor

A <<<ContainerExecutor>>> is used by the YARN framework to define how any
<container> is launched and controlled.

The following are available in Hadoop YARN:

*--------------------------------------+--------------------------------------+
|| ContainerExecutor || Description |
*--------------------------------------+--------------------------------------+
| <<<DefaultContainerExecutor>>> | |
| | The default executor which YARN uses to manage container execution. |
| | The container process has the same Unix user as the NodeManager. |
*--------------------------------------+--------------------------------------+
| <<<LinuxContainerExecutor>>> | |
| | Supported only on GNU/Linux, this executor runs the containers as the |
| | user who submitted the application. It requires all user accounts to be |
| | created on the cluster nodes where the containers are launched. It uses |
| | a <setuid> executable that is included in the Hadoop distribution. |
| | The NodeManager uses this executable to launch and kill containers. |
| | The setuid executable switches to the user who has submitted the |
| | application and launches or kills the containers. For maximum security, |
| | this executor sets up restricted permissions and user/group ownership of |
| | local files and directories used by the containers such as the shared |
| | objects, jars, intermediate files, log files etc. Particularly note that, |
| | because of this, except the application owner and NodeManager, no other |
| | user can access any of the local files/directories including those |
| | localized as part of the distributed cache. |
*--------------------------------------+--------------------------------------+

To build the LinuxContainerExecutor executable run:

----
$ mvn package -Dcontainer-executor.conf.dir=/etc/hadoop/
----

The path passed in <<<-Dcontainer-executor.conf.dir>>> should be the
path on the cluster nodes where a configuration file for the setuid
executable should be located. The executable should be installed in
$YARN_HOME/bin.

The executable must have specific permissions: 6050 or --Sr-s---
permissions, user-owned by <root> (super-user) and group-owned by a
special group (e.g. <<<hadoop>>>) of which the NodeManager Unix user is
a group member and no ordinary application user is. If any application
user belongs to this special group, security will be compromised. This
special group name should be specified for the configuration property
<<<yarn.nodemanager.linux-container-executor.group>>> in both
<<<conf/yarn-site.xml>>> and <<<conf/container-executor.cfg>>>.

For example, suppose the NodeManager runs as user <yarn>, who is a member
of the groups <users> and <hadoop>, either of which may be its primary
group. Let <users> have both <yarn> and another user (the application
submitter) <alice> as its members, while <alice> does not belong to
<hadoop>. Going by the above description, the setuid/setgid executable
should be set 6050 or --Sr-s--- with user-owner as <yarn> and group-owner
as <hadoop>, which has <yarn> as its member (and not <users>, which has
<alice> as a member besides <yarn>).

The LinuxContainerExecutor requires that the paths including, and leading
up to, the directories specified in <<<yarn.nodemanager.local-dirs>>> and
<<<yarn.nodemanager.log-dirs>>> be set to 755 permissions, as described
above in the table on permissions on directories.

* <<<conf/container-executor.cfg>>>

The executable requires a configuration file called
<<<container-executor.cfg>>> to be present in the configuration
directory passed to the mvn target mentioned above.

The configuration file must be owned by the user running the NodeManager
(user <<<yarn>>> in the above example), group-owned by anyone and
should have the permissions 0400 or r--------.

The executable requires the following configuration items to be present
in the <<<conf/container-executor.cfg>>> file. The items should be
specified as simple key=value pairs, one per line:

*-------------------------+-------------------------+------------------------+
|| Parameter || Value || Notes |
*-------------------------+-------------------------+------------------------+
| <<<yarn.nodemanager.local-dirs>>> | | |
| | Comma-separated list of NodeManager local directories. | |
| | | Paths to NodeManager local directories. Should be same as the value |
| | | which was provided to the key in <<<conf/yarn-site.xml>>>. This is |
| | | required to validate paths passed to the setuid executable in order |
| | | to prevent arbitrary paths being passed to it. |
*-------------------------+-------------------------+------------------------+
| <<<yarn.nodemanager.linux-container-executor.group>>> | <hadoop> | |
| | | Unix group of the NodeManager. The group owner of the |
| | | <container-executor> binary should be this group. Should be same as the |
| | | value with which the NodeManager is configured. This configuration is |
| | | required for validating the secure access of the <container-executor> |
| | | binary. |
*-------------------------+-------------------------+------------------------+
| <<<yarn.nodemanager.log-dirs>>> | | |
| | Comma-separated list of NodeManager log directories. | |
| | | Paths to NodeManager log directories. Should be same as the value |
| | | which was provided to the key in <<<conf/yarn-site.xml>>>. This is |
| | | required to set proper permissions on the log files so that they can |
| | | be written to by the user's containers and read by the NodeManager for |
| | | <log aggregation>. |
*-------------------------+-------------------------+------------------------+
| <<<banned.users>>> | hdfs,yarn,mapred,bin | Banned users. |
*-------------------------+-------------------------+------------------------+
| <<<min.user.id>>> | 1000 | Prevent other super-users. |
*-------------------------+-------------------------+------------------------+

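Put together, a <<<conf/container-executor.cfg>>> might read as follows;
the directory paths are illustrative and must match the values in
<<<conf/yarn-site.xml>>>:

----
yarn.nodemanager.local-dirs=/srv/hadoop/nm-local1,/srv/hadoop/nm-local2
yarn.nodemanager.linux-container-executor.group=hadoop
yarn.nodemanager.log-dirs=/srv/hadoop/nm-log1,/srv/hadoop/nm-log2
banned.users=hdfs,yarn,mapred,bin
min.user.id=1000
----
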
To recap, here are the local file-system permissions required for the
various paths related to the <<<LinuxContainerExecutor>>>:

*-------------------+-------------------+------------------+------------------+
|| Filesystem || Path || User:Group || Permissions |
*-------------------+-------------------+------------------+------------------+
| local | container-executor | root:hadoop | --Sr-s--- |
*-------------------+-------------------+------------------+------------------+
| local | <<<conf/container-executor.cfg>>> | root:hadoop | r-------- |
*-------------------+-------------------+------------------+------------------+
| local | <<<yarn.nodemanager.local-dirs>>> | yarn:hadoop | drwxr-xr-x |
*-------------------+-------------------+------------------+------------------+
| local | <<<yarn.nodemanager.log-dirs>>> | yarn:hadoop | drwxr-xr-x |
*-------------------+-------------------+------------------+------------------+

* Configurations for ResourceManager:

*-------------------------+-------------------------+------------------------+
|| Parameter || Value || Notes |
*-------------------------+-------------------------+------------------------+
| <<<yarn.resourcemanager.keytab>>> | | |
| | </etc/security/keytab/rm.service.keytab> | |
| | | Kerberos keytab file for the ResourceManager. |
*-------------------------+-------------------------+------------------------+
| <<<yarn.resourcemanager.principal>>> | rm/_HOST@REALM.TLD | |
| | | Kerberos principal name for the ResourceManager. |
*-------------------------+-------------------------+------------------------+

* Configurations for NodeManager:

*-------------------------+-------------------------+------------------------+
|| Parameter || Value || Notes |
*-------------------------+-------------------------+------------------------+
| <<<yarn.nodemanager.keytab>>> | </etc/security/keytab/nm.service.keytab> | |
| | | Kerberos keytab file for the NodeManager. |
*-------------------------+-------------------------+------------------------+
| <<<yarn.nodemanager.principal>>> | nm/_HOST@REALM.TLD | |
| | | Kerberos principal name for the NodeManager. |
*-------------------------+-------------------------+------------------------+
| <<<yarn.nodemanager.container-executor.class>>> | | |
| | <<<org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor>>> | |
| | | Use LinuxContainerExecutor. |
*-------------------------+-------------------------+------------------------+
| <<<yarn.nodemanager.linux-container-executor.group>>> | <hadoop> | |
| | | Unix group of the NodeManager. |
*-------------------------+-------------------------+------------------------+

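As a sketch, the NodeManager's secure-mode settings in
<<<conf/yarn-site.xml>>> would look like the following; the keytab path
and realm are as in the table above:

----
<configuration>
  <property>
    <name>yarn.nodemanager.keytab</name>
    <value>/etc/security/keytab/nm.service.keytab</value>
  </property>
  <property>
    <name>yarn.nodemanager.principal</name>
    <value>nm/_HOST@REALM.TLD</value>
  </property>
  <property>
    <name>yarn.nodemanager.container-executor.class</name>
    <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
  </property>
  <property>
    <name>yarn.nodemanager.linux-container-executor.group</name>
    <value>hadoop</value>
  </property>
</configuration>
----
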
* <<<conf/mapred-site.xml>>>

* Configurations for MapReduce JobHistory Server:

*-------------------------+-------------------------+------------------------+
|| Parameter || Value || Notes |
*-------------------------+-------------------------+------------------------+
| <<<mapreduce.jobhistory.address>>> | | |
| | MapReduce JobHistory Server <host:port> | Default port is 10020. |
*-------------------------+-------------------------+------------------------+
| <<<mapreduce.jobhistory.keytab>>> | | |
| | </etc/security/keytab/jhs.service.keytab> | |
| | | Kerberos keytab file for the MapReduce JobHistory Server. |
*-------------------------+-------------------------+------------------------+
| <<<mapreduce.jobhistory.principal>>> | mapred/_HOST@REALM.TLD | |
| | | Kerberos principal name for the MapReduce JobHistory Server. |
*-------------------------+-------------------------+------------------------+

* {Operating the Hadoop Cluster}

Once all the necessary configuration is complete, distribute the files to the
<<<HADOOP_CONF_DIR>>> directory on all the machines.

This section also describes which Unix users should start the various
components, using the same Unix accounts and groups introduced above.

* Hadoop Startup

To start a Hadoop cluster you will need to start both the HDFS and YARN
clusters.

Format a new distributed filesystem as <hdfs>:

----
[hdfs]$ $HADOOP_HDFS_HOME/bin/hdfs namenode -format <cluster_name>
----

Start HDFS with the following command, run on the designated NameNode
as <hdfs>:

----
[hdfs]$ $HADOOP_HDFS_HOME/bin/hdfs start namenode --config $HADOOP_CONF_DIR
----

Run a script to start DataNodes on all slaves as <root> with the special
environment variable <<<HADOOP_SECURE_DN_USER>>> set to <hdfs>:

----
[root]$ HADOOP_SECURE_DN_USER=hdfs $HADOOP_HDFS_HOME/bin/hdfs start datanode --config $HADOOP_CONF_DIR
----

Start YARN with the following command, run on the designated
ResourceManager as <yarn>:

----
[yarn]$ $YARN_HOME/bin/yarn start resourcemanager --config $HADOOP_CONF_DIR
----

Run a script to start NodeManagers on all slaves as <yarn>:

----
[yarn]$ $YARN_HOME/bin/yarn start nodemanager --config $HADOOP_CONF_DIR
----

Start the MapReduce JobHistory Server with the following command, run on the
designated server as <mapred>:

----
[mapred]$ $YARN_HOME/bin/yarn start historyserver --config $HADOOP_CONF_DIR
----

* Hadoop Shutdown

Stop the NameNode with the following command, run on the designated NameNode
as <hdfs>:

----
[hdfs]$ $HADOOP_HDFS_HOME/bin/hdfs stop namenode --config $HADOOP_CONF_DIR
----

Run a script to stop DataNodes on all slaves as <root>:

----
[root]$ $HADOOP_HDFS_HOME/bin/hdfs stop datanode --config $HADOOP_CONF_DIR
----

Stop the ResourceManager with the following command, run on the designated
ResourceManager as <yarn>:

----
[yarn]$ $YARN_HOME/bin/yarn stop resourcemanager --config $HADOOP_CONF_DIR
----

Run a script to stop NodeManagers on all slaves as <yarn>:

----
[yarn]$ $YARN_HOME/bin/yarn stop nodemanager --config $HADOOP_CONF_DIR
----

Stop the MapReduce JobHistory Server with the following command, run on the
designated server as <mapred>:

----
[mapred]$ $YARN_HOME/bin/yarn stop historyserver --config $HADOOP_CONF_DIR
----

* {Web Interfaces}

Once the Hadoop cluster is up and running, check the web UIs of the
components as described below:

*-------------------------+-------------------------+------------------------+
|| Daemon || Web Interface || Notes |
*-------------------------+-------------------------+------------------------+
| NameNode | http://<nn_host:port>/ | Default HTTP port is 50070. |
*-------------------------+-------------------------+------------------------+
| ResourceManager | http://<rm_host:port>/ | Default HTTP port is 8088. |
*-------------------------+-------------------------+------------------------+
| MapReduce JobHistory Server | http://<jhs_host:port>/ | |
| | | Default HTTP port is 19888. |
*-------------------------+-------------------------+------------------------+