diff --git a/hadoop-mapreduce-project/CHANGES.txt b/hadoop-mapreduce-project/CHANGES.txt
index 77aab5002a..ca69f28166 100644
--- a/hadoop-mapreduce-project/CHANGES.txt
+++ b/hadoop-mapreduce-project/CHANGES.txt
@@ -318,6 +318,9 @@ Release 0.23.0 - Unreleased
     MAPREDUCE-3092. Removed a special comparator for JobIDs in JobHistory as
     JobIDs are already comparable. (Devaraj K via vinodkv)
 
+    MAPREDUCE-3099. Add docs for setting up a single node MRv2 cluster.
+    (mahadev)
+
   OPTIMIZATIONS
 
     MAPREDUCE-2026. Make JobTracker.getJobCounters() and
diff --git a/hadoop-mapreduce-project/hadoop-yarn/src/site/apt/SingleCluster.apt.vm b/hadoop-mapreduce-project/hadoop-yarn/src/site/apt/SingleCluster.apt.vm
new file mode 100644
index 0000000000..affb277b7f
--- /dev/null
+++ b/hadoop-mapreduce-project/hadoop-yarn/src/site/apt/SingleCluster.apt.vm
@@ -0,0 +1,180 @@
+~~ Licensed under the Apache License, Version 2.0 (the "License");
+~~ you may not use this file except in compliance with the License.
+~~ You may obtain a copy of the License at
+~~
+~~   http://www.apache.org/licenses/LICENSE-2.0
+~~
+~~ Unless required by applicable law or agreed to in writing, software
+~~ distributed under the License is distributed on an "AS IS" BASIS,
+~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+~~ See the License for the specific language governing permissions and
+~~ limitations under the License. See accompanying LICENSE file.
+
+  ---
+  Hadoop MapReduce Next Generation ${project.version} - Setting up a Single Node Cluster.
+  ---
+  ---
+  ${maven.build.timestamp}
+
+Hadoop MapReduce Next Generation - Setting up a Single Node Cluster.
+
+  \[ {{{./index.html}Go Back}} \]
+
+* Mapreduce Tarball
+
+  You should be able to obtain the MapReduce tarball from the release.
+  If not, you should be able to create a tarball from the source.
+
++---+
+$ mvn clean install -DskipTests
+$ cd hadoop-mapreduce-project
+$ mvn clean install assembly:assembly
++---+
+  <> You will need protoc version 2.4.1 or greater installed.
+
+  To ignore the native builds in mapreduce you can use the <<<-P-cbuild>>> argument
+  for maven. The tarball should be available in the <<>> directory.
+
+
+* Setting up the environment.
+
+  Assuming you have installed hadoop-common/hadoop-hdfs and exported
+  <<$HADOOP_COMMON_HOME>>/<<$HADOOP_HDFS_HOME>>, untar the hadoop mapreduce
+  tarball and set the environment variable <<$HADOOP_MAPRED_HOME>> to the
+  untarred directory. Set <<$YARN_HOME>> the same as <<$HADOOP_MAPRED_HOME>>.
+
+  <> The following instructions assume you have hdfs running.
+
+* Setting up Configuration.
+
+  To start the ResourceManager and NodeManager, you will have to update the configs.
+  This assumes $HADOOP_CONF_DIR is the configuration directory and holds the installed
+  configs for HDFS and <<>>. There are 2 config files you will have to set up:
+  <<>> and <<>>.
+
+** Setting up <<>>
+
+  Add the following configs to your <<>>.
+
++---+
+  <property>
+    <name>mapreduce.cluster.temp.dir</name>
+    <value></value>
+    <description>No description</description>
+    <final>true</final>
+  </property>
+
+  <property>
+    <name>mapreduce.cluster.local.dir</name>
+    <value></value>
+    <description>No description</description>
+    <final>true</final>
+  </property>
++---+
+
+** Setting up <<>>
+
+  Add the following configs to your <<>>.
+
++---+
+  <property>
+    <name>yarn.resourcemanager.resource-tracker.address</name>
+    <value>host:port</value>
+    <description>host is the hostname of the resource manager and
+    port is the port on which the NodeManagers contact the Resource Manager.
+    </description>
+  </property>
+
+  <property>
+    <name>yarn.resourcemanager.scheduler.address</name>
+    <value>host:port</value>
+    <description>host is the hostname of the resourcemanager and port is the port
+    on which the Applications in the cluster talk to the Resource Manager.</description>
+  </property>
+
+  <property>
+    <name>yarn.resourcemanager.scheduler.class</name>
+    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
+    <description>In case you do not want to use the default scheduler</description>
+  </property>
+
+  <property>
+    <name>yarn.resourcemanager.address</name>
+    <value>host:port</value>
+    <description>the host is the hostname of the ResourceManager and the port is the port on
+    which the clients can talk to the Resource Manager.</description>
+  </property>
+
+  <property>
+    <name>yarn.nodemanager.local-dirs</name>
+    <value></value>
+    <description>the local directories used by the nodemanager</description>
+  </property>
+
+  <property>
+    <name>yarn.nodemanager.address</name>
+    <value>0.0.0.0:port</value>
+    <description>the nodemanagers bind to this port</description>
+  </property>
+
+  <property>
+    <name>yarn.nodemanager.resource.memory-gb</name>
+    <value>10</value>
+    <description>the amount of memory on the NodeManager in GB</description>
+  </property>
+
+  <property>
+    <name>yarn.nodemanager.remote-app-log-dir</name>
+    <value>/app-logs</value>
+    <description>directory on hdfs where the application logs are moved to</description>
+  </property>
+
+  <property>
+    <name>yarn.nodemanager.log-dirs</name>
+    <value></value>
+    <description>the directories used by Nodemanagers as log directories</description>
+  </property>
+
+  <property>
+    <name>yarn.nodemanager.aux-services</name>
+    <value>mapreduce.shuffle</value>
+    <description>shuffle service that needs to be set for Map Reduce to run</description>
+  </property>
++---+
+
+* Create Symlinks.
+
+  You will have to create the following symlinks:
+
++---+
+$ cd $HADOOP_COMMON_HOME/share/hadoop/common/lib/
+$ ln -s $HADOOP_MAPRED_HOME/modules/hadoop-mapreduce-client-app-*-SNAPSHOT.jar .
+$ ln -s $HADOOP_MAPRED_HOME/modules/hadoop-mapreduce-client-jobclient-*-SNAPSHOT.jar .
+$ ln -s $HADOOP_MAPRED_HOME/modules/hadoop-mapreduce-client-common-*-SNAPSHOT.jar .
+$ ln -s $HADOOP_MAPRED_HOME/modules/hadoop-mapreduce-client-shuffle-*-SNAPSHOT.jar .
+$ ln -s $HADOOP_MAPRED_HOME/modules/hadoop-mapreduce-client-core-*-SNAPSHOT.jar .
+$ ln -s $HADOOP_MAPRED_HOME/modules/hadoop-yarn-common-*-SNAPSHOT.jar .
+$ ln -s $HADOOP_MAPRED_HOME/modules/hadoop-yarn-api-*-SNAPSHOT.jar .
++---+
+* Running daemons.
+
+  This assumes that the environment variables <<$HADOOP_COMMON_HOME>>, <<$HADOOP_HDFS_HOME>>, <<$HADOOP_MAPRED_HOME>>,
+  <<$YARN_HOME>>, <<$JAVA_HOME>> and <<$HADOOP_CONF_DIR>> have been set appropriately.
+  Set <<$YARN_CONF_DIR>> the same as $<>
+
+  Run ResourceManager and NodeManager as:
+
++---+
+$ cd $HADOOP_MAPRED_HOME
+$ bin/yarn-daemon.sh start resourcemanager
+$ bin/yarn-daemon.sh start nodemanager
++---+
+
+  You should be up and running. You can run randomwriter as:
+
++---+
+$ $HADOOP_COMMON_HOME/bin/hadoop jar hadoop-examples.jar randomwriter out
++---+
+
+Good luck.
diff --git a/hadoop-mapreduce-project/hadoop-yarn/src/site/apt/index.apt.vm b/hadoop-mapreduce-project/hadoop-yarn/src/site/apt/index.apt.vm
new file mode 100644
index 0000000000..db9fe87034
--- /dev/null
+++ b/hadoop-mapreduce-project/hadoop-yarn/src/site/apt/index.apt.vm
@@ -0,0 +1,39 @@
+~~ Licensed under the Apache License, Version 2.0 (the "License");
+~~ you may not use this file except in compliance with the License.
+~~ You may obtain a copy of the License at
+~~
+~~   http://www.apache.org/licenses/LICENSE-2.0
+~~
+~~ Unless required by applicable law or agreed to in writing, software
+~~ distributed under the License is distributed on an "AS IS" BASIS,
+~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+~~ See the License for the specific language governing permissions and
+~~ limitations under the License. See accompanying LICENSE file.
+
+  ---
+  Hadoop MapReduce Next Generation ${project.version}
+  ---
+  ---
+  ${maven.build.timestamp}
+
+Hadoop MapReduce Next Generation
+
+* Architecture
+
+  The new architecture introduced in 0.23 divides the two major functions
+  of the JobTracker, resource management and job scheduling/monitoring, into separate
+  components.
+  The new ResourceManager manages the global assignment of compute resources to applications
+  and the per-application ApplicationMaster manages the application’s scheduling and coordination.
+  An application is either a single job in the classic MapReduce sense or a DAG of such jobs.
+  The ResourceManager and per-machine NodeManager server, which manages the user processes on that
+  machine, form the computation fabric. The per-application ApplicationMaster is, in effect, a
+  framework-specific library and is tasked with negotiating resources from the ResourceManager
+  and working with the NodeManager(s) to execute and monitor the tasks.
+
+* User Documentation
+
+  * {{{./SingleCluster.html}SingleCluster}}
+
+  * {{{./apidocs/index.html}JavaDocs}}
+
diff --git a/hadoop-mapreduce-project/hadoop-yarn/src/site/site.xml b/hadoop-mapreduce-project/hadoop-yarn/src/site/site.xml
new file mode 100644
index 0000000000..35a75cb2e5
--- /dev/null
+++ b/hadoop-mapreduce-project/hadoop-yarn/src/site/site.xml
@@ -0,0 +1,34 @@
+
+
+
+
+
+
+     
+
+
+
+    org.apache.maven.skins
+    maven-stylus-skin
+    1.1
+
+
+
+
+
+
+
+
+