hadoop/hadoop-common-project/hadoop-common/src/site/markdown/CLIMiniCluster.md.vm

69 lines
3.3 KiB
Plaintext
Raw Normal View History

<!---
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
Hadoop: CLI MiniCluster.
========================
* [Hadoop: CLI MiniCluster.](#Hadoop:_CLI_MiniCluster.)
* [Purpose](#Purpose)
* [Hadoop Tarball](#Hadoop_Tarball)
* [Running the MiniCluster](#Running_the_MiniCluster)
Purpose
-------
Using the CLI MiniCluster, users can simply start and stop a single-node Hadoop cluster with a single command, and without the need to set any environment variables or manage configuration files. The CLI MiniCluster starts both a `YARN`/`MapReduce` & `HDFS` clusters.
This is useful for cases where users want to quickly experiment with a real Hadoop cluster or test non-Java programs that rely on significant Hadoop functionality.
Hadoop Tarball
--------------
You should be able to obtain the Hadoop tarball from the release. Also, you can directly create a tarball from the source:
$ mvn clean install -DskipTests
$ mvn package -Pdist -Dtar -DskipTests -Dmaven.javadoc.skip
**NOTE:** You will need [protoc 2.5.0](http://code.google.com/p/protobuf/) installed.
The tarball should be available in `hadoop-dist/target/` directory.
Running the MiniCluster
-----------------------
From inside the root directory of the extracted tarball, you can start the CLI MiniCluster using the following command:
$ bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-${project.version}-tests.jar minicluster -rmport RM_PORT -jhsport JHS_PORT
In the example command above, `RM_PORT` and `JHS_PORT` should be replaced by the user's choice of these port numbers. If not specified, random free ports will be used.
There are a number of command line arguments that the users can use to control which services to start, and to pass other configuration properties. The available command line arguments:
$ -D <property=value> Options to pass into configuration object
$ -datanodes <arg> How many datanodes to start (default 1)
$ -format Format the DFS (default false)
$ -help Prints option help.
$ -jhsport <arg> JobHistoryServer port (default 0--we choose)
$ -namenode <arg> URL of the namenode (default is either the DFS
$ cluster or a temporary dir)
$ -nnport <arg> NameNode port (default 0--we choose)
$ -nodemanagers <arg> How many nodemanagers to start (default 1)
$ -nodfs Don't start a mini DFS cluster
$ -nomr Don't start a mini MR cluster
$ -rmport <arg> ResourceManager port (default 0--we choose)
$ -writeConfig <path> Save configuration to this XML file.
$ -writeDetails <path> Write basic information to this JSON file.
To display this full list of available arguments, the user can pass the `-help` argument to the above command.