HDFS-11963. Ozone: Documentation: Add getting started page. Contributed by Anu Engineer.

This commit is contained in:
Anu Engineer 2017-06-19 21:29:18 -07:00 committed by Owen O'Malley
parent e73d285567
commit 8d289df627
3 changed files with 409 additions and 0 deletions


@@ -0,0 +1,146 @@
<!---
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
Ozone Command Shell
===================
The Ozone command shell provides a command-line interface for working with Ozone.
Please note that this document assumes that the cluster is deployed
with simple authentication.
The Ozone commands take the following format, where `-command` is one of the
commands described below.
* `hdfs oz -command http://hostname:port/volume/bucket/key -user <name> -root`
The `-root` option is a command-line shortcut that allows `hdfs oz`
commands to be run as the user that started the cluster. This is useful to
indicate that you want the commands to be run as some admin user. The only
reason for this option is developer convenience.
Ozone Volume Commands
--------------------
The volume commands allow users to create, delete and list the volumes in the
ozone cluster.
### Create Volume
Volumes can be created only by Admins. Here is an example of creating a volume.
* `hdfs oz -createVolume http://localhost:9864/hive -user bilbo -quota 100TB -root`
The above command creates a volume called `hive` owned by user `bilbo`. The
`-root` option allows the command to be executed as the user `hdfs`, which is an
admin in the cluster.
### Update Volume
Updates information like ownership and quota on an existing volume.
* `hdfs oz -updateVolume http://localhost:9864/hive -quota 500TB -root`
The above command changes the volume quota of hive from 100TB to 500TB.
### Delete Volume
Deletes a Volume if it is empty.
* `hdfs oz -deleteVolume http://localhost:9864/hive -root`
### Info Volume
The info volume command allows the owner or the cluster administrator to read metadata about a specific volume.
* `hdfs oz -infoVolume http://localhost:9864/hive -root`
### List Volumes
The list volume command can be used by an administrator to list the volumes of any user. It can also be used by a user to list their own volumes.
* `hdfs oz -listVolume http://localhost:9864/ -user bilbo -root`
The above command lists all volumes owned by user bilbo.
Ozone Bucket Commands
--------------------
Bucket commands follow a similar pattern to volume commands. However, bucket commands are designed to be run by the owner of the volume.
The following examples assume that these commands are run by the owner of the volume or bucket.
### Create Bucket
The create bucket call allows the owner of a volume to create a bucket.
* `hdfs oz -createBucket http://localhost:9864/hive/january`
This call creates a bucket called `january` in the volume called `hive`. If
the volume does not exist, then this call will fail.
### Update Bucket
Updates bucket meta-data, like ACLs.
* `hdfs oz -updateBucket http://localhost:9864/hive/january -addAcl user:spark:rw`
### Delete Bucket
Deletes a bucket if it is empty.
* `hdfs oz -deleteBucket http://localhost:9864/hive/january`
### Info Bucket
Returns information about a given bucket.
* `hdfs oz -infoBucket http://localhost:9864/hive/january`
### List Buckets
List buckets on a given volume.
* `hdfs oz -listBucket http://localhost:9864/hive`
Ozone Key Commands
------------------
Ozone key commands allow users to put, delete and get keys from ozone buckets.
### Put Key
Creates or overwrites a key in the ozone store; `-file` points to the file you want
to upload.
* `hdfs oz -putKey http://localhost:9864/hive/january/processed.orc -file processed.orc`
### Get Key
Downloads a file from the ozone bucket.
* `hdfs oz -getKey http://localhost:9864/hive/january/processed.orc -file processed.orc.copy`
### Delete Key
Deletes a key from the ozone store.
* `hdfs oz -deleteKey http://localhost:9864/hive/january/processed.orc`
### Info Key
Reads key metadata from the ozone store.
* `hdfs oz -infoKey http://localhost:9864/hive/january/processed.orc`
### List Keys
List all keys in an ozone bucket.
* `hdfs oz -listKey http://localhost:9864/hive/january`
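Putting it together, a typical end-to-end session using the commands above might
look like the following sketch. It assumes a local datanode REST endpoint on
port 9864, simple authentication, and a cluster started by the `hdfs` user.
```
# Create a volume and a bucket, then upload, list and download a key.
hdfs oz -createVolume http://localhost:9864/hive -user bilbo -quota 100TB -root
hdfs oz -createBucket http://localhost:9864/hive/january
hdfs oz -putKey  http://localhost:9864/hive/january/processed.orc -file processed.orc
hdfs oz -listKey http://localhost:9864/hive/january
hdfs oz -getKey  http://localhost:9864/hive/january/processed.orc -file processed.orc.copy
# The downloaded copy should match the original file.
diff processed.orc processed.orc.copy
```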


@@ -0,0 +1,257 @@
<!---
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
Ozone - Object store for Hadoop
==============================
Introduction
------------
Ozone is an object store for Hadoop. It is a redundant, distributed object
store built by leveraging primitives present in HDFS. Ozone supports a REST
API for accessing the store.
Getting Started
---------------
Ozone is a work in progress and currently lives in its own branch. To
use it, you have to build the package yourself and deploy a cluster.
### Building Ozone
To build Ozone, please check out the Hadoop sources from GitHub. Then
check out the Ozone branch, HDFS-7240, and build it.
- `git checkout HDFS-7240`
- `mvn clean package -DskipTests=true -Dmaven.javadoc.skip=true -Pdist -Dtar -DskipShade`
The `skipShade` flag just speeds up compilation and is not strictly required.
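Spelled out end to end, the build steps might look like the following sketch;
the repository URL and the local working directory are assumptions, so adjust
them to your setup.
```
# Clone the Hadoop sources, switch to the Ozone branch and build a distribution.
git clone https://github.com/apache/hadoop.git
cd hadoop
git checkout HDFS-7240
mvn clean package -DskipTests=true -Dmaven.javadoc.skip=true -Pdist -Dtar -DskipShade
```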
This will give you a tarball in your distribution directory. This is the
tarball that can be used for deploying your Hadoop cluster. Here is an
example of the tarball that will be generated.
* `~/apache/hadoop/hadoop-dist/target/hadoop-3.0.0-alpha4-SNAPSHOT.tar.gz`
Please proceed to set up a Hadoop cluster by creating the hdfs-site.xml and
other configuration files needed for your cluster.
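For example, a minimal way to unpack the build and point your shell at it might
look like the following sketch; the install path is illustrative, so adjust it
to your environment.
```
# Unpack the freshly built distribution (the tarball path is the example above).
tar -xzf ~/apache/hadoop/hadoop-dist/target/hadoop-3.0.0-alpha4-SNAPSHOT.tar.gz -C /opt
# $HADOOP below refers to the unpacked distribution, as used later in this document.
export HADOOP=/opt/hadoop-3.0.0-alpha4-SNAPSHOT
export PATH=$HADOOP/bin:$HADOOP/sbin:$PATH
```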
### Ozone Configuration
Ozone relies on its own configuration file called `ozone-site.xml`. It is
just for convenience and ease of management -- you can add these settings
to `hdfs-site.xml`, if you don't want to keep ozone settings separate.
This document refers to `ozone-site.xml` so that ozone settings are in one
place and not mingled with HDFS settings.
* _*ozone.enabled*_ This is the most important setting for ozone.
Currently, Ozone is an opt-in subsystem of HDFS. By default, Ozone is
disabled. Setting this flag to `true` enables ozone in the HDFS cluster.
Here is an example,
```
<property>
  <name>ozone.enabled</name>
  <value>True</value>
</property>
```
* _*ozone.container.metadata.dirs*_ Ozone is designed with modern hardware
in mind. It tries to use SSDs effectively, so users can specify where the
datanode metadata must reside. Usually you pick your fastest disk (SSDs, if
you have them on your datanodes). Datanodes will write the container metadata
to these disks. This is a required setting; if it is missing, datanodes will
fail to come up. Here is an example,
```
<property>
  <name>ozone.container.metadata.dirs</name>
  <value>/data/disk1/container/meta</value>
</property>
```
* _*ozone.scm.names*_ Ozone is built on top of the container framework (see Ozone
Architecture TODO). The Storage Container Manager (SCM) is a distributed block
service which is used by Ozone and other storage services.
This property allows datanodes to discover where SCM is, so that
datanodes can send heartbeats to SCM. SCM is designed to be highly available,
and datanodes assume there are multiple instances of SCM which form a highly
available ring. The HA feature of SCM is a work in progress, so we
configure ozone.scm.names to be a single machine. Here is an example,
```
<property>
  <name>ozone.scm.names</name>
  <value>scm.hadoop.apache.org</value>
</property>
```
* _*ozone.scm.datanode.id*_ Each datanode that speaks to SCM generates an ID,
just like HDFS. This ID is stored in the location pointed to by this setting. If
this setting is not valid, datanodes will fail to come up. Please note:
this path will be created by the datanode to store its ID. Here is an example,
```
<property>
  <name>ozone.scm.datanode.id</name>
  <value>/data/disk1/scm/meta/node/datanode.id</value>
</property>
```
* _*ozone.scm.block.client.address*_ The Storage Container Manager (SCM) offers a
set of services that can be used to build a distributed storage system. One
of the services offered is the block service; KSM and HDFS would use this
service. This property describes where KSM can discover SCM's block service
endpoint. There are corresponding port settings as well, but assuming that we
are using the default ports, the server address is the only required field.
Here is an example,
```
<property>
  <name>ozone.scm.block.client.address</name>
  <value>scm.hadoop.apache.org</value>
</property>
```
* _*ozone.ksm.address*_ KSM server address. This is used by the Ozone handler and
the Ozone File System.
```
<property>
  <name>ozone.ksm.address</name>
  <value>ksm.hadoop.apache.org</value>
</property>
```
Here is a quick summary of settings needed by Ozone.
| Setting                        | Value           | Comment                                                      |
|--------------------------------|-----------------|--------------------------------------------------------------|
| ozone.enabled                  | True            | This enables SCM and containers in the HDFS cluster.         |
| ozone.container.metadata.dirs  | file path       | The container metadata will be stored here in the datanode.  |
| ozone.scm.names                | SCM server name | Hostname:port or IP:port address of SCM.                     |
| ozone.scm.datanode.id          | file path       | The location of the datanode's ID file.                      |
| ozone.scm.block.client.address | SCM server name | Used by services like KSM.                                   |
| ozone.ksm.address              | KSM server name | Used by the Ozone handler and the Ozone file system.         |
Here is a working example of `ozone-site.xml`.
```
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>ozone.enabled</name>
    <value>True</value>
  </property>
  <property>
    <name>ozone.container.metadata.dirs</name>
    <value>/data/disk1/scm/meta</value>
  </property>
  <property>
    <name>ozone.scm.names</name>
    <value>scm.hadoop.apache.org</value>
  </property>
  <property>
    <name>ozone.scm.datanode.id</name>
    <value>/data/disk1/scm/meta/node/datanode.id</value>
  </property>
  <property>
    <name>ozone.scm.block.client.address</name>
    <value>scm.hadoop.apache.org</value>
  </property>
  <property>
    <name>ozone.ksm.address</name>
    <value>ksm.hadoop.apache.org</value>
  </property>
</configuration>
```
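One simple way to put this file in place is to copy it next to the rest of the
Hadoop configuration. The `etc/hadoop` directory shown below is the usual default
inside the unpacked distribution; adjust the path if your layout differs.
```
# ozone-site.xml lives alongside hdfs-site.xml and the other config files.
cp ozone-site.xml $HADOOP/etc/hadoop/
```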
### Starting Ozone
Ozone is designed to run concurrently with HDFS. The simplest way to [start
HDFS](../hadoop-common/ClusterSetup.html) is to run `start-dfs.sh` from
`$HADOOP/sbin`. Once HDFS
is running, please verify that it is fully functional by running some commands like
- `./hdfs dfs -mkdir /usr`
- `./hdfs dfs -ls /`
Once you are sure that HDFS is running, start Ozone. To start Ozone, you
need to start SCM and KSM. Currently we assume that both KSM and SCM
run on the same node; this will change in the future.
- `./hdfs --daemon start scm`
- `./hdfs --daemon start ksm`
If you would like to start HDFS and Ozone together, you can do that by running
a single command.
- `$HADOOP/sbin/start-ozone.sh`
This command will start HDFS and then start the ozone components.
Once you have Ozone running, you can use the Ozone [shell](./OzoneCommandShell.html)
commands to create volumes, buckets and keys.
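For example, a quick smoke test might look like the following sketch. It assumes
the datanode REST interface is listening on localhost:9864, simple authentication,
and a cluster started by the `hdfs` user.
```
# Create a volume and a bucket, then confirm the bucket exists.
hdfs oz -createVolume http://localhost:9864/test -user bilbo -root
hdfs oz -createBucket http://localhost:9864/test/bucket1
hdfs oz -infoBucket   http://localhost:9864/test/bucket1
```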
### Diagnosing issues
Ozone tries not to pollute the existing HDFS streams of configuration and
logging, so Ozone logs are by default configured to be written to a file
called `ozone.log`. This is controlled by the settings in the `log4j.properties`
file in the Hadoop configuration directory.
Here are the log4j properties that are added by Ozone.
```
#
# Add a logger for ozone that is separate from the Datanode.
#
#log4j.debug=true
log4j.logger.org.apache.hadoop.ozone=DEBUG,OZONE,FILE
# Do not log into datanode logs. Remove this line to have single log.
log4j.additivity.org.apache.hadoop.ozone=false
# For development purposes, log both to console and log file.
log4j.appender.OZONE=org.apache.log4j.ConsoleAppender
log4j.appender.OZONE.Threshold=info
log4j.appender.OZONE.layout=org.apache.log4j.PatternLayout
log4j.appender.OZONE.layout.ConversionPattern=%d{ISO8601} [%t] %-5p \
%X{component} %X{function} %X{resource} %X{user} %X{request} - %m%n
# Real ozone logger that writes to ozone.log
log4j.appender.FILE=org.apache.log4j.DailyRollingFileAppender
log4j.appender.FILE.File=${hadoop.log.dir}/ozone.log
log4j.appender.FILE.Threshold=debug
log4j.appender.FILE.layout=org.apache.log4j.PatternLayout
log4j.appender.FILE.layout.ConversionPattern=%d{ISO8601} [%t] %-5p \
(%F:%L) %X{function} %X{resource} %X{user} %X{request} - \
%m%n
```
If you would like to have a single datanode log instead of Ozone messages
getting written to `ozone.log`, please remove the following line or set it to true.
`log4j.additivity.org.apache.hadoop.ozone=false`
On the SCM/KSM side, you will be able to see
- `hadoop-hdfs-ksm-hostname.log`
- `hadoop-hdfs-scm-hostname.log`
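To check the Ozone-specific log quickly, something like the following can be
handy; the path assumes the default `hadoop.log.dir` under `$HADOOP/logs`, so
adjust it if you have overridden that setting.
```
# Watch the Ozone log and surface recent errors or exceptions.
tail -n 200 $HADOOP/logs/ozone.log | grep -iE 'error|exception'
```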
Please file any issues you see under [Object store in HDFS (HDFS-7240)](https://issues.apache.org/jira/browse/HDFS-7240)
as this is still a work in progress.


@@ -109,6 +109,12 @@
<item name="Provided Storage" href="hadoop-project-dist/hadoop-hdfs/HdfsProvidedStorage.html"/> <item name="Provided Storage" href="hadoop-project-dist/hadoop-hdfs/HdfsProvidedStorage.html"/>
</menu> </menu>
<menu name="Ozone" inherit="top">
<item name="Getting Started" href="hadoop-project-dist/hadoop-hdfs/OzoneGettingStarted.html"/>
<item name="Commands Reference" href="hadoop-project-dist/hadoop-hdfs/OzoneCommandShell.html"/>
<item name="Ozone Metrics" href="hadoop-project-dist/hadoop-hdfs/Ozonemetrics.html"/>
</menu>
<menu name="MapReduce" inherit="top"> <menu name="MapReduce" inherit="top">
<item name="Tutorial" href="hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html"/> <item name="Tutorial" href="hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html"/>
<item name="Commands Reference" href="hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapredCommands.html"/> <item name="Commands Reference" href="hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapredCommands.html"/>