From b661dcf563c0b3cb6fe6f22bb3a39f87e3ec1c57 Mon Sep 17 00:00:00 2001
From: Nanda kumar
Date: Wed, 21 Aug 2019 22:47:41 +0530
Subject: [PATCH] HDDS-2002. Update documentation for 0.4.1 release.

Signed-off-by: Anu Engineer
---
 hadoop-hdds/docs/content/beyond/Containers.md | 65 ++++---
 .../docs/content/beyond/DockerCheatSheet.md | 7 +-
 .../docs/content/beyond/RunningWithHDFS.md | 2 +-
 hadoop-hdds/docs/content/concept/Datanodes.md | 6 +-
 hadoop-hdds/docs/content/concept/Hdds.md | 2 +-
 hadoop-hdds/docs/content/concept/Overview.md | 6 +-
 .../docs/content/concept/OzoneManager.md | 20 +--
 hadoop-hdds/docs/content/interface/JavaApi.md | 8 +-
 hadoop-hdds/docs/content/interface/OzoneFS.md | 8 +-
 hadoop-hdds/docs/content/interface/S3.md | 18 +-
 hadoop-hdds/docs/content/recipe/Prometheus.md | 22 +--
 .../docs/content/recipe/SparkOzoneFSK8S.md | 38 ++--
 hadoop-hdds/docs/content/recipe/_index.md | 3 +-
 .../content/security/SecuityWithRanger.md | 4 +-
 .../docs/content/security/SecureOzone.md | 169 +++++++++---------
 .../content/security/SecuringDatanodes.md | 13 +-
 .../docs/content/security/SecuringS3.md | 6 +-
 .../docs/content/security/SecuringTDE.md | 5 +-
 .../docs/content/security/SecurityAcls.md | 35 ++--
 .../docs/content/shell/BucketCommands.md | 30 +---
 hadoop-hdds/docs/content/shell/KeyCommands.md | 24 ++-
 .../docs/content/shell/VolumeCommands.md | 20 +--
 hadoop-hdds/docs/content/start/Kubernetes.md | 2 +-
 hadoop-hdds/docs/content/start/OnPrem.md | 7 +-
 .../docs/content/start/StartFromDockerHub.md | 7 +-
 25 files changed, 269 insertions(+), 258 deletions(-)

diff --git a/hadoop-hdds/docs/content/beyond/Containers.md b/hadoop-hdds/docs/content/beyond/Containers.md
index b4dc94fc2e..ea7e3b17c4 100644
--- a/hadoop-hdds/docs/content/beyond/Containers.md
+++ b/hadoop-hdds/docs/content/beyond/Containers.md
@@ -25,8 +25,9 @@ Docker heavily is used at the ozone development with three principal use-cases:
 * __dev__:
 * We use docker to start local pseudo-clusters (docker provides
unified environment, but no image creation is required)
 * __test__:
- * We create docker images from the dev branches to test ozone in kubernetes and other container orchestator system
- * We provide _apache/ozone_ images for each release to make it easier the evaluation of Ozone. These images are __not__ created __for production__ usage.
+ * We create docker images from the dev branches to test ozone in kubernetes and other container orchestrator system
+ * We provide _apache/ozone_ images for each release to make it easier for evaluation of Ozone.
+ These images are __not__ created __for production__ usage.
 * __production__:
- * We document how can you create your own docker image for your production cluster.
+ * We have documentation on how you can create your own docker image for your production cluster.
 Let's check out each of the use-cases in more detail:
@@ -46,38 +47,41 @@ Ozone artifact contains example docker-compose directories to make it easier to
 From distribution:
-```
+```bash
 cd compose/ozone
 docker-compose up -d
 ```
-After a local build
+After a local build:
-```
+```bash
 cd hadoop-ozone/dist/target/ozone-*/compose
 docker-compose up -d
 ```
 These environments are very important tools to start different type of Ozone clusters at any time.
-To be sure that the compose files are up-to-date, we also provide acceptance test suites which start the cluster and check the basic behaviour.
+To be sure that the compose files are up-to-date, we also provide acceptance test suites which start
+the cluster and check the basic behaviour.
-The acceptance tests are part of the distribution, and you can find the test definitions in `./smoketest` directory.
+The acceptance tests are part of the distribution, and you can find the test definitions in `smoketest` directory.
 You can start the tests from any compose directory:
 For example:
-```
+```bash
 cd compose/ozone
 ./test.sh
 ```
 ### Implementation details
-`./compose` tests are based on the apache/hadoop-runner docker image. The image itself doesn't contain any Ozone jar file or binary just the helper scripts to start ozone.
+`compose` tests are based on the apache/hadoop-runner docker image. The image itself does not contain
+any Ozone jar file or binary just the helper scripts to start ozone.
-hadoop-runner provdes a fixed environment to run Ozone everywhere, but the ozone distribution itself is mounted from the including directory:
+hadoop-runner provdes a fixed environment to run Ozone everywhere, but the ozone distribution itself
+is mounted from the including directory:
 (Example docker-compose fragment)
@@ -91,7 +95,9 @@ hadoop-runner provdes a fixed environment to run Ozone everywhere, but the ozone
 ```
-The containers are conigured based on environment variables, but because the same environment variables should be set for each containers we maintain the list of the environment variables in a separated file:
+The containers are configured based on environment variables, but because the same environment
+variables should be set for each containers we maintain the list of the environment variables
+in a separated file:
 ```
 scm:
@@ -111,23 +117,32 @@ OZONE-SITE.XML_ozone.enabled=True
 #...
 ```
-As you can see we use naming convention. Based on the name of the environment variable, the appropariate hadoop config XML (`ozone-site.xml` in our case) will be generated by a [script](https://github.com/apache/hadoop/tree/docker-hadoop-runner-latest/scripts) which is included in the `hadoop-runner` base image.
+As you can see we use naming convention.
Based on the name of the environment variable, the
+appropriate hadoop config XML (`ozone-site.xml` in our case) will be generated by a
+[script](https://github.com/apache/hadoop/tree/docker-hadoop-runner-latest/scripts) which is
+included in the `hadoop-runner` base image.
-The [entrypoint](https://github.com/apache/hadoop/blob/docker-hadoop-runner-latest/scripts/starter.sh) of the `hadoop-runner` image contains a helper shell script which triggers this transformation and cab do additional actions (eg. initialize scm/om storage, download required keytabs, etc.) based on environment variables.
+The [entrypoint](https://github.com/apache/hadoop/blob/docker-hadoop-runner-latest/scripts/starter.sh)
+of the `hadoop-runner` image contains a helper shell script which triggers this transformation and
+can do additional actions (eg. initialize scm/om storage, download required keytabs, etc.)
+based on environment variables.
 ## Test/Staging
-The `docker-compose` based approach is recommended only for local test not for multi node cluster. To use containers on a multi-node cluster we need a Container Orchestrator like Kubernetes.
+The `docker-compose` based approach is recommended only for local test, not for multi node cluster.
+To use containers on a multi-node cluster we need a Container Orchestrator like Kubernetes.
 Kubernetes example files are included in the `kubernetes` folder.
-*Please note*: all the provided images are based the `hadoop-runner` image which contains all the required tool for testing in staging environments. For production we recommend to create your own, hardened image with your own base image.
+*Please note*: all the provided images are based the `hadoop-runner` image which contains all the
+required tool for testing in staging environments. For production we recommend to create your own,
+hardened image with your own base image.
 ### Test the release
 The release can be tested with deploying any of the example clusters:
-```
+```bash
 cd kubernetes/examples/ozone
 kubectl apply -f
 ```
@@ -139,13 +154,13 @@ Plese note that in this case the latest released container will be downloaded fr
 To test a development build you can create your own image and upload it to your own docker registry:
-```
+```bash
 mvn clean install -f pom.ozone.xml -DskipTests -Pdocker-build,docker-push -Ddocker.image=myregistry:9000/name/ozone
 ```
 The configured image will be used in all the generated kubernetes resources files (`image:` keys are adjusted during the build)
-```
+```bash
 cd kubernetes/examples/ozone
 kubectl apply -f
 ```
@@ -160,10 +175,12 @@ adjust base image, umask, security settings, user settings according to your own
 You can use the source of our development images as an example:
- * Base image: https://github.com/apache/hadoop/blob/docker-hadoop-runner-jdk11/Dockerfile
- * Docker image: https://github.com/apache/hadoop/blob/trunk/hadoop-ozone/dist/src/main/Dockerfile
+ * [Base image] (https://github.com/apache/hadoop/blob/docker-hadoop-runner-jdk11/Dockerfile)
+ * [Docker image] (https://github.com/apache/hadoop/blob/trunk/hadoop-ozone/dist/src/main/docker/Dockerfile)
- Most of the elements are optional and just helper function but to use the provided example kubernetes resources you may need the scripts from [here](https://github.com/apache/hadoop/tree/docker-hadoop-runner-jdk11/scripts)
+ Most of the elements are optional and just helper function but to use the provided example
+ kubernetes resources you may need the scripts from
+ [here](https://github.com/apache/hadoop/tree/docker-hadoop-runner-jdk11/scripts)
 * The two python scripts convert environment variables to real hadoop XML config files
 * The start.sh executes the python scripts (and other initialization) based on environment variables.
@@ -205,7 +222,7 @@ Ozone related container images and source locations:
 This is the base image used for testing Hadoop Ozone. This is a set of utilities that make it easy for us run ozone.
-
+
diff --git a/hadoop-hdds/docs/content/beyond/DockerCheatSheet.md b/hadoop-hdds/docs/content/beyond/DockerCheatSheet.md
index f481ccc8e4..f4f5492cf1 100644
--- a/hadoop-hdds/docs/content/beyond/DockerCheatSheet.md
+++ b/hadoop-hdds/docs/content/beyond/DockerCheatSheet.md
@@ -22,7 +22,9 @@ weight: 4
 limitations under the License.
 -->
-In the `compose` directory of the ozone distribution there are multiple pseudo-cluster setup which can be used to run Ozone in different way (for example with secure cluster, with tracing enabled, with prometheus etc.).
+In the `compose` directory of the ozone distribution there are multiple pseudo-cluster setup which
+can be used to run Ozone in different way (for example: secure cluster, with tracing enabled,
+with prometheus etc.).
 If the usage is not document in a specific directory the default usage is the following:
@@ -31,8 +33,7 @@ cd compose/ozone
 docker-compose up -d
 ```
-The data of the container is ephemeral and deleted together with the docker volumes. To force the deletion of existing data you can always delete all the temporary data:
-
+The data of the container is ephemeral and deleted together with the docker volumes.
 ```bash
 docker-compose down
 ```
diff --git a/hadoop-hdds/docs/content/beyond/RunningWithHDFS.md b/hadoop-hdds/docs/content/beyond/RunningWithHDFS.md
index 6294ee71e5..154be5332b 100644
--- a/hadoop-hdds/docs/content/beyond/RunningWithHDFS.md
+++ b/hadoop-hdds/docs/content/beyond/RunningWithHDFS.md
@@ -56,7 +56,7 @@ To start ozone with HDFS you should start the the following components:
 2. HDFS Datanode (from the Hadoop distribution with the plugin on the classpath from the Ozone distribution)
 3. Ozone Manager (from the Ozone distribution)
- 4. Storage Container manager (from the Ozone distribution)
+ 4.
Storage Container Manager (from the Ozone distribution)
 Please check the log of the datanode whether the HDDS/Ozone plugin is started or not. Log of datanode should contain something like this:
diff --git a/hadoop-hdds/docs/content/concept/Datanodes.md b/hadoop-hdds/docs/content/concept/Datanodes.md
index e24bf62c2e..ea63fe46b1 100644
--- a/hadoop-hdds/docs/content/concept/Datanodes.md
+++ b/hadoop-hdds/docs/content/concept/Datanodes.md
@@ -36,7 +36,7 @@ actual data streams. This is the default Storage container format.
 From Ozone's perspective, container is a protocol spec, actual storage layouts does not matter. In other words, it is trivial to extend or bring new container layouts. Hence this should be treated as a reference implementation
- of containers under Ozone.
+of containers under Ozone.
 ## Understanding Ozone Blocks and Containers
@@ -51,13 +51,13 @@ shows the logical layout out of Ozone block.
 The container ID lets the clients discover the location of the container. The authoritative information about where a container is located is with the
-Storage Container Manager or SCM. In most cases, the container location will
+Storage Container Manager (SCM). In most cases, the container location will
 be cached by Ozone Manager and will be returned along with the Ozone blocks.
 Once the client is able to locate the contianer, that is, understand which data nodes contain this container, the client will connect to the datanode
-read the data the data stream specified by container ID:Local ID. In other
+and read the data stream specified by _Container ID:Local ID_. In other
 words, the local ID serves as index into the container which describes what data stream we want to read from.
diff --git a/hadoop-hdds/docs/content/concept/Hdds.md b/hadoop-hdds/docs/content/concept/Hdds.md
index 4ea9111108..ad17b54d01 100644
--- a/hadoop-hdds/docs/content/concept/Hdds.md
+++ b/hadoop-hdds/docs/content/concept/Hdds.md
@@ -23,7 +23,7 @@ summary: Storage Container Manager or SCM is the core metadata service of Ozone
 Storage container manager provides multiple critical functions for the Ozone cluster. SCM acts as the cluster manager, Certificate authority, Block
-manager and the replica manager.
+manager and the Replica manager.
 {{}}
 SCM is in charge of creating an Ozone cluster. When an SCM is booted up via init command, SCM creates the cluster identity and root certificates needed for the SCM certificate authority. SCM manages the life cycle of a data node in the cluster.
diff --git a/hadoop-hdds/docs/content/concept/Overview.md b/hadoop-hdds/docs/content/concept/Overview.md
index 96dc311042..9e5746d846 100644
--- a/hadoop-hdds/docs/content/concept/Overview.md
+++ b/hadoop-hdds/docs/content/concept/Overview.md
@@ -56,7 +56,7 @@ Ozone.
 ![FunctionalOzone](FunctionalOzone.png)
-Any distributed system can viewed from different perspectives. One way to
+Any distributed system can be viewed from different perspectives. One way to
 look at Ozone is to imagine it as Ozone Manager as a name space service built on top of HDDS, a distributed block store.
@@ -67,8 +67,8 @@ Another way to visualize Ozone is to look at the functional layers; we have a
 We have a data storage layer, which is basically the data nodes and they are managed by SCM.
-The replication layer, provided by Ratis is used to replicate metadata (Ozone
-Manager and SCM) and also used for consistency when data is modified at the
+The replication layer, provided by Ratis is used to replicate metadata (OM and SCM)
+and also used for consistency when data is modified at the
 data nodes.
 We have a management server called Recon, that talks to all other components
diff --git a/hadoop-hdds/docs/content/concept/OzoneManager.md b/hadoop-hdds/docs/content/concept/OzoneManager.md
index 7353d71b18..1ebdd4951d 100644
--- a/hadoop-hdds/docs/content/concept/OzoneManager.md
+++ b/hadoop-hdds/docs/content/concept/OzoneManager.md
@@ -21,14 +21,14 @@ summary: Ozone Manager is the principal name space service of Ozone. OM manages
 limitations under the License.
 -->
-Ozone Manager or OM is the namespace manager for Ozone.
+Ozone Manager (OM) is the namespace manager for Ozone.
 This means that when you want to write some data, you ask Ozone
-manager for a block and Ozone Manager gives you a block and remembers that
-information. When you want to read the that file back, you need to find the
-address of the block and Ozone manager returns it you.
+Manager for a block and Ozone Manager gives you a block and remembers that
+information. When you want to read that file back, you need to find the
+address of the block and Ozone Manager returns it you.
-Ozone manager also allows users to organize keys under a volume and bucket.
+Ozone Manager also allows users to organize keys under a volume and bucket.
 Volumes and buckets are part of the namespace and managed by Ozone Manager.
 Each ozone volume is the root of an independent namespace under OM.
@@ -57,17 +57,17 @@ understood if we trace what happens during a key write and key read.
 * To write a key to Ozone, a client tells Ozone manager that it would like to write a key into a bucket that lives inside a specific volume. Once Ozone
-manager determines that you are allowed to write a key to specified bucket,
+Manager determines that you are allowed to write a key to the specified bucket,
 OM needs to allocate a block for the client to write data.
-* To allocate a block, Ozone manager sends a request to Storage Container
-Manager or SCM; SCM is the manager of data nodes.
SCM picks three data nodes
+* To allocate a block, Ozone Manager sends a request to Storage Container
+Manager (SCM); SCM is the manager of data nodes. SCM picks three data nodes
 into which client can write data. SCM allocates the block and returns the block ID to Ozone Manager.
 * Ozone manager records this block information in its metadata and returns the block and a block token (a security permission to write data to the block)
-the client.
+to the client.
 * The client uses the block token to prove that it is allowed to write data to the block and writes data to the data node.
@@ -82,6 +82,6 @@ Ozone manager.
 * Key reads are simpler, the client requests the block list from the Ozone Manager
 * Ozone manager will return the block list and block tokens which
-allows the client to read the data from nodes.
+allows the client to read the data from data nodes.
 * Client connects to the data node and presents the block token and reads the data from the data node.
diff --git a/hadoop-hdds/docs/content/interface/JavaApi.md b/hadoop-hdds/docs/content/interface/JavaApi.md
index 4b700e9974..bb18068f40 100644
--- a/hadoop-hdds/docs/content/interface/JavaApi.md
+++ b/hadoop-hdds/docs/content/interface/JavaApi.md
@@ -74,21 +74,21 @@ It is possible to pass an array of arguments to the createVolume by creating vol
 Once you have a volume, you can create buckets inside the volume.
-{{< highlight bash >}}
+{{< highlight java >}}
 // Let us create a bucket called videos.
 assets.createBucket("videos");
 OzoneBucket video = assets.getBucket("videos");
 {{< /highlight >}}
-At this point we have a usable volume and a bucket. Our volume is called assets and bucket is called videos.
+At this point we have a usable volume and a bucket. Our volume is called _assets_ and bucket is called _videos_.
 Now we can create a Key.
 ### Reading and Writing a Key
-With a bucket object the users can now read and write keys.
The following code reads a video called intro.mp4 from the local disk and stores in the video bucket that we just created.
+With a bucket object the users can now read and write keys. The following code reads a video called intro.mp4 from the local disk and stores in the _video_ bucket that we just created.
-{{< highlight bash >}}
+{{< highlight java >}}
 // read data from the file, this is a user provided function.
 byte [] videoData = readFile("intro.mp4");
diff --git a/hadoop-hdds/docs/content/interface/OzoneFS.md b/hadoop-hdds/docs/content/interface/OzoneFS.md
index 6863b46bd3..fcfef6dde3 100644
--- a/hadoop-hdds/docs/content/interface/OzoneFS.md
+++ b/hadoop-hdds/docs/content/interface/OzoneFS.md
@@ -21,7 +21,7 @@ summary: Hadoop Compatible file system allows any application that expects an HD
 limitations under the License.
 -->
-The Hadoop compatible file system interface allpws storage backends like Ozone
+The Hadoop compatible file system interface allows storage backends like Ozone
 to be easily integrated into Hadoop eco-system. Ozone file system is an Hadoop compatible file system.
@@ -36,7 +36,7 @@ ozone sh volume create /volume
 ozone sh bucket create /volume/bucket
 {{< /highlight >}}
-Once this is created, please make sure that bucket exists via the listVolume or listBucket commands.
+Once this is created, please make sure that bucket exists via the _list volume_ or _list bucket_ commands.
 Please add the following entry to the core-site.xml.
@@ -45,6 +45,10 @@ Please add the following entry to the core-site.xml.
 fs.o3fs.impl
 org.apache.hadoop.fs.ozone.OzoneFileSystem
+
+ fs.AbstractFileSystem.o3fs.impl
+ org.apache.hadoop.fs.ozone.OzFs
+
 fs.defaultFS
 o3fs://bucket.volume
diff --git a/hadoop-hdds/docs/content/interface/S3.md b/hadoop-hdds/docs/content/interface/S3.md
index dc9b451728..6a8e2d7c53 100644
--- a/hadoop-hdds/docs/content/interface/S3.md
+++ b/hadoop-hdds/docs/content/interface/S3.md
@@ -26,7 +26,7 @@ Ozone provides S3 compatible REST interface to use the object store data with an
 ## Getting started
-S3 Gateway is a separated component which provides the S3 compatible. It should be started additional to the regular Ozone components.
+S3 Gateway is a separated component which provides the S3 compatible APIs. It should be started additional to the regular Ozone components.
 You can start a docker based cluster, including the S3 gateway from the release package.
@@ -93,7 +93,7 @@ If security is not enabled, you can *use* **any** AWS_ACCESS_KEY_ID and AWS_SECR
 If security is enabled, you can get the key and the secret with the `ozone s3 getsecret` command (*kerberos based authentication is required).
-```
+```bash
 /etc/security/keytabs/testuser.keytab testuser/scm@EXAMPLE.COM
 ozone s3 getsecret
 awsAccessKey=testuser/scm@EXAMPLE.COM
@@ -103,7 +103,7 @@ awsSecret=c261b6ecabf7d37d5f9ded654b1c724adac9bd9f13e247a235e567e8296d2999
 Now, you can use the key and the secret to access the S3 endpoint:
-```
+```bash
 export AWS_ACCESS_KEY_ID=testuser/scm@EXAMPLE.COM
 export AWS_SECRET_ACCESS_KEY=c261b6ecabf7d37d5f9ded654b1c724adac9bd9f13e247a235e567e8296d2999
 aws s3api --endpoint http://localhost:9878 create-bucket --bucket bucket1
@@ -116,7 +116,7 @@ aws s3api --endpoint http://localhost:9878 create-bucket --bucket bucket1
 To show the storage location of a S3 bucket, use the `ozone s3 path ` command.
-```
+```bash
 aws s3api --endpoint-url http://localhost:9878 create-bucket --bucket=bucket1
 ozone s3 path bucket1
@@ -128,23 +128,23 @@ Ozone FileSystem Uri is : o3fs://bucket1.s3thisisakey
 ### AWS Cli
-`aws` CLI could be used with specifying the custom REST endpoint.
+`aws` CLI could be used by specifying the custom REST endpoint.
-```
+```bash
 aws s3api --endpoint http://localhost:9878 create-bucket --bucket buckettest
 ```
 Or
-```
+```bash
 aws s3 ls --endpoint http://localhost:9878 s3://buckettest
 ```
 ### S3 Fuse driver (goofys)
-Goofys is a S3 FUSE driver. It could be used to mount any Ozone bucket as posix file system:
+Goofys is a S3 FUSE driver. It could be used to mount any Ozone bucket as posix file system.
-```
+```bash
 goofys --endpoint http://localhost:9878 bucket1 /mount/bucket1
 ```
diff --git a/hadoop-hdds/docs/content/recipe/Prometheus.md b/hadoop-hdds/docs/content/recipe/Prometheus.md
index 9151b318ce..310d078567 100644
--- a/hadoop-hdds/docs/content/recipe/Prometheus.md
+++ b/hadoop-hdds/docs/content/recipe/Prometheus.md
@@ -32,28 +32,29 @@ compatible metrics endpoint where all the available hadoop metrics are published
 ## Monitoring with prometheus
-(1) To enable the Prometheus metrics endpoint you need to add a new configuration to the `ozone-site.xml` file:
+* To enable the Prometheus metrics endpoint you need to add a new configuration to the `ozone-site.xml` file.
-```
+ ```xml
 hdds.prometheus.endpoint.enabled
 true
 ```
-_Note_: for Docker compose based pseudo cluster put the `OZONE-SITE.XML_hdds.prometheus.endpoint.enabled=true` line to the `docker-config` file.
+_Note_: for Docker compose based pseudo cluster put the \
+`OZONE-SITE.XML_hdds.prometheus.endpoint.enabled=true` line to the `docker-config` file.
-(2) Restart the Ozone Manager and Storage Container Manager and check the prometheus endpoints:
+* Restart the Ozone Manager and Storage Container Manager and check the prometheus endpoints:
 * http://scm:9874/prom
 * http://ozoneManager:9876/prom
-(3) Create a prometheus.yaml configuration with the previous endpoints:
+* Create a prometheus.yaml configuration with the previous endpoints:
 ```yaml
 global:
- scrape_interval: 15s
+ scrape_interval: 15s
 scrape_configs:
   - job_name: ozone
@@ -64,20 +65,21 @@ scrape_configs:
   - "ozoneManager:9874"
 ```
-(4) Start with prometheus from the directory where you have the prometheus.yaml file:
+* Start with prometheus from the directory where you have the prometheus.yaml file:
-```
+```bash
 prometheus
 ```
-(5) Check the active targets in the prometheus web-ui:
+* Check the active targets in the prometheus web-ui:
 http://localhost:9090/targets
 ![Prometheus target page example](prometheus.png)
-(6) Check any metrics on the prometheus web ui. For example:
+* Check any metrics on the prometheus web ui.\
+For example:
 http://localhost:9090/graph?g0.range_input=1h&g0.expr=om_metrics_num_key_allocate&g0.tab=1
diff --git a/hadoop-hdds/docs/content/recipe/SparkOzoneFSK8S.md b/hadoop-hdds/docs/content/recipe/SparkOzoneFSK8S.md
index c59789bd2e..9f9d3478c9 100644
--- a/hadoop-hdds/docs/content/recipe/SparkOzoneFSK8S.md
+++ b/hadoop-hdds/docs/content/recipe/SparkOzoneFSK8S.md
@@ -46,13 +46,13 @@ You also need the following:
 First of all create a docker image with the Spark image creator. Execute the following from the Spark distribution
-```
+```bash
 ./bin/docker-image-tool.sh -r myrepo -t 2.4.0 build
 ```
 _Note_: if you use Minikube add the `-m` flag to use the docker daemon of the Minikube image:
-```
+```bash
 ./bin/docker-image-tool.sh -m -r myrepo -t 2.4.0 build
 ```
@@ -64,18 +64,22 @@ Create a new directory for customizing the created docker image.
 Copy the `ozone-site.xml` from the cluster:
-```
+```bash
 kubectl cp om-0:/opt/hadoop/etc/hadoop/ozone-site.xml .
 ```
-And create a custom `core-site.xml`:
+And create a custom `core-site.xml`.
-```
+```xml
 fs.o3fs.impl
 org.apache.hadoop.fs.ozone.BasicOzoneFileSystem
+
+ fs.AbstractFileSystem.o3fs.impl
+ org.apache.hadoop.fs.ozone.OzFs
+
 ```
@@ -98,13 +102,13 @@ ENV SPARK_EXTRA_CLASSPATH=/opt/hadoop/conf
 ADD hadoop-ozone-filesystem-lib-legacy-0.4.0-SNAPSHOT.jar /opt/hadoop-ozone-filesystem-lib-legacy.jar
 ```
-```
+```bash
 docker build -t myrepo/spark-ozone
 ```
 For remote kubernetes cluster you may need to push it:
-```
+```bash
 docker push myrepo/spark-ozone
 ```
@@ -112,7 +116,7 @@ docker push myrepo/spark-ozone
 Download any text file and put it to the `/tmp/alice.txt` first.
-```
+```bash
 kubectl port-forward s3g-0 9878:9878
 aws s3api --endpoint http://localhost:9878 create-bucket --bucket=test
 aws s3api --endpoint http://localhost:9878 put-object --bucket test --key alice.txt --body /tmp/alice.txt
@@ -130,7 +134,7 @@ Write down the ozone filesystem uri as it should be used with the spark-submit c
 ## Create service account to use
-```
+```bash
 kubectl create serviceaccount spark -n yournamespace
 kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=yournamespace:spark --namespace=yournamespace
 ```
@@ -138,13 +142,14 @@ kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount
 Execute the following spark-submit command, but change at least the following values:
- * the kubernetes master url (you can check your ~/.kube/config to find the actual value)
- * the kubernetes namespace (yournamespace in this example)
- * serviceAccountName (you can use the _spark_ value if you folllowed the previous steps)
- * container.image (in this example this is myrepo/spark-ozone.
This is pushed to the registry in the previous steps)
- * location of the input file (o3fs://...), use the string which is identified earlier with the `ozone s3 path ` command
+ * the kubernetes master url (you can check your _~/.kube/config_ to find the actual value)
+ * the kubernetes namespace (_yournamespace_ in this example)
+ * serviceAccountName (you can use the _spark_ value if you followed the previous steps)
+ * container.image (in this example this is _myrepo/spark-ozone_. This is pushed to the registry in the previous steps)
+ * location of the input file (o3fs://...), use the string which is identified earlier with the \
+ `ozone s3 path ` command
-```
+```bash
 bin/spark-submit \
 --master k8s://https://kubernetes:6443 \
 --deploy-mode cluster \
@@ -162,7 +167,8 @@ bin/spark-submit \
 Check the available `spark-word-count-...` pods with `kubectl get pod`
-Check the output of the calculation with `kubectl logs spark-word-count-1549973913699-driver`
+Check the output of the calculation with \
+`kubectl logs spark-word-count-1549973913699-driver`
 You should see the output of the wordcount job. For example:
diff --git a/hadoop-hdds/docs/content/recipe/_index.md b/hadoop-hdds/docs/content/recipe/_index.md
index ba1711a378..beaab69a2d 100644
--- a/hadoop-hdds/docs/content/recipe/_index.md
+++ b/hadoop-hdds/docs/content/recipe/_index.md
@@ -24,5 +24,6 @@ weight: 8
 {{}}
- Standard How-to documents which describe how to use Ozone with other Software. For example, How to use Ozone with Apache Spark.
+ Standard how-to documents which describe how to use Ozone with other Software.
+ For example, how to use Ozone with Apache Spark.
 {{}}
diff --git a/hadoop-hdds/docs/content/security/SecuityWithRanger.md b/hadoop-hdds/docs/content/security/SecuityWithRanger.md
index 6a3d18a4db..cbbd53ec7c 100644
--- a/hadoop-hdds/docs/content/security/SecuityWithRanger.md
+++ b/hadoop-hdds/docs/content/security/SecuityWithRanger.md
@@ -24,8 +24,8 @@ icon: user
 Apache Ranger™ is a framework to enable, monitor and manage comprehensive data
-security across the Hadoop platform. The next version(any version after 1.20)
-of Apache Ranger is aware of Ozone, and can manage an Ozone cluster.
+security across the Hadoop platform. Any version of Apache Ranger which is greater
+than 1.20 is aware of Ozone, and can manage an Ozone cluster.
 To use Apache Ranger, you must have Apache Ranger installed in your Hadoop
diff --git a/hadoop-hdds/docs/content/security/SecureOzone.md b/hadoop-hdds/docs/content/security/SecureOzone.md
index cf6668b44d..d4d836fcf7 100644
--- a/hadoop-hdds/docs/content/security/SecureOzone.md
+++ b/hadoop-hdds/docs/content/security/SecureOzone.md
@@ -31,11 +31,13 @@ secure networks where it is possible to deploy without securing the cluster.
 This release of Ozone follows that model, but soon will move to _secure by default._ Today to enable security in ozone cluster, we need to set the
-configuration **ozone.security.enabled** to true.
+configuration **ozone.security.enabled** to _true_ and **hadoop.security.authentication**
+to _kerberos_.
 Property|Value
 ----------------------|---------
-ozone.security.enabled| **true**
+ozone.security.enabled| _true_
+hadoop.security.authentication| _kerberos_
 # Tokens #
@@ -68,7 +70,7 @@ also enabled by default when security is enabled.
 Each of the service daemons that make up Ozone needs a Kerberos service
-principal name and a corresponding [kerberos key tab]({{https://web.mit.edu/kerberos/krb5-latest/doc/basic/keytab_def.html}}) file.
+principal name and a corresponding [kerberos key tab](https://web.mit.edu/kerberos/krb5-latest/doc/basic/keytab_def.html) file.
 All these settings should be made in ozone-site.xml.
@@ -77,101 +79,100 @@ All these settings should be made in ozone-site.xml.

Storage Container Manager

-
+
 SCM requires two Kerberos principals, and the corresponding key tab files for both of these principals.
-
-Property | Description
-hdds.scm.kerberos.principal | The SCM service principal. e.g. scm/HOST@REALM.COM
-hdds.scm.kerberos.keytab.file | The keytab file used by SCM daemon to login as its service principal.
-hdds.scm.http.kerberos.principal | SCM http server service principal.
-hdds.scm.http.kerberos.keytab | The keytab file used by SCM http server to login as its service principal.
+
+Property | Description
+hdds.scm.kerberos.principal | The SCM service principal. e.g. scm/_HOST@REALM.COM
+hdds.scm.kerberos.keytab.file | The keytab file used by SCM daemon to login as its service principal.
+hdds.scm.http.kerberos.principal | SCM http server service principal.
+hdds.scm.http.kerberos.keytab | The keytab file used by SCM http server to login as its service principal.

Ozone Manager

-Like SCM, OM also requires two Kerberos principals, and the
-corresponding key tab files for both of these principals.
-Property|Description
---------|-----------
-ozone.om.kerberos.principal|The OzoneManager service principal. e.g. om/_HOST@REALM .COM
-ozone.om.kerberos.keytab.file|TThe keytab file used by SCM daemon to login as its service principal.
-ozone.om.http.kerberos.principal|Ozone Manager http server service principal.
-ozone.om.http.kerberos.keytab|The keytab file used by OM http server to login as its service principal.
+Like SCM, OM also requires two Kerberos principals, and the
+corresponding key tab files for both of these principals.
+Property|Description
+--------|-----------
+ozone.om.kerberos.principal|The OzoneManager service principal. e.g. om/_HOST@REALM.COM
+ozone.om.kerberos.keytab.file|The keytab file used by OM daemon to login as its service principal.
+ozone.om.http.kerberos.principal|Ozone Manager http server service principal.
+ozone.om.http.kerberos.keytab|The keytab file used by OM http server to login as its service principal.

S3 Gateway

 S3 gateway requires one service principal and here the configuration values
-needed in the ozone-site.xml.
+needed in the ozone-site.xml.
-Property|Description
---------|-----------
-ozone.s3g.keytab.file|The keytab file used by S3 gateway
-ozone.s3g.authentication.kerberos .principal|S3 Gateway principal. e.g. HTTP/_HOST@EXAMPLE.COM
+Property|Description
+--------|-----------
+ozone.s3g.authentication.kerberos.principal|S3 Gateway principal. e.g. HTTP/_HOST@EXAMPLE.COM
+ozone.s3g.keytab.file|The keytab file used by S3 gateway
diff --git a/hadoop-hdds/docs/content/security/SecuringDatanodes.md b/hadoop-hdds/docs/content/security/SecuringDatanodes.md
index 2087dfdec8..6b7d82365c 100644
--- a/hadoop-hdds/docs/content/security/SecuringDatanodes.md
+++ b/hadoop-hdds/docs/content/security/SecuringDatanodes.md
@@ -32,10 +32,13 @@ However, we support the legacy Kerberos based Authentication to make it easy for the current set of users.The HDFS configuration keys are the following that is setup in hdfs-site.xml.
-Property|Example Value|Comment
---------|--------------|--------------
-dfs.datanode.keytab.file| /keytab/dn.service.keytab| Keytab file.
-dfs.datanode.kerberos.principal| dn/_HOST@REALM.TLD| principal name.
+Property|Description
+--------|--------------
+dfs.datanode.kerberos.principal|The datanode service principal. e.g. dn/_HOST@REALM.COM
+dfs.datanode.keytab.file| The keytab file used by datanode daemon to login as its service principal.
+hdds.datanode.http.kerberos.principal| Datanode http server service principal.
+hdds.datanode.http.kerberos.keytab| The keytab file used by datanode http server to login as its service principal.
+
 ## How a data node becomes secure.
@@ -63,7 +66,7 @@ boot time to prove the identity of the data node container (This is also work in progress.)
-Once a certificate is issued, a Data node is secure and Ozone manager can
+Once a certificate is issued, a data node is secure and Ozone manager can
 issue block tokens. If there is no data node certificates or the SCM's root certificate is not present in the data node, then data node will register itself and down load the SCM's root certificate as well get the certificates
diff --git a/hadoop-hdds/docs/content/security/SecuringS3.md b/hadoop-hdds/docs/content/security/SecuringS3.md
index 15ae67210e..1cb0c809e6 100644
--- a/hadoop-hdds/docs/content/security/SecuringS3.md
+++ b/hadoop-hdds/docs/content/security/SecuringS3.md
@@ -35,12 +35,12 @@ The user needs to `kinit` first and once they have authenticated via kerberos
 * S3 clients can get the secret access id and user secret from OzoneManager.
-```
+```bash
 ozone s3 getsecret
 ```
 This command will talk to ozone, validate the user via kerberos and generate the AWS credentials. The values will be printed out on the screen. You can
-set these values up in your .aws file for automatic access while working
+set these values up in your _.aws_ file for automatic access while working
 against Ozone S3 buckets.