HADOOP-14153. ADL module has messed doc structure. Contributed by Mingliang Liu

Mingliang Liu 2017-03-07 16:29:19 -08:00
parent a96afae125
commit 881ec4d97b


@@ -14,28 +14,15 @@
 # Hadoop Azure Data Lake Support
-* [Introduction](#Introduction)
-* [Features](#Features)
-* [Limitations](#Limitations)
-* [Usage](#Usage)
-* [Concepts](#Concepts)
-* [OAuth2 Support](#OAuth2_Support)
-* [Configuring Credentials and FileSystem](#Configuring_Credentials)
-* [Using Refresh Token](#Refresh_Token)
-* [Using Client Keys](#Client_Credential_Token)
-* [Protecting the Credentials with Credential Providers](#Credential_Provider)
-* [Enabling ADL Filesystem](#Enabling_ADL)
-* [Accessing `adl` URLs](#Accessing_adl_URLs)
-* [User/Group Representation](#OIDtoUPNConfiguration)
-* [Testing the `hadoop-azure` Module](#Testing_the_hadoop-azure_Module)
+<!-- MACRO{toc|fromDepth=1|toDepth=3} -->
-## <a name="Introduction" />Introduction
+## Introduction
 The `hadoop-azure-datalake` module provides support for integration with the
 [Azure Data Lake Store](https://azure.microsoft.com/en-in/documentation/services/data-lake-store/).
 This support comes via the JAR file `azure-datalake-store.jar`.
-## <a name="Features" />Features
+## Features
 * Read and write data stored in an Azure Data Lake Storage account.
 * Reference file system paths using URLs using the `adl` scheme for Secure Webhdfs i.e. SSL
@@ -46,7 +33,7 @@ This support comes via the JAR file `azure-datalake-store.jar`.
 * API `setOwner()`, `setAcl`, `removeAclEntries()`, `modifyAclEntries()` accepts UPN or OID
 (Object ID) as user and group names.
-## <a name="Limitations" />Limitations
+## Limitations
 Partial or no support for the following operations :
@@ -62,9 +49,9 @@ Partial or no support for the following operations :
 * User and group information returned as `listStatus()` and `getFileStatus()` is
 in the form of the GUID associated in Azure Active Directory.
-## <a name="Usage" />Usage
-### <a name="Concepts" />Concepts
+## Usage
+### Concepts
 Azure Data Lake Storage access path syntax is:
 ```
@@ -74,7 +61,7 @@ adl://<Account Name>.azuredatalakestore.net/
 For details on using the store, see
 [**Get started with Azure Data Lake Store using the Azure Portal**](https://azure.microsoft.com/en-in/documentation/articles/data-lake-store-get-started-portal/)
-### <a name="#OAuth2_Support" />OAuth2 Support
+#### OAuth2 Support
 Usage of Azure Data Lake Storage requires an OAuth2 bearer token to be present as
 part of the HTTPS header as per the OAuth2 specification.
@@ -86,11 +73,11 @@ and identity management service. See [*What is ActiveDirectory*](https://azure.m
 Following sections describes theOAuth2 configuration in `core-site.xml`.
-#### <a name="Configuring_Credentials" />Configuring Credentials & FileSystem
+### Configuring Credentials and FileSystem
 Credentials can be configured using either a refresh token (associated with a user),
 or a client credential (analogous to a service principal).
-#### <a name="Refresh_Token" />Using Refresh Tokens
+#### Using Refresh Tokens
 Add the following properties to the cluster's `core-site.xml`
@@ -119,9 +106,9 @@ service associated with the client id. See [*Active Directory Library For Java*]
 ```
-### <a name="Client_Credential_Token" />Using Client Keys
-#### Generating the Service Principal
+#### Using Client Keys
+##### Generating the Service Principal
 1. Go to [the portal](https://portal.azure.com)
 2. Under "Browse", look for Active Directory and click on it.
@@ -135,13 +122,13 @@ service associated with the client id. See [*Active Directory Library For Java*]
 - The token endpoint (select "View endpoints" at the bottom of the page and copy/paste the OAuth2 .0 Token Endpoint value)
 - Resource: Always https://management.core.windows.net/ , for all customers
-#### Adding the service principal to your ADL Account
+##### Adding the service principal to your ADL Account
 1. Go to the portal again, and open your ADL account
 2. Select Users under Settings
 3. Add your user name you created in Step 6 above (note that it does not show up in the list, but will be found if you searched for the name)
 4. Add "Owner" role
-### Configure core-site.xml
+##### Configure core-site.xml
 Add the following properties to your `core-site.xml`
 ```xml
@@ -161,7 +148,7 @@ Add the following properties to your `core-site.xml`
 </property>
 ```
-### <a name="Credential_Provider" />Protecting the Credentials with Credential Providers
+#### Protecting the Credentials with Credential Providers
 In many Hadoop clusters, the `core-site.xml` file is world-readable. To protect
 these credentials, it is recommended that you use the
@@ -171,7 +158,7 @@ All ADLS credential properties can be protected by credential providers.
 For additional reading on the credential provider API, see
 [Credential Provider API](../hadoop-project-dist/hadoop-common/CredentialProviderAPI.html).
-#### Provisioning
+##### Provisioning
 ```bash
 hadoop credential create dfs.adls.oauth2.refresh.token -value 123
@@ -180,7 +167,7 @@ hadoop credential create dfs.adls.oauth2.credential -value 123
 -provider localjceks://file/home/foo/adls.jceks
 ```
-#### Configuring core-site.xml or command line property
+##### Configuring core-site.xml or command line property
 ```xml
 <property>
@@ -190,7 +177,7 @@ hadoop credential create dfs.adls.oauth2.credential -value 123
 </property>
 ```
-#### Running DistCp
+##### Running DistCp
 ```bash
 hadoop distcp
@@ -203,7 +190,7 @@ NOTE: You may optionally add the provider path property to the `distcp` command
 line instead of added job specific configuration to a generic `core-site.xml`.
 The square brackets above illustrate this capability.`
-### <a name="Accessing_adl_URLs" />Accessing adl URLs
+### Accessing adl URLs
 After credentials are configured in `core-site.xml`, any Hadoop component may
 reference files in that Azure Data Lake Storage account by using URLs of the following
@@ -230,7 +217,7 @@ hadoop fs -put testFile adl://yourcontainer.azuredatalakestore.net/testDir/testF
 hadoop fs -cat adl://yourcontainer.azuredatalakestore.net/testDir/testFile
 test file content
 ```
-### <a name="OIDtoUPNConfiguration" />User/Group Representation
+### User/Group Representation
 The `hadoop-azure-datalake` module provides support for configuring how
 User/Group information is represented during
@@ -254,7 +241,7 @@ Add the following properties to `core-site.xml`
 </description>
 </property>
 ```
-## <a name="Testing_the_hadoop-azure_Module" />Testing the azure-datalake-store Module
+## Testing the azure-datalake-store Module
 The `hadoop-azure` module includes a full suite of unit tests.
 Most of the tests will run without additional configuration by running mvn test.
 This includes tests against mocked storage, which is an in-memory emulation of Azure Data Lake Storage.
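The documentation above gives the `adl` access path syntax as `adl://<Account Name>.azuredatalakestore.net/`. A URL can be checked against that form before it is passed to Hadoop. The following is a minimal Java sketch that relies only on the host suffix in that syntax. The class `AdlUriCheck` and its `accountName` method are hypothetical illustrations and are not part of the `hadoop-azure-datalake` module.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

/**
 * Hypothetical helper, not part of hadoop-azure-datalake: checks a URL against
 * the documented form adl://<Account Name>.azuredatalakestore.net/ and
 * returns the account name.
 */
public final class AdlUriCheck {
    // Host suffix taken from the documented access path syntax.
    private static final String HOST_SUFFIX = ".azuredatalakestore.net";

    private AdlUriCheck() {
    }

    /** Returns the account name, or throws IllegalArgumentException on mismatch. */
    public static String accountName(String url) {
        final URI uri;
        try {
            uri = new URI(url);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a valid URI: " + url, e);
        }
        // URI schemes are case-insensitive, so compare ignoring case.
        if (!"adl".equalsIgnoreCase(uri.getScheme())) {
            throw new IllegalArgumentException("Expected the adl scheme: " + url);
        }
        // getHost() is null when the authority is missing or not a valid host name.
        String host = uri.getHost();
        if (host == null) {
            throw new IllegalArgumentException("Missing host: " + url);
        }
        // DNS host names are case-insensitive; normalize before matching the suffix.
        host = host.toLowerCase(Locale.ROOT);
        if (!host.endsWith(HOST_SUFFIX) || host.length() == HOST_SUFFIX.length()) {
            throw new IllegalArgumentException("Unexpected host: " + url);
        }
        return host.substring(0, host.length() - HOST_SUFFIX.length());
    }

    public static void main(String[] args) {
        // Example URL from the "Accessing adl URLs" section; prints "yourcontainer".
        System.out.println(accountName(
                "adl://yourcontainer.azuredatalakestore.net/testDir/testFile"));
    }
}
```

The scheme and host are compared case-insensitively because URI schemes and DNS host names are case-insensitive. The helper rejects any URL whose host does not end in `.azuredatalakestore.net` and does not try to infer an account name from it.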