Go to file
Thomas Marquardt b214bbd2d9
HADOOP-16916: ABFS: Delegation SAS generator for integration with Ranger
Contributed by Thomas Marquardt.

DETAILS:

Previously we had a SASGenerator class which generated Service SAS, but we need to add DelegationSASGenerator.
I separated SASGenerator into a base class and two subclasses ServiceSASGenerator and DelegationSASGenreator.  The
code in ServiceSASGenerator is copied from SASGenerator but the DelegationSASGenrator code is new.  The
DelegationSASGenerator code demonstrates how to use Delegation SAS with minimal permissions, as would be used
by an authorization service such as Apache Ranger.  Adding this to the tests helps us lock in this behavior.

Added a MockDelegationSASTokenProvider for testing User Delegation SAS.

Fixed the ITestAzureBlobFileSystemCheckAccess tests to assume oauth client ID so that they are ignored when that
is not configured.

To improve performance, AbfsInputStream/AbfsOutputStream re-use SAS tokens until the expiry is within 120 seconds.
After this a new SAS will be requested.  The default period of 120 seconds can be changed using the configuration
setting "fs.azure.sas.token.renew.period.for.streams".

The SASTokenProvider operation names were updated to correspond better with the ADLS Gen2 REST API, since these
operations must be provided tokens with appropriate SAS parameters to succeed.

Support for the version 2.0 AAD authentication endpoint was added to AzureADAuthenticator.

The getFileStatus method was mistakenly calling the ADLS Gen2 Get Properties API which requires read permission
while the getFileStatus call only requires execute permission.  ADLS Gen2 Get Status API is supposed to be used
for this purpose, so the underlying AbfsClient.getPathStatus API was updated with a includeProperties
parameter which is set to false for getFileStatus and true for getXAttr.

Added SASTokenProvider support for delete recursive.

Fixed bugs in AzureBlobFileSystem where public methods were not validating the Path by calling makeQualified.  This is
necessary to avoid passing null paths and to convert relative paths into absolute paths.

Canonicalized the path used for root path internally so that root path can be used with SAS tokens, which requires
that the path in the URL and the path in the SAS token match.  Internally the code was using
"//" instead of "/" for the root path, sometimes.  Also related to this, the AzureBlobFileSystemStore.getRelativePath
API was updated so that we no longer remove and then add back a preceding forward / to paths.

To run ITestAzureBlobFileSystemDelegationSAS tests follow the instructions in testing_azure.md under the heading
"To run Delegation SAS test cases".  You also need to set "fs.azure.enable.check.access" to true.

TEST RESULTS:

namespace.enabled=true
auth.type=SharedKey
-------------------
$mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=8 clean verify
Tests run: 63, Failures: 0, Errors: 0, Skipped: 0
Tests run: 432, Failures: 0, Errors: 0, Skipped: 41
Tests run: 206, Failures: 0, Errors: 0, Skipped: 24

namespace.enabled=false
auth.type=SharedKey
-------------------
$mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=8 clean verify
Tests run: 63, Failures: 0, Errors: 0, Skipped: 0
Tests run: 432, Failures: 0, Errors: 0, Skipped: 244
Tests run: 206, Failures: 0, Errors: 0, Skipped: 24

namespace.enabled=true
auth.type=SharedKey
sas.token.provider.type=MockDelegationSASTokenProvider
enable.check.access=true
-------------------
$mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=8 clean verify
Tests run: 63, Failures: 0, Errors: 0, Skipped: 0
Tests run: 432, Failures: 0, Errors: 0, Skipped: 33
Tests run: 206, Failures: 0, Errors: 0, Skipped: 24

namespace.enabled=true
auth.type=OAuth
-------------------
$mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=8 clean verify
Tests run: 63, Failures: 0, Errors: 0, Skipped: 0
Tests run: 432, Failures: 0, Errors: 1, Skipped: 74
Tests run: 206, Failures: 0, Errors: 0, Skipped: 140
2020-05-12 18:35:38 +00:00
.github HADOOP-15184. Add GitHub pull request template. (#1419) 2019-09-11 11:10:11 +09:00
dev-support HADOOP-16054. Update Dockerfile to use Bionic (#1966) 2020-04-26 02:54:45 +09:00
hadoop-assemblies HADOOP-16985. Handle release package related issues (#1957) 2020-04-15 20:38:19 +05:30
hadoop-build-tools Preparing for 3.4.0 development 2020-03-29 23:24:25 +05:30
hadoop-client-modules Preparing for 3.4.0 development 2020-03-29 23:24:25 +05:30
hadoop-cloud-storage-project HADOOP-17007. hadoop-cos fails to build. Contributed by Yang Yu. 2020-04-26 12:46:53 +05:30
hadoop-common-project HDFS-15300. RBF: updateActiveNamenode() is invalid when RPC address is IP. Contributed by xuzq. 2020-05-12 21:54:54 +05:30
hadoop-dist Preparing for 3.4.0 development 2020-03-29 23:24:25 +05:30
hadoop-hdfs-project HDFS-15300. RBF: updateActiveNamenode() is invalid when RPC address is IP. Contributed by xuzq. 2020-05-12 21:54:54 +05:30
hadoop-mapreduce-project HADOOP-17035. fixed typos (timeout, interruped) (#2007) 2020-05-12 10:50:04 -05:00
hadoop-maven-plugins Preparing for 3.4.0 development 2020-03-29 23:24:25 +05:30
hadoop-minicluster HDFS-15331. Remove invalid exclusions that minicluster dependency on HDFS (#1996). Contributed by Wanqiang Ji 2020-05-06 02:06:38 +05:30
hadoop-project HADOOP-17033. Update commons-codec from 1.11 to 1.14. (#2000) 2020-05-11 08:41:14 -07:00
hadoop-project-dist Preparing for 3.4.0 development 2020-03-29 23:24:25 +05:30
hadoop-tools HADOOP-16916: ABFS: Delegation SAS generator for integration with Ranger 2020-05-12 18:35:38 +00:00
hadoop-yarn-project YARN-10260. Allow transitioning queue from DRAINING to RUNNING state. Contributed by Bilwa S T 2020-05-12 10:48:54 -07:00
licenses HADOOP-15958. Revisiting LICENSE and NOTICE files. 2019-08-27 13:47:12 +09:00
licenses-binary HADOOP-15993. Upgrade Kafka to 2.4.0 in hadoop-kafka module. (#1796) 2020-01-09 16:24:58 +09:00
.gitattributes HADOOP-13598. Add eol=lf for unix format files in .gitattributes. Contributed by Yiqun Lin. 2016-09-14 11:14:31 +09:00
.gitignore HADOOP-16952. Add .diff to gitignore. Contributed by Ayush Saxena. 2020-04-03 13:57:02 +05:30
BUILDING.txt HADOOP-16856. cmake is missing in the CentOS 8 section of BUILDING.txt. (#1841) 2020-02-12 21:17:33 +09:00
Jenkinsfile HADOOP-16944. Use Yetus 0.12.0 in GitHub PR (#1917) 2020-04-19 02:43:44 +09:00
LICENSE-binary YARN-10074. Update netty to 4.1.42Final in yarn-csi. Contributed by Wei-Chiu Chuang. (#1807) 2020-02-25 13:47:52 +09:00
LICENSE.txt YARN-9561. Add C changes for the new RuncContainerRuntime. Contributed by Eric Badger 2019-12-09 01:25:10 +00:00
NOTICE-binary HADOOP-15958. Revisiting LICENSE and NOTICE files. 2019-08-27 13:47:12 +09:00
NOTICE.txt HADOOP-15958. Revisiting LICENSE and NOTICE files. 2019-08-27 13:47:12 +09:00
pom.xml upate the hadoop.version property in the root pom.xml and hadoop.assemblies.version in hadoop-project/pom.xml (see HADOOP-15369) 2020-03-29 23:44:20 +05:30
README.txt HADOOP-15958. Revisiting LICENSE and NOTICE files. 2019-08-27 13:47:12 +09:00
start-build-env.sh HADOOP-16849. start-build-env.sh behaves incorrectly when username is numeric only. Contributed by Jihyun Cho. 2020-02-12 14:06:23 +09:00

For the latest information about Hadoop, please visit our website at:

   http://hadoop.apache.org/

and our wiki, at:

   https://cwiki.apache.org/confluence/display/HADOOP/