hadoop

Author	SHA1	Message	Date
bilaharith	b8454a4b10	HADOOP-17311. ABFS: Logs should redact SAS signature (#2422 ) Contributed by bilaharith. Change-Id: Iff0ed4303ac5ce41b62bfda8150ee983dafa40be	2020-11-25 14:33:29 +00:00
Mukund Thakur	9dd74141a6	HADOOP-17323. S3A getFileStatus("/") to skip IO (#2479 ) Contributed by Mukund Thakur. Change-Id: I1709ad72b829999b6dd324f0755b51bc38918d30	2020-11-24 11:34:19 +00:00
Steve Loughran	38cc47d308	HADOOP-17332. S3A MarkerTool -min and -max are inverted. (#2425 ) This patch * fixes the inversion * adds a precondition check * if the commands are supplied inverted, swaps them with a warning. This is to stop breaking any tests written to cope with the existing behavior. Contributed by Steve Loughran Change-Id: I15c40863f0db0675c7d60db477cb3bf1693cae49	2020-11-23 21:49:33 +00:00
Steve Loughran	7ca539bc1b	HADOOP-17325. WASB Test Failures Contributed by Ayush Saxena and Steve Loughran Change-Id: I4bb76815bc1d11d1804dc67bafde68b6a995b974	2020-11-23 17:25:58 +00:00
Steve Loughran	e4bc64cce0	HADOOP-17343. Upgrade AWS SDK to 1.11.901 (#2468 ) Contributed by Steve Loughran.	2020-11-23 14:09:14 +00:00
Jungtaek Lim	401cadbac5	HADOOP-17388. AbstractS3ATokenIdentifier to issue date in UTC. (#2477 ) Followup to HADOOP-17379. Contributed by Jungtaek Lim. Change-Id: I7b2fce36028d297c1e095499691a08caba92d9fd	2020-11-20 10:56:57 +00:00
Jim Brennan	e24a6b550e	HADOOP-17367. Add InetAddress api to ProxyUsers.authorize (#2449 ). Contributed by Daryn Sharp and Ahmed Hussein	2020-11-19 21:26:47 +00:00
Steve Loughran	4687c25389	HADOOP-17244. S3A directory delete tombstones dir markers prematurely. (#2310 ) This fixes the S3Guard/Directory Marker Retention integration so that when fs.s3a.directory.marker.retention=keep, failures during multipart delete are handled correctly, as are incremental deletes during directory tree operations. In both cases, when a directory marker with children is deleted from S3, the directory entry in S3Guard is not deleted, because it is still critical to representing the structure of the store. Contributed by Steve Loughran. Change-Id: I4ca133a23ea582cd42ec35dbf2dc85b286297d2f	2020-11-18 12:30:43 +00:00
Steve Loughran	4bb9d593da	HADOOP-17261. s3a rename() needs s3:deleteObjectVersion permission (#2303 ) Contributed by Steve Loughran. Change-Id: I8e89a402a24bd9fb958e0fa93d1a28191093851d	2020-11-18 12:20:12 +00:00
Jungtaek Lim	22039a14ff	HADOOP-17379. AbstractS3ATokenIdentifier to set issue date == now. (#2466 ) Unless you explicitly set it, the issue date of a delegation token identifier is 0, which confuses spark renewal (SPARK-33440). This patch makes sure that all S3A DT identifiers have the current time as issue date, fixing the problem as far as S3A tokens are concerned. Contributed by Jungtaek Lim. Change-Id: Ic80ac7895612a1aa669459c73a78a9c17ecf0c0d	2020-11-17 14:56:58 +00:00
Doroszlai, Attila	bf2ff35a04	HADOOP-17376. ITestS3AContractRename failing against stricter tests. (#2462 ) Contributed by Attila Doroszlai. Change-Id: Ie15624ec07b1c5e34ca7fde0a72a54431d79e746	2020-11-16 11:26:06 +00:00
Eric E Payne	2473e8b711	YARN-10475: Scale RM-NM heartbeat interval based on node utilization. Contributed by Jim Brennan (Jim_Brennan).	2020-11-02 17:16:28 +00:00
Anoop Sam John	8312f230eb	HADOOP-17308. WASB PageBlobOutputStream.flush succeeds even when flush to storage fails (#2392 ) Contributed by Anoop Sam John.	2020-10-26 13:31:53 +00:00
Sneha Vijayarajan	d5b4d04b0d	HADOOP-17301. ABFS: read-ahead error reporting breaks buffer management (#2369 ) Fixes read-ahead buffer management issues introduced by HADOOP-16852, "ABFS: Send error back to client for Read Ahead request failure". Contributed by Sneha Vijayarajan	2020-10-14 22:29:13 +00:00
Sneha Vijayarajan	da5db6a5a6	HADOOP-17279: ABFS: testNegativeScenariosForCreateOverwriteDisabled fails for non-HNS account. Contributed by Sneha Vijayarajan Testing: namespace.enabled=false auth.type=SharedKey $mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=8 clean verify Tests run: 87, Failures: 0, Errors: 0, Skipped: 0 Tests run: 457, Failures: 0, Errors: 0, Skipped: 246 Tests run: 207, Failures: 0, Errors: 0, Skipped: 24 namespace.enabled=true auth.type=SharedKey $mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=8 clean verify Tests run: 87, Failures: 0, Errors: 0, Skipped: 0 Tests run: 457, Failures: 0, Errors: 0, Skipped: 33 Tests run: 207, Failures: 0, Errors: 0, Skipped: 24 namespace.enabled=true auth.type=OAuth $mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=8 clean verify Tests run: 87, Failures: 0, Errors: 0, Skipped: 0 Tests run: 457, Failures: 0, Errors: 0, Skipped: 74 Tests run: 207, Failures: 0, Errors: 0, Skipped: 140	2020-10-14 22:29:13 +00:00
Sneha Vijayarajan	d166420302	HADOOP-17215: Support for conditional overwrite. Contributed by Sneha Vijayarajan DETAILS: This change adds config key "fs.azure.enable.conditional.create.overwrite" with a default of true. When enabled, if create(path, overwrite: true) is invoked and the file exists, the ABFS driver will first obtain its etag and then attempt to overwrite the file on the condition that the etag matches. The purpose of this is to mitigate the non-idempotency of this method. Specifically, in the event of a network error or similar, the client will retry and this can result in the file being created more than once which may result in data loss. In essense this is like a poor man's file handle, and will be addressed more thoroughly in the future when support for lease is added to ABFS. TEST RESULTS: namespace.enabled=true auth.type=SharedKey ------------------- $mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=8 clean verify Tests run: 87, Failures: 0, Errors: 0, Skipped: 0 Tests run: 457, Failures: 0, Errors: 0, Skipped: 42 Tests run: 207, Failures: 0, Errors: 0, Skipped: 24 namespace.enabled=true auth.type=OAuth ------------------- $mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=8 clean verify Tests run: 87, Failures: 0, Errors: 0, Skipped: 0 Tests run: 457, Failures: 0, Errors: 0, Skipped: 74 Tests run: 207, Failures: 0, Errors: 0, Skipped: 140	2020-10-14 22:29:13 +00:00
bilaharith	f208da286c	HADOOP-17166. ABFS: configure output stream thread pool (#2179 ) Adds the options to control the size of the per-output-stream threadpool when writing data through the abfs connector * fs.azure.write.max.concurrent.requests * fs.azure.write.max.requests.to.queue Contributed by Bilahari T H	2020-10-14 22:29:13 +00:00
bilaharith	cc7350302f	HADOOP-16915. ABFS: Ignoring the test ITestAzureBlobFileSystemRandomRead.testRandomReadPerformance - Contributed by Bilahari T H	2020-10-14 22:29:13 +00:00
Sneha Vijayarajan	4072323de4	Upgrade store REST API version to 2019-12-12 - Contributed by Sneha Vijayarajan	2020-10-14 22:29:13 +00:00
bilaharith	e481d0108a	HADOOP-17149. ABFS: Fixing the testcase ITestGetNameSpaceEnabled - Contributed by Bilahari T H	2020-10-14 22:29:13 +00:00
bilaharith	f73c90f0b0	HADOOP-17163. ABFS: Adding debug log for rename failures - Contributed by Bilahari T H	2020-10-14 22:29:13 +00:00
bilaharith	fbf151ef6f	HADOOP-17137. ABFS: Makes the test cases in ITestAbfsNetworkStatistics agnostic - Contributed by Bilahari T H	2020-10-14 22:29:13 +00:00
Dongjoon Hyun	5032f8abba	HADOOP-17258. Magic S3Guard Committer to overwrite existing pendingSet file on task commit (#2371 ) Contributed by Dongjoon Hyun and Steve Loughran Change-Id: Ibaf8082e60eff5298ff4e6513edc386c5bae0274	2020-10-12 13:42:08 +01:00
Steve Loughran	963793dd48	HADOOP-17293. S3A to always probe S3 in S3A getFileStatus on non-auth paths This reverts changes in HADOOP-13230 to use S3Guard TTL in choosing when to issue a HEAD request; fixing tests to compensate. New org.apache.hadoop.fs.s3a.performance.OperationCost cost, S3GUARD_NONAUTH_FILE_STATUS_PROBE for use in cost tests. Contributed by Steve Loughran. Change-Id: I418d55d2d2562a48b2a14ec7dee369db49b4e29e	2020-10-08 15:38:32 +01:00
Mukund Thakur	475dba1ddf	HADOOP-17281 Implement FileSystem.listStatusIterator() in S3AFileSystem (#2354 ) Contains HADOOP-17300: FileSystem.DirListingIterator.next() call should return NoSuchElementException Contributed by Mukund Thakur Change-Id: I4e7e5c6e295525db9e2de6f416f32bbb81e146d3	2020-10-07 14:00:23 +01:00
bilaharith	d80dfad900	HADOOP-17183. ABFS: Enabling checkaccess on ABFS (#2331 ) Contributed by Bilahari TH Change-Id: If4224697deed733d6db44145994cdd85547c27d1	2020-10-01 21:29:48 +01:00
Mukund Thakur	7e642ec5a3	HADOOP-17023. Tune S3AFileSystem.listStatus() (#2257 ) S3AFileSystem.listStatus() is optimized for invocations where the path supplied is a non-empty directory. The number of S3 requests is significantly reduced, saving time, money, and reducing the risk of S3 throttling. Contributed by Mukund Thakur. Change-Id: I7cc5f87aa16a4819e245e0fbd2aad226bd500f3f	2020-09-21 17:30:15 +01:00
Steve Loughran	aa80bcb1ec	Revert "HADOOP-17244. S3A directory delete tombstones dir markers prematurely. (#2280 )" This reverts commit `0c82eb0324`. Change-Id: I6bd100d9de19660b0f28ee0ab16faf747d6d9f05	2020-09-11 18:07:05 +01:00
Steve Loughran	0c82eb0324	HADOOP-17244. S3A directory delete tombstones dir markers prematurely. (#2280 ) This changes directory tree deletion so that only files are incrementally deleted from S3Guard after the objects are deleted; the directories are left alone until metadataStore.deleteSubtree(path) is invoke. This avoids directory tombstones being added above files/child directories, which stop the treewalk and delete phase from working. Also: * Callback to delete objects splits files and dirs so that any problems deleting the dirs doesn't trigger s3guard updates * New statistic to measure #of objects deleted, alongside request count. * Callback listFilesAndEmptyDirectories renamed listFilesAndDirectoryMarkers to clarify behavior. * Test enhancements to replicate the failure and verify the fix Contributed by Steve Loughran Change-Id: I0e6ea2c35e487267033b1664228c8837279a35c7	2020-09-10 17:29:33 +01:00
Mehakmeet Singh	ccceec8af0	HADOOP-17158. Test timeout for ITestAbfsInputStreamStatistics#testReadAheadCounters (#2272 ) Contributed by: Mehakmeet Singh. Change-Id: I7ebfa5cd1b5d25f7a750f0c645d7d93c81e89240	2020-09-08 14:02:28 +01:00
Mehakmeet Singh	28f1ded9fe	HADOOP-17113. Adding ReadAhead Counters in ABFS (#2154 ) Contributed by Mehakmeet Singh Change-Id: I6bbd8165385a9267ed64831bb1efa18b6554feb1	2020-09-08 14:02:02 +01:00
Mehakmeet Singh	7970710418	HADOOP-17229. No update of bytes received counter value after response failure occurs in ABFS (#2264 ) Contributed by Mehakmeet Singh Change-Id: Ia9ad1b87a460b10d27486bd00ee67c3cedd2b5b5	2020-09-08 13:26:24 +01:00
Mukund Thakur	5236c96ead	HADOOP-17167 ITestS3AEncryptionWithDefaultS3Settings failing (#2187 ) Now skips ITestS3AEncryptionWithDefaultS3Settings.testEncryptionOverRename when server side encryption is not set to sse:kms Contributed by Mukund Thakur Change-Id: Ifd83d353e9c7c6f7e1195a2c2f138d85cf876bb1	2020-09-04 15:00:30 +01:00
Steve Loughran	38354006f8	HADOOP-17227. S3A Marker Tool tuning (#2254 ) Contributed by Steve Loughran.	2020-09-04 14:58:54 +01:00
Mehakmeet Singh	f6e1ed4f6b	HADOOP-17194. Adding Context class for AbfsClient in ABFS (#2216 ) Contributed by Mehakmeet Singh. Change-Id: I120c9a068d758d8e5d071c878a3b7fbeb95e4de6	2020-08-27 11:28:37 +01:00
Mukund Thakur	0840c0c1f3	HADOOP-17074. S3A Listing to be fully asynchronous. (#2207 ) Contributed by Mukund Thakur. Change-Id: I1b0574a0c9ebc0805f285dd5280a00e5add081f1	2020-08-25 11:30:42 +01:00
swamirishi	ba4f7fb332	HADOOP-17122: Preserving Directory Attributes in DistCp with Atomic Copy (#2133 ) Contributed by Swaminathan Balachandran Change-Id: I86f956dd4ab0b278d923fe7b70037e6b929a8aa1	2020-08-22 18:51:10 +01:00
Steve Loughran	49f8ae965e	HADOOP-13230. S3A to optionally retain directory markers. This adds an option to disable "empty directory" marker deletion, so avoid throttling and other scale problems. This feature is not backwards compatible. Consult the documentation and use with care. Contributed by Steve Loughran. Change-Id: I69a61e7584dc36e485d5e39ff25b1e3e559a1958	2020-08-15 20:19:49 +01:00
Mukund Thakur	571737f4ac	HADOOP-17192. ITestS3AHugeFilesSSECDiskBlock failing (#2221 ) Contributed by Mukund Thakur	2020-08-13 14:33:27 +01:00
Ayush Saxena	2943e6650f	HDFS-15514. Remove useless dfs.webhdfs.enabled. Contributed by Fei Hui.	2020-08-07 22:20:42 +05:30
Mukund Thakur	251d2d1fa5	HADOOP-17131. Refactor S3A Listing code for better isolation. (#2148 ) Contributed by Mukund Thakur. Change-Id: I79160b236a92fdd67565a4b4974f1862e600c210	2020-08-04 17:13:06 +01:00
Sneha Vijayarajan	18ca80331c	Hadoop 17132. ABFS: Fix Rename and Delete Idempotency check trigger - Contributed by Sneha Vijayarajan	2020-07-25 13:13:18 +00:00
ishaniahuja	f24e2ec487	HADOOP-17058. ABFS: Support for AppendBlob in Hadoop ABFS Driver - Contributed by Ishani Ahuja	2020-07-25 13:12:32 +00:00
Mehakmeet Singh	7c9b459786	HADOOP-16961. ABFS: Adding metrics to AbfsInputStream (#2076 ) Contributed by Mehakmeet Singh.	2020-07-25 13:12:09 +00:00
Mehakmeet Singh	bbd3278d09	HADOOP-17065. Add Network Counters to ABFS (#2056 ) Contributed by Mehakmeet Singh.	2020-07-25 13:11:34 +00:00
Karthik Amarnath	8b7e77443d	HDFS-15168: ABFS enhancement to translate AAD to Linux identities. (#1978 )	2020-07-25 13:10:39 +00:00
Sneha Vijayarajan	903935da0f	HADOOP-17053. ABFS: Fix Account-specific OAuth config setting parsing Contributed by Sneha Vijayarajan	2020-07-25 13:10:30 +00:00
Sneha Vijayarajan	869a68b81e	HADOOP-16852: Report read-ahead error back Contributed by Sneha Vijayarajan	2020-07-25 13:10:19 +00:00
Sneha Vijayarajan	27b20f9689	HADOOP-17054. ABFS: Fix test AbfsClient authentication instance Contributed by Sneha Vijayarajan	2020-07-25 13:09:26 +00:00
Sneha Vijayarajan	eed06b46eb	Hadoop-17015. ABFS: Handling Rename and Delete idempotency Contributed by Sneha Vijayarajan.	2020-07-25 13:08:01 +00:00
bilaharith	1ae72d2438	HADOOP-17092. ABFS: Making AzureADAuthenticator.getToken() throw HttpException - Contributed by Bilahari T H Change-Id: Id9576d9509faaf057bf419ccb1879ac0cef7a07b	2020-07-22 18:26:36 +01:00
Ayush Saxena	e3b8d4eb05	HADOOP-17100. Replace Guava Supplier with Java8+ Supplier in Hadoop. Contributed by Ahmed Hussein.	2020-07-22 18:21:14 +05:30
Steve Loughran	5aa9396a58	HADOOP-17107. hadoop-azure parallel tests not working on recent JDKs (#2118 ) Contributed by Steve Loughran. Change-Id: I972264aed36f384b7ae23e214326ef7870261cf5	2020-07-20 10:54:22 +01:00
bilaharith	e01852181a	HADOOP-16682. ABFS: Removing unnecessary toString() invocations - Contributed by Bilahari T H Change-Id: Id55495b44d81533d1d3654de2553c709f505f7eb	2020-07-20 10:53:59 +01:00
Mehakmeet Singh	0d88ed2794	HADOOP-17129. Validating storage keys in ABFS correctly (#2141 ) Contributed by Mehakmeet Singh Change-Id: I8016ee2f9ffbc86ea867f4a3d960b134e507d099	2020-07-16 18:11:52 +01:00
Mukund Thakur	8b601ad7e6	HADOOP-17022. Tune S3AFileSystem.listFiles() API. Contributed by Mukund Thakur. Change-Id: I17f5cfdcd25670ce3ddb62c13378c7e2dc06ba52	2020-07-14 15:28:27 +01:00
Anoop Sam John	cac2fc1f58	HADOOP-16998. WASB : NativeAzureFsOutputStream#close() throwing IllegalArgumentException (#2073 ) Contributed by Anoop Sam John.	2020-07-14 14:08:46 +01:00
jimmy-zuber-amzn	79fc58def3	HADOOP-17105. S3AFS - Do not attempt to resolve symlinks in globStatus (#2113 ) Contributed by Jimmy Zuber. Change-Id: I2f247c2d2ab4f38214073e55f5cfbaa15aeaeb11	2020-07-13 19:09:50 +01:00
Steve Loughran	a51d72f0c6	HDFS-13934. Multipart uploaders to be created through FileSystem/FileContext. Contributed by Steve Loughran. Change-Id: Iebd34140c1a0aa71f44a3f4d0fee85f6bdf123a3	2020-07-13 13:32:04 +01:00
Sebastian Nagel	f9619b0b97	HADOOP-17117 Fix typos in hadoop-aws documentation (#2127 ) (cherry picked from commit `5b1ed2113b`)	2020-07-09 00:04:46 +09:00
bilaharith	19fb204011	HADOOP-17086. ABFS: Making the ListStatus response ignore unknown properties. (#2101 ) Contributed by Bilahari T H. Change-Id: I82e4683fba8481aef2abab7a6a99e5752f6fffa9	2020-07-03 19:02:21 +01:00
Steve Loughran	7de1ac0547	HADOOP-16798. S3A Committer thread pool shutdown problems. (#1963 ) Contributed by Steve Loughran. Fixes a condition which can cause job commit to fail if a task was aborted < 60s before the job commit commenced: the task abort will shut down the thread pool with a hard exit after 60s; the job commit POST requests would be scheduled through the same pool, so be interrupted and fail. At present the access is synchronized, but presumably the executor shutdown code is calling wait() and releasing locks. Task abort is triggered from the AM when task attempts succeed but there are still active speculative task attempts running. Thus it only surfaces when speculation is enabled and the final tasks are speculating, which, given they are the stragglers, is not unheard of. Note: this problem has never been seen in production; it has surfaced in the hadoop-aws tests on a heavily overloaded desktop Change-Id: I3b433356d01fcc50d88b4353dbca018484984bc8	2020-06-30 10:52:56 +01:00
Thomas Marquardt	ee192c4826	HADOOP-17089: WASB: Update azure-storage-java SDK Contributed by Thomas Marquardt DETAILS: WASB depends on the Azure Storage Java SDK. There is a concurrency bug in the Azure Storage Java SDK that can cause the results of a list blobs operation to appear empty. This causes the Filesystem listStatus and similar APIs to return empty results. This has been seen in Spark work loads when jobs use more than one executor core. See Azure/azure-storage-java#546 for details on the bug in the Azure Storage SDK. TESTS: A new test was added to validate the fix. All tests are passing: wasb: mvn -T 1C -Dparallel-tests=wasb -Dscale -DtestsThreadCount=8 clean verify Tests run: 248, Failures: 0, Errors: 0, Skipped: 11 Tests run: 651, Failures: 0, Errors: 0, Skipped: 65 abfs: mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=8 clean verify Tests run: 64, Failures: 0, Errors: 0, Skipped: 0 Tests run: 437, Failures: 0, Errors: 0, Skipped: 33 Tests run: 206, Failures: 0, Errors: 0, Skipped: 24	2020-06-25 05:43:32 +00:00
Thomas Marquardt	63d236c019	HADOOP-17076: ABFS: Delegation SAS Generator Updates Contributed by Thomas Marquardt. DETAILS: 1) The authentication version in the service has been updated from Dec19 to Feb20, so need to update the client. 2) Add support and test cases for getXattr and setXAttr. 3) Update DelegationSASGenerator and related to use Duration instead of int for time periods. 4) Cleanup DelegationSASGenerator switch/case statement that maps operations to permissions. 5) Cleanup SASGenerator classes to use String.equals instead of ==. TESTS: Added tests for getXAttr and setXAttr. All tests are passing against my account in eastus2euap: $mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=8 clean verify Tests run: 76, Failures: 0, Errors: 0, Skipped: 0 Tests run: 441, Failures: 0, Errors: 0, Skipped: 33 Tests run: 206, Failures: 0, Errors: 0, Skipped: 24	2020-06-19 19:19:31 +00:00
bilaharith	d639c11986	HADOOP-17004. Fixing a formatting issue Contributed by Bilahari T H.	2020-06-19 19:11:06 +00:00
bilaharith	11307f3be9	HADOOP-17004. ABFS: Improve the ABFS driver documentation Contributed by Bilahari T H.	2020-06-19 19:10:22 +00:00
Thomas Marquardt	af98f32f7d	HADOOP-16916: ABFS: Delegation SAS generator for integration with Ranger Contributed by Thomas Marquardt. DETAILS: Previously we had a SASGenerator class which generated Service SAS, but we need to add DelegationSASGenerator. I separated SASGenerator into a base class and two subclasses ServiceSASGenerator and DelegationSASGenreator. The code in ServiceSASGenerator is copied from SASGenerator but the DelegationSASGenrator code is new. The DelegationSASGenerator code demonstrates how to use Delegation SAS with minimal permissions, as would be used by an authorization service such as Apache Ranger. Adding this to the tests helps us lock in this behavior. Added a MockDelegationSASTokenProvider for testing User Delegation SAS. Fixed the ITestAzureBlobFileSystemCheckAccess tests to assume oauth client ID so that they are ignored when that is not configured. To improve performance, AbfsInputStream/AbfsOutputStream re-use SAS tokens until the expiry is within 120 seconds. After this a new SAS will be requested. The default period of 120 seconds can be changed using the configuration setting "fs.azure.sas.token.renew.period.for.streams". The SASTokenProvider operation names were updated to correspond better with the ADLS Gen2 REST API, since these operations must be provided tokens with appropriate SAS parameters to succeed. Support for the version 2.0 AAD authentication endpoint was added to AzureADAuthenticator. The getFileStatus method was mistakenly calling the ADLS Gen2 Get Properties API which requires read permission while the getFileStatus call only requires execute permission. ADLS Gen2 Get Status API is supposed to be used for this purpose, so the underlying AbfsClient.getPathStatus API was updated with a includeProperties parameter which is set to false for getFileStatus and true for getXAttr. Added SASTokenProvider support for delete recursive. Fixed bugs in AzureBlobFileSystem where public methods were not validating the Path by calling makeQualified. This is necessary to avoid passing null paths and to convert relative paths into absolute paths. Canonicalized the path used for root path internally so that root path can be used with SAS tokens, which requires that the path in the URL and the path in the SAS token match. Internally the code was using "//" instead of "/" for the root path, sometimes. Also related to this, the AzureBlobFileSystemStore.getRelativePath API was updated so that we no longer remove and then add back a preceding forward / to paths. To run ITestAzureBlobFileSystemDelegationSAS tests follow the instructions in testing_azure.md under the heading "To run Delegation SAS test cases". You also need to set "fs.azure.enable.check.access" to true. TEST RESULTS: namespace.enabled=true auth.type=SharedKey ------------------- $mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=8 clean verify Tests run: 63, Failures: 0, Errors: 0, Skipped: 0 Tests run: 432, Failures: 0, Errors: 0, Skipped: 41 Tests run: 206, Failures: 0, Errors: 0, Skipped: 24 namespace.enabled=false auth.type=SharedKey ------------------- $mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=8 clean verify Tests run: 63, Failures: 0, Errors: 0, Skipped: 0 Tests run: 432, Failures: 0, Errors: 0, Skipped: 244 Tests run: 206, Failures: 0, Errors: 0, Skipped: 24 namespace.enabled=true auth.type=SharedKey sas.token.provider.type=MockDelegationSASTokenProvider enable.check.access=true ------------------- $mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=8 clean verify Tests run: 63, Failures: 0, Errors: 0, Skipped: 0 Tests run: 432, Failures: 0, Errors: 0, Skipped: 33 Tests run: 206, Failures: 0, Errors: 0, Skipped: 24 namespace.enabled=true auth.type=OAuth ------------------- $mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=8 clean verify Tests run: 63, Failures: 0, Errors: 0, Skipped: 0 Tests run: 432, Failures: 0, Errors: 1, Skipped: 74 Tests run: 206, Failures: 0, Errors: 0, Skipped: 140	2020-06-19 19:00:46 +00:00
Mehakmeet Singh	a2f44344c3	HADOOP-17018. Intermittent failing of ITestAbfsStreamStatistics in ABFS (#1990 ) Contributed by: Mehakmeet Singh In some cases, ABFS-prefetch thread runs in the background which returns some bytes from the buffer and gives an extra readOp. Thus, making readOps values arbitrary and giving intermittent failures in some cases. Hence, readOps values of 2 or 3 are seen in different setups.	2020-06-19 19:00:04 +00:00
bilaharith	76ee7e5494	HADOOP-17002. ABFS: Adding config to determine if the account is HNS enabled or not Contributed by Bilahari T H.	2020-06-19 18:57:47 +00:00
Steve Loughran	5e290e702f	HADOOP-17050. S3A to support additional token issuers Contributed by Steve Loughran. S3A delegation token providers will be asked for any additional token issuers, an array can be returned, each one will be asked for tokens when DelegationTokenIssuer collects all the tokens for a filesystem. Change-Id: I1bd3035bbff98cbd8e1d1ac7fc615d937e6bb7bb	2020-06-09 14:43:02 +01:00
Mehakmeet Singh	1714589609	HADOOP-17016. Adding Common Counters in ABFS (#1991 ). Contributed by: Mehakmeet Singh. Change-Id: Ib84e7a42f28e064df4c6204fcce33e573360bf42	2020-06-03 20:02:44 +01:00
Steve Loughran	8a642caca8	HADOOP-16568. S3A FullCredentialsTokenBinding fails if local credentials are unset. (#1441 ) Contributed by Steve Loughran. Move the loading to deployUnbonded (where they are required) and add a safety check when a new DT is requested Change-Id: I03c69aa2e16accfccddca756b2771ff832e7dd58	2020-06-03 17:08:52 +01:00
Mukund Thakur	b0c9e4f1b5	HADOOP-16900. Very large files can be truncated when written through the S3A FileSystem. Contributed by Mukund Thakur and Steve Loughran. This patch ensures that writes to S3A fail when more than 10,000 blocks are written. That upper bound still exists. To write massive files, make sure that the value of fs.s3a.multipart.size is set to a size which is large enough to upload the files in fewer than 10,000 blocks. Change-Id: Icec604e2a357ffd38d7ae7bc3f887ff55f2d721a	2020-06-01 12:01:13 +01:00
Masatake Iwasaki	4d30c395f7	HADOOP-17040. Fix intermittent failure of ITestBlockingThreadPoolExecutorService. (#2020 ) (cherry picked from commit `9685314633`)	2020-05-22 18:53:53 +09:00
Steve Loughran	f2f727359b	Revert "HADOOP-14557. Document HADOOP-8143 (Change distcp to have -pb on by default)." This reverts commit `44350fdf49`. It is related to the rollback of HADOOP-8143. Change-Id: If48e3dd670c920ada702dc36461ff398fe9d35cc	2020-05-14 19:20:34 +01:00
Steve Loughran	7f2cf334a8	Revert "HADOOP-8143. Change distcp to have -pb on by default." This reverts commit `dd65eea74b`. Change-Id: I74180cf59d5bbad8c9f66cb331535addcbea863e	2020-05-14 19:14:39 +01:00
Masatake Iwasaki	6a0aeea60f	HADOOP-17025. Fix invalid metastore configuration in S3GuardTool tests. (#1994 ) (cherry picked from commit `99840aaba6`)	2020-05-07 12:02:15 +09:00
Akira Ajisaka	dfa7f160a5	Preparing for 3.3.1 development	2020-04-30 13:33:42 +09:00
Mehakmeet Singh	2471ba8b5c	HADOOP-16914 Adding Output Stream Counters in ABFS (#1899 ) Contributed by Mehakmeet Singh.There	2020-04-23 14:38:15 +01:00
Sneha Vijayarajan	32fb174da2	Hadoop 16857. ABFS: Stop CustomTokenProvider retry logic to depend on AbfsRestOp retry policy Contributed by Sneha Vijayarajan	2020-04-23 14:37:25 +01:00
bilaharith	f53ded6185	HADOOP-16922. ABFS: Change User-Agent header (#1938 ) Contributed by Bilahari T H.	2020-04-22 14:32:30 +01:00
Mukund Thakur	98fdbb820e	HADOOP-16965. Refactor abfs stream configuration. (#1956 ) Contributed by Mukund Thakur.	2020-04-22 14:32:01 +01:00
Mehakmeet Singh	f74a571fdf	HADOOP-16910 : ABFS Streams to update FileSystem.Statistics counters on IO. (#1918 ). Contributed by Mehakmeet Singh.	2020-04-22 14:30:25 +01:00
Steve Loughran	0982f56f3a	HADOOP-16953. tuning s3guard disabled warnings (#1962 ) Contributed by Steve Loughran. The S3Guard absence warning of HADOOP-16484 has been changed so that by default the S3A connector only logs at debug when the connection to the S3 Store does not have S3Guard enabled. The option to control this log level is now fs.s3a.s3guard.disabled.warn.level and can be one of: silent, inform, warn, fail. On a failure, an ExitException is raised with exit code 49. For details on this safety feature, consult the s3guard documentation. Change-Id: If868671c9260977c2b03b3e475b9c9531c98ce79	2020-04-20 15:07:00 +01:00
Steve Loughran	de9a6b4588	HADOOP-16986. S3A to not need wildfly on the classpath. (#1948 ) Contributed by Steve Loughran. This is a successor to HADOOP-16346, which enabled the S3A connector to load the native openssl SSL libraries for better HTTPS performance. That patch required wildfly.jar to be on the classpath. This update: * Makes wildfly.jar optional except in the special case that "fs.s3a.ssl.channel.mode" is set to "openssl" * Retains the declaration of wildfly.jar as a compile-time dependency in the hadoop-aws POM. This means that unless explicitly excluded, applications importing that published maven artifact will, transitively, add the specified wildfly JAR into their classpath for compilation/testing/ distribution. This is done for packaging and to offer that optional speedup. It is not mandatory: applications importing the hadoop-aws POM can exclude it if they choose. Change-Id: I7ed3e5948d1e10ce21276b3508871709347e113d	2020-04-20 14:42:36 +01:00
Mukund Thakur	96d7ceb39a	HADOOP-13873. log DNS addresses on s3a initialization. Contributed by Mukund Thakur. If you set the log org.apache.hadoop.fs.s3a.impl.NetworkBinding to DEBUG, then when the S3A bucket probe is made -the DNS address of the S3 endpoint is calculated and printed. This is useful to see if a large set of processes are all using the same IP address from the pool of load balancers to which AWS directs clients when an AWS S3 endpoint is resolved. This can have implications for performance: if all clients access the same load balancer performance may be suboptimal. Note: if bucket probes are disabled, fs.s3a.bucket.probe = 0, the DNS logging does not take place. Change-Id: I21b3ac429dc0b543f03e357fdeb94c2d2a328dd8	2020-04-17 14:20:54 +01:00
Mukund Thakur	94da630cd2	HADOOP-16465 listLocatedStatus() optimisation (#1943 ) Contributed by Mukund Thakur Optimize S3AFileSystem.listLocatedStatus() to perform list operations directly and then fallback to head checks for files Change-Id: Ia2c0fa6fcc5967c49b914b92f41135d07dab0464	2020-04-15 17:04:55 +01:00
bilaharith	6bae8c46a8	HADOOP-16855. Changing wildfly dependency scope in hadoop-azure to compile Contributed by Biliharith	2020-04-14 19:18:15 +01:00
Steve Loughran	68a9562848	HADOOP-16941. ITestS3GuardOutOfBandOperations.testListingDelete failing on versioned bucket (#1919 ) Contributed by Steve Loughran. Removed the failing probe and replacing with two probes which will fail on both versioned and unversioned buckets.	2020-04-14 10:58:13 +01:00
Steve Loughran	e4331a73c9	HADOOP-16932. distcp copy calls getFileStatus() needlessly and can fail against S3 (#1936 ) Contributed by Steve Loughran. This strips out all the -p preservation options which have already been processed when uploading a file before deciding whether or not to query the far end for the status of the (existing/uploaded) file to see if any other attributes need changing. This will avoid 404 caching-related issues in S3, wherein a newly created file can have a 404 entry in the S3 load balancer's cache from the probes for the file's existence prior to the upload. It partially addresses a regression caused by HADOOP-8143, "Change distcp to have -pb on by default" that causes a resurfacing of HADOOP-13145, "In DistCp, prevent unnecessary getFileStatus call when not preserving metadata" Change-Id: Ibc25d19e92548e6165eb8397157ebf89446333f7	2020-04-09 18:23:47 +01:00
Steve Loughran	eaaaba12b1	HADOOP-16939 fs.s3a.authoritative.path should support multiple FS URIs (#1914 ) add unit test, new ITest and then fix the issue: different schema, bucket == skip factored out the underlying logic for unit testing; also moved maybeAddTrailingSlash to S3AUtils (while retaining/forwarnding existing method in S3AFS). tested: london, sole failure is testListingDelete[auth=true](org.apache.hadoop.fs.s3a.ITestS3GuardOutOfBandOperations) filed HADOOP-16853 Change-Id: I4b8d0024469551eda0ec70b4968cba4abed405ed	2020-03-26 12:59:11 -06:00
Steve Loughran	745a6c1e69	Revert "HADOOP-16818. ABFS: Combine append+flush calls for blockblob & appendblob" This reverts commit `3612317038`. Change-Id: Ie0d36f25de0b55a937894f4d9963c495bae0576a	2020-03-26 15:24:37 +00:00
Sunil G	cdb2107066	YARN-9879. Allow multiple leaf queues with the same name in CapacityScheduler. Contributed by Gergely Pollak.	2020-03-25 16:50:19 +05:30
Nicholas Chammas	25a03bfece	HADOOP-16930. Add hadoop-aws documentation for ProfileCredentialsProvider Contributed by Nicholas Chammas.	2020-03-25 10:39:35 +00:00
Steve Loughran	28afdce009	Revert ""HADOOP-16910. ABFS Streams to update FileSystem.Statistics counters on IO." This reverts commit `e2c7ac71b5`. Change-Id: I5b5a93f5a36cdb0c3d56d1b3f747c318f089de20	2020-03-24 12:11:18 +00:00
Mehakmeet Singh	e2c7ac71b5	ABFS Streams to update FileSystem.Statistics counters on IO. Contributed by Mehakmeet Singh	2020-03-23 13:50:18 +00:00
ishaniahuja	3612317038	HADOOP-16818. ABFS: Combine append+flush calls for blockblob & appendblob Contributed by Ishani Ahuja.	2020-03-20 10:27:41 +00:00
bilaharith	6ce5f8734f	HADOOP-16920 ABFS: Make list page size configurable. Contributed by Bilahari T H. The page limit is set in "fs.azure.list.max.results"; default value is 500. There's currently a limit of 5000 in the store -there are no range checks in the client code so that limit can be changed on the server without any need to update the abfs connector.	2020-03-18 14:14:18 +00:00
Gabor Bota	c91ff8c18f	HADOOP-16858. S3Guard fsck: Add option to remove orphaned entries (#1851 ). Contributed by Gabor Bota. Adding a new feature to S3GuardTool's fsck: -fix. Change-Id: I2cdb6601fea1d859b54370046b827ef06eb1107d	2020-03-18 12:48:52 +01:00
Steve Loughran	8d6373483e	HADOOP-16319. S3A Etag tests fail with default encryption enabled on bucket. Contributed by Ben Roling. ETag values are unpredictable with some S3 encryption algorithms. Skip ITestS3AMiscOperations tests which make assertions about etags when default encryption on a bucket is enabled. When testing with an AWS an account which lacks the privilege for a call to getBucketEncryption(), we don't skip the tests. In the event of failure, developers get to expand the permissions of the account or relax default encryption settings.	2020-03-17 13:31:48 +00:00

1 2 3 4 5 ...

1472 Commits