hadoop

Author	SHA1	Message	Date
Mukund Thakur	83c7c2b4c4	HADOOP-17023. Tune S3AFileSystem.listStatus() (#2257 ) S3AFileSystem.listStatus() is optimized for invocations where the path supplied is a non-empty directory. The number of S3 requests is significantly reduced, saving time, money, and reducing the risk of S3 throttling. Contributed by Mukund Thakur.	2020-09-21 17:20:16 +01:00
Sneha Vijayarajan	e31a636e92	HADOOP-17215: Support for conditional overwrite. Contributed by Sneha Vijayarajan DETAILS: This change adds config key "fs.azure.enable.conditional.create.overwrite" with a default of true. When enabled, if create(path, overwrite: true) is invoked and the file exists, the ABFS driver will first obtain its etag and then attempt to overwrite the file on the condition that the etag matches. The purpose of this is to mitigate the non-idempotency of this method. Specifically, in the event of a network error or similar, the client will retry and this can result in the file being created more than once which may result in data loss. In essence this is like a poor man's file handle, and will be addressed more thoroughly in the future when support for lease is added to ABFS. TEST RESULTS: namespace.enabled=true auth.type=SharedKey ------------------- $mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=8 clean verify Tests run: 87, Failures: 0, Errors: 0, Skipped: 0 Tests run: 457, Failures: 0, Errors: 0, Skipped: 42 Tests run: 207, Failures: 0, Errors: 0, Skipped: 24 namespace.enabled=true auth.type=OAuth ------------------- $mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=8 clean verify Tests run: 87, Failures: 0, Errors: 0, Skipped: 0 Tests run: 457, Failures: 0, Errors: 0, Skipped: 74 Tests run: 207, Failures: 0, Errors: 0, Skipped: 140	2020-09-19 01:28:44 +00:00
ThomasMarquardt	0dc54d0247	HADOOP-17203: Revert HADOOP-17183. ABFS: Enabling checkaccess on ABFS This reverts commit `a2610e21ed`.	2020-09-18 17:52:11 -07:00
Steve Loughran	958cab804e	Revert "HADOOP-17244. S3A directory delete tombstones dir markers prematurely. (#2280 )" This reverts commit `9960c01a25`. Change-Id: I820534c3292f2a343693d835f625488c325fb5d6	2020-09-11 18:07:49 +01:00
Steve Loughran	9960c01a25	HADOOP-17244. S3A directory delete tombstones dir markers prematurely. (#2280 ) This changes directory tree deletion so that only files are incrementally deleted from S3Guard after the objects are deleted; the directories are left alone until metadataStore.deleteSubtree(path) is invoked. This avoids directory tombstones being added above files/child directories, which stop the treewalk and delete phase from working. Also: * Callback to delete objects splits files and dirs so that any problems deleting the dirs don't trigger s3guard updates * New statistic to measure number of objects deleted, alongside request count. * Callback listFilesAndEmptyDirectories renamed listFilesAndDirectoryMarkers to clarify behavior. * Test enhancements to replicate the failure and verify the fix Contributed by Steve Loughran	2020-09-10 17:03:52 +01:00
bilaharith	85119267be	HADOOP-17166. ABFS: configure output stream thread pool (#2179 ) Adds the options to control the size of the per-output-stream threadpool when writing data through the abfs connector * fs.azure.write.max.concurrent.requests * fs.azure.write.max.requests.to.queue Contributed by Bilahari T H	2020-09-09 16:41:36 +01:00
Mehakmeet Singh	0d855159f0	HADOOP-17229. No updation of bytes received counter value after response failure occurs in ABFS (#2264 ) Contributed by Mehakmeet Singh	2020-09-08 10:14:23 +01:00
Mehakmeet Singh	84ed6adccc	HADOOP-17158. Test timeout for ITestAbfsInputStreamStatistics#testReadAheadCounters (#2272 ) Contributed by: Mehakmeet Singh.	2020-09-08 10:11:06 +01:00
Steve Loughran	5346cc3263	HADOOP-17227. S3A Marker Tool tuning (#2254 ) Contributed by Steve Loughran.	2020-09-04 14:58:03 +01:00
Mukund Thakur	139a43e98e	HADOOP-17167 ITestS3AEncryptionWithDefaultS3Settings failing (#2187 ) Now skips ITestS3AEncryptionWithDefaultS3Settings.testEncryptionOverRename when server side encryption is not set to sse:kms Contributed by Mukund Thakur	2020-09-03 19:35:24 +01:00
Mehakmeet Singh	d1c60a53f6	HADOOP-17194. Adding Context class for AbfsClient in ABFS (#2216 ) Contributed by Mehakmeet Singh.	2020-08-27 11:27:00 +01:00
Mukund Thakur	cc641534dc	HADOOP-17074. S3A Listing to be fully asynchronous. (#2207 ) Contributed by Mukund Thakur.	2020-08-25 11:29:43 +01:00
bilaharith	64f36b9543	HADOOP-16915. ABFS: Ignoring the test ITestAzureBlobFileSystemRandomRead.testRandomReadPerformance - Contributed by Bilahari T H	2020-08-24 12:00:55 -07:00
swamirishi	872c2909bd	HADOOP-17122: Preserving Directory Attributes in DistCp with Atomic Copy (#2133 ) Contributed by Swaminathan Balachandran	2020-08-22 18:48:21 +01:00
Sneha Vijayarajan	b367942fe4	Upgrade store REST API version to 2019-12-12 - Contributed by Sneha Vijayarajan	2020-08-17 10:17:18 -07:00
Steve Loughran	5092ea62ec	HADOOP-13230. S3A to optionally retain directory markers. This adds an option to disable "empty directory" marker deletion, so avoid throttling and other scale problems. This feature is not backwards compatible. Consult the documentation and use with care. Contributed by Steve Loughran. Change-Id: I69a61e7584dc36e485d5e39ff25b1e3e559a1958	2020-08-15 12:51:08 +01:00
Mukund Thakur	4a400d3193	HADOOP-17192. ITestS3AHugeFilesSSECDiskBlock failing (#2221 ) Contributed by Mukund Thakur	2020-08-13 14:21:49 +01:00
Ayush Saxena	975b6024dd	HDFS-15514. Remove useless dfs.webhdfs.enabled. Contributed by Fei Hui.	2020-08-07 22:19:17 +05:30
bilaharith	a2610e21ed	HADOOP-17183. ABFS: Enabling checkaccess on ABFS - Contributed by Bilahari T H	2020-08-06 14:52:02 -07:00
bilaharith	3f73facd7b	HADOOP-17149. ABFS: Fixing the testcase ITestGetNameSpaceEnabled - Contributed by Bilahari T H	2020-08-05 10:01:04 -07:00
bilaharith	c566cabd62	HADOOP-17163. ABFS: Adding debug log for rename failures - Contributed by Bilahari T H	2020-08-05 09:38:13 -07:00
Mukund Thakur	ac697571a1	HADOOP-17186. Fixing javadoc in ListingOperationCallbacks (#2196 )	2020-08-05 20:40:49 +09:00
Mukund Thakur	8fd4f5490f	HADOOP-17131. Refactor S3A Listing code for better isolation. (#2148 ) Contributed by Mukund Thakur.	2020-08-04 16:00:02 +01:00
Akira Ajisaka	c40cbc57fa	HADOOP-17091. [JDK11] Fix Javadoc errors (#2098 )	2020-08-03 10:46:51 +09:00
bilaharith	a7fda2e38f	HADOOP-17137. ABFS: Makes the test cases in ITestAbfsNetworkStatistics agnostic - Contributed by Bilahari T H	2020-07-31 12:27:57 -07:00
Mehakmeet Singh	48a7c5b6ba	HADOOP-17113. Adding ReadAhead Counters in ABFS (#2154 ) Contributed by Mehakmeet Singh	2020-07-22 18:22:30 +01:00
Masatake Iwasaki	1b29c9bfee	HADOOP-17138. Fix spotbugs warnings surfaced after upgrade to 4.0.6. (#2155 )	2020-07-22 13:40:20 +09:00
Sneha Vijayarajan	d23cc9d85d	HADOOP-17132. ABFS: Fix Rename and Delete Idempotency check trigger - Contributed by Sneha Vijayarajan	2020-07-21 09:22:38 -07:00
bilaharith	b4b23ef0d1	HADOOP-17092. ABFS: Making AzureADAuthenticator.getToken() throw HttpException - Contributed by Bilahari T H	2020-07-21 09:18:54 -07:00
Mukund Thakur	bb459d4dd6	HADOOP-17136. ITestS3ADirectoryPerformance.testListOperations failing (#2153 ) A regression caused by HADOOP-17022: the reduction in LIST calls broke an assertion. Contributed by Mukund Thakur	2020-07-20 16:58:50 +01:00
Steve Loughran	9f407bcc88	HADOOP-17107. hadoop-azure parallel tests not working on recent JDKs (#2118 ) Contributed by Steve Loughran.	2020-07-20 10:51:26 +01:00
bilaharith	99655167f3	HADOOP-16682. ABFS: Removing unnecessary toString() invocations - Contributed by Bilahari T H	2020-07-18 10:00:18 -07:00
Ayush Saxena	6bcb24d269	HADOOP-17100. Replace Guava Supplier with Java8+ Supplier in Hadoop. Contributed by Ahmed Hussein.	2020-07-18 14:33:43 +05:30
Mehakmeet Singh	4083fd57b5	HADOOP-17129. Validating storage keys in ABFS correctly (#2141 ) Contributed by Mehakmeet Singh	2020-07-16 17:29:37 +01:00
Mukund Thakur	4647a60430	HADOOP-17022. Tune S3AFileSystem.listFiles() API. Contributed by Mukund Thakur. Change-Id: I17f5cfdcd25670ce3ddb62c13378c7e2dc06ba52	2020-07-14 15:27:35 +01:00
Anoop Sam John	380e0f4506	HADOOP-16998. WASB : NativeAzureFsOutputStream#close() throwing IllegalArgumentException (#2073 ) Contributed by Anoop Sam John.	2020-07-14 14:07:27 +01:00
jimmy-zuber-amzn	806d84b79c	HADOOP-17105. S3AFS - Do not attempt to resolve symlinks in globStatus (#2113 ) Contributed by Jimmy Zuber.	2020-07-13 19:07:48 +01:00
Steve Loughran	b9fa5e0182	HDFS-13934. Multipart uploaders to be created through FileSystem/FileContext. Contributed by Steve Loughran. Change-Id: Iebd34140c1a0aa71f44a3f4d0fee85f6bdf123a3	2020-07-13 13:30:02 +01:00
Sebastian Nagel	5b1ed2113b	HADOOP-17117 Fix typos in hadoop-aws documentation (#2127 )	2020-07-09 00:03:15 +09:00
ishaniahuja	d20109c171	HADOOP-17058. ABFS: Support for AppendBlob in Hadoop ABFS Driver - Contributed by Ishani Ahuja	2020-07-04 13:25:14 -07:00
bilaharith	e0cededfbd	HADOOP-17086. ABFS: Making the ListStatus response ignore unknown properties. (#2101 ) Contributed by Bilahari T H.	2020-07-03 19:00:22 +01:00
Mehakmeet Singh	3b5c9a90c0	HADOOP-16961. ABFS: Adding metrics to AbfsInputStream (#2076 ) Contributed by Mehakmeet Singh.	2020-07-03 11:41:35 +01:00
Yiqun Lin	ff8bb67200	HDFS-15374. Add documentation for fedbalance tool. Contributed by Jinglun.	2020-07-01 14:18:18 +08:00
Yiqun Lin	de2cb86260	HDFS-15410. Add separated config file hdfs-fedbalance-default.xml for fedbalance tool. Contributed by Jinglun.	2020-07-01 14:06:27 +08:00
Steve Loughran	4249c04d45	HADOOP-16798. S3A Committer thread pool shutdown problems. (#1963 ) Contributed by Steve Loughran. Fixes a condition which can cause job commit to fail if a task was aborted < 60s before the job commit commenced: the task abort will shut down the thread pool with a hard exit after 60s; the job commit POST requests would be scheduled through the same pool, so be interrupted and fail. At present the access is synchronized, but presumably the executor shutdown code is calling wait() and releasing locks. Task abort is triggered from the AM when task attempts succeed but there are still active speculative task attempts running. Thus it only surfaces when speculation is enabled and the final tasks are speculating, which, given they are the stragglers, is not unheard of. Note: this problem has never been seen in production; it has surfaced in the hadoop-aws tests on a heavily overloaded desktop	2020-06-30 10:44:51 +01:00
Thomas Marquardt	4b5b54c73f	HADOOP-17089: WASB: Update azure-storage-java SDK Contributed by Thomas Marquardt DETAILS: WASB depends on the Azure Storage Java SDK. There is a concurrency bug in the Azure Storage Java SDK that can cause the results of a list blobs operation to appear empty. This causes the Filesystem listStatus and similar APIs to return empty results. This has been seen in Spark work loads when jobs use more than one executor core. See Azure/azure-storage-java#546 for details on the bug in the Azure Storage SDK. TESTS: A new test was added to validate the fix. All tests are passing: wasb: mvn -T 1C -Dparallel-tests=wasb -Dscale -DtestsThreadCount=8 clean verify Tests run: 248, Failures: 0, Errors: 0, Skipped: 11 Tests run: 651, Failures: 0, Errors: 0, Skipped: 65 abfs: mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=8 clean verify Tests run: 64, Failures: 0, Errors: 0, Skipped: 0 Tests run: 437, Failures: 0, Errors: 0, Skipped: 33 Tests run: 206, Failures: 0, Errors: 0, Skipped: 24	2020-06-25 02:32:42 +00:00
Akira Ajisaka	201d734af3	HDFS-15428. Javadocs fails for hadoop-federation-balance. Contributed by Xieming Li.	2020-06-22 19:43:19 +09:00
Mehakmeet Singh	3472c3efc0	HADOOP-17065. Add Network Counters to ABFS (#2056 ) Contributed by Mehakmeet Singh.	2020-06-19 14:03:49 +01:00
Yiqun Lin	9cbd76cc77	HDFS-15346. FedBalance tool implementation. Contributed by Jinglun.	2020-06-18 13:33:25 +08:00
Thomas Marquardt	caf3995ac2	HADOOP-17076: ABFS: Delegation SAS Generator Updates Contributed by Thomas Marquardt. DETAILS: 1) The authentication version in the service has been updated from Dec19 to Feb20, so need to update the client. 2) Add support and test cases for getXattr and setXAttr. 3) Update DelegationSASGenerator and related to use Duration instead of int for time periods. 4) Cleanup DelegationSASGenerator switch/case statement that maps operations to permissions. 5) Cleanup SASGenerator classes to use String.equals instead of ==. TESTS: Added tests for getXAttr and setXAttr. All tests are passing against my account in eastus2euap: $mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=8 clean verify Tests run: 76, Failures: 0, Errors: 0, Skipped: 0 Tests run: 441, Failures: 0, Errors: 0, Skipped: 33 Tests run: 206, Failures: 0, Errors: 0, Skipped: 24	2020-06-18 02:07:08 +00:00


1419 Commits