hadoop

Author	SHA1	Message	Date
Steve Loughran	4423a7e736	HADOOP-16906. Abortable (#2684 ) Adds an Abortable.abort() interface for streams to enable output streams to be terminated; this is implemented by the S3A connector's output stream. It allows for commit protocols to be implemented which commit/abort work by writing to the final destination and using the abort() call to cancel any write which is not intended to be committed. Consult the specification document for information about the interface and its use. Contributed by Jungtaek Lim and Steve Loughran. Change-Id: I7fcc25e9dd8c10ce6c29f383529f3a2642a201ae	2021-02-17 11:29:19 +00:00
Steve Loughran	98e4d516ea	HADOOP-13327 Output Stream Specification. (#2587 ) This defines what output streams and especially those which implement Syncable are meant to do, and documents where implementations (HDFS; S3) don't. With tests. The file:// FileSystem now supports Syncable if an application calls FileSystem.setWriteChecksum(false) before creating a file -checksumming and Syncable.hsync() are incompatible. Contributed by Steve Loughran. Change-Id: I892d768de6268f4dd6f175b3fe3b7e5bcaa91194	2021-02-10 10:31:22 +00:00
bilaharith	35c93ef5f3	HADOOP-17475. ABFS : add high performance listStatusIterator (#2548 ) The ABFS connector now implements listStatusIterator() with asynchronous prefetching of the next page(s) of results. For listing large directories this can provide tangible speedups. If for any reason this needs to be disabled, set fs.azure.enable.abfslistiterator to false. Contributed by Bilahari T H. Change-Id: Ic9a52b80df1d0ffed4c81beae92c136e2a12698c	2021-02-04 13:37:36 +00:00
Steve Loughran	70411cb1f1	HADOOP-17337. S3A NetworkBinding has a runtime dependency on shaded httpclient. (#2599 ) Contributed by Steve Loughran. Change-Id: I0471322fc88d8bc3896ac439aefb31e6a856936c	2021-02-03 14:32:55 +00:00
Steve Loughran	99337a4dd0	HADOOP-15710. ABFS checkException to map 403 to AccessDeniedException. (#2648 ) When 403 is returned from an ABFS HTTP call, an AccessDeniedException is raised. The exception text is unchanged, for any application string matching on the getMessage() contents. Contributed by Steve Loughran. Change-Id: I519d50ccd657968fd8ee72d132518099de901e15	2021-02-02 18:17:38 +00:00
Steve Loughran	2d124f2f5e	HADOOP-17483. Magic committer is enabled by default. (#2656 ) * core-default.xml updated so that fs.s3a.committer.magic.enabled = true * CommitConstants updated to match * All tests which previously enabled the magic committer now rely on default settings. This helps make sure it is enabled. * Docs cover the switch, mention it's enabled and explain why you may want to disable it. Note: this doesn't switch to using the committer -it just enables the path rewriting magic which it depends on. Contributed by Steve Loughran.	2021-01-27 19:05:07 +00:00
Steve Loughran	3e1eb16837	HADOOP-17493. Revert name of DELEGATION_TOKENS_ISSUED constant/statistic (#2649 ) Follow-on to HADOOP-16830/HADOOP-17271. Contributed by Steve Loughran. Change-Id: I16db6e788c9fd628d3295671d7c2861c249d5ef1	2021-01-27 16:40:27 +00:00
Steve Loughran	fb603e81f0	HADOOP-17414. Magic committer files don't have the count of bytes written collected by spark (#2530 ) This needs SPARK-33739 in the matching spark branch in order to work Contributed by Steve Loughran. Change-Id: I4fe75b057159e35aacc072da3cb7343467c0c3f1	2021-01-26 19:42:16 +00:00
Steve Loughran	bd85f6acea	HADOOP-17480. Document that AWS S3 is consistent and that S3Guard is not needed (#2636 ) Contributed by Steve Loughran. Change-Id: I775e3ee7b60665240ec621859c337b053f747a49	2021-01-25 13:24:34 +00:00
Mehakmeet Singh	d20b2deac3	HADOOP-17272. ABFS Streams to support IOStatistics API (#2604 ) Contributed by Mehakmeet Singh. Change-Id: I3445dec84b9b9e43bb1e41f709944ea05416bd74	2021-01-22 14:21:31 +00:00
Sneha Vijayarajan	4865589bb4	HADOOP-17404. ABFS: Small write - Merge append and flush - Contributed by Sneha Vijayarajan (cherry picked from commit `b612c310c2`)	2021-01-22 10:48:04 +00:00
bilaharith	cb6729224e	HADOOP-17347. ABFS: Read optimizations - Contributed by Bilahari T H (cherry picked from commit `1448add08f`)	2021-01-22 10:48:04 +00:00
Sneha Vijayarajan	f3a0ca66c2	HADOOP-17407. ABFS: Fix NPE on delete idempotency flow - Contributed by Sneha Vijayarajan (cherry picked from commit `5ca1ea89b3`)	2021-01-22 10:48:04 +00:00
Sumangala	5f312a0d85	HADOOP-17422: ABFS: Set default ListMaxResults to max server limit (#2535 ) Contributed by Sumangala Patki TEST RESULTS: namespace.enabled=true auth.type=SharedKey ------------------- $mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=8 clean verify Tests run: 90, Failures: 0, Errors: 0, Skipped: 0 Tests run: 462, Failures: 0, Errors: 0, Skipped: 24 Tests run: 208, Failures: 0, Errors: 0, Skipped: 24 namespace.enabled=true auth.type=OAuth ------------------- $mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=8 clean verify Tests run: 90, Failures: 0, Errors: 0, Skipped: 0 Tests run: 462, Failures: 0, Errors: 0, Skipped: 70 Tests run: 208, Failures: 0, Errors: 0, Skipped: 141 (cherry picked from commit `a35fc3871b`)	2021-01-22 10:48:04 +00:00
Sneha Vijayarajan	d3caa1552b	HADOOP-17413. Release elastic byte buffer pool at close - Contributed by Sneha Vijayarajan (cherry picked from commit `5bf977e6b1`)	2021-01-22 10:48:04 +00:00
Sneha Vijayarajan	a44890eb63	HADOOP-17296. ABFS: Force reads to be always of buffer size. Contributed by Sneha Vijayarajan. (cherry picked from commit `142941b96e`)	2021-01-22 10:48:04 +00:00
Maksim Bober	763157dd12	HADOOP-17484. Typo in hadop-aws index.md (#2634 ) Contributed by Maksim Bober. Change-Id: Ic5196a64abc68566a3542e9ff96042593f081bdd	2021-01-21 17:32:03 +00:00
Steve Loughran	b645e58de2	HADOOP-17433. Skipping network I/O in S3A getFileStatus(/) breaks ITestAssumeRole. (#2600 ) Contributed by Steve Loughran. Change-Id: Iece617be78e80fc7e956074eddf171f7763a2e66	2021-01-19 17:20:28 +00:00
Steve Loughran	56576f080b	HADOOP-17451. IOStatistics test failures in S3A code. (#2594 ) Caused by HADOOP-16830 and HADOOP-17271. Fixes tests which fail intermittently based on configs and in the case of the HugeFile tests, bulk runs with existing FS instances meant statistic probes sometimes ended up probing those of a previous FS. Contributed by Steve Loughran. Change-Id: I65ba3f44444e59d298df25ac5c8dc5a8781dfb7d	2021-01-14 13:21:20 +00:00
Steve Loughran	240b25310e	HADOOP-17271. S3A connector to support IOStatistics. (#2580 ) S3A connector to support the IOStatistics API of HADOOP-16830, This is a major rework of the S3A Statistics collection to * Embrace the IOStatistics APIs * Move from direct references of S3AInstrumention statistics collectors to interface/implementation classes in new packages. * Ubiquitous support of IOStatistics, including: S3AFileSystem, input and output streams, RemoteIterator instances provided in list calls. * Adoption of new statistic names from hadoop-common Regarding statistic collection, as well as all existing statistics, the connector now records min/max/mean durations of HTTP GET and HEAD requests, and those of LIST operations. Contributed by Steve Loughran. Change-Id: I182d34b6ac39e017a8b4a221dad8e930882b39cf	2021-01-14 13:21:01 +00:00
bilaharith	8204ad9d5b	HADOOP-17459. ADLS Gen1: Fixes for rename contract tests #2607 Contributed by Bilaharith	2021-01-12 14:04:37 +00:00
yzhangal	adf6ca18b4	HADOOP-17338. Intermittent S3AInputStream failures: Premature end of Content-Length delimited message body etc (#2497 ) Yongjun Zhang <yongjunzhang@pinterest.com> Change-Id: Ibbc6a39afb82de1208e6ed6a63ede224cc425466	2020-12-19 12:24:16 +00:00
Chao Sun	81e533de8f	HADOOP-16080. hadoop-aws does not work with hadoop-client-api. Contributed by Chao Sun (#2522 )	2020-12-12 09:37:13 -08:00
Akira Ajisaka	71bda1a2e8	HADOOP-17138. Fix spotbugs warnings surfaced after upgrade to 4.0.6. (#2155 ) (#2538 ) (cherry picked from commit `1b29c9bfee`) Co-authored-by: Masatake Iwasaki <iwasakims@apache.org>	2020-12-11 13:58:02 +09:00
Mukund Thakur	e4cab4b7a3	HADOOP-17186. Fixing javadoc in ListingOperationCallbacks (#2196 ) (cherry picked from commit `ac697571a1`)	2020-12-10 18:32:22 +09:00
Ayush Saxena	8378ab9f92	HADOOP-17288. Use shaded guava from thirdparty. Contributed by Ayush Saxena. #2505	2020-12-10 05:50:55 +05:30
Ankit Kumar	f04a9dfda1	YARN-10491. Fix deprecation warnings in SLSWebApp.java (#2519 ) Signed-off-by: Akira Ajisaka <ajisaka@apache.org> (cherry picked from commit `aaf9e3d320`)	2020-12-09 10:53:42 +09:00
Thomas Marquardt	a5695057b1	HADOOP-17397: ABFS: SAS Test updates for version and permission update DETAILS: The previous commit for HADOOP-17397 was not the correct fix. DelegationSASGenerator.getDelegationSAS should return sp=p for the set-permission and set-acl operations. The tests have also been updated as follows: 1. When saoid and suoid are not specified, skoid must have an RBAC role assignment which grants Microsoft.Storage/storageAccounts/blobServices/containers/blobs/modifyPermissions/action and sp=p to set permissions or set ACL. 2. When saoid or suoid is specified, same as 1) but furthermore the saoid or suoid must be an owner of the file or directory in order for the operation to succeed. 3. When saoid or suoid is specified, the ownership check is bypassed by also including 'o' (ownership) in the SAS permission (for example, sp=op). Note that 'o' grants the saoid or suoid the ability to change the file or directory owner to themself, and they can also change the owning group. Generally speaking, if a trusted authorizer would like to give a user the ability to change the permissions or ACL, then that user should be the file or directory owner. TEST RESULTS: namespace.enabled=true auth.type=SharedKey ------------------- $mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=8 clean verify Tests run: 89, Failures: 0, Errors: 0, Skipped: 0 Tests run: 461, Failures: 0, Errors: 0, Skipped: 24 Tests run: 208, Failures: 0, Errors: 0, Skipped: 24 namespace.enabled=true auth.type=OAuth ------------------- $mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=8 clean verify Tests run: 89, Failures: 0, Errors: 0, Skipped: 0 Tests run: 461, Failures: 0, Errors: 0, Skipped: 70 Tests run: 208, Failures: 0, Errors: 0, Skipped: 141	2020-12-03 14:31:06 +00:00
Mukund Thakur	3ef0e3d615	HADOOP-17398. Skipping network I/O in S3A getFileStatus(/) breaks some tests (#2493 ) Follow-on to HADOOP-17323. Contributed by Mukund Thakur.	2020-11-26 20:26:44 +00:00
Steve Loughran	1e59bf7394	HADOOP-17385. ITestS3ADeleteCost.testDirMarkersFileCreation failure (#2473 ). Contributed by Steve Loughran The addition of deprecated S3A configuration options in HADOOP-17318 triggered a reload of default (xml resource) configurations, which breaks tests which fail if there's a per-bucket setting inconsistent with test setup. Creating an S3AFS instance before creating the Configuration() instance for test runs gets that reload out the way before test setup takes place. Along with the fix, extra changes in the failing test suite to fail fast when marker policy isn't as expected, and to log FS state better. Rather than create and discard an instance, add a new static method to S3AFS and invoke it in test setup. This forces the load Change-Id: Id52b1c46912c6fedd2ae270e2b1eb2222a360329	2020-11-26 17:28:01 +00:00
Steve Loughran	1eeb9d9d67	HADOOP-17318. Support concurrent S3A commit jobs with same app attempt ID. (#2399 ) See also [SPARK-33402]: Jobs launched in same second have duplicate MapReduce JobIDs Contributed by Steve Loughran. Change-Id: Iae65333cddc84692997aae5d902ad8765b45772a	2020-11-26 17:22:56 +00:00
Sneha Vijayarajan	c48c774d6c	HADOOP-17397. ABFS: SAS Test updates for version and permission update (#2492 ) Contributed by Sneha Vijayarajan. Change-Id: I89c1061b1efb1e3bef019dd22f221d03bf015929	2020-11-26 10:21:37 +00:00
Sneha Vijayarajan	39fa2c93c4	HADOOP-17396. ABFS: testRenameFileOverExistingFile fails (#2491 ) Contributed by Sneha Vijayarajan. Change-Id: I57a866b95ff18229caee8a6028874074a29cb5bd	2020-11-26 10:13:55 +00:00
Steve Loughran	1ef34d0819	HADOOP-17313. FileSystem.get to support slow-to-instantiate FS clients. (#2396 ) This adds a semaphore to throttle the number of FileSystem instances which can be created simultaneously, set in "fs.creation.parallel.count". This is designed to reduce the impact of many threads in an application calling FileSystem.get() on a filesystem which takes time to instantiate -for example to an object where HTTPS connections are set up during initialization. Many threads trying to do this may create spurious delays by conflicting for access to synchronized blocks, when simply limiting the parallelism diminishes the conflict, so speeds up all threads trying to access the store. The default value, 64, is larger than is likely to deliver any speedup -but it does mean that there should be no adverse effects from the change. If a service appears to be blocking on all threads initializing connections to abfs, s3a or store, try a smaller (possibly significantly smaller) value. Contributed by Steve Loughran. Change-Id: I57161b026f28349e339dc8b9d74f6567a62ce196	2020-11-25 14:55:29 +00:00
bilaharith	b8454a4b10	HADOOP-17311. ABFS: Logs should redact SAS signature (#2422 ) Contributed by bilaharith. Change-Id: Iff0ed4303ac5ce41b62bfda8150ee983dafa40be	2020-11-25 14:33:29 +00:00
Mukund Thakur	9dd74141a6	HADOOP-17323. S3A getFileStatus("/") to skip IO (#2479 ) Contributed by Mukund Thakur. Change-Id: I1709ad72b829999b6dd324f0755b51bc38918d30	2020-11-24 11:34:19 +00:00
Steve Loughran	38cc47d308	HADOOP-17332. S3A MarkerTool -min and -max are inverted. (#2425 ) This patch * fixes the inversion * adds a precondition check * if the commands are supplied inverted, swaps them with a warning. This is to stop breaking any tests written to cope with the existing behavior. Contributed by Steve Loughran Change-Id: I15c40863f0db0675c7d60db477cb3bf1693cae49	2020-11-23 21:49:33 +00:00
Steve Loughran	7ca539bc1b	HADOOP-17325. WASB Test Failures Contributed by Ayush Saxena and Steve Loughran Change-Id: I4bb76815bc1d11d1804dc67bafde68b6a995b974	2020-11-23 17:25:58 +00:00
Steve Loughran	e4bc64cce0	HADOOP-17343. Upgrade AWS SDK to 1.11.901 (#2468 ) Contributed by Steve Loughran.	2020-11-23 14:09:14 +00:00
Jungtaek Lim	401cadbac5	HADOOP-17388. AbstractS3ATokenIdentifier to issue date in UTC. (#2477 ) Followup to HADOOP-17379. Contributed by Jungtaek Lim. Change-Id: I7b2fce36028d297c1e095499691a08caba92d9fd	2020-11-20 10:56:57 +00:00
Jim Brennan	e24a6b550e	HADOOP-17367. Add InetAddress api to ProxyUsers.authorize (#2449 ). Contributed by Daryn Sharp and Ahmed Hussein	2020-11-19 21:26:47 +00:00
Steve Loughran	4687c25389	HADOOP-17244. S3A directory delete tombstones dir markers prematurely. (#2310 ) This fixes the S3Guard/Directory Marker Retention integration so that when fs.s3a.directory.marker.retention=keep, failures during multipart delete are handled correctly, as are incremental deletes during directory tree operations. In both cases, when a directory marker with children is deleted from S3, the directory entry in S3Guard is not deleted, because it is still critical to representing the structure of the store. Contributed by Steve Loughran. Change-Id: I4ca133a23ea582cd42ec35dbf2dc85b286297d2f	2020-11-18 12:30:43 +00:00
Steve Loughran	4bb9d593da	HADOOP-17261. s3a rename() needs s3:deleteObjectVersion permission (#2303 ) Contributed by Steve Loughran. Change-Id: I8e89a402a24bd9fb958e0fa93d1a28191093851d	2020-11-18 12:20:12 +00:00
Jungtaek Lim	22039a14ff	HADOOP-17379. AbstractS3ATokenIdentifier to set issue date == now. (#2466 ) Unless you explicitly set it, the issue date of a delegation token identifier is 0, which confuses spark renewal (SPARK-33440). This patch makes sure that all S3A DT identifiers have the current time as issue date, fixing the problem as far as S3A tokens are concerned. Contributed by Jungtaek Lim. Change-Id: Ic80ac7895612a1aa669459c73a78a9c17ecf0c0d	2020-11-17 14:56:58 +00:00
Doroszlai, Attila	bf2ff35a04	HADOOP-17376. ITestS3AContractRename failing against stricter tests. (#2462 ) Contributed by Attila Doroszlai. Change-Id: Ie15624ec07b1c5e34ca7fde0a72a54431d79e746	2020-11-16 11:26:06 +00:00
Eric E Payne	2473e8b711	YARN-10475: Scale RM-NM heartbeat interval based on node utilization. Contributed by Jim Brennan (Jim_Brennan).	2020-11-02 17:16:28 +00:00
Anoop Sam John	8312f230eb	HADOOP-17308. WASB PageBlobOutputStream.flush succeeds even when flush to storage fails (#2392 ) Contributed by Anoop Sam John.	2020-10-26 13:31:53 +00:00
Sneha Vijayarajan	d5b4d04b0d	HADOOP-17301. ABFS: read-ahead error reporting breaks buffer management (#2369 ) Fixes read-ahead buffer management issues introduced by HADOOP-16852, "ABFS: Send error back to client for Read Ahead request failure". Contributed by Sneha Vijayarajan	2020-10-14 22:29:13 +00:00
Sneha Vijayarajan	da5db6a5a6	HADOOP-17279: ABFS: testNegativeScenariosForCreateOverwriteDisabled fails for non-HNS account. Contributed by Sneha Vijayarajan Testing: namespace.enabled=false auth.type=SharedKey $mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=8 clean verify Tests run: 87, Failures: 0, Errors: 0, Skipped: 0 Tests run: 457, Failures: 0, Errors: 0, Skipped: 246 Tests run: 207, Failures: 0, Errors: 0, Skipped: 24 namespace.enabled=true auth.type=SharedKey $mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=8 clean verify Tests run: 87, Failures: 0, Errors: 0, Skipped: 0 Tests run: 457, Failures: 0, Errors: 0, Skipped: 33 Tests run: 207, Failures: 0, Errors: 0, Skipped: 24 namespace.enabled=true auth.type=OAuth $mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=8 clean verify Tests run: 87, Failures: 0, Errors: 0, Skipped: 0 Tests run: 457, Failures: 0, Errors: 0, Skipped: 74 Tests run: 207, Failures: 0, Errors: 0, Skipped: 140	2020-10-14 22:29:13 +00:00
Sneha Vijayarajan	d166420302	HADOOP-17215: Support for conditional overwrite. Contributed by Sneha Vijayarajan DETAILS: This change adds config key "fs.azure.enable.conditional.create.overwrite" with a default of true. When enabled, if create(path, overwrite: true) is invoked and the file exists, the ABFS driver will first obtain its etag and then attempt to overwrite the file on the condition that the etag matches. The purpose of this is to mitigate the non-idempotency of this method. Specifically, in the event of a network error or similar, the client will retry and this can result in the file being created more than once which may result in data loss. In essence this is like a poor man's file handle, and will be addressed more thoroughly in the future when support for lease is added to ABFS. TEST RESULTS: namespace.enabled=true auth.type=SharedKey ------------------- $mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=8 clean verify Tests run: 87, Failures: 0, Errors: 0, Skipped: 0 Tests run: 457, Failures: 0, Errors: 0, Skipped: 42 Tests run: 207, Failures: 0, Errors: 0, Skipped: 24 namespace.enabled=true auth.type=OAuth ------------------- $mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=8 clean verify Tests run: 87, Failures: 0, Errors: 0, Skipped: 0 Tests run: 457, Failures: 0, Errors: 0, Skipped: 74 Tests run: 207, Failures: 0, Errors: 0, Skipped: 140	2020-10-14 22:29:13 +00:00
bilaharith	f208da286c	HADOOP-17166. ABFS: configure output stream thread pool (#2179 ) Adds the options to control the size of the per-output-stream threadpool when writing data through the abfs connector * fs.azure.write.max.concurrent.requests * fs.azure.write.max.requests.to.queue Contributed by Bilahari T H	2020-10-14 22:29:13 +00:00
bilaharith	cc7350302f	HADOOP-16915. ABFS: Ignoring the test ITestAzureBlobFileSystemRandomRead.testRandomReadPerformance - Contributed by Bilahari T H	2020-10-14 22:29:13 +00:00
Sneha Vijayarajan	4072323de4	Upgrade store REST API version to 2019-12-12 - Contributed by Sneha Vijayarajan	2020-10-14 22:29:13 +00:00
bilaharith	e481d0108a	HADOOP-17149. ABFS: Fixing the testcase ITestGetNameSpaceEnabled - Contributed by Bilahari T H	2020-10-14 22:29:13 +00:00
bilaharith	f73c90f0b0	HADOOP-17163. ABFS: Adding debug log for rename failures - Contributed by Bilahari T H	2020-10-14 22:29:13 +00:00
bilaharith	fbf151ef6f	HADOOP-17137. ABFS: Makes the test cases in ITestAbfsNetworkStatistics agnostic - Contributed by Bilahari T H	2020-10-14 22:29:13 +00:00
Dongjoon Hyun	5032f8abba	HADOOP-17258. Magic S3Guard Committer to overwrite existing pendingSet file on task commit (#2371 ) Contributed by Dongjoon Hyun and Steve Loughran Change-Id: Ibaf8082e60eff5298ff4e6513edc386c5bae0274	2020-10-12 13:42:08 +01:00
Steve Loughran	963793dd48	HADOOP-17293. S3A to always probe S3 in S3A getFileStatus on non-auth paths This reverts changes in HADOOP-13230 to use S3Guard TTL in choosing when to issue a HEAD request; fixing tests to compensate. New org.apache.hadoop.fs.s3a.performance.OperationCost cost, S3GUARD_NONAUTH_FILE_STATUS_PROBE for use in cost tests. Contributed by Steve Loughran. Change-Id: I418d55d2d2562a48b2a14ec7dee369db49b4e29e	2020-10-08 15:38:32 +01:00
Mukund Thakur	475dba1ddf	HADOOP-17281 Implement FileSystem.listStatusIterator() in S3AFileSystem (#2354 ) Contains HADOOP-17300: FileSystem.DirListingIterator.next() call should return NoSuchElementException Contributed by Mukund Thakur Change-Id: I4e7e5c6e295525db9e2de6f416f32bbb81e146d3	2020-10-07 14:00:23 +01:00
bilaharith	d80dfad900	HADOOP-17183. ABFS: Enabling checkaccess on ABFS (#2331 ) Contributed by Bilahari TH Change-Id: If4224697deed733d6db44145994cdd85547c27d1	2020-10-01 21:29:48 +01:00
Mukund Thakur	7e642ec5a3	HADOOP-17023. Tune S3AFileSystem.listStatus() (#2257 ) S3AFileSystem.listStatus() is optimized for invocations where the path supplied is a non-empty directory. The number of S3 requests is significantly reduced, saving time, money, and reducing the risk of S3 throttling. Contributed by Mukund Thakur. Change-Id: I7cc5f87aa16a4819e245e0fbd2aad226bd500f3f	2020-09-21 17:30:15 +01:00
Steve Loughran	aa80bcb1ec	Revert "HADOOP-17244. S3A directory delete tombstones dir markers prematurely. (#2280 )" This reverts commit `0c82eb0324`. Change-Id: I6bd100d9de19660b0f28ee0ab16faf747d6d9f05	2020-09-11 18:07:05 +01:00
Steve Loughran	0c82eb0324	HADOOP-17244. S3A directory delete tombstones dir markers prematurely. (#2280 ) This changes directory tree deletion so that only files are incrementally deleted from S3Guard after the objects are deleted; the directories are left alone until metadataStore.deleteSubtree(path) is invoked. This avoids directory tombstones being added above files/child directories, which stop the treewalk and delete phase from working. Also: * Callback to delete objects splits files and dirs so that any problems deleting the dirs don't trigger s3guard updates * New statistic to measure #of objects deleted, alongside request count. * Callback listFilesAndEmptyDirectories renamed listFilesAndDirectoryMarkers to clarify behavior. * Test enhancements to replicate the failure and verify the fix Contributed by Steve Loughran Change-Id: I0e6ea2c35e487267033b1664228c8837279a35c7	2020-09-10 17:29:33 +01:00
Mehakmeet Singh	ccceec8af0	HADOOP-17158. Test timeout for ITestAbfsInputStreamStatistics#testReadAheadCounters (#2272 ) Contributed by: Mehakmeet Singh. Change-Id: I7ebfa5cd1b5d25f7a750f0c645d7d93c81e89240	2020-09-08 14:02:28 +01:00
Mehakmeet Singh	28f1ded9fe	HADOOP-17113. Adding ReadAhead Counters in ABFS (#2154 ) Contributed by Mehakmeet Singh Change-Id: I6bbd8165385a9267ed64831bb1efa18b6554feb1	2020-09-08 14:02:02 +01:00
Mehakmeet Singh	7970710418	HADOOP-17229. No update of bytes received counter value after response failure occurs in ABFS (#2264 ) Contributed by Mehakmeet Singh Change-Id: Ia9ad1b87a460b10d27486bd00ee67c3cedd2b5b5	2020-09-08 13:26:24 +01:00
Mukund Thakur	5236c96ead	HADOOP-17167 ITestS3AEncryptionWithDefaultS3Settings failing (#2187 ) Now skips ITestS3AEncryptionWithDefaultS3Settings.testEncryptionOverRename when server side encryption is not set to sse:kms Contributed by Mukund Thakur Change-Id: Ifd83d353e9c7c6f7e1195a2c2f138d85cf876bb1	2020-09-04 15:00:30 +01:00
Steve Loughran	38354006f8	HADOOP-17227. S3A Marker Tool tuning (#2254 ) Contributed by Steve Loughran.	2020-09-04 14:58:54 +01:00
Mehakmeet Singh	f6e1ed4f6b	HADOOP-17194. Adding Context class for AbfsClient in ABFS (#2216 ) Contributed by Mehakmeet Singh. Change-Id: I120c9a068d758d8e5d071c878a3b7fbeb95e4de6	2020-08-27 11:28:37 +01:00
Mukund Thakur	0840c0c1f3	HADOOP-17074. S3A Listing to be fully asynchronous. (#2207 ) Contributed by Mukund Thakur. Change-Id: I1b0574a0c9ebc0805f285dd5280a00e5add081f1	2020-08-25 11:30:42 +01:00
swamirishi	ba4f7fb332	HADOOP-17122: Preserving Directory Attributes in DistCp with Atomic Copy (#2133 ) Contributed by Swaminathan Balachandran Change-Id: I86f956dd4ab0b278d923fe7b70037e6b929a8aa1	2020-08-22 18:51:10 +01:00
Steve Loughran	49f8ae965e	HADOOP-13230. S3A to optionally retain directory markers. This adds an option to disable "empty directory" marker deletion, so avoid throttling and other scale problems. This feature is not backwards compatible. Consult the documentation and use with care. Contributed by Steve Loughran. Change-Id: I69a61e7584dc36e485d5e39ff25b1e3e559a1958	2020-08-15 20:19:49 +01:00
Mukund Thakur	571737f4ac	HADOOP-17192. ITestS3AHugeFilesSSECDiskBlock failing (#2221 ) Contributed by Mukund Thakur	2020-08-13 14:33:27 +01:00
Ayush Saxena	2943e6650f	HDFS-15514. Remove useless dfs.webhdfs.enabled. Contributed by Fei Hui.	2020-08-07 22:20:42 +05:30
Mukund Thakur	251d2d1fa5	HADOOP-17131. Refactor S3A Listing code for better isolation. (#2148 ) Contributed by Mukund Thakur. Change-Id: I79160b236a92fdd67565a4b4974f1862e600c210	2020-08-04 17:13:06 +01:00
Sneha Vijayarajan	18ca80331c	HADOOP-17132. ABFS: Fix Rename and Delete Idempotency check trigger - Contributed by Sneha Vijayarajan	2020-07-25 13:13:18 +00:00
ishaniahuja	f24e2ec487	HADOOP-17058. ABFS: Support for AppendBlob in Hadoop ABFS Driver - Contributed by Ishani Ahuja	2020-07-25 13:12:32 +00:00
Mehakmeet Singh	7c9b459786	HADOOP-16961. ABFS: Adding metrics to AbfsInputStream (#2076 ) Contributed by Mehakmeet Singh.	2020-07-25 13:12:09 +00:00
Mehakmeet Singh	bbd3278d09	HADOOP-17065. Add Network Counters to ABFS (#2056 ) Contributed by Mehakmeet Singh.	2020-07-25 13:11:34 +00:00
Karthik Amarnath	8b7e77443d	HDFS-15168: ABFS enhancement to translate AAD to Linux identities. (#1978 )	2020-07-25 13:10:39 +00:00
Sneha Vijayarajan	903935da0f	HADOOP-17053. ABFS: Fix Account-specific OAuth config setting parsing Contributed by Sneha Vijayarajan	2020-07-25 13:10:30 +00:00
Sneha Vijayarajan	869a68b81e	HADOOP-16852: Report read-ahead error back Contributed by Sneha Vijayarajan	2020-07-25 13:10:19 +00:00
Sneha Vijayarajan	27b20f9689	HADOOP-17054. ABFS: Fix test AbfsClient authentication instance Contributed by Sneha Vijayarajan	2020-07-25 13:09:26 +00:00
Sneha Vijayarajan	eed06b46eb	HADOOP-17015. ABFS: Handling Rename and Delete idempotency Contributed by Sneha Vijayarajan.	2020-07-25 13:08:01 +00:00
bilaharith	1ae72d2438	HADOOP-17092. ABFS: Making AzureADAuthenticator.getToken() throw HttpException - Contributed by Bilahari T H Change-Id: Id9576d9509faaf057bf419ccb1879ac0cef7a07b	2020-07-22 18:26:36 +01:00
Ayush Saxena	e3b8d4eb05	HADOOP-17100. Replace Guava Supplier with Java8+ Supplier in Hadoop. Contributed by Ahmed Hussein.	2020-07-22 18:21:14 +05:30
Steve Loughran	5aa9396a58	HADOOP-17107. hadoop-azure parallel tests not working on recent JDKs (#2118 ) Contributed by Steve Loughran. Change-Id: I972264aed36f384b7ae23e214326ef7870261cf5	2020-07-20 10:54:22 +01:00
bilaharith	e01852181a	HADOOP-16682. ABFS: Removing unnecessary toString() invocations - Contributed by Bilahari T H Change-Id: Id55495b44d81533d1d3654de2553c709f505f7eb	2020-07-20 10:53:59 +01:00
Mehakmeet Singh	0d88ed2794	HADOOP-17129. Validating storage keys in ABFS correctly (#2141 ) Contributed by Mehakmeet Singh Change-Id: I8016ee2f9ffbc86ea867f4a3d960b134e507d099	2020-07-16 18:11:52 +01:00
Mukund Thakur	8b601ad7e6	HADOOP-17022. Tune S3AFileSystem.listFiles() API. Contributed by Mukund Thakur. Change-Id: I17f5cfdcd25670ce3ddb62c13378c7e2dc06ba52	2020-07-14 15:28:27 +01:00
Anoop Sam John	cac2fc1f58	HADOOP-16998. WASB : NativeAzureFsOutputStream#close() throwing IllegalArgumentException (#2073 ) Contributed by Anoop Sam John.	2020-07-14 14:08:46 +01:00
jimmy-zuber-amzn	79fc58def3	HADOOP-17105. S3AFS - Do not attempt to resolve symlinks in globStatus (#2113 ) Contributed by Jimmy Zuber. Change-Id: I2f247c2d2ab4f38214073e55f5cfbaa15aeaeb11	2020-07-13 19:09:50 +01:00
Steve Loughran	a51d72f0c6	HDFS-13934. Multipart uploaders to be created through FileSystem/FileContext. Contributed by Steve Loughran. Change-Id: Iebd34140c1a0aa71f44a3f4d0fee85f6bdf123a3	2020-07-13 13:32:04 +01:00
Sebastian Nagel	f9619b0b97	HADOOP-17117 Fix typos in hadoop-aws documentation (#2127 ) (cherry picked from commit `5b1ed2113b`)	2020-07-09 00:04:46 +09:00
bilaharith	19fb204011	HADOOP-17086. ABFS: Making the ListStatus response ignore unknown properties. (#2101 ) Contributed by Bilahari T H. Change-Id: I82e4683fba8481aef2abab7a6a99e5752f6fffa9	2020-07-03 19:02:21 +01:00
Steve Loughran	7de1ac0547	HADOOP-16798. S3A Committer thread pool shutdown problems. (#1963 ) Contributed by Steve Loughran. Fixes a condition which can cause job commit to fail if a task was aborted < 60s before the job commit commenced: the task abort will shut down the thread pool with a hard exit after 60s; the job commit POST requests would be scheduled through the same pool, so be interrupted and fail. At present the access is synchronized, but presumably the executor shutdown code is calling wait() and releasing locks. Task abort is triggered from the AM when task attempts succeed but there are still active speculative task attempts running. Thus it only surfaces when speculation is enabled and the final tasks are speculating, which, given they are the stragglers, is not unheard of. Note: this problem has never been seen in production; it has surfaced in the hadoop-aws tests on a heavily overloaded desktop Change-Id: I3b433356d01fcc50d88b4353dbca018484984bc8	2020-06-30 10:52:56 +01:00
Thomas Marquardt	ee192c4826	HADOOP-17089: WASB: Update azure-storage-java SDK Contributed by Thomas Marquardt DETAILS: WASB depends on the Azure Storage Java SDK. There is a concurrency bug in the Azure Storage Java SDK that can cause the results of a list blobs operation to appear empty. This causes the Filesystem listStatus and similar APIs to return empty results. This has been seen in Spark work loads when jobs use more than one executor core. See Azure/azure-storage-java#546 for details on the bug in the Azure Storage SDK. TESTS: A new test was added to validate the fix. All tests are passing: wasb: mvn -T 1C -Dparallel-tests=wasb -Dscale -DtestsThreadCount=8 clean verify Tests run: 248, Failures: 0, Errors: 0, Skipped: 11 Tests run: 651, Failures: 0, Errors: 0, Skipped: 65 abfs: mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=8 clean verify Tests run: 64, Failures: 0, Errors: 0, Skipped: 0 Tests run: 437, Failures: 0, Errors: 0, Skipped: 33 Tests run: 206, Failures: 0, Errors: 0, Skipped: 24	2020-06-25 05:43:32 +00:00
Thomas Marquardt	63d236c019	HADOOP-17076: ABFS: Delegation SAS Generator Updates Contributed by Thomas Marquardt. DETAILS: 1) The authentication version in the service has been updated from Dec19 to Feb20, so need to update the client. 2) Add support and test cases for getXattr and setXAttr. 3) Update DelegationSASGenerator and related to use Duration instead of int for time periods. 4) Cleanup DelegationSASGenerator switch/case statement that maps operations to permissions. 5) Cleanup SASGenerator classes to use String.equals instead of ==. TESTS: Added tests for getXAttr and setXAttr. All tests are passing against my account in eastus2euap: $mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=8 clean verify Tests run: 76, Failures: 0, Errors: 0, Skipped: 0 Tests run: 441, Failures: 0, Errors: 0, Skipped: 33 Tests run: 206, Failures: 0, Errors: 0, Skipped: 24	2020-06-19 19:19:31 +00:00
bilaharith	d639c11986	HADOOP-17004. Fixing a formatting issue Contributed by Bilahari T H.	2020-06-19 19:11:06 +00:00
bilaharith	11307f3be9	HADOOP-17004. ABFS: Improve the ABFS driver documentation Contributed by Bilahari T H.	2020-06-19 19:10:22 +00:00

1506 Commits