hadoop

Author	SHA1	Message	Date
Mehakmeet Singh	9e53ed3602	HADOOP-18528. Disable abfs prefetching by default (#5134 ) Disables block prefetching on ABFS InputStreams, by setting fs.azure.enable.readahead to false in core-default.xml and the matching java constant. This prevents HADOOP-18521. ABFS ReadBufferManager buffer sharing across concurrent HTTP requests. Once a fix for that is committed, this change can be reverted. Contributed by Mehakmeet Singh.	2022-11-15 14:29:33 +00:00
Steve Loughran	b1ea32f91c	HADOOP-18517. ABFS: Add fs.azure.enable.readahead option to disable readahead (#5103 ) * HADOOP-18517. ABFS: Add fs.azure.enable.readahead option to disable readahead Adds new config option to turn off readahead * also allows it to be passed in through openFile(), * extends ITestAbfsReadWriteAndSeek to use the option, including one replicated test...that shows that turning it off is slower. Important: this does not address the critical data corruption issue HADOOP-18521. ABFS ReadBufferManager buffer sharing across concurrent HTTP requests What it does do is provide a way to completely bypass the ReadBufferManager. To mitigate the problem, either fs.azure.enable.readahead needs to be set to false, or set "fs.azure.readaheadqueue.depth" to 0 -this still goes near the (broken) ReadBufferManager code, but doesn't trigger the bug. For safe reading of files through the ABFS connector, readahead MUST be disabled or the followup fix to HADOOP-18521 applied Contributed by Steve Loughran	2022-11-08 13:41:31 +00:00
Sumangala Patki	2e4c5ca88f	HADOOP-17873. ABFS: Fix transient failures in ITestAbfsStreamStatistics and ITestAbfsRestOperationException (#3699 ) Successor for the reverted PR #3341, using the hadoop @VisibleForTesting attribute Contributed by Sumangala Patki	2022-09-06 11:34:55 +01:00
sreeb-msft	5f3bc4340e	HADOOP-18408. ABFS: ITestAbfsManifestCommitProtocol fails on nonHNS configuration (#4758 ) ITestAbfsManifestCommitProtocol to set requireRenameResilience to false for nonHNS configuration Contributed by Sree Bhattacharyya	2022-09-02 12:34:43 +01:00
Mehakmeet Singh	90b1e737d3	HADOOP-18242. ABFS Rename Failure when tracking metadata is in an incomplete state (#4517 ) ABFS rename fails intermittently when the Storage-blob tracking metadata is in an incomplete state. This surfaces as the error code 404 and an error message of "RenameDestinationParentPathNotFound" To mitigate this issue, when a request fails with this response, the ABFS client issues a HEAD call on the source file and then retries the rename operation again ABFS filesystem statistics track when this occurs with new counters rename_recovery metadata_incomplete_rename_failures rename_path_attempts This is a very rare occurrence and appears to be triggered under certain heavy load conditions, just as with HADOOP-18163. Contributed by Mehakmeet Singh.	2022-07-02 01:49:14 +05:30
Steve Loughran	cc204c9611	HADOOP-16202. Enhanced openFile(): hadoop-azure changes. (#2584/4) Stops the abfs connector warning if openFile().withFileStatus() is invoked with a FileStatus that is not an abfs VersionedFileStatus. Contributed by Steve Loughran. Change-Id: I85076b365eb30aaef2ed35139fa8714efd4d048e	2022-04-27 19:24:33 +01:00
sumangala-patki	77eea7a11b	HADOOP-17682. ABFS: Support FileStatus input to OpenFileWithOptions() via OpenFileParameters (#2975 ) Change-Id: I039a0c3cb1c9b603f7dd1be0df03f795525d92bc	2022-04-27 19:22:49 +01:00
Steve Loughran	3238bdab89	HADOOP-18163. hadoop-azure support for the Manifest Committer of MAPREDUCE-7341 Follow-on patch to MAPREDUCE-7341, adding ABFS support and tests * resilient rename * tests for job commit through the manifest committer. contains - HADOOP-17976. ABFS etag extraction inconsistent between LIST and HEAD calls - HADOOP-16204. ABFS tests to include terasort Contributed by Steve Loughran. Change-Id: I0a7d4043bdf19bcb00c033fc389730109b93b77f	2022-03-17 11:47:15 +00:00
Steve Loughran	36a50ba3e0	HADOOP-18075. ABFS: Fix failure caused by listFiles() in ITestAbfsRestOperationException (#4040 ) Contributed by Sumangala Patki Change-Id: I245c08dab050d59b90ac6fdcb4c03153db77be0b	2022-03-01 13:48:39 +00:00
sumangala-patki	0ed0375413	HADOOP-17862. ABFS: Fix unchecked cast compiler warning for AbfsListStatusRemoteIterator (#3331 ) closes #3331 Contributed by Sumangala Patki Change-Id: I6cca91c8bcc34052c5233035f14a576f23086067	2022-03-01 13:48:39 +00:00
sumangala-patki	5e109705ef	HADOOP-17765. ABFS: Use Unique File Paths in Tests. (#3153 ) Contributed by Sumangala Patki Change-Id: Ic8f34bf578069504f7a811a7729982b9c9f49729	2022-03-01 12:29:03 +00:00
Sumangala Patki	a1319e2404	HADOOP-18071. ABFS: Set driver global timeout for ITestAzureBlobFileSystemBasics (#3866 ) Contributed by Sumangala Patki Change-Id: I05f0cd1f0bd277b90f06a71345c46bfde48d7e7e	2022-02-23 21:30:39 +00:00
Anmol Asrani	9b221b9599	HADOOP-18084. ABFS: Add testfilePath while verifying test contents are read correctly (#3903 ) Contributed by: Anmol Asrani Change-Id: I6e71bf349a74032f453398c7ae66f9c3305be190	2022-01-19 10:18:05 +00:00
Steve Loughran	67eaf5aa9f	HADOOP-17979. Add Interface EtagSource to allow FileStatus subclasses to provide etags (#3633 ) Contributed by Steve Loughran Change-Id: I596205d788f623114c12962941445432e2036c34	2021-11-29 16:20:55 +00:00
Steve Loughran	e1267608ec	HADOOP-18002. ABFS rename idempotency broken -remove recovery (#3641 ) Cut modtime-based rename recovery as object modification time is not updated during rename operation. Applications will have to use etag API of HADOOP-17979 and implement it themselves. Why not do the HEAD and etag recovery in ABFS client? Cuts the IO capacity in half so kills job commit performance. The manifest committer of MAPREDUCE-7341 will do this recovery and act as the reference implementation of the algorithm. Contributed by: Steve Loughran Change-Id: I810054c9fd05041dac552f13d31fb15d7524721b	2021-11-17 11:53:34 +00:00
Steve Loughran	7b632dd22b	Revert "HADOOP-17873. ABFS: Fix transient failures in ITestAbfsStreamStatistics and ITestAbfsRestOperationException (#3341 )" This reverts commit `0379aebafe`.	2021-11-05 14:22:07 +00:00
sumangala-patki	0379aebafe	HADOOP-17873. ABFS: Fix transient failures in ITestAbfsStreamStatistics and ITestAbfsRestOperationException (#3341 ) Addresses transient failures in the following test classes: * ITestAbfsStreamStatistics: Uses a filesystem level static instance to record read/write statistics, which also tracks these operations in other tests running in parallel. Marked for sequential-only run to avoid transient failure * ITestAbfsRestOperationException: The use of a static member to track retry count causes transient failures when two tests of this class happen to run together. Switch to non-static variable for assertions on retry count closes #3341 Contributed by Sumangala Patki Change-Id: Ied4dec35c81e94efe5f999acae4bb8fde278202e	2021-11-04 15:57:42 +00:00
Anoop Sam John	913d06ad4d	HADOOP-17770 WASB : Support disabling buffered reads in positional reads (#3233 )	2021-10-22 11:45:42 +05:30
Mehakmeet Singh	8e5620cd9e	HADOOP-17195. ABFS: OutOfMemory error while uploading huge files (#3446 ) Addresses the problem of processes running out of memory when there are many ABFS output streams queuing data to upload, especially when the network upload bandwidth is less than the rate data is generated. ABFS Output streams now buffer their blocks of data to "disk", "bytebuffer" or "array", as set in "fs.azure.data.blocks.buffer" When buffering via disk, the location for temporary storage is set in "fs.azure.buffer.dir" For safe scaling: use "disk" (default); for performance, when confident that upload bandwidth will never be a bottleneck, experiment with the memory options. The number of blocks a single stream can have queued for uploading is set in "fs.azure.block.upload.active.blocks". The default value is 20. Contributed by Mehakmeet Singh.	2021-09-22 11:19:16 +01:00
sumangala-patki	dd30db78e7	HADOOP-17290. ABFS: Add Identifiers to Client Request Header (#2520 ) Contributed by Sumangala Patki. (cherry picked from commit `35570e414a`)	2021-09-21 16:45:51 +01:00
sumangala-patki	1cb9e747eb	HADOOP-17618. ABFS: Partially obfuscate SAS object IDs in Logs (#2845 ) Contributed by Sumangala Patki (cherry picked from commit `3450522c2f`)	2021-09-09 14:04:12 +01:00
Mukund Thakur	3b1c594355	HADOOP-17156. ABFS: Release the byte buffers held by input streams in close() (#3285 ) Contributed By: Mukund Thakur	2021-09-07 15:29:22 +05:30
Steve Loughran	26514b6534	HADOOP-17628. Distcp contract test is really slow with ABFS and S3A; timing out. (#3240 ) This patch cuts down the size of directory trees used for distcp contract tests against object stores, so making them much faster against distant/slow stores. On abfs, the test only runs with -Dscale (as was the case for s3a already), and has the larger scale test timeout. After every test case, the FileSystem IOStatistics are logged, to provide information about what IO is taking place and what its performance is. There are some test cases which upload files of 1+ MiB; you can increase the size of the upload in the option "scale.test.distcp.file.size.kb" Set it to zero and the large file tests are skipped. Contributed by Steve Loughran.	2021-08-02 12:58:37 +01:00
Brian Loss	37e0828e76	HADOOP-17811: ABFS ExponentialRetryPolicy doesn't pick up configuration values (#3221 ) Contributed by Brian Loss. Change-Id: I5f24196d1d02de91336c3679abaf8d55cfaed746	2021-08-02 11:37:33 +01:00
snehavarma	11825d30e8	HADOOP-17714 ABFS: testBlobBackCompatibility, testRandomRead & WasbAbfsCompatibility tests fail when triggered with default configs (#3035 ) (#3126 ) (cherry picked from commit `35e4c31fff`)	2021-07-12 11:53:46 +05:30
snehavarma	ab3809cf8d	HADOOP-17715 ABFS: Append blob tests with non HNS accounts fail (#3028 ) (#3125 ) (cherry picked from commit `4c039fafeb`)	2021-07-12 11:51:41 +05:30
sumangala-patki	aa6a9cac72	HADOOP-17596. ABFS: Change default Readahead Queue Depth from num(processors) to const (#3106 ) * HADOOP-17596. ABFS: Change default Readahead Queue Depth from num(processors) to const (#2795) . Contributed by Sumangala Patki. (cherry picked from commit `76d92eb2a2`)	2021-07-10 15:09:59 +05:30
Mukund Thakur	e8f9af6f2a	HADOOP-17250 Lot of short reads can be merged with readahead. (#3110 ) Introducing fs.azure.readahead.range parameter which can be set by the user. Data will be populated in buffer for random reads as well which leads to fewer remote calls. This patch also changes the seek implementation to perform a lazy seek. The actual seek is done when a read is initiated and data is not present in the buffer else data is returned from the buffer thus reducing the number of remote storage calls. Contributed By: Mukund Thakur Change-Id: Ib920eedd0087caa150afa4d4c23e89df56b29e83	2021-07-05 11:23:32 +01:00
Viraj Jasani	8f0ba9ee1b	HADOOP-17725. Improve error message for token providers in ABFS (#3041 ) Contributed by Viraj Jasani.	2021-06-08 22:05:01 +01:00
Mehakmeet Singh	a786847b8f	HADOOP-17670. S3AFS and ABFS to log IOStats at DEBUG mode or optionally at INFO level in close() (#2963 ) When the S3A and ABFS filesystems are closed, their IOStatistics are logged at debug in the log: org.apache.hadoop.fs.statistics.IOStatisticsLogging Set `fs.iostatistics.logging.level` to `info` for the statistics to be logged at info. (also: `warn` or `error` for even higher log levels). Contributed by: Mehakmeet Singh Change-Id: I56d44ad89fc1c0dd4baf701681834e7fd96c544f	2021-05-24 13:04:20 +01:00
sumangala-patki	b20bc668d5	HADOOP-17548. ABFS: Toggle Store Mkdirs request overwrite parameter (#2729 ) (#2781 ) Contributed by Sumangala Patki. (cherry picked from commit `fe633d4739`)	2021-05-10 11:50:01 +05:30
bilaharith	6649e5888b	HADOOP-17536. ABFS: Supporting customer provided encryption key (#2707 ) Contributed by bilahari t h Change-Id: I86216e755b81e9d14f5e87844d9fd58e8940560c	2021-04-27 13:16:33 +01:00
Mehakmeet Singh	389d3034c6	HADOOP-17471. ABFS to collect IOStatistics (#2731 ) (#2950 ) The ABFS Filesystem and its input and output streams now implement the IOStatisticSource interface and provide IOStatistics on their interactions with Azure Storage. This includes the min/max/mean durations of all REST API calls. Contributed by Mehakmeet Singh <mehakmeet.singh@cloudera.com>	2021-04-24 17:59:26 +01:00
Steve Loughran	77fddcfcb1	HADOOP-17535. ABFS: ITestAzureBlobFileSystemCheckAccess test failure if no oauth key. (#2920 ) Contributed by Steve Loughran. Change-Id: I165f5ed3a8486404403827b5c0338cf7f80c2bb1	2021-04-24 17:24:15 +01:00
billierinaldi	8170a7bb60	HADOOP-16948. Support infinite lease dirs (#1925 ). Contributed by Billie Rinaldi. (cherry picked from commit `c1fde4fe94`)	2021-04-20 14:36:54 -04:00
Steve Loughran	f30a0debae	HADOOP-17641. ITestWasbUriAndConfiguration failing. (#2937 ) This moves the mock account name --which is required to never exist-- from "mockAccount" to an account name containing a static UUID. Contributed by Steve Loughran.	2021-04-20 15:37:18 +01:00
sumangala-patki	8daa26d2e5	HADOOP-17576. ABFS: Disable throttling update for auth failures (#2761 ) (#2885 ) Contributed by Sumangala Patki (cherry picked from commit `6f640abbaf`)	2021-04-16 10:47:11 +05:30
sumangala-patki	cdaa64458d	HADOOP-17191. ABFS: Run the tests with various combinations of configurations and publish a consolidated results (#2597 ) Contributed by Bilahari T H and Sumangala Patki	2021-03-10 18:25:41 +00:00
sumangala-patki	7642ddcd6f	HADOOP-17537. ABFS: Correct assertion reversed in HADOOP-13327 Contributed by Sumangala Patki.	2021-02-22 11:49:25 +00:00
Anoop Sam John	5857b781a3	HADOOP-17038 Support disabling buffered reads in ABFS positional reads. (#2646 ) - Contributed by @anoopsjohn Change-Id: Ibd11cc9d7aed0c2cc831a01e07d0a1595f7026fb	2021-02-22 11:46:35 +00:00
Steve Loughran	98e4d516ea	HADOOP-13327 Output Stream Specification. (#2587 ) This defines what output streams and especially those which implement Syncable are meant to do, and documents where implementations (HDFS; S3) don't. With tests. The file:// FileSystem now supports Syncable if an application calls FileSystem.setWriteChecksum(false) before creating a file -checksumming and Syncable.hsync() are incompatible. Contributed by Steve Loughran. Change-Id: I892d768de6268f4dd6f175b3fe3b7e5bcaa91194	2021-02-10 10:31:22 +00:00
bilaharith	35c93ef5f3	HADOOP-17475. ABFS : add high performance listStatusIterator (#2548 ) The ABFS connector now implements listStatusIterator() with asynchronous prefetching of the next page(s) of results. For listing large directories this can provide tangible speedups. If for any reason this needs to be disabled, set fs.azure.enable.abfslistiterator to false. Contributed by Bilahari T H. Change-Id: Ic9a52b80df1d0ffed4c81beae92c136e2a12698c	2021-02-04 13:37:36 +00:00
Steve Loughran	99337a4dd0	HADOOP-15710. ABFS checkException to map 403 to AccessDeniedException. (#2648 ) When 403 is returned from an ABFS HTTP call, an AccessDeniedException is raised. The exception text is unchanged, for any application string matching on the getMessage() contents. Contributed by Steve Loughran. Change-Id: I519d50ccd657968fd8ee72d132518099de901e15	2021-02-02 18:17:38 +00:00
Mehakmeet Singh	d20b2deac3	HADOOP-17272. ABFS Streams to support IOStatistics API (#2604 ) Contributed by Mehakmeet Singh. Change-Id: I3445dec84b9b9e43bb1e41f709944ea05416bd74	2021-01-22 14:21:31 +00:00
Sneha Vijayarajan	4865589bb4	HADOOP-17404. ABFS: Small write - Merge append and flush - Contributed by Sneha Vijayarajan (cherry picked from commit `b612c310c2`)	2021-01-22 10:48:04 +00:00
bilaharith	cb6729224e	HADOOP-17347. ABFS: Read optimizations - Contributed by Bilahari T H (cherry picked from commit `1448add08f`)	2021-01-22 10:48:04 +00:00
Sneha Vijayarajan	f3a0ca66c2	HADOOP-17407. ABFS: Fix NPE on delete idempotency flow - Contributed by Sneha Vijayarajan (cherry picked from commit `5ca1ea89b3`)	2021-01-22 10:48:04 +00:00
Sumangala	5f312a0d85	HADOOP-17422: ABFS: Set default ListMaxResults to max server limit (#2535 ) Contributed by Sumangala Patki TEST RESULTS: namespace.enabled=true auth.type=SharedKey ------------------- $mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=8 clean verify Tests run: 90, Failures: 0, Errors: 0, Skipped: 0 Tests run: 462, Failures: 0, Errors: 0, Skipped: 24 Tests run: 208, Failures: 0, Errors: 0, Skipped: 24 namespace.enabled=true auth.type=OAuth ------------------- $mvn -T 1C -Dparallel-tests=abfs -Dscale -DtestsThreadCount=8 clean verify Tests run: 90, Failures: 0, Errors: 0, Skipped: 0 Tests run: 462, Failures: 0, Errors: 0, Skipped: 70 Tests run: 208, Failures: 0, Errors: 0, Skipped: 141 (cherry picked from commit `a35fc3871b`)	2021-01-22 10:48:04 +00:00
Sneha Vijayarajan	a44890eb63	HADOOP-17296. ABFS: Force reads to be always of buffer size. Contributed by Sneha Vijayarajan. (cherry picked from commit `142941b96e`)	2021-01-22 10:48:04 +00:00
Ayush Saxena	8378ab9f92	HADOOP-17288. Use shaded guava from thirdparty. Contributed by Ayush Saxena. #2505	2020-12-10 05:50:55 +05:30
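
The readahead mitigation described in the HADOOP-18517 and HADOOP-18528 entries above could be applied as a site configuration override. The fragment below is a minimal sketch, not taken from either patch: the two property names and their values come from the commit messages, while placing them in core-site.xml is an assumption about where a deployment keeps its overrides. Per the HADOOP-18517 message, either setting alone is described as sufficient; both are shown here only for illustration.

```xml
<!-- Sketch of a core-site.xml override (file placement is an assumption). -->
<configuration>
  <!-- Bypasses the ReadBufferManager entirely, per HADOOP-18517. -->
  <property>
    <name>fs.azure.enable.readahead</name>
    <value>false</value>
  </property>
  <!-- Alternative named in the same commit message: a queue depth of 0
       still goes near the ReadBufferManager code but avoids the bug. -->
  <property>
    <name>fs.azure.readaheadqueue.depth</name>
    <value>0</value>
  </property>
</configuration>
```

Releases that include HADOOP-18528 already default fs.azure.enable.readahead to false, so an explicit override mainly matters for builds that predate that change.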

262 Commits