hadoop

Author	SHA1	Message	Date
sreeb-msft	f324efd247	HADOOP-18012. ABFS: Enable config controlled ETag check for Rename idempotency (#5488 ) To support recovery of network failures during rename, the abfs client fetches the etag of the source file, and when recovering from a failure, uses this tag to determine whether the rename succeeded before the failure happened. * This works for files, but not directories * It adds the overhead of a HEAD request before each rename. * The option can be disabled by setting "fs.azure.enable.rename.resilience" to false Contributed by Sree Bhattacharyya	2023-04-05 15:07:39 +01:00
Pranav Saxena	054afa1180	HADOOP-18647. x-ms-client-request-id to identify the retry of an API. (#5437 ) The x-ms-client-request-id now includes a field to indicate a call is a retry of a previous operation Contributed by Pranav Saxena	2023-03-30 14:26:12 +01:00
Anmol Asrani	6306f5b2bc	HADOOP-18146: ABFS: Added changes for expect hundred continue header #4039 This change lets the client react pre-emptively to server load without getting to 503 and the exponential backoff which follows. This stops performance suffering so much as capacity limits are approached for an account. Contributed by Anmol Asrani	2023-03-28 16:32:01 +01:00
Pranav Saxena	2b156c2b32	HADOOP-18606. ABFS: Add reason in x-ms-client-request-id on a retried API call. (#5299 ) Contributed by Pranav Saxena	2023-03-28 12:00:57 +01:00
Steve Loughran	b75ced1e5d	HADOOP-17836. Improve logging on ABFS error reporting (#3281 ) Contributed by Steve Loughran.	2023-03-08 15:31:16 +00:00
Steve Loughran	c59444b160	HADOOP-18577. Followup: javadoc fix (#5232 ) Fixes a javadoc error which came with HADOOP-18577. ABFS: Add probes of readahead fix (#5205) Part of the HADOOP-18521 ABFS readahead fix; MUST be included. Contributed by Steve Loughran	2022-12-18 12:20:41 +00:00
Steve Loughran	daa33aafff	HADOOP-18577. ABFS: Add probes of readahead fix (#5205 ) Followup patch to HADOOP-18456 as part of HADOOP-18521, ABFS ReadBufferManager buffer sharing across concurrent HTTP requests. Adds probes of the readahead fix to aid in checking the safety of the hadoop ABFS client across different releases. * ReadBufferManager constructor logs the fact it is safe at TRACE * AbfsInputStream declares it is fixed in toString() by including "fs.azure.capability.readahead.safe" in the result. The ABFS FileSystem hasPathCapability("fs.azure.capability.readahead.safe") probe returns true to indicate the client's readahead manager has been fixed to be safe when prefetching. All Hadoop releases for which this probe returns false and for which the probe "fs.capability.etags.available" returns true are at risk of returning invalid data when reading ADLS Gen2/Azure storage data. Contributed by Steve Loughran.	2022-12-15 17:11:22 +00:00
Pranav Saxena	50a0f33cc9	HADOOP-18546. ABFS. disable purging list of in progress reads in abfs stream close() (#5176 ) This addresses HADOOP-18521, "ABFS ReadBufferManager buffer sharing across concurrent HTTP requests" by not trying to cancel in progress reads. It supersedes HADOOP-18528, which disables the prefetching. If that patch is applied after this one, prefetching will be disabled. As well as changing the default value in the code, core-default.xml is updated to set fs.azure.enable.readahead = true. As a result, if Configuration.get("fs.azure.enable.readahead") returns a non-null value, then it can be inferred that it was set either in core-default.xml (the fix is present) or in core-site.xml (someone asked for it). Note: this commit contains the followup commit: That is needed to avoid race conditions in the test. Contributed by Pranav Saxena.	2022-12-09 13:49:14 +00:00
Anmol Asrani	1cc8cb68f2	HADOOP-18457. ABFS: Support account level throttling (#5034 ) This allows abfs request throttling to be shared across all abfs connections talking to containers belonging to the same abfs storage account -as that is the level at which IO throttling is applied. The option is enabled/disabled in the configuration option "fs.azure.account.throttling.enabled"; The default is "true" Contributed by Anmol Asrani	2022-11-30 13:14:11 +00:00
sreeb-msft	00249619a0	HADOOP-18498. ABFS: Remove unwanted ? prefix from SAS Tokens (#5136 ) This commit parses SAS Tokens and removes the unwanted prefix of '?' from them, if present. At present, SAS Tokens are provided to the driver through customer implementations of the SASTokenProvider interface. The SAS token providers should not assume that the token will be the first query parameter in the URIs that communicate with the backend. However, it was observed that certain public interfaces provided by Storage to generate SAS can include the '?' as the first character of the SAS Token, which would ideally be the case when it is the first query parameter. Thus, tokens that contain this prefix will lead to an error in the driver due to a clash of query parameters. To avoid failures for use of such SAS tokens, after receiving the SAS Token from the provider, the code checks whether a '?' prefix is present. If so, it is removed before further usage of the token. This way, users do not have to manually remove the prefix before passing it on as a configuration. Contributed by Sree Bhattacharyya	2022-11-28 11:40:06 +00:00
Mehakmeet Singh	9e53ed3602	HADOOP-18528. Disable abfs prefetching by default (#5134 ) Disables block prefetching on ABFS InputStreams, by setting fs.azure.enable.readahead to false in core-default.xml and the matching java constant. This prevents HADOOP-18521. ABFS ReadBufferManager buffer sharing across concurrent HTTP requests. Once a fix for that is committed, this change can be reverted. Contributed by Mehakmeet Singh.	2022-11-15 14:29:33 +00:00
Steve Loughran	b1ea32f91c	HADOOP-18517. ABFS: Add fs.azure.enable.readahead option to disable readahead (#5103 ) * HADOOP-18517. ABFS: Add fs.azure.enable.readahead option to disable readahead Adds new config option to turn off readahead * also allows it to be passed in through openFile(), * extends ITestAbfsReadWriteAndSeek to use the option, including one replicated test...that shows that turning it off is slower. Important: this does not address the critical data corruption issue HADOOP-18521. ABFS ReadBufferManager buffer sharing across concurrent HTTP requests What it does do is provide a way to completely bypass the ReadBufferManager. To mitigate the problem, either fs.azure.enable.readahead needs to be set to false, or set "fs.azure.readaheadqueue.depth" to 0 -this still goes near the (broken) ReadBufferManager code, but doesn't trigger the bug. For safe reading of files through the ABFS connector, readahead MUST be disabled or the followup fix to HADOOP-18521 applied Contributed by Steve Loughran	2022-11-08 13:41:31 +00:00
PJ Fanning	ea851c5e4a	HADOOP-15983. Use jersey-json that is built to use jackson2 ((#3988 ) Moves from com.sun.jersey 1.19 to the artifact com.github.pjfanning:jersey-json:1.20 This allows jackson 1 to be removed from the classpath. Contains * HADOOP-16908. Prune Jackson 1 from the codebase and restrict its usage for future * HADOOP-18219. Fix shaded client test failure These are needed for the HADOOP-15983 changes to build. Contributed by PJ Fanning.	2022-10-20 17:37:56 +01:00
Steve Loughran	7a18ceb269	HADOOP-18476. Abfs and S3A FileContext bindings to close wrapped filesystems in finalizer (#4966 ) This is to try and close the underlying filesystems when the FileContext APIs are used. Without this, threads may be leaked Contributed by Steve Loughran	2022-10-18 15:28:55 +01:00
Sumangala Patki	2e4c5ca88f	HADOOP-17873. ABFS: Fix transient failures in ITestAbfsStreamStatistics and ITestAbfsRestOperationException (#3699 ) Successor for the reverted PR #3341, using the hadoop @VisibleForTesting attribute Contributed by Sumangala Patki	2022-09-06 11:34:55 +01:00
sreeb-msft	5f3bc4340e	HADOOP-18408. ABFS: ITestAbfsManifestCommitProtocol fails on nonHNS configuration (#4758 ) ITestAbfsManifestCommitProtocol to set requireRenameResilience to false for nonHNS configuration Contributed by Sree Bhattacharyya	2022-09-02 12:34:43 +01:00
Mehakmeet Singh	90b1e737d3	HADOOP-18242. ABFS Rename Failure when tracking metadata is in an incomplete state (#4517 ) ABFS rename fails intermittently when the Storage-blob tracking metadata is in an incomplete state. This surfaces as the error code 404 and an error message of "RenameDestinationParentPathNotFound" To mitigate this issue, when a request fails with this response, the ABFS client issues a HEAD call on the source file and then retries the rename operation. ABFS filesystem statistics track when this occurs with new counters rename_recovery metadata_incomplete_rename_failures rename_path_attempts This is a very rare occurrence and appears to be triggered under certain heavy load conditions, just as with HADOOP-18163. Contributed by Mehakmeet Singh.	2022-07-02 01:49:14 +05:30
Steve Loughran	cc204c9611	HADOOP-16202. Enhanced openFile(): hadoop-azure changes. (#2584/4) Stops the abfs connector warning if openFile().withFileStatus() is invoked with a FileStatus that is not an abfs VersionedFileStatus. Contributed by Steve Loughran. Change-Id: I85076b365eb30aaef2ed35139fa8714efd4d048e	2022-04-27 19:24:33 +01:00
sumangala-patki	77eea7a11b	HADOOP-17682. ABFS: Support FileStatus input to OpenFileWithOptions() via OpenFileParameters (#2975 ) Change-Id: I039a0c3cb1c9b603f7dd1be0df03f795525d92bc	2022-04-27 19:22:49 +01:00
Steve Loughran	3238bdab89	HADOOP-18163. hadoop-azure support for the Manifest Committer of MAPREDUCE-7341 Follow-on patch to MAPREDUCE-7341, adding ABFS support and tests * resilient rename * tests for job commit through the manifest committer. contains - HADOOP-17976. ABFS etag extraction inconsistent between LIST and HEAD calls - HADOOP-16204. ABFS tests to include terasort Contributed by Steve Loughran. Change-Id: I0a7d4043bdf19bcb00c033fc389730109b93b77f	2022-03-17 11:47:15 +00:00
Steve Loughran	36a50ba3e0	HADOOP-18075. ABFS: Fix failure caused by listFiles() in ITestAbfsRestOperationException (#4040 ) Contributed by Sumangala Patki Change-Id: I245c08dab050d59b90ac6fdcb4c03153db77be0b	2022-03-01 13:48:39 +00:00
sumangala-patki	0ed0375413	HADOOP-17862. ABFS: Fix unchecked cast compiler warning for AbfsListStatusRemoteIterator (#3331 ) closes #3331 Contributed by Sumangala Patki Change-Id: I6cca91c8bcc34052c5233035f14a576f23086067	2022-03-01 13:48:39 +00:00
sumangala-patki	5e109705ef	HADOOP-17765. ABFS: Use Unique File Paths in Tests. (#3153 ) Contributed by Sumangala Patki Change-Id: Ic8f34bf578069504f7a811a7729982b9c9f49729	2022-03-01 12:29:03 +00:00
Sumangala Patki	a1319e2404	HADOOP-18071. ABFS: Set driver global timeout for ITestAzureBlobFileSystemBasics (#3866 ) Contributed by Sumangala Patki Change-Id: I05f0cd1f0bd277b90f06a71345c46bfde48d7e7e	2022-02-23 21:30:39 +00:00
Anmol Asrani	9b221b9599	HADOOP-18084. ABFS: Add testfilePath while verifying test contents are read correctly (#3903 ) Contributed by: Anmol Asrani Change-Id: I6e71bf349a74032f453398c7ae66f9c3305be190	2022-01-19 10:18:05 +00:00
Steve Loughran	8ccc586af6	HADOOP-17409. Remove s3guard from S3A module (#3534 ) Completely removes S3Guard support from the S3A codebase. If the connector is configured to use any metastore other than the null and local stores (i.e. DynamoDB is selected) the s3a client will raise an exception and refuse to initialize. This is to ensure that there is no mix of S3Guard enabled and disabled deployments with the same configuration but different hadoop releases -it must be turned off completely. The "hadoop s3guard" command has been retained -but the supported subcommands have been reduced to those which are not purely S3Guard related: "bucket-info" and "uploads". This is a major change in terms of the number of files changed; before cherry picking subsequent s3a patches into older releases, this patch will probably need backporting first. Goodbye S3Guard, your work is done. Time to die. Contributed by Steve Loughran.	2022-01-18 18:04:48 +00:00
Anoop Sam John	9a1c8d2f41	HADOOP-17643 WASB : Make metadata checks case insensitive (#3103 )	2021-12-10 10:44:31 +05:30
Steve Loughran	67eaf5aa9f	HADOOP-17979. Add Interface EtagSource to allow FileStatus subclasses to provide etags (#3633 ) Contributed by Steve Loughran Change-Id: I596205d788f623114c12962941445432e2036c34	2021-11-29 16:20:55 +00:00
Steve Loughran	e1267608ec	HADOOP-18002. ABFS rename idempotency broken -remove recovery (#3641 ) Cut modtime-based rename recovery as object modification time is not updated during rename operation. Applications will have to use etag API of HADOOP-17979 and implement it themselves. Why not do the HEAD and etag recovery in ABFS client? Cuts the IO capacity in half so kills job commit performance. The manifest committer of MAPREDUCE-7341 will do this recovery and act as the reference implementation of the algorithm. Contributed by: Steve Loughran Change-Id: I810054c9fd05041dac552f13d31fb15d7524721b	2021-11-17 11:53:34 +00:00
Steve Loughran	7b632dd22b	Revert "HADOOP-17873. ABFS: Fix transient failures in ITestAbfsStreamStatistics and ITestAbfsRestOperationException (#3341 )" This reverts commit `0379aebafe`.	2021-11-05 14:22:07 +00:00
sumangala-patki	689dd7bf17	HADOOP-17863. ABFS: Fix compiler deprecation warning in TextFileBasedIdentityHandler (#3332 ) Closes #3332 Contributed by Sumangala Patki Change-Id: I2abd33bd62bb734a431cccfc50a52bdeb2bf7db6	2021-11-05 12:55:45 +00:00
sumangala-patki	0379aebafe	HADOOP-17873. ABFS: Fix transient failures in ITestAbfsStreamStatistics and ITestAbfsRestOperationException (#3341 ) Addresses transient failures in the following test classes: * ITestAbfsStreamStatistics: Uses a filesystem level static instance to record read/write statistics, which also tracks these operations in other tests running in parallel. Marked for sequential-only run to avoid transient failure * ITestAbfsRestOperationException: The use of a static member to track retry count causes transient failures when two tests of this class happen to run together. Switch to non-static variable for assertions on retry count closes #3341 Contributed by Sumangala Patki Change-Id: Ied4dec35c81e94efe5f999acae4bb8fde278202e	2021-11-04 15:57:42 +00:00
Anoop Sam John	913d06ad4d	HADOOP-17770 WASB : Support disabling buffered reads in positional reads (#3233 )	2021-10-22 11:45:42 +05:30
Josh Elser	feeaebeb84	HADOOP-17934. ABFS: Make sure the AbfsHttpOperation is non-null before using it (#3477 ) Contributed by: Josh Elser Change-Id: I24a2e0322d8cae2d72d65c7f3d8a74580a418317	2021-10-04 20:54:39 +01:00
Mehakmeet Singh	8e5620cd9e	HADOOP-17195. ABFS: OutOfMemory error while uploading huge files (#3446 ) Addresses the problem of processes running out of memory when there are many ABFS output streams queuing data to upload, especially when the network upload bandwidth is less than the rate data is generated. ABFS Output streams now buffer their blocks of data to "disk", "bytebuffer" or "array", as set in "fs.azure.data.blocks.buffer" When buffering via disk, the location for temporary storage is set in "fs.azure.buffer.dir" For safe scaling: use "disk" (default); for performance, when confident that upload bandwidth will never be a bottleneck, experiment with the memory options. The number of blocks a single stream can have queued for uploading is set in "fs.azure.block.upload.active.blocks". The default value is 20. Contributed by Mehakmeet Singh.	2021-09-22 11:19:16 +01:00
sumangala-patki	dd30db78e7	HADOOP-17290. ABFS: Add Identifiers to Client Request Header (#2520 ) Contributed by Sumangala Patki. (cherry picked from commit `35570e414a`)	2021-09-21 16:45:51 +01:00
sumangala-patki	1cb9e747eb	HADOOP-17618. ABFS: Partially obfuscate SAS object IDs in Logs (#2845 ) Contributed by Sumangala Patki (cherry picked from commit `3450522c2f`)	2021-09-09 14:04:12 +01:00
Mukund Thakur	3b1c594355	HADOOP-17156. ABFS: Release the byte buffers held by input streams in close() (#3285 ) Contributed By: Mukund Thakur	2021-09-07 15:29:22 +05:30
Steve Loughran	26514b6534	HADOOP-17628. Distcp contract test is really slow with ABFS and S3A; timing out. (#3240 ) This patch cuts down the size of directory trees used for distcp contract tests against object stores, so making them much faster against distant/slow stores. On abfs, the test only runs with -Dscale (as was the case for s3a already), and has the larger scale test timeout. After every test case, the FileSystem IOStatistics are logged, to provide information about what IO is taking place and what its performance is. There are some test cases which upload files of 1+ MiB; you can increase the size of the upload in the option "scale.test.distcp.file.size.kb" Set it to zero and the large file tests are skipped. Contributed by Steve Loughran.	2021-08-02 12:58:37 +01:00
Brian Loss	37e0828e76	HADOOP-17811: ABFS ExponentialRetryPolicy doesn't pick up configuration values (#3221 ) Contributed by Brian Loss. Change-Id: I5f24196d1d02de91336c3679abaf8d55cfaed746	2021-08-02 11:37:33 +01:00
snehavarma	11825d30e8	HADOOP-17714 ABFS: testBlobBackCompatibility, testRandomRead & WasbAbfsCompatibility tests fail when triggered with default configs (#3035 ) (#3126 ) (cherry picked from commit `35e4c31fff`)	2021-07-12 11:53:46 +05:30
snehavarma	ab3809cf8d	HADOOP-17715 ABFS: Append blob tests with non HNS accounts fail (#3028 ) (#3125 ) (cherry picked from commit `4c039fafeb`)	2021-07-12 11:51:41 +05:30
sumangala-patki	aa6a9cac72	HADOOP-17596. ABFS: Change default Readahead Queue Depth from num(processors) to const (#3106 ) * HADOOP-17596. ABFS: Change default Readahead Queue Depth from num(processors) to const (#2795) . Contributed by Sumangala Patki. (cherry picked from commit `76d92eb2a2`)	2021-07-10 15:09:59 +05:30
Mukund Thakur	e8f9af6f2a	HADOOP-17250 Lot of short reads can be merged with readahead. (#3110 ) Introducing fs.azure.readahead.range parameter which can be set by the user. Data will be populated in buffer for random reads as well which leads to fewer remote calls. This patch also changes the seek implementation to perform a lazy seek. The actual seek is done when a read is initiated and data is not present in the buffer else data is returned from the buffer thus reducing the number of remote storage calls. Contributed By: Mukund Thakur Change-Id: Ib920eedd0087caa150afa4d4c23e89df56b29e83	2021-07-05 11:23:32 +01:00
Viraj Jasani	8f0ba9ee1b	HADOOP-17725. Improve error message for token providers in ABFS (#3041 ) Contributed by Viraj Jasani.	2021-06-08 22:05:01 +01:00
Mehakmeet Singh	a786847b8f	HADOOP-17670. S3AFS and ABFS to log IOStats at DEBUG mode or optionally at INFO level in close() (#2963 ) When the S3A and ABFS filesystems are closed, their IOStatistics are logged at debug in the log: org.apache.hadoop.fs.statistics.IOStatisticsLogging Set `fs.iostatistics.logging.level` to `info` for the statistics to be logged at info. (also: `warn` or `error` for even higher log levels). Contributed by: Mehakmeet Singh Change-Id: I56d44ad89fc1c0dd4baf701681834e7fd96c544f	2021-05-24 13:04:20 +01:00
sumangala-patki	b20bc668d5	HADOOP-17548. ABFS: Toggle Store Mkdirs request overwrite parameter (#2729 ) (#2781 ) Contributed by Sumangala Patki. (cherry picked from commit `fe633d4739`)	2021-05-10 11:50:01 +05:30
bilaharith	6649e5888b	HADOOP-17536. ABFS: Supporting customer provided encryption key (#2707 ) Contributed by bilahari t h Change-Id: I86216e755b81e9d14f5e87844d9fd58e8940560c	2021-04-27 13:16:33 +01:00
Mehakmeet Singh	389d3034c6	HADOOP-17471. ABFS to collect IOStatistics (#2731 ) (#2950 ) The ABFS Filesystem and its input and output streams now implement the IOStatisticSource interface and provide IOStatistics on their interactions with Azure Storage. This includes the min/max/mean durations of all REST API calls. Contributed by Mehakmeet Singh <mehakmeet.singh@cloudera.com>	2021-04-24 17:59:26 +01:00
Steve Loughran	77fddcfcb1	HADOOP-17535. ABFS: ITestAzureBlobFileSystemCheckAccess test failure if no oauth key. (#2920 ) Contributed by Steve Loughran. Change-Id: I165f5ed3a8486404403827b5c0338cf7f80c2bb1	2021-04-24 17:24:15 +01:00

362 Commits