Commit Graph

1520 Commits

Author SHA1 Message Date
Mehakmeet Singh
8e5620cd9e
HADOOP-17195. ABFS: OutOfMemory error while uploading huge files (#3446)
Addresses the problem of processes running out of memory when
there are many ABFS output streams queuing data to upload,
especially when the network upload bandwidth is less than the rate
data is generated.

ABFS Output streams now buffer their blocks of data to
"disk", "bytebuffer" or "array", as set in
"fs.azure.data.blocks.buffer"

When buffering via disk, the location for temporary storage
is set in "fs.azure.buffer.dir"

For safe scaling: use "disk" (default); for performance, when
confident that upload bandwidth will never be a bottleneck,
experiment with the memory options.

The number of blocks a single stream can have queued for uploading
is set in "fs.azure.block.upload.active.blocks".
The default value is 20.

Contributed by Mehakmeet Singh.
2021-09-22 11:19:16 +01:00
sumangala-patki
dd30db78e7
HADOOP-17290. ABFS: Add Identifiers to Client Request Header (#2520)
Contributed by Sumangala Patki.

(cherry picked from commit 35570e414a)
2021-09-21 16:45:51 +01:00
sumangala-patki
1cb9e747eb
HADOOP-17618. ABFS: Partially obfuscate SAS object IDs in Logs (#2845)
Contributed by Sumangala Patki

(cherry picked from commit 3450522c2f)
2021-09-09 14:04:12 +01:00
Steve Loughran
a2242df10a
HADOOP-17894. CredentialProviderFactory.getProviders() recursion loading JCEKS file from S3A (#3393)
* CredentialProviderFactory to detect and report on recursion.
* S3AFS to remove incompatible providers.
* Integration Test for this.

Contributed by Steve Loughran.

Change-Id: Ia247b3c9fe8488ffdb7f57b40eb6e37c57e522ef
2021-09-08 17:00:20 +01:00
Mukund Thakur
3b1c594355 HADOOP-17156. ABFS: Release the byte buffers held by input streams in close() (#3285)
Contributed By: Mukund Thakur
2021-09-07 15:29:22 +05:30
Dongjoon Hyun
8606b2cddd
HADOOP-17869. fs.s3a.connection.maximum should be bigger than fs.s3a.threads.max (#3337).
The value of `fs.s3a.connection.maximum` has been increased to 96

Contributed by Dongjoon Hyun

Change-Id: I9020a2bfd2a67fa7a2ec0598ed9d63e78ee99c73
2021-08-30 18:31:57 +01:00
Steve Loughran
c1ad91e72d
HADOOP-17822. fs.s3a.acl.default not working after S3A Audit feature (#3249)
Fixes the regression caused by HADOOP-17511 by moving where the
option  fs.s3a.acl.default is read -doing it before the RequestFactory
is created.

Adds

* A unit test in TestRequestFactory to verify the ACLs are set
  on all file write operations.
* A new ITestS3ACannedACLs test which verifies that ACLs really
  do get all the way through.
* S3A Assumed Role delegation tokens to include the IAM permission
  s3:PutObjectAcl in the generated role.

Contributed by Steve Loughran

Change-Id: I3abac6a1b9e150b6b6df0af7c2c70093f8f518cb
2021-08-02 15:33:34 +01:00
Steve Loughran
26514b6534 HADOOP-17628. Distcp contract test is really slow with ABFS and S3A; timing out. (#3240)
This patch cuts down the size of directory trees used for
distcp contract tests against object stores, so making
them much faster against distant/slow stores.

On abfs, the test only runs with -Dscale (as was the case for s3a already),
and has the larger scale test timeout.

After every test case, the FileSystem IOStatistics are logged,
to provide information about what IO is taking place and
what it's performance is.

There are some test cases which upload files of 1+ MiB; you can
increase the size of the upload in the option
"scale.test.distcp.file.size.kb" 
Set it to zero and the large file tests are skipped.

Contributed by Steve Loughran.
2021-08-02 12:58:37 +01:00
Bobby Wang
904cdd0b00
HADOOP-17812. NPE in S3AInputStream read() after failure to reconnect to store (#3222)
This improves error handling after multiple failures reading data
-when the read fails and attempts to reconnect() also fail.

Contributed by Bobby Wang.

Change-Id: If17dee395ad6b9b7c738021bad20d0a13eb4011e
2021-08-02 12:58:25 +01:00
Petre Bogdan Stolojan
f2cec5cb88
HADOOP-17139 Re-enable optimized copyFromLocal implementation in S3AFileSystem (#3101)
This work
* Defines the behavior of FileSystem.copyFromLocal in filesystem.md
* Implements a high performance implementation of copyFromLocalOperation
  for S3
* Adds a contract test for the operation: AbstractContractCopyFromLocalTest
* Implements the contract tests for Local and S3A FileSystems

Contributed by: Bogdan Stolojan

Change-Id: I25d502102775c3626c4264e5a14c649879730050
2021-08-02 11:58:36 +01:00
Brian Loss
37e0828e76
HADOOP-17811: ABFS ExponentialRetryPolicy doesn't pick up configuration values (#3221)
Contributed by Brian Loss.

Change-Id: I5f24196d1d02de91336c3679abaf8d55cfaed746
2021-08-02 11:37:33 +01:00
bshashikant
18bd66e5b0 HDFS-16145. CopyListing fails with FNF exception with snapshot diff. (#3234)
(cherry picked from commit dac10fcc20)
2021-07-28 09:38:06 +01:00
Petre Bogdan Stolojan
e89d30b6b7
HADOOP-17458. S3A to treat "SdkClientException: Data read has a different length than the expected" as EOFException (#3040)
Some network exceptions can raise SdkClientException with message
`Data read has a different length than the expected`.

These should be recoverable.

Contributed by Bogdan Stolojan

Change-Id: Ia22fd77d90971e9e02b4f947398a4749eebe5909
2021-07-23 14:46:59 +01:00
Mehakmeet Singh
14a3e74c5c
HADOOP-17801. No error message reported when bucket doesn't exist in S3AFS (#3202)
Contributed by: Mehakmeet Singh.

Change-Id: I26c2a85ef6bbfd1b8269a23fc44d9a55d7fa091c
2021-07-16 15:36:54 +01:00
Mehakmeet Singh
cd15b0cb8a HADOOP-17803. Remove WARN logging from LoggingAuditor when executing a request outside an audit span (#3207)
Followup to HADOOP-17511. "Add audit/telemetry logging to S3A connector"

Contributed by Mehakmeet Singh
2021-07-16 11:52:37 +01:00
snehavarma
11825d30e8
HADOOP-17714 ABFS: testBlobBackCompatibility, testRandomRead & WasbAbfsCompatibility tests fail when triggered with default configs (#3035) (#3126)
(cherry picked from commit 35e4c31fff)
2021-07-12 11:53:46 +05:30
snehavarma
ab3809cf8d
HADOOP-17715 ABFS: Append blob tests with non HNS accounts fail (#3028) (#3125)
(cherry picked from commit 4c039fafeb)
2021-07-12 11:51:41 +05:30
sumangala-patki
aa6a9cac72
HADOOP-17596. ABFS: Change default Readahead Queue Depth from num(processors) to const (#3106)
* HADOOP-17596. ABFS: Change default Readahead Queue Depth from num(processors) to const (#2795)
. Contributed by Sumangala Patki.

(cherry picked from commit 76d92eb2a2)
2021-07-10 15:09:59 +05:30
litao
7cb91db575 HDFS-16122. Fix DistCpContext#toString() (#3191). Contributed by tomscut.
Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
2021-07-10 13:56:36 +05:30
Mukund Thakur
e8f9af6f2a
HADOOP-17250 Lot of short reads can be merged with readahead. (#3110)
Introducing fs.azure.readahead.range parameter which can be set by the user.
Data will be populated in buffer for random reads as well which leads to fewer
remote calls.

This patch also changes the seek implementation to perform a lazy seek. The
actual seek is done when a read is initiated and data is not present in the buffer else
data is returned from the buffer thus reducing the number of remote storage calls.

Contributed By: Mukund Thakur

Change-Id: Ib920eedd0087caa150afa4d4c23e89df56b29e83
2021-07-05 11:23:32 +01:00
Mehakmeet Singh
f1a14df9e6
HADOOP-17774. S3A bytesRead FS statistic showing twice the correct value (#3144)
Contributed by: Mehakmeet Singh

Change-Id: I3302654ca36474a5f399aa848f88bce4587022d8
2021-07-02 14:13:26 +01:00
Zamil Majdy
80859d714d
HADOOP-17764. S3AInputStream read does not re-open the input stream on the second read retry attempt (#3109)
Contributed by Zamil Majdy.

Change-Id: I680d9c425c920ff1a7cd4764d62e10e6ac78bee4
2021-06-25 20:47:11 +01:00
Steve Loughran
39e6f2d191
HADOOP-17771. S3AFS creation fails "Unable to find a region via the region provider chain." (#3133)
This addresses the regression in Hadoop 3.3.1 where if no S3 endpoint
is set in fs.s3a.endpoint, S3A filesystem creation may fail on
non-EC2 deployments, depending on the local host environment setup.

* If fs.s3a.endpoint is empty/null, and fs.s3a.endpoint.region
  is null, the region is set to "us-east-1".
* If fs.s3a.endpoint.region is explicitly set to "" then the client
  falls back to the SDK region resolution chain; this works on EC2
* Details in troubleshooting.md, including a workaround for Hadoop-3.3.1+
* Also contains some minor restructuring of troubleshooting.md

Contributed by Steve Loughran.

Change-Id: Ife482cff513307cd52d59eec56beac0a33e031f5
2021-06-24 16:38:55 +01:00
Takanobu Asanuma
25138c98bf HADOOP-17760. Delete hadoop.ssl.enabled and dfs.https.enable from docs and core-default.xml (#3099)
Reviewed-by: Ayush Saxena <ayushsaxena@apache.org>
(cherry picked from commit 9e7c7ad129)
2021-06-17 10:00:36 +09:00
Petre Bogdan Stolojan
254d943126
HADOOP-17547 Magic committer to downgrade abort in cleanup if list uploads fails with access denied (#3051)
Contributed by Bogdan Stolojan

Change-Id: I32d6dc4f72087783a3ea12473d11690ac14fe3cb
2021-06-12 17:46:11 +01:00
Viraj Jasani
8f0ba9ee1b
HADOOP-17725. Improve error message for token providers in ABFS (#3041)
Contributed by Viraj Jasani.
2021-06-08 22:05:01 +01:00
Akira Ajisaka
37516726d7 HDFS-16050. Some dynamometer tests fail. (#3079)
Signed-off-by: Takanobu Asanuma <tasanuma@apache.org>
(cherry picked from commit 57a3613e5d)
2021-06-07 15:03:06 +09:00
zhengchenyu
7feb41b73d
MAPREDUCE-7287. Distcp will delete exists file , If we use "-delete and -update" options and distcp file. (#2852)
Contributed by zhengchenyu

Change-Id: I61edf9a443c0c6cd5b5dd911901708530cf131ed
2021-05-28 20:27:00 +01:00
Steve Loughran
464bbd5b7c
HADOOP-17511. Add audit/telemetry logging to S3A connector (#2807)
The S3A connector supports
"an auditor", a plugin which is invoked
at the start of every filesystem API call,
and whose issued "audit span" provides a context
for all REST operations against the S3 object store.

The standard auditor sets the HTTP Referrer header
on the requests with information about the API call,
such as process ID, operation name, path,
and even job ID.

If the S3 bucket is configured to log requests, this
information will be preserved there and so can be used
to analyze and troubleshoot storage IO.

Contributed by Steve Loughran.

Change-Id: Ic0a105c194342ed2d529833ecc42608e8ba2f258
2021-05-25 12:55:38 +01:00
Mehakmeet Singh
b82a0fa9e6
HADOOP-17705. S3A to add Config to set AWS region (#3020)
The option `fs.s3a.endpoint.region` can be used
to explicitly set the AWS region of a bucket.

This is needed when using AWS Private Link, as
the region cannot be automatically determined.

Contributed by Mehakmeet Singh

Change-Id: I4b52f85d7af0ddd56b2b0505ac0124d4fcc67ca0
2021-05-24 13:13:37 +01:00
Mehakmeet Singh
a786847b8f
HADOOP-17670. S3AFS and ABFS to log IOStats at DEBUG mode or optionally at INFO level in close() (#2963)
When the S3A and ABFS filesystems are closed,
their IOStatistics are logged at debug in the log:

org.apache.hadoop.fs.statistics.IOStatisticsLogging

Set `fs.iostatistics.logging.level` to `info` for the statistics
to be logged at info. (also: `warn` or `error` for even higher
log levels).

Contributed by: Mehakmeet Singh

Change-Id: I56d44ad89fc1c0dd4baf701681834e7fd96c544f
2021-05-24 13:04:20 +01:00
Wei-Chiu Chuang
fa4915fdbb
Preparing for 3.3.2 development 2021-05-19 21:52:37 +08:00
Aryan Gupta
7142dfec04
HADOOP-17302. Upgrade to jQuery 3.5.1 in hadoop-sls. (#2379)
Co-authored-by: Aryan Gupta
(cherry picked from commit d60d5fe43d)
2021-05-13 12:16:24 +08:00
bilaharith
1c6c3920b6
HADOOP-17444. ADLS Gen1: Update adls SDK to 2.3.9 (#2842)
Contributed by bilaharith
2021-05-12 18:15:27 +01:00
sumangala-patki
b20bc668d5
HADOOP-17548. ABFS: Toggle Store Mkdirs request overwrite parameter (#2729) (#2781)
Contributed by Sumangala Patki.

(cherry picked from commit fe633d4739)
2021-05-10 11:50:01 +05:30
Takanobu Asanuma
df50e210dd
HADOOP-17375. Fix the error of TestDynamometerInfra. (#2471)
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
(cherry picked from commit 207210263a)
2021-05-07 14:09:03 +09:00
Steve Loughran
e944cc0338
HADOOP-16742. NullPointerException in S3A MultiObjectDeleteSupport
Contributed by Tor Arvid Lund.

Change-Id: Iadfe9b2f355cf373031075bfbe681705a2c65bdc
2021-05-04 16:15:13 +01:00
bilaharith
6649e5888b
HADOOP-17536. ABFS: Supporting customer provided encryption key (#2707)
Contributed by bilahari t h

Change-Id: I86216e755b81e9d14f5e87844d9fd58e8940560c
2021-04-27 13:16:33 +01:00
Steve Loughran
8308aab658
HADOOP-17112. S3A committers can't handle whitespace in paths. (#2953)
Contributed by Krzysztof Adamski.

Change-Id: I2746aabcfeb0fbb138a80b02c4d5bbf2a8cf75da
2021-04-25 18:43:25 +01:00
Steve Loughran
fb71e6c91e
HADOOP-17597. Optionally downgrade on S3A Syncable calls (#2801)
Followup to HADOOP-13327, which changed S3A output stream hsync/hflush calls
to raise an exception.

Adds a new option fs.s3a.downgrade.syncable.exceptions

When true, calls to Syncable hsync/hflush on S3A output streams will
log once at warn (for entire process life, not just the stream), then
increment IOStats with the relevant operation counter

With the downgrade option false (default)
* IOStats are incremented
* The UnsupportedOperationException current raised includes a link to the
  JIRA.

Contributed by Steve Loughran.

Change-Id: I967e077eda1d1a1a3795b4d22e003fe7997b6679
2021-04-24 18:32:39 +01:00
Gabor Bota
1b2bc77923
HADOOP-17454. [s3a] Disable bucket existence check - set fs.s3a.bucket.probe to 0 (#2593)
Also fixes HADOOP-16995. ITestS3AConfiguration proxy tests failures when bucket probes == 0
The improvement should include the fix, because the test would fail by default otherwise.

Change-Id: I9a7e4b5e6d4391ebba096c15e84461c038a2ec59
2021-04-24 18:28:21 +01:00
Mukund Thakur
33f9ceb3cc HADOOP-17136. ITestS3ADirectoryPerformance.testListOperations failing (#2153)
A regression caused by HADOOP-17022: the reduction in LIST calls broken an assertion.

Contributed by Mukund Thakur

Change-Id: Ib9725165906931634567fd1f62a81e3a6ea5620c
2021-04-24 18:28:14 +01:00
Mehakmeet Singh
389d3034c6
HADOOP-17471. ABFS to collect IOStatistics (#2731) (#2950)
The ABFS Filesystem and its input and output streams now implement
the IOStatisticSource interface and provide IOStatistics on
their interactions with Azure Storage.

This includes the min/max/mean durations of all REST API calls.

Contributed by Mehakmeet Singh <mehakmeet.singh@cloudera.com>
2021-04-24 17:59:26 +01:00
Steve Loughran
77fddcfcb1
HADOOP-17535. ABFS: ITestAzureBlobFileSystemCheckAccess test failure if no oauth key. (#2920)
Contributed by Steve Loughran.

Change-Id: I165f5ed3a8486404403827b5c0338cf7f80c2bb1
2021-04-24 17:24:15 +01:00
Ayush Saxena
b743d56eb4 HADOOP-17620. DistCp: Use Iterator for listing target directory as well. (#2861). Contributed by Ayush Saxena.
Signed-off-by: Vinayakumar B <vinayakumarb@apache.org>
2021-04-23 22:49:28 +05:30
billierinaldi
8170a7bb60 HADOOP-16948. Support infinite lease dirs (#1925). Contributed by Billie Rinaldi.
(cherry picked from commit c1fde4fe94)
2021-04-20 14:36:54 -04:00
Steve Loughran
f30a0debae HADOOP-17641. ITestWasbUriAndConfiguration failing. (#2937)
This moves the mock account name --which is required to never exist-- from
"mockAccount"  to an account name containing a static UUID.

Contributed by Steve Loughran.
2021-04-20 15:37:18 +01:00
sumangala-patki
8daa26d2e5
HADOOP-17576. ABFS: Disable throttling update for auth failures (#2761) (#2885)
Contributed by Sumangala Patki

(cherry picked from commit 6f640abbaf)
2021-04-16 10:47:11 +05:30
Viraj Jasani
8b4b3d6fe6 HADOOP-17622. Avoid usage of deprecated IOUtils#cleanup API. (#2862)
Signed-off-by: Takanobu Asanuma <tasanuma@apache.org>
(cherry picked from commit 3f2682b92b)
2021-04-06 14:18:31 +09:00
Ayush Saxena
9c9b16c957
HADOOP-17531. DistCp: Reduce memory usage on copying huge directories. (#2808). Contributed by Ayush Saxena.
* HADOOP-17531. DistCp: Reduce memory usage on copying huge directories. (#2732).

* HADOOP-17531.Addendum: DistCp: Reduce memory usage on copying huge directories. (#2820)

Signed-off-by: Steve Loughran <stevel@apache.org>
2021-03-27 09:25:25 +05:30