This is a mapreduce/spark output committer optimized for
performance and correctness on Azure ADLS Gen 2 storage
(via the abfs connector) and Google Cloud Storage
(via the external gcs connector library).
* It is safe to use with HDFS, however it has not been optimized
for that use.
* It is *not* safe for use with S3, and will fail if an attempt
is made to do so.
Contributed by Steve Loughran
Change-Id: I6f3502e79c578b9fd1a8c1485f826784b5421fca
* New statistic names in StoreStatisticNames
(for joint use with s3a committers)
* Improvements to IOStatistics implementation classes
* RateLimiting wrapper to guava RateLimiter
* S3A committer Tasks moved over as TaskPool and
added support for RemoteIterator
* JsonSerialization.load() to fail fast if source does not exist
+ tests.
This commit is a prerequisite for the main MAPREDUCE-7341 Manifest Committer
patch.
Contributed by Steve Loughran
Change-Id: Ia92e2ab5083ac3d8d3d713a4d9cb3e9e0278f654
(cherry picked from commit d0fa9b5775)
Conflicts:
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileUtil.java
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestFileUtil.java
Co-authored-by: Gautham B A <gautham.bangalore@gmail.com>
Multi object delete of size more than 1000 is not supported by S3 and
fails with MalformedXML error. So implementing paging of requests to
reduce the number of keys in a single request. Page size can be configured
using "fs.s3a.bulk.delete.page.size"
Contributed By: Mukund Thakur
Create unix domain socket in java.io.tmpdir instead of
test.build.dir to avoid 'File name too long' error.
Reviewed-by: Ayush Saxena <ayushsaxena@apache.org>
(cherry picked from commit 7fd90cdcbe)
Adds a new map type WeakReferenceMap, which stores weak
references to values, and a WeakReferenceThreadMap subclass
to more closely resemble a thread local type, as it is a
map of threadId to value.
Construct it with a factory method and optional callback
for notification on loss and regeneration.
WeakReferenceThreadMap<WrappingAuditSpan> activeSpan =
new WeakReferenceThreadMap<>(
(k) -> getUnbondedSpan(),
this::noteSpanReferenceLost);
This is used in ActiveAuditManagerS3A for span tracking.
Relates to
* HADOOP-17511. Add an Audit plugin point for S3A
* HADOOP-18094. Disable S3A auditing by default.
Contributed by Steve Loughran.
Change-Id: Ibf7bb082fd47298f7ebf46d92f56e80ca9b2aaf8
Part of HADOOP-17198. Support S3 Access Points.
HADOOP-18068. "upgrade AWS SDK to 1.12.132" broke the access point endpoint
translation.
Correct endpoints should start with "s3-accesspoint.", after SDK upgrade they start with
"s3.accesspoint-" which messes up tests + region detection by the SDK.
Contributed by Bogdan Stolojan
Change-Id: I0c0181628ab803afc39036003777eaec79aa378c
Add support for S3 Access Points. This provides extra security as it
ensures applications are not working with buckets belong to third parties.
To bind a bucket to an access point, set the access point (ap) ARN,
which must be done for each specific bucket, using the pattern
fs.s3a.bucket.$BUCKET.accesspoint.arn = ARN
* The global/bucket option `fs.s3a.accesspoint.required` to
mandate that buckets must declare their access point.
* This is not compatible with S3Guard.
Consult the documentation for further details.
Contributed by Bogdan Stolojan
(this commit contains the changes to TestArnResource from HADOOP-18068,
"upgrade AWS SDK to 1.12.132" so that it works with the later SDK.)
Change-Id: I3fac213e52ca6ec1c813effb8496c353964b8e1b