Commit Graph

1637 Commits

Author SHA1 Message Date
Steve Loughran
e0cd0a82e0
HADOOP-16202. Enhanced openFile(): hadoop-aws changes. (#2584/3)
S3A input stream support for the few fs.option.openfile settings.
As well as supporting the read policy option and values,
if the file length is declared in fs.option.openfile.length
then no HEAD request will be issued when opening a file.
This can cut a few tens of milliseconds off the operation.

The patch adds a new openfile parameter/FS configuration option
fs.s3a.input.async.drain.threshold (default: 16000).
It declares the number of bytes remaining in the http input stream
above which any operation to read and discard the rest of the stream,
"draining", is executed asynchronously.
This asynchronous draining offers some performance benefit on seek-heavy
file IO.

Contributed by Steve Loughran.

Change-Id: I9b0626bbe635e9fd97ac0f463f5e7167e0111e39
2022-04-24 17:33:05 +01:00
Steve Loughran
6999acf520
HADOOP-16202. Enhanced openFile(): mapreduce and YARN changes. (#2584/2)
These changes ensure that sequential files are opened with the
right read policy, and split start/end is passed in.

As well as offering opportunities for filesystem clients to
choose fetch/cache/seek policies, the settings ensure that
processing text files on an s3 bucket where the default policy
is "random" will still be processed efficiently.

This commit depends on the associated hadoop-common patch,
which must be committed first.

Contributed by Steve Loughran.

Change-Id: Ic6713fd752441cf42ebe8739d05c2293a5db9f94
2022-04-24 17:33:05 +01:00
GuoPhilipse
214f369073
HDFS-16556. Fix typos in distcp (#4217) 2022-04-22 14:01:20 -04:00
Daniel Carl Jones
a6ebc42671
HADOOP-18201. Remove endpoint config overrides for ITestS3ARequesterPays (#4169)
Contributed by Daniel Carl Jones.
2022-04-14 16:21:34 +01:00
Viraj Jasani
7c20602b17
HDFS-16522. Set Http and Ipc ports for Datanodes in MiniDFSCluster (#4108)
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
2022-04-06 18:17:02 +09:00
litao
966b773a7c
HDFS-16527. Add global timeout rule for TestRouterDistCpProcedure (#4129)
Reviewed-by: Inigo Goiri <inigoiri@apache.org>
Reviewed-by: Ayush Saxena <ayushsaxena@apache.org>
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
2022-04-06 14:34:24 +09:00
9uapaw
4b1a6bfb10 YARN-11102. Fix spotbugs error in hadoop-sls module. Contributed by Szilard Nemeth, Andras Gyori. 2022-04-01 18:24:37 +02:00
Szilard Nemeth
94031b729d YARN-11103. SLS cleanup after previously merged SLS refactor jiras. Contributed by Szilard Nemeth 2022-03-31 14:29:59 +02:00
Szilard Nemeth
ab8c360620 YARN-10550. Decouple NM runner logic from SLSRunner. Contributed by Szilard Nemeth 2022-03-30 19:53:10 +02:00
9uapaw
e386d6a661 YARN-10549. Decouple RM runner logic from SLSRunner. Contributed by Szilard Nemeth. 2022-03-29 09:58:27 +02:00
9uapaw
adbaf48082 YARN-11100. Fix StackOverflowError in SLS scheduler event handling. Contributed by Szilard Nemeth. 2022-03-26 21:43:10 +01:00
9uapaw
08a77a765b YARN-10548. Decouple AM runner logic from SLSRunner. Contributed by Szilard Nemeth. 2022-03-25 18:48:56 +01:00
Benjamin Teke
ffa0eab488 YARN-11094. Follow up changes for YARN-10547. Contributed by Szilard Nemeth 2022-03-25 12:01:44 +01:00
9uapaw
526142447a YARN-10552. Eliminate code duplication in SLSCapacityScheduler and SLSFairScheduler. Contributed by Szilard Nemeth. 2022-03-24 16:24:33 +01:00
9uapaw
077c6c62d6 YARN-10547. Decouple job parsing logic from SLSRunner. Contributed by Szilard Nemeth. 2022-03-24 06:16:26 +01:00
Daniel Carl Jones
9edfe30a60
HADOOP-14661. Add S3 requester pays bucket support to S3A (#3962)
Adds the option fs.s3a.requester.pays.enabled, which, if set to true, allows
the client to access S3 buckets where the requester is billed for the IO.

Contributed by Daniel Carl Jones
2022-03-23 20:00:50 +00:00
Steve Loughran
708a0ce21b
HADOOP-13704. Optimized S3A getContentSummary()
Optimize the scan for s3 by performing a deep tree listing,
inferring directory counts from the paths returned.

Contributed by Ahmar Suhail.

Change-Id: I26ffa8c6f65fd11c68a88d6e2243b0eac6ffd024
2022-03-22 13:21:12 +00:00
Steve Loughran
8294bd5a37
HADOOP-18163. hadoop-azure support for the Manifest Committer of MAPREDUCE-7341
Follow-on patch to MAPREDUCE-7341, adding ABFS support and tests

* resilient rename
* tests for job commit through the manifest committer.

contains
- HADOOP-17976. ABFS etag extraction inconsistent between LIST and HEAD calls
- HADOOP-16204. ABFS tests to include terasort

Contributed by Steve Loughran.

Change-Id: I0a7d4043bdf19bcb00c033fc389730109b93b77f
2022-03-17 11:24:51 +00:00
Mukund Thakur
672e380c4f
HADOOP-18112: Implement paging during multi object delete. (#4045)
Multi object delete of size more than 1000 is not supported by S3 and 
fails with MalformedXML error. So implementing paging of requests to 
reduce the number of keys in a single request. Page size can be configured
using "fs.s3a.bulk.delete.page.size" 

 Contributed By: Mukund Thakur
2022-03-11 13:05:45 +05:30
Viraj Jasani
66b72406bd
HADOOP-18131. Upgrade maven enforcer plugin and relevant dependencies (#4000)
Reviewed-by: Akira Ajisaka <aajisaka@apache.org>
Reviewed-by: Wei-Chiu Chuang <weichiu@apache.org>
Signed-off-by: Takanobu Asanuma <tasanuma@apache.org>
2022-03-08 17:27:04 +09:00
Mehakmeet Singh
6995374b54
HADOOP-18150. Fix ITestAuditManagerDisabled test in S3A. (#4044)
Contributed by Mehakmeet Singh
2022-03-03 18:44:28 +00:00
Steve Loughran
b56af00114
HADOOP-18075. ABFS: Fix failure caused by listFiles() in ITestAbfsRestOperationException (#4040)
Contributed by Sumangala Patki
2022-03-01 11:48:10 +00:00
Sumangala Patki
c18b646020
HADOOP-18071. ABFS: Set driver global timeout for ITestAzureBlobFileSystemBasics (#3866)
Contributed by Sumangala Patki
2022-02-23 19:38:10 +00:00
monthonk
1f157f802d
HADOOP-17386. Change default fs.s3a.buffer.dir to be under Yarn container path on yarn applications (#3908)
Co-authored-by: Monthon Klongklaew <monthonk@amazon.com>
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
2022-02-22 13:50:27 +09:00
Mohanad Elsafty
a4f459097b
HADOOP-18117. Add an option to preserve root directory permissions (#3970) 2022-02-18 19:12:50 +08:00
Ayush Saxena
fe583c4b63
HADOOP-18096. Distcp: Sync moves filtered file to home directory rather than deleting. (#3940). Contributed by Ayush Saxena.
Reviewed-by: Steve Loughran <stevel@apache.org>
Reviewed-by: stack <stack@apache.org>
2022-02-11 01:59:40 +05:30
Steve Loughran
efdec92cab
HADOOP-18091. S3A auditing leaks memory through ThreadLocal references (#3930)
Adds a new map type WeakReferenceMap, which stores weak
references to values, and a WeakReferenceThreadMap subclass
to more closely resemble a thread local type, as it is a
map of threadId to value.

Construct it with a factory method and optional callback
for notification on loss and regeneration.

 WeakReferenceThreadMap<WrappingAuditSpan> activeSpan =
      new WeakReferenceThreadMap<>(
          (k) -> getUnbondedSpan(),
          this::noteSpanReferenceLost);

This is used in ActiveAuditManagerS3A for span tracking.

Relates to
* HADOOP-17511. Add an Audit plugin point for S3A
* HADOOP-18094. Disable S3A auditing by default.

Contributed by Steve Loughran.
2022-02-10 12:31:41 +00:00
Joey Krabacher
a08e69d33e
HADOOP-18114. Documentation correction in assumed_roles.md (#3949)
Fixes typo in hadoop-aws/assumed_roles.md

Contributed by Joey Krabacher
2022-02-09 10:35:11 +00:00
Petre Bogdan Stolojan
5e7ce26e66
HADOOP-18085. S3 SDK Upgrade causes AccessPoint ARN endpoint mistranslation (#3902)
Part of HADOOP-17198. Support S3 Access Points.

HADOOP-18068. "upgrade AWS SDK to 1.12.132" broke the access point endpoint
translation.

Correct endpoints should start with "s3-accesspoint.", after SDK upgrade they start with
"s3.accesspoint-" which messes up tests + region detection by the SDK.

Contributed by Bogdan Stolojan
2022-02-04 15:37:08 +00:00
Steve Loughran
b795f6f9a8
HADOOP-18094. Disable S3A auditing by default.
See HADOOP-18091. S3A auditing leaks memory through ThreadLocal references

* Adds a new option fs.s3a.audit.enabled to controls whether or not auditing
is enabled. This is false by default.

* When false, the S3A auditing manager is NoopAuditManagerS3A,
which was formerly only used for unit tests and
during filsystem initialization.

* When true, ActiveAuditManagerS3A is used for managing auditing,
allowing auditing events to be reported.

* updates documentation and tests.

This patch does not fix the underlying leak. When auditing is enabled,
long-lived threads will retain references to the audit managers
of S3A filesystem instances which have already been closed.

Contributed by Steve Loughran.
2022-01-24 13:37:33 +00:00
Anmol Asrani
7c97c0f969
HADOOP-18084. ABFS: Add testfilePath while verifying test contents are read correctly (#3903)
Contributed by: Anmol Asrani
2022-01-19 10:13:13 +00:00
Steve Loughran
d8ab84275e
HADOOP-18068. upgrade AWS SDK to 1.12.132 (#3864)
With this update, the versions of key shaded dependencies are

  jackson    2.12.3
  httpclient 4.5.13

Contributed by Steve Loughran
2022-01-18 10:31:28 +00:00
Steve Loughran
14ba19af06
HADOOP-17409. Remove s3guard from S3A module (#3534)
Completely removes S3Guard support from the S3A codebase.

If the connector is configured to use any metastore other than
the null and local stores (i.e. DynamoDB is selected) the s3a client
will raise an exception and refuse to initialize.

This is to ensure that there is no mix of S3Guard enabled and disabled
deployments with the same configuration but different hadoop releases
-it must be turned off completely.

The "hadoop s3guard" command has been retained -but the supported
subcommands have been reduced to those which are not purely S3Guard
related: "bucket-info" and "uploads".

This is major change in terms of the number of files
changed; before cherry picking subsequent s3a patches into
older releases, this patch will probably need backporting
first.

Goodbye S3Guard, your work is done. Time to die.

Contributed by Steve Loughran.
2022-01-17 18:08:57 +00:00
monthonk
b27732c69b
HADOOP-14334. S3 SSEC tests to downgrade when running against a mandatory encryption object store (#3870)
Contributed by Monthon Klongklaew
2022-01-09 18:01:47 +00:00
Viraj Jasani
08c803ea30
MAPREDUCE-7371. DistributedCache alternative APIs should not use DistributedCache APIs internally (#3855) 2022-01-09 00:18:10 +09:00
Ayush Saxena
657a2882e9
HADOOP-18056. DistCp: Filter duplicates in the source paths. (#3825). Contributed by Ayush Saxena.
Reviewed-by: tomscut <litao@bigo.sg>
Reviewed-by: Steve Loughran <stevel@apache.org>
2022-01-05 23:53:07 +05:30
Akira Ajisaka
dba139cd0f
HADOOP-18045. Disable TestDynamometerInfra (#3829)
Reviewed-by: Fei Hui <feihui.ustc@gmail.com>
2021-12-28 13:22:12 +09:00
Ashutosh Gupta
ebdbe7eb82
HADOOP-18057. Fix typo: validateEncrytionSecrets -> validateEncryptionSecrets (#3826) 2021-12-27 16:51:17 +08:00
Viraj Jasani
04b6b9a87b
HADOOP-16908. Prune Jackson 1 from the codebase and restrict it's usage for future (#3789)
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
2021-12-20 16:01:34 +09:00
Szilard Nemeth
a967033a9f
YARN-10427. Duplicate Job IDs in SLS output (#3809). Contributed by Szilard Nemeth 2021-12-17 00:34:16 +01:00
Akira Ajisaka
9b9e2ef87f
HADOOP-18040. Use maven.test.failure.ignore instead of ignoreTestFailure (#3774)
Reviewed-by: Masatake Iwasaki <iwasakims@apache.org>
2021-12-10 01:36:31 +09:00
GuoPhilipse
c65c87f211
HADOOP-18026. Fix default value of Magic committer (#3723)
Contributed by guophilipse
2021-11-29 15:50:30 +00:00
Viraj Jasani
215388beea
HADOOP-18022. Add restrict-imports-enforcer-rule for Guava Preconditions and remove remaining usages (#3712)
Reviewed-by: Akira Ajisaka <aajisaka@apache.org>
Signed-off-by: Takanobu Asanuma <tasanuma@apache.org>
2021-11-29 17:37:30 +09:00
Steve Loughran
98fe0d0fc3
HADOOP-17979. Add Interface EtagSource to allow FileStatus subclasses to provide etags (#3633)
Contributed by Steve Loughran
2021-11-24 17:33:12 +00:00
Mehakmeet Singh
a35f7dec25
HADOOP-18016. Make certain methods LimitedPrivate in S3AUtils.java (#3685)
Contributed By: Mehakmeet Singh
2021-11-24 13:32:59 +05:30
Andrew Chung
5b1b2c8ef6
YARN-11003. Make RMNode aware of all (OContainer inclusive) allocated resources (#3646) 2021-11-23 13:20:08 -08:00
Viraj Jasani
c7ec1897c4
HADOOP-18018. unguava: remove Preconditions from hadoop-tools modules (#3688) 2021-11-23 13:34:10 +09:00
Steve Loughran
3391b69692
HADOOP-18002. ABFS rename idempotency broken -remove recovery (#3641)
Cut modtime-based rename recovery as object modification time
is not updated during rename operation.
Applications will have to use etag API of HADOOP-17979
and implement it themselves.

Why not do the HEAD and etag recovery in ABFS client?
Cuts the IO capacity in half so kills job commit performance.
The manifest committer of MAPREDUCE-7341 will do this recovery
and act as the reference implementation of the algorithm.

Contributed by: Steve Loughran
2021-11-16 16:47:44 +05:30
Steve Loughran
45f164a854 Revert "HADOOP-17873. ABFS: Fix transient failures in ITestAbfsStreamStatistics and ITestAbfsRestOperationException (#3341)"
This reverts commit 82658a22d6.
2021-11-05 14:21:15 +00:00
sumangala-patki
e1ac10ceae
HADOOP-17863. ABFS: Fix compiler deprecation warning in TextFileBasedIdentityHandler (#3332)
Closes #3332 

Contributed by Sumangala Patki
2021-11-05 12:53:51 +00:00