Commit Graph

5407 Commits

Author SHA1 Message Date
GuoPhilipse
7512714475
HDFS-16449. Fix hadoop web site release notes and changelog not available (#3967)
Reviewed-by: Ayush Saxena <ayushsaxena@apache.org>
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
(cherry picked from commit b68964336d)
2022-02-14 05:40:16 +09:00
daimin
9071c9646c
Fix thread safety of EC decoding during concurrent preads (#3881)
(cherry picked from commit 0e74f1e467)
2022-02-11 10:20:45 +08:00
Steve Loughran
088684ec60
HADOOP-18091. S3A auditing leaks memory through ThreadLocal references (#3930)
Adds a new map type WeakReferenceMap, which stores weak
references to values, and a WeakReferenceThreadMap subclass
to more closely resemble a thread local type, as it is a
map of threadId to value.

Construct it with a factory method and optional callback
for notification on loss and regeneration.

 WeakReferenceThreadMap<WrappingAuditSpan> activeSpan =
      new WeakReferenceThreadMap<>(
          (k) -> getUnbondedSpan(),
          this::noteSpanReferenceLost);

This is used in ActiveAuditManagerS3A for span tracking.

Relates to
* HADOOP-17511. Add an Audit plugin point for S3A
* HADOOP-18094. Disable S3A auditing by default.

Contributed by Steve Loughran.

Change-Id: Ibf7bb082fd47298f7ebf46d92f56e80ca9b2aaf8
2022-02-10 12:33:40 +00:00
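
Note on HADOOP-18091: the idea of a map holding weak references to values, keyed by thread ID, with a factory for regeneration and a callback on loss, can be sketched roughly as below. This is a minimal illustration and not Hadoop's WeakReferenceMap or WeakReferenceThreadMap; every class and method name here (WeakValueMapSketch, ThreadKeyedMapSketch, getForCurrentThread, setForCurrentThread) is hypothetical, and the get() path is deliberately not atomic.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;
import java.util.function.Function;

/** Illustrative sketch only; not Hadoop's WeakReferenceMap. */
class WeakValueMapSketch<K, V> {
  private final Map<K, WeakReference<V>> entries = new ConcurrentHashMap<>();
  private final Function<K, V> factory;     // builds a value on demand
  private final Consumer<K> referenceLost;  // told when a value was collected

  WeakValueMapSketch(Function<K, V> factory, Consumer<K> referenceLost) {
    this.factory = factory;
    this.referenceLost = referenceLost;
  }

  /** Store a value; the map itself only holds it weakly. */
  void put(K key, V value) {
    entries.put(key, new WeakReference<>(value));
  }

  /** Return the live value, regenerating it through the factory if needed. */
  V get(K key) {
    WeakReference<V> ref = entries.get(key);
    V value = (ref == null) ? null : ref.get();
    if (value != null) {
      return value;
    }
    if (ref != null) {
      // An entry existed, but its referent has been garbage collected.
      referenceLost.accept(key);
    }
    V created = factory.apply(key);
    put(key, created);
    return created;
  }
}

/** Keys the map by thread ID so it behaves roughly like a thread local. */
class ThreadKeyedMapSketch<V> extends WeakValueMapSketch<Long, V> {
  ThreadKeyedMapSketch(Function<Long, V> factory, Consumer<Long> referenceLost) {
    super(factory, referenceLost);
  }

  V getForCurrentThread() {
    return get(Thread.currentThread().getId());
  }

  void setForCurrentThread(V value) {
    put(Thread.currentThread().getId(), value);
  }
}
```

Because the map holds values only weakly, a caller must keep its own strong reference for as long as the value is needed; otherwise the next lookup may go through the factory again.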
Abhishek Das
8b03514eaf HADOOP-18100: Change scope of inner classes in InodeTree to make them accessible outside package
Fixes #3950

Signed-off-by: Owen O'Malley <omalley@apache.org>

Cherry-picked from 3684c7f6 by Owen O'Malley
2022-02-04 12:13:10 -08:00
Petre Bogdan Stolojan
664075f35d
HADOOP-17198. Support S3 Access Points (#3260)
Add support for S3 Access Points. This provides extra security as it
ensures applications are not working with buckets belonging to third parties.

To bind a bucket to an access point, set the access point (ap) ARN,
which must be done for each specific bucket, using the pattern

fs.s3a.bucket.$BUCKET.accesspoint.arn = ARN

* The global/bucket option `fs.s3a.accesspoint.required` can be set to
mandate that buckets declare their access point.
* This is not compatible with S3Guard.

Consult the documentation for further details.

Contributed by Bogdan Stolojan

(this commit contains the changes to TestArnResource from HADOOP-18068,
 "upgrade AWS SDK to 1.12.132" so that it works with the later SDK.)

Change-Id: I3fac213e52ca6ec1c813effb8496c353964b8e1b
2022-02-04 16:21:35 +00:00
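
Note on HADOOP-17198: the per-bucket key pattern and the "required" option described in the message can be illustrated with the sketch below. It assumes a plain Map standing in for the Hadoop configuration, reads only the global form of `fs.s3a.accesspoint.required`, and uses hypothetical names (AccessPointConfigSketch, arnKey, resolveArn); it is not the S3A connector's code.

```java
import java.util.Map;
import java.util.Optional;

/** Illustrative sketch only; a Map stands in for the real configuration. */
class AccessPointConfigSketch {
  static final String REQUIRED_KEY = "fs.s3a.accesspoint.required";

  /** Builds the per-bucket key: fs.s3a.bucket.$BUCKET.accesspoint.arn */
  static String arnKey(String bucket) {
    return "fs.s3a.bucket." + bucket + ".accesspoint.arn";
  }

  /**
   * Returns the ARN bound to the bucket, if any. When the global "required"
   * option is true and no ARN is declared, the lookup is rejected.
   */
  static Optional<String> resolveArn(Map<String, String> conf, String bucket) {
    Optional<String> arn = Optional.ofNullable(conf.get(arnKey(bucket)))
        .map(String::trim)
        .filter(s -> !s.isEmpty());
    boolean required =
        Boolean.parseBoolean(conf.getOrDefault(REQUIRED_KEY, "false"));
    if (required && !arn.isPresent()) {
      throw new IllegalStateException(
          "No access point ARN declared for bucket " + bucket);
    }
    return arn;
  }
}
```

For a bucket named `sales`, the key the sketch reads is `fs.s3a.bucket.sales.accesspoint.arn`; both the bucket name and any ARN value placed under it are made up for illustration.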
Xing Lin
d613776b64
HADOOP-18093. Better exception handling for testFileStatusOnMountLink() in ViewFsBaseTest.java (#3918). Contributed by Xing Lin. (#3929)
Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
(cherry picked from commit 0d17b629ff)
2022-01-26 21:55:32 +05:30
Steve Loughran
8ccc586af6
HADOOP-17409. Remove s3guard from S3A module (#3534)
Completely removes S3Guard support from the S3A codebase.

If the connector is configured to use any metastore other than
the null and local stores (i.e. DynamoDB is selected) the s3a client
will raise an exception and refuse to initialize.

This is to ensure that there is no mix of S3Guard enabled and disabled
deployments with the same configuration but different hadoop releases
-it must be turned off completely.

The "hadoop s3guard" command has been retained -but the supported
subcommands have been reduced to those which are not purely S3Guard
related: "bucket-info" and "uploads".

This is a major change in terms of the number of files
changed; before cherry picking subsequent s3a patches into
older releases, this patch will probably need backporting
first.

Goodbye S3Guard, your work is done. Time to die.

Contributed by Steve Loughran.
2022-01-18 18:04:48 +00:00
Viraj Jasani
5e9e779ed2
HADOOP-17152. Provide Hadoop's own Lists utility to reduce dependency on Guava (#3061)
Change-Id: I52e55b9d9826ad661e9ad7dc15f007aa168f0fe1
Reviewed-by: Wei-Chiu Chuang <weichiu@apache.org>
Signed-off-by: Takanobu Asanuma <tasanuma@apache.org>
2022-01-18 11:57:25 +00:00
ahmarsuhail
6649c2813e
HADOOP-16223. Remove misleading fs.s3a.delegation.tokens.enabled prompt (#3879)
Contributed by Ahmar Suhail

Change-Id: I6a33043831a059325c58b0f76c925e52c6ae14f7
2022-01-12 17:27:53 +00:00
Mukund Thakur
60c1c6d93c HADOOP-18065. ExecutorHelper.logThrowableFromAfterExecute() is too noisy. (#3860)
Downgrading warn logs to debug in case of InterruptedException

Contributed By: Mukund Thakur
2022-01-10 13:52:02 +05:30
Wei-Chiu Chuang
350b51f287 Make upstream aware of 3.3.1 release
2022-01-04 14:48:49 -08:00
jianghuazhu
fd75c4a158 HADOOP-18063. Remove unused import AbstractJavaKeyStoreProvider in Shell class. (#3846)
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
(cherry picked from commit 7398a0f1b2)
2022-01-04 11:26:07 +09:00
Ashutosh Gupta
6535a183b2 HDFS-14099. Unknown frame descriptor when decompressing multiple frames (#3836)
Co-authored-by: xuzq <xuzengqiang@kuaishou.com>
Co-authored-by: Ashutosh Gupta <ashugpt@amazon.com>
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
(cherry picked from commit caab29ec88)
2021-12-28 22:00:48 +09:00
Akira Ajisaka
cd30687a15 Revert "HDFS-14099. Unknown frame descriptor when decompressing multiple frames (#3836)"
This reverts commit 05b43f2057.
2021-12-28 21:51:40 +09:00
Ashutosh Gupta
05b43f2057 HDFS-14099. Unknown frame descriptor when decompressing multiple frames (#3836)
Co-authored-by: xuzq <xuzengqiang@kuaishou.com>
Co-authored-by: Ashutosh Gupta <ashugpt@amazon.com>
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
(cherry picked from commit caab29ec88)
2021-12-28 21:49:06 +09:00
Dhananjay Badaya
e7b1f87665 HADOOP-13500. Synchronizing iteration of Configuration properties object (#3775)
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
(cherry picked from commit 4483607a4e)
2021-12-17 16:07:46 +09:00
Akira Ajisaka
35c5c6bb83 HADOOP-18040. Use maven.test.failure.ignore instead of ignoreTestFailure (#3774)
Reviewed-by: Masatake Iwasaki <iwasakims@apache.org>
(cherry picked from commit 9b9e2ef87f)

 Conflicts:
	hadoop-tools/hadoop-federation-balance/pom.xml
2021-12-10 01:38:26 +09:00
Steve Loughran
67eaf5aa9f
HADOOP-17979. Add Interface EtagSource to allow FileStatus subclasses to provide etags (#3633)
Contributed by Steve Loughran

Change-Id: I596205d788f623114c12962941445432e2036c34
2021-11-29 16:20:55 +00:00
smarthan
bc40a41064 HADOOP-18023. Allow cp command to run with multi threads. (#3721)
(cherry picked from commit 932a78fe38)
2021-11-29 12:47:02 +00:00
Viraj Jasani
6094e1ec9a
HDFS-16171. De-flake testDecommissionStatus (#3280)
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
(cherry picked from commit 6342d5e523)

 Conflicts:
	hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDecommissioningStatus.java
2021-11-25 14:15:10 +09:00
Istvan Fajth
48e95d8109 HADOOP-17975. Fallback to simple auth does not work for a secondary DistributedFileSystem instance. (#3579)
(cherry picked from commit ae3ba45db5)
2021-11-24 10:47:49 +00:00
smarthan
cbb3ba135c HADOOP-17998. Allow get command to run with multi threads. (#3645)
(cherry picked from commit 63018dc73f)

 Conflicts:
	hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/CopyCommands.java
2021-11-22 12:14:32 +00:00
Abhishek Das
f456dc1837 HADOOP-17999. No-op implementation of setWriteChecksum and setVerifyChecksum in ViewFileSystem. Contributed by Abhishek Das. (#3639)
(cherry picked from commit 54a1d78e16)
2021-11-16 22:40:24 -08:00
litao
026d5860cb
HDFS-16315. Add metrics related to Transfer and NativeCopy for DataNode (#3666)
2021-11-17 11:06:53 +09:00
Chao Sun
e079fa6577 Preparing for 3.3.3 development
2021-11-16 16:02:34 -08:00
litao
340dee4469
HDFS-16319. Add metrics doc for ReadLockLongHoldCount and WriteLockLongHoldCount (#3653). Contributed by tomscut.
Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
2021-11-14 20:12:13 +05:30
litao
421013825f
HADOOP-18005. Correct log format for LdapGroupsMapping (#3647). Contributed by tomscut.
Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
2021-11-14 20:01:09 +05:30
Steve Loughran
a68671eaf7
HADOOP-17928. Syncable: S3A to warn and downgrade (#3585)
This switches the default behavior of S3A output streams
to warning when Syncable.hsync() or hflush() is
called; it's not considered an error unless the defaults
are overridden.

This avoids breaking applications which call the APIs,
at the risk of people trying to use S3 as a safe store
of streamed data (HBase WALs, audit logs etc).

Contributed by Steve Loughran.

Change-Id: I0a02ec1e622343619f147f94158c18928a73a885
2021-11-04 14:41:42 +00:00
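
Note on HADOOP-17928: the "warn and downgrade" behavior can be pictured with the sketch below. It assumes a single boolean flag chosen at construction time; the class name SyncableDowngradeSketch, the flag, and the warning counter are hypothetical, and the real stream's configuration option is not reproduced here.

```java
/** Illustrative sketch only; not the S3A output stream. */
class SyncableDowngradeSketch {
  private final boolean downgrade;  // true: warn and continue; false: fail
  private int warnings;

  SyncableDowngradeSketch(boolean downgrade) {
    this.downgrade = downgrade;
  }

  /** Called where a stream would normally flush data durably. */
  void hsync() {
    handleUnsupported("hsync()");
  }

  void hflush() {
    handleUnsupported("hflush()");
  }

  private void handleUnsupported(String operation) {
    if (!downgrade) {
      throw new UnsupportedOperationException(
          "Syncable." + operation + " is not supported by this stream");
    }
    // Downgraded mode: record and report the call, but do not fail.
    warnings++;
    System.err.println("WARN: Syncable." + operation
        + " called on a stream which does not support it");
  }

  int warningCount() {
    return warnings;
  }
}
```

The trade-off is the one the message states: applications that call these APIs keep working, but nothing stops them from treating the store as a safe destination for streamed data.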
Mehakmeet Singh
bd077c3814
HADOOP-17953. S3A: Tests to lookup global or per-bucket configuration for encryption algorithm (#3525)
Followup to S3-CSE work of HADOOP-13887

Contributed by Mehakmeet Singh
2021-10-21 12:03:50 +01:00
Szilard Nemeth
6f45666d0b HADOOP-17857. Check real user ACLs in addition to proxied user ACLs. Contributed by Eric Payne
(cherry picked from commit 5428d36b56)
2021-10-19 20:40:30 +00:00
Steve Loughran
b8f3e54ff7 HADOOP-17945. JsonSerialization raises EOFException reading JSON data stored on google GCS (#3501)
Contributed By: Steve Loughran
2021-10-19 15:36:10 +05:30
Xing Lin
af920f138b HADOOP-16532. Fix TestViewFsTrash to use the correct homeDir. Contributed by Xing Lin. (#3514)
(cherry picked from commit 97c0f96879)
2021-10-13 14:58:08 -07:00
Masatake Iwasaki
9e2936f8d1
HADOOP-17424. Replace HTrace with No-Op tracer (#3520)
(cherry picked from commit 1a205cc3ad)

 Conflicts:
	hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java
	hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
	hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTracing.java

Co-authored-by: Siyao Meng <50227127+smengcl@users.noreply.github.com>
2021-10-12 00:07:09 +09:00
Viraj Jasani
77ee5a4266
HADOOP-17950. Provide replacement for deprecated APIs of commons-io IOUtils (#3515)
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
(cherry picked from commit 8071dbb9c6)
2021-10-07 11:00:19 +09:00
Ahmed Hussein
2cdc6a245d HADOOP-17930. implement non-guava Precondition checkState (#3522)
Reviewed-by: Viraj Jasani <vjasani@apache.org>
Signed-off-by: Takanobu Asanuma <tasanuma@apache.org>
(cherry picked from commit c36f9402dc)
2021-10-07 10:57:20 +09:00
Mehakmeet Singh
769059c2f5
HADOOP-17871. S3A CSE: minor tuning (#3412)
This migrates the fs.s3a server-side encryption configuration options
to a name which covers client-side encryption too.

fs.s3a.server-side-encryption-algorithm becomes fs.s3a.encryption.algorithm
fs.s3a.server-side-encryption.key becomes fs.s3a.encryption.key

The existing keys remain valid, simply deprecated and remapped
to the new values. If you want server-side encryption options
to be picked up regardless of hadoop versions, use
the old keys.

(the old key also works for CSE, though as no version of Hadoop
with CSE support has shipped without this remapping, it's less
relevant)

Contributed by: Mehakmeet Singh

Change-Id: I51804b21b287dbce18864f0a6ad17126aba2b281
2021-10-05 11:39:25 +01:00
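
Note on HADOOP-17871: the old-to-new key remapping can be illustrated as a lookup that falls back to the deprecated name. The sketch assumes a plain Map in place of the Hadoop configuration and assumes the new key wins when both are set; that precedence, and the names EncryptionKeyLookupSketch, lookup, algorithm and key, are choices made for illustration only.

```java
import java.util.Map;

/** Illustrative sketch only; a Map stands in for the real configuration. */
class EncryptionKeyLookupSketch {
  static final String NEW_ALGORITHM = "fs.s3a.encryption.algorithm";
  static final String OLD_ALGORITHM = "fs.s3a.server-side-encryption-algorithm";
  static final String NEW_KEY = "fs.s3a.encryption.key";
  static final String OLD_KEY = "fs.s3a.server-side-encryption.key";

  /** Reads the new name first, then falls back to the deprecated one. */
  static String lookup(Map<String, String> conf, String newName, String oldName) {
    String value = conf.get(newName);
    return value != null ? value : conf.get(oldName);
  }

  static String algorithm(Map<String, String> conf) {
    return lookup(conf, NEW_ALGORITHM, OLD_ALGORITHM);
  }

  static String key(Map<String, String> conf) {
    return lookup(conf, NEW_KEY, OLD_KEY);
  }
}
```

A configuration that sets only the old names resolves to the same values, which is why the message suggests the old keys when the same settings must work across Hadoop versions.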
Mehakmeet Singh
aee975a136
HADOOP-13887. Support S3 client side encryption (S3-CSE) using AWS-SDK (#2706)
This (big!) patch adds support for client side encryption in AWS S3,
with keys managed by AWS-KMS.

Read the documentation in encryption.md very, very carefully before
use and consider it unstable.

S3-CSE is enabled in the existing configuration option
"fs.s3a.server-side-encryption-algorithm":

fs.s3a.server-side-encryption-algorithm=CSE-KMS
fs.s3a.server-side-encryption.key=<KMS_KEY_ID>

You cannot enable CSE and SSE in the same client, although
you can still enable a default SSE option in the S3 console.

* Filesystem list/get status operations subtract 16 bytes from the length
  of all files >= 16 bytes long to compensate for the padding which CSE
  adds.
* The SDK always warns about the specific algorithm chosen being
  deprecated. This algorithm is required for ranged GET requests
  (i.e. random IO) to work, so ignore the warning.
* Unencrypted files CANNOT BE READ.
  The entire bucket SHOULD be encrypted with S3-CSE.
* Uploading files may be a bit slower as blocks are now
  written sequentially.
* The Multipart Upload API is disabled when S3-CSE is active.

Contributed by Mehakmeet Singh

Change-Id: Ie1a27a036a39db66a67e9c6d33bc78d54ea708a0
2021-10-05 11:37:41 +01:00
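
Note on HADOOP-13887: the length adjustment described in the first bullet (subtract 16 bytes from files at least 16 bytes long) amounts to the arithmetic below. The class, constant and method names are hypothetical; only the 16-byte figure and the ">= 16" condition come from the message.

```java
/** Illustrative sketch only; not the S3A connector's code. */
class CseLengthSketch {
  /** Padding which the message says CSE adds to each object. */
  static final long CSE_PADDING_LENGTH = 16;

  /**
   * Length to report in list/get status calls for an object whose stored
   * length is storedLength. Shorter objects are reported unchanged.
   */
  static long reportedLength(long storedLength) {
    return storedLength >= CSE_PADDING_LENGTH
        ? storedLength - CSE_PADDING_LENGTH
        : storedLength;
  }
}
```

Under this rule an object stored as 1040 bytes is reported as 1024 bytes, while a 15-byte object is left at 15.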
Ahmed Hussein
31b44c519c
HADOOP-17929. implement non-guava Precondition checkArgument (#3473)
Reviewed-by: Viraj Jasani <vjasani@apache.org>
(cherry picked from commit 0c498f21de)
2021-10-01 16:49:07 +08:00
litao
5ed4274f38
HADOOP-17938. Print lockWarningThreshold in InstrumentedLock#logWarni… (#3485)
Reviewed-by: Hui Fei <ferhui@apache.org>
(cherry picked from commit 211db3fe08)
2021-10-01 10:24:33 +08:00
Chao Sun
6931b70a00
HADOOP-17936. Fix test failure after reverting HADOOP-16878 from branch-3.3 (#3478)
2021-09-27 13:56:44 -07:00
Chao Sun
ff26a7700d Revert "HADOOP-16878. FileUtil.copy() to throw IOException if the source and destination are the same (#2383)"
This reverts commit 54c40cbf49.
2021-09-23 15:04:27 -07:00
Mehakmeet Singh
8e5620cd9e
HADOOP-17195. ABFS: OutOfMemory error while uploading huge files (#3446)
Addresses the problem of processes running out of memory when
there are many ABFS output streams queuing data to upload,
especially when the network upload bandwidth is less than the rate
data is generated.

ABFS Output streams now buffer their blocks of data to
"disk", "bytebuffer" or "array", as set in
"fs.azure.data.blocks.buffer"

When buffering via disk, the location for temporary storage
is set in "fs.azure.buffer.dir"

For safe scaling: use "disk" (default); for performance, when
confident that upload bandwidth will never be a bottleneck,
experiment with the memory options.

The number of blocks a single stream can have queued for uploading
is set in "fs.azure.block.upload.active.blocks".
The default value is 20.

Contributed by Mehakmeet Singh.
2021-09-22 11:19:16 +01:00
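
Note on HADOOP-17195: capping how many blocks one stream may have outstanding is what keeps memory bounded when data is produced faster than it uploads. One common way to get that effect is a semaphore in front of an executor, sketched below. This is not claimed to be how the ABFS output stream implements `fs.azure.block.upload.active.blocks`; the class BoundedUploadSketch and its methods are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Illustrative sketch only; not the ABFS output stream. */
class BoundedUploadSketch {
  private final Semaphore permits;     // one permit per outstanding block
  private final ExecutorService pool;  // threads performing the uploads

  BoundedUploadSketch(int activeBlocks, int threads) {
    this.permits = new Semaphore(activeBlocks);
    this.pool = Executors.newFixedThreadPool(threads);
  }

  /** Blocks the writer until fewer than activeBlocks uploads are outstanding. */
  void submit(Runnable upload) {
    try {
      permits.acquire();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while queuing a block", e);
    }
    try {
      pool.execute(() -> {
        try {
          upload.run();
        } finally {
          permits.release();  // free the slot whether or not the upload failed
        }
      });
    } catch (RuntimeException e) {
      permits.release();      // the task was never accepted
      throw e;
    }
  }

  /** Stops accepting work and waits for outstanding uploads to finish. */
  boolean shutdownAndWait(long timeoutSeconds) {
    pool.shutdown();
    try {
      return pool.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }
}
```

With a limit of 20, the default the message gives, the writer stalls on the 21st block until an earlier upload completes, so the buffered data per stream stays at most 20 blocks.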
Neil
9700d98eac
HADOOP-17893. Improve PrometheusSink for Namenode TopMetrics (#3426)
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
(cherry picked from commit ae2c5ccfcf)
2021-09-21 10:44:51 +09:00
Rintaro Ikeda
92af6cd3bc HADOOP-17919. Fix command line example in Hadoop Cluster Setup documentation. (#3453)
(cherry picked from commit 607c20c612)
2021-09-17 13:34:07 +00:00
Steve Loughran
9188fa8cce
HADOOP-17126. implement non-guava Precondition checkNotNull
This adds a new class org.apache.hadoop.util.Preconditions which is

* @Private/@Unstable
* Intended to allow us to move off Google Guava
* Is designed to be trivially backportable
  (i.e. contains no references to guava classes internally)

Please use this instead of the guava equivalents, where possible.

Contributed by: Ahmed Hussein

Change-Id: Ic392451bcfe7d446184b7c995734bcca8c07286e
2021-09-17 11:06:59 +01:00
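
Note on HADOOP-17126: a Guava-free precondition helper needs nothing beyond the JDK, which is what makes it easy to backport. The sketch below shows the general shape, covering checkNotNull along with the checkArgument and checkState variants added by HADOOP-17929 and HADOOP-17930 in this listing. It is not the contents of org.apache.hadoop.util.Preconditions; the class name PreconditionsSketch and the exact signatures are assumptions.

```java
/** Illustrative sketch only; not org.apache.hadoop.util.Preconditions. */
final class PreconditionsSketch {
  private PreconditionsSketch() {
  }

  /** Returns obj if it is non-null, otherwise throws NullPointerException. */
  static <T> T checkNotNull(T obj, String message) {
    if (obj == null) {
      throw new NullPointerException(message);
    }
    return obj;
  }

  /** Throws IllegalArgumentException when a caller-supplied value is invalid. */
  static void checkArgument(boolean expression, String message) {
    if (!expression) {
      throw new IllegalArgumentException(message);
    }
  }

  /** Throws IllegalStateException when the object is in the wrong state. */
  static void checkState(boolean expression, String message) {
    if (!expression) {
      throw new IllegalStateException(message);
    }
  }
}
```

A call such as `PreconditionsSketch.checkNotNull(path, "path is null")` reads the same as the Guava equivalent, so call sites can be migrated by changing the import.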
Adam Binford
59a955dfa0
HADOOP-17804. Expose prometheus metrics only after a flush and dedupe with tag values (#3369)
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
(cherry picked from commit 4ced012f33)
2021-09-09 16:51:04 +09:00
Steve Loughran
a2242df10a
HADOOP-17894. CredentialProviderFactory.getProviders() recursion loading JCEKS file from S3A (#3393)
* CredentialProviderFactory to detect and report on recursion.
* S3AFS to remove incompatible providers.
* Integration Test for this.

Contributed by Steve Loughran.

Change-Id: Ia247b3c9fe8488ffdb7f57b40eb6e37c57e522ef
2021-09-08 17:00:20 +01:00
Masatake Iwasaki
76393e1359 HADOOP-17899. Avoid using implicit dependency on junit-jupiter-api. (#3399)
(cherry picked from commit ce7a5bfbd3)
2021-09-08 09:11:39 +00:00
Yellow Flash
09e8e5c5cb
HADOOP-17870. Http Filesystem to qualify relative paths. (#3338)
Contributed by Yellowflash

Change-Id: I217da06a1a2e5c0ca2b324f8e21baa0846f64858
2021-09-07 10:54:35 +01:00
Chris Nauroth
cc90b4f987 HADOOP-15129. Datanode caches namenode DNS lookup failure and cannot startup (#3348)
Co-authored-by:  Karthik Palaniappan

Change-Id: Id079a5319e5e83939d5dcce5fb9ebe3715ee864f
2021-09-03 18:48:07 +00:00