5573 Commits

Author SHA1 Message Date
Steve Loughran
67eaf5aa9f
HADOOP-17979. Add Interface EtagSource to allow FileStatus subclasses to provide etags (#3633)
Contributed by Steve Loughran

Change-Id: I596205d788f623114c12962941445432e2036c34
2021-11-29 16:20:55 +00:00
smarthan
bc40a41064 HADOOP-18023. Allow cp command to run with multi threads. (#3721)
(cherry picked from commit 932a78fe38)
2021-11-29 12:47:02 +00:00
Viraj Jasani
6094e1ec9a
HDFS-16171. De-flake testDecommissionStatus (#3280)
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
(cherry picked from commit 6342d5e523)

 Conflicts:
	hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDecommissioningStatus.java
2021-11-25 14:15:10 +09:00
Istvan Fajth
48e95d8109 HADOOP-17975. Fallback to simple auth does not work for a secondary DistributedFileSystem instance. (#3579)
(cherry picked from commit ae3ba45db5)
2021-11-24 10:47:49 +00:00
smarthan
cbb3ba135c HADOOP-17998. Allow get command to run with multi threads. (#3645)
(cherry picked from commit 63018dc73f)

 Conflicts:
	hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/CopyCommands.java
2021-11-22 12:14:32 +00:00
Abhishek Das
f456dc1837 HADOOP-17999. No-op implementation of setWriteChecksum and setVerifyChecksum in ViewFileSystem. Contributed by Abhishek Das. (#3639)
(cherry picked from commit 54a1d78e16)
2021-11-16 22:40:24 -08:00
litao
026d5860cb
HDFS-16315. Add metrics related to Transfer and NativeCopy for DataNode (#3666)
2021-11-17 11:06:53 +09:00
Chao Sun
e079fa6577 Preparing for 3.3.3 development
2021-11-16 16:02:34 -08:00
litao
340dee4469
HDFS-16319. Add metrics doc for ReadLockLongHoldCount and WriteLockLongHoldCount (#3653). Contributed by tomscut.
Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
2021-11-14 20:12:13 +05:30
litao
421013825f
HADOOP-18005. Correct log format for LdapGroupsMapping (#3647). Contributed by tomscut.
Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
2021-11-14 20:01:09 +05:30
Steve Loughran
a68671eaf7
HADOOP-17928. Syncable: S3A to warn and downgrade (#3585)
This switches the default behavior of S3A output streams
to logging a warning when Syncable.hsync() or hflush() is
called; the call is not considered an error unless the defaults
are overridden.

This avoids breaking applications which call the APIs,
at the risk of people trying to use S3 as a safe store
of streamed data (HBase WALs, audit logs etc).

Contributed by Steve Loughran.

Change-Id: I0a02ec1e622343619f147f94158c18928a73a885
2021-11-04 14:41:42 +00:00
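
The warn-and-downgrade behavior described in HADOOP-17928 can be illustrated with a minimal sketch. This is not the S3A code: the class `ObjectStoreOutputStream`, the flag `downgrade_syncable`, and the exception `UnsupportedSyncError` are hypothetical names chosen for illustration. It only shows the pattern of logging a warning by default and raising when the default is overridden.

```python
import logging

LOG = logging.getLogger("syncable.sketch")


class UnsupportedSyncError(Exception):
    """Hypothetical error raised when sync calls are treated as failures."""


class ObjectStoreOutputStream:
    """Hypothetical stream that cannot durably sync data before close()."""

    def __init__(self, downgrade_syncable: bool = True):
        # Assumption: True (the default) means "warn only", False means "fail".
        self.downgrade_syncable = downgrade_syncable
        self.sync_warnings = 0

    def hsync(self) -> None:
        self._handle_syncable_call("hsync")

    def hflush(self) -> None:
        self._handle_syncable_call("hflush")

    def _handle_syncable_call(self, operation: str) -> None:
        if not self.downgrade_syncable:
            raise UnsupportedSyncError(
                f"{operation}() is not supported by this stream")
        # Default path: record and log the call, but do not fail the caller.
        self.sync_warnings += 1
        LOG.warning("%s() called: data is not persisted until close()",
                    operation)
```

An application that relies on hsync() for durability (the commit message mentions HBase WALs and audit logs) would want the strict mode, so that the unsupported call fails loudly instead of being logged.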
Mehakmeet Singh
bd077c3814
HADOOP-17953. S3A: Tests to lookup global or per-bucket configuration for encryption algorithm (#3525)
Followup to S3-CSE work of HADOOP-13887

Contributed by Mehakmeet Singh
2021-10-21 12:03:50 +01:00
Szilard Nemeth
6f45666d0b HADOOP-17857. Check real user ACLs in addition to proxied user ACLs. Contributed by Eric Payne
(cherry picked from commit 5428d36b56)
2021-10-19 20:40:30 +00:00
Steve Loughran
b8f3e54ff7 HADOOP-17945. JsonSerialization raises EOFException reading JSON data stored on google GCS (#3501)
Contributed By: Steve Loughran
2021-10-19 15:36:10 +05:30
Xing Lin
af920f138b HADOOP-16532. Fix TestViewFsTrash to use the correct homeDir. Contributed by Xing Lin. (#3514)
(cherry picked from commit 97c0f96879)
2021-10-13 14:58:08 -07:00
Masatake Iwasaki
9e2936f8d1
HADOOP-17424. Replace HTrace with No-Op tracer (#3520)
(cherry picked from commit 1a205cc3ad)

 Conflicts:
	hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java
	hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
	hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTracing.java

Co-authored-by: Siyao Meng <50227127+smengcl@users.noreply.github.com>
2021-10-12 00:07:09 +09:00
Viraj Jasani
77ee5a4266
HADOOP-17950. Provide replacement for deprecated APIs of commons-io IOUtils (#3515)
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
(cherry picked from commit 8071dbb9c6)
2021-10-07 11:00:19 +09:00
Ahmed Hussein
2cdc6a245d HADOOP-17930. implement non-guava Precondition checkState (#3522)
Reviewed-by: Viraj Jasani <vjasani@apache.org>
Signed-off-by: Takanobu Asanuma <tasanuma@apache.org>
(cherry picked from commit c36f9402dc)
2021-10-07 10:57:20 +09:00
Viraj Jasani
fd3069d70c HADOOP-17947. Additional element types for VisibleForTesting (ADDENDUM) (#3521)
(cherry picked from commit 783e4805e7)
2021-10-06 02:18:54 +09:00
Mehakmeet Singh
769059c2f5
HADOOP-17871. S3A CSE: minor tuning (#3412)
This migrates the fs.s3a server-side encryption configuration options
to names which cover client-side encryption too.

fs.s3a.server-side-encryption-algorithm becomes fs.s3a.encryption.algorithm
fs.s3a.server-side-encryption.key becomes fs.s3a.encryption.key

The existing keys remain valid; they are simply deprecated and remapped
to the new keys. If you want server-side encryption options
to be picked up regardless of Hadoop version, use
the old keys.

(the old key also works for CSE, though as no version of Hadoop
with CSE support has shipped without this remapping, it's less
relevant)

Contributed by: Mehakmeet Singh

Change-Id: I51804b21b287dbce18864f0a6ad17126aba2b281
2021-10-05 11:39:25 +01:00
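
The key remapping described in HADOOP-17871 can be sketched as a small lookup table. This is an illustration only, not Hadoop's configuration deprecation mechanism: the function `normalize` is hypothetical, and the rule that an explicitly set new key wins over the old one is an assumption of this sketch. The key names are those listed in the commit message.

```python
from typing import Dict, Mapping

# Old key -> new key, as listed in the commit message.
DEPRECATED_KEYS = {
    "fs.s3a.server-side-encryption-algorithm": "fs.s3a.encryption.algorithm",
    "fs.s3a.server-side-encryption.key": "fs.s3a.encryption.key",
}


def normalize(conf: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of conf with deprecated keys remapped to the new names.

    Assumption of this sketch: if both the old and the new key are set,
    the value of the new key is kept.
    """
    # Copy everything that is not a deprecated key.
    result = {k: v for k, v in conf.items() if k not in DEPRECATED_KEYS}
    # Remap deprecated keys unless the new key was set explicitly.
    for old_key, new_key in DEPRECATED_KEYS.items():
        if old_key in conf and new_key not in result:
            result[new_key] = conf[old_key]
    return result
```

For example, a configuration that sets only `fs.s3a.server-side-encryption-algorithm` to `CSE-KMS` is normalized to one that sets `fs.s3a.encryption.algorithm` to `CSE-KMS`.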
Mehakmeet Singh
aee975a136
HADOOP-13887. Support S3 client side encryption (S3-CSE) using AWS-SDK (#2706)
This (big!) patch adds support for client side encryption in AWS S3,
with keys managed by AWS-KMS.

Read the documentation in encryption.md very, very carefully before
use and consider it unstable.

S3-CSE is enabled in the existing configuration option
"fs.s3a.server-side-encryption-algorithm":

fs.s3a.server-side-encryption-algorithm=CSE-KMS
fs.s3a.server-side-encryption.key=<KMS_KEY_ID>

You cannot enable CSE and SSE in the same client, although
you can still enable a default SSE option in the S3 console.

* Filesystem list/get status operations subtract 16 bytes from the length
  of all files >= 16 bytes long to compensate for the padding which CSE
  adds.
* The SDK always warns that the specific algorithm chosen is
  deprecated. That algorithm is required for ranged GET requests
  (i.e. random IO) to work, so ignore the warning.
* Unencrypted files CANNOT BE READ.
  The entire bucket SHOULD be encrypted with S3-CSE.
* Uploading files may be a bit slower as blocks are now
  written sequentially.
* The Multipart Upload API is disabled when S3-CSE is active.

Contributed by Mehakmeet Singh

Change-Id: Ie1a27a036a39db66a67e9c6d33bc78d54ea708a0
2021-10-05 11:37:41 +01:00
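
The length adjustment described in the first bullet of HADOOP-13887 amounts to the following arithmetic. This is a sketch with hypothetical names (`CSE_PADDING_BYTES`, `reported_length`), not the S3A implementation.

```python
# Padding that client-side encryption adds, per the commit message.
CSE_PADDING_BYTES = 16


def reported_length(stored_length: int) -> int:
    """Length shown by list/get status for an object of stored_length bytes.

    Objects of at least 16 bytes have the padding subtracted; shorter
    objects are reported unchanged.
    """
    if stored_length >= CSE_PADDING_BYTES:
        return stored_length - CSE_PADDING_BYTES
    return stored_length
```

Under this rule an object stored as 1040 bytes is reported as 1024 bytes, while a 15-byte object is reported as 15 bytes because it is below the threshold.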
Viraj Jasani
da011baf85 HADOOP-17947. Provide alternative to Guava VisibleForTesting (#3505)
Reviewed-by: Steve Loughran <stevel@apache.org>
Signed-off-by: Takanobu Asanuma <tasanuma@apache.org>
(cherry picked from commit 5b1d594005)
2021-10-05 10:01:07 +09:00
Ahmed Hussein
31b44c519c
HADOOP-17929. implement non-guava Precondition checkArgument (#3473)
Reviewed-by: Viraj Jasani <vjasani@apache.org>
(cherry picked from commit 0c498f21de)
2021-10-01 16:49:07 +08:00
litao
5ed4274f38
HADOOP-17938. Print lockWarningThreshold in InstrumentedLock#logWarni… (#3485)
Reviewed-by: Hui Fei <ferhui@apache.org>
(cherry picked from commit 211db3fe08)
2021-10-01 10:24:33 +08:00
Chao Sun
6931b70a00
HADOOP-17936. Fix test failure after reverting HADOOP-16878 from branch-3.3 (#3478)
2021-09-27 13:56:44 -07:00
Chao Sun
ff26a7700d Revert "HADOOP-16878. FileUtil.copy() to throw IOException if the source and destination are the same (#2383)"
This reverts commit 54c40cbf49.
2021-09-23 15:04:27 -07:00
Mehakmeet Singh
8e5620cd9e
HADOOP-17195. ABFS: OutOfMemory error while uploading huge files (#3446)
Addresses the problem of processes running out of memory when
there are many ABFS output streams queuing data to upload,
especially when the network upload bandwidth is less than the rate
data is generated.

ABFS Output streams now buffer their blocks of data to
"disk", "bytebuffer" or "array", as set in
"fs.azure.data.blocks.buffer"

When buffering via disk, the location for temporary storage
is set in "fs.azure.buffer.dir"

For safe scaling: use "disk" (default); for performance, when
confident that upload bandwidth will never be a bottleneck,
experiment with the memory options.

The number of blocks a single stream can have queued for uploading
is set in "fs.azure.block.upload.active.blocks".
The default value is 20.

Contributed by Mehakmeet Singh.
2021-09-22 11:19:16 +01:00
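
The options named in HADOOP-17195 can be summarized in a small resolver. This is a sketch, not ABFS code: `UploadBufferSettings` and `resolve_settings` are hypothetical, and the validation rules (rejecting unknown buffer types and block counts below 1) are assumptions. The option names, the three buffer values, and the defaults "disk" and 20 come from the commit message.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

VALID_BUFFERS = ("disk", "bytebuffer", "array")


@dataclass(frozen=True)
class UploadBufferSettings:
    buffer: str                 # where blocks are buffered before upload
    buffer_dir: Optional[str]   # only meaningful when buffering to disk
    active_blocks: int          # blocks one stream may have queued


def resolve_settings(conf: Mapping[str, str]) -> UploadBufferSettings:
    """Read the three options, applying the documented defaults."""
    buffer = conf.get("fs.azure.data.blocks.buffer", "disk")
    if buffer not in VALID_BUFFERS:
        raise ValueError(f"unknown buffer type: {buffer!r}")
    active = int(conf.get("fs.azure.block.upload.active.blocks", "20"))
    if active < 1:
        raise ValueError("active blocks must be at least 1")
    # The temporary directory is only relevant for disk buffering.
    buffer_dir = conf.get("fs.azure.buffer.dir") if buffer == "disk" else None
    return UploadBufferSettings(buffer, buffer_dir, active)
```

With an empty configuration the sketch resolves to disk buffering with 20 active blocks, which matches the "safe scaling" default recommended in the commit message.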
Neil
9700d98eac
HADOOP-17893. Improve PrometheusSink for Namenode TopMetrics (#3426)
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
(cherry picked from commit ae2c5ccfcf)
2021-09-21 10:44:51 +09:00
Rintaro Ikeda
92af6cd3bc HADOOP-17919. Fix command line example in Hadoop Cluster Setup documentation. (#3453)
(cherry picked from commit 607c20c612)
2021-09-17 13:34:07 +00:00
Steve Loughran
9188fa8cce
HADOOP-17126. implement non-guava Precondition checkNotNull
This adds a new class org.apache.hadoop.util.Preconditions which is

* @Private/@Unstable
* Intended to allow us to move off Google Guava
* Is designed to be trivially backportable
  (i.e. contains no references to guava classes internally)

Please use this instead of the guava equivalents, where possible.

Contributed by: Ahmed Hussein

Change-Id: Ic392451bcfe7d446184b7c995734bcca8c07286e
2021-09-17 11:06:59 +01:00
Adam Binford
59a955dfa0
HADOOP-17804. Expose prometheus metrics only after a flush and dedupe with tag values (#3369)
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
(cherry picked from commit 4ced012f33)
2021-09-09 16:51:04 +09:00
Steve Loughran
a2242df10a
HADOOP-17894. CredentialProviderFactory.getProviders() recursion loading JCEKS file from S3A (#3393)
* CredentialProviderFactory to detect and report on recursion.
* S3AFS to remove incompatible providers.
* Integration Test for this.

Contributed by Steve Loughran.

Change-Id: Ia247b3c9fe8488ffdb7f57b40eb6e37c57e522ef
2021-09-08 17:00:20 +01:00
Masatake Iwasaki
76393e1359 HADOOP-17899. Avoid using implicit dependency on junit-jupiter-api. (#3399)
(cherry picked from commit ce7a5bfbd3)
2021-09-08 09:11:39 +00:00
Yellow Flash
09e8e5c5cb
HADOOP-17870. Http Filesystem to qualify relative paths. (#3338)
Contributed by Yellowflash

Change-Id: I217da06a1a2e5c0ca2b324f8e21baa0846f64858
2021-09-07 10:54:35 +01:00
Chris Nauroth
cc90b4f987 HADOOP-15129. Datanode caches namenode DNS lookup failure and cannot startup (#3348)
Co-authored-by:  Karthik Palaniappan

Change-Id: Id079a5319e5e83939d5dcce5fb9ebe3715ee864f
2021-09-03 18:48:07 +00:00
Viraj Jasani
7a4eaeb8bf
HADOOP-17874. ExceptionsHandler to add terse/suppressed Exceptions in thread-safe manner (#3343)
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
(cherry picked from commit 99a157fa4a)
2021-09-03 10:27:06 +09:00
Uma Maheswara Rao G
580b6c400b
HDFS-16192: ViewDistributedFileSystem#rename wrongly using src in the place of dst. (#3353)
Co-authored-by: Uma Maheswara Rao G <umagangumalla@cloudera.com>
(cherry picked from commit 164608b546)
2021-08-31 12:27:43 +08:00
Dongjoon Hyun
8606b2cddd
HADOOP-17869. fs.s3a.connection.maximum should be bigger than fs.s3a.threads.max (#3337).
The value of `fs.s3a.connection.maximum` has been increased to 96

Contributed by Dongjoon Hyun

Change-Id: I9020a2bfd2a67fa7a2ec0598ed9d63e78ee99c73
2021-08-30 18:31:57 +01:00
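
The constraint in the title of HADOOP-17869 can be expressed as a simple check. This is a sketch only: the function `pool_size_warning` is hypothetical and is not part of S3A, and the thread count of 64 in the usage note is an example value, not a documented default.

```python
from typing import Mapping, Optional


def pool_size_warning(conf: Mapping[str, int]) -> Optional[str]:
    """Return a warning when the connection pool is not larger than the
    thread pool, otherwise None."""
    connections = conf["fs.s3a.connection.maximum"]
    threads = conf["fs.s3a.threads.max"]
    if connections > threads:
        return None
    return (f"fs.s3a.connection.maximum ({connections}) should be bigger "
            f"than fs.s3a.threads.max ({threads})")
```

With `fs.s3a.connection.maximum` at the new value of 96 and a thread pool of 64, the check returns None; with 32 connections and 64 threads it returns a warning string.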
jianghuazhu
7c663043b2
HDFS-16173. Improve CopyCommands#Put#executor queue configurability. (#3302)
Co-authored-by: zhujianghua <zhujianghua@zhujianghuadeMacBook-Pro.local>
Reviewed-by: Hui Fei <ferhui@apache.org>
Reviewed-by: Viraj Jasani <vjasani@apache.org>
(cherry picked from commit 4c94831364)
2021-08-27 12:06:26 +08:00
jianghuazhu
2b2f8f575b
HDFS-16175. Improve the configurable value of Server#PURGE_INTERVAL_NANOS. (#3307)
Co-authored-by: zhujianghua <zhujianghua@zhujianghuadeMacBook-Pro.local>
Reviewed-by: Ayush Saxena <ayushsaxena@apache.org>
(cherry picked from commit ad54f5195c)
2021-08-25 17:35:50 +08:00
Viraj Jasani
fc6b1cafd4 HADOOP-17858. Avoid possible class loading deadlock with VerifierNone initialization (#3321)
(cherry picked from commit fc566ad9b0)
2021-08-24 22:44:11 +09:00
Szilard Nemeth
224b42108d YARN-10814. Fallback to RandomSecretProvider if the secret file is empty. Contributed by Tamas Domok
2021-08-24 14:16:15 +02:00
jianghuazhu
0a5f76b814
HDFS-16151. Improve the parameter comments related to ProtobufRpcEngine2#Server(). (#3256)
Co-authored-by: zhujianghua <zhujianghua@zhujianghuadeMacBook-Pro.local>
Reviewed-by: Wei-Chiu Chuang <weichiu@apache.org>
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
(cherry picked from commit 0c7b951e03)
2021-08-08 14:57:08 +09:00
Bryan Beaudreault
2fda130260 HADOOP-17837: Add unresolved endpoint value to UnknownHostException (ADDENDUM) (#3276)
(cherry picked from commit b0b867e977)
2021-08-06 21:57:46 +05:30
Bryan Beaudreault
7659b62682
HADOOP-17837: Add unresolved endpoint value to UnknownHostException (#3272)
(cherry picked from commit 5e54d92e6e)
2021-08-06 17:32:01 +08:00
Viraj Jasani
b3077543cf HADOOP-17808. Avoid excessive logging for interruption (ADDENDUM) (#3267)
(cherry picked from commit 9fe1f24ec1)
2021-08-06 09:30:43 +08:00
Steve Loughran
c1ad91e72d
HADOOP-17822. fs.s3a.acl.default not working after S3A Audit feature (#3249)
Fixes the regression caused by HADOOP-17511 by moving where the
option fs.s3a.acl.default is read, doing it before the RequestFactory
is created.

Adds

* A unit test in TestRequestFactory to verify the ACLs are set
  on all file write operations.
* A new ITestS3ACannedACLs test which verifies that ACLs really
  do get all the way through.
* S3A Assumed Role delegation tokens to include the IAM permission
  s3:PutObjectAcl in the generated role.

Contributed by Steve Loughran

Change-Id: I3abac6a1b9e150b6b6df0af7c2c70093f8f518cb
2021-08-02 15:33:34 +01:00
Steve Loughran
26514b6534 HADOOP-17628. Distcp contract test is really slow with ABFS and S3A; timing out. (#3240)
This patch cuts down the size of directory trees used for
distcp contract tests against object stores, so making
them much faster against distant/slow stores.

On abfs, the test only runs with -Dscale (as was the case for s3a already),
and has the larger scale test timeout.

After every test case, the FileSystem IOStatistics are logged,
to provide information about what IO is taking place and
what its performance is.

There are some test cases which upload files of 1+ MiB; you can
increase the size of the upload with the option
"scale.test.distcp.file.size.kb".
Set it to zero and the large file tests are skipped.

Contributed by Steve Loughran.
2021-08-02 12:58:37 +01:00
Petre Bogdan Stolojan
f2cec5cb88
HADOOP-17139 Re-enable optimized copyFromLocal implementation in S3AFileSystem (#3101)
This work
* Defines the behavior of FileSystem.copyFromLocal in filesystem.md
* Adds a high-performance implementation of copyFromLocalOperation
  for S3
* Adds a contract test for the operation: AbstractContractCopyFromLocalTest
* Implements the contract tests for Local and S3A FileSystems

Contributed by: Bogdan Stolojan

Change-Id: I25d502102775c3626c4264e5a14c649879730050
2021-08-02 11:58:36 +01:00
hchaverr
6cc1426b63 HADOOP-17819. Add extensions to ProtobufRpcEngine RequestHeaderProto. Contributed by Hector Sandoval Chaverri. (#3242)
(cherry picked from commit 3c8a48e681)
2021-07-28 15:48:51 -07:00