Commit Graph

2156 Commits

Author SHA1 Message Date
Viraj Jasani
4ba463069b
HADOOP-18288. Total requests and total requests per sec served by RPC servers (#4485)
Signed-off-by: Tao Li <tomscut@apache.org>
2022-06-23 17:30:01 +08:00
Steve Loughran
aeb2a2f860
HADOOP-17833. Improve Magic Committer performance (#3289) (#4470)
Speed up the magic committer with key changes being

* Writes under __magic always retain directory markers

* File creation under __magic skips all overwrite checks,
  including the LIST call intended to stop files being
        created over dirs.
* mkdirs under __magic probes the path for existence
  but does not look any further.

Extra parallelism in task and job commit directory scanning
Use of createFile and openFile with parameters which all for
HEAD checks to be skipped.

The committer can write the summary _SUCCESS file to the path
`fs.s3a.committer.summary.report.directory`, which can be in a
different file system/bucket if desired, using the job id as
the filename.

Also: HADOOP-15460. S3A FS to add `fs.s3a.create.performance`

Application code can set the createFile() option
fs.s3a.create.performance to true to disable the same
safety checks when writing under magic directories.
Use with care.

The createFile option prefix `fs.s3a.create.header.`
can be used to add custom headers to S3 objects when
created.

Contributed by Steve Loughran.
2022-06-21 10:49:37 +01:00
slfan1989
91f19bf8fa
HADOOP-18244. Fix Hadoop-Common JavaDoc Error on branch-3.3 (#4327). Contributed by fanshilun. 2022-05-29 11:31:16 +05:30
hchaverr
1e043b937a
HADOOP-18222. Prevent DelegationTokenSecretManagerMetrics from registering multiple times
Fixes #4266

Signed-off-by: Owen O'Malley <oomalley@linkedin.com>
2022-05-10 14:52:31 -07:00
Steve Loughran
75950e47e7
HADOOP-16202. Enhanced openFile(): hadoop-common changes. (#2584/1)
This defines standard option and values for the
openFile() builder API for opening a file:

fs.option.openfile.read.policy
 A list of the desired read policy, in preferred order.
 standard values are
 adaptive, default, random, sequential, vector, whole-file

fs.option.openfile.length
 How long the file is.

fs.option.openfile.split.start
 start of a task's split

fs.option.openfile.split.end
 end of a task's split

These can be used by filesystem connectors to optimize their
reading of the source file, including but not limited to
* skipping existence/length probes when opening a file
* choosing a policy for prefetching/caching data

The hadoop shell commands which read files all declare "whole-file"
and "sequential", as appropriate.

Contributed by Steve Loughran.

Change-Id: Ia290f79ea7973ce8713d4f90f1315b24d7a23da1
2022-04-27 19:23:10 +01:00
Viraj Jasani
bb13e228bc
HADOOP-17956. Replace all default Charset usage with UTF-8 (#3529)
Change-Id: I0094a84619ce19acf340d8dd1040cfe9bd88184e
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
2022-04-27 10:30:07 +01:00
hchaverri
4043ef0485 HADOOP-18167. Add metrics to track delegation token secret manager op… (#4092)
* HADOOP-18167. Add metrics to track delegation token secret manager operations
2022-04-26 14:28:25 -07:00
Masatake Iwasaki
160b6d106d
HADOOP-18088. Replace log4j 1.x with reload4j. (#4052)
Co-authored-by: Wei-Chiu Chuang <weichiu@apache.org>
2022-04-07 08:33:13 +09:00
Xing Lin
ecafd38c09
HADOOP-18144: getTrashRoot in ViewFileSystem should return a path in ViewFS. (#4123)
To get the new behavior, define fs.viewfs.trash.force-inside-mount-point to be true.

If the trash root for path p is in the same mount point as path p,
and one of:
* The mount point isn't at the top of the target fs.
* The resolved path of path is root (eg it is the fallback FS).
* The trash root isn't in user's target fs home directory.
get the corresponding viewFS path for the trash root and return it.
Otherwise, use <mnt>/.Trash/<user>.

Signed-off-by: Owen O'Malley <oomalley@linkedin.com>
(cherry picked from commit 8b8158f02d)

Co-authored-by: Xing Lin <xinglin@linkedin.com>
2022-03-31 20:26:09 +00:00
Owen O'Malley
e24bd1c15b HDFS-16517 Distance metric is wrong for non-DN machines in 2.10. Fixed in HADOOP-16161, but
this test case adds value to ensure the two getWeight methods stay in sync.

Fixes #4091

Signed-off-by: Owen O'Malley <oomalley@linkedin.com>
2022-03-30 14:58:03 -07:00
zhongjingxiong
1ee93f7947
HADOOP-18145. Fileutil's unzip method causes unzipped files to lose their original permissions (#4036)
Contributed by jingxiong zhong

Change-Id: Ie252e609798719dc658364f9bae48b34dc72a79c
2022-03-30 12:52:52 +01:00
Steve Loughran
105e0dbd92
HADOOP-13704. Optimized S3A getContentSummary()
Optimize the scan for s3 by performing a deep tree listing,
inferring directory counts from the paths returned.

Contributed by Ahmar Suhail.

Change-Id: I26ffa8c6f65fd11c68a88d6e2243b0eac6ffd024
2022-03-22 13:31:13 +00:00
Abhishek Das
c3a4ce8ee8 HADOOP-18129: Change URI to String in INodeLink to reduce memory footprint of ViewFileSystem
Fixes #3996
2022-03-17 17:27:26 -07:00
Steve Loughran
e06ed88012
HADOOP-18162. hadoop-common support for MAPREDUCE-7341 Manifest Committer
* New statistic names in StoreStatisticNames
  (for joint use with s3a committers)
* Improvements to IOStatistics implementation classes
* RateLimiting wrapper to guava RateLimiter
* S3A committer Tasks moved over as TaskPool and
  added support for RemoteIterator
* JsonSerialization.load() to fail fast if source does not exist

+ tests.

This commit is a prerequisite for the main MAPREDUCE-7341 Manifest Committer
patch.

Contributed by Steve Loughran

Change-Id: Ia92e2ab5083ac3d8d3d713a4d9cb3e9e0278f654
2022-03-17 11:46:11 +00:00
litao
0029f22d7d HADOOP-18003. Add a method appendIfAbsent for CallerContext (#3644)
Cherry-picked from 573b358f by Owen O'Malley
2022-03-14 10:29:23 -07:00
Fei Hui
5a38ed2f22 HADOOP-17276. Extend CallerContext to make it include many items (#2327)
Cherry-picked from d0d10f7e by Owen O'Malley
2022-03-14 10:28:38 -07:00
Wei-Chiu Chuang
743db6e7b4
HADOOP-18155. Refactor tests in TestFileUtil (#4063)
(cherry picked from commit d0fa9b5775)

 Conflicts:
	hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileUtil.java
	hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/TestFileUtil.java

Co-authored-by: Gautham B A <gautham.bangalore@gmail.com>
2022-03-14 09:40:17 +09:00
Mukund Thakur
e0619b702a HADOOP-18112: Implement paging during multi object delete. (#4045)
Multi object delete of size more than 1000 is not supported by S3 and 
fails with MalformedXML error. So implementing paging of requests to 
reduce the number of keys in a single request. Page size can be configured
using "fs.s3a.bulk.delete.page.size" 

 Contributed By: Mukund Thakur
2022-03-11 13:16:51 +05:30
Steve Loughran
94a0a04113
HADOOP-18136. Verify FileUtils.unTar() handling of missing .tar files.
Contributed by Steve Loughran

Change-Id: I3856afa821dbc8c2e3cb1cbe33793ec1734e2e24
2022-02-21 17:09:36 +00:00
Steve Loughran
088684ec60
HADOOP-18091. S3A auditing leaks memory through ThreadLocal references (#3930)
Adds a new map type WeakReferenceMap, which stores weak
references to values, and a WeakReferenceThreadMap subclass
to more closely resemble a thread local type, as it is a
map of threadId to value.

Construct it with a factory method and optional callback
for notification on loss and regeneration.

 WeakReferenceThreadMap<WrappingAuditSpan> activeSpan =
      new WeakReferenceThreadMap<>(
          (k) -> getUnbondedSpan(),
          this::noteSpanReferenceLost);

This is used in ActiveAuditManagerS3A for span tracking.

Relates to
* HADOOP-17511. Add an Audit plugin point for S3A
* HADOOP-18094. Disable S3A auditing by default.

Contributed by Steve Loughran.

Change-Id: Ibf7bb082fd47298f7ebf46d92f56e80ca9b2aaf8
2022-02-10 12:33:40 +00:00
Xing Lin
d613776b64
HADOOP-18093. Better exception handling for testFileStatusOnMountLink() in ViewFsBaseTest.java (#3918). Contributed by Xing Lin. (#3929)
Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
(cherry picked from commit 0d17b629ff)
2022-01-26 21:55:32 +05:30
Viraj Jasani
5e9e779ed2
HADOOP-17152. Provide Hadoop's own Lists utility to reduce dependency on Guava (#3061)
Change-Id: I52e55b9d9826ad661e9ad7dc15f007aa168f0fe1
Reviewed-by: Wei-Chiu Chuang <weichiu@apache.org>
Signed-off-by: Takanobu Asanuma <tasanuma@apache.org>
2022-01-18 11:57:25 +00:00
Ashutosh Gupta
6535a183b2 HDFS-14099. Unknown frame descriptor when decompressing multiple frames (#3836)
Co-authored-by: xuzq <xuzengqiang@kuaishou.com>
Co-authored-by: Ashutosh Gupta <ashugpt@amazon.com>
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
(cherry picked from commit caab29ec88)
2021-12-28 22:00:48 +09:00
Akira Ajisaka
cd30687a15 Revert "HDFS-14099. Unknown frame descriptor when decompressing multiple frames (#3836)"
This reverts commit 05b43f2057.
2021-12-28 21:51:40 +09:00
Ashutosh Gupta
05b43f2057 HDFS-14099. Unknown frame descriptor when decompressing multiple frames (#3836)
Co-authored-by: xuzq <xuzengqiang@kuaishou.com>
Co-authored-by: Ashutosh Gupta <ashugpt@amazon.com>
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
(cherry picked from commit caab29ec88)
2021-12-28 21:49:06 +09:00
Dhananjay Badaya
e7b1f87665 HADOOP-13500. Synchronizing iteration of Configuration properties object (#3775)
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
(cherry picked from commit 4483607a4e)
2021-12-17 16:07:46 +09:00
Steve Loughran
67eaf5aa9f
HADOOP-17979. Add Interface EtagSource to allow FileStatus subclasses to provide etags (#3633)
Contributed by Steve Loughran

Change-Id: I596205d788f623114c12962941445432e2036c34
2021-11-29 16:20:55 +00:00
smarthan
bc40a41064 HADOOP-18023. Allow cp command to run with multi threads. (#3721)
(cherry picked from commit 932a78fe38)
2021-11-29 12:47:02 +00:00
Viraj Jasani
6094e1ec9a
HDFS-16171. De-flake testDecommissionStatus (#3280)
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
(cherry picked from commit 6342d5e523)

 Conflicts:
	hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDecommissioningStatus.java
2021-11-25 14:15:10 +09:00
Istvan Fajth
48e95d8109 HADOOP-17975. Fallback to simple auth does not work for a secondary DistributedFileSystem instance. (#3579)
(cherry picked from commit ae3ba45db5)
2021-11-24 10:47:49 +00:00
smarthan
cbb3ba135c HADOOP-17998. Allow get command to run with multi threads. (#3645)
(cherry picked from commit 63018dc73f)

 Conflicts:
	hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/CopyCommands.java
2021-11-22 12:14:32 +00:00
Abhishek Das
f456dc1837 HADOOP-17999. No-op implementation of setWriteChecksum and setVerifyChecksum in ViewFileSystem. Contributed by Abhishek Das. (#3639)
(cherry picked from commit 54a1d78e16)
2021-11-16 22:40:24 -08:00
Mehakmeet Singh
bd077c3814
HADOOP-17953. S3A: Tests to lookup global or per-bucket configuration for encryption algorithm (#3525)
Followup to S3-CSE work of HADOOP-13887

Contributed by Mehakmeet Singh
2021-10-21 12:03:50 +01:00
Szilard Nemeth
6f45666d0b HADOOP-17857. Check real user ACLs in addition to proxied user ACLs. Contributed by Eric Payne
(cherry picked from commit 5428d36b56)
2021-10-19 20:40:30 +00:00
Steve Loughran
b8f3e54ff7 HADOOP-17945. JsonSerialization raises EOFException reading JSON data stored on google GCS (#3501)
Contributed By: Steve Loughran
2021-10-19 15:36:10 +05:30
Xing Lin
af920f138b HADOOP-16532. Fix TestViewFsTrash to use the correct homeDir. Contributed by Xing Lin. (#3514)
(cherry picked from commit 97c0f96879)
2021-10-13 14:58:08 -07:00
Masatake Iwasaki
9e2936f8d1
HADOOP-17424. Replace HTrace with No-Op tracer (#3520)
(cherry picked from commit 1a205cc3ad)

 Conflicts:
	hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java
	hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
	hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/tracing/TestTracing.java

Co-authored-by: Siyao Meng <50227127+smengcl@users.noreply.github.com>
2021-10-12 00:07:09 +09:00
Viraj Jasani
77ee5a4266
HADOOP-17950. Provide replacement for deprecated APIs of commons-io IOUtils (#3515)
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
(cherry picked from commit 8071dbb9c6)
2021-10-07 11:00:19 +09:00
Ahmed Hussein
2cdc6a245d HADOOP-17930. implement non-guava Precondition checkState (#3522)
Reviewed-by: Viraj Jasani <vjasani@apache.org>
Signed-off-by: Takanobu Asanuma <tasanuma@apache.org>
(cherry picked from commit c36f9402dc)
2021-10-07 10:57:20 +09:00
Mehakmeet Singh
aee975a136
HADOOP-13887. Support S3 client side encryption (S3-CSE) using AWS-SDK (#2706)
This (big!) patch adds support for client side encryption in AWS S3,
with keys managed by AWS-KMS.

Read the documentation in encryption.md very, very carefully before
use and consider it unstable.

S3-CSE is enabled in the existing configuration option
"fs.s3a.server-side-encryption-algorithm":

fs.s3a.server-side-encryption-algorithm=CSE-KMS
fs.s3a.server-side-encryption.key=<KMS_KEY_ID>

You cannot enable CSE and SSE in the same client, although
you can still enable a default SSE option in the S3 console.

* Filesystem list/get status operations subtract 16 bytes from the length
  of all files >= 16 bytes long to compensate for the padding which CSE
  adds.
* The SDK always warns about the specific algorithm chosen being
  deprecated. It is critical to use this algorithm for ranged
  GET requests to work (i.e. random IO). Ignore.
* Unencrypted files CANNOT BE READ.
  The entire bucket SHOULD be encrypted with S3-CSE.
* Uploading files may be a bit slower as blocks are now
  written sequentially.
* The Multipart Upload API is disabled when S3-CSE is active.

Contributed by Mehakmeet Singh

Change-Id: Ie1a27a036a39db66a67e9c6d33bc78d54ea708a0
2021-10-05 11:37:41 +01:00
Ahmed Hussein
31b44c519c
HADOOP-17929. implement non-guava Precondition checkArgument (#3473)
Reviewed-by: Viraj Jasani <vjasani@apache.org>
(cherry picked from commit 0c498f21de)
2021-10-01 16:49:07 +08:00
Chao Sun
6931b70a00
HADOOP-17936. Fix test failure after reverting HADOOP-16878 from branch-3.3 (#3478) 2021-09-27 13:56:44 -07:00
Chao Sun
ff26a7700d Revert "HADOOP-16878. FileUtil.copy() to throw IOException if the source and destination are the same (#2383)"
This reverts commit 54c40cbf49.
2021-09-23 15:04:27 -07:00
Mehakmeet Singh
8e5620cd9e
HADOOP-17195. ABFS: OutOfMemory error while uploading huge files (#3446)
Addresses the problem of processes running out of memory when
there are many ABFS output streams queuing data to upload,
especially when the network upload bandwidth is less than the rate
data is generated.

ABFS Output streams now buffer their blocks of data to
"disk", "bytebuffer" or "array", as set in
"fs.azure.data.blocks.buffer"

When buffering via disk, the location for temporary storage
is set in "fs.azure.buffer.dir"

For safe scaling: use "disk" (default); for performance, when
confident that upload bandwidth will never be a bottleneck,
experiment with the memory options.

The number of blocks a single stream can have queued for uploading
is set in "fs.azure.block.upload.active.blocks".
The default value is 20.

Contributed by Mehakmeet Singh.
2021-09-22 11:19:16 +01:00
Neil
9700d98eac
HADOOP-17893. Improve PrometheusSink for Namenode TopMetrics (#3426)
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
(cherry picked from commit ae2c5ccfcf)
2021-09-21 10:44:51 +09:00
Steve Loughran
9188fa8cce
HADOOP-17126. implement non-guava Precondition checkNotNull
This adds a new class org.apache.hadoop.util.Preconditions which is

* @Private/@Unstable
* Intended to allow us to move off Google Guava
* Is designed to be trivially backportable
  (i.e contains no references to guava classes internally)

Please use this instead of the guava equivalents, where possible.

Contributed by: Ahmed Hussein

Change-Id: Ic392451bcfe7d446184b7c995734bcca8c07286e
2021-09-17 11:06:59 +01:00
Adam Binford
59a955dfa0
HADOOP-17804. Expose prometheus metrics only after a flush and dedupe with tag values (#3369)
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
(cherry picked from commit 4ced012f33)
2021-09-09 16:51:04 +09:00
Masatake Iwasaki
76393e1359 HADOOP-17899. Avoid using implicit dependency on junit-jupiter-api. (#3399)
(cherry picked from commit ce7a5bfbd3)
2021-09-08 09:11:39 +00:00
Yellow Flash
09e8e5c5cb
HADOOP-17870. Http Filesystem to qualify relative paths. (#3338)
Contributed by Yellowflash

Change-Id: I217da06a1a2e5c0ca2b324f8e21baa0846f64858
2021-09-07 10:54:35 +01:00
Chris Nauroth
cc90b4f987 HADOOP-15129. Datanode caches namenode DNS lookup failure and cannot startup (#3348)
Co-authored-by:  Karthik Palaniappan

Change-Id: Id079a5319e5e83939d5dcce5fb9ebe3715ee864f
2021-09-03 18:48:07 +00:00