Commit Graph

27349 Commits

Author SHA1 Message Date
Felix Nguyen
f5c5d35eb0
HDFS-17529. RBF: Improve router state store cache entry deletion (#6833) 2024-05-24 09:41:08 +08:00
Anmol Asrani
d168d3ffee
HADOOP-18325: ABFS: Add correlated metric support for ABFS operations (#6314)
Adds support for metric collection at the filesystem instance level.
When a filesystem instance is closed, its metrics, covering all operations
performed through that instance, are pushed to the store.

Collected Metrics (among others):

- Number of requests that succeeded without any retries.
- Number of requests that succeeded after a given number of retries (x retries).
- Number of requests that were throttled.
- Number of requests that failed after exhausting all retry attempts.

Implementation Details:

Added logic to AbfsClient that pushes metrics through an additional request
when no requests have been sent to the backend for a defined idle period.
These changes make it possible to monitor success rates, retries, throttling and failures after exhausted retries for each filesystem instance, and the idle-period push keeps the published metrics current even when an instance is not issuing requests.
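A minimal sketch of the per-instance aggregation described above, using a hypothetical InstanceMetrics class (not the real AbfsClient or ABFS metrics API). Each finished request is recorded with its retry count and outcome; pending metrics are pushed when the instance is closed, or earlier if no request has been seen for an idle period. Resetting the counters after each push, so that an idle push and the final close push do not double-count, is an assumption of this sketch.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Consumer;

// Hypothetical sketch: NOT the real AbfsClient / ABFS metrics API.
// Aggregates request outcomes for one filesystem instance and pushes them
// (a) when the instance is closed and (b) after an idle period with no requests.
class InstanceMetrics implements AutoCloseable {
    private final Consumer<String> publisher; // stands in for "push to the store"
    private final long idleMillis;
    private long lastActivityMillis;
    private long recorded;                    // requests recorded since the last push
    private long successNoRetry;
    private final Map<Integer, Long> successAfterRetries = new TreeMap<>(); // retries -> count
    private long throttled;
    private long failedAfterAllRetries;
    private boolean closed;

    InstanceMetrics(Consumer<String> publisher, long idleMillis, long nowMillis) {
        this.publisher = publisher;
        this.idleMillis = idleMillis;
        this.lastActivityMillis = nowMillis;
    }

    /** Record one finished request; succeeded=false means all retries were exhausted. */
    synchronized void record(long nowMillis, int retries, boolean wasThrottled, boolean succeeded) {
        lastActivityMillis = nowMillis;
        recorded++;
        if (wasThrottled) throttled++;
        if (!succeeded) failedAfterAllRetries++;
        else if (retries == 0) successNoRetry++;
        else successAfterRetries.merge(retries, 1L, Long::sum);
    }

    /** Push pending metrics if no request has been recorded for idleMillis. */
    synchronized boolean pushIfIdle(long nowMillis) {
        if (closed || recorded == 0 || nowMillis - lastActivityMillis < idleMillis) return false;
        push();
        return true;
    }

    /** Push whatever is pending, once, when the instance is closed. */
    @Override
    public synchronized void close() {
        if (closed) return;
        closed = true;
        if (recorded > 0) push();
    }

    // Publish a summary, then reset the counters so each push carries only new data.
    private void push() {
        publisher.accept("successNoRetry=" + successNoRetry
            + " successAfterRetries=" + successAfterRetries
            + " throttled=" + throttled
            + " failedAfterAllRetries=" + failedAfterAllRetries);
        recorded = successNoRetry = throttled = failedAfterAllRetries = 0;
        successAfterRetries.clear();
    }
}
```

In a real client the idle check would presumably run on a timer; here the caller passes timestamps explicitly so the behaviour is deterministic and easy to test.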

Contributed by Anmol Asrani
2024-05-23 15:10:10 +01:00
Benjamin Teke
d876505b67
YARN-11681. Update the cgroup documentation with v2 support (#6834)
Co-authored-by: Benjamin Teke <bteke@cloudera.com>
Co-authored-by: K0K0V0K <109747532+K0K0V0K@users.noreply.github.com>
2024-05-21 17:41:32 +02:00
hfutatzhanghb
fb156e8f05
HDFS-17464. Improve some logs output in class FsDatasetImpl (#6724) 2024-05-21 09:46:21 +08:00
slfan1989
be28467374
Revert "Bump org.apache.derby:derby in /hadoop-project (#6816)" (#6841)
This reverts commit b5a90d9500.
2024-05-21 08:46:14 +08:00
Sebb
f11a8cfa6e
HADOOP-13147. Constructors must not call overridable methods in PureJavaCrc32C (#6408). Contributed by Sebb. 2024-05-21 00:08:08 +05:30
Mukund Thakur
47be1ab3b6
HADOOP-18679. Add API for bulk/paged delete of files (#6726)
Applications can create a BulkDelete instance from a
BulkDeleteSource. The BulkDelete interface provides
pageSize(), the maximum number of entries that can be
deleted in one call, and a bulkDelete(Collection paths)
method that accepts a collection of up to pageSize() paths.

This is optimized for object stores with bulk delete APIs;
the S3A connector will offer the page size of
fs.s3a.bulk.delete.page.size unless bulk delete has
been disabled.

Even with a page size of 1, the S3A implementation is
more efficient than delete(path)
as there are no safety checks for the path being a directory
or probes for the need to recreate directories.

The BulkDeleteSource interface is implemented by
all FileSystem implementations; by default the page size
is 1 and each deletion is mapped to delete(pathToDelete, false).
This means that callers do not need to have special
case handling for object stores versus classic filesystems.

To aid use through reflection APIs, the class
org.apache.hadoop.io.wrappedio.WrappedIO
has been created with "reflection friendly" methods.
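A self-contained sketch of how a caller might push a long list of paths through an API shaped like the one described above. PagedDeleter, ClassicFsDeleter and BulkDeleteHelper are hypothetical stand-ins (not org.apache.hadoop.fs.BulkDelete, and paths are plain strings): the caller splits its input into pages no longer than pageSize(), and the page-size-1 implementation models the classic-filesystem fallback where each entry becomes one delete(path, false) call.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical stand-in for the interface described above; NOT the Hadoop API.
interface PagedDeleter {
    int pageSize();                              // max entries per bulkDelete() call
    void bulkDelete(Collection<String> paths);   // must receive at most pageSize() paths
}

// Models a classic filesystem: page size 1, one delete(path, false) per entry.
class ClassicFsDeleter implements PagedDeleter {
    final List<String> deleted = new ArrayList<>();
    int calls;

    public int pageSize() { return 1; }

    public void bulkDelete(Collection<String> paths) {
        if (paths.size() > pageSize()) {
            throw new IllegalArgumentException("page too large: " + paths.size());
        }
        calls++;
        deleted.addAll(paths); // stands in for delete(path, false)
    }
}

final class BulkDeleteHelper {
    private BulkDeleteHelper() {}

    /** Deletes all paths, issuing one bulkDelete() call per page; returns the number of pages. */
    static int deleteAll(PagedDeleter deleter, List<String> paths) {
        int page = Math.max(1, deleter.pageSize());
        int pages = 0;
        for (int start = 0; start < paths.size(); start += page) {
            int end = Math.min(start + page, paths.size());
            deleter.bulkDelete(new ArrayList<>(paths.subList(start, end)));
            pages++;
        }
        return pages;
    }
}
```

Because the caller only relies on pageSize(), the same loop works unchanged for a store with a large page size and for a filesystem with a page size of 1.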

Contributed by Mukund Thakur and Steve Loughran
2024-05-20 17:05:25 +01:00
Kaiyao Ke
41eacf4914
MAPREDUCE-7475. Fix non-idempotent unit tests (#6785)
Contributed by Kaiyao Ke
2024-05-17 14:51:47 +01:00
LiuGuH
8f92cda35c
HDFS-17509. RBF: Fix ClientProtocol.concat throwing NPE if tgr is an empty file. (#6784) 2024-05-17 10:37:50 +08:00
skyskyhu
3c00093cb5
HADOOP-19167 Bug Fix: Change of Codec configuration does not work (#6807) 2024-05-17 10:27:39 +08:00
Vikas Kumar
f8dce6c501
HADOOP-18851. Performance improvement for DelegationTokenSecretManager (#6803) 2024-05-16 12:30:52 +08:00
Mukund Thakur
a97e3022de
HADOOP-19013. Adding x-amz-server-side-encryption-aws-kms-key-id in the get file attributes for S3A. (#6646)
Contributed by: Mukund Thakur
2024-05-15 11:54:54 -05:00
Peter Szucs
129d91f7b2
YARN-11692. Support mixed cgroup v1/v2 controller structure (#6821) 2024-05-15 16:32:49 +02:00
Steve Loughran
cfdf1f5e8e
HADOOP-19172. S3A: upgrade AWS v1 sdk to 1.12.720 (#6823)
+remove reference in LICENSE-binary as it is no longer shipped

Contributed by Steve Loughran
2024-05-15 14:40:39 +01:00
xuzifu666
cf9559eb27
HADOOP-19073 WASB: Fix connection leak in FolderRenamePending (#6534)
Contributed by xuyu
2024-05-15 14:38:06 +01:00
ZanderXu
cab0f4c9ec
HDFS-17520. [BugFix] TestDFSAdmin.testAllDatanodesReconfig and TestDFSAdmin.testDecommissionDataNodesReconfig failed (#6812) Contributed by Zengqiang Xu.
Reviewed-by: Vinayakumar B <vinayakumarb@apache.org>
Signed-off-by: Shilun Fan <slfan1989@apache.org>
2024-05-15 07:55:24 +08:00
Christopher Tubbs
2e77b7b02c
[HADOOP-18786] Use CDN instead of ASF archive (#5789)
* Use Yetus 0.14.1 from downloads.apache.org in yetus-wrapper
* Use Maven 3.8.8 from downloads.apache.org in Win 10 Dockerfile
* Point users to downloads.apache.org for JSVC
* Use Solr 8.11.2 from downloads.apache.org in YARN Dockerfile

Contributed by Christopher Tubbs
2024-05-14 20:09:52 +01:00
zhihui wang
39dee8ea19
HADOOP-18958. Improve UserGroupInformation debug log. (#6255)
Contributed by zhihui wang
2024-05-14 20:03:49 +01:00
Tsz-Wo Nicholas Sze
bda7045070
HADOOP-19152. Do not hard code security providers. (#6739) 2024-05-14 11:19:57 -07:00
Simbarashe Dzinamarira
6a4f0be854
HDFS-17514: RBF: Routers should unset cached stateID when namenode does not set stateID in RPC response header. (#6804) 2024-05-14 08:09:56 -07:00
ConfX
8d9d58dfc8
HDFS-17099. Fix Null Pointer Exception when stopping namesystem in HDFS (#6034). Contributed by ConfX.
Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>
2024-05-14 11:14:55 +08:00
zhengchenyu
4cb4d5dd08
HADOOP-19170. Fixes compilation issues on non-Linux systems (#6822)
Reviewed-by: Steve Loughran <stevel@apache.org>
Reviewed-by: Wei-Chiu Chuang <weichiu@apache.org>
2024-05-13 20:04:01 -07:00
Steve Loughran
c9270600b7
MAPREDUCE-7474. Improve Manifest committer resilience (#6716)
Improve task commit resilience everywhere
and add an option to reduce delete IO requests on
job cleanup (relevant for ABFS and HDFS).

Task Commit Resilience
----------------------

Task manifest saving is re-attempted on failure; the number of
attempts made is configurable with the option:

  mapreduce.manifest.committer.manifest.save.attempts

* The default is 5.
* The minimum is 1; asking for less is ignored.
* A retry policy adds 500ms of sleep per attempt.
* Move from classic rename() to commitFile() to rename the file,
  after calling getFileStatus() to get its length and possibly etag.
  This becomes a rename() on gcs/hdfs anyway, but on abfs it reaches
  the ResilientCommitByRename callbacks, which report the outcome
  to the caller; the outcome is then logged at WARN.
* New statistic task_stage_save_summary_file to distinguish this from
  other saving operations (job success/report file).
  This is only saved to the manifest on task commit retries, and
  provides statistics on all previous unsuccessful attempts to save
  the manifests.
+ test changes to match the codepath changes, including improvements
  in fault injection.
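A minimal sketch of the retry behaviour listed above, using a hypothetical ManifestSaveRetry helper (not the committer's actual code). Two readings are assumptions: "asking for less is ignored" is taken to mean values below 1 are treated as 1, and "adds 500ms of sleep per attempt" is taken as linear backoff, sleeping n * 500 ms after failed attempt n. The sleep is injected so the logic can be exercised without real delays.

```java
import java.io.IOException;
import java.util.function.LongConsumer;

// Hypothetical sketch of the retry behaviour described above; NOT the committer's code.
final class ManifestSaveRetry {
    static final long SLEEP_INCREMENT_MS = 500;

    /** A save operation that may fail with an IOException. */
    interface SaveAction { void save() throws IOException; }

    private ManifestSaveRetry() {}

    /**
     * Runs the save up to 'attempts' times (values below 1 are treated as 1).
     * After failed attempt n, sleeps n * 500 ms before retrying.
     * Rethrows the last failure if every attempt fails; otherwise returns the attempts used.
     */
    static int saveWithRetries(SaveAction action, int attempts, LongConsumer sleeper)
            throws IOException {
        int max = Math.max(1, attempts);
        for (int attempt = 1; ; attempt++) {
            try {
                action.save();
                return attempt;
            } catch (IOException e) {
                if (attempt >= max) throw e;
                sleeper.accept(attempt * SLEEP_INCREMENT_MS);
            }
        }
    }
}
```

In production the sleeper could be a lambda around Thread.sleep that restores the interrupt flag on InterruptedException; with the default of 5 attempts, this reading gives at most 500 + 1000 + 1500 + 2000 = 5000 ms of total sleep.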

Directory size for deletion
---------------------------

New option

  mapreduce.manifest.committer.cleanup.parallel.delete.base.first

This makes an initial attempt to delete the base directory, only falling
back to parallel deletes if that attempt times out.

This option is disabled by default; consider enabling it for abfs to
reduce IO load. Consult the documentation for more details.
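A minimal sketch of the base-first cleanup strategy described above, using a hypothetical DirDeleter interface and BaseFirstCleanup helper (neither is part of Hadoop). The base directory is deleted as one operation under a timeout; only on timeout does cleanup fall back to deleting the listed subdirectories in parallel, then the base directory itself. Which subdirectories are deleted in the fallback, and the final base delete, are assumptions for illustration; other failures of the first delete simply propagate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch; NOT the manifest committer's cleanup code.
interface DirDeleter {
    void delete(String dir) throws Exception;  // recursive delete of one directory
}

final class BaseFirstCleanup {
    private BaseFirstCleanup() {}

    /** Returns true if the single base-directory delete finished within the timeout. */
    static boolean cleanup(DirDeleter fs, String base, List<String> subdirs,
                           boolean baseFirst, long timeoutMs, ExecutorService pool)
            throws Exception {
        if (baseFirst) {
            Future<?> f = pool.submit(() -> { fs.delete(base); return null; });
            try {
                f.get(timeoutMs, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException e) {
                f.cancel(true); // give up on the slow single delete (interrupts it)
            }
        }
        // Fallback: delete subdirectories in parallel, then the now-smaller base.
        List<Future<?>> futures = new ArrayList<>();
        for (String dir : subdirs) {
            futures.add(pool.submit(() -> { fs.delete(dir); return null; }));
        }
        for (Future<?> f : futures) f.get(); // propagate any failure
        fs.delete(base);
        return false;
    }
}
```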

Success file printing
---------------------

A JSON _SUCCESS file written by this committer or by any S3A committer
can now be printed with the mapred command:

  mapred successfile <path to file>

Contributed by Steve Loughran
2024-05-13 21:12:34 +01:00
zhihui wang
12e0ca6b24
HDFS-17522. JournalNode web interfaces lack configs for X-FRAME-OPTIONS protection (#6814). Contributed by wangzhihui.
Signed-off-by: Vinayakumar B <vinayakumarb@apache.org>
Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>
2024-05-13 22:10:08 +08:00
Benjamin Teke
ce7d01fac8
YARN-11689. Update the cgroup v2 init error handling (#6810) 2024-05-13 12:56:26 +02:00
dependabot[bot]
b5a90d9500
Bump org.apache.derby:derby in /hadoop-project (#6816)
Bumps org.apache.derby:derby from 10.14.2.0 to 10.17.1.0.

---
updated-dependencies:
- dependency-name: org.apache.derby:derby
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-13 12:47:31 +08:00
dependabot[bot]
1d09a64e34
Bump org.bouncycastle:bcprov-jdk18on in /hadoop-project (#6811)
Bumps [org.bouncycastle:bcprov-jdk18on](https://github.com/bcgit/bc-java) from 1.77 to 1.78.
- [Changelog](https://github.com/bcgit/bc-java/blob/main/docs/releasenotes.html)
- [Commits](https://github.com/bcgit/bc-java/commits)

---
updated-dependencies:
- dependency-name: org.bouncycastle:bcprov-jdk18on
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-12 18:38:36 +05:30
Felix Nguyen
fb0519253d
HDFS-17488. DN can fail IBRs with NPE when a volume is removed (#6759) 2024-05-11 15:37:43 +08:00
Zilong Zhu
700b3e4800
HDFS-17503. Unreleased volume references because of OOM. (#6782) 2024-05-10 10:34:40 +08:00
Sammi Chen
43e8ca428e
Revert "HADOOP-18851: Performance improvement for DelegationTokenSecretManager. (#6001). Contributed by Vikas Kumar."
This reverts commit e283375cdf.
2024-05-07 13:29:32 +08:00
kulkabhay
edf985e269
HDFS-17500: Add missing operation name while authorizing some operations (#6776). Contributed by kulkabhay.
Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>
2024-05-06 12:44:30 +08:00
Doroszlai, Attila
2645898450
HADOOP-19160. hadoop-auth should not depend on kerb-simplekdc (#6788) 2024-05-03 12:57:26 +02:00
dannytbecker
881034ad45
CachedRecordStore should check if the record state is expired (#6783) 2024-05-01 13:56:53 -07:00
Viraj Jasani
a8a58944bd
HADOOP-19146. S3A: noaa-cors-pds test bucket access with global endpoint fails (#6723)
HADOOP-19057 switched the hadoop-aws test bucket from landsat-pds to
noaa-cors-pds.

This new bucket isn't accessible if the client configuration
sets an fs.s3a.endpoint/region value other than us-east-1.

Contributed by Viraj Jasani
2024-04-30 12:16:36 +01:00
Peter Szucs
910cb6b887
YARN-11685. Create a config to enable/disable cgroup v2 functionality (#6770) 2024-04-30 11:25:16 +02:00
fuchaohong
0c9e0b4398
HDFS-17456. Fix the incorrect dfsused statistics of datanode when appending a file. (#6713). Contributed by fuchaohong.
Reviewed-by: ZanderXu <zanderxu@apache.org>
Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>
2024-04-30 12:22:53 +08:00
fuchaohong
ddb805951e
HDFS-17471. Correct the percentage of sample range. (#6742). Contributed by fuchaohong.
Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>
2024-04-30 12:18:47 +08:00
Tsz-Wo Nicholas Sze
78987a71a6
HADOOP-19151. Support configurable SASL mechanism. (#6740) 2024-04-29 10:02:23 -07:00
Anuj Modi
a6f2c4617e
HADOOP-19150: [ABFS] Fixing Test Code for ITestAbfsRestOperationException#testAuthFailException (#6756)
Contributed by: Anuj Modi
2024-04-29 11:48:34 -05:00
Xi Chen
aa169e1093
HADOOP-19159. S3A. Fix documentation of fs.s3a.committer.abort.pending.uploads (#6778)
The description of `fs.s3a.committer.abort.pending.uploads` in the section `Concurrent Jobs writing to the same destination` is not correct. Its default value is `true`.

Contributed by Xi Chen
2024-04-29 15:49:35 +01:00
Peter Szucs
08419c4233
YARN-11675. Update MemoryResourceHandler implementation for cgroup v2 support (#6760) 2024-04-29 16:26:18 +02:00
zhtttylz
daafc8a0b8
HDFS-17367. Add PercentUsed for Different StorageTypes in JMX (#6735) Contributed by Hualong Zhang.
Signed-off-by: Shilun Fan <slfan1989@apache.org>
2024-04-27 20:36:11 +08:00
slfan1989
88ad7db80d
HADOOP-19071. Update maven-surefire-plugin from 3.0.0 to 3.2.5. (#6664) Contributed by Shilun Fan.
Reviewed-by: Steve Loughran <stevel@cloudera.com>
Reviewed-by: Ayush Saxena <ayushsaxena@apache.org>
Signed-off-by: Shilun Fan <slfan1989@apache.org>
2024-04-27 20:30:21 +08:00
dannytbecker
027b4c3259
Remove empty queues from the queueByBlockId map (#6772) 2024-04-26 14:25:15 -07:00
Benjamin Teke
399299104c
YARN-11674. Add CPUResourceHandler for cgroup v2. (#6751) 2024-04-26 15:00:00 +02:00
Benjamin Teke
579b3bcea9
YARN-11690. Update container executor to use CGROUP2_SUPER_MAGIC in cgroup 2 scenarios (#6771) 2024-04-26 13:21:29 +02:00
Tamas Domok
ecf665c6fa
YARN-11191. Fix potential deadlock in GlobalScheduler refreshQueues (#6732) 2024-04-24 14:58:50 +02:00
Benjamin Teke
5d0a40c143
YARN-11672. Create a CgroupHandler implementation for cgroup v2 (#6734) 2024-04-24 11:33:50 +02:00
cxzl25
23286b0632
HDFS-17469. Audit log for reportBadBlocks RPC (#6731) 2024-04-24 09:39:57 +08:00
Jian Zhang
782c501441
HDFS-17451. RBF: fix spotbugs for redundant nullcheck of dns. (#6697) 2024-04-23 19:11:51 +08:00