Commit Graph

27377 Commits

Author SHA1 Message Date
Steve Loughran
c9270600b7
MAPREDUCE-7474. Improve Manifest committer resilience (#6716)
Improve task commit resilience everywhere
and add an option to reduce delete IO requests on
job cleanup (relevant for ABFS and HDFS).

Task Commit Resilience
----------------------

Task manifest saving is re-attempted on failure; the number of 
attempts made is configurable with the option:

  mapreduce.manifest.committer.manifest.save.attempts

* The default is 5.
* The minimum is 1; asking for less is ignored.
* A retry policy adds 500ms of sleep per attempt.
* Move from classic rename() to commitFile() to rename the file,
  after calling getFileStatus() to get its length and possibly etag.
  This becomes a rename() on gcs/hdfs anyway, but on abfs it does reach
  the ResilientCommitByRename callbacks in abfs, which report on
  the outcome to the caller...which is then logged at WARN.
* New statistic task_stage_save_summary_file to distinguish from
  other saving operations (job success/report file).
  This is only saved to the manifest on task commit retries, and
  provides statistics on all previous unsuccessful attempts to save
  the manifests
+ test changes to match the codepath changes, including improvements
  in fault injection.

Directory size for deletion
---------------------------

New option

  mapreduce.manifest.committer.cleanup.parallel.delete.base.first

This attempts an initial attempt at deleting the base dir, only falling
back to parallel deletes if there's a timeout.

This option is disabled by default; Consider enabling it for abfs to
reduce IO load. Consult the documentation for more details.

Success file printing
---------------------

The command to print a JSON _SUCCESS file from this committer and
any S3A committer is now something which can be invoked from
the mapred command:

  mapred successfile <path to file>

Contributed by Steve Loughran
2024-05-13 21:12:34 +01:00
zhihui wang
12e0ca6b24
HDFS-17522. JournalNode web interfaces lack configs for X-FRAME-OPTIONS protection (#6814). Contributed by wangzhihui.
Signed-off-by: Vinayakumar B <vinayakumarb@apache.org>
Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>
2024-05-13 22:10:08 +08:00
Benjamin Teke
ce7d01fac8
YARN-11689. Update the cgroup v2 init error handling (#6810) 2024-05-13 12:56:26 +02:00
dependabot[bot]
b5a90d9500
Bump org.apache.derby:derby in /hadoop-project (#6816)
Bumps org.apache.derby:derby from 10.14.2.0 to 10.17.1.0.

---
updated-dependencies:
- dependency-name: org.apache.derby:derby
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-13 12:47:31 +08:00
dependabot[bot]
1d09a64e34
Bump org.bouncycastle:bcprov-jdk18on in /hadoop-project (#6811)
Bumps [org.bouncycastle:bcprov-jdk18on](https://github.com/bcgit/bc-java) from 1.77 to 1.78.
- [Changelog](https://github.com/bcgit/bc-java/blob/main/docs/releasenotes.html)
- [Commits](https://github.com/bcgit/bc-java/commits)

---
updated-dependencies:
- dependency-name: org.bouncycastle:bcprov-jdk18on
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-12 18:38:36 +05:30
Felix Nguyen
fb0519253d
HDFS-17488. DN can fail IBRs with NPE when a volume is removed (#6759) 2024-05-11 15:37:43 +08:00
Zilong Zhu
700b3e4800
HDFS-17503. Unreleased volume references because of OOM. (#6782) 2024-05-10 10:34:40 +08:00
Sammi Chen
43e8ca428e Revert "HADOOP-18851: Performance improvement for DelegationTokenSecretManager. (#6001). Contributed by Vikas Kumar."
This reverts commit e283375cdf.
2024-05-07 13:29:32 +08:00
kulkabhay
edf985e269
HDFS-17500: Add missing operation name while authorizing some operations (#6776). Contributed by kulkabhay.
Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>
2024-05-06 12:44:30 +08:00
Doroszlai, Attila
2645898450
HADOOP-19160. hadoop-auth should not depend on kerb-simplekdc (#6788) 2024-05-03 12:57:26 +02:00
dannytbecker
881034ad45
CachedRecordStore should check if the record state is expired (#6783) 2024-05-01 13:56:53 -07:00
Viraj Jasani
a8a58944bd
HADOOP-19146. S3A: noaa-cors-pds test bucket access with global endpoint fails (#6723)
HADOOP-19057 switched the hadoop-aws test bucket from landsat-pds to 
noaa-cors-pds 

This new bucket isn't accessible if the client configuration
sets an fs.s3a.endpoint/region value other than us-east-1.

Contributed by Viraj Jasani
2024-04-30 12:16:36 +01:00
Peter Szucs
910cb6b887
YARN-11685. Create a config to enable/disable cgroup v2 functionality (#6770) 2024-04-30 11:25:16 +02:00
fuchaohong
0c9e0b4398
HDFS-17456. Fix the incorrect dfsused statistics of datanode when appending a file. (#6713). Contributed by fuchaohong.
Reviewed-by: ZanderXu <zanderxu@apache.org>
Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>
2024-04-30 12:22:53 +08:00
fuchaohong
ddb805951e
HDFS-17471. Correct the percentage of sample range. (#6742). Contributed by fuchaohong.
Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>
2024-04-30 12:18:47 +08:00
Tsz-Wo Nicholas Sze
78987a71a6
HADOOP-19151. Support configurable SASL mechanism. (#6740) 2024-04-29 10:02:23 -07:00
Anuj Modi
a6f2c4617e
HADOOP-19150: [ABFS] Fixing Test Code for ITestAbfsRestOperationException#testAuthFailException (#6756)
Contributed by: Anuj Modi
2024-04-29 11:48:34 -05:00
Xi Chen
aa169e1093
HADOOP-19159. S3A. Fix documentation of fs.s3a.committer.abort.pending.uploads (#6778)
The description of `fs.s3a.committer.abort.pending.uploads` in the section `Concurrent Jobs writing to the same destination` is not correct. Its default value is `true`.

Contributed by Xi Chen
2024-04-29 15:49:35 +01:00
Peter Szucs
08419c4233
YARN-11675. Update MemoryResourceHandler implementation for cgroup v2 support (#6760) 2024-04-29 16:26:18 +02:00
zhtttylz
daafc8a0b8
HDFS-17367. Add PercentUsed for Different StorageTypes in JMX (#6735) Contributed by Hualong Zhang.
Signed-off-by: Shilun Fan <slfan1989@apache.org>
2024-04-27 20:36:11 +08:00
slfan1989
88ad7db80d
HADOOP-19071. Update maven-surefire-plugin from 3.0.0 to 3.2.5. (#6664) Contributed by Shilun Fan.
Reviewed-by: Steve Loughran <stevel@cloudera.com>
Reviewed-by: Ayush Saxena <ayushsaxena@apache.org>
Signed-off-by: Shilun Fan <slfan1989@apache.org>
2024-04-27 20:30:21 +08:00
dannytbecker
027b4c3259
Remove empty queues from the queueByBlockId map (#6772) 2024-04-26 14:25:15 -07:00
Benjamin Teke
399299104c
YARN-11674. Add CPUResourceHandler for cgroup v2. (#6751) 2024-04-26 15:00:00 +02:00
Benjamin Teke
579b3bcea9
YARN-11690. Update container executor to use CGROUP2_SUPER_MAGIC in cgroup 2 scenarios (#6771) 2024-04-26 13:21:29 +02:00
Tamas Domok
ecf665c6fa
YARN-11191. Fix potentional deadlock in GlobalScheduler refreshQueues (#6732) 2024-04-24 14:58:50 +02:00
Benjamin Teke
5d0a40c143
YARN-11672. Create a CgroupHandler implementation for cgroup v2 (#6734) 2024-04-24 11:33:50 +02:00
cxzl25
23286b0632
HDFS-17469. Audit log for reportBadBlocks RPC (#6731) 2024-04-24 09:39:57 +08:00
Jian Zhang
782c501441
HDFS-17451. RBF: fix spotbugs for redundant nullcheck of dns. (#6697) 2024-04-23 19:11:51 +08:00
Pranav Saxena
6404692c09
HADOOP-19102. [ABFS] FooterReadBufferSize should not be greater than readBufferSize (#6617)
Contributed by  Pranav Saxena
2024-04-22 18:36:12 +01:00
Ayush Saxena
eec9cd2997
HADOOP-19107. Drop support for HBase v1 & upgrade HBase v2 (#6629). Contributed by Ayush Saxena 2024-04-22 21:55:58 +05:30
Tamas Domok
a386ac1f56
YARN-11684. Fix general contract violation in PriorityQueueComparator. (#6725) Contributed by Tamas Domok.
Signed-off-by: Shilun Fan <slfan1989@apache.org>
2024-04-19 08:37:05 +08:00
Madhan Neethiraj
e8b2c28dec
HDFS-17478. FSPermissionChecker optimization by initializing AccessControlEnforcer in constructor (#6749) 2024-04-18 15:43:31 -07:00
dannytbecker
0c35cf0982
HDFS-17477. IncrementalBlockReport race condition additional edge cases (#6748) 2024-04-18 09:04:08 -07:00
zj619
922c44a339
HADOOP-19130. FTPFileSystem rename with full qualified path broken (#6678). Contributed by shawn
Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
2024-04-17 23:12:38 +05:30
章锡平
87cc2f1a1f
HDFS-17465. RBF: Use ProportionRouterRpcFairnessPolicyController get “'ava.Lang. Error: Maximum permit count exceeded' (#6727) 2024-04-15 09:28:05 -07:00
Lei313
f49a4df797
HDFS-17383:Datanode current block token should come from active NameNode in HA mode (#6562). Contributed by lei w.
Reviewed-by: Shuyan Zhang <zhangshuyan@apache.org>
Signed-off-by: Shuyan Zhang <zhangshuyan@apache.org>
2024-04-15 18:35:53 +08:00
Anuj Modi
bd1a08b2cf
HADOOP-19129: [ABFS] Test Fixes and Test Script Bug Fixes (#6676)
Contributed by Anuj Modi
2024-04-12 17:52:47 +01:00
huhaiyang
6ccb223c9c
HDFS-17461. Fix spotbugs in PeerCache#getInternal (#6721). Contributed by Haiyang Hu.
Reviewed-by: ZanderXu <zanderxu@apache.org>
Reviewed-by: Ayush Saxena <ayushsaxena@apache.org>
Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>
2024-04-12 19:53:25 +08:00
PJ Fanning
d194ad0242
HADOOP-19079. HttpExceptionUtils to verify that loaded class is really an exception before instantiation (#6557)
Security hardening

+ Adds new interceptAndValidateMessageContains() method in LambdaTestUtils to verify a list of strings
  can all be found in the toString() value of a raised exception

Contributed by PJ Fanning
2024-04-11 19:38:15 +01:00
huhaiyang
81b05977f2
HDFS-17455. Fix Client throw IndexOutOfBoundsException in DFSInputStream#fetchBlockAt (#6710). Contributed by Haiyang Hu.
Reviewed-by: ZanderXu <zanderxu@apache.org>
Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>
2024-04-11 18:04:57 +08:00
dannytbecker
05964ad07a
HDFS-17453. IncrementalBlockReport can have race condition with Edit Log Tailer (#6708) 2024-04-10 09:30:24 -07:00
Anuj Modi
dbe2d61258
HADOOP-19096. [ABFS] [CST Optimization] Enhance Client-Side Throttling Metrics Logic (#6276)
ABFS has a client-side throttling mechanism which works on the metrics collected
from past requests

When requests are fail due to server-side throttling it updates its
metrics and recalculates any client side backoff.

The choice of which requests should be used to compute client side
backoff interval is based on the http status code:

- Status code in 2xx range: Successful Operations should contribute.
- Status code in 3xx range: Redirection Operations should not contribute.
- Status code in 4xx range: User Errors should not contribute.
- Status code is 503: Throttling Error should contribute only if they
  are due to client limits breach as follows:
  * 503, Ingress Over Account Limit: Should Contribute
  * 503, Egress Over Account Limit: Should Contribute
  * 503, TPS Over Account Limit: Should Contribute
  * 503, Other Server Throttling: Should not Contribute.
- Status code in 5xx range other than 503: Should not Contribute.
- IOException and UnknownHostExceptions: Should not Contribute.

Contributed by Anuj Modi
2024-04-10 14:46:23 +01:00
Cheng Pan
281e2d288d
Revert "HADOOP-16822. Provide source artifacts for hadoop-client-api. Contributed by Karel Kolman." (#6458)
This reverts commit 2c4ab72a60.

Justification: this was making debugging through IDEs worse, rather than better.
2024-04-10 12:03:59 +01:00
Gautham B A
f7bb4f1595
HADOOP-18135. Produce Windows binaries of Hadoop (#6673)
This PR enables one to create the Hadoop
release tarball on Windows, complete with
the native binaries (including winutils.exe).
This PR contains the following changes -

* Prevents splitting during array element
  expansion - this is needed since we need
  to pass the arguments correctly to maven.
* Install Python 3.11.8 and pip to the
  Windows docker image for building
  Hadoop.
* pom file changes to get maven to invoke
  the releasedocmaker script through
  bash.exe on Windows.
2024-04-09 22:15:05 +05:30
Yang Jiandan
3f8af73913
YARN-11670. Add CallerContext in NodeManager (#6688) 2024-04-08 22:50:41 -04:00
slfan1989
8c378d1ea1
YARN-11444. Improve YARN md documentation format. (#6711) Contributed by Shilun Fan.
Reviewed-by: Ayush Saxena <ayushsaxena@apache.org>
Signed-off-by: Shilun Fan <slfan1989@apache.org>
2024-04-07 20:50:46 +08:00
ConfX
73e6931ed0
HDFS-17449. Fix ill-formed decommission host name and port pair triggers IndexOutOfBound error (#6691). Contributed by ConfX
Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
2024-04-06 13:38:09 +05:30
slfan1989
a1ae35e691
HADOOP-19135. Remove Jcache 1.0-alpha. (#6695) Contributed by Shilun Fan.
Reviewed-by: Steve Loughran <stevel@cloudera.com>
Signed-off-by: Shilun Fan <slfan1989@apache.org>
2024-04-05 22:09:15 +08:00
Anuj Modi
6ed73896f6
HADOOP-18656. [ABFS] Add Support for Paginated Delete for Large Directories in HNS Account (#6409)
Contributed by Anuj Modi
2024-04-04 19:48:25 +01:00
Dongjoon Hyun
d7157b4aa9
HADOOP-19141. Vector IO: Update default values consistently (#6702)
Contributed by Dongjoon Hyun
2024-04-04 10:56:40 +01:00