Commit Graph

27394 Commits

Author SHA1 Message Date
zj619
922c44a339
HADOOP-19130. FTPFileSystem rename with full qualified path broken (#6678). Contributed by shawn
Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
2024-04-17 23:12:38 +05:30
章锡平
87cc2f1a1f
HDFS-17465. RBF: Use ProportionRouterRpcFairnessPolicyController get “'ava.Lang. Error: Maximum permit count exceeded' (#6727) 2024-04-15 09:28:05 -07:00
Lei313
f49a4df797
HDFS-17383:Datanode current block token should come from active NameNode in HA mode (#6562). Contributed by lei w.
Reviewed-by: Shuyan Zhang <zhangshuyan@apache.org>
Signed-off-by: Shuyan Zhang <zhangshuyan@apache.org>
2024-04-15 18:35:53 +08:00
Anuj Modi
bd1a08b2cf
HADOOP-19129: [ABFS] Test Fixes and Test Script Bug Fixes (#6676)
Contributed by Anuj Modi
2024-04-12 17:52:47 +01:00
huhaiyang
6ccb223c9c
HDFS-17461. Fix spotbugs in PeerCache#getInternal (#6721). Contributed by Haiyang Hu.
Reviewed-by: ZanderXu <zanderxu@apache.org>
Reviewed-by: Ayush Saxena <ayushsaxena@apache.org>
Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>
2024-04-12 19:53:25 +08:00
PJ Fanning
d194ad0242
HADOOP-19079. HttpExceptionUtils to verify that loaded class is really an exception before instantiation (#6557)
Security hardening

+ Adds new interceptAndValidateMessageContains() method in LambdaTestUtils to verify a list of strings
  can all be found in the toString() value of a raised exception

Contributed by PJ Fanning
2024-04-11 19:38:15 +01:00
huhaiyang
81b05977f2
HDFS-17455. Fix Client throw IndexOutOfBoundsException in DFSInputStream#fetchBlockAt (#6710). Contributed by Haiyang Hu.
Reviewed-by: ZanderXu <zanderxu@apache.org>
Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>
2024-04-11 18:04:57 +08:00
dannytbecker
05964ad07a
HDFS-17453. IncrementalBlockReport can have race condition with Edit Log Tailer (#6708) 2024-04-10 09:30:24 -07:00
Anuj Modi
dbe2d61258
HADOOP-19096. [ABFS] [CST Optimization] Enhance Client-Side Throttling Metrics Logic (#6276)
ABFS has a client-side throttling mechanism which works on the metrics collected
from past requests

When requests are fail due to server-side throttling it updates its
metrics and recalculates any client side backoff.

The choice of which requests should be used to compute client side
backoff interval is based on the http status code:

- Status code in 2xx range: Successful Operations should contribute.
- Status code in 3xx range: Redirection Operations should not contribute.
- Status code in 4xx range: User Errors should not contribute.
- Status code is 503: Throttling Error should contribute only if they
  are due to client limits breach as follows:
  * 503, Ingress Over Account Limit: Should Contribute
  * 503, Egress Over Account Limit: Should Contribute
  * 503, TPS Over Account Limit: Should Contribute
  * 503, Other Server Throttling: Should not Contribute.
- Status code in 5xx range other than 503: Should not Contribute.
- IOException and UnknownHostExceptions: Should not Contribute.

Contributed by Anuj Modi
2024-04-10 14:46:23 +01:00
Cheng Pan
281e2d288d
Revert "HADOOP-16822. Provide source artifacts for hadoop-client-api. Contributed by Karel Kolman." (#6458)
This reverts commit 2c4ab72a60.

Justification: this was making debugging through IDEs worse, rather than better.
2024-04-10 12:03:59 +01:00
Gautham B A
f7bb4f1595
HADOOP-18135. Produce Windows binaries of Hadoop (#6673)
This PR enables one to create the Hadoop
release tarball on Windows, complete with
the native binaries (including winutils.exe).
This PR contains the following changes -

* Prevents splitting during array element
  expansion - this is needed since we need
  to pass the arguments correctly to maven.
* Install Python 3.11.8 and pip to the
  Windows docker image for building
  Hadoop.
* pom file changes to get maven to invoke
  the releasedocmaker script through
  bash.exe on Windows.
2024-04-09 22:15:05 +05:30
Yang Jiandan
3f8af73913
YARN-11670. Add CallerContext in NodeManager (#6688) 2024-04-08 22:50:41 -04:00
slfan1989
8c378d1ea1
YARN-11444. Improve YARN md documentation format. (#6711) Contributed by Shilun Fan.
Reviewed-by: Ayush Saxena <ayushsaxena@apache.org>
Signed-off-by: Shilun Fan <slfan1989@apache.org>
2024-04-07 20:50:46 +08:00
ConfX
73e6931ed0
HDFS-17449. Fix ill-formed decommission host name and port pair triggers IndexOutOfBound error (#6691). Contributed by ConfX
Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
2024-04-06 13:38:09 +05:30
slfan1989
a1ae35e691
HADOOP-19135. Remove Jcache 1.0-alpha. (#6695) Contributed by Shilun Fan.
Reviewed-by: Steve Loughran <stevel@cloudera.com>
Signed-off-by: Shilun Fan <slfan1989@apache.org>
2024-04-05 22:09:15 +08:00
Anuj Modi
6ed73896f6
HADOOP-18656. [ABFS] Add Support for Paginated Delete for Large Directories in HNS Account (#6409)
Contributed by Anuj Modi
2024-04-04 19:48:25 +01:00
Dongjoon Hyun
d7157b4aa9
HADOOP-19141. Vector IO: Update default values consistently (#6702)
Contributed by Dongjoon Hyun
2024-04-04 10:56:40 +01:00
Steve Loughran
62182b1f74
HADOOP-19098. Vector IO: test failure followup (#6701)
Revert changes in ITestDelegatedMRJob which came in with HADOOP-19098

Contributed by Steve Loughran
2024-04-03 14:40:41 -05:00
PJ Fanning
eede5b1315
HADOOP-19114. Upgrade to commons-compress 1.26.1 due to CVEs. (#6636)
This addresses two CVEs triggered by malformed archives

Important: Denial of Service CVE-2024-25710
Moderate: Denial of Service CVE-2024-26308

Contributed by PJ Fanning
2024-04-03 19:32:15 +01:00
Steve Loughran
87fb977777
HADOOP-19098. Vector IO: Specify and validate ranges consistently. #6604
Clarifies behaviour of VectorIO methods with contract tests as well as
specification.

* Add precondition range checks to all implementations
* Identify and fix bug where direct buffer reads was broken
  (HADOOP-19101; this surfaced in ABFS contract tests)
* Logging in VectoredReadUtils.
* TestVectoredReadUtils verifies validation logic.
* FileRangeImpl toString() improvements
* CombinedFileRange tracks bytes in range which are wanted;
   toString() output logs this.

HDFS
* Add test TestHDFSContractVectoredRead

ABFS
* Add test ITestAbfsFileSystemContractVectoredRead

S3A
* checks for vector IO being stopped in all iterative
  vector operations, including draining
* maps read() returning -1 to failure
* passes in file length to validation
* Error reporting to only completeExceptionally() those ranges
  which had not yet read data in.
* Improved logging.

readVectored()
* made synchronized. This is only for the invocation;
  the actual async retrieves are unsynchronized.
* closes input stream on invocation
* switches to random IO, so avoids keeping any long-lived connection around.

+ AbstractSTestS3AHugeFiles enhancements.
+ ADDENDUM: test fix in ITestS3AContractVectoredRead

Contains: HADOOP-19101. Vectored Read into off-heap buffer broken in fallback
implementation

Contributed by Steve Loughran

Change-Id: Ia4ed71864c595f175c275aad83a2ff5741693432
2024-04-03 13:17:52 +01:00
Steve Loughran
b4f9d8e6fa
Revert "HADOOP-19098. Vector IO: Specify and validate ranges consistently."
This reverts commit ba7faf90c8.
2024-04-03 13:15:05 +01:00
PJ Fanning
1357bb162d
HADOOP-19123. Update to commons-configuration2 2.10.1 due to CVE (#6661). Contributed by PJ Fanning
Reviewed-by: Shilun Fan <slfan1989@apache.org>
Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
2024-04-03 01:20:00 +05:30
Steve Loughran
ba7faf90c8
HADOOP-19098. Vector IO: Specify and validate ranges consistently.
Clarifies behaviour of VectorIO methods with contract tests as well as specification.

* Add precondition range checks to all implementations
* Identify and fix bug where direct buffer reads was broken
  (HADOOP-19101; this surfaced in ABFS contract tests)
* Logging in VectoredReadUtils.
* TestVectoredReadUtils verifies validation logic.
* FileRangeImpl toString() improvements
* CombinedFileRange tracks bytes in range which are wanted;
   toString() output logs this.

HDFS
* Add test TestHDFSContractVectoredRead

ABFS
* Add test ITestAbfsFileSystemContractVectoredRead

S3A
* checks for vector IO being stopped in all iterative
  vector operations, including draining
* maps read() returning -1 to failure
* passes in file length to validation
* Error reporting to only completeExceptionally() those ranges
  which had not yet read data in.
* Improved logging.  

readVectored()
* made synchronized. This is only for the invocation;
  the actual async retrieves are unsynchronized.
* closes input stream on invocation
* switches to random IO, so avoids keeping any long-lived connection around.

+ AbstractSTestS3AHugeFiles enhancements.

Contains: HADOOP-19101. Vectored Read into off-heap buffer broken in fallback implementation

Contributed by Steve Loughran
2024-04-02 20:16:38 +01:00
Lei313
36c22400b2
HDFS-17408:Reduce the number of quota calculations in FSDirRenameOp (#6653). Contributed by lei w.
Reviewed-by: He Xiaoqiao <hexiaoqiao@apache.org>
Reviewed-by: Dinesh Chitlangia <dineshc@apache.org>
Signed-off-by: Shuyan Zhang <zhangshuyan@apache.org>
2024-04-02 10:40:28 +08:00
slfan1989
5f3eb446f7
YARN-11663. [Federation] Add Cache Entity Nums Limit. (#6662) Contributed by Shilun Fan.
Reviewed-by: Dinesh Chitlangia <dineshc@apache.org>
Signed-off-by: Shilun Fan <slfan1989@apache.org>
2024-04-02 07:47:59 +08:00
PJ Fanning
f7d1ec2d9e
HADOOP-19077. Remove use of javax.ws.rs.core.HttpHeaders (#6554). Contributed by PJ Fanning
Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
2024-04-01 12:43:39 +05:30
PJ Fanning
59976f1be2
HDFS-17450. Add explicit dependency on httpclient jar (#6130). Contributed by PJ Fanning
Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
2024-03-30 23:24:15 +05:30
huhaiyang
4807815e1c
HDFS-17448. Enhance the stability of the unit test TestDiskBalancerCommand (#6690). Contributed by Haiyang Hu
Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
2024-03-30 22:51:05 +05:30
PJ Fanning
06db6289cb
HADOOP-19024. Use bouncycastle jdk18 1.77 (#6410). Contributed 2024-03-30 19:58:12 +05:30
ConfX
aabf31f6fa
HDFS-17103. Fix file system cleanup in TestNameEditsConfigs (#6071). Contributed by ConfX.
Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
2024-03-30 15:04:42 +05:30
PJ Fanning
97c5a6efba
HADOOP-19041. Use StandardCharsets in more places (#6449) 2024-03-28 23:17:18 -04:00
slfan1989
347521c95d
HADOOP-19124. Update org.ehcache from 3.3.1 to 3.8.2. (#6665) 2024-03-28 21:56:12 -04:00
Junfan Zhang
f1f2abe641
YARN-11668. Fix RM crash for potential concurrent modification exception when updating node attributes (#6681) Contributed by Junfan Zhang.
Reviewed-by: Dinesh Chitlangia <dineshc@apache.org>
Signed-off-by: Shilun Fan <slfan1989@apache.org>
2024-03-29 09:45:16 +08:00
ConfX
f3f6340746
HDFS-17443. add null check for fileSys and cluster before shutting down (#6683) 2024-03-28 11:09:50 -04:00
xiaojunxiang
8528d5783d
HDFS-17216. Distcp: When handle the small files, the bandwidth parameter will be invalid, fix this bug. (#6138) 2024-03-28 10:31:06 -04:00
PJ Fanning
5bfca65692
HADOOP-19115. Upgrade to nimbus-jose-jwt 9.37.2 due to CVE-2023-52428. (#6637)
Contributed by PJ Fanning
2024-03-27 10:30:55 +00:00
Alex
50370769cd
HDFS-17429. Fixing wrong log file name in datatransfer Sender.java (#6670) Contributed by Zhongkun Wu.
Reviewed-by: Ayush Saxena <ayushsaxena@apache.org>
Signed-off-by: Shilun Fan <slfan1989@apache.org>
2024-03-27 06:59:15 +08:00
Syed Shameerur Rahman
032796a0fb
HADOOP-19047: S3A: Support in-memory tracking of Magic Commit data (#6468)
If the option fs.s3a.committer.magic.track.commits.in.memory.enabled
is set to true, then rather than save data about in-progress uploads
to S3, this information is cached in memory.

If the number of files being committed is low, this will save network IO
in both the generation of .pending and marker files, and in the scanning
of task attempt directory trees during task commit.

Contributed by Syed Shameerur Rahman
2024-03-26 15:29:35 +00:00
Viraj Jasani
9fe371aa15
HADOOP-18980. Invalid inputs for getTrimmedStringCollectionSplitByEquals (ADDENDUM) (#6546)
This is a followup to #6406:
HADOOP-18980. S3A credential provider remapping: make extensible

It adds extra validation of key-value pairs in a configuration
option, with tests.

Contributed by Viraj Jasani
2024-03-26 11:18:03 +00:00
Zilong Zhu
37f9ccdc86
HDFS-17368. HA: Standby should exit safemode when resources are available. (#6518). Contributed by Zilong Zhu.
Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>
2024-03-26 17:35:55 +08:00
Takanobu Asanuma
8bc4939ee2
HDFS-17441. Fix junit dependency by adding missing library in hadoop-hdfs-rbf. (#6669)
Reviewed-by: Ayush Saxena <ayushsaxena@apache.org>
2024-03-26 09:40:28 +09:00
Takanobu Asanuma
c4f7a3625b
HDFS-17435. Fix TestRouterRpc failed (#6650)
Reviewed-by: Ayush Saxena <ayushsaxena@apache.org>
2024-03-26 09:38:35 +09:00
Gautham B A
76489e579b
HADOOP-19127. Do not run unit tests on Windows pre-commit CI (#6672) 2024-03-25 09:16:03 -07:00
Gautham B A
44a249e32a
Exclude files from Apache RAT (#6671) 2024-03-25 09:13:57 -07:00
PJ Fanning
7653f968e5
HADOOP-19116. Update to zookeeper client 3.8.4 due to CVE-2024-23944. (#6638)
Updated ZK client dependency to 3.8.4 to address  CVE-2024-23944.

Contributed by PJ Fanning
2024-03-25 15:10:56 +00:00
Anuj Modi
c4fa1b65fb
HADOOP-19089: [ABFS] Reverting Back Support of setXAttr() and getXAttr() on root path (#6592)
This reverts most of
HADOOP-18869: [ABFS] Fix behavior of a File System APIs on root path (#6003).

Calling getXAttr("/") or setXAttr("/") on an abfs container will fail with

`Operation failed: "The request URI is invalid.", HTTP 400 Bad Request`

 
This change is to ensure:
* Consistency across ADLS clients
* Consistency across authentication mechanisms.

Contributed by Anuj Modi
2024-03-25 14:13:24 +00:00
Alex
5c7e40f910
HADOOP-19111. Removing redundant debug message about client info (#6666). Contributed by Zhongkun Wu.
Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>
2024-03-25 11:44:33 +08:00
yu liang
55dca911cc
HADOOP-19052.Hadoop use Shell command to get the count of the hard link which takes a lot of time (#6587) Contributed by liangyu.
Signed-off-by: Shilun Fan <slfan1989@apache.org>
2024-03-24 10:14:27 +08:00
LiuGuH
a60b5e2de3
MAPREDUCE-7469. NNBench createControlFiles should use thread pool to improve performance. (#6463) Contributed by liuguanghua.
Signed-off-by: Shilun Fan <slfan1989@apache.org>
2024-03-23 06:15:56 +08:00
huhaiyang
8cd4704e0a
HDFS-17430. RecoveringBlock will skip no live replicas when get block recovery command. (#6635) 2024-03-22 09:43:12 -04:00