Commit Graph

5878 Commits

Author SHA1 Message Date
Sebb
f11a8cfa6e
HADOOP-13147. Constructors must not call overrideable methods in PureJavaCrc32C (#6408). Contributed by Sebb. 2024-05-21 00:08:08 +05:30
Mukund Thakur
47be1ab3b6
HADOOP-18679. Add API for bulk/paged delete of files (#6726)
Applications can create a BulkDelete instance from a
BulkDeleteSource; the BulkDelete interface provides
the pageSize(): the maximum number of entries which can be
deleted, and a bulkDelete(Collection paths)
method which can take a collection up to pageSize() long.

This is optimized for object stores with bulk delete APIs;
the S3A connector will offer the page size of
fs.s3a.bulk.delete.page.size unless bulk delete has
been disabled.

Even with a page size of 1, the S3A implementation is
more efficient than delete(path)
as there are no safety checks for the path being a directory
or probes for the need to recreate directories.

The interface BulkDeleteSource is implemented by
all FileSystem implementations, with a page size
of 1 and mapped to delete(pathToDelete, false).
This means that callers do not need to have special
case handling for object stores versus classic filesystems.

To aid use through reflection APIs, the class
org.apache.hadoop.io.wrappedio.WrappedIO
has been created with "reflection friendly" methods.

Contributed by Mukund Thakur and Steve Loughran
2024-05-20 17:05:25 +01:00
skyskyhu
3c00093cb5
HADOOP-19167 Bug Fix: Change of Codec configuration does not work (#6807) 2024-05-17 10:27:39 +08:00
Vikas Kumar
f8dce6c501
HADOOP-18851. Performance improvement for DelegationTokenSecretManager (#6803) 2024-05-16 12:30:52 +08:00
Christopher Tubbs
2e77b7b02c
[HADOOP-18786] Use CDN instead of ASF archive (#5789)
* Use Yetus 0.14.1 from downloads.apache.org in yetus-wrapper
* Use Maven 3.8.8 from downloads.apache.org in Win 10 Dockerfile
* Point users to downloads.apache.org for JVSC
* Use Solr 8.11.2 from downloads.apache.org in YARN Dockerfile

Contributed by Christopher Tubbs
2024-05-14 20:09:52 +01:00
zhihui wang
39dee8ea19
HADOOP-18958. Improve UserGroupInformation debug log. (#6255)
Contributed by zhihui wang
2024-05-14 20:03:49 +01:00
Tsz-Wo Nicholas Sze
bda7045070
HADOOP-19152. Do not hard code security providers. (#6739) 2024-05-14 11:19:57 -07:00
zhengchenyu
4cb4d5dd08
HADOOP-19170. Fixes compilation issues on non-Linux systems (#6822)
Reviewed-by: Steve Loughran <stevel@apache.org>
Reviewed-by: Wei-Chiu Chuang <weichiu@apache.org>
2024-05-13 20:04:01 -07:00
Felix Nguyen
fb0519253d
HDFS-17488. DN can fail IBRs with NPE when a volume is removed (#6759) 2024-05-11 15:37:43 +08:00
Sammi Chen
43e8ca428e Revert "HADOOP-18851: Performance improvement for DelegationTokenSecretManager. (#6001). Contributed by Vikas Kumar."
This reverts commit e283375cdf.
2024-05-07 13:29:32 +08:00
Tsz-Wo Nicholas Sze
78987a71a6
HADOOP-19151. Support configurable SASL mechanism. (#6740) 2024-04-29 10:02:23 -07:00
zhtttylz
daafc8a0b8
HDFS-17367. Add PercentUsed for Different StorageTypes in JMX (#6735) Contributed by Hualong Zhang.
Signed-off-by: Shilun Fan <slfan1989@apache.org>
2024-04-27 20:36:11 +08:00
Pranav Saxena
6404692c09
HADOOP-19102. [ABFS] FooterReadBufferSize should not be greater than readBufferSize (#6617)
Contributed by  Pranav Saxena
2024-04-22 18:36:12 +01:00
zj619
922c44a339
HADOOP-19130. FTPFileSystem rename with full qualified path broken (#6678). Contributed by shawn
Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
2024-04-17 23:12:38 +05:30
PJ Fanning
d194ad0242
HADOOP-19079. HttpExceptionUtils to verify that loaded class is really an exception before instantiation (#6557)
Security hardening

+ Adds new interceptAndValidateMessageContains() method in LambdaTestUtils to verify a list of strings
  can all be found in the toString() value of a raised exception

Contributed by PJ Fanning
2024-04-11 19:38:15 +01:00
Gautham B A
f7bb4f1595
HADOOP-18135. Produce Windows binaries of Hadoop (#6673)
This PR enables one to create the Hadoop
release tarball on Windows, complete with
the native binaries (including winutils.exe).
This PR contains the following changes -

* Prevents splitting during array element
  expansion - this is needed since we need
  to pass the arguments correctly to maven.
* Install Python 3.11.8 and pip to the
  Windows docker image for building
  Hadoop.
* pom file changes to get maven to invoke
  the releasedocmaker script through
  bash.exe on Windows.
2024-04-09 22:15:05 +05:30
Steve Loughran
87fb977777
HADOOP-19098. Vector IO: Specify and validate ranges consistently. #6604
Clarifies behaviour of VectorIO methods with contract tests as well as
specification.

* Add precondition range checks to all implementations
* Identify and fix bug where direct buffer reads was broken
  (HADOOP-19101; this surfaced in ABFS contract tests)
* Logging in VectoredReadUtils.
* TestVectoredReadUtils verifies validation logic.
* FileRangeImpl toString() improvements
* CombinedFileRange tracks bytes in range which are wanted;
   toString() output logs this.

HDFS
* Add test TestHDFSContractVectoredRead

ABFS
* Add test ITestAbfsFileSystemContractVectoredRead

S3A
* checks for vector IO being stopped in all iterative
  vector operations, including draining
* maps read() returning -1 to failure
* passes in file length to validation
* Error reporting to only completeExceptionally() those ranges
  which had not yet read data in.
* Improved logging.

readVectored()
* made synchronized. This is only for the invocation;
  the actual async retrieves are unsynchronized.
* closes input stream on invocation
* switches to random IO, so avoids keeping any long-lived connection around.

+ AbstractSTestS3AHugeFiles enhancements.
+ ADDENDUM: test fix in ITestS3AContractVectoredRead

Contains: HADOOP-19101. Vectored Read into off-heap buffer broken in fallback
implementation

Contributed by Steve Loughran

Change-Id: Ia4ed71864c595f175c275aad83a2ff5741693432
2024-04-03 13:17:52 +01:00
Steve Loughran
b4f9d8e6fa
Revert "HADOOP-19098. Vector IO: Specify and validate ranges consistently."
This reverts commit ba7faf90c8.
2024-04-03 13:15:05 +01:00
Steve Loughran
ba7faf90c8
HADOOP-19098. Vector IO: Specify and validate ranges consistently.
Clarifies behaviour of VectorIO methods with contract tests as well as specification.

* Add precondition range checks to all implementations
* Identify and fix bug where direct buffer reads was broken
  (HADOOP-19101; this surfaced in ABFS contract tests)
* Logging in VectoredReadUtils.
* TestVectoredReadUtils verifies validation logic.
* FileRangeImpl toString() improvements
* CombinedFileRange tracks bytes in range which are wanted;
   toString() output logs this.

HDFS
* Add test TestHDFSContractVectoredRead

ABFS
* Add test ITestAbfsFileSystemContractVectoredRead

S3A
* checks for vector IO being stopped in all iterative
  vector operations, including draining
* maps read() returning -1 to failure
* passes in file length to validation
* Error reporting to only completeExceptionally() those ranges
  which had not yet read data in.
* Improved logging.  

readVectored()
* made synchronized. This is only for the invocation;
  the actual async retrieves are unsynchronized.
* closes input stream on invocation
* switches to random IO, so avoids keeping any long-lived connection around.

+ AbstractSTestS3AHugeFiles enhancements.

Contains: HADOOP-19101. Vectored Read into off-heap buffer broken in fallback implementation

Contributed by Steve Loughran
2024-04-02 20:16:38 +01:00
PJ Fanning
f7d1ec2d9e
HADOOP-19077. Remove use of javax.ws.rs.core.HttpHeaders (#6554). Contributed by PJ Fanning
Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
2024-04-01 12:43:39 +05:30
PJ Fanning
06db6289cb
HADOOP-19024. Use bouncycastle jdk18 1.77 (#6410). Contributed 2024-03-30 19:58:12 +05:30
PJ Fanning
97c5a6efba
HADOOP-19041. Use StandardCharsets in more places (#6449) 2024-03-28 23:17:18 -04:00
Viraj Jasani
9fe371aa15
HADOOP-18980. Invalid inputs for getTrimmedStringCollectionSplitByEquals (ADDENDUM) (#6546)
This is a followup to #6406:
HADOOP-18980. S3A credential provider remapping: make extensible

It adds extra validation of key-value pairs in a configuration
option, with tests.

Contributed by Viraj Jasani
2024-03-26 11:18:03 +00:00
Gautham B A
44a249e32a
Exclude files from Apache RAT (#6671) 2024-03-25 09:13:57 -07:00
Alex
5c7e40f910
HADOOP-19111. Removing redundant debug message about client info (#6666). Contributed by Zhongkun Wu.
Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>
2024-03-25 11:44:33 +08:00
yu liang
55dca911cc
HADOOP-19052.Hadoop use Shell command to get the count of the hard link which takes a lot of time (#6587) Contributed by liangyu.
Signed-off-by: Shilun Fan <slfan1989@apache.org>
2024-03-24 10:14:27 +08:00
Peter Szucs
a957cd5049
YARN-5305. Allow log aggregation to discard expired delegation tokens (#6625) 2024-03-20 15:33:10 +01:00
Steve Loughran
705fb8323b
HADOOP-19119. Spotbugs: possible NPE in org.apache.hadoop.crypto.key.kms.ValueQueue.getSize() (#6642)
Spotbugs is mistaken here as it doesn't observer the read/write locks used
to manage exclusive access to the maps.

* cache the value between checks
* tag as @VisibleForTesting

Contributed by Steve Loughran
2024-03-19 17:18:07 +00:00
slfan1989
ff3f2255d2
HADOOP-19112. Hadoop 3.4.0 release wrap-up. (#6640) Contributed by Shilun Fan.
Reviewed-by: He Xiaoqiao <hexiaoqiao@apache.org>
Signed-off-by: Shilun Fan <slfan1989@apache.org>
2024-03-19 20:08:03 +08:00
Vinayakumar B
0f51d2a4ec
HADOOP-14451. Deadlock in NativeIO (#6632) 2024-03-18 10:53:21 +05:30
PJ Fanning
fc166d3aec
HADOOP-19090. Use protobuf-java 3.23.4. (#6593). Contributed by PJ Fanning. 2024-03-07 15:09:01 +05:30
hfutatzhanghb
7012986fc3
HDFS-17345. Add a metrics to record block report generating cost time. (#6475). Contributed by farmmamba.
Reviewed-by:  Shuyan Zhang <zhangshuyan@apache.org>
Signed-off-by:  Shuyan Zhang <zhangshuyan@apache.org>
2024-03-06 16:59:00 +08:00
Steve Loughran
095229fefb
HADOOP-19097. S3A: Set fs.s3a.connection.establish.timeout to 30s (#6601)
This is consistent with the value in the hadoop-aws source code

Contributed by Steve Loughran
2024-03-05 10:10:27 +00:00
Jian Zhang
a6aa2925fb
HDFS-17333. DFSClient supports lazy resolution from hostname to IP. (#6430)
Signed-off-by: Takanobu Asanuma <tasanuma@apache.org>
2024-03-02 21:35:24 +09:00
Steve Loughran
095dfcca30
HADOOP-18088. Replace log4j 1.x with reload4j. (#4052)
Co-authored-by: Wei-Chiu Chuang <weichiu@apache.org>


Includes HADOOP-18354. Upgrade reload4j to 1.22.2 due to XXE vulnerability (#4607). 

Log4j 1.2.17 has been replaced by reloadj 1.22.2
SLF4J is at 1.7.36
2024-02-13 16:33:51 +00:00
slfan1989
8011b21c52
HADOOP-19069. Use hadoop-thirdparty 1.2.0. (#6533) Contributed by Shilun Fan
Reviewed-by: He Xiaoqiao <hexiaoqiao@apache.org>
Signed-off-by: Shilun Fan <slfan1989@apache.org>
2024-02-08 19:18:04 +08:00
Steve Loughran
3f98cb6741
HADOOP-19045. CreateSession Timeout - followup (#6532)
This is a followup to PR:
HADOOP-19045. S3A: Validate CreateSession Timeout Propagation (#6470)

Remove all declarations of fs.s3a.connection.request.timeout
in
- hadoop-common/src/main/resources/core-default.xml
- hadoop-aws/src/test/resources/core-site.xml

New test in TestAwsClientConfig to verify that the value
defined in fs.s3a.Constants class is used.

This is brittle to someone overriding it in their test setups,
but as this test is intended to verify that the option is not
explicitly set, there's no workaround.

Contributed by Steve Loughran
2024-02-07 12:07:54 +00:00
Jia Fan
4f0f5a546c
HADOOP-19049. Fix StatisticsDataReferenceCleaner classloader leak (#6488)
Contributed by Jia Fan
2024-02-03 14:48:52 +00:00
Xing Lin
d74e5160cd
HADOOP-19061 Capture exception from rpcRequestSender.start() in IPC.Connection.run() (#6519)
* HADOOP-19061 - Capture exception from rpcRequestSender.start() in IPC.Connection.run() and proper cleaning is followed if an exception is thrown.

---------

Co-authored-by: Xing Lin <xinglin@linkedin.com>
2024-02-02 16:22:16 -08:00
Viraj Jasani
7504b8505f
HADOOP-18980. S3A credential provider remapping: make extensible (#6406)
Contributed by Viraj Jasani
2024-02-02 17:02:48 +00:00
DieterDP
be13e94843
HADOOP-18987. Various fixes to FileSystem API docs (#6292)
Contributed by Dieter De Paepe
2024-02-02 11:49:31 +00:00
Tsz-Wo Nicholas Sze
da34ecdb83
HADOOP-19035. CrcUtil/CrcComposer should not throw IOException for non-IO. (#6443) 2024-01-25 10:35:32 -08:00
PJ Fanning
76691dfa14
HADOOP-18894: upgrade sshd-core due to CVEs (#6060) Contributed by PJ Fanning.
Reviewed-by: He Xiaoqiao <hexiaoqiao@apache.org>
Reviewed-by: Steve Loughran <stevel@cloudera.com>
Signed-off-by: Shilun Fan <slfan1989@apache.org>
2024-01-21 08:13:25 +08:00
slfan1989
8444f69511
Preparing for 3.5.0 development (#6411)
Co-authored-by: slfan1989 <slfan1989@apache.org>
2024-01-19 15:05:22 +08:00
hfutatzhanghb
ba6ada73ac
HDFS-17337. RPC RESPONSE time seems not exactly accurate when using FSEditLogAsync. (#6439). Contributed by farmmamba.
Reviewed-by: Tao Li <tomscut@apache.org>
Signed-off-by:  Shuyan Zhang <zhangshuyan@apache.org>
2024-01-18 11:10:05 +08:00
Hexiaoqiao
9634bd31e6
HADOOP-19031. Enhance access control for RunJar. (#6427). Contributed by He Xiaoqiao.
Signed-off-by: Shuyan Zhang <zhangshuyan@apache.org>
Signed-off-by: Shilun Fan <slfan1989@apache.org>
Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
2024-01-17 15:00:06 +08:00
Mukund Thakur
7b1570e2f1
HADOOP-19015. Increase fs.s3a.connection.maximum to 500 to minimize risk of Timeout waiting for connection from pool. (#6372)
HADOOP-19015.  Increase fs.s3a.connection.maximum to 500 to minimize the risk of Timeout waiting for connection from the pool

Contributed By: Mukund Thakur
2024-01-16 17:06:28 -06:00
slfan1989
6652922333
HADOOP-19040. mvn site commands fails due to MetricsSystem And MetricsSystemImpl changes. (#6450) Contributed by Shilun Fan.
Reviewed-by: Steve Loughran <stevel@cloudera.com>
Signed-off-by: Shilun Fan <slfan1989@apache.org>
2024-01-16 22:11:16 +08:00
Xing Lin
453e264eb4
HADOOP-18981. Move oncrpc and portmap packages to hadoop-common (#6280)
Move the org.apache.hadoop.{oncrpc, portmap} packages from the hadoop-nfs module
to the hadoop-common module.

This allows for use of the protocol beyond just NFS -including within HDFS itself.

Contributed by Xing Lin
2024-01-11 14:06:15 +00:00
LiuGuH
5f9932acc4
HDFS-17325. Fix the documentation of fs expunge command in FileSystemShell.md. (#6413) Contributed by liuguanghua.
Reviewed-by: Ayush Saxena <ayushsaxena@apache.org>
Signed-off-by: Shilun Fan <slfan1989@apache.org>
2024-01-05 18:42:55 +08:00