9145 Commits

Author SHA1 Message Date
PJ Fanning
fa9bb0d1ac
HADOOP-19231. Add JacksonUtil to manage Jackson classes (#6953)
New class org.apache.hadoop.util.JacksonUtil centralizes construction of
Jackson ObjectMappers and JsonFactories.

Contributed by PJ Fanning
2024-08-15 16:44:54 +01:00
Steve Loughran
55a576906d
HADOOP-19131. Assist reflection IO with WrappedOperations class (#6686)
1. The class WrappedIO has been extended with more filesystem operations:

- openFile()
- PathCapabilities
- StreamCapabilities
- ByteBufferPositionedReadable

All these static methods raise UncheckedIOExceptions rather than
checked ones.
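The unchecked-exception convention can be pictured with the sketch below. It is not the WrappedIO source: the class `UncheckedIOSketch`, the interface `IOSupplier` and the method `callUnchecked` are hypothetical names. Only `java.io.UncheckedIOException` is a real JDK class.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

/**
 * Hypothetical helper showing how a checked IOException can be rethrown
 * as an UncheckedIOException; this is not the actual WrappedIO code.
 */
final class UncheckedIOSketch {

  /** An I/O operation which may raise a checked IOException. */
  @FunctionalInterface
  interface IOSupplier<T> {
    T get() throws IOException;
  }

  private UncheckedIOSketch() {
  }

  /** Run the operation, converting any IOException into an UncheckedIOException. */
  static <T> T callUnchecked(IOSupplier<T> operation) {
    try {
      return operation.get();
    } catch (IOException e) {
      // Keep the original exception as the cause so callers can unwrap it.
      throw new UncheckedIOException(e);
    }
  }
}
```

Callers reached through reflection can then catch a single runtime exception type and call getCause() to recover the original IOException.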

2. The adjacent class org.apache.hadoop.io.wrappedio.WrappedStatistics
provides similar access to IOStatistics/IOStatisticsContext classes
and operations.

Allows callers to:
* Get a serializable IOStatisticsSnapshot from an IOStatisticsSource or
  IOStatistics instance
* Save an IOStatisticsSnapshot to file
* Convert an IOStatisticsSnapshot to JSON
* Given an object which may be an IOStatisticsSource, return an object
  whose toString() value is a dynamically generated, human readable summary.
  This is for logging.
* Use separate getters for the different sections of IOStatistics.
* Retrieve mean values as a Map.Entry<Long, Long> of (samples, sum),
  from which means may be calculated.
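As an illustration of the (samples, sum) representation, a caller could derive the mean as below. `MeanFromPair` is a hypothetical helper, and returning 0.0 for an empty statistic is an assumption of this sketch, not documented behaviour.

```java
import java.util.Map;

/** Hypothetical helper: derive a mean from a (samples, sum) pair. */
final class MeanFromPair {
  private MeanFromPair() {
  }

  /**
   * Compute sum / samples; an entry with zero samples yields 0.0
   * rather than NaN (an assumption of this sketch).
   */
  static double mean(Map.Entry<Long, Long> samplesAndSum) {
    long samples = samplesAndSum.getKey();
    long sum = samplesAndSum.getValue();
    return samples == 0 ? 0.0 : (double) sum / samples;
  }
}
```

For example, an entry of (4, 10) gives a mean of 2.5.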

There are examples of the dynamic bindings to these classes in:

org.apache.hadoop.io.wrappedio.impl.DynamicWrappedIO
org.apache.hadoop.io.wrappedio.impl.DynamicWrappedStatistics

These use DynMethods and other classes in the package
org.apache.hadoop.util.dynamic which are based on the
Apache Parquet equivalents.
This makes it easier to re-implement these in that library and in others
which have their own fork of the classes (example: Apache Iceberg).
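The dynamic-binding idea can be sketched with plain java.lang.reflect, which is roughly what such helper classes wrap. This is not the DynMethods API: `DynamicBindingSketch`, `bindStatic` and `invokeStatic` are hypothetical names, and the test binds to `java.lang.Math.abs(long)` only because that method is always present.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Optional;

/** Hypothetical sketch of binding to an optional static method at runtime. */
final class DynamicBindingSketch {
  private DynamicBindingSketch() {
  }

  /** Look up a public static method; empty if the class or method is absent. */
  static Optional<Method> bindStatic(String className, String methodName,
      Class<?>... parameterTypes) {
    try {
      Class<?> clazz = Class.forName(className);
      Method m = clazz.getMethod(methodName, parameterTypes);
      return Modifier.isStatic(m.getModifiers()) ? Optional.of(m) : Optional.empty();
    } catch (ClassNotFoundException | NoSuchMethodException e) {
      // An older library version on the classpath simply lacks the method.
      return Optional.empty();
    }
  }

  /** Invoke a bound static method, rethrowing the target's own unchecked exception. */
  static Object invokeStatic(Method method, Object... args) {
    try {
      return method.invoke(null, args);
    } catch (IllegalAccessException e) {
      throw new IllegalStateException(e);
    } catch (InvocationTargetException e) {
      Throwable cause = e.getCause();
      if (cause instanceof RuntimeException) {
        throw (RuntimeException) cause;
      }
      if (cause instanceof Error) {
        throw (Error) cause;
      }
      throw new IllegalStateException(cause);
    }
  }
}
```

Returning an empty Optional lets a library degrade gracefully when run against an older Hadoop release that lacks the wrapped method.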

3. The openFile() option "fs.option.openfile.read.policy" now accepts
specific file format policies for the core file types:

* avro
* columnar
* csv
* hbase
* json
* orc
* parquet

S3A chooses the appropriate sequential/random policy as a 

A policy of `parquet, columnar, vector, random, adaptive` will use the parquet policy on
any filesystem aware of it, falling back to the first entry in the list that
the specific version of the filesystem recognizes.
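The fallback rule for a policy list can be sketched as follows, assuming the filesystem exposes the policies it recognises as a set of lower-case names. `ReadPolicyResolver` is hypothetical and this is not the S3A implementation.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

/** Hypothetical resolver: the first policy in the list that is supported wins. */
final class ReadPolicyResolver {
  private ReadPolicyResolver() {
  }

  static Optional<String> resolve(String policyList, Set<String> supported) {
    return Arrays.stream(policyList.split(","))
        .map(s -> s.trim().toLowerCase(Locale.ROOT)) // tolerate spaces and case
        .filter(s -> !s.isEmpty())
        .filter(supported::contains)
        .findFirst();
  }
}
```

With this rule, a filesystem that only knows `random`, `sequential` and `adaptive` resolves the list above to `random`, while one that knows `parquet` picks `parquet`.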

4. New Path capability fs.capability.virtual.block.locations

Indicates that locations are generated client side
and don't refer to real hosts.

Contributed by Steve Loughran
2024-08-14 14:43:00 +01:00
Tsz-Wo Nicholas Sze
b189ef8197
HDFS-17575. SaslDataTransferClient should use SaslParticipant to create messages. (#6954)
2024-08-05 10:42:12 -07:00
Aswin M Prabhu
e2a0dca43b
HDFS-16690. Automatically format unformatted JNs with JournalNodeSyncer (#6925). Contributed by Aswin M Prabhu.
Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>
2024-07-23 20:55:57 +08:00
Viraj Jasani
e000cbf277
HADOOP-19218. Addendum. Update TestFSNamesystemLockReport to exclude hostname resolution from regex. (#6951). Contributed by Viraj Jasani.
Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>
2024-07-23 20:47:36 +08:00
Tsz-Wo Nicholas Sze
a5eb5e9611
HDFS-17576. Support user defined auth Callback. (#6945)
2024-07-20 15:21:06 +08:00
gavin.wang
5730656660
HDFS-17574. Make NNThroughputBenchmark support human-friendly units about blocksize. (#6931). Contributed by wangzhongwei.
Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>
2024-07-16 20:57:50 +08:00
zhengchenyu
8913d379fd
HDFS-17566. Got wrong sorted block order when StorageType is considered. (#6934). Contributed by Chenyu Zheng.
Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>
2024-07-11 17:41:24 +08:00
gavin.wang
783a852029
HDFS-17555. Fix NumberFormatException of NNThroughputBenchmark when configured dfs.blocksize. (#6894). Contributed by wangzhongwei
Reviewed-by: He Xiaoqiao <hexiaoqiao@apache.org>
Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
2024-07-09 13:52:15 +05:30
huhaiyang
8ca4627a0d
HDFS-17557. Fix bug for TestRedundancyMonitor#testChooseTargetWhenAllDataNodesStop (#6897). Contributed by Haiyang Hu.
Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
2024-07-06 13:18:12 +05:30
huhaiyang
5a8f70a72e
HDFS-17559. Fix the uuid as null in NameNodeMXBean (#6906). Contributed by Haiyang Hu.
Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
2024-07-06 13:16:25 +05:30
huhaiyang
ae76e9475c
HDFS-17564. EC: Fix the issue of inaccurate metrics when decommission mark busy DN. (#6911). Contributed by Haiyang Hu.
Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>
2024-07-05 20:45:01 +08:00
Yu Zhang
b4ddb2d3bb
HDFS-13603: do not propagate ExecutionException and add maxRetries limit to NameNode edek cache warmup (#6774)
2024-06-24 09:34:52 -07:00
Hexiaoqiao
6545b7eeef
HDFS-17098. DatanodeManager does not handle null storage type properly. (#6840). Contributed by ConfX.
Signed-off-by: Shilun Fan <slfan1989@apache.org>
2024-06-19 20:58:57 +08:00
Tsz-Wo Nicholas Sze
1e6411c9ec
HDFS-17528. FsImageValidation: set txid when saving a new image (#6828)
2024-06-19 11:38:17 +08:00
Fateh Singh
90024d8cb1
HDFS-17439. Support -nonSuperUser for NNThroughputBenchmark: useful for testing auth frameworks such as Ranger (#6677)
2024-06-18 13:52:24 +01:00
Heagan A
2fbbfe3cc9
HDFS-17546. Implementing HostsFileReader timeout (#6873)
2024-06-14 20:47:21 -07:00
hfutatzhanghb
4b1b16a846
HDFS-17551. Fix unit test failure caused by HDFS-17464. (#6883). Contributed by farmmamba.
2024-06-12 22:21:15 +05:30
Felix Nguyen
776c0a3ab9
HDFS-17539. Make TestFileChecksum fields static (#6853)
2024-06-11 15:26:21 +08:00
hfutatzhanghb
fb156e8f05
HDFS-17464. Improve some logs output in class FsDatasetImpl (#6724)
2024-05-21 09:46:21 +08:00
Mukund Thakur
47be1ab3b6
HADOOP-18679. Add API for bulk/paged delete of files (#6726)
Applications can create a BulkDelete instance from a
BulkDeleteSource; the BulkDelete interface provides
pageSize(), the maximum number of entries which can be
deleted in one call, and a bulkDelete(Collection paths)
method which accepts a collection of up to pageSize() entries.
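A caller with more paths than pageSize() would split them into pages. The sketch below uses a hypothetical, simplified `PagedDeleter` interface with `String` paths in place of the real BulkDelete and Path types, so it runs without Hadoop on the classpath; `BulkDeleteClient.deleteAll` is likewise a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Hypothetical, simplified stand-in for the BulkDelete contract described above. */
interface PagedDeleter {
  /** Maximum number of paths accepted by one bulkDelete() call. */
  int pageSize();

  /** Delete up to pageSize() paths; return the paths which could not be deleted. */
  List<String> bulkDelete(Collection<String> paths);
}

final class BulkDeleteClient {
  private BulkDeleteClient() {
  }

  /** Split paths into pages no larger than pageSize() and delete each page in turn. */
  static List<String> deleteAll(PagedDeleter deleter, List<String> paths) {
    int pageSize = deleter.pageSize();
    if (pageSize < 1) {
      throw new IllegalArgumentException("page size must be positive: " + pageSize);
    }
    List<String> failures = new ArrayList<>();
    for (int start = 0; start < paths.size(); start += pageSize) {
      int end = Math.min(start + pageSize, paths.size());
      // Copy the page so the deleter never sees a view onto the caller's list.
      failures.addAll(deleter.bulkDelete(new ArrayList<>(paths.subList(start, end))));
    }
    return failures;
  }
}
```

With a page size of 2, five paths are submitted as pages of 2, 2 and 1.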

This is optimized for object stores with bulk delete APIs;
the S3A connector will offer the page size of
fs.s3a.bulk.delete.page.size unless bulk delete has
been disabled.

Even with a page size of 1, the S3A implementation is
more efficient than delete(path)
as there are no safety checks for the path being a directory
or probes for the need to recreate directories.

The interface BulkDeleteSource is implemented by
all FileSystem implementations, with a page size
of 1 and mapped to delete(pathToDelete, false).
This means that callers do not need to have special
case handling for object stores versus classic filesystems.
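The page-size-1 fallback might look like the sketch below. `SinglePathBulkDelete` and its `SingleDelete` callback are hypothetical stand-ins for a FileSystem and delete(path, false); recording a thrown IOException as a per-path failure, and treating a false return as success, are assumptions of this sketch rather than the documented contract.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Hypothetical page-size-1 bulk delete built on a single-path delete call. */
final class SinglePathBulkDelete {

  /** Hypothetical stand-in for FileSystem.delete(Path, boolean). */
  @FunctionalInterface
  interface SingleDelete {
    boolean delete(String path, boolean recursive) throws IOException;
  }

  private final SingleDelete fs;

  SinglePathBulkDelete(SingleDelete fs) {
    this.fs = fs;
  }

  /** A classic filesystem deletes one path per call. */
  int pageSize() {
    return 1;
  }

  /** Delete at most pageSize() paths non-recursively; return those that failed. */
  List<String> bulkDelete(Collection<String> paths) {
    if (paths.size() > pageSize()) {
      throw new IllegalArgumentException(
          "at most " + pageSize() + " path(s) per call, got " + paths.size());
    }
    List<String> failures = new ArrayList<>();
    for (String path : paths) {
      try {
        // recursive=false: a non-empty directory is never deleted here.
        fs.delete(path, false);
      } catch (IOException e) {
        failures.add(path);
      }
    }
    return failures;
  }
}
```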

To aid use through reflection APIs, the class
org.apache.hadoop.io.wrappedio.WrappedIO
has been created with "reflection friendly" methods.

Contributed by Mukund Thakur and Steve Loughran
2024-05-20 17:05:25 +01:00
ZanderXu
cab0f4c9ec
HDFS-17520. [BugFix] TestDFSAdmin.testAllDatanodesReconfig and TestDFSAdmin.testDecommissionDataNodesReconfig failed (#6812) Contributed by Zengqiang Xu.
Reviewed-by: Vinayakumar B <vinayakumarb@apache.org>
Signed-off-by: Shilun Fan <slfan1989@apache.org>
2024-05-15 07:55:24 +08:00
ConfX
8d9d58dfc8
HDFS-17099. Fix Null Pointer Exception when stop namesystem in HDFS.(#6034). Contributed by ConfX.
Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>
2024-05-14 11:14:55 +08:00
zhihui wang
12e0ca6b24
HDFS-17522. JournalNode web interfaces lack configs for X-FRAME-OPTIONS protection (#6814). Contributed by wangzhihui.
Signed-off-by: Vinayakumar B <vinayakumarb@apache.org>
Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>
2024-05-13 22:10:08 +08:00
Felix Nguyen
fb0519253d
HDFS-17488. DN can fail IBRs with NPE when a volume is removed (#6759)
2024-05-11 15:37:43 +08:00
Zilong Zhu
700b3e4800
HDFS-17503. Unreleased volume references because of OOM. (#6782)
2024-05-10 10:34:40 +08:00
kulkabhay
edf985e269
HDFS-17500: Add missing operation name while authorizing some operations (#6776). Contributed by kulkabhay.
Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>
2024-05-06 12:44:30 +08:00
fuchaohong
0c9e0b4398
HDFS-17456. Fix the incorrect dfsused statistics of datanode when appending a file. (#6713). Contributed by fuchaohong.
Reviewed-by: ZanderXu <zanderxu@apache.org>
Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>
2024-04-30 12:22:53 +08:00
fuchaohong
ddb805951e
HDFS-17471. Correct the percentage of sample range. (#6742). Contributed by fuchaohong.
Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>
2024-04-30 12:18:47 +08:00
Tsz-Wo Nicholas Sze
78987a71a6
HADOOP-19151. Support configurable SASL mechanism. (#6740)
2024-04-29 10:02:23 -07:00
zhtttylz
daafc8a0b8
HDFS-17367. Add PercentUsed for Different StorageTypes in JMX (#6735) Contributed by Hualong Zhang.
Signed-off-by: Shilun Fan <slfan1989@apache.org>
2024-04-27 20:36:11 +08:00
dannytbecker
027b4c3259
Remove empty queues from the queueByBlockId map (#6772)
2024-04-26 14:25:15 -07:00
cxzl25
23286b0632
HDFS-17469. Audit log for reportBadBlocks RPC (#6731)
2024-04-24 09:39:57 +08:00
Madhan Neethiraj
e8b2c28dec
HDFS-17478. FSPermissionChecker optimization by initializing AccessControlEnforcer in constructor (#6749)
2024-04-18 15:43:31 -07:00
dannytbecker
0c35cf0982
HDFS-17477. IncrementalBlockReport race condition additional edge cases (#6748)
2024-04-18 09:04:08 -07:00
Lei313
f49a4df797
HDFS-17383:Datanode current block token should come from active NameNode in HA mode (#6562). Contributed by lei w.
Reviewed-by: Shuyan Zhang <zhangshuyan@apache.org>
Signed-off-by: Shuyan Zhang <zhangshuyan@apache.org>
2024-04-15 18:35:53 +08:00
huhaiyang
81b05977f2
HDFS-17455. Fix Client throw IndexOutOfBoundsException in DFSInputStream#fetchBlockAt (#6710). Contributed by Haiyang Hu.
Reviewed-by: ZanderXu <zanderxu@apache.org>
Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>
2024-04-11 18:04:57 +08:00
dannytbecker
05964ad07a
HDFS-17453. IncrementalBlockReport can have race condition with Edit Log Tailer (#6708)
2024-04-10 09:30:24 -07:00
ConfX
73e6931ed0
HDFS-17449. Fix ill-formed decommission host name and port pair triggers IndexOutOfBound error (#6691). Contributed by ConfX
Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
2024-04-06 13:38:09 +05:30
Steve Loughran
87fb977777
HADOOP-19098. Vector IO: Specify and validate ranges consistently. #6604
Clarifies behaviour of VectorIO methods with contract tests as well as
specification.

* Add precondition range checks to all implementations
* Identify and fix bug where direct buffer reads were broken
  (HADOOP-19101; this surfaced in ABFS contract tests)
* Logging in VectoredReadUtils.
* TestVectoredReadUtils verifies validation logic.
* FileRangeImpl toString() improvements
* CombinedFileRange tracks bytes in range which are wanted;
   toString() output logs this.
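The precondition checks can be illustrated with the sketch below: ranges are sorted by offset, then rejected if an offset or length is negative, a range runs past the end of the file, or two ranges overlap. `RangeValidation` and `Range` are hypothetical names, not the FileRange or VectoredReadUtils API; the rules Hadoop actually enforces are those in the specification this commit updates.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of vectored-read range precondition checks. */
final class RangeValidation {

  /** A requested range: offset in the file and number of bytes wanted. */
  static final class Range {
    final long offset;
    final int length;

    Range(long offset, int length) {
      this.offset = offset;
      this.length = length;
    }

    long end() {
      return offset + length;
    }
  }

  private RangeValidation() {
  }

  /** Return the ranges sorted by offset, or throw if any precondition fails. */
  static List<Range> validate(List<Range> ranges, long fileLength) {
    List<Range> sorted = new ArrayList<>(ranges);
    sorted.sort(Comparator.comparingLong(r -> r.offset));
    Range previous = null;
    for (Range r : sorted) {
      if (r.offset < 0 || r.length < 0) {
        throw new IllegalArgumentException("negative offset or length at offset " + r.offset);
      }
      if (r.end() > fileLength) {
        throw new IllegalArgumentException(
            "range ends past EOF: " + r.end() + " > " + fileLength);
      }
      if (previous != null && r.offset < previous.end()) {
        throw new IllegalArgumentException("overlapping ranges at offset " + r.offset);
      }
      previous = r;
    }
    return sorted;
  }
}
```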

HDFS
* Add test TestHDFSContractVectoredRead

ABFS
* Add test ITestAbfsFileSystemContractVectoredRead

S3A
* checks for vector IO being stopped in all iterative
  vector operations, including draining
* maps read() returning -1 to failure
* passes in file length to validation
* Error reporting only calls completeExceptionally() on those ranges
  which had not yet read data in.
* Improved logging.
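Mapping a premature -1 from read() to a failure, rather than silently returning a short buffer, can be sketched with java.io alone. `StrictRangeReader.readFullyOrFail` is a hypothetical name, and the choice of `EOFException` is an assumption of this sketch.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical reader which treats end-of-stream inside a range as an error. */
final class StrictRangeReader {
  private StrictRangeReader() {
  }

  /** Fill buffer[offset, offset + length) from the stream or throw. */
  static void readFullyOrFail(InputStream in, byte[] buffer, int offset, int length)
      throws IOException {
    int filled = 0;
    while (filled < length) {
      int n = in.read(buffer, offset + filled, length - filled);
      if (n < 0) {
        // read() returned -1 before the range was complete: report it as a failure.
        throw new EOFException("EOF after " + filled + " of " + length + " bytes");
      }
      filled += n;
    }
  }
}
```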

readVectored()
* made synchronized. This is only for the invocation;
  the actual async retrieves are unsynchronized.
* closes input stream on invocation
* switches to random IO, so avoids keeping any long-lived connection around.
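The locking pattern, where the lock covers only the submission of work and not the retrievals, can be sketched as below. `AsyncRangeReader` and its `fetch` function are hypothetical; a real reader would issue ranged GETs against the store rather than call a caller-supplied function.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.LongFunction;

/** Hypothetical reader: synchronized submission, unsynchronized async fetches. */
final class AsyncRangeReader {
  private final Executor executor;
  /** Hypothetical fetch of the data at an offset; stands in for a store request. */
  private final LongFunction<byte[]> fetch;

  AsyncRangeReader(Executor executor, LongFunction<byte[]> fetch) {
    this.executor = executor;
    this.fetch = fetch;
  }

  /**
   * The monitor is held only while the futures are created; the fetches
   * themselves run later on the executor, outside this lock.
   */
  synchronized List<CompletableFuture<byte[]>> readVectored(long[] offsets) {
    List<CompletableFuture<byte[]>> results = new ArrayList<>(offsets.length);
    for (long offset : offsets) {
      results.add(CompletableFuture.supplyAsync(() -> fetch.apply(offset), executor));
    }
    return results;
  }
}
```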

+ AbstractSTestS3AHugeFiles enhancements.
+ ADDENDUM: test fix in ITestS3AContractVectoredRead

Contains: HADOOP-19101. Vectored Read into off-heap buffer broken in fallback
implementation

Contributed by Steve Loughran

Change-Id: Ia4ed71864c595f175c275aad83a2ff5741693432
2024-04-03 13:17:52 +01:00
Steve Loughran
b4f9d8e6fa
Revert "HADOOP-19098. Vector IO: Specify and validate ranges consistently."
This reverts commit ba7faf90c80476c79e6bfc7c02749dfc031337eb.
2024-04-03 13:15:05 +01:00
Steve Loughran
ba7faf90c8
HADOOP-19098. Vector IO: Specify and validate ranges consistently.
Clarifies behaviour of VectorIO methods with contract tests as well as specification.

* Add precondition range checks to all implementations
* Identify and fix bug where direct buffer reads were broken
  (HADOOP-19101; this surfaced in ABFS contract tests)
* Logging in VectoredReadUtils.
* TestVectoredReadUtils verifies validation logic.
* FileRangeImpl toString() improvements
* CombinedFileRange tracks bytes in range which are wanted;
   toString() output logs this.

HDFS
* Add test TestHDFSContractVectoredRead

ABFS
* Add test ITestAbfsFileSystemContractVectoredRead

S3A
* checks for vector IO being stopped in all iterative
  vector operations, including draining
* maps read() returning -1 to failure
* passes in file length to validation
* Error reporting only calls completeExceptionally() on those ranges
  which had not yet read data in.
* Improved logging.

readVectored()
* made synchronized. This is only for the invocation;
  the actual async retrieves are unsynchronized.
* closes input stream on invocation
* switches to random IO, so avoids keeping any long-lived connection around.

+ AbstractSTestS3AHugeFiles enhancements.

Contains: HADOOP-19101. Vectored Read into off-heap buffer broken in fallback implementation

Contributed by Steve Loughran
2024-04-02 20:16:38 +01:00
Lei313
36c22400b2
HDFS-17408:Reduce the number of quota calculations in FSDirRenameOp (#6653). Contributed by lei w.
Reviewed-by: He Xiaoqiao <hexiaoqiao@apache.org>
Reviewed-by: Dinesh Chitlangia <dineshc@apache.org>
Signed-off-by: Shuyan Zhang <zhangshuyan@apache.org>
2024-04-02 10:40:28 +08:00
PJ Fanning
f7d1ec2d9e
HADOOP-19077. Remove use of javax.ws.rs.core.HttpHeaders (#6554). Contributed by PJ Fanning
Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
2024-04-01 12:43:39 +05:30
huhaiyang
4807815e1c
HDFS-17448. Enhance the stability of the unit test TestDiskBalancerCommand (#6690). Contributed by Haiyang Hu
Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
2024-03-30 22:51:05 +05:30
PJ Fanning
06db6289cb
HADOOP-19024. Use bouncycastle jdk18 1.77 (#6410). Contributed
2024-03-30 19:58:12 +05:30
PJ Fanning
97c5a6efba
HADOOP-19041. Use StandardCharsets in more places (#6449)
2024-03-28 23:17:18 -04:00
ConfX
f3f6340746
HDFS-17443. add null check for fileSys and cluster before shutting down (#6683)
2024-03-28 11:09:50 -04:00
Zilong Zhu
37f9ccdc86
HDFS-17368. HA: Standby should exit safemode when resources are available. (#6518). Contributed by Zilong Zhu.
Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>
2024-03-26 17:35:55 +08:00
huhaiyang
8cd4704e0a
HDFS-17430. RecoveringBlock will skip no live replicas when get block recovery command. (#6635)
2024-03-22 09:43:12 -04:00