hadoop/hadoop-tools
Steve Loughran 87fb977777
HADOOP-19098. Vector IO: Specify and validate ranges consistently. #6604
Clarifies behaviour of VectorIO methods with contract tests as well as
specification.

* Add precondition range checks to all implementations
* Identify and fix a bug where direct buffer reads were broken
  (HADOOP-19101; this surfaced in ABFS contract tests)
* Add logging to VectoredReadUtils.
* TestVectoredReadUtils verifies validation logic.
* FileRangeImpl toString() improvements
* CombinedFileRange tracks how many bytes in its range are actually wanted;
  its toString() output includes this.
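The range checks described above can be illustrated with a minimal, self-contained sketch. It assumes a hypothetical `Range` record (offset plus length) instead of Hadoop's own `FileRange` type, and a hypothetical `validateAndSort()` helper. The choice of exception for each failure is also an assumption, not necessarily what the Hadoop implementations throw. It rejects negative values and overlapping ranges, rejects ranges past end of file when the file length is known, and returns the ranges sorted by offset.

```java
import java.io.EOFException;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Hypothetical stand-in for a file range: a start offset and a length. */
record Range(long offset, int length) {}

final class RangeValidation {
  private RangeValidation() {}

  /**
   * Validate ranges and return a copy sorted by offset (sketch only).
   * Exception types chosen here are assumptions for illustration.
   */
  static List<Range> validateAndSort(List<Range> input, Optional<Long> fileLength)
      throws EOFException {
    if (input == null) {
      throw new NullPointerException("null range list");
    }
    List<Range> sorted = new ArrayList<>(input);
    for (Range r : sorted) {
      if (r == null) {
        throw new NullPointerException("null range");
      }
      if (r.offset() < 0 || r.length() < 0) {
        throw new IllegalArgumentException("negative offset or length in " + r);
      }
      // Only checkable when the caller supplies the file length.
      if (fileLength.isPresent() && r.offset() + r.length() > fileLength.get()) {
        throw new EOFException("range extends past end of file: " + r);
      }
    }
    sorted.sort(Comparator.comparingLong(Range::offset));
    // After sorting, any overlap must be between adjacent entries.
    for (int i = 1; i < sorted.size(); i++) {
      Range prev = sorted.get(i - 1);
      Range cur = sorted.get(i);
      if (prev.offset() + prev.length() > cur.offset()) {
        throw new IllegalArgumentException("overlapping ranges " + prev + " and " + cur);
      }
    }
    return sorted;
  }
}
```

Sorting first means the overlap check only needs to compare neighbours, keeping validation at O(n log n).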

HDFS
* Add test TestHDFSContractVectoredRead

ABFS
* Add test ITestAbfsFileSystemContractVectoredRead

S3A
* checks for vector IO being stopped in all iterative
  vector operations, including draining
* maps read() returning -1 to failure
* passes in file length to validation
* Error reporting only calls completeExceptionally() on those ranges
  which had not yet read in their data.
* Improved logging.
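Two of the S3A changes above, treating a `-1` from `read()` as a failure and failing only the ranges that have not yet completed, can be sketched in plain Java. The class and method names (`VectoredReadSketch`, `readFully`, `failIncomplete`) are hypothetical, and this is not the S3A code itself. It relies only on `InputStream.read(byte[], int, int)` and `CompletableFuture.completeExceptionally()`. The latter returns `false` and has no effect on a future that is already complete.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.List;
import java.util.concurrent.CompletableFuture;

final class VectoredReadSketch {
  private VectoredReadSketch() {}

  /** Fill buf completely from in; a -1 from read() is reported as an EOFException. */
  static void readFully(InputStream in, byte[] buf) throws IOException {
    int pos = 0;
    while (pos < buf.length) {
      int n = in.read(buf, pos, buf.length - pos);
      if (n < 0) {
        // Stream ended before the range was filled: fail rather than return short data.
        throw new EOFException("EOF after " + pos + " of " + buf.length + " bytes");
      }
      pos += n;
    }
  }

  /**
   * On a shared failure, fail only futures that have not yet completed.
   * Returns how many futures this call actually failed.
   */
  static int failIncomplete(List<CompletableFuture<byte[]>> futures, Throwable cause) {
    int failed = 0;
    for (CompletableFuture<byte[]> f : futures) {
      // completeExceptionally() is a no-op (returns false) on completed futures.
      if (f.completeExceptionally(cause)) {
        failed++;
      }
    }
    return failed;
  }
}
```

Ranges whose data has already been delivered keep their results, so a caller can still use them after a later range fails.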

readVectored()
* made synchronized. This is only for the invocation;
  the actual async retrieves are unsynchronized.
* closes any open input stream on invocation
* switches to random IO, so no long-lived connection is kept open.
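The `readVectored()` pattern above takes the stream lock only while closing any open stream, switching policy and scheduling the reads, while the reads themselves run unsynchronized. The sketch below shows that pattern. Everything in it is a hypothetical stand-in: `SketchStream`, its nested `Range` record, the `randomIO` flag and the in-memory `data` array, which replaces a remote object. Ranges are assumed to have been validated already. The allocator parameter lets callers pass `ByteBuffer::allocateDirect` to exercise off-heap buffers, the case HADOOP-19101 concerned.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.function.IntFunction;

/** Hypothetical stream illustrating a lock held only during readVectored() invocation. */
final class SketchStream {
  /** Hypothetical range type: offset and length, assumed already validated. */
  record Range(long offset, int length) {}

  private final byte[] data;          // stand-in for the remote object's contents
  private final ExecutorService pool; // runs the unsynchronized fetches
  private InputStream active;         // any open sequential stream
  private boolean randomIO;           // hypothetical input-policy flag

  SketchStream(byte[] data, ExecutorService pool, InputStream active) {
    this.data = data;
    this.pool = pool;
    this.active = active;
  }

  synchronized boolean isRandomIO() {
    return randomIO;
  }

  /** Synchronized only while closing, switching policy and scheduling; returns at once. */
  synchronized List<CompletableFuture<ByteBuffer>> readVectored(
      List<Range> ranges, IntFunction<ByteBuffer> allocate) {
    if (active != null) {
      try {
        active.close(); // drop any long-lived connection
      } catch (IOException ignored) {
        // best effort in this sketch
      }
      active = null;
    }
    randomIO = true;
    List<CompletableFuture<ByteBuffer>> results = new ArrayList<>();
    for (Range r : ranges) {
      results.add(CompletableFuture.supplyAsync(() -> fetch(r, allocate), pool));
    }
    return results;
  }

  /** Runs on pool threads without taking the stream's lock. */
  private ByteBuffer fetch(Range r, IntFunction<ByteBuffer> allocate) {
    ByteBuffer buf = allocate.apply(r.length()); // heap or direct buffer
    buf.put(data, (int) r.offset(), r.length());
    buf.flip(); // make the data readable from position 0
    return buf;
  }
}
```

Holding the lock only for scheduling keeps concurrent callers from racing on stream state without serializing the slow remote fetches.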

+ AbstractSTestS3AHugeFiles enhancements.
+ ADDENDUM: test fix in ITestS3AContractVectoredRead

Contains: HADOOP-19101. Vectored Read into off-heap buffer broken in fallback
implementation

Contributed by Steve Loughran

Change-Id: Ia4ed71864c595f175c275aad83a2ff5741693432
2024-04-03 13:17:52 +01:00
hadoop-aliyun Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
hadoop-archive-logs Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
hadoop-archives HADOOP-19041. Use StandardCharsets in more places (#6449) 2024-03-28 23:17:18 -04:00
hadoop-aws HADOOP-19098. Vector IO: Specify and validate ranges consistently. #6604 2024-04-03 13:17:52 +01:00
hadoop-azure HADOOP-19098. Vector IO: Specify and validate ranges consistently. #6604 2024-04-03 13:17:52 +01:00
hadoop-azure-datalake Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
hadoop-benchmark Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
hadoop-compat-bench HADOOP-19085. Compatibility Benchmark over HCFS Implementations 2024-03-17 16:48:29 +08:00
hadoop-datajoin Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
hadoop-distcp HDFS-17216. Distcp: When handle the small files, the bandwidth parameter will be invalid, fix this bug. (#6138) 2024-03-28 10:31:06 -04:00
hadoop-dynamometer Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
hadoop-extras HADOOP-19024. Use bouncycastle jdk18 1.77 (#6410). Contributed 2024-03-30 19:58:12 +05:30
hadoop-federation-balance Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
hadoop-fs2img HADOOP-19041. Use StandardCharsets in more places (#6449) 2024-03-28 23:17:18 -04:00
hadoop-gridmix HADOOP-19024. Use bouncycastle jdk18 1.77 (#6410). Contributed 2024-03-30 19:58:12 +05:30
hadoop-kafka Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
hadoop-openstack Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
hadoop-pipes Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
hadoop-resourceestimator Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
hadoop-rumen Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
hadoop-sls Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
hadoop-streaming HADOOP-19024. Use bouncycastle jdk18 1.77 (#6410). Contributed 2024-03-30 19:58:12 +05:30
hadoop-tools-dist Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
pom.xml HADOOP-19085. Compatibility Benchmark over HCFS Implementations 2024-03-17 16:48:29 +08:00