Clarifies the behaviour of the VectorIO methods with contract tests as well
as in the specification; a usage sketch follows the change list below.
* Add precondition range checks to all implementations
* Identify and fix a bug where direct buffer reads were broken
(HADOOP-19101; this surfaced in ABFS contract tests)
* Add logging to VectoredReadUtils.
* TestVectoredReadUtils verifies validation logic.
* FileRangeImpl toString() improvements
* CombinedFileRange tracks which bytes in the range are wanted;
toString() reports this.
HDFS
* Add test TestHDFSContractVectoredRead
ABFS
* Add test ITestAbfsFileSystemContractVectoredRead
S3A
* checks for vector IO being stopped in all iterative
vector operations, including draining
* maps read() returning -1 to failure
* passes in file length to validation
* Error reporting calls completeExceptionally() only on those ranges
which had not yet read data in.
* Improved logging.
readVectored()
* made synchronized. This applies only to the invocation itself;
the actual async retrievals are unsynchronized.
* closes the input stream on invocation
* switches to random IO, avoiding any long-lived connection.
* AbstractSTestS3AHugeFiles enhancements.
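For context, a minimal sketch of the readVectored() call these tests
exercise; the path, offsets and lengths are illustrative. Allocating direct
buffers (ByteBuffer::allocateDirect) exercises the fallback path fixed by
HADOOP-19101.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileRange;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class VectoredReadSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // Offsets/lengths are illustrative; they must lie within the file,
    // which is what the new precondition checks enforce.
    List<FileRange> ranges = Arrays.asList(
        FileRange.createFileRange(0, 1024),
        FileRange.createFileRange(4096, 1024));
    try (FSDataInputStream in = fs.open(new Path("/example/data.bin"))) {
      // Asynchronous scatter read; direct buffers hit the fallback path.
      in.readVectored(ranges, ByteBuffer::allocateDirect);
      for (FileRange range : ranges) {
        ByteBuffer buffer = range.getData().get(); // blocks until complete
        // process buffer...
      }
    }
  }
}
```

Each range completes independently, which is why error reporting must only
fail those ranges whose data never arrived.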
Contains: HADOOP-19101. Vectored Read into off-heap buffer broken in fallback implementation
Contributed by Steve Loughran
If the option fs.s3a.committer.magic.track.commits.in.memory.enabled
is set to true, then information about in-progress uploads is cached
in memory rather than saved to S3.
If the number of files being committed is low, this will save network IO
in both the generation of .pending and marker files, and in the scanning
of task attempt directory trees during task commit.
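As an illustrative sketch, enabling the feature looks like the following;
only the option name is taken from this change, the helper method is
hypothetical.

```java
import org.apache.hadoop.conf.Configuration;

public class MagicCommitterConfigSketch {
  // Enable in-memory tracking of magic committer commits.
  static Configuration withInMemoryTracking() {
    Configuration conf = new Configuration();
    conf.setBoolean(
        "fs.s3a.committer.magic.track.commits.in.memory.enabled", true);
    return conf;
  }
}
```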
Contributed by Syed Shameerur Rahman
This is a follow-up to #6406:
HADOOP-18980. S3A credential provider remapping: make extensible
It adds extra validation of key-value pairs in a configuration
option, with tests.
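A hypothetical sketch of that key-value validation, assuming a
comma-separated key=value format; the method name and error message are
illustrative, not the actual implementation.

```java
import java.util.HashMap;
import java.util.Map;

public class KeyValueValidationSketch {
  // Parse and validate a comma-separated list of key=value pairs,
  // rejecting malformed entries. Names here are illustrative only.
  static Map<String, String> parseMapping(String option) {
    Map<String, String> mapping = new HashMap<>();
    for (String pair : option.split(",")) {
      String[] kv = pair.trim().split("=");
      if (kv.length != 2 || kv[0].trim().isEmpty() || kv[1].trim().isEmpty()) {
        throw new IllegalArgumentException(
            "Invalid key=value pair in mapping: '" + pair + "'");
      }
      mapping.put(kv[0].trim(), kv[1].trim());
    }
    return mapping;
  }
}
```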
Contributed by Viraj Jasani
This reverts most of
HADOOP-18869: [ABFS] Fix behavior of a File System APIs on root path (#6003).
Calling getXAttr("/") or setXAttr("/") on an abfs container will fail with
`Operation failed: "The request URI is invalid.", HTTP 400 Bad Request`
This change is to ensure:
* Consistency across ADLS clients
* Consistency across authentication mechanisms.
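A minimal sketch of the call which, after this revert, fails again; the
attribute name is illustrative.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RootXAttrSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // On an ABFS container this fails with HTTP 400 Bad Request:
    // Operation failed: "The request URI is invalid."
    byte[] value = fs.getXAttr(new Path("/"), "user.example");
  }
}
```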
Contributed by Anuj Modi
Spotbugs is mistaken here, as it doesn't observe the read/write locks used
to manage exclusive access to the maps.
* cache the value between checks
* tag as @VisibleForTesting
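A hypothetical sketch of the pattern Spotbugs misreads: a map guarded by a
read/write lock rather than synchronized blocks, with the derived value
cached between checks. All names here are illustrative.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class GuardedMapSketch {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final Map<String, Long> counters = new HashMap<>();
  private volatile long cachedTotal; // cached between checks

  long total() {
    lock.readLock().lock();
    try {
      // Safe concurrent read of the map under the read lock.
      long sum = counters.values().stream()
          .mapToLong(Long::longValue).sum();
      cachedTotal = sum;
      return sum;
    } finally {
      lock.readLock().unlock();
    }
  }

  void increment(String key) {
    lock.writeLock().lock();
    try {
      // Exclusive access to the map under the write lock.
      counters.merge(key, 1L, Long::sum);
    } finally {
      lock.writeLock().unlock();
    }
  }
}
```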
Contributed by Steve Loughran
FIPS is only supported in North American AWS regions; the relevant tests in
ITestS3AEndpointRegion are skipped for buckets with other endpoints/regions.
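As an illustrative sketch, a bucket can be pointed at a FIPS endpoint with
the fs.s3a.endpoint.fips option (the option name is assumed here from the
S3A FIPS support work; the helper is hypothetical).

```java
import org.apache.hadoop.conf.Configuration;

public class FipsEndpointSketch {
  // Request FIPS endpoints for S3A; only valid for buckets in regions
  // which actually offer FIPS endpoints.
  static Configuration withFips() {
    Configuration conf = new Configuration();
    conf.setBoolean("fs.s3a.endpoint.fips", true);
    return conf;
  }
}
```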
Disables the new tests added in:
HADOOP-19027. S3A: S3AInputStream doesn't recover from HTTP/channel exceptions #6425
The underlying issue here is that the block prefetch code needs to
identify when there's a mismatch between the declared and actual length,
and must not store any of the incomplete buffer.
This should be addressed in HADOOP-18184.
Contributed by Steve Loughran