Applications can create a BulkDelete instance from a
BulkDeleteSource; the BulkDelete interface provides
the pageSize(): the maximum number of entries which can be
deleted, and a bulkDelete(Collection paths)
method which can take a collection up to pageSize() long.
This is optimized for object stores with bulk delete APIs;
the S3A connector will offer the page size of
fs.s3a.bulk.delete.page.size unless bulk delete has
been disabled.
Even with a page size of 1, the S3A implementation is
more efficient than delete(path)
as there are no safety checks for the path being a directory
or probes for the need to recreate directories.
The interface BulkDeleteSource is implemented by
all FileSystem implementations, with a page size
of 1 and mapped to delete(pathToDelete, false).
This means that callers do not need to have special
case handling for object stores versus classic filesystems.
To aid use through reflection APIs, the class
org.apache.hadoop.io.wrappedio.WrappedIO
has been created with "reflection friendly" methods.
Contributed by Mukund Thakur and Steve Loughran
Clarifies behaviour of VectorIO methods with contract tests as well as
specification.
* Add precondition range checks to all implementations
* Identify and fix bug where direct buffer reads was broken
(HADOOP-19101; this surfaced in ABFS contract tests)
* Logging in VectoredReadUtils.
* TestVectoredReadUtils verifies validation logic.
* FileRangeImpl toString() improvements
* CombinedFileRange tracks bytes in range which are wanted;
toString() output logs this.
HDFS
* Add test TestHDFSContractVectoredRead
ABFS
* Add test ITestAbfsFileSystemContractVectoredRead
S3A
* checks for vector IO being stopped in all iterative
vector operations, including draining
* maps read() returning -1 to failure
* passes in file length to validation
* Error reporting to only completeExceptionally() those ranges
which had not yet read data in.
* Improved logging.
readVectored()
* made synchronized. This is only for the invocation;
the actual async retrieves are unsynchronized.
* closes input stream on invocation
* switches to random IO, so avoids keeping any long-lived connection around.
+ AbstractSTestS3AHugeFiles enhancements.
+ ADDENDUM: test fix in ITestS3AContractVectoredRead
Contains: HADOOP-19101. Vectored Read into off-heap buffer broken in fallback
implementation
Contributed by Steve Loughran
Change-Id: Ia4ed71864c595f175c275aad83a2ff5741693432
Clarifies behaviour of VectorIO methods with contract tests as well as specification.
* Add precondition range checks to all implementations
* Identify and fix bug where direct buffer reads was broken
(HADOOP-19101; this surfaced in ABFS contract tests)
* Logging in VectoredReadUtils.
* TestVectoredReadUtils verifies validation logic.
* FileRangeImpl toString() improvements
* CombinedFileRange tracks bytes in range which are wanted;
toString() output logs this.
HDFS
* Add test TestHDFSContractVectoredRead
ABFS
* Add test ITestAbfsFileSystemContractVectoredRead
S3A
* checks for vector IO being stopped in all iterative
vector operations, including draining
* maps read() returning -1 to failure
* passes in file length to validation
* Error reporting to only completeExceptionally() those ranges
which had not yet read data in.
* Improved logging.
readVectored()
* made synchronized. This is only for the invocation;
the actual async retrieves are unsynchronized.
* closes input stream on invocation
* switches to random IO, so avoids keeping any long-lived connection around.
+ AbstractSTestS3AHugeFiles enhancements.
Contains: HADOOP-19101. Vectored Read into off-heap buffer broken in fallback implementation
Contributed by Steve Loughran
Co-authored-by: Wei-Chiu Chuang <weichiu@apache.org>
Includes HADOOP-18354. Upgrade reload4j to 1.22.2 due to XXE vulnerability (#4607).
Log4j 1.2.17 has been replaced by reloadj 1.22.2
SLF4J is at 1.7.36