1. The class WrappedIO has been extended with more filesystem operations
- openFile()
- PathCapabilities
- StreamCapabilities
- ByteBufferPositionedReadable
All these static methods raise UncheckedIOExceptions rather than
checked ones.
2. The adjacent class org.apache.hadoop.io.wrappedio.WrappedStatistics
provides similar access to IOStatistics/IOStatisticsContext classes
and operations.
Allows callers to:
* Get a serializable IOStatisticsSnapshot from an IOStatisticsSource or
IOStatistics instance
* Save an IOStatisticsSnapshot to file
* Convert an IOStatisticsSnapshot to JSON
* Given an object which may be an IOStatisticsSource, return an object
whose toString() value is a dynamically generated, human readable summary.
This is for logging.
* Separate getters to the different sections of IOStatistics.
* Mean values are returned as a Map.Pair<Long, Long> of (samples, sum)
from which means may be calculated.
There are examples of the dynamic bindings to these classes in:
org.apache.hadoop.io.wrappedio.impl.DynamicWrappedIO
org.apache.hadoop.io.wrappedio.impl.DynamicWrappedStatistics
These use DynMethods and other classes in the package
org.apache.hadoop.util.dynamic which are based on the
Apache Parquet equivalents.
This makes re-implementing these in that library and others
which their own fork of the classes (example: Apache Iceberg)
3. The openFile() option "fs.option.openfile.read.policy" has
added specific file format policies for the core filetypes
* avro
* columnar
* csv
* hbase
* json
* orc
* parquet
S3A chooses the appropriate sequential/random policy as a
A policy `parquet, columnar, vector, random, adaptive` will use the parquet policy for
any filesystem aware of it, falling back to the first entry in the list which
the specific version of the filesystem recognizes
4. New Path capability fs.capability.virtual.block.locations
Indicates that locations are generated client side
and don't refer to real hosts.
Contributed by Steve Loughran
Applications can create a BulkDelete instance from a
BulkDeleteSource; the BulkDelete interface provides
the pageSize(): the maximum number of entries which can be
deleted, and a bulkDelete(Collection paths)
method which can take a collection up to pageSize() long.
This is optimized for object stores with bulk delete APIs;
the S3A connector will offer the page size of
fs.s3a.bulk.delete.page.size unless bulk delete has
been disabled.
Even with a page size of 1, the S3A implementation is
more efficient than delete(path)
as there are no safety checks for the path being a directory
or probes for the need to recreate directories.
The interface BulkDeleteSource is implemented by
all FileSystem implementations, with a page size
of 1 and mapped to delete(pathToDelete, false).
This means that callers do not need to have special
case handling for object stores versus classic filesystems.
To aid use through reflection APIs, the class
org.apache.hadoop.io.wrappedio.WrappedIO
has been created with "reflection friendly" methods.
Contributed by Mukund Thakur and Steve Loughran