hadoop/hadoop-common-project
Steve Loughran 55a576906d
HADOOP-19131. Assist reflection IO with WrappedOperations class (#6686)
1. The class WrappedIO has been extended with more filesystem operations

- openFile()
- PathCapabilities
- StreamCapabilities
- ByteBufferPositionedReadable

All these static methods raise UncheckedIOExceptions rather than
checked ones.
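
A minimal sketch of this exception-wrapping pattern, using only the JDK. The class `UncheckedIOSketch`, the interface `IOCall` and the method `uncheckIOExceptions` are hypothetical names chosen for illustration; this is not the WrappedIO source.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical helper for illustration; not the WrappedIO implementation. */
final class UncheckedIOSketch {

  /** An operation which may raise a checked IOException. */
  @FunctionalInterface
  interface IOCall<T> {
    T apply() throws IOException;
  }

  /** Invoke the call, rethrowing any IOException as an UncheckedIOException. */
  static <T> T uncheckIOExceptions(IOCall<T> call) {
    try {
      return call.apply();
    } catch (IOException e) {
      // The original exception stays available through getCause().
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // Success path: the value is returned unchanged.
    System.out.println(uncheckIOExceptions(() -> "opened"));

    // Failure path: no checked exception needs to be declared by the caller.
    try {
      uncheckIOExceptions(() -> {
        throw new IOException("no such file");
      });
    } catch (UncheckedIOException e) {
      System.out.println("caught: " + e.getCause().getMessage());
    }
  }
}
```

Static methods with no checked exceptions in their signatures are simpler to invoke through reflection, since the caller only has to unwrap one runtime exception type.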

2. The adjacent class org.apache.hadoop.io.wrappedio.WrappedStatistics
provides similar access to IOStatistics/IOStatisticsContext classes
and operations.

Allows callers to:
* Get a serializable IOStatisticsSnapshot from an IOStatisticsSource or
  IOStatistics instance
* Save an IOStatisticsSnapshot to file
* Convert an IOStatisticsSnapshot to JSON
* Given an object which may be an IOStatisticsSource, return an object
  whose toString() value is a dynamically generated, human readable summary.
  This is for logging.
* Use separate getters for the different sections of IOStatistics.
* Retrieve mean values as a Map.Entry<Long, Long> of (samples, sum)
  from which means may be calculated.
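
As a sketch of how a caller might derive a mean from such a (samples, sum) pair, assuming the pair is exposed as a `java.util.Map.Entry<Long, Long>`. The class `MeanFromPairSketch` and its `mean` method are hypothetical, and returning 0 for an empty sample set is an assumption of this sketch.

```java
import java.util.AbstractMap;
import java.util.Map;

/** Hypothetical helper for illustration; not part of WrappedStatistics. */
final class MeanFromPairSketch {

  /**
   * Compute the arithmetic mean from a (samples, sum) pair.
   * Assumption: an entry with no samples yields a mean of 0.
   */
  static double mean(Map.Entry<Long, Long> samplesAndSum) {
    long samples = samplesAndSum.getKey();
    long sum = samplesAndSum.getValue();
    return samples > 0 ? ((double) sum) / samples : 0.0;
  }

  public static void main(String[] args) {
    // 4 samples with a total of 10.
    Map.Entry<Long, Long> pair = new AbstractMap.SimpleImmutableEntry<>(4L, 10L);
    System.out.println(mean(pair)); // prints 2.5
  }
}
```

Returning the raw pair rather than a precomputed mean lets callers aggregate several pairs (add the samples, add the sums) before dividing.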

There are examples of the dynamic bindings to these classes in:

org.apache.hadoop.io.wrappedio.impl.DynamicWrappedIO
org.apache.hadoop.io.wrappedio.impl.DynamicWrappedStatistics

These use DynMethods and other classes in the package
org.apache.hadoop.util.dynamic which are based on the
Apache Parquet equivalents.
This makes it straightforward to re-implement these bindings in that library
and in others which have their own fork of the classes (example: Apache Iceberg).
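
The general idea of a dynamic binding can be sketched with plain `java.lang.reflect`. This is not the DynMethods API: the class `DynamicBindingSketch` and its methods are hypothetical, and `java.lang.Math.max` stands in for a wrapped static method which may or may not be present on the classpath.

```java
import java.lang.reflect.Method;
import java.util.Optional;

/** Hypothetical reflection helper; the real bindings use DynMethods. */
final class DynamicBindingSketch {

  /** Look up a public static method, returning empty if it cannot be found. */
  static Optional<Method> findStaticMethod(
      String className, String methodName, Class<?>... argTypes) {
    try {
      Class<?> cls = Class.forName(className);
      return Optional.of(cls.getMethod(methodName, argTypes));
    } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
      // Class or method is missing in this runtime: the caller must fall back.
      return Optional.empty();
    }
  }

  /** Invoke a static method, converting reflection failures to unchecked ones. */
  static Object invokeStatic(Method method, Object... args) {
    try {
      return method.invoke(null, args);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Failed to invoke " + method, e);
    }
  }

  public static void main(String[] args) {
    // Present on every JVM, so the binding resolves.
    Optional<Method> max =
        findStaticMethod("java.lang.Math", "max", int.class, int.class);
    max.ifPresent(m -> System.out.println(invokeStatic(m, 3, 7))); // prints 7

    // A deliberately nonexistent class: the lookup is empty, no exception.
    System.out.println(findStaticMethod("no.such.Clazz", "missing").isPresent());
  }
}
```

A library compiled against an older Hadoop release can probe for the newer methods this way and use them only when the lookup succeeds.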

3. The openFile() option "fs.option.openfile.read.policy" now supports
specific file format policies for the core file types:

* avro
* columnar
* csv
* hbase
* json
* orc
* parquet

S3A chooses the appropriate sequential/random policy for each format.

A policy list such as `parquet, columnar, vector, random, adaptive` will use the
parquet policy on any filesystem aware of it, falling back to the first entry in
the list which the specific version of the filesystem recognizes.
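
The fallback rule described above can be sketched as a first-match scan over the comma-separated list. The class `ReadPolicySketch`, the method `selectPolicy`, the sets of recognized policies and the default used when nothing matches are all assumptions for illustration; this is not the filesystem's own parsing code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch of first-match policy selection. */
final class ReadPolicySketch {

  /**
   * Return the first policy in the list which the filesystem recognizes.
   * Assumption: if none is recognized, the supplied default is used.
   */
  static String selectPolicy(
      String policyList, Set<String> recognized, String defaultPolicy) {
    for (String entry : policyList.split(",")) {
      String name = entry.trim().toLowerCase(Locale.ROOT);
      if (!name.isEmpty() && recognized.contains(name)) {
        return name;
      }
    }
    return defaultPolicy;
  }

  public static void main(String[] args) {
    String policies = "parquet, columnar, vector, random, adaptive";

    // Hypothetical policy sets for two versions of a filesystem client.
    Set<String> newer = new HashSet<>(
        Arrays.asList("parquet", "columnar", "vector", "random", "adaptive"));
    Set<String> older = new HashSet<>(
        Arrays.asList("sequential", "random", "adaptive"));

    System.out.println(selectPolicy(policies, newer, "default")); // prints parquet
    System.out.println(selectPolicy(policies, older, "default")); // prints random
  }
}
```

Listing the most specific policy first and the most generic last lets one configuration value work across filesystem versions.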

4. New Path capability fs.capability.virtual.block.locations

Indicates that locations are generated client side
and don't refer to real hosts.

Contributed by Steve Loughran
2024-08-14 14:43:00 +01:00
..
hadoop-annotations Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
hadoop-auth HADOOP-19160. hadoop-auth should not depend on kerb-simplekdc (#6788) 2024-05-03 12:57:26 +02:00
hadoop-auth-examples HADOOP-18088. Replace log4j 1.x with reload4j. (#4052) 2024-02-13 16:33:51 +00:00
hadoop-common HADOOP-19131. Assist reflection IO with WrappedOperations class (#6686) 2024-08-14 14:43:00 +01:00
hadoop-kms HDFS-13603: Do not propagate ExecutionException while initializing EDEK queues for keys. (#6860) 2024-06-03 09:10:06 -07:00
hadoop-minikdc HADOOP-18088. Replace log4j 1.x with reload4j. (#4052) 2024-02-13 16:33:51 +00:00
hadoop-nfs HADOOP-18088. Replace log4j 1.x with reload4j. (#4052) 2024-02-13 16:33:51 +00:00
hadoop-registry HADOOP-19237. Upgrade to dnsjava 3.6.1 due to CVEs (#6961) 2024-08-01 20:07:36 +01:00
pom.xml Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00