Steve Loughran 55a576906d
HADOOP-19131. Assist reflection IO with WrappedOperations class (#6686)
1. The class WrappedIO has been extended with more filesystem operations

- openFile()
- PathCapabilities
- StreamCapabilities
- ByteBufferPositionedReadable

All these static methods raise UncheckedIOExceptions rather than
checked ones.
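
The conversion from checked to unchecked exceptions can be sketched as below. This is a minimal illustration using only the JDK; the names `UncheckedIOSketch`, `IOCall` and `uncheckIOExceptions` are hypothetical and are not taken from the WrappedIO class itself.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

class UncheckedIOSketch {

  /** A call which may raise a checked IOException (hypothetical interface). */
  @FunctionalInterface
  interface IOCall<T> {
    T apply() throws IOException;
  }

  /** Invoke the call, rethrowing any IOException as an UncheckedIOException. */
  static <T> T uncheckIOExceptions(IOCall<T> call) {
    try {
      return call.apply();
    } catch (IOException e) {
      // the original exception is preserved as the cause
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // success path: the result is passed through unchanged
    System.out.println(uncheckIOExceptions(() -> "opened"));

    // failure path: the caller needs no "throws IOException" clause
    try {
      uncheckIOExceptions(() -> {
        throw new IOException("no such file");
      });
    } catch (UncheckedIOException e) {
      System.out.println("caught: " + e.getCause().getMessage());
    }
  }
}
```

A caller invoking such methods through reflection then only has to handle runtime exceptions, and can still recover the original IOException from getCause().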

2. The adjacent class org.apache.hadoop.io.wrappedio.WrappedStatistics
provides similar access to IOStatistics/IOStatisticsContext classes
and operations.

Allows callers to:
* Get a serializable IOStatisticsSnapshot from an IOStatisticsSource or
  IOStatistics instance
* Save an IOStatisticsSnapshot to file
* Convert an IOStatisticsSnapshot to JSON
* Given an object which may be an IOStatisticsSource, return an object
  whose toString() value is a dynamically generated, human readable summary.
  This is for logging.
* Separate getters for the different sections of IOStatistics.
* Mean values are returned as a Map.Entry<Long, Long> of (samples, sum)
  from which means may be calculated.
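
As an illustration of working with a (samples, sum) pair, the sketch below computes a mean and merges two pairs. It assumes the pair is a java.util.Map.Entry<Long, Long>; the class and method names are hypothetical and not part of WrappedStatistics.

```java
import java.util.AbstractMap;
import java.util.Map;

class MeanFromPair {

  /** Mean of a (samples, sum) pair; 0.0 when there are no samples. */
  static double mean(Map.Entry<Long, Long> samplesAndSum) {
    long samples = samplesAndSum.getKey();
    long sum = samplesAndSum.getValue();
    return samples > 0 ? ((double) sum) / samples : 0.0;
  }

  /** Merge two pairs by adding the sample counts and the sums. */
  static Map.Entry<Long, Long> combine(Map.Entry<Long, Long> a,
      Map.Entry<Long, Long> b) {
    return new AbstractMap.SimpleImmutableEntry<>(
        a.getKey() + b.getKey(),
        a.getValue() + b.getValue());
  }

  public static void main(String[] args) {
    Map.Entry<Long, Long> first = new AbstractMap.SimpleImmutableEntry<>(4L, 10L);
    Map.Entry<Long, Long> second = new AbstractMap.SimpleImmutableEntry<>(6L, 50L);
    System.out.println(mean(first));                   // prints 2.5
    System.out.println(mean(combine(first, second)));  // prints 6.0
  }
}
```

Keeping the sample count and the sum separate means values from several sources can be merged exactly, which is not possible once only the computed means are available.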

There are examples of the dynamic bindings to these classes in:

org.apache.hadoop.io.wrappedio.impl.DynamicWrappedIO
org.apache.hadoop.io.wrappedio.impl.DynamicWrappedStatistics

These use DynMethods and other classes in the package
org.apache.hadoop.util.dynamic which are based on the
Apache Parquet equivalents.
This should make it straightforward to re-implement these in that library,
and in others which have their own fork of the classes (example: Apache Iceberg).

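The general idea of binding to a method which may or may not be on the classpath can be sketched with plain java.lang.reflect. This is not the DynMethods API; the class and method names below are hypothetical, and JDK classes stand in for the Hadoop ones so that the example runs anywhere.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Optional;

class DynamicBindingSketch {

  /** Look up a public static method; empty if the class or method is absent. */
  static Optional<Method> loadStaticMethod(String className, String methodName,
      Class<?>... argTypes) {
    try {
      Method m = Class.forName(className).getMethod(methodName, argTypes);
      return Modifier.isStatic(m.getModifiers())
          ? Optional.of(m)
          : Optional.empty();
    } catch (ClassNotFoundException | NoSuchMethodException e) {
      // older release on the classpath: the operation is unavailable
      return Optional.empty();
    }
  }

  /** Invoke a static method, unwrapping runtime exceptions it raises. */
  static Object invokeStatic(Method method, Object... args) {
    try {
      return method.invoke(null, args);
    } catch (IllegalAccessException e) {
      throw new IllegalStateException(e);
    } catch (InvocationTargetException e) {
      Throwable cause = e.getCause();
      if (cause instanceof RuntimeException) {
        throw (RuntimeException) cause;
      }
      throw new IllegalStateException(cause);
    }
  }

  public static void main(String[] args) {
    Optional<Method> parse =
        loadStaticMethod("java.lang.Integer", "parseInt", String.class);
    System.out.println(invokeStatic(parse.get(), "42"));  // prints 42

    // a class which does not exist: callers can fall back to older APIs
    Optional<Method> missing = loadStaticMethod("org.example.NoSuchClass", "op");
    System.out.println(missing.isPresent());              // prints false
  }
}
```

Because the target methods only raise unchecked exceptions, the unwrapping step in invokeStatic() can rethrow them directly rather than declaring checked exceptions of its own.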
3. The openFile() option "fs.option.openfile.read.policy" now has
specific file format policies for the core file types

* avro
* columnar
* csv
* hbase
* json
* orc
* parquet

S3A chooses the appropriate sequential/random policy for each of these formats.

A policy list `parquet, columnar, vector, random, adaptive` will use the parquet
policy on any filesystem aware of it; otherwise it falls back to the first entry
in the list which the specific version of the filesystem recognizes.
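
The fallback rule can be sketched as a scan for the first recognized entry. This is an illustrative helper only, not the filesystem code; the class name, method name and the sets of recognized policies are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

class ReadPolicySketch {

  /** Return the first entry of a comma-separated list which is recognized. */
  static Optional<String> choosePolicy(String policyList, Set<String> recognized) {
    return Arrays.stream(policyList.split(","))
        .map(String::trim)
        .filter(s -> !s.isEmpty())
        .map(s -> s.toLowerCase(Locale.ROOT))
        .filter(recognized::contains)
        .findFirst();
  }

  public static void main(String[] args) {
    String policies = "parquet, columnar, vector, random, adaptive";

    // hypothetical newer filesystem which knows the file format policies
    Set<String> newer = new HashSet<>(Arrays.asList(
        "parquet", "columnar", "vector", "random", "sequential", "adaptive"));
    // hypothetical older filesystem which only knows the generic ones
    Set<String> older = new HashSet<>(Arrays.asList(
        "random", "sequential", "adaptive"));

    System.out.println(choosePolicy(policies, newer).get());  // prints parquet
    System.out.println(choosePolicy(policies, older).get());  // prints random
  }
}
```

Ordering the list from most specific to most general lets one configuration string work across filesystem versions.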

4. New Path capability fs.capability.virtual.block.locations

Indicates that locations are generated client side
and don't refer to real hosts.

Contributed by Steve Loughran
2024-08-14 14:43:00 +01:00
.github HADOOP-19193. Create orphan commit for website deployment (#6864) 2024-06-05 15:25:48 +01:00
.yetus Add .yetus/excludes.txt (#4984) 2022-10-11 09:23:34 -07:00
dev-support HADOOP-19246. Update the yasm rpm download address (#6973) 2024-08-05 09:57:16 +08:00
hadoop-assemblies HADOOP-19107. Drop support for HBase v1 & upgrade HBase v2 (#6629). Contributed by Ayush Saxena 2024-04-22 21:55:58 +05:30
hadoop-build-tools Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
hadoop-client-modules HADOOP-19237. Upgrade to dnsjava 3.6.1 due to CVEs (#6961) 2024-08-01 20:07:36 +01:00
hadoop-cloud-storage-project HADOOP-19154. Upgrade bouncycastle to 1.78.1 due to CVEs (#6755) 2024-06-05 15:31:23 +01:00
hadoop-common-project HADOOP-19131. Assist reflection IO with WrappedOperations class (#6686) 2024-08-14 14:43:00 +01:00
hadoop-dist Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
hadoop-hdfs-project HADOOP-19131. Assist reflection IO with WrappedOperations class (#6686) 2024-08-14 14:43:00 +01:00
hadoop-mapreduce-project MAPREDUCE-7475. Fix non-idempotent unit tests (#6785) 2024-05-17 14:51:47 +01:00
hadoop-maven-plugins HADOOP-19041. Use StandardCharsets in more places (#6449) 2024-03-28 23:17:18 -04:00
hadoop-minicluster Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
hadoop-project HADOOP-19237. Upgrade to dnsjava 3.6.1 due to CVEs (#6961) 2024-08-01 20:07:36 +01:00
hadoop-project-dist HADOOP-19112. Hadoop 3.4.0 release wrap-up. (#6640) Contributed by Shilun Fan. 2024-03-19 20:08:03 +08:00
hadoop-tools HADOOP-19131. Assist reflection IO with WrappedOperations class (#6686) 2024-08-14 14:43:00 +01:00
hadoop-yarn-project YARN-11705. Turn off Node Manager working directories validation by default (#6948) 2024-07-18 16:55:40 +02:00
licenses HADOOP-17144. Update Hadoop's lz4 to v1.9.2. Contributed by Hemanth Boyina. 2020-10-18 18:37:46 +05:30
licenses-binary HADOOP-15993. Upgrade Kafka to 2.4.0 in hadoop-kafka module. (#1796) 2020-01-09 16:24:58 +09:00
.asf.yaml HADOOP-18630. Add gh-pages in asf.yaml to deploy the current trunk doc (#5393). Contributed by Simhadri Govindappa. 2023-02-14 18:13:29 +05:30
.gitattributes HADOOP-13598. Add eol=lf for unix format files in .gitattributes. Contributed by Yiqun Lin. 2016-09-14 11:14:31 +09:00
.gitignore HADOOP-18963. Fix typos in .gitignore (#6243) 2023-11-04 05:12:39 +05:30
BUILDING.txt HADOOP-19107. Drop support for HBase v1 & upgrade HBase v2 (#6629). Contributed by Ayush Saxena 2024-04-22 21:55:58 +05:30
LICENSE-binary HADOOP-19237. Upgrade to dnsjava 3.6.1 due to CVEs (#6961) 2024-08-01 20:07:36 +01:00
LICENSE.txt YARN-11356. Upgrade DataTables to 1.11.5 to fix CVEs. Contributed by Bence Kosztolnik. 2022-10-26 22:29:01 +02:00
NOTICE-binary HADOOP-19046. S3A: update AWS V2 SDK to 2.23.5; v1 to 1.12.599 (#6467) 2024-01-21 19:00:34 +00:00
NOTICE.txt HADOOP-15958. Revisiting LICENSE and NOTICE files. 2019-08-27 13:47:12 +09:00
pom.xml Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
README.txt HADOOP-15958. Revisiting LICENSE and NOTICE files. 2019-08-27 13:47:12 +09:00
start-build-env.sh Minor, fix cpu arch compare to use correct Dockerfile (#6852) 2024-06-13 00:37:28 +05:30

For the latest information about Hadoop, please visit our website at:

   http://hadoop.apache.org/

and our wiki, at:

   https://cwiki.apache.org/confluence/display/HADOOP/