Steve Loughran e1842b2a74
HADOOP-18103. Add a high-performance vectored read API. (#4476)
This feature adds methods for ranged vectored read operations
in PositionedReadable.

All streams which implement that interface support the new API.

The default implementation reads each range in the vector
sequentially.

However, specific implementations may provide higher-performance
versions. This is done in two places:

* Local FileSystem/Checksum FileSystem
* The S3A client.

The S3A client first coalesces adjacent and "nearby" ranges
together, then fetches each coalesced range in a separate HTTP GET
request, with the requests executed in parallel. As such it delivers
significant speedups to applications reading separate blocks of data
from the same file, columnar data format libraries in particular.
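
The coalescing step described above can be illustrated with a minimal,
self-contained sketch. This is not the S3A implementation: the
RangeCoalescer class, its Range type and the coalesce() method are
hypothetical names invented for this example, and the minSeek and
maxSize parameters only loosely mirror the minSeekForVectorReads and
maxReadSizeForVectorReads settings named in HADOOP-18104. The assumed
rule is that two ranges are merged when the gap between them is at most
minSeek bytes and the merged range is no longer than maxSize bytes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical illustration only; not the Hadoop or S3A code. */
class RangeCoalescer {

  /** A byte range in a file, covering [offset, offset + length). */
  static final class Range {
    final long offset;
    final int length;

    Range(long offset, int length) {
      this.offset = offset;
      this.length = length;
    }

    long end() {
      return offset + length;
    }

    @Override
    public String toString() {
      return "[" + offset + ", " + end() + ")";
    }
  }

  /**
   * Sort the ranges by offset, then merge neighbours whose gap is at most
   * minSeek bytes, as long as the merged range stays within maxSize bytes.
   */
  static List<Range> coalesce(List<Range> ranges, long minSeek, int maxSize) {
    List<Range> sorted = new ArrayList<>(ranges);
    sorted.sort(Comparator.comparingLong((Range r) -> r.offset));

    List<Range> merged = new ArrayList<>();
    Range current = null;
    for (Range next : sorted) {
      if (current == null) {
        current = next;
        continue;
      }
      long gap = next.offset - current.end();
      long mergedLength = Math.max(current.end(), next.end()) - current.offset;
      if (gap <= minSeek && mergedLength <= maxSize) {
        // Close enough: extend the current range to cover the next one.
        // The cast is safe because mergedLength <= maxSize, an int.
        current = new Range(current.offset, (int) mergedLength);
      } else {
        // Too far apart or too large: emit the current range, start anew.
        merged.add(current);
        current = next;
      }
    }
    if (current != null) {
      merged.add(current);
    }
    return merged;
  }

  public static void main(String[] args) {
    List<Range> requested = Arrays.asList(
        new Range(0, 100), new Range(150, 100), new Range(10_000, 500));
    // The first two ranges are 50 bytes apart and are merged;
    // the third is too far away and stays separate.
    System.out.println(coalesce(requested, 100, 4096));
    // prints [[0, 250), [10000, 10500)]
  }
}
```

In this sketch each range returned by coalesce() would correspond to
one GET request; issuing those requests in parallel and slicing the
results back into the originally requested ranges is left out.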

This is the merge commit of the feature branch; the work is in:

HADOOP-11867. Add a high-performance vectored read API.
HADOOP-18104. S3A: Add configs to configure minSeekForVectorReads and maxReadSizeForVectorReads.
HADOOP-18107. Adding scale test for vectored reads for large file
HADOOP-18105. Implement buffer pooling with weak references.
HADOOP-18106. Handle memory fragmentation in S3A Vectored IO.

Contributed By: Owen O'Malley and Mukund Thakur
2022-06-22 18:19:23 +01:00
.github HADOOP-17799. Improve the GitHub pull request template (#3277) 2021-08-14 21:16:15 +09:00
dev-support HADOOP-11867. Add a high-performance vectored read API. (#3904) 2022-06-22 17:29:32 +01:00
hadoop-assemblies HDFS-15346. FedBalance tool implementation. Contributed by Jinglun. 2020-06-18 13:33:25 +08:00
hadoop-build-tools HADOOP-17968 Migrate checkstyle module illegalimport to maven enforcer banned-illegal-imports (#3584) 2021-10-28 15:57:15 +09:00
hadoop-client-modules HDFS-16453. Upgrade okhttp from 2.7.5 to 4.9.3 (#4229) 2022-05-21 02:53:14 +09:00
hadoop-cloud-storage-project HADOOP-18159. Bump cos_api-bundle to 5.6.69 to update public-suffix-list.txt (#4444) 2022-06-15 20:03:26 +01:00
hadoop-common-project HADOOP-18106: Handle memory fragmentation in S3A Vectored IO. (#4445) 2022-06-22 17:29:32 +01:00
hadoop-dist Preparing for 3.4.0 development 2020-03-29 23:24:25 +05:30
hadoop-hdfs-project HDFS-16616. remove use of org.apache.hadoop.util.Sets (#4400) 2022-06-22 10:17:36 +05:30
hadoop-mapreduce-project MAPREDUCE-7391. TestLocalDistributedCacheManager failing after HADOOP-16202 (#4472) 2022-06-22 12:52:41 +01:00
hadoop-maven-plugins HADOOP-18131. Upgrade maven enforcer plugin and relevant dependencies (#4000) 2022-03-08 17:27:04 +09:00
hadoop-minicluster HADOOP-18131. Upgrade maven enforcer plugin and relevant dependencies (#4000) 2022-03-08 17:27:04 +09:00
hadoop-project HADOOP-11867. Add a high-performance vectored read API. (#3904) 2022-06-22 17:29:32 +01:00
hadoop-project-dist HADOOP-18198. Release 3.3.3: release notes and jdiff files. 2022-05-17 19:00:54 +01:00
hadoop-tools HADOOP-18106: Handle memory fragmentation in S3A Vectored IO. (#4445) 2022-06-22 17:29:32 +01:00
hadoop-yarn-project YARN-11188. Only files belong to the first file controller are removed even if multiple log aggregation file controllers are configured. Contributed by Szilard Nemeth. 2022-06-22 14:40:20 +02:00
licenses HADOOP-17144. Update Hadoop's lz4 to v1.9.2. Contributed by Hemanth Boyina. 2020-10-18 18:37:46 +05:30
licenses-binary HADOOP-15993. Upgrade Kafka to 2.4.0 in hadoop-kafka module. (#1796) 2020-01-09 16:24:58 +09:00
.asf.yaml HADOOP-17234. Add .asf.yaml to allow Github to Jira integration. (#2253). Contributed by Ayush Saxena. 2020-08-28 17:22:46 +05:30
.gitattributes HADOOP-13598. Add eol=lf for unix format files in .gitattributes. Contributed by Yiqun Lin. 2016-09-14 11:14:31 +09:00
.gitignore YARN-10407. Add phantomjsdriver.log to gitignore. (#2244) 2020-09-01 10:44:55 +09:00
BUILDING.txt Update BUILDING.txt (#3811) 2021-12-22 13:08:14 +08:00
LICENSE-binary HDFS-16453. Upgrade okhttp from 2.7.5 to 4.9.3 (#4229) 2022-05-21 02:53:14 +09:00
LICENSE.txt HADOOP-18044. Hadoop - Upgrade to jQuery 3.6.0 (#3791) 2022-01-12 11:40:32 +08:00
NOTICE-binary HADOOP-18068. upgrade AWS SDK to 1.12.132 (#3864) 2022-01-18 10:31:28 +00:00
NOTICE.txt HADOOP-15958. Revisiting LICENSE and NOTICE files. 2019-08-27 13:47:12 +09:00
pom.xml HADOOP-11867. Add a high-performance vectored read API. (#3904) 2022-06-22 17:29:32 +01:00
README.txt HADOOP-15958. Revisiting LICENSE and NOTICE files. 2019-08-27 13:47:12 +09:00
start-build-env.sh HADOOP-18052. Support Apple Silicon in start-build-env.sh (#3817) 2021-12-23 18:13:18 +09:00

For the latest information about Hadoop, please visit our website at:

   http://hadoop.apache.org/

and our wiki, at:

   https://cwiki.apache.org/confluence/display/HADOOP/