e1842b2a74
This feature adds methods for ranged vectored read operations in PositionedReadable. All streams which implement that interface support the new API.

The default implementation reads each range in the vector sequentially. However, specific implementations may provide higher performance versions. This is done in two places:

* Local FileSystem/Checksum FileSystem
* The S3A client.

The S3A client first coalesces adjacent and "nearby" ranges together, then fetches each coalesced range in a separate HTTP GET request, with the requests executed in parallel. As such it delivers significant speedups to applications reading separate blocks of data from the same file, columnar data format libraries in particular.

This is the merge commit of the feature branch; the work is in:

* HADOOP-11867. Add a high-performance vectored read API.
* HADOOP-18104. S3A: Add configs to configure minSeekForVectorReads and maxReadSizeForVectorReads.
* HADOOP-18107. Adding scale test for vectored reads for large file.
* HADOOP-18105. Implement buffer pooling with weak references.
* HADOOP-18106. Handle memory fragmentation in S3A Vectored IO.

Contributed By: Owen O'Malley and Mukund Thakur

Name | ||
---|---|---|
.github | ||
dev-support | ||
hadoop-assemblies | ||
hadoop-build-tools | ||
hadoop-client-modules | ||
hadoop-cloud-storage-project | ||
hadoop-common-project | ||
hadoop-dist | ||
hadoop-hdfs-project | ||
hadoop-mapreduce-project | ||
hadoop-maven-plugins | ||
hadoop-minicluster | ||
hadoop-project | ||
hadoop-project-dist | ||
hadoop-tools | ||
hadoop-yarn-project | ||
licenses | ||
licenses-binary | ||
.asf.yaml | ||
.gitattributes | ||
.gitignore | ||
BUILDING.txt | ||
LICENSE-binary | ||
LICENSE.txt | ||
NOTICE-binary | ||
NOTICE.txt | ||
pom.xml | ||
README.txt | ||
start-build-env.sh |
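
The commit message above says the S3A client coalesces adjacent and "nearby" ranges before issuing its GET requests. The following is a minimal, self-contained sketch of that idea only. It is not the S3A implementation: the `RangeCoalescer` class, its `Range` type, and the `minSeek` and `maxSize` parameters are hypothetical names chosen for illustration, loosely modelled on the minSeekForVectorReads and maxReadSizeForVectorReads settings named in HADOOP-18104. The parallel fetching of the merged ranges is not shown.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Illustrative sketch only; hypothetical names, not Hadoop's actual classes. */
public class RangeCoalescer {

    /** A byte range [offset, offset + length) within a file. */
    public static final class Range {
        public final long offset;
        public final int length;

        public Range(long offset, int length) {
            this.offset = offset;
            this.length = length;
        }

        long end() {
            return offset + length;
        }

        @Override
        public String toString() {
            return "Range[" + offset + ", +" + length + "]";
        }
    }

    /**
     * Merge ranges whose gap is smaller than minSeek bytes, as long as the
     * merged range does not exceed maxSize bytes. Both thresholds are
     * assumptions made for this sketch.
     */
    public static List<Range> coalesce(List<Range> ranges, int minSeek, int maxSize) {
        // Sort a copy by offset so neighbours in the list are neighbours in the file.
        List<Range> sorted = new ArrayList<>(ranges);
        sorted.sort(Comparator.comparingLong((Range r) -> r.offset));

        List<Range> merged = new ArrayList<>();
        Range current = null;
        for (Range r : sorted) {
            if (current == null) {
                current = r;
                continue;
            }
            long gap = r.offset - current.end();
            long mergedLength = Math.max(current.end(), r.end()) - current.offset;
            if (gap < minSeek && mergedLength <= maxSize) {
                // Close enough: reading the gap is assumed cheaper than a new request.
                current = new Range(current.offset, (int) mergedLength);
            } else {
                merged.add(current);
                current = r;
            }
        }
        if (current != null) {
            merged.add(current);
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Range> requested = new ArrayList<>();
        requested.add(new Range(0, 100));
        requested.add(new Range(150, 100));
        requested.add(new Range(10_000, 50));
        // Each range in the result would map to one read request.
        System.out.println(coalesce(requested, 4096, 1 << 20));
    }
}
```

In this sketch the first two requested ranges are 50 bytes apart and are merged into a single read, while the third is far enough away to stay separate. The trade-off is that a merged read also fetches the unwanted bytes in the gap, which is why a size cap is applied.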
For the latest information about Hadoop, please visit our website at http://hadoop.apache.org/ and our wiki at https://cwiki.apache.org/confluence/display/HADOOP/