hadoop/hadoop-tools
Steve Loughran c69e16b297
HADOOP-18410. S3AInputStream.unbuffer() does not release http connections (#4766)
HADOOP-16202 "Enhance openFile()" added asynchronous draining of the 
remaining bytes of an S3 HTTP input stream for those operations
(unbuffer, seek) where it could avoid blocking the active
thread.

This patch fixes the asynchronous stream draining to work and so
return the stream back to the http pool. Without this, whenever
unbuffer() or seek() was called on a stream and an asynchronous
drain triggered, the connection was not returned; eventually
the pool would be empty and subsequent S3 requests would
fail with the message "Timeout waiting for connection from pool"

The root cause was that even though the fields passed in to drain() were
converted to references through the methods, in the lambda expression
passed in to submit, they were direct references

operation = client.submit(
 () -> drain(uri, streamStatistics,
       false, reason, remaining,
       object, wrappedStream));  /* here */

Those fields were only read during the async execution, at which
point they would have been set to null (or even a subsequent read).

A new SDKStreamDrainer class peforms the draining; this is a Callable
and can be submitted directly to the executor pool.

The class is used in both the classic and prefetching s3a input streams.

Also, calling unbuffer() switches the S3AInputStream from adaptive
to random IO mode; that is, it is considered a cue that future
IO will not be sequential, whole-file reads.

Contributed by Steve Loughran.
2022-08-31 11:16:52 +01:00
..
hadoop-aliyun HADOOP-18313: AliyunOSSBlockOutputStream should not mark the temporary file for deletion (#4502) 2022-07-06 14:23:46 +08:00
hadoop-archive-logs HADOOP-18131. Upgrade maven enforcer plugin and relevant dependencies (#4000) 2022-03-08 17:27:04 +09:00
hadoop-archives HADOOP-18131. Upgrade maven enforcer plugin and relevant dependencies (#4000) 2022-03-08 17:27:04 +09:00
hadoop-aws HADOOP-18410. S3AInputStream.unbuffer() does not release http connections (#4766) 2022-08-31 11:16:52 +01:00
hadoop-azure HADOOP-18242. ABFS Rename Failure when tracking metadata is in an incomplete state (#4331) 2022-06-27 19:06:59 +01:00
hadoop-azure-datalake HDFS-16453. Upgrade okhttp from 2.7.5 to 4.9.3 (#4229) 2022-05-21 02:53:14 +09:00
hadoop-benchmark HADOOP-18106: Handle memory fragmentation in S3A Vectored IO. (#4445) 2022-06-22 17:29:32 +01:00
hadoop-datajoin HADOOP-18131. Upgrade maven enforcer plugin and relevant dependencies (#4000) 2022-03-08 17:27:04 +09:00
hadoop-distcp HADOOP-15789. DistCp does not clean staging folder if class extends DistCp. Contributed by Lawrence Andrews. (#4534) 2022-07-08 17:04:20 +05:30
hadoop-dynamometer HDFS-16522. Set Http and Ipc ports for Datanodes in MiniDFSCluster (#4108) 2022-04-06 18:17:02 +09:00
hadoop-extras HADOOP-18131. Upgrade maven enforcer plugin and relevant dependencies (#4000) 2022-03-08 17:27:04 +09:00
hadoop-federation-balance HDFS-16256. Minor fix in HDFS Fedbalance document (#4192) 2022-05-02 08:08:12 +08:00
hadoop-fs2img HADOOP-18131. Upgrade maven enforcer plugin and relevant dependencies (#4000) 2022-03-08 17:27:04 +09:00
hadoop-gridmix HADOOP-18131. Upgrade maven enforcer plugin and relevant dependencies (#4000) 2022-03-08 17:27:04 +09:00
hadoop-kafka HADOOP-17753. Keep restrict-imports-enforcer-rule for Guava Lists in top level hadoop-main pom (#3087) 2021-06-11 12:15:52 +09:00
hadoop-openstack HADOOP-18131. Upgrade maven enforcer plugin and relevant dependencies (#4000) 2022-03-08 17:27:04 +09:00
hadoop-pipes Preparing for 3.4.0 development 2020-03-29 23:24:25 +05:30
hadoop-resourceestimator HADOOP-15983. Use jersey-json that is built to use jackson2 (#3988) 2022-04-28 14:18:19 +09:00
hadoop-rumen HADOOP-18131. Upgrade maven enforcer plugin and relevant dependencies (#4000) 2022-03-08 17:27:04 +09:00
hadoop-sls YARN-11102. Fix spotbugs error in hadoop-sls module. Contributed by Szilard Nemeth, Andras Gyori. 2022-04-01 18:24:37 +02:00
hadoop-streaming HADOOP-16202. Enhanced openFile(): mapreduce and YARN changes. (#2584/2) 2022-04-24 17:33:05 +01:00
hadoop-tools-dist HADOOP-18131. Upgrade maven enforcer plugin and relevant dependencies (#4000) 2022-03-08 17:27:04 +09:00
pom.xml HADOOP-11867. Add a high-performance vectored read API. (#3904) 2022-06-22 17:29:32 +01:00