hadoop/hadoop-tools
Steve Loughran f6c557d3b3
HADOOP-18410. S3AInputStream.unbuffer() does not release http connections (#4766)
HADOOP-16202 "Enhance openFile()" added asynchronous draining of the
remaining bytes of an S3 HTTP input stream for those operations
(unbuffer, seek) where it could avoid blocking the active
thread.

This patch fixes the asynchronous stream draining so that it
returns the stream's connection to the HTTP pool. Without this, whenever
unbuffer() or seek() was called on a stream and an asynchronous
drain was triggered, the connection was not returned; eventually
the pool would be empty and subsequent S3 requests would
fail with the message "Timeout waiting for connection from pool".

The root cause was that although drain() received the fields as
method parameters, the lambda expression passed in to submit()
referenced the fields directly:

operation = client.submit(
 () -> drain(uri, streamStatistics,
       false, reason, remaining,
       object, wrappedStream));  /* here */

Those fields were only read during the asynchronous execution, by which
point they would have been set to null (or even replaced by those of a
subsequent read).
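
The capture behaviour described above can be reproduced in isolation. The following is a minimal sketch, not the S3A code: the class LambdaCaptureDemo, its String field standing in for wrappedStream, and its method names are all hypothetical. It shows that a lambda naming an instance field captures `this` and reads the field only when it runs, whereas copying the field into a local variable first pins the value at submission time.

```java
import java.util.concurrent.Callable;

/** Hypothetical stand-in for a stream that schedules a drain of its wrapped stream. */
public class LambdaCaptureDemo {

  /** Stands in for the wrappedStream field; cleared once a drain is scheduled. */
  private String wrappedStream = "stream-1";

  /** Buggy: the lambda captures `this`, so the field is read only when call() runs. */
  public Callable<String> drainCapturingField() {
    return () -> wrappedStream;
  }

  /** Fixed: copy the field into a local first; the lambda captures that value. */
  public Callable<String> drainCapturingLocal() {
    final String stream = wrappedStream;
    return () -> stream;
  }

  /** Mimics the caller clearing its state straight after scheduling the drain. */
  public void release() {
    wrappedStream = null;
  }

  public static void main(String[] args) throws Exception {
    LambdaCaptureDemo demo = new LambdaCaptureDemo();
    Callable<String> buggy = demo.drainCapturingField();
    Callable<String> fixed = demo.drainCapturingLocal();
    demo.release();  // happens before either task executes
    System.out.println("field capture sees: " + buggy.call());  // null
    System.out.println("local capture sees: " + fixed.call());  // stream-1
  }
}
```

In this sketch the task that captured the field has nothing left to drain or close, which is the same shape of failure as a connection that never goes back to the pool.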

A new SDKStreamDrainer class performs the draining; this is a Callable
and can be submitted directly to the executor pool.

The class is used in both the classic and prefetching s3a input streams.
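
The general shape of a drainer that is itself a Callable can be sketched as follows. This is an illustration under stated assumptions, not the SDKStreamDrainer source: DrainTaskDemo, DrainTask and drainAsync are hypothetical names, and a plain java.io.InputStream stands in for the S3 object stream. The point is that the task receives its stream through the constructor and stores it in a final field, so later changes to the caller's fields cannot affect what gets drained and closed.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class DrainTaskDemo {

  /** Hypothetical drainer: holds its own stream reference from construction onward. */
  static final class DrainTask implements Callable<Long> {
    private final InputStream stream;

    DrainTask(InputStream stream) {
      this.stream = stream;
    }

    /** Reads to end of stream, closes it, and returns the number of bytes drained. */
    @Override
    public Long call() throws IOException {
      long drained = 0;
      byte[] buffer = new byte[4096];
      try (InputStream in = stream) {
        int n;
        while ((n = in.read(buffer)) != -1) {
          drained += n;
        }
      }
      return drained;
    }
  }

  /** Submits the task directly to an executor and waits for the byte count. */
  public static long drainAsync(InputStream stream) throws Exception {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      Future<Long> result = pool.submit(new DrainTask(stream));
      return result.get();
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    long drained = drainAsync(new ByteArrayInputStream(new byte[10000]));
    System.out.println("drained bytes: " + drained);  // 10000
  }
}
```

A real caller would typically not block on the Future as drainAsync does here; the wait is only there so the sketch can report its result.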

Also, calling unbuffer() switches the S3AInputStream from adaptive
to random IO mode; that is, it is considered a cue that future
IO will not be sequential, whole-file reads.

Contributed by Steve Loughran.
2022-08-31 16:52:12 +01:00
hadoop-aliyun HADOOP-18313: AliyunOSSBlockOutputStream should not mark the temporary file for deletion (#4502) 2022-07-06 14:31:07 +08:00
hadoop-archive-logs HADOOP-18305. Preparing for 3.3.4 release: branch-3.3 version => 3.3.9 (#4482) 2022-06-22 13:09:50 +01:00
hadoop-archives HADOOP-18305. Preparing for 3.3.4 release: branch-3.3 version => 3.3.9 (#4482) 2022-06-22 13:09:50 +01:00
hadoop-aws HADOOP-18410. S3AInputStream.unbuffer() does not release http connections (#4766) 2022-08-31 16:52:12 +01:00
hadoop-azure HADOOP-18242. ABFS Rename Failure when tracking metadata is in an incomplete state (#4517) 2022-07-02 01:49:14 +05:30
hadoop-azure-datalake HADOOP-18305. Preparing for 3.3.4 release: branch-3.3 version => 3.3.9 (#4482) 2022-06-22 13:09:50 +01:00
hadoop-benchmark HADOOP-18322. Yetus build failure in branch-3.3. 2022-06-30 15:05:38 -05:00
hadoop-datajoin HADOOP-18305. Preparing for 3.3.4 release: branch-3.3 version => 3.3.9 (#4482) 2022-06-22 13:09:50 +01:00
hadoop-distcp HADOOP-18305. Preparing for 3.3.4 release: branch-3.3 version => 3.3.9 (#4482) 2022-06-22 13:09:50 +01:00
hadoop-dynamometer HADOOP-18305. Preparing for 3.3.4 release: branch-3.3 version => 3.3.9 (#4482) 2022-06-22 13:09:50 +01:00
hadoop-extras HADOOP-18305. Preparing for 3.3.4 release: branch-3.3 version => 3.3.9 (#4482) 2022-06-22 13:09:50 +01:00
hadoop-fs2img HADOOP-18305. Preparing for 3.3.4 release: branch-3.3 version => 3.3.9 (#4482) 2022-06-22 13:09:50 +01:00
hadoop-gridmix HADOOP-18305. Preparing for 3.3.4 release: branch-3.3 version => 3.3.9 (#4482) 2022-06-22 13:09:50 +01:00
hadoop-kafka HADOOP-18305. Preparing for 3.3.4 release: branch-3.3 version => 3.3.9 (#4482) 2022-06-22 13:09:50 +01:00
hadoop-openstack HADOOP-18305. Preparing for 3.3.4 release: branch-3.3 version => 3.3.9 (#4482) 2022-06-22 13:09:50 +01:00
hadoop-pipes HADOOP-18305. Preparing for 3.3.4 release: branch-3.3 version => 3.3.9 (#4482) 2022-06-22 13:09:50 +01:00
hadoop-resourceestimator HADOOP-18305. Preparing for 3.3.4 release: branch-3.3 version => 3.3.9 (#4482) 2022-06-22 13:09:50 +01:00
hadoop-rumen HADOOP-18305. Preparing for 3.3.4 release: branch-3.3 version => 3.3.9 (#4482) 2022-06-22 13:09:50 +01:00
hadoop-sls HADOOP-18305. Preparing for 3.3.4 release: branch-3.3 version => 3.3.9 (#4482) 2022-06-22 13:09:50 +01:00
hadoop-streaming MAPREDUCE-7371. DistributedCache alternative APIs should not use DistributedCache APIs internally (#3855) 2022-06-22 13:13:05 +01:00
hadoop-tools-dist HADOOP-18305. Preparing for 3.3.4 release: branch-3.3 version => 3.3.9 (#4482) 2022-06-22 13:09:50 +01:00
pom.xml HADOOP-11867. Add a high-performance vectored read API. (#3904) 2022-06-23 17:09:16 -05:00