HADOOP-16202 "Enhance openFile()" added asynchronous draining of the
remaining bytes of an S3 HTTP input stream for those operations
(unbuffer, seek) where it could avoid blocking the active
thread.
This patch fixes the asynchronous stream draining to work and so
return the stream back to the http pool. Without this, whenever
unbuffer() or seek() was called on a stream and an asynchronous
drain triggered, the connection was not returned; eventually
the pool would be empty and subsequent S3 requests would
fail with the message "Timeout waiting for connection from pool"
The root cause was that even though the fields passed in to drain() were
converted to references through the methods, in the lambda expression
passed in to submit, they were direct references
operation = client.submit(
() -> drain(uri, streamStatistics,
false, reason, remaining,
object, wrappedStream)); /* here */
Those fields were only read during the async execution, at which
point they would have been set to null (or even a subsequent read).
A new SDKStreamDrainer class peforms the draining; this is a Callable
and can be submitted directly to the executor pool.
The class is used in both the classic and prefetching s3a input streams.
Also, calling unbuffer() switches the S3AInputStream from adaptive
to random IO mode; that is, it is considered a cue that future
IO will not be sequential, whole-file reads.
Contributed by Steve Loughran.
* This PR adds an option
use.platformToolsetVersion that
makes the build systems to use
this platform toolset version.
* This also makes sure that
win-vs-upgrade.cmd does not get
executed when the
use.platformToolsetVersion
option is specified.
This patch prepares the hadoop-aws module for a future
migration to using the v2 AWS SDK (HADOOP-18073)
That upgrade will be incompatible; this patch prepares
for it:
-marks some credential providers and other
classes and methods as @deprecated.
-updates site documentation
-reduces the visibility of the s3 client;
other than for testing, it is kept private to
the S3AFileSystem class.
-logs some warnings when deprecated APIs are used.
The warning messages are printed only once
per JVM's life. To disable them, set the
log level of org.apache.hadoop.fs.s3a.SDKV2Upgrade
to ERROR
Contributed by Ahmar Suhail
Declares its compatibility with Spark's dynamic
output partitioning by having the stream capability
"mapreduce.job.committer.dynamic.partitioning"
Requires a Spark release with SPARK-40034, which
does the probing before deciding whether to
accept/rejecting instantiation with
dynamic partition overwrite set
This feature can be declared as supported by
any other PathOutputCommitter implementations
whose algorithm and destination filesystem
are compatible.
None of the S3A committers are compatible.
The classic FileOutputCommitter is, but it
does not declare itself as such out of our fear
of changing that code. The Spark-side code
will automatically infer compatibility if
the created committer is of that class or
a subclass.
Contributed by Steve Loughran.
This addresses an issue where the plugin's default classpath for executing tests fails to include org.junit.platform.launcher.core.LauncherFactory.
Contributed by: Steve Vaughan Jr
Exclude bound local addresses, including the use of a wildcard address in the bound host configurations.
* Allow sync attempts with unresolved addresses
* Update the comments.
* Remove unused import
Signed-off-by: stack <stack@apache.org>
Avoid reconnecting to the old address after detecting that the address has been updated.
* Fix Checkstyle line length violation
* Keep ConnectionId as Immutable for map key
The ConnectionId is used as a key in the connections map, and updating the remoteId caused problems with the cleanup of connections when the removeMethod was used.
Instead of updating the address within the remoteId, use the removeMethod to cleanup references to the current identifier and then replace it with a new identifier using the updated address.
* Use final to protect immutable ConnectionId
Mark non-test fields as private and final, and add a missing accessor.
* Use a stable hashCode to allow safe IP addr changes
* Add test that updated address is used
Once the address has been updated, it should be used in future calls. Check to ensure that a second request succeeds and that it uses the existing updated address instead of having to re-resolve.
Signed-off-by: Nick Dimiduk <ndimiduk@apache.org>
Signed-off-by: sokui
Signed-off-by: XanderZu
Signed-off-by: stack <stack@apache.org>
This is the the preview release of the HADOOP-18028 S3A performance input stream.
It is still stabilizing, but ready to test.
Contains
HADOOP-18028. High performance S3A input stream (#4109)
Contributed by Bhalchandra Pandit.
HADOOP-18180. Replace use of twitter util-core with java futures (#4115)
Contributed by PJ Fanning.
HADOOP-18177. Document prefetching architecture. (#4205)
Contributed by Ahmar Suhail
HADOOP-18175. fix test failures with prefetching s3a input stream (#4212)
Contributed by Monthon Klongklaew
HADOOP-18231. S3A prefetching: fix failing tests & drain stream async. (#4386)
* adds in new test for prefetching input stream
* creates streamStats before opening stream
* updates numBlocks calculation method
* fixes ITestS3AOpenCost.testOpenFileLongerLength
* drains stream async
* fixes failing unit test
Contributed by Ahmar Suhail
HADOOP-18254. Disable S3A prefetching by default. (#4469)
Contributed by Ahmar Suhail
HADOOP-18190. Collect IOStatistics during S3A prefetching (#4458)
This adds iOStatisticsConnection to the S3PrefetchingInputStream class, with
new statistic names in StreamStatistics.
This stream is not (yet) IOStatisticsContext aware.
Contributed by Ahmar Suhail
HADOOP-18379 rebase feature/HADOOP-18028-s3a-prefetch to trunk
HADOOP-18187. Convert s3a prefetching to use JavaDoc for fields and enums.
HADOOP-18318. Update class names to be clear they belong to S3A prefetching
Contributed by Steve Loughran