hadoop/hadoop-tools
Steve Loughran 36198b5edf
HADOOP-19027. S3A: S3AInputStream doesn't recover from HTTP/channel exceptions (#6425)
Differentiate "EOF out of range/end of GET" from
"EOF channel problems" through two different subclasses of
EOFException, so that input streams always retry on http channel
errors; out of range GET requests are not retried.
Currently an EOFException is always treated as a fail-fast failure in read().

This allows all existing external code catching EOFException to handle
both, while S3AInputStream can cleanly differentiate range errors (map to -1)
from channel errors (retry).

- HttpChannelEOFException is a subclass of EOFException, so all code
  which catches EOFException is still happy.
  retry policy: connectivityFailure
- RangeNotSatisfiableEOFException is the subclass of EOFException
  raised on 416 GET range errors.
  retry policy: fail
- Method ErrorTranslation.maybeExtractChannelException() creates an
  HttpChannelEOFException from a shaded/unshaded NoHttpResponseException,
  using a string match on the class name to avoid classpath problems.
- The same translation is applied to SdkClientExceptions with the OpenSSL
  error code WFOPENSSL0035. We believe this is the OpenSSL equivalent.
- ErrorTranslation.maybeExtractIOException() to perform this translation as
  appropriate.
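
As an illustration of the bullets above, here is a minimal sketch of the
exception hierarchy and of string-match extraction. It is not the code in
ErrorTranslation: the class ChannelErrorSketch, the method
extractChannelException(), the simplified constructors and the stand-in
NoHttpResponseException are all assumptions made for this example.

```java
import java.io.EOFException;
import java.io.IOException;

/** Hypothetical sketch; names and signatures are assumptions, not the Hadoop API. */
final class ChannelErrorSketch {

  /** EOF caused by an http channel failure; callers may retry. */
  static class HttpChannelEOFException extends EOFException {
    HttpChannelEOFException(String message, Throwable cause) {
      super(message);
      initCause(cause);
    }
  }

  /** EOF caused by a 416 range error; callers should not retry. */
  static class RangeNotSatisfiableEOFException extends EOFException {
    RangeNotSatisfiableEOFException(String message, Throwable cause) {
      super(message);
      initCause(cause);
    }
  }

  /** Stand-in for the real (shaded or unshaded) http client exception. */
  static class NoHttpResponseException extends IOException {
    NoHttpResponseException(String message) {
      super(message);
    }
  }

  /** Matched against the class name only, so no http client class is loaded. */
  static final String NO_HTTP_RESPONSE = "NoHttpResponseException";

  /** OpenSSL error code named in the commit message. */
  static final String OPENSSL_ERROR_CODE = "WFOPENSSL0035";

  /**
   * Walk the cause chain looking for a channel failure.
   * @return a channel EOF exception, or null if none was found.
   */
  static HttpChannelEOFException extractChannelException(Throwable thrown) {
    Throwable t = thrown;
    // The depth limit guards against cyclic cause chains.
    for (int depth = 0; t != null && depth < 32; depth++, t = t.getCause()) {
      String className = t.getClass().getName();
      String message = t.getMessage();
      boolean noResponse = className.endsWith(NO_HTTP_RESPONSE);
      boolean openSsl = message != null && message.contains(OPENSSL_ERROR_CODE);
      if (noResponse || openSsl) {
        return new HttpChannelEOFException(String.valueOf(message), thrown);
      }
    }
    return null;
  }

  public static void main(String[] args) {
    Throwable failure = new RuntimeException("Unable to execute HTTP request",
        new NoHttpResponseException("The target server failed to respond"));
    EOFException translated = extractChannelException(failure);
    System.out.println(translated != null
        ? "channel error, retry: " + translated.getMessage()
        : "not a channel error");
  }
}
```

Matching on the class name string rather than using instanceof means the
sketch compiles and runs whether or not the http client library (shaded or
unshaded) is on the classpath.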

S3AInputStream.reopen() retries on EOF, except on
RangeNotSatisfiableEOFException, which is converted to a -1 response
to the caller, as was done historically.

S3AInputStream handles these as follows:
 read(): on HttpChannelEOFException, close the stream by aborting it,
  then retry.
 lazySeek(): map RangeNotSatisfiableEOFException to -1, but do not map
  any other EOFException class raised.

This means that
* out of range reads map to -1
* channel problems in reopen are retried
* channel problems in read() abort the failed http connection so it
  isn't recycled
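
The three outcomes above can be sketched as a small retry loop. This is an
illustration only, not the S3AInputStream implementation: the class
ReadRecoverySketch, the Attempt interface, the abortConnection callback and
the maxAttempts parameter are hypothetical, and the real stream delegates
retry decisions to its retry policy.

```java
import java.io.EOFException;
import java.io.IOException;

/** Hypothetical sketch; not the S3AInputStream code. */
final class ReadRecoverySketch {

  /** Simplified stand-in for the channel failure subclass. */
  static class HttpChannelEOFException extends EOFException {
    HttpChannelEOFException(String message) {
      super(message);
    }
  }

  /** Simplified stand-in for the 416 range error subclass. */
  static class RangeNotSatisfiableEOFException extends EOFException {
    RangeNotSatisfiableEOFException(String message) {
      super(message);
    }
  }

  /** One attempt to read a single byte; hypothetical interface. */
  interface Attempt {
    int read() throws IOException;
  }

  /**
   * Read one byte, mapping range errors to -1 and retrying channel errors.
   * Any other IOException, including a plain EOFException, propagates.
   */
  static int readOneByte(Attempt attempt, Runnable abortConnection, int maxAttempts)
      throws IOException {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    HttpChannelEOFException last = null;
    for (int i = 0; i < maxAttempts; i++) {
      try {
        return attempt.read();
      } catch (RangeNotSatisfiableEOFException e) {
        // Out of range: report end of stream, never retry.
        return -1;
      } catch (HttpChannelEOFException e) {
        // Channel failure: abort so the broken connection is not recycled,
        // then loop round for another attempt.
        abortConnection.run();
        last = e;
      }
    }
    throw last;
  }
}
```

Because both exception types extend EOFException, a caller which only
catches EOFException still sees either failure if it escapes the loop.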

Tests for this use (and abuse) mocking.

Further tests actually raise 416 exceptions and verify that
readFully(), char read() and vector reads all behave correctly.

There is no attempt to recover within a readFully(); there's
a boolean constant switch to turn this on, but if anyone enables
it, a test will spin forever, as the inner PositionedReadable.read(position, buffer, len)
downgrades all EOF exceptions to -1.
A new method would need to be added which controls whether to downgrade/rethrow
exceptions.
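
To make the problem concrete, here is a minimal sketch of a positioned read
which takes a flag controlling whether EOF exceptions are downgraded or
rethrown. It is purely hypothetical: no such method exists in this change,
and the class PositionedReadSketch, the RangeReader interface and the
rethrowEOF parameter are names invented for the example.

```java
import java.io.EOFException;
import java.io.IOException;

/** Hypothetical sketch of a downgrade/rethrow switch; not part of this change. */
final class PositionedReadSketch {

  /** Stand-in for the inner positioned read; hypothetical interface. */
  interface RangeReader {
    int read(long position, byte[] buffer, int offset, int length) throws IOException;
  }

  /**
   * Positioned read with an explicit choice of EOF handling.
   * @param rethrowEOF if true, EOF exceptions reach the caller, which can
   *        then tell channel errors from range errors; if false, every
   *        EOFException is downgraded to -1.
   */
  static int read(RangeReader reader, long position, byte[] buffer,
      int offset, int length, boolean rethrowEOF) throws IOException {
    try {
      return reader.read(position, buffer, offset, length);
    } catch (EOFException e) {
      if (rethrowEOF) {
        throw e;
      }
      // Downgrade: the caller cannot distinguish this from a real end of file.
      return -1;
    }
  }
}
```

With rethrowEOF set to false the caller only ever sees -1, which is the
information loss described above; a recovering readFully() would need the
rethrowing behaviour to decide whether a retry is worthwhile.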

What does that mean? Possibly reduced resilience to non-retried failures
on the inner stream, even though more channel exceptions are now retried.

Contributed by Steve Loughran
2024-01-16 14:14:03 +00:00
hadoop-aliyun HADOOP-18458: AliyunOSSBlockOutputStream to support heap/off-heap buffer before uploading data to OSS (#4912) 2023-03-28 14:27:01 +08:00
hadoop-archive-logs HADOOP-18917. Addendum. Fix deprecation issues after commons-io upgrade. (#6228). Contributed by PJ Fanning. 2023-10-30 09:35:02 +05:30
hadoop-archives HADOOP-18957. Use StandardCharsets.UTF_8 (#6231). Contributed by PJ Fanning. 2023-11-20 23:44:48 +05:30
hadoop-aws HADOOP-19027. S3A: S3AInputStream doesn't recover from HTTP/channel exceptions (#6425) 2024-01-16 14:14:03 +00:00
hadoop-azure HADOOP-18971. [ABFS] Read and cache file footer with fs.azure.footer.read.request.size (#6270) 2024-01-03 12:49:52 +00:00
hadoop-azure-datalake HADOOP-18641. Cloud connector dependency and LICENSE fixup. (#5429) 2023-02-28 10:48:54 +00:00
hadoop-benchmark HADOOP-18718. Fix several maven build warnings (#5592). Contributed by Dongjoon Hyun. 2023-06-11 11:38:13 +05:30
hadoop-datajoin HADOOP-18131. Upgrade maven enforcer plugin and relevant dependencies (#4000) 2022-03-08 17:27:04 +09:00
hadoop-distcp HADOOP-18996. S3A to provide full support for S3 Express One Zone (#6308) 2023-12-01 14:16:33 +00:00
hadoop-dynamometer HADOOP-18359. Update commons-cli from 1.2 to 1.5. (#5095). Contributed by Shilun Fan. 2023-05-10 01:42:12 +05:30
hadoop-extras HADOOP-18957. Use StandardCharsets.UTF_8 (#6231). Contributed by PJ Fanning. 2023-11-20 23:44:48 +05:30
hadoop-federation-balance HADOOP-18718. Fix several maven build warnings (#5592). Contributed by Dongjoon Hyun. 2023-06-11 11:38:13 +05:30
hadoop-fs2img HADOOP-18957. Use StandardCharsets.UTF_8 (#6231). Contributed by PJ Fanning. 2023-11-20 23:44:48 +05:30
hadoop-gridmix HADOOP-18957. Use StandardCharsets.UTF_8 (#6231). Contributed by PJ Fanning. 2023-11-20 23:44:48 +05:30
hadoop-kafka HADOOP-18957. Use StandardCharsets.UTF_8 (#6231). Contributed by PJ Fanning. 2023-11-20 23:44:48 +05:30
hadoop-openstack HADOOP-18442. Remove openstack support (#4855) 2022-10-06 11:49:38 +01:00
hadoop-pipes Preparing for 3.4.0 development 2020-03-29 23:24:25 +05:30
hadoop-resourceestimator HADOOP-18957. Use StandardCharsets.UTF_8 (#6231). Contributed by PJ Fanning. 2023-11-20 23:44:48 +05:30
hadoop-rumen HADOOP-18957. Use StandardCharsets.UTF_8 (#6231). Contributed by PJ Fanning. 2023-11-20 23:44:48 +05:30
hadoop-sls YARN-11599. Incorrect log4j properties file in SLS sample conf (#6220) Contributed by Junfan Zhang. 2023-11-05 13:57:48 +08:00
hadoop-streaming HADOOP-18957. Use StandardCharsets.UTF_8 (#6231). Contributed by PJ Fanning. 2023-11-20 23:44:48 +05:30
hadoop-tools-dist HADOOP-18442. Remove openstack support (#4855) 2022-10-06 11:49:38 +01:00
pom.xml HADOOP-11867. Add a high-performance vectored read API. (#3904) 2022-06-22 17:29:32 +01:00