hadoop/hadoop-tools
Steve Loughran ea6e0f7cd5
HADOOP-19221. S3A: Unable to recover from failure of multipart block upload attempt (#6938)
This is a major change which handles 400 error responses when uploading
large files from memory heap/buffer (or staging committer) and the remote S3
store returns a 500 response to an upload of a block in a multipart upload.

The SDK's own streaming code seems unable to fully replay the upload;
it attempts to, but then blocks, and the S3 store returns a 400 response:

    "Your socket connection to the server was not read from or written to
     within the timeout period. Idle connections will be closed.
     (Service: S3, Status Code: 400...)"

There is an option to control whether or not the S3A client itself
attempts to retry on a 50x error other than 503 throttling events
(which are processed independently, as before).

Option:  fs.s3a.retry.http.5xx.errors
Default: true

500 errors are very rare from standard AWS S3, which has a five nines
SLA. They may be more common against S3 Express, which has lower
guarantees.

Third party stores have unknown guarantees, and the exception may
indicate a bad server configuration. Consider setting
fs.s3a.retry.http.5xx.errors to false when working with
such stores.
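
As a rough illustration of the rule described above, the following is a
minimal sketch and not the actual S3A retry policy code. The class name
Retry5xxSketch, the Action enum and the decide() method are all
hypothetical; only the option name fs.s3a.retry.http.5xx.errors and the
special treatment of 503 come from this commit message.

```java
// Hypothetical sketch only: this is NOT the S3A retry policy implementation.
public final class Retry5xxSketch {

  /** Possible outcomes for a failed request in this sketch. */
  public enum Action { THROTTLE_RETRY, RETRY, FAIL }

  /**
   * Decide what to do with the HTTP status code of a failed request.
   * @param status HTTP status code returned by the store
   * @param retry5xx the value of fs.s3a.retry.http.5xx.errors
   * @return the action this sketch would take
   */
  public static Action decide(int status, boolean retry5xx) {
    if (status == 503) {
      // Throttling events are processed independently of the option.
      return Action.THROTTLE_RETRY;
    }
    if (status >= 500 && status < 600) {
      // Other 5xx errors are retried only when the option is true.
      return retry5xx ? Action.RETRY : Action.FAIL;
    }
    // All other status codes are outside the scope of this sketch.
    return Action.FAIL;
  }

  public static void main(String[] args) {
    System.out.println(decide(500, true));
    System.out.println(decide(500, false));
    System.out.println(decide(503, false));
  }
}
```

With the option set to false, a 500 from a misconfigured third party
store would fail fast in this sketch rather than being retried.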

Significant code changes:

There is now a custom set of implementations of
software.amazon.awssdk.http.ContentStreamProvider in
the class org.apache.hadoop.fs.s3a.impl.UploadContentProviders.

These:

* Restart on failures
* Do not copy buffers/byte buffers into new private byte arrays,
  so they avoid exacerbating memory problems.
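
The two properties listed above can be sketched as follows. This is an
illustration under stated assumptions, not the code in
UploadContentProviders: the class RestartableBufferSketch, the local
StreamProvider interface (a stand-in with the same single-method shape as
the SDK's ContentStreamProvider) and the helper names are hypothetical.
Each call to newStream() reads the same bytes again through a duplicate
view of the buffer, so a restart needs no private byte array copy.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

// Hypothetical sketch only: not the UploadContentProviders implementation.
public final class RestartableBufferSketch {

  /** Local stand-in mirroring the shape of the SDK's ContentStreamProvider. */
  public interface StreamProvider {
    InputStream newStream();
  }

  /** Reads from a ByteBuffer view; the underlying bytes are never copied. */
  static final class ByteBufferStream extends InputStream {
    private final ByteBuffer view;

    ByteBufferStream(ByteBuffer view) {
      this.view = view;
    }

    @Override
    public int read() {
      return view.hasRemaining() ? (view.get() & 0xff) : -1;
    }

    @Override
    public int read(byte[] dest, int off, int len) {
      if (len == 0) {
        return 0;
      }
      if (!view.hasRemaining()) {
        return -1;
      }
      int n = Math.min(len, view.remaining());
      view.get(dest, off, n);
      return n;
    }
  }

  /** Create a provider whose every stream starts at the block's start. */
  public static StreamProvider forBuffer(ByteBuffer block) {
    // duplicate() shares the content but has its own position and limit.
    final ByteBuffer source = block.duplicate();
    return () -> new ByteBufferStream(source.duplicate());
  }

  /** Drain a stream into a byte array (used here only to show the replay). */
  public static byte[] readFully(InputStream in) {
    try {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] chunk = new byte[4096];
      int n;
      while ((n = in.read(chunk, 0, chunk.length)) != -1) {
        out.write(chunk, 0, n);
      }
      return out.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

A retry after a failed attempt would simply ask the provider for a new
stream; the caller's buffer position is left untouched.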

There are new IOStatistics for specific HTTP error codes; these are collected
even when all recovery is performed within the SDK.
  
S3ABlockOutputStream has major changes, including handling of
Thread.interrupt() on the main thread, which now triggers and briefly
awaits cancellation of any ongoing uploads.

If the writing thread is interrupted in close(), it is mapped to
an InterruptedIOException. Applications like Hive and Spark must
catch these after cancelling a worker thread.
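
A minimal sketch of what that looks like from the application side,
assuming a worker thread that is interrupted while closing its stream.
SlowCloseStream is a hypothetical stand-in for an output stream whose
close() waits on uploads; it is not S3ABlockOutputStream, and the class
and method names here are illustrative only.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.InterruptedIOException;

// Hypothetical sketch only: SlowCloseStream is not S3ABlockOutputStream.
public final class InterruptedCloseSketch {

  /** Stand-in stream whose close() blocks, as if awaiting uploads. */
  static final class SlowCloseStream implements Closeable {
    @Override
    public void close() throws IOException {
      try {
        Thread.sleep(60_000); // simulates waiting for uploads to finish
      } catch (InterruptedException e) {
        // Map the interrupt to an InterruptedIOException.
        InterruptedIOException ioe =
            new InterruptedIOException("interrupted in close()");
        ioe.initCause(e);
        throw ioe;
      }
    }
  }

  /** Close the stream and report how the close ended. */
  public static String closeAndReport(Closeable stream) {
    try {
      stream.close();
      return "closed";
    } catch (InterruptedIOException e) {
      // The case an application must handle after cancelling a worker.
      return "interrupted";
    } catch (IOException e) {
      return "failed";
    }
  }

  /** Start a worker that closes a stream, cancel it, and return the result. */
  public static String runAndCancel() {
    final String[] outcome = new String[1];
    Thread worker = new Thread(
        () -> outcome[0] = closeAndReport(new SlowCloseStream()));
    worker.start();
    worker.interrupt(); // cancel the worker thread
    try {
      worker.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("caller interrupted", e);
    }
    return outcome[0];
  }
}
```

Catching InterruptedIOException separately from other IOExceptions lets
the caller treat a cancelled write as a cancellation, not a store failure.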

Contributed by Steve Loughran
2024-09-13 20:02:14 +01:00
..
hadoop-aliyun HADOOP-19131. Assist reflection IO with WrappedOperations class (#6686) 2024-08-14 14:43:00 +01:00
hadoop-archive-logs Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
hadoop-archives HADOOP-19041. Use StandardCharsets in more places (#6449) 2024-03-28 23:17:18 -04:00
hadoop-aws HADOOP-19221. S3A: Unable to recover from failure of multipart block upload attempt (#6938) 2024-09-13 20:02:14 +01:00
hadoop-azure Revert "HADOOP-19231. Add JacksonUtil to manage Jackson classes (#6953)" 2024-08-29 14:42:03 +05:30
hadoop-azure-datalake Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
hadoop-benchmark Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
hadoop-compat-bench HADOOP-19085. Compatibility Benchmark over HCFS Implementations 2024-03-17 16:48:29 +08:00
hadoop-datajoin HADOOP-19134. Use StringBuilder instead of StringBuffer. (#6692). Contributed by PJ Fanning 2024-08-18 21:29:12 +05:30
hadoop-distcp HADOOP-19134. Use StringBuilder instead of StringBuffer. (#6692). Contributed by PJ Fanning 2024-08-18 21:29:12 +05:30
hadoop-dynamometer Revert "HADOOP-19231. Add JacksonUtil to manage Jackson classes (#6953)" 2024-08-29 14:42:03 +05:30
hadoop-extras HADOOP-19024. Use bouncycastle jdk18 1.77 (#6410). Contributed 2024-03-30 19:58:12 +05:30
hadoop-federation-balance Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
hadoop-fs2img HADOOP-19041. Use StandardCharsets in more places (#6449) 2024-03-28 23:17:18 -04:00
hadoop-gridmix HADOOP-16928. Make javadoc work on Java 17 (#6976) 2024-09-04 11:50:59 +01:00
hadoop-kafka Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
hadoop-openstack Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
hadoop-pipes Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
hadoop-resourceestimator Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
hadoop-rumen Revert "HADOOP-19231. Add JacksonUtil to manage Jackson classes (#6953)" 2024-08-29 14:42:03 +05:30
hadoop-sls Revert "HADOOP-19231. Add JacksonUtil to manage Jackson classes (#6953)" 2024-08-29 14:42:03 +05:30
hadoop-streaming HADOOP-16928. Make javadoc work on Java 17 (#6976) 2024-09-04 11:50:59 +01:00
hadoop-tools-dist Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
pom.xml HADOOP-19085. Compatibility Benchmark over HCFS Implementations 2024-03-17 16:48:29 +08:00