hadoop/hadoop-common-project
Steve Loughran 56dee66770
HADOOP-16823. Large DeleteObject requests are their own Thundering Herd.
Contributed by Steve Loughran.

During S3A rename() and delete() calls, the list of objects to delete is
built up into batches of a thousand and then POSTed in a single large
DeleteObjects request.

But the IO capacity allowed on an S3 partition may be only 3500 writes
per second, *and* each entry in that POST counts as a single write, so
one of those posts alone can trigger throttling on an already loaded
S3 directory tree. That can trigger backoff and retry with the same
thousand-entry post, and so recreate the exact same problem.

Fixes

* Page size for delete object requests is set in
  fs.s3a.bulk.delete.page.size; the default is 250.
* The property fs.s3a.experimental.aws.s3.throttling (default=true)
  can be set to false to disable throttle retry logic in the AWS
  client SDK, so that it is all handled in the S3A client. This
  gives more visibility into when operations are being throttled.
* Bulk delete throttling events are logged to the
  org.apache.hadoop.fs.s3a.throttled log at INFO; if this appears
  often then choose a smaller page size.
* The metric "store_io_throttled" adds the entire count of delete
  requests when a single DeleteObjects request is throttled.
* A new quantile, "store_io_throttle_rate", can track throttling
  load over time.
* DynamoDB metastore throttle resilience issues have also been
  identified and fixed. Note: the fs.s3a.experimental.aws.s3.throttling
  flag does not apply to DDB IO precisely because there may still be
  lurking issues there and it is safest to rely on the DynamoDB client
  SDK.
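
As an illustration of the page-size fix in the first bullet, here is a
minimal sketch of splitting a list of keys into pages and of how much of
the per-second write budget one page consumes. It is not the S3A
implementation: the class DeletePaging, its methods, and the constant are
hypothetical names, and the 3500 writes per second figure is simply the
number quoted above, treated here as an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only; not part of S3A.
public class DeletePaging {

    // Assumption: the per-partition write capacity quoted in the text above.
    static final int ASSUMED_WRITES_PER_SECOND = 3500;

    // Split the keys into consecutive pages of at most pageSize entries.
    static <T> List<List<T>> paginate(List<T> keys, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<List<T>> pages = new ArrayList<>();
        for (int start = 0; start < keys.size(); start += pageSize) {
            int end = Math.min(start + pageSize, keys.size());
            // Copy the sublist so each page is independent of the source list.
            pages.add(new ArrayList<>(keys.subList(start, end)));
        }
        return pages;
    }

    // Fraction of the assumed per-second write budget used by one full page,
    // given that every entry in the request counts as one write.
    static double budgetShare(int pageSize) {
        return (double) pageSize / ASSUMED_WRITES_PER_SECOND;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            keys.add("dir/file-" + i);
        }
        // Compare the old page size (1000) with the new default (250).
        for (int pageSize : new int[] {1000, 250}) {
            System.out.printf(
                "page size %d: %d request(s), each %.1f%% of the assumed budget%n",
                pageSize,
                paginate(keys, pageSize).size(),
                100.0 * budgetShare(pageSize));
        }
    }
}
```

Under that assumption, a single 1000-entry request takes more than a
quarter of one second's write budget, while a 250-entry page takes well
under a tenth, so a retried page is far less likely to recreate the
overload on its own.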

Change-Id: I00f85cdd94fc008864d060533f6bd4870263fd84
2020-02-13 19:09:49 +00:00
hadoop-annotations HADOOP-16331. Fix ASF License check in pom.xml 2019-05-29 17:25:13 +09:00
hadoop-auth HDFS-15136. LOG flooding in secure mode when Cookies are not set in request header. Contributed by Renukaprasad C 2020-02-08 01:17:59 +05:30
hadoop-auth-examples HADOOP-16331. Fix ASF License check in pom.xml 2019-05-29 17:25:13 +09:00
hadoop-common HADOOP-16823. Large DeleteObject requests are their own Thundering Herd. 2020-02-13 19:09:49 +00:00
hadoop-kms HADOOP-16596. [pb-upgrade] Use shaded protobuf classes from hadoop-thirdparty dependency (#1635). Contributed by Vinayakumar B. 2020-02-07 14:51:24 +05:30
hadoop-minikdc HADOOP-16331. Fix ASF License check in pom.xml 2019-05-29 17:25:13 +09:00
hadoop-nfs HADOOP-16510. [hadoop-common] Fix order of actual and expected expression in assert statements. Contributed by Adam Antal 2019-10-31 14:35:20 +01:00
hadoop-registry HADOOP-16596. [pb-upgrade] Use shaded protobuf classes from hadoop-thirdparty dependency (#1635). Contributed by Vinayakumar B. 2020-02-07 14:51:24 +05:30
pom.xml HADOOP-16331. Fix ASF License check in pom.xml 2019-05-29 17:25:13 +09:00