hadoop/hadoop-tools
Steve Loughran 56dee66770
HADOOP-16823. Large DeleteObject requests are their own Thundering Herd.
Contributed by Steve Loughran.

During S3A rename() and delete() calls, the list of objects to delete
is built up into batches of a thousand, and each batch is then POSTed
in a single large DeleteObjects request.

But as the IO capacity allowed on an S3 partition may only be 3500 writes
per second *and* each entry in that POST counts as a single write, one
of those POSTs alone can trigger throttling on an already loaded
S3 directory tree. That can trigger backoff and retry with the same
thousand-entry POST, and so recreate the exact same problem.
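
As a rough illustration of the arithmetic above, the sketch below
computes what share of a per-second write budget a single bulk delete
consumes. The 3500 writes/second figure and the page sizes come from
this message; the class and method names are hypothetical and are not
part of the S3A code.

```java
public final class ThrottleArithmetic {

    /**
     * Fraction of a per-second write budget consumed by one bulk delete,
     * assuming every entry in the request counts as one write.
     */
    static double budgetFraction(int entriesInRequest, int writesPerSecond) {
        if (writesPerSecond <= 0) {
            throw new IllegalArgumentException("writesPerSecond must be positive");
        }
        return (double) entriesInRequest / writesPerSecond;
    }

    public static void main(String[] args) {
        int limit = 3500; // possible per-partition write limit quoted above
        for (int pageSize : new int[] {1000, 250}) {
            System.out.printf("%d-entry POST uses %.1f%% of a %d writes/s budget%n",
                    pageSize, 100 * budgetFraction(pageSize, limit), limit);
        }
    }
}
```

A thousand-entry request takes well over a quarter of that budget in
one go, so a retry of the same request is likely to hit the same limit.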

Fixes

* Page size for delete object requests is set in
  fs.s3a.bulk.delete.page.size; the default is 250.
* The property fs.s3a.experimental.aws.s3.throttling (default=true)
  can be set to false to disable throttle retry logic in the AWS
  client SDK, so that it is all handled in the S3A client. This
  gives more visibility into when operations are being throttled.
* Bulk delete throttling events are logged to the
  org.apache.hadoop.fs.s3a.throttled log at INFO; if these appear
  often then choose a smaller page size.
* The metric "store_io_throttled" is incremented by the entire count
  of entries in a DeleteObjects request when that request is throttled.
* A new quantile, "store_io_throttle_rate", can track throttling
  load over time.
* DynamoDB metastore throttle resilience issues have also been
  identified and fixed. Note: the fs.s3a.experimental.aws.s3.throttling
  flag does not apply to DDB IO, precisely because there may still be
  lurking issues there and it is safest to rely on the DynamoDB client
  SDK.
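
The paging described in the first fix can be sketched as below. This is
a minimal, self-contained illustration, not the S3A implementation: the
class and method names are hypothetical, and only the page size of 250
(the stated default of fs.s3a.bulk.delete.page.size) comes from this
message.

```java
import java.util.ArrayList;
import java.util.List;

public final class DeletePaging {

    /**
     * Split a list of object keys into pages of at most pageSize entries,
     * so that each page could be sent as its own bulk delete request.
     */
    static <T> List<List<T>> pages(List<T> keys, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<List<T>> out = new ArrayList<>();
        for (int start = 0; start < keys.size(); start += pageSize) {
            int end = Math.min(start + pageSize, keys.size());
            // Copy the slice so each page is independent of the source list.
            out.add(new ArrayList<>(keys.subList(start, end)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 1001; i++) {
            keys.add("dir/file-" + i); // hypothetical object keys
        }
        List<List<String>> pages = pages(keys, 250);
        System.out.println(pages.size() + " requests, last one has "
                + pages.get(pages.size() - 1).size() + " entries");
    }
}
```

With smaller pages, a throttled request that is retried re-sends far
fewer writes than a thousand-entry POST would.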

Change-Id: I00f85cdd94fc008864d060533f6bd4870263fd84
2020-02-13 19:09:49 +00:00
hadoop-aliyun HADOOP-16331. Fix ASF License check in pom.xml 2019-05-29 17:25:13 +09:00
hadoop-archive-logs HADOOP-16331. Fix ASF License check in pom.xml 2019-05-29 17:25:13 +09:00
hadoop-archives HADOOP-16512. [hadoop-tools] Fix order of actual and expected expression in assert statements 2019-10-07 16:38:08 +09:00
hadoop-aws HADOOP-16823. Large DeleteObject requests are their own Thundering Herd. 2020-02-13 19:09:49 +00:00
hadoop-azure HADOOP-16825: ITestAzureBlobFileSystemCheckAccess failing. 2020-02-06 18:48:00 +00:00
hadoop-azure-datalake HADOOP-16605. Fix testcase testSSLChannelModeConfig 2019-10-03 11:13:55 +01:00
hadoop-datajoin HADOOP-16331. Fix ASF License check in pom.xml 2019-05-29 17:25:13 +09:00
hadoop-distcp HADOOP-16596. [pb-upgrade] Use shaded protobuf classes from hadoop-thirdparty dependency (#1635). Contributed by Vinayakumar B. 2020-02-07 14:51:24 +05:30
hadoop-dynamometer YARN-10083. Provide utility to ask whether an application is in final status. Contributed by Adam Antal 2020-01-22 16:25:07 +01:00
hadoop-extras HADOOP-16331. Fix ASF License check in pom.xml 2019-05-29 17:25:13 +09:00
hadoop-fs2img HADOOP-16596. [pb-upgrade] Use shaded protobuf classes from hadoop-thirdparty dependency (#1635). Contributed by Vinayakumar B. 2020-02-07 14:51:24 +05:30
hadoop-gridmix HADOOP-16331. Fix ASF License check in pom.xml 2019-05-29 17:25:13 +09:00
hadoop-kafka HADOOP-16512. [hadoop-tools] Fix order of actual and expected expression in assert statements 2019-10-07 16:38:08 +09:00
hadoop-openstack HADOOP-16431. Remove useless log in IOUtils.java and ExceptionDiags.java. 2019-07-24 10:04:39 +09:00
hadoop-pipes HADOOP-16739. Fix native build failure of hadoop-pipes on CentOS 8. 2020-02-10 13:13:11 +09:00
hadoop-resourceestimator HADOOP-16331. Fix ASF License check in pom.xml 2019-05-29 17:25:13 +09:00
hadoop-rumen HADOOP-16331. Fix ASF License check in pom.xml 2019-05-29 17:25:13 +09:00
hadoop-sls YARN-10015. Correct the sample command in SLS README file. Contributed by Aihua Xu. 2020-01-28 17:47:49 -08:00
hadoop-streaming HADOOP-16764. Rewrite Python example codes using Python3 (#1762) 2019-12-16 11:04:20 +09:00
hadoop-tools-dist HADOOP-16331. Fix ASF License check in pom.xml 2019-05-29 17:25:13 +09:00
pom.xml HDFS-12345 Add Dynamometer to hadoop-tools, a tool for scale testing the HDFS NameNode with real metadata and workloads. Contributed by Erik Krogen. 2019-06-25 08:07:39 -07:00