hadoop/hadoop-tools
Steve Loughran c9270600b7
MAPREDUCE-7474. Improve Manifest committer resilience (#6716)
Improve task commit resilience everywhere
and add an option to reduce delete IO requests on
job cleanup (relevant for ABFS and HDFS).

Task Commit Resilience
----------------------

Task manifest saving is re-attempted on failure; the number of 
attempts made is configurable with the option:

  mapreduce.manifest.committer.manifest.save.attempts

* The default is 5.
* The minimum is 1; asking for less is ignored.
* A retry policy adds 500ms of sleep per attempt.
* Move from classic rename() to commitFile() to rename the file,
  after calling getFileStatus() to get its length and possibly etag.
  This becomes a rename() on gcs/hdfs anyway, but on abfs it does reach
  the ResilientCommitByRename callbacks in abfs, which report on
  the outcome to the caller...which is then logged at WARN.
* New statistic task_stage_save_summary_file to distinguish from
  other saving operations (job success/report file).
  This is only saved to the manifest on task commit retries, and
  provides statistics on all previous unsuccessful attempts to save
  the manifests
+ test changes to match the codepath changes, including improvements
  in fault injection.

Directory size for deletion
---------------------------

New option

  mapreduce.manifest.committer.cleanup.parallel.delete.base.first

This attempts an initial attempt at deleting the base dir, only falling
back to parallel deletes if there's a timeout.

This option is disabled by default; Consider enabling it for abfs to
reduce IO load. Consult the documentation for more details.

Success file printing
---------------------

The command to print a JSON _SUCCESS file from this committer and
any S3A committer is now something which can be invoked from
the mapred command:

  mapred successfile <path to file>

Contributed by Steve Loughran
2024-05-13 21:12:34 +01:00
..
hadoop-aliyun Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
hadoop-archive-logs Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
hadoop-archives HADOOP-19041. Use StandardCharsets in more places (#6449) 2024-03-28 23:17:18 -04:00
hadoop-aws HADOOP-19146. S3A: noaa-cors-pds test bucket access with global endpoint fails (#6723) 2024-04-30 12:16:36 +01:00
hadoop-azure MAPREDUCE-7474. Improve Manifest committer resilience (#6716) 2024-05-13 21:12:34 +01:00
hadoop-azure-datalake Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
hadoop-benchmark Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
hadoop-compat-bench HADOOP-19085. Compatibility Benchmark over HCFS Implementations 2024-03-17 16:48:29 +08:00
hadoop-datajoin Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
hadoop-distcp HDFS-17216. Distcp: When handle the small files, the bandwidth parameter will be invalid, fix this bug. (#6138) 2024-03-28 10:31:06 -04:00
hadoop-dynamometer Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
hadoop-extras HADOOP-19024. Use bouncycastle jdk18 1.77 (#6410). Contributed 2024-03-30 19:58:12 +05:30
hadoop-federation-balance Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
hadoop-fs2img HADOOP-19041. Use StandardCharsets in more places (#6449) 2024-03-28 23:17:18 -04:00
hadoop-gridmix HADOOP-19024. Use bouncycastle jdk18 1.77 (#6410). Contributed 2024-03-30 19:58:12 +05:30
hadoop-kafka Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
hadoop-openstack Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
hadoop-pipes Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
hadoop-resourceestimator Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
hadoop-rumen Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
hadoop-sls Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
hadoop-streaming HADOOP-19024. Use bouncycastle jdk18 1.77 (#6410). Contributed 2024-03-30 19:58:12 +05:30
hadoop-tools-dist Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
pom.xml HADOOP-19085. Compatibility Benchmark over HCFS Implementations 2024-03-17 16:48:29 +08:00