hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client
Steve Loughran c9270600b7
MAPREDUCE-7474. Improve Manifest committer resilience (#6716)
Improve task commit resilience everywhere
and add an option to reduce delete IO requests on
job cleanup (relevant for ABFS and HDFS).

Task Commit Resilience
----------------------

Task manifest saving is re-attempted on failure; the number of 
attempts made is configurable with the option:

  mapreduce.manifest.committer.manifest.save.attempts

* The default is 5.
* The minimum is 1; asking for less is ignored.
* A retry policy adds 500ms of sleep per attempt.
* Rename the file with commitFile() rather than the classic rename(),
  after calling getFileStatus() to get its length and possibly its etag.
  This still becomes a rename() on gcs/hdfs, but on abfs it reaches
  the ResilientCommitByRename callbacks, which report the outcome
  to the caller, where it is logged at WARN.
* New statistic task_stage_save_summary_file to distinguish from
  other saving operations (job success/report file).
  This is only saved to the manifest on task commit retries, and
  provides statistics on all previous unsuccessful attempts to save
  the manifests.
* Test changes to match the codepath changes, including improvements
  in fault injection.
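
The retry behaviour described above can be illustrated with a minimal,
self-contained sketch. It is not the committer's code: the class and method
names are hypothetical, and it assumes that "500ms of sleep per attempt"
means the sleep grows linearly with the number of failed attempts.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Illustrative sketch only; every name here is hypothetical. */
class SaveRetrySketch {

  /** Default number of save attempts, as stated in the commit message. */
  static final int DEFAULT_ATTEMPTS = 5;

  /** Sleep added per attempt, in milliseconds, as stated in the commit message. */
  static final long SLEEP_PER_ATTEMPT_MS = 500;

  /** The minimum is 1: a request for fewer attempts is ignored. */
  static int effectiveAttempts(int configured) {
    return Math.max(1, configured);
  }

  /** Assumption: the sleep before a retry grows linearly with failed attempts. */
  static long sleepBeforeRetryMs(int failedAttempts, long sleepPerAttemptMs) {
    return sleepPerAttemptMs * failedAttempts;
  }

  /** Invoke the save operation until it succeeds or the attempts run out. */
  static <T> T saveWithRetries(Callable<T> save, int configuredAttempts,
      long sleepPerAttemptMs) throws Exception {
    int attempts = effectiveAttempts(configuredAttempts);
    Exception last = null;
    for (int attempt = 1; attempt <= attempts; attempt++) {
      try {
        return save.call();
      } catch (Exception e) {
        last = e;
        if (attempt < attempts) {
          // Back off before the next attempt.
          Thread.sleep(sleepBeforeRetryMs(attempt, sleepPerAttemptMs));
        }
      }
    }
    // attempts >= 1, so at least one failure was recorded here.
    throw last;
  }

  public static void main(String[] args) throws Exception {
    int[] calls = {0};
    // Simulated save that fails twice, then succeeds; short sleep for the demo.
    String result = saveWithRetries(() -> {
      if (++calls[0] < 3) {
        throw new IOException("simulated save failure");
      }
      return "saved";
    }, DEFAULT_ATTEMPTS, 10L);
    System.out.println(result + " after " + calls[0] + " attempts");
    // prints "saved after 3 attempts"
  }
}
```

Passing the sleep interval as a parameter keeps the demo fast; with
SLEEP_PER_ATTEMPT_MS the same call would wait 500ms and then 1000ms
before succeeding, under the linear-growth assumption.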

Directory size for deletion
---------------------------

New option

  mapreduce.manifest.committer.cleanup.parallel.delete.base.first

This makes an initial attempt to delete the base dir, falling back
to parallel deletes only if there's a timeout.

This option is disabled by default; consider enabling it for abfs to
reduce IO load. Consult the documentation for more details.
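
The control flow of this option can be sketched as follows. This is a
simplified illustration, not the committer's implementation: the Store
interface, the cleanup() method and the paths are all hypothetical, and
the final delete of the base dir after the parallel phase is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeoutException;

/** Illustrative sketch only; every name here is hypothetical. */
class BaseFirstCleanupSketch {

  /** Hypothetical stand-in for the filesystem operations used in cleanup. */
  interface Store {
    void deleteRecursive(String path) throws TimeoutException;
    List<String> listChildren(String path);
  }

  /**
   * Delete baseDir. Returns true if the single base delete was enough,
   * false if the parallel fallback was used.
   */
  static boolean cleanup(Store store, String baseDir, boolean baseFirst,
      int threads) throws Exception {
    if (baseFirst) {
      try {
        store.deleteRecursive(baseDir);   // one request instead of many
        return true;
      } catch (TimeoutException e) {
        // Directory too large to delete in time: fall back to parallel deletes.
      }
    }
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<?>> pending = new ArrayList<>();
      for (String child : store.listChildren(baseDir)) {
        pending.add(pool.submit(() -> {
          store.deleteRecursive(child);
          return null;
        }));
      }
      for (Future<?> f : pending) {
        f.get();                          // propagate any failure
      }
    } finally {
      pool.shutdown();
    }
    // Assumption: the now much smaller base dir is deleted last.
    store.deleteRecursive(baseDir);
    return false;
  }

  public static void main(String[] args) throws Exception {
    List<String> deleted = Collections.synchronizedList(new ArrayList<>());
    Store slowBase = new Store() {
      private boolean first = true;
      public void deleteRecursive(String path) throws TimeoutException {
        if (path.equals("/job/_temporary") && first) {
          first = false;
          throw new TimeoutException("simulated timeout on a large directory");
        }
        deleted.add(path);
      }
      public List<String> listChildren(String path) {
        return Arrays.asList(path + "/task_0", path + "/task_1");
      }
    };
    boolean single = cleanup(slowBase, "/job/_temporary", true, 2);
    System.out.println("single delete sufficed: " + single
        + ", deleted " + deleted.size() + " paths");
    // prints "single delete sufficed: false, deleted 3 paths"
  }
}
```

When the first delete succeeds, the store sees a single request, which is
where the reduction in IO load would come from.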

Success file printing
---------------------

The command to print a JSON _SUCCESS file from this committer and
any S3A committer can now be invoked through the mapred command:

  mapred successfile <path to file>

Contributed by Steve Loughran
2024-05-13 21:12:34 +01:00
hadoop-mapreduce-client-app HADOOP-19077. Remove use of javax.ws.rs.core.HttpHeaders (#6554). Contributed by PJ Fanning 2024-04-01 12:43:39 +05:30
hadoop-mapreduce-client-common Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
hadoop-mapreduce-client-core MAPREDUCE-7474. Improve Manifest committer resilience (#6716) 2024-05-13 21:12:34 +01:00
hadoop-mapreduce-client-hs Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
hadoop-mapreduce-client-hs-plugins Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
hadoop-mapreduce-client-jobclient HADOOP-19024. Use bouncycastle jdk18 1.77 (#6410). Contributed 2024-03-30 19:58:12 +05:30
hadoop-mapreduce-client-nativetask Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
hadoop-mapreduce-client-shuffle Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
hadoop-mapreduce-client-uploader HADOOP-19114. Upgrade to commons-compress 1.26.1 due to CVEs. (#6636) 2024-04-03 19:32:15 +01:00
pom.xml HADOOP-18088. Replace log4j 1.x with reload4j. (#4052) 2024-02-13 16:33:51 +00:00