hadoop/hadoop-mapreduce-project
Steve Loughran c9270600b7
MAPREDUCE-7474. Improve Manifest committer resilience (#6716)
Improve task commit resilience everywhere
and add an option to reduce delete IO requests on
job cleanup (relevant for ABFS and HDFS).

Task Commit Resilience
----------------------

Task manifest saving is re-attempted on failure; the number of 
attempts made is configurable with the option:

  mapreduce.manifest.committer.manifest.save.attempts

* The default is 5.
* The minimum is 1; asking for less is ignored.
* A retry policy adds 500ms of sleep per attempt.
* Move from classic rename() to commitFile() to rename the file,
  after calling getFileStatus() to get its length and possibly etag.
  This becomes a rename() on gcs/hdfs anyway, but on abfs it reaches
  the ResilientCommitByRename callbacks, which report the outcome
  to the caller, where it is logged at WARN.
* New statistic task_stage_save_summary_file to distinguish this save
  from other saving operations (job success/report file).
  This is only saved to the manifest on task commit retries, and
  provides statistics on all previous unsuccessful attempts to save
  the manifest.
* Test changes to match the codepath changes, including improvements
  in fault injection.
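
As an illustration only, the retry behaviour listed above could be sketched
as follows. The class and method names are hypothetical and are not the
committer's API. The sketch assumes that a request below the minimum is
raised to 1, and that the sleep grows by the per-attempt interval after each
failed attempt; both are readings of the notes above, not confirmed details.

```java
import java.util.concurrent.Callable;

/** Hypothetical sketch of a bounded retry; not the committer's actual code. */
final class SaveRetrySketch {
  static final int DEFAULT_ATTEMPTS = 5;        // default stated above
  static final int MIN_ATTEMPTS = 1;            // minimum stated above
  static final long SLEEP_PER_ATTEMPT_MS = 500; // interval stated above

  /** Assumption: a request below the minimum is treated as the minimum. */
  static int effectiveAttempts(int requested) {
    return Math.max(MIN_ATTEMPTS, requested);
  }

  /**
   * Invoke the save operation until it succeeds or the attempts run out.
   * Assumption: the sleep after failed attempt n is n * sleepPerAttemptMs.
   */
  static <T> T saveWithRetries(Callable<T> save,
      int requestedAttempts,
      long sleepPerAttemptMs) throws Exception {
    int attempts = effectiveAttempts(requestedAttempts);
    Exception last = null;
    for (int attempt = 1; attempt <= attempts; attempt++) {
      try {
        return save.call();
      } catch (Exception e) {
        last = e;
        if (attempt < attempts) {
          // no sleep after the final failure; just rethrow below
          Thread.sleep(sleepPerAttemptMs * attempt);
        }
      }
    }
    throw last;
  }
}
```

A caller would pass the save operation as a Callable together with
DEFAULT_ATTEMPTS and SLEEP_PER_ATTEMPT_MS; the final failure is rethrown
once all attempts are used up.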

Directory size for deletion
---------------------------

New option

  mapreduce.manifest.committer.cleanup.parallel.delete.base.first

This makes an initial attempt at deleting the base dir, only falling
back to parallel deletes if there's a timeout.

This option is disabled by default; consider enabling it for abfs to
reduce IO load. Consult the documentation for more details.
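
The control flow of this option could be sketched roughly as below. This is
not the committer's code: CleanupSketch, Deleter and Outcome are hypothetical
names, the timeout is modelled as a java.util.concurrent.TimeoutException,
and the final delete of the base dir after the parallel deletes is an
assumption about what a fallback would need to do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeoutException;

/** Hypothetical sketch of "delete base dir first, else parallel deletes". */
final class CleanupSketch {

  /** Stand-in for a filesystem delete call; may time out. */
  interface Deleter {
    void delete(String path) throws Exception;
  }

  enum Outcome { BASE_DIR_DELETED, PARALLEL_FALLBACK }

  static Outcome cleanup(String baseDir,
      List<String> childDirs,
      Deleter deleter,
      boolean baseFirst) throws Exception {
    if (baseFirst) {
      try {
        // one request for the whole tree
        deleter.delete(baseDir);
        return Outcome.BASE_DIR_DELETED;
      } catch (TimeoutException e) {
        // fall through to the parallel deletes
      }
    }
    // delete each child directory in parallel
    List<CompletableFuture<Void>> pending = new ArrayList<>();
    for (String dir : childDirs) {
      pending.add(CompletableFuture.runAsync(() -> {
        try {
          deleter.delete(dir);
        } catch (Exception e) {
          throw new CompletionException(e);
        }
      }));
    }
    CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
    // assumption: the now much smaller base dir is deleted last
    deleter.delete(baseDir);
    return Outcome.PARALLEL_FALLBACK;
  }
}
```

When the first delete succeeds, the job cleanup costs a single delete
request; the parallel path is only taken when that request times out or
the option is off.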

Success file printing
---------------------

The command to print a JSON _SUCCESS file from this committer and
any S3A committer can now be invoked from the mapred command:

  mapred successfile <path to file>

Contributed by Steve Loughran
2024-05-13 21:12:34 +01:00
bin MAPREDUCE-7474. Improve Manifest committer resilience (#6716) 2024-05-13 21:12:34 +01:00
conf MAPREDUCE-6875. Rename mapred-site.xml.template to mapred-site.xml. (Yuanbo Liu via Haibo Chen) 2017-04-17 12:25:30 -07:00
dev-support HADOOP-19112. Hadoop 3.4.0 release wrap-up. (#6640) Contributed by Shilun Fan. 2024-03-19 20:08:03 +08:00
hadoop-mapreduce-client MAPREDUCE-7474. Improve Manifest committer resilience (#6716) 2024-05-13 21:12:34 +01:00
hadoop-mapreduce-examples Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
lib/jdiff
shellprofile.d HADOOP-12930. Dynamic subcommands for hadoop shell scripts (aw) 2016-05-16 17:54:45 -07:00
.gitignore MAPREDUCE-6875. Rename mapred-site.xml.template to mapred-site.xml. (Yuanbo Liu via Haibo Chen) 2017-04-17 12:25:30 -07:00
pom.xml HADOOP-19019: Parallel Maven Build Support for Apache Hadoop (#6373). Contributed by JiaLiangC. 2024-01-23 14:51:20 +08:00