Go to file
Steve Loughran c9270600b7
MAPREDUCE-7474. Improve Manifest committer resilience (#6716)
Improve task commit resilience everywhere
and add an option to reduce delete IO requests on
job cleanup (relevant for ABFS and HDFS).

Task Commit Resilience
----------------------

Task manifest saving is re-attempted on failure; the number of 
attempts made is configurable with the option:

  mapreduce.manifest.committer.manifest.save.attempts

* The default is 5.
* The minimum is 1; asking for less is ignored.
* A retry policy adds 500ms of sleep per attempt.
* Move from classic rename() to commitFile() to rename the file,
  after calling getFileStatus() to get its length and possibly etag.
  This becomes a rename() on gcs/hdfs anyway, but on abfs it does reach
  the ResilientCommitByRename callbacks in abfs, which report on
  the outcome to the caller...which is then logged at WARN.
* New statistic task_stage_save_summary_file to distinguish from
  other saving operations (job success/report file).
  This is only saved to the manifest on task commit retries, and
  provides statistics on all previous unsuccessful attempts to save
  the manifests
+ test changes to match the codepath changes, including improvements
  in fault injection.

Directory size for deletion
---------------------------

New option

  mapreduce.manifest.committer.cleanup.parallel.delete.base.first

This attempts an initial attempt at deleting the base dir, only falling
back to parallel deletes if there's a timeout.

This option is disabled by default; Consider enabling it for abfs to
reduce IO load. Consult the documentation for more details.

Success file printing
---------------------

The command to print a JSON _SUCCESS file from this committer and
any S3A committer is now something which can be invoked from
the mapred command:

  mapred successfile <path to file>

Contributed by Steve Loughran
2024-05-13 21:12:34 +01:00
.github HADOOP-18823. Add Labeler Github Action. (#5874). Contributed by Ayush Saxena. 2023-07-25 03:04:49 +05:30
.yetus Add .yetus/excludes.txt (#4984) 2022-10-11 09:23:34 -07:00
dev-support HADOOP-18135. Produce Windows binaries of Hadoop (#6673) 2024-04-09 22:15:05 +05:30
hadoop-assemblies HADOOP-19107. Drop support for HBase v1 & upgrade HBase v2 (#6629). Contributed by Ayush Saxena 2024-04-22 21:55:58 +05:30
hadoop-build-tools Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
hadoop-client-modules Revert "HADOOP-16822. Provide source artifacts for hadoop-client-api. Contributed by Karel Kolman." (#6458) 2024-04-10 12:03:59 +01:00
hadoop-cloud-storage-project HADOOP-19024. Use bouncycastle jdk18 1.77 (#6410). Contributed 2024-03-30 19:58:12 +05:30
hadoop-common-project HDFS-17488. DN can fail IBRs with NPE when a volume is removed (#6759) 2024-05-11 15:37:43 +08:00
hadoop-dist Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
hadoop-hdfs-project HDFS-17522. JournalNode web interfaces lack configs for X-FRAME-OPTIONS protection (#6814). Contributed by wangzhihui. 2024-05-13 22:10:08 +08:00
hadoop-mapreduce-project MAPREDUCE-7474. Improve Manifest committer resilience (#6716) 2024-05-13 21:12:34 +01:00
hadoop-maven-plugins HADOOP-19041. Use StandardCharsets in more places (#6449) 2024-03-28 23:17:18 -04:00
hadoop-minicluster Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
hadoop-project Bump org.apache.derby:derby in /hadoop-project (#6816) 2024-05-13 12:47:31 +08:00
hadoop-project-dist HADOOP-19112. Hadoop 3.4.0 release wrap-up. (#6640) Contributed by Shilun Fan. 2024-03-19 20:08:03 +08:00
hadoop-tools MAPREDUCE-7474. Improve Manifest committer resilience (#6716) 2024-05-13 21:12:34 +01:00
hadoop-yarn-project YARN-11689. Update the cgroup v2 init error handling (#6810) 2024-05-13 12:56:26 +02:00
licenses HADOOP-17144. Update Hadoop's lz4 to v1.9.2. Contributed by Hemanth Boyina. 2020-10-18 18:37:46 +05:30
licenses-binary HADOOP-15993. Upgrade Kafka to 2.4.0 in hadoop-kafka module. (#1796) 2020-01-09 16:24:58 +09:00
.asf.yaml HADOOP-18630. Add gh-pages in asf.yaml to deploy the current trunk doc (#5393). Contributed by Simhadri Govindappa. 2023-02-14 18:13:29 +05:30
.gitattributes HADOOP-13598. Add eol=lf for unix format files in .gitattributes. Contributed by Yiqun Lin. 2016-09-14 11:14:31 +09:00
.gitignore HADOOP-18963. Fix typos in .gitignore (#6243) 2023-11-04 05:12:39 +05:30
BUILDING.txt HADOOP-19107. Drop support for HBase v1 & upgrade HBase v2 (#6629). Contributed by Ayush Saxena 2024-04-22 21:55:58 +05:30
LICENSE-binary HADOOP-19107. Drop support for HBase v1 & upgrade HBase v2 (#6629). Contributed by Ayush Saxena 2024-04-22 21:55:58 +05:30
LICENSE.txt YARN-11356. Upgrade DataTables to 1.11.5 to fix CVEs. Contributed by Bence Kosztolnik. 2022-10-26 22:29:01 +02:00
NOTICE-binary HADOOP-19046. S3A: update AWS V2 SDK to 2.23.5; v1 to 1.12.599 (#6467) 2024-01-21 19:00:34 +00:00
NOTICE.txt HADOOP-15958. Revisiting LICENSE and NOTICE files. 2019-08-27 13:47:12 +09:00
pom.xml Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
README.txt HADOOP-15958. Revisiting LICENSE and NOTICE files. 2019-08-27 13:47:12 +09:00
start-build-env.sh HADOOP-18052. Support Apple Silicon in start-build-env.sh (#3817) 2021-12-23 18:13:18 +09:00

For the latest information about Hadoop, please visit our website at:

   http://hadoop.apache.org/

and our wiki, at:

   https://cwiki.apache.org/confluence/display/HADOOP/