hadoop

Steve Loughran c9270600b7 MAPREDUCE-7474. Improve Manifest committer resilience (#6716)

Improve task commit resilience everywhere and add an option to reduce
delete IO requests on job cleanup (relevant for ABFS and HDFS).

Task Commit Resilience
----------------------

Task manifest saving is re-attempted on failure; the number of attempts
made is configurable with the option:

    mapreduce.manifest.committer.manifest.save.attempts

* The default is 5.
* The minimum is 1; asking for less is ignored.
* A retry policy adds 500ms of sleep per attempt.
* Move from classic rename() to commitFile() to rename the file,
  after calling getFileStatus() to get its length and possibly etag.
  This becomes a rename() on gcs/hdfs anyway, but on abfs it does reach
  the ResilientCommitByRename callbacks in abfs, which report on
  the outcome to the caller...which is then logged at WARN.
* New statistic task_stage_save_summary_file to distinguish from
  other saving operations (job success/report file).
  This is only saved to the manifest on task commit retries, and
  provides statistics on all previous unsuccessful attempts to save
  the manifests
+ test changes to match the codepath changes, including improvements
  in fault injection.

Directory size for deletion
---------------------------

New option

    mapreduce.manifest.committer.cleanup.parallel.delete.base.first

This attempts an initial attempt at deleting the base dir, only falling
back to parallel deletes if there's a timeout.

This option is disabled by default; Consider enabling it for abfs to
reduce IO load. Consult the documentation for more details.

Success file printing
---------------------

The command to print a JSON _SUCCESS file from this committer and
any S3A committer is now something which can be invoked from
the mapred command:

    mapred successfile <path to file>

Contributed by Steve Loughran

2024-05-13 21:12:34 +01:00
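The retry behaviour described under "Task Commit Resilience" can be sketched in plain Java as below. This is an illustration only, not the committer's actual code: the class `ManifestSaveRetrySketch`, the `IOCallable` interface and all method names are hypothetical, it assumes that a request for fewer than 1 attempt is clamped to 1, and it assumes that "500ms of sleep per attempt" means the sleep grows linearly with the number of failed attempts. The sleep is injected as a callback so the logic can be exercised without real delays.

```java
import java.io.IOException;
import java.util.function.LongConsumer;

/** Hypothetical sketch of "retry the manifest save N times with growing sleep". */
public class ManifestSaveRetrySketch {

    /** Default attempt count stated in the commit message. */
    public static final int DEFAULT_ATTEMPTS = 5;

    /** Sleep added per failed attempt, in milliseconds (from the commit message). */
    public static final long SLEEP_PER_ATTEMPT_MS = 500L;

    /** Hypothetical stand-in for an operation that may fail with an IOException. */
    @FunctionalInterface
    public interface IOCallable<T> {
        T call() throws IOException;
    }

    /** Assumption: values below the minimum of 1 are clamped to 1. */
    public static int effectiveAttempts(int requested) {
        return Math.max(1, requested);
    }

    /** Assumption: sleep is proportional to the number of failures so far. */
    public static long sleepBeforeRetryMillis(int failedAttempts) {
        return SLEEP_PER_ATTEMPT_MS * failedAttempts;
    }

    /**
     * Invoke the save operation until it succeeds or the attempts run out.
     * The last IOException is rethrown if every attempt fails.
     */
    public static <T> T saveWithRetries(IOCallable<T> save,
                                        int requestedAttempts,
                                        LongConsumer sleeper) throws IOException {
        int attempts = effectiveAttempts(requestedAttempts);
        IOException last = null;
        for (int attempt = 1; attempt <= attempts; attempt++) {
            try {
                return save.call();
            } catch (IOException e) {
                last = e;
                // Only sleep if another attempt will follow.
                if (attempt < attempts) {
                    sleeper.accept(sleepBeforeRetryMillis(attempt));
                }
            }
        }
        throw last;
    }
}
```

In a real job the attempt count would come from `mapreduce.manifest.committer.manifest.save.attempts` rather than a method argument; a caller of this sketch could pass `ms -> {}` as the sleeper in tests, or a wrapper around `Thread.sleep` otherwise.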
..
dev-support	HADOOP-19129: [ABFS] Test Fixes and Test Script Bug Fixes (#6676)	2024-04-12 17:52:47 +01:00
src	MAPREDUCE-7474. Improve Manifest committer resilience (#6716)	2024-05-13 21:12:34 +01:00
.gitignore	HADOOP-17912. ABFS: Support for Encryption Context (#6221)	2024-01-01 19:09:44 +00:00
pom.xml	MAPREDUCE-7474. Improve Manifest committer resilience (#6716)	2024-05-13 21:12:34 +01:00