c9270600b7
Improve task commit resilience everywhere and add an option to reduce delete IO requests on job cleanup (relevant for ABFS and HDFS). Task Commit Resilience ---------------------- Task manifest saving is re-attempted on failure; the number of attempts made is configurable with the option: mapreduce.manifest.committer.manifest.save.attempts * The default is 5. * The minimum is 1; asking for less is ignored. * A retry policy adds 500ms of sleep per attempt. * Move from classic rename() to commitFile() to rename the file, after calling getFileStatus() to get its length and possibly etag. This becomes a rename() on gcs/hdfs anyway, but on abfs it does reach the ResilientCommitByRename callbacks in abfs, which report on the outcome to the caller...which is then logged at WARN. * New statistic task_stage_save_summary_file to distinguish from other saving operations (job success/report file). This is only saved to the manifest on task commit retries, and provides statistics on all previous unsuccessful attempts to save the manifests + test changes to match the codepath changes, including improvements in fault injection. Directory size for deletion --------------------------- New option mapreduce.manifest.committer.cleanup.parallel.delete.base.first This attempts an initial attempt at deleting the base dir, only falling back to parallel deletes if there's a timeout. This option is disabled by default; Consider enabling it for abfs to reduce IO load. Consult the documentation for more details. Success file printing --------------------- The command to print a JSON _SUCCESS file from this committer and any S3A committer is now something which can be invoked from the mapred command: mapred successfile <path to file> Contributed by Steve Loughran |
||
---|---|---|
.github | ||
.yetus | ||
dev-support | ||
hadoop-assemblies | ||
hadoop-build-tools | ||
hadoop-client-modules | ||
hadoop-cloud-storage-project | ||
hadoop-common-project | ||
hadoop-dist | ||
hadoop-hdfs-project | ||
hadoop-mapreduce-project | ||
hadoop-maven-plugins | ||
hadoop-minicluster | ||
hadoop-project | ||
hadoop-project-dist | ||
hadoop-tools | ||
hadoop-yarn-project | ||
licenses | ||
licenses-binary | ||
.asf.yaml | ||
.gitattributes | ||
.gitignore | ||
BUILDING.txt | ||
LICENSE-binary | ||
LICENSE.txt | ||
NOTICE-binary | ||
NOTICE.txt | ||
pom.xml | ||
README.txt | ||
start-build-env.sh |
For the latest information about Hadoop, please visit our website at: http://hadoop.apache.org/ and our wiki, at: https://cwiki.apache.org/confluence/display/HADOOP/