Improve task commit resilience everywhere
and add an option to reduce delete IO requests on
job cleanup (relevant for ABFS and HDFS).
Task Commit Resilience
----------------------
Task manifest saving is re-attempted on failure; the number of
attempts made is configurable with the option below (a configuration
sketch follows this list):
mapreduce.manifest.committer.manifest.save.attempts
* The default is 5.
* The minimum is 1; asking for less is ignored.
* A retry policy adds 500ms of sleep per attempt.
* Move from classic rename() to commitFile() to rename the file,
after calling getFileStatus() to get its length and possibly its etag.
On gcs/hdfs this still becomes a rename(), but on abfs it reaches
the ResilientCommitByRename callbacks, which report the outcome to
the caller; that outcome is then logged at WARN. A hedged sketch of
the underlying pattern follows this list.
* New statistic task_stage_save_summary_file to distinguish it from
other saving operations (job success/report file).
It is only saved to the manifest on task commit retries, and it
provides statistics on all previous unsuccessful attempts to save
the manifests.
+ test changes to match the codepath changes, including improvements
in fault injection.
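
As a minimal illustration (not the committer's own code), the retry limit
can be raised on a job's configuration with Hadoop's standard Configuration
API; the class name and the value 7 below are arbitrary examples:

  // Sketch only: raise the manifest save retry limit for a job.
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.mapreduce.Job;

  public class SaveAttemptsExample {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      // retry manifest saves up to 7 times instead of the default 5
      conf.setInt("mapreduce.manifest.committer.manifest.save.attempts", 7);
      Job job = Job.getInstance(conf, "manifest-committer-example");
      // ... set input/output formats, paths and the committer, then submit the job
    }
  }

The same option can equally be set in mapred-site.xml or passed with -D on
the command line.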
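The rename path in the commitFile() bullet can be pictured with plain
FileSystem calls. This is only a hedged sketch of the "getFileStatus()
then rename()" pattern; it is not the committer's actual code, and the
abfs ResilientCommitByRename integration is not reproduced here:

  // Hedged sketch of "probe for status, then rename"; not the committer's code.
  import java.io.IOException;
  import org.apache.hadoop.fs.EtagSource;
  import org.apache.hadoop.fs.FileStatus;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class CommitFileSketch {
    static void commitFile(FileSystem fs, Path source, Path dest) throws IOException {
      FileStatus status = fs.getFileStatus(source);   // length and, on some stores, etag
      String etag = status instanceof EtagSource
          ? ((EtagSource) status).getEtag()           // stores such as abfs expose etags
          : null;
      // on gcs/hdfs this is just the classic rename; the real committer routes
      // abfs through its resilient-commit callbacks and logs the reported outcome
      if (!fs.rename(source, dest)) {
        throw new IOException("Failed to rename " + source + " to " + dest
            + " (length=" + status.getLen() + ", etag=" + etag + ")");
      }
    }
  }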
Directory deletion on job cleanup
---------------------------------
New option:
mapreduce.manifest.committer.cleanup.parallel.delete.base.first
This makes an initial attempt to delete the base directory, falling back
to parallel deletes only if that delete times out.
This option is disabled by default; consider enabling it for abfs to
reduce IO load. Consult the documentation for more details.
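
A minimal sketch of opting in programmatically, using the option name above
and Hadoop's standard Configuration API; whether this helps is
store-dependent, so treat it as an illustration rather than a recommendation:

  // Sketch only: opt in to the "delete base directory first" cleanup strategy.
  import org.apache.hadoop.conf.Configuration;

  public class CleanupDeleteExample {
    public static void main(String[] args) {
      Configuration conf = new Configuration();
      conf.setBoolean(
          "mapreduce.manifest.committer.cleanup.parallel.delete.base.first", true);
      // pass this configuration to the job before submission
    }
  }

The same flag can also be set cluster-wide in mapred-site.xml.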
Success file printing
---------------------
The command to print the JSON _SUCCESS file written by this committer
or any S3A committer can now be invoked through the mapred command:
mapred successfile <path to file>
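For example, against a hypothetical ABFS job output (the path below is
illustrative only):
mapred successfile abfs://container@account.dfs.core.windows.net/output/_SUCCESS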
Contributed by Steve Loughran
This commit includes the following changes:
HADOOP-13356. Add a function to handle command_subcommand_OPTS
HADOOP-13355. Handle HADOOP_CLIENT_OPTS in a function
HADOOP-13554. Add an equivalent of hadoop_subcmd_opts for secure opts
HADOOP-13562. Change hadoop_subcommand_opts to use only uppercase
HADOOP-13358. Modify HDFS to use hadoop_subcommand_opts
HADOOP-13357. Modify common to use hadoop_subcommand_opts
HADOOP-13359. Modify YARN to use hadoop_subcommand_opts
HADOOP-13361. Modify hadoop_verify_user to be consistent with hadoop_subcommand_opts (ie more granularity)
HADOOP-13564. modify mapred to use hadoop_subcommand_opts
HADOOP-13563. hadoop_subcommand_opts should print name not actual content during debug
HADOOP-13360. Documentation for HADOOP_subcommand_OPTS
This closes apache/hadoop#126
This commit contains the following JIRA issues:
HADOOP-12931. bin/hadoop work for dynamic subcommands
HADOOP-12932. bin/yarn work for dynamic subcommands
HADOOP-12933. bin/hdfs work for dynamic subcommands
HADOOP-12934. bin/mapred work for dynamic subcommands
HADOOP-12935. API documentation for dynamic subcommands
HADOOP-12936. modify hadoop-tools to take advantage of dynamic subcommands
HADOOP-13086. enable daemonization of dynamic commands
HADOOP-13087. env var doc update for dynamic commands
HADOOP-13088. fix shellprofiles in hadoop-tools to allow replacement
HADOOP-13089. hadoop distcp adds client opts twice when dynamic
HADOOP-13094. hadoop-common unit tests for dynamic commands
HADOOP-13095. hadoop-hdfs unit tests for dynamic commands
HADOOP-13107. clean up how rumen is executed
HADOOP-13108. dynamic subcommands need a way to manipulate arguments
HADOOP-13110. add a streaming subcommand to mapred
HADOOP-13111. convert hadoop gridmix to be dynamic
HADOOP-13115. dynamic subcommand docs should talk about exit vs. continue program flow
HADOOP-13117. clarify daemonization and security vars for dynamic commands
HADOOP-13120. add a --debug message when dynamic commands have been used
HADOOP-13121. rename sub-project shellprofiles to match the rest of Hadoop
HADOOP-13129. fix typo in dynamic subcommand docs
HADOOP-13151. Underscores should be escaped in dynamic subcommands document
HADOOP-13153. fix typo in debug statement for dynamic subcommands