Improve task commit resilience everywhere
and add an option to reduce delete IO requests on
job cleanup (relevant for ABFS and HDFS).
Task Commit Resilience
----------------------
Task manifest saving is re-attempted on failure; the number of
attempts made is configurable with the option:
mapreduce.manifest.committer.manifest.save.attempts
* The default is 5.
* The minimum is 1; asking for less is ignored.
* A retry policy adds 500ms of sleep per attempt.
* Move from classic rename() to commitFile() to rename the file,
after calling getFileStatus() to get its length and possibly etag.
This is still a rename() on gcs/hdfs, but on abfs it reaches
the ResilientCommitByRename callbacks, which report
the outcome to the caller, where it is logged at WARN.
* New statistic task_stage_save_summary_file to distinguish from
other saving operations (job success/report file).
This is only saved to the manifest on task commit retries, and
provides statistics on all previous unsuccessful attempts to save
the manifests.
+ test changes to match the codepath changes, including improvements
in fault injection.
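
A minimal sketch of the retry behaviour described above, not the
committer's actual code. It assumes that a value below 1 is clamped
to 1 and that the sleep grows by 500ms with each failed attempt; the
class, interface and method names are hypothetical.

```java
import java.io.IOException;
import java.util.function.LongConsumer;

/** Illustrative retry loop for saving a task manifest. */
class ManifestSaveRetry {
    /** Documented default of the save.attempts option. */
    static final int DEFAULT_ATTEMPTS = 5;
    /** Sleep added per attempt, as stated in the commit message. */
    static final long SLEEP_PER_ATTEMPT_MS = 500;

    /** Hypothetical stand-in for the manifest save operation. */
    interface SaveOperation {
        void save() throws IOException;
    }

    /** Assumption: requests for fewer than one attempt become 1. */
    static int effectiveAttempts(int configured) {
        return Math.max(1, configured);
    }

    /**
     * Run the save, retrying on IOException.
     * @return the number of the attempt that succeeded.
     */
    static int saveWithRetries(SaveOperation op, int configuredAttempts,
                               LongConsumer sleeper) throws IOException {
        int attempts = effectiveAttempts(configuredAttempts);
        IOException last = null;
        for (int attempt = 1; attempt <= attempts; attempt++) {
            try {
                op.save();
                return attempt;
            } catch (IOException e) {
                last = e;
                if (attempt < attempts) {
                    // assumed proportional sleep: 500ms, 1000ms, ...
                    sleeper.accept(attempt * SLEEP_PER_ATTEMPT_MS);
                }
            }
        }
        // attempts >= 1, so at least one failure was recorded
        throw last;
    }
}
```

The sleeper is passed in so that the backoff can be observed in a
test without actually waiting.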
Directory size for deletion
---------------------------
New option
mapreduce.manifest.committer.cleanup.parallel.delete.base.first
This makes an initial attempt at deleting the base dir, only falling
back to parallel deletes if there's a timeout.
This option is disabled by default; consider enabling it for abfs to
reduce IO load. Consult the documentation for more details.
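
A rough sketch of the base-first cleanup strategy, under the
assumption that the fallback deletes the child directories in
parallel and then deletes the base dir again. Deleter and
TimeoutFailure are hypothetical stand-ins, not Hadoop classes.

```java
import java.util.List;

/** Illustrative cleanup: one base delete first, parallel deletes on timeout. */
class BaseFirstCleanup {
    /** Hypothetical marker for a store-side timeout. */
    static class TimeoutFailure extends RuntimeException {
        TimeoutFailure(String message) {
            super(message);
        }
    }

    /** Hypothetical stand-in for a recursive filesystem delete. */
    interface Deleter {
        void delete(String path);
    }

    /**
     * @return true if the single delete of the base dir was enough,
     *         false if the parallel path was taken.
     */
    static boolean cleanup(String baseDir, List<String> childDirs,
                           Deleter deleter, boolean baseFirst) {
        if (baseFirst) {
            try {
                deleter.delete(baseDir);
                return true; // one request, no per-directory IO
            } catch (TimeoutFailure e) {
                // fall through to the parallel deletes
            }
        }
        // delete the children concurrently, then the now-small base dir
        childDirs.parallelStream().forEach(deleter::delete);
        deleter.delete(baseDir);
        return false;
    }
}
```

When the first delete succeeds, only one delete request is issued,
which is where the reduction in IO load would come from.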
Success file printing
---------------------
The command to print a JSON _SUCCESS file from this committer and
any S3A committer can now be invoked from
the mapred command:
mapred successfile <path to file>
Contributed by Steve Loughran
Co-authored-by: Wei-Chiu Chuang <weichiu@apache.org>
Includes HADOOP-18354. Upgrade reload4j to 1.2.22 due to XXE vulnerability (#4607).
Log4j 1.2.17 has been replaced by reload4j 1.2.22
SLF4J is at 1.7.36
POM and LICENSE fixup of transitive dependencies
* Exclude hadoop-cloud-storage imports which come in with hadoop-common
* Add explicit import of hadoop's org.codehaus.jettison declaration
to hadoop-aliyun
* Tune aliyun jars imports
* Update LICENSE-binary for the current set of libraries.
Contributed by Steve Loughran
Follow-on patch to MAPREDUCE-7341, adding ABFS support and tests
* resilient rename
* tests for job commit through the manifest committer.
contains
- HADOOP-17976. ABFS etag extraction inconsistent between LIST and HEAD calls
- HADOOP-16204. ABFS tests to include terasort
Contributed by Steve Loughran.
Change-Id: I0a7d4043bdf19bcb00c033fc389730109b93b77f
Addresses transient failures in the following test classes:
* ITestAbfsStreamStatistics: Uses a filesystem-level static instance to record read/write statistics, which also picks up operations from other tests running in parallel. Marked for sequential-only runs to avoid transient failures.
* ITestAbfsRestOperationException: The use of a static member to track retry count causes transient failures when two tests of this class happen to run together. Switched to a non-static variable for assertions on retry count.
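
The static-member problem can be illustrated with a small sketch.
The class names are hypothetical and are not taken from the test
code; the sketch only shows why a static counter leaks state between
test instances while an instance field does not.

```java
/** Illustrative counters: shared static state vs per-instance state. */
class RetryCounters {
    /** Every instance increments the same field. */
    static class SharedCounter {
        static int retries = 0;

        void retry() {
            retries++;
        }

        int count() {
            return retries;
        }
    }

    /** Each instance sees only its own retries. */
    static class PerInstanceCounter {
        private int retries = 0;

        void retry() {
            retries++;
        }

        int count() {
            return retries;
        }
    }
}
```

With the shared variant, an assertion on the retry count in one test
also observes retries made by another test running at the same time.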
Closes #3341
Contributed by Sumangala Patki
Removed findbugs from the hadoop build images and added spotbugs instead.
Upgraded SpotBugs to 4.2.2 and spotbugs-maven-plugin to 4.2.0.
Reviewed-by: Masatake Iwasaki <iwasakims@apache.org>
Use spotbugs instead of findbugs. Removed findbugs from the hadoop build images,
and added spotbugs in the images instead.
Reviewed-by: Masatake Iwasaki <iwasakims@apache.org>
Reviewed-by: Inigo Goiri <inigoiri@apache.org>
Reviewed-by: Dinesh Chitlangia <dineshc@apache.org>
Introduces `openssl` as an option for `fs.s3a.ssl.channel.mode`.
The new option is documented and marked as experimental.
For details on how to use this, consult the performance document
in the s3a documentation.
This patch is the successor to HADOOP-16050 "S3A SSL connections
should use OpenSSL", which was reverted because of
incompatibilities between the wildfly OpenSSL client and the AWS
HTTPS servers (HADOOP-16347). With the Wildfly release moved up
to 1.0.7.Final (HADOOP-16405) everything should now work.
Related issues:
* HADOOP-15669: ABFS: Improve HTTPS Performance
* HADOOP-16050: S3A SSL connections should use OpenSSL
* HADOOP-16371: Option to disable GCM for SSL connections when running on Java 8
* HADOOP-16405: Upgrade Wildfly Openssl version to 1.0.7.Final
Contributed by Sahil Takiar
Change-Id: I80a4bc5051519f186b7383b2c1cea140be42444e
Contributed by Jeetesh Mangwani.
This adds the ability to track the end-to-end performance of ADLS Gen 2 REST APIs by measuring latency in the Hadoop ABFS driver.
The latency information is sent back to the ADLS Gen 2 REST API endpoints in the subsequent requests.
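
A minimal sketch of the idea, assuming the latency of one call is
recorded and attached as a header to the next request. The class,
the header name and the record format are hypothetical and chosen
only for illustration; they are not the ABFS driver's own.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Illustrative tracker: report the previous call's latency on the next call. */
class LatencyReporter {
    /** Hypothetical header name, for illustration only. */
    static final String LATENCY_HEADER = "x-example-client-latency";

    /** Latency record of the previous call, or null before the first call. */
    private String pending;

    /**
     * Build the headers for a request, attach any pending latency record,
     * run the request and remember how long it took.
     */
    <T> T execute(String opName, Function<Map<String, String>, T> request) {
        Map<String, String> headers = new LinkedHashMap<>();
        if (pending != null) {
            headers.put(LATENCY_HEADER, pending);
        }
        long start = System.nanoTime();
        try {
            return request.apply(headers);
        } finally {
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            // recorded even if the request threw, so failures are reported too
            pending = "op=" + opName + " ms=" + elapsedMs;
        }
    }
}
```

The first request carries no latency header; every later request
carries the measurement of the call before it. This sketch is not
thread-safe, which a real driver would have to handle.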
Contributed by Shane Mainali, Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar Manii, Amit Singh, Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, James Baker, Shaoyu Zhang, Lawrence Chen, Kevin Chen and Steve Loughran