Commit Graph

69 Commits

Author SHA1 Message Date
Steve Loughran
c9270600b7
MAPREDUCE-7474. Improve Manifest committer resilience (#6716)
Improve task commit resilience everywhere
and add an option to reduce delete IO requests on
job cleanup (relevant for ABFS and HDFS).

Task Commit Resilience
----------------------

Task manifest saving is re-attempted on failure; the number of 
attempts made is configurable with the option:

  mapreduce.manifest.committer.manifest.save.attempts

* The default is 5.
* The minimum is 1; asking for less is ignored.
* A retry policy adds 500ms of sleep per attempt.
* Move from classic rename() to commitFile() to rename the file,
  after calling getFileStatus() to get its length and possibly etag.
  This becomes a rename() on gcs/hdfs anyway, but on abfs it does reach
  the ResilientCommitByRename callbacks in abfs, which report on
  the outcome to the caller...which is then logged at WARN.
* New statistic task_stage_save_summary_file to distinguish from
  other saving operations (job success/report file).
  This is only saved to the manifest on task commit retries, and
  provides statistics on all previous unsuccessful attempts to save
  the manifests
+ test changes to match the codepath changes, including improvements
  in fault injection.

Directory size for deletion
---------------------------

New option

  mapreduce.manifest.committer.cleanup.parallel.delete.base.first

This attempts an initial attempt at deleting the base dir, only falling
back to parallel deletes if there's a timeout.

This option is disabled by default; Consider enabling it for abfs to
reduce IO load. Consult the documentation for more details.

Success file printing
---------------------

The command to print a JSON _SUCCESS file from this committer and
any S3A committer is now something which can be invoked from
the mapred command:

  mapred successfile <path to file>

Contributed by Steve Loughran
2024-05-13 21:12:34 +01:00
PJ Fanning
06db6289cb
HADOOP-19024. Use bouncycastle jdk18 1.77 (#6410). Contributed 2024-03-30 19:58:12 +05:30
Steve Loughran
095dfcca30
HADOOP-18088. Replace log4j 1.x with reload4j. (#4052)
Co-authored-by: Wei-Chiu Chuang <weichiu@apache.org>


Includes HADOOP-18354. Upgrade reload4j to 1.22.2 due to XXE vulnerability (#4607). 

Log4j 1.2.17 has been replaced by reloadj 1.22.2
SLF4J is at 1.7.36
2024-02-13 16:33:51 +00:00
slfan1989
8444f69511
Preparing for 3.5.0 development (#6411)
Co-authored-by: slfan1989 <slfan1989@apache.org>
2024-01-19 15:05:22 +08:00
Ayush Saxena
1d0c9ab433
Revert "HADOOP-18207. Introduce hadoop-logging module (#5503)"
This reverts commit 03a499821c.
2023-06-05 09:34:40 +05:30
Viraj Jasani
03a499821c
HADOOP-18207. Introduce hadoop-logging module (#5503)
Reviewed-by: Duo Zhang <zhangduo@apache.org>
2023-06-02 18:07:34 -07:00
Steve Loughran
dcd9dc6983
HADOOP-18641. Cloud connector dependency and LICENSE fixup. (#5429)
POM and LICENSE fixup of transient dependencies
* Exclude hadoop-cloud-storage imports which come in with hadoop-common
* Add explicit import of hadoop's org.codehaus.jettison declaration
  to hadoop-aliyun
* Tune aliyun jars imports
* Update LICENSE-binary for the current set of libraries.

Contributed by Steve Loughran
2023-02-28 10:48:54 +00:00
Sumangala Patki
7bcf853ff4
HADOOP-17873. ABFS: Fix transient failures in ITestAbfsStreamStatistics and ITestAbfsRestOperationException (#3699)
Successor for the reverted PR #3341, using the hadoop @VisibleForTesting attribute

Contributed by Sumangala Patki
2022-09-06 11:00:52 +01:00
Steve Loughran
8294bd5a37
HADOOP-18163. hadoop-azure support for the Manifest Committer of MAPREDUCE-7341
Follow-on patch to MAPREDUCE-7341, adding ABFS support and tests

* resilient rename
* tests for job commit through the manifest committer.

contains
- HADOOP-17976. ABFS etag extraction inconsistent between LIST and HEAD calls
- HADOOP-16204. ABFS tests to include terasort

Contributed by Steve Loughran.

Change-Id: I0a7d4043bdf19bcb00c033fc389730109b93b77f
2022-03-17 11:24:51 +00:00
Viraj Jasani
04b6b9a87b
HADOOP-16908. Prune Jackson 1 from the codebase and restrict it's usage for future (#3789)
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
2021-12-20 16:01:34 +09:00
Steve Loughran
45f164a854 Revert "HADOOP-17873. ABFS: Fix transient failures in ITestAbfsStreamStatistics and ITestAbfsRestOperationException (#3341)"
This reverts commit 82658a22d6.
2021-11-05 14:21:15 +00:00
sumangala-patki
82658a22d6
HADOOP-17873. ABFS: Fix transient failures in ITestAbfsStreamStatistics and ITestAbfsRestOperationException (#3341)
Addresses transient failures in the following test classes:

* ITestAbfsStreamStatistics: Uses a filesystem level static instance to record read/write statistics, which also tracks these operations in other tests running in parallel. Marked for sequential-only run to avoid transient failure

* ITestAbfsRestOperationException: The use of a static member to track retry count causes transient failures when two tests of this class happen to run together. Switch to non-static variable for assertions on retry count

closes #3341

Contributed by Sumangala Patki
2021-11-04 14:40:37 +00:00
Viraj Jasani
516f36c6f1
HADOOP-17967. Keep restrict-imports-enforcer-rule for Guava VisibleForTesting in hadoop-main pom (#3555) 2021-10-21 16:54:25 +09:00
Viraj Jasani
79e5a7f3e3
HADOOP-17962. Replace Guava VisibleForTesting by Hadoop's own annotation in hadoop-tools modules (#3540) 2021-10-14 17:43:32 +09:00
Mukund Thakur
9b8f81a179
HADOOP-17156. ABFS: Release the byte buffers held by input streams in close() (#3285)
Contributed By: Mukund Thakur
2021-09-07 15:13:36 +05:30
Viraj Jasani
4ef27a596f
HADOOP-17753. Keep restrict-imports-enforcer-rule for Guava Lists in top level hadoop-main pom (#3087) 2021-06-11 12:15:52 +09:00
Viraj Jasani
f4b24c68e7
HADOOP-17743. Replace Guava Lists usage by Hadoop's own Lists in hadoop-common, hadoop-tools and cloud-storage projects (#3072) 2021-06-07 13:24:09 +09:00
Akira Ajisaka
23b343aed1
HADOOP-16870. Use spotbugs-maven-plugin instead of findbugs-maven-plugin (#2753)
Removed findbugs from the hadoop build images and added spotbugs instead.
Upgraded SpotBugs to 4.2.2 and spotbugs-maven-plugin to 4.2.0.

Reviewed-by: Masatake Iwasaki <iwasakims@apache.org>
2021-03-11 10:56:07 +09:00
Akira Ajisaka
9a298d180d
Revert "HADOOP-16870. Use spotbugs-maven-plugin instead of findbugs-maven-plugin (#2454)"
This reverts commit 4cf3531583.
2021-02-19 11:09:10 +09:00
Akira Ajisaka
4cf3531583
HADOOP-16870. Use spotbugs-maven-plugin instead of findbugs-maven-plugin (#2454)
Use spotbugs instead of findbugs. Removed findbugs from the hadoop build images,
and added spotbugs in the images instead.

Reviewed-by: Masatake Iwasaki <iwasakims@apache.org>
Reviewed-by: Inigo Goiri <inigoiri@apache.org>
Reviewed-by: Dinesh Chitlangia <dineshc@apache.org>
2021-02-17 10:38:20 +09:00
Sneha Vijayarajan
b612c310c2
HADOOP-17404. ABFS: Small write - Merge append and flush
- Contributed by Sneha Vijayarajan
2021-01-06 10:43:37 -08:00
Ayush Saxena
1e3a6efcef
HADOOP-17288. Use shaded guava from thirdparty. (#2342). Contributed by Ayush Saxena. 2020-10-17 12:01:18 +05:30
Steve Loughran
9f407bcc88
HADOOP-17107. hadoop-azure parallel tests not working on recent JDKs (#2118)
Contributed by Steve Loughran.
2020-07-20 10:51:26 +01:00
bilaharith
0ad0102678
HADOOP-16855. Changing wildfly dependency scope in hadoop-azure to compile
Contributed by Biliharith
2020-04-14 19:17:12 +01:00
Brahma Reddy Battula
8914cf9167 Preparing for 3.4.0 development 2020-03-29 23:24:25 +05:30
bilaharith
0b931f36ec
Hadoop 16890. Change in expiry calculation for MSI token provider.
Contributed by Bilahari T H
2020-03-11 20:39:10 +00:00
Sahil Takiar
f206b736f0
HADOOP-16346. Stabilize S3A OpenSSL support.
Introduces `openssl` as an option for `fs.s3a.ssl.channel.mode`.
The new option is documented and marked as experimental.

For details on how to use this, consult the peformance document
in the s3a documentation.

This patch is the successor to HADOOP-16050 "S3A SSL connections
should use OpenSSL" -which was reverted because of
incompatibilities between the wildfly OpenSSL client and the AWS
HTTPS servers (HADOOP-16347). With the Wildfly release moved up
to 1.0.7.Final (HADOOP-16405) everything should now work.

Related issues:

* HADOOP-15669. ABFS: Improve HTTPS Performance
* HADOOP-16050: S3A SSL connections should use OpenSSL
* HADOOP-16371: Option to disable GCM for SSL connections when running on Java 8
* HADOOP-16405: Upgrade Wildfly Openssl version to 1.0.7.Final

Contributed by Sahil Takiar

Change-Id: I80a4bc5051519f186b7383b2c1cea140be42444e
2020-01-21 16:37:51 +00:00
Jeetesh Mangwani
b033c681e4
HADOOP-16612. Track Azure Blob File System client-perceived latency
Contributed by Jeetesh Mangwani.

This add the ability to track the end-to-end performance of ADLS Gen 2 REST APIs by measuring latency in the Hadoop ABFS driver.
The latency information is sent back to the ADLS Gen 2 REST API endpoints in the subsequent requests.
2019-11-19 09:00:24 -08:00
Steve Loughran
309501c6fa
Revert "HADOOP-16050: s3a SSL connections should use OpenSSL"
This reverts commit b067f8acaa.

Change-Id: I584b050a56c0e6f70b11fa3f7db00d5ac46e7dd8
2019-06-05 13:54:55 +01:00
Akira Ajisaka
afd844059c HADOOP-16331. Fix ASF License check in pom.xml
Signed-off-by: Takanobu Asanuma <tasanuma@apache.org>
2019-05-29 17:25:13 +09:00
Akira Ajisaka
9f933e6446
HADOOP-16323. https everywhere in Maven settings. 2019-05-27 15:24:59 +09:00
Sahil Takiar
b067f8acaa HADOOP-16050: s3a SSL connections should use OpenSSL
(cherry picked from commit aebf229c175dfa19fff3b31e9e67596f6c6124fa)
2019-05-16 08:57:54 -06:00
Da Zhou
05df151d09
HADOOP-16163. NPE in setup/teardown of ITestAbfsDelegationTokens.
Contributed by Da Zhou.

Signed-off-by: Steve Loughran <stevel@apache.org>
2019-03-05 14:02:34 +00:00
Steve Loughran
65f60e56b0
HADOOP-16068. ABFS Authentication and Delegation Token plugins to optionally be bound to specific URI of the store.
Contributed by Steve Loughran.
2019-02-28 14:22:32 +00:00
Akira Ajisaka
1129288cf5
HADOOP-14178. Move Mockito up to version 2.23.4. Contributed by Akira Ajisaka and Masatake Iwasaki. 2019-01-29 18:29:56 -08:00
Sunil G
58fa96b697 Changed version in trunk to 3.3.0-SNAPSHOT. 2018-10-02 22:41:41 +05:30
Steve Loughran
a4abf02028
HADOOP-15739. ABFS: remove unused maven dependencies and add used undeclared dependencies.
Contributed by Da Zhou.
2018-09-25 20:58:32 +01:00
Steve Loughran
51d368982b
HADOOP-15714. Tune abfs/wasb parallel and sequential test execution.
Contributed by Da Zhou.
2018-09-18 12:09:25 +01:00
Thomas Marquardt
4410eacba7 HADOOP-15664. ABFS: Reduce test run time via parallelization and grouping.
Contributed by Da Zhou.
2018-09-17 19:54:01 +00:00
Thomas Marquardt
d6a4f39bd5 HADOOP-15669. ABFS: Improve HTTPS Performance.
Contributed by Vishwajeet Dusane.
2018-09-17 19:54:01 +00:00
Thomas Marquardt
b54b0c1b67 HADOOP-15659. Code changes for bug fix and new tests.
Contributed by Da Zhou.
2018-09-17 19:54:01 +00:00
Steve Loughran
a271fd0eca HADOOP-15560. ABFS: removed dependency injection and unnecessary dependencies.
Contributed by Da Zhou.
2018-09-17 19:54:01 +00:00
Steve Loughran
f044deedbb HADOOP-15407. HADOOP-15540. Support Windows Azure Storage - Blob file system "ABFS" in Hadoop: Core Commit.
Contributed by Shane Mainali, Thomas Marquardt, Zichen Sun, Georgi Chalakov, Esfandiar Manii, Amit Singh, Dana Kaban, Da Zhou, Junhua Gu, Saher Ahwal, Saurabh Pant, James Baker, Shaoyu Zhang, Lawrence Chen, Kevin Chen and Steve Loughran
2018-09-17 19:54:01 +00:00
Steve Loughran
45d9568aaa
HADOOP-15547/ WASB: improve listStatus performance.
Contributed by Thomas Marquardt.

(cherry picked from commit 749fff577ed9afb4ef8a54b8948f74be083cc620)
2018-07-19 12:31:19 -07:00
Steve Loughran
ba051b0686
HADOOP-15354. hadoop-aliyun & hadoop-azure modules to mark hadoop-common as provided
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
2018-05-10 18:38:08 +09:00
Wangda Tan
60f9e60b3b Preparing for 3.2.0 development
Change-Id: I6d0e01f3d665d26573ef2b957add1cf0cddf7938
2018-02-11 11:17:38 +08:00
Akira Ajisaka
6903cf096e HADOOP-13514. Upgrade maven surefire plugin to 2.20.1
Signed-off-by: Allen Wittenauer <aw@apache.org>
2017-11-19 12:39:37 -08:00
Steve Loughran
2d2d97fa7d
HADOOP-14553. Add (parallelized) integration tests to hadoop-azure
Contributed by Steve Loughran
2017-09-15 17:03:01 +01:00
Andrew Wang
0d419c984f Preparing for 3.1.0 development 2017-09-01 11:53:48 -07:00
Andrew Wang
af2773f609 Updating version for 3.0.0-beta1 development 2017-06-29 17:57:40 -07:00