Steve Loughran 81edbebdd8
HADOOP-18889. S3A v2 SDK third party support (#6141)
Tune AWS v2 SDK changes based on testing with third party stores
including GCS. 

Contains HADOOP-18889. S3A v2 SDK error translations and troubleshooting docs

* Changes needed to work with multiple third party stores
* New third_party_stores document on how to bind to and test
  third party stores, including Google GCS (which works!)
* Troubleshooting docs mostly updated for v2 SDK

Exception translation/resilience

* New AWSUnsupportedFeatureException for unsupported/unavailable errors
* Handle 501 method unimplemented as one of these
* Error codes > 500 mapped to the AWSStatus500Exception if no explicit
  handler.
* Precondition errors handled a bit better
* GCS throttle exception also recognized.
* GCS raises 404 on a delete of a file which doesn't exist: swallow it.
* Error translation uses reflection to create IOE of the right type.
  All IOEs at the bottom of an AWS stack chain are regenerated:
  a new exception of that specific type is created, with the top level
  exception as its cause. This is done to retain the whole stack chain.
* Reduce the number of retries within the AWS SDK
* And those of s3a code.
* S3ARetryPolicy explicitly declares SocketException as a connectivity failure
  but subclasses BindException
* SocketTimeoutException also considered connectivity
* Log at debug whenever retry policies are looked up
* Reorder exceptions to alphabetical order, with commentary
* Review use of the Invoker.retry() method
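
The reflection-based error translation described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in the commit: the class name WrapInnerIOESketch and both method names are hypothetical, and it assumes the innermost exception class has a public constructor taking a single String.

```java
import java.io.IOException;
import java.lang.reflect.Constructor;
import java.net.UnknownHostException;

// Hypothetical sketch; names are illustrative, not the S3A implementation.
final class WrapInnerIOESketch {

  /** Walk the cause chain down to the innermost throwable. */
  static Throwable innermost(Throwable t) {
    Throwable current = t;
    while (current.getCause() != null && current.getCause() != current) {
      current = current.getCause();
    }
    return current;
  }

  /**
   * If the innermost cause is an IOException, build a new exception of the
   * same class and attach the top-level exception as its cause, so the
   * whole chain stays visible in stack traces.
   * Returns null when the innermost cause is not an IOException.
   */
  static IOException wrapWithInnerIOE(String operation, Throwable outer) {
    Throwable inner = innermost(outer);
    if (!(inner instanceof IOException)) {
      return null;
    }
    String message = operation + ": " + inner.getMessage();
    try {
      // Assumption: the exception class offers a public (String) constructor.
      Constructor<? extends Throwable> ctor =
          inner.getClass().getConstructor(String.class);
      IOException ioe = (IOException) ctor.newInstance(message);
      ioe.initCause(outer);
      return ioe;
    } catch (ReflectiveOperationException | RuntimeException e) {
      // Fall back to a plain IOException if reflection does not work out.
      return new IOException(message, outer);
    }
  }

  public static void main(String[] args) {
    Exception sdkFailure = new RuntimeException("SDK call failed",
        new UnknownHostException("bucket.example.invalid"));
    IOException translated = wrapWithInnerIOE("createBucket", sdkFailure);
    System.out.println(translated.getClass().getName());
    System.out.println(translated.getCause() == sdkFailure);
  }
}
```

Creating a fresh exception rather than rethrowing the inner one means callers can catch the specific IOException subclass while the full SDK stack chain remains reachable through getCause().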

 The reduction in retries is because it's clear, when you try to create a bucket
 which doesn't resolve, that the time for even an UnknownHostException to
 eventually fail is over 90s, and the failure then hits the s3a retry code.
 - Reducing the SDK retries means these escalate to our code better.
 - Cutting back on our own retries makes it a bit more responsive for most real
   deployments.
 - maybeTranslateNetworkException() and the s3a retry policy mean that an
   unknown host exception is recognised and fails fast.
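
 As a rough illustration of the fail-fast behaviour described above, a retry decision keyed on exception class might look like the sketch below. Everything here is an assumption made for illustration: the class RetryDecisionSketch, the Decision enum, the exact-class lookup, and the default of failing on unlisted exceptions are hypothetical and do not reproduce S3ARetryPolicy.

```java
import java.net.SocketException;
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a class-keyed retry decision; not S3ARetryPolicy.
final class RetryDecisionSketch {

  enum Decision { FAIL, RETRY }

  private static final Map<Class<? extends Exception>, Decision> POLICIES =
      new HashMap<>();

  static {
    // An unresolvable host is not going to recover: fail fast.
    POLICIES.put(UnknownHostException.class, Decision.FAIL);
    // Connectivity problems are worth retrying.
    POLICIES.put(SocketException.class, Decision.RETRY);
    POLICIES.put(SocketTimeoutException.class, Decision.RETRY);
  }

  /**
   * Decide whether to retry after a failure.
   * Assumption: lookup is by exact class, unlisted exceptions fail,
   * and retries stop once the attempt count reaches maxAttempts.
   */
  static Decision decide(Exception e, int attempt, int maxAttempts) {
    Decision decision = POLICIES.getOrDefault(e.getClass(), Decision.FAIL);
    if (decision == Decision.RETRY && attempt >= maxAttempts) {
      return Decision.FAIL;
    }
    return decision;
  }

  public static void main(String[] args) {
    System.out.println(decide(new UnknownHostException("no-such-host"), 0, 3));
    System.out.println(decide(new SocketTimeoutException("timed out"), 0, 3));
  }
}
```

 With a small maxAttempts, a connectivity failure is retried a few times and then surfaced, while an unknown host is reported on the first attempt instead of after a long chain of retries.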

Contributed by Steve Loughran
2023-10-12 17:47:44 +01:00
.github HADOOP-18823. Add Labeler Github Action. (#5874). Contributed by Ayush Saxena. 2023-07-25 03:04:49 +05:30
.yetus Add .yetus/excludes.txt (#4984) 2022-10-11 09:23:34 -07:00
dev-support HADOOP-18789. Remove ozone from hadoop dev support. (#5800). Contributed by Xiaoqiao He. 2023-07-02 15:23:32 +08:00
hadoop-assemblies HDFS-15346. FedBalance tool implementation. Contributed by Jinglun. 2020-06-18 13:33:25 +08:00
hadoop-build-tools HADOOP-17968 Migrate checkstyle module illegalimport to maven enforcer banned-illegal-imports (#3584) 2021-10-28 15:57:15 +09:00
hadoop-client-modules HADOOP-18929. Exclude commons-compress module-info.class (#6170) 2023-10-11 12:50:37 -05:00
hadoop-cloud-storage-project HADOOP-18890. Remove use of okhttp in runtime code (#6057) 2023-09-19 12:38:36 +01:00
hadoop-common-project HADOOP-18889. S3A v2 SDK third party support (#6141) 2023-10-12 17:47:44 +01:00
hadoop-dist HADOOP-18718. Fix several maven build warnings (#5592). Contributed by Dongjoon Hyun. 2023-06-11 11:38:13 +05:30
hadoop-hdfs-project HDFS-17208. Add the metrics PendingAsyncDiskOperations in datanode (#6109). Contributed by Haiyang Hu. 2023-10-12 23:27:15 +08:00
hadoop-mapreduce-project MAPREDUCE-7453. Revert HADOOP-18649. (#6102). Contributed by zhengchenyu. 2023-10-01 17:25:32 +05:30
hadoop-maven-plugins HADOOP-18441. Remove hadoop custom ServicesResourceTransformer (#4850). Contributed by PJ Fanning. 2022-09-07 17:11:12 +05:30
hadoop-minicluster HADOOP-18131. Upgrade maven enforcer plugin and relevant dependencies (#4000) 2022-03-08 17:27:04 +09:00
hadoop-project HADOOP-18917. Upgrade to commons-io 2.14.0 (#6133). Contributed by PJ Fanning 2023-10-06 01:58:21 +05:30
hadoop-project-dist HADOOP-18751. Fix incorrect output path in javadoc build phase (#5688) 2023-06-26 15:52:17 -07:00
hadoop-tools HADOOP-18889. S3A v2 SDK third party support (#6141) 2023-10-12 17:47:44 +01:00
hadoop-yarn-project YARN-11588. [Federation] Fix uncleaned threads in yarn router thread pool executor (#6159) Contributed by Jeffrey Chang. 2023-10-12 19:13:44 +08:00
licenses HADOOP-17144. Update Hadoop's lz4 to v1.9.2. Contributed by Hemanth Boyina. 2020-10-18 18:37:46 +05:30
licenses-binary HADOOP-15993. Upgrade Kafka to 2.4.0 in hadoop-kafka module. (#1796) 2020-01-09 16:24:58 +09:00
.asf.yaml HADOOP-18630. Add gh-pages in asf.yaml to deploy the current trunk doc (#5393). Contributed by Simhadri Govindappa. 2023-02-14 18:13:29 +05:30
.gitattributes HADOOP-13598. Add eol=lf for unix format files in .gitattributes. Contributed by Yiqun Lin. 2016-09-14 11:14:31 +09:00
.gitignore HADOOP-18774. Add .vscode to gitignore. (#5756). Contributed by Xiaoqiao He. 2023-06-18 14:08:38 +08:00
BUILDING.txt HADOOP-18506. Update build instructions for Windows using VS2019 (#5066) 2022-10-24 09:28:29 -07:00
LICENSE-binary HADOOP-18917. Upgrade to commons-io 2.14.0 (#6133). Contributed by PJ Fanning 2023-10-06 01:58:21 +05:30
LICENSE.txt YARN-11356. Upgrade DataTables to 1.11.5 to fix CVEs. Contributed by Bence Kosztolnik. 2022-10-26 22:29:01 +02:00
NOTICE-binary HADOOP-18890. Remove use of okhttp in runtime code (#6057) 2023-09-19 12:38:36 +01:00
NOTICE.txt HADOOP-15958. Revisiting LICENSE and NOTICE files. 2019-08-27 13:47:12 +09:00
pom.xml HADOOP-18923. Switch to SPDX identifier for license name (#6149). Contributed by Colm O hEigeartaigh. 2023-10-07 22:50:38 +05:30
README.txt HADOOP-15958. Revisiting LICENSE and NOTICE files. 2019-08-27 13:47:12 +09:00
start-build-env.sh HADOOP-18052. Support Apple Silicon in start-build-env.sh (#3817) 2021-12-23 18:13:18 +09:00

For the latest information about Hadoop, please visit our website at:

   http://hadoop.apache.org/

and our wiki, at:

   https://cwiki.apache.org/confluence/display/HADOOP/