hadoop/hadoop-tools
Steve Loughran 81edbebdd8
HADOOP-18889. S3A v2 SDK third party support (#6141)
Tune the AWS v2 SDK changes based on testing with third party stores,
including GCS.

Contains HADOOP-18889. S3A v2 SDK error translations and troubleshooting docs

* Changes needed to work with multiple third party stores
* New third_party_stores document on how to bind to and test
  third party stores, including Google GCS (which works!)
* Troubleshooting docs mostly updated for v2 SDK

Exception translation/resilience

* New AWSUnsupportedFeatureException for unsupported/unavailable errors
* Handle 501 (method not implemented) as one of these
* Error codes > 500 mapped to the AWSStatus500Exception if no explicit
  handler.
* Precondition errors handled a bit better
* GCS throttle exception also recognized.
* GCS raises 404 on a delete of a file which doesn't exist: swallow it.
* Error translation uses reflection to create an IOE of the right type.
  Any IOE at the bottom of an AWS stack chain is regenerated: a new
  exception of that specific type is created, with the top-level exception
  as its cause. This is done to retain the whole stack chain.
* Reduce the number of retries within the AWS SDK and in the s3a code.
* S3ARetryPolicy explicitly declares SocketException as a connectivity failure,
  but not its subclass BindException
* SocketTimeoutException is also considered a connectivity failure
* Log at debug whenever retry policies are looked up
* Reorder exceptions to alphabetical order, with commentary
* Review use of the Invoker.retry() method
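The translation rules above can be sketched in plain Java. This is a minimal, hypothetical illustration, not the S3A code: `TranslationSketch`, `UnsupportedFeatureException` and `Status500Exception` are invented stand-ins (the classes named in the change list are `AWSUnsupportedFeatureException` and `AWSStatus500Exception`), only a few status codes are modelled, and a `RuntimeException` plays the role of the AWS SDK exception. It assumes the innermost `IOException` class has a public `(String)` constructor and falls back to a plain `IOException` otherwise.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.lang.reflect.Constructor;
import java.net.UnknownHostException;

/** Hypothetical sketch of S3A-style exception translation; not the S3A code. */
final class TranslationSketch {

  /** Hypothetical stand-in for AWSUnsupportedFeatureException. */
  static class UnsupportedFeatureException extends IOException {
    public UnsupportedFeatureException(String message) { super(message); }
  }

  /** Hypothetical stand-in for AWSStatus500Exception. */
  static class Status500Exception extends IOException {
    public Status500Exception(String message) { super(message); }
  }

  /** Map an HTTP status code to an IOException; only a few codes are modelled. */
  static IOException fromStatus(int status, String message) {
    if (status == 404) {
      return new FileNotFoundException(message);
    }
    if (status == 501) {
      // 501 "not implemented": the store does not support this feature
      return new UnsupportedFeatureException(message);
    }
    if (status >= 500) {
      // any other 5xx code without an explicit handler
      return new Status500Exception(message);
    }
    return new IOException(message);
  }

  /** Return the innermost IOException in the cause chain, or null if none. */
  static IOException innermostIOE(Throwable top) {
    IOException found = null;
    int depth = 0;
    // bounded walk, in case of a pathological cycle in the cause chain
    for (Throwable t = top; t != null && depth < 64; t = t.getCause(), depth++) {
      if (t instanceof IOException) {
        found = (IOException) t;
      }
    }
    return found;
  }

  /**
   * Create a new exception of the same class as the innermost IOException,
   * with the top-level exception as its cause, so the whole chain is kept.
   * Falls back to a plain IOException when reflection cannot do this.
   */
  static IOException rewrap(String operation, Throwable top) {
    IOException inner = innermostIOE(top);
    if (inner == null) {
      return new IOException(operation + ": " + top, top);
    }
    try {
      Constructor<? extends IOException> ctor =
          inner.getClass().getConstructor(String.class);
      IOException ioe = ctor.newInstance(operation + ": " + inner.getMessage());
      ioe.initCause(top);  // throws IllegalStateException if a cause was already set
      return ioe;
    } catch (ReflectiveOperationException | IllegalStateException e) {
      return new IOException(operation + ": " + inner.getMessage(), top);
    }
  }

  public static void main(String[] args) {
    Throwable sdk = new RuntimeException("SDK client failure",
        new UnknownHostException("nosuchbucket.example.com"));
    IOException ioe = rewrap("getFileStatus s3a://nosuchbucket/", sdk);
    // the translated exception keeps the specific type of the innermost IOE
    System.out.println(ioe.getClass().getSimpleName() + ": " + ioe.getMessage());
    System.out.println(fromStatus(501, "multipart upload").getClass().getSimpleName());
  }
}
```

Calling `initCause(top)` after construction, rather than passing the cause to a constructor, is what lets a single reflective lookup of the `(String)` constructor work for common `IOException` subclasses such as `UnknownHostException`, which have no `(String, Throwable)` constructor.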

 Retries were reduced because trying to create a bucket whose hostname
 doesn't resolve showed that even an UnknownHostException took over 90s
 to fail, after which it then hit the s3a retry code as well.
 - Reducing the SDK retries means these failures escalate to our code sooner.
 - Cutting back on our own retries makes S3A a bit more responsive for most
   real deployments.
 - maybeTranslateNetworkException() and the s3a retry policy mean that an
   UnknownHostException is recognised and fails fast.
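The retry behaviour described above (fail fast on `UnknownHostException` and `BindException`, retry other `SocketException`s and `SocketTimeoutException` as connectivity failures) can be sketched as an ordered classifier plus a bounded retry loop. This is an illustration under stated assumptions, not `S3ARetryPolicy`: the `Decision` enum, `classify()` and `retry()` are hypothetical, matching uses `isInstance()` in declaration order rather than an exact-class lookup, and the fixed sleep stands in for whatever backoff a real policy applies.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.ConnectException;
import java.net.SocketException;
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical retry classifier; a simplified stand-in, not S3ARetryPolicy. */
final class RetryClassifierSketch {

  /** Hypothetical outcomes of classifying a failure. */
  enum Decision { FAIL, RETRY_CONNECTIVITY }

  /** An I/O operation that may be retried. */
  interface IOOperation<T> {
    T apply() throws IOException;
  }

  // Checked in insertion order with isInstance(), so subclasses must come
  // before their parents: BindException and ConnectException both extend
  // SocketException.
  private static final Map<Class<? extends Exception>, Decision> POLICIES =
      new LinkedHashMap<>();

  static {
    POLICIES.put(UnknownHostException.class, Decision.FAIL);  // DNS failure: fail fast
    POLICIES.put(BindException.class, Decision.FAIL);         // local network config problem
    POLICIES.put(ConnectException.class, Decision.RETRY_CONNECTIVITY);
    POLICIES.put(SocketTimeoutException.class, Decision.RETRY_CONNECTIVITY);
    POLICIES.put(SocketException.class, Decision.RETRY_CONNECTIVITY);
  }

  /** Return the decision for the first matching class, else the fallback. */
  static Decision classify(Exception e, Decision fallback) {
    for (Map.Entry<Class<? extends Exception>, Decision> entry : POLICIES.entrySet()) {
      if (entry.getKey().isInstance(e)) {
        return entry.getValue();
      }
    }
    return fallback;
  }

  /** Run op, retrying connectivity failures up to maxAttempts attempts in total. */
  static <T> T retry(IOOperation<T> op, int maxAttempts, long sleepMillis)
      throws IOException, InterruptedException {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    IOException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return op.apply();
      } catch (IOException e) {
        last = e;
        // unmatched exception types are treated as non-retryable here
        if (classify(e, Decision.FAIL) == Decision.FAIL) {
          throw e;
        }
        if (attempt < maxAttempts) {
          Thread.sleep(sleepMillis);
        }
      }
    }
    throw last;
  }

  public static void main(String[] args) throws Exception {
    System.out.println(classify(new BindException("address in use"), Decision.FAIL));
    System.out.println(classify(new SocketException("connection reset"), Decision.FAIL));
    int[] calls = {0};
    try {
      retry(() -> {
        calls[0]++;
        throw new UnknownHostException("nosuchbucket.example.com");
      }, 5, 100L);
    } catch (UnknownHostException e) {
      System.out.println("failed fast after " + calls[0] + " attempt(s)");
    }
  }
}
```

Treating an unmatched exception as `FAIL` is a deliberate choice in this sketch: only failures known to be transient are retried, so a DNS error surfaces after one attempt instead of after a long series of retries.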

Contributed by Steve Loughran
2023-10-12 17:47:44 +01:00
hadoop-aliyun HADOOP-18458: AliyunOSSBlockOutputStream to support heap/off-heap buffer before uploading data to OSS (#4912) 2023-03-28 14:27:01 +08:00
hadoop-archive-logs HADOOP-18206 Cleanup the commons-logging references and restrict its usage in future (#5315) 2023-02-14 03:24:06 +08:00
hadoop-archives HADOOP-18548. Hadoop Archive tool (HAR) should acquire delegation tokens from source and destination file systems (#5355) 2023-03-30 07:12:02 +08:00
hadoop-aws HADOOP-18889. S3A v2 SDK third party support (#6141) 2023-10-12 17:47:44 +01:00
hadoop-azure HADOOP-18869: [ABFS] Fix behavior of a File System APIs on root path (#6003) 2023-10-09 20:05:23 +01:00
hadoop-azure-datalake HADOOP-18641. Cloud connector dependency and LICENSE fixup. (#5429) 2023-02-28 10:48:54 +00:00
hadoop-benchmark HADOOP-18718. Fix several maven build warnings (#5592). Contributed by Dongjoon Hyun. 2023-06-11 11:38:13 +05:30
hadoop-datajoin HADOOP-18131. Upgrade maven enforcer plugin and relevant dependencies (#4000) 2022-03-08 17:27:04 +09:00
hadoop-distcp HDFS-17120. Support snapshot diff based copylisting for flat paths. (#5885) 2023-07-27 00:53:57 -07:00
hadoop-dynamometer HADOOP-18359. Update commons-cli from 1.2 to 1.5. (#5095). Contributed by Shilun Fan. 2023-05-10 01:42:12 +05:30
hadoop-extras HADOOP-18131. Upgrade maven enforcer plugin and relevant dependencies (#4000) 2022-03-08 17:27:04 +09:00
hadoop-federation-balance HADOOP-18718. Fix several maven build warnings (#5592). Contributed by Dongjoon Hyun. 2023-06-11 11:38:13 +05:30
hadoop-fs2img HADOOP-18131. Upgrade maven enforcer plugin and relevant dependencies (#4000) 2022-03-08 17:27:04 +09:00
hadoop-gridmix HADOOP-18131. Upgrade maven enforcer plugin and relevant dependencies (#4000) 2022-03-08 17:27:04 +09:00
hadoop-kafka HADOOP-17753. Keep restrict-imports-enforcer-rule for Guava Lists in top level hadoop-main pom (#3087) 2021-06-11 12:15:52 +09:00
hadoop-openstack HADOOP-18442. Remove openstack support (#4855) 2022-10-06 11:49:38 +01:00
hadoop-pipes Preparing for 3.4.0 development 2020-03-29 23:24:25 +05:30
hadoop-resourceestimator YARN-11498. Add exclusion for jettison everywhere jersey-json is loaded (#5786) 2023-09-13 18:10:24 +01:00
hadoop-rumen Revert "HADOOP-18207. Introduce hadoop-logging module (#5503)" 2023-06-05 09:34:40 +05:30
hadoop-sls YARN-10680. Revisit try blocks without catch blocks but having finally blocks. Contributed by Susheel Gupta 2022-10-15 21:51:08 +02:00
hadoop-streaming HADOOP-18359. Update commons-cli from 1.2 to 1.5. (#5095). Contributed by Shilun Fan. 2023-05-10 01:42:12 +05:30
hadoop-tools-dist HADOOP-18442. Remove openstack support (#4855) 2022-10-06 11:49:38 +01:00
pom.xml HADOOP-11867. Add a high-performance vectored read API. (#3904) 2022-06-22 17:29:32 +01:00