Steve Loughran e221231e81
HADOOP-18996. S3A to provide full support for S3 Express One Zone (#6308)
This adds broad support for Amazon S3 Express One Zone to the S3A connector,
in particular making other parts of the codebase resilient to LIST operations
that return paths under which only in-progress uploads are taking place.

hadoop-common and hadoop-mapreduce treewalking routines all cope with this;
distcp is left alone.

There are still some outstanding follow-up issues, and we expect more to surface
with extended use.

Contains HADOOP-18955. AWS SDK v2: add path capability probe "fs.s3a.capability.aws.v2"
* lets us probe for AWS SDK version
* bucket-info reports it

Contains HADOOP-18961. S3A: add s3guard command "bucket"

hadoop s3guard bucket -create -region us-west-2 -zone usw2-az2 \
  s3a://stevel--usw2-az2--x-s3/

* requires -zone if bucket is zonal
* rejects it if not
* rejects zonal bucket suffixes if endpoint is not aws (safety feature)
* imperfect, but a functional starting point.
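The option rules for the new "bucket" command can be read as a small decision table. The sketch below is a hypothetical, dependency-free model of those rules, not the actual BucketTool code. It assumes that a bucket counts as zonal purely because its name ends in "--x-s3", which matches the simplified detection described under the test changes. The class and method names are illustrative only.

```java
import java.util.Optional;

/** Hypothetical model of the "hadoop s3guard bucket" option checks. */
final class BucketCommandRules {

  // Assumption: zonal (S3 Express) buckets are detected only by this suffix.
  static final String ZONAL_SUFFIX = "--x-s3";

  private BucketCommandRules() {
  }

  static boolean isZonalBucket(String bucket) {
    return bucket.endsWith(ZONAL_SUFFIX);
  }

  /**
   * Returns an error message if the combination is rejected, else empty.
   * @param bucket bucket name, e.g. "stevel--usw2-az2--x-s3"
   * @param zone value of -zone, or null if the option was not given
   * @param awsEndpoint true if the configured endpoint is an AWS one
   */
  static Optional<String> validate(String bucket, String zone, boolean awsEndpoint) {
    boolean zonal = isZonalBucket(bucket);
    if (zonal && !awsEndpoint) {
      // Safety feature: refuse zonal-looking names against third-party stores.
      return Optional.of("zonal bucket name with non-AWS endpoint: " + bucket);
    }
    if (zonal && zone == null) {
      return Optional.of("-zone is required for zonal bucket " + bucket);
    }
    if (!zonal && zone != null) {
      return Optional.of("-zone is not valid for non-zonal bucket " + bucket);
    }
    return Optional.empty();
  }
}
```

For example, `validate("stevel--usw2-az2--x-s3", "usw2-az2", true)` is accepted, while leaving out the zone for the same bucket is rejected.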

New path capability "fs.s3a.capability.zonal.storage"
* Used in tests to determine whether pending uploads make paths visible in listings
* cli tests can probe for this
* bucket-info reports it
* some tests disable/change assertions as appropriate

----

Shell commands fail on S3Express buckets if there are pending uploads.

New path capability in hadoop-common
   "fs.capability.directory.listing.inconsistent"

1. S3AFS returns true on an S3 Express bucket
2. FileUtil.maybeIgnoreMissingDirectory(fs, path, fnfe)
   decides whether to swallow the exception or not.
3. This is used in: Shell, FileInputFormat, LocatedFileStatusFetcher
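The steps above can be modelled without Hadoop on the classpath. The sketch below is a hypothetical in-memory store and recursive walk, roughly what "fs -ls -R" does. A parent listing reports a directory that exists only because of a pending upload, and listing it then raises FileNotFoundException. Here the walk swallows the exception only when the store declares "fs.capability.directory.listing.inconsistent". That rule is an assumption about how the decision is made, and Store, walk() and inMemory() are illustrative names, not Hadoop APIs.

```java
import java.io.FileNotFoundException;
import java.util.List;
import java.util.Map;

/** Hypothetical, dependency-free model of a treewalk on an inconsistent store. */
final class TreewalkSketch {

  static final String LISTING_INCONSISTENT =
      "fs.capability.directory.listing.inconsistent";

  /** Minimal stand-in for a filesystem: listing plus a capability probe. */
  interface Store {
    List<String> list(String dir) throws FileNotFoundException;
    boolean hasPathCapability(String path, String capability);
  }

  /** Rethrow unless the store declares that listings may be inconsistent. */
  static void maybeIgnoreMissingDirectory(Store store, String path,
      FileNotFoundException e) throws FileNotFoundException {
    if (!store.hasPathCapability(path, LISTING_INCONSISTENT)) {
      throw e;
    }
    // Otherwise treat the vanished "directory" as empty and carry on.
  }

  /** Depth-first walk; directory entries end in "/" in this model. */
  static void walk(Store store, String dir, List<String> out)
      throws FileNotFoundException {
    List<String> children;
    try {
      children = store.list(dir);
    } catch (FileNotFoundException e) {
      maybeIgnoreMissingDirectory(store, dir, e);
      return;
    }
    for (String child : children) {
      out.add(child);
      if (child.endsWith("/")) {
        walk(store, child, out);
      }
    }
  }

  /** In-memory store: any directory missing from the map fails to list. */
  static Store inMemory(Map<String, List<String>> dirs, boolean inconsistent) {
    return new Store() {
      @Override
      public List<String> list(String dir) throws FileNotFoundException {
        List<String> children = dirs.get(dir);
        if (children == null) {
          throw new FileNotFoundException(dir);
        }
        return children;
      }

      @Override
      public boolean hasPathCapability(String path, String capability) {
        return inconsistent && LISTING_INCONSISTENT.equals(capability);
      }
    };
  }
}
```

With a root listing of "/data/" and "/pending/", where "/pending/" has no listing of its own, the walk completes on an inconsistent store. On a store without the capability it fails with FileNotFoundException.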

Fixes with tests
* fs -ls -R
* fs -du
* fs -df
* fs -find
* S3AFS.getContentSummary() (maybe...should discuss)
* mapred LocatedFileStatusFetcher
* Globber: HADOOP-15478 already fixed this when dealing with
   S3 inconsistencies
* FileInputFormat

S3Express CreateSession request is permitted outside audit spans.
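One way to picture this rule is as an allowlist consulted when a request arrives with no active audit span. The sketch below is hypothetical and is not the S3A auditor. It lists only the CreateSession request, since that is the one named here; any other requests the real auditor lets through outside spans are out of scope. The class and method names are illustrative.

```java
import java.util.Set;

/** Hypothetical model: which SDK requests may run outside an audit span. */
final class OutOfSpanPolicy {

  // Only CreateSession is listed, as described in this change.
  private static final Set<String> PERMITTED_OUTSIDE_SPAN =
      Set.of("CreateSessionRequest");

  private OutOfSpanPolicy() {
  }

  /**
   * @param requestClassName simple class name of the SDK request
   * @param insideSpan whether an active audit span is bound to the thread
   */
  static boolean isPermitted(String requestClassName, boolean insideSpan) {
    return insideSpan || PERMITTED_OUTSIDE_SPAN.contains(requestClassName);
  }
}
```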

S3 bulk delete calls ask the store to return the list of deleted objects
if the RequestFactoryImpl logger is set to TRACE:

   log4j.logger.org.apache.hadoop.fs.s3a.impl.RequestFactoryImpl=TRACE

Test Changes
 * ITestS3AMiscOperations removes all tests which require unencrypted
   buckets. AWS S3 defaults to SSE-S3 everywhere.
 * ITestBucketTool to test new tool without actually creating new
   buckets.
 * S3ATestUtils adds methods to skip test suites/cases if the store is/is not
   S3Express
 * The "is this an S3Express bucket?" logic is cut down to a check for the
   trailing --x-s3 string, without worrying about AZ naming logic; the
   relevant tests are commented out.
 * ITestTreewalkProblems validated against standard and S3Express stores

Outstanding

 * Distcp: tests show it fails. Proposed: document this in the release notes.

---

x-amz-checksum header not found when signing S3Express messages

This modifies the custom signer in ITestCustomSigner to be a subclass
of AwsS3V4Signer with a goal of preventing signing problems with
S3 Express stores.

----

RemoteFileChanged renaming multipart file

Maps 412 status code to RemoteFileChangedException

Modifies huge file tests:
- Adds a check that the etag matches between stat and list
- ITestS3AHugeFilesByteBufferBlocks renames parent dirs, rather than
  files, to replicate distcp better.
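The 412 mapping and the etag check can be sketched together. The code below is a hypothetical, self-contained model. RemoteFileChangedException here is a local stand-in so that the block compiles alone; it is not the S3A class of that name. The assumption that a 412 arises from an etag precondition on a GET or COPY is a plausible reading, not taken from this change.

```java
import java.io.IOException;
import java.util.Objects;

/** Hypothetical model of status-code translation and the stat/list etag check. */
final class StatusMapping {

  static final int SC_412_PRECONDITION_FAILED = 412;

  /** Local stand-in for S3A's RemoteFileChangedException. */
  static class RemoteFileChangedException extends IOException {
    RemoteFileChangedException(String message) {
      super(message);
    }
  }

  /** Map a failed request's HTTP status to an exception. */
  static IOException translate(String path, int status, String message) {
    if (status == SC_412_PRECONDITION_FAILED) {
      // Assumed cause: an etag precondition failed, so the object changed.
      return new RemoteFileChangedException(path + ": " + message);
    }
    return new IOException(path + ": status " + status + ": " + message);
  }

  /** The huge-file tests check that stat and list report the same etag. */
  static boolean etagsMatch(String statEtag, String listEtag) {
    return Objects.equals(statEtag, listEtag);
  }
}
```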

----

S3Express custom Signing cannot handle bulk delete

Copy the custom signer into the production JAR to enable downstream testing

Extend ITestCustomSigner to cover more filesystem operations
- PUT
- POST
- COPY
- LIST
- Bulk delete through delete() and rename()
- list + abort multipart uploads

Suite is parameterized on bulk delete enabled/disabled.

To use the new signer for a full test run:

<property>
  <name>fs.s3a.custom.signers</name>
  <value>CustomSdkSigner:org.apache.hadoop.fs.s3a.auth.CustomSdkSigner</value>
</property>

<property>
  <name>fs.s3a.s3.signing-algorithm</name>
  <value>CustomSdkSigner</value>
</property>
2023-12-01 14:16:33 +00:00
.github HADOOP-18823. Add Labeler Github Action. (#5874). Contributed by Ayush Saxena. 2023-07-25 03:04:49 +05:30
.yetus Add .yetus/excludes.txt (#4984) 2022-10-11 09:23:34 -07:00
dev-support HDFS-17246. Fix shaded client for building Hadoop on Windows (#5943) 2023-11-01 09:10:15 -07:00
hadoop-assemblies HDFS-15346. FedBalance tool implementation. Contributed by Jinglun. 2020-06-18 13:33:25 +08:00
hadoop-build-tools HADOOP-17968 Migrate checkstyle module illegalimport to maven enforcer banned-illegal-imports (#3584) 2021-10-28 15:57:15 +09:00
hadoop-client-modules HADOOP-18916. Exclude all module-info classes from uber jars (#6131) 2023-10-13 20:01:44 +01:00
hadoop-cloud-storage-project HADOOP-18890. Remove use of okhttp in runtime code (#6057) 2023-09-19 12:38:36 +01:00
hadoop-common-project HADOOP-18996. S3A to provide full support for S3 Express One Zone (#6308) 2023-12-01 14:16:33 +00:00
hadoop-dist HADOOP-18718. Fix several maven build warnings (#5592). Contributed by Dongjoon Hyun. 2023-06-11 11:38:13 +05:30
hadoop-hdfs-project HDFS-17261. RBF: Fix getFileInfo return wrong path when get mountTable path which is multi-level (#6288). Contributed by liuguanghua. 2023-12-01 17:05:58 +05:30
hadoop-mapreduce-project HADOOP-18996. S3A to provide full support for S3 Express One Zone (#6308) 2023-12-01 14:16:33 +00:00
hadoop-maven-plugins HADOOP-18957. Use StandardCharsets.UTF_8 (#6231). Contributed by PJ Fanning. 2023-11-20 23:44:48 +05:30
hadoop-minicluster HADOOP-18131. Upgrade maven enforcer plugin and relevant dependencies (#4000) 2022-03-08 17:27:04 +09:00
hadoop-project S3A: Upgrade AWS SDK version to 2.21.33 for Amazon S3 Express One Zone support (#6306) 2023-11-29 13:16:19 +00:00
hadoop-project-dist HADOOP-18751. Fix incorrect output path in javadoc build phase (#5688) 2023-06-26 15:52:17 -07:00
hadoop-tools HADOOP-18996. S3A to provide full support for S3 Express One Zone (#6308) 2023-12-01 14:16:33 +00:00
hadoop-yarn-project HADOOP-18924. Upgrade to grpc 1.53.0 due to CVEs (#6161). Contributed by PJ Fanning. 2023-12-01 09:53:47 +05:30
licenses HADOOP-17144. Update Hadoop's lz4 to v1.9.2. Contributed by Hemanth Boyina. 2020-10-18 18:37:46 +05:30
licenses-binary HADOOP-15993. Upgrade Kafka to 2.4.0 in hadoop-kafka module. (#1796) 2020-01-09 16:24:58 +09:00
.asf.yaml HADOOP-18630. Add gh-pages in asf.yaml to deploy the current trunk doc (#5393). Contributed by Simhadri Govindappa. 2023-02-14 18:13:29 +05:30
.gitattributes HADOOP-13598. Add eol=lf for unix format files in .gitattributes. Contributed by Yiqun Lin. 2016-09-14 11:14:31 +09:00
.gitignore HADOOP-18963. Fix typos in .gitignore (#6243) 2023-11-04 05:12:39 +05:30
BUILDING.txt HADOOP-18487. Protobuf 2.5 removal part 2: stop exporting protobuf-2.5 (#6185) 2023-11-06 17:52:05 +00:00
LICENSE-binary HADOOP-18924. Upgrade to grpc 1.53.0 due to CVEs (#6161). Contributed by PJ Fanning. 2023-12-01 09:53:47 +05:30
LICENSE.txt YARN-11356. Upgrade DataTables to 1.11.5 to fix CVEs. Contributed by Bence Kosztolnik. 2022-10-26 22:29:01 +02:00
NOTICE-binary HADOOP-18890. Remove use of okhttp in runtime code (#6057) 2023-09-19 12:38:36 +01:00
NOTICE.txt HADOOP-15958. Revisiting LICENSE and NOTICE files. 2019-08-27 13:47:12 +09:00
pom.xml HADOOP-18957. Use StandardCharsets.UTF_8 (#6231). Contributed by PJ Fanning. 2023-11-20 23:44:48 +05:30
README.txt HADOOP-15958. Revisiting LICENSE and NOTICE files. 2019-08-27 13:47:12 +09:00
start-build-env.sh HADOOP-18052. Support Apple Silicon in start-build-env.sh (#3817) 2021-12-23 18:13:18 +09:00

For the latest information about Hadoop, please visit our website at:

   http://hadoop.apache.org/

and our wiki, at:

   https://cwiki.apache.org/confluence/display/HADOOP/