HADOOP-19057. S3A: Landsat bucket used in tests no longer accessible (#6515)
The AWS landsat data previously used in some S3A tests is no longer accessible.
This PR moves to the new external file s3a://noaa-cors-pds/raw/2024/001/akse/AKSE001x.24_.gz

* Large enough file for scale tests
* Bucket supports anonymous access
* Ends in .gz to keep codec tests happy
* No spaces in path to keep bucket-info happy

Test Code Changes

* Leaves the test key name alone: fs.s3a.scale.test.csvfile
* Renames all methods and fields to remove "csv" from their names and move to "external file"; we no longer require it to be CSV.
* Path definition and helper methods have been moved to PublicDatasetTestUtils
* Improves error reporting in ITestS3AInputStreamPerformance if the file is too short

With S3 Select removed, there is no need for the file to be a CSV file; there is a test which tries to unzip it; other tests have a minimum file size.

Consult the JIRA for the settings to add to auth-keys.xml to switch earlier builds to this same file.

Contributed by Steve Loughran
parent 5cbe52f4e8
commit 7651afd3db
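For earlier branches, the JIRA (HADOOP-19057) has the authoritative `auth-keys.xml` settings. A minimal sketch of what such an override could look like, assuming only the values visible in the diff below (the unchanged `fs.s3a.scale.test.csvfile` key, the `noaa-cors-pds` bucket in `us-east-1`); the anonymous-credentials entry is an assumption, mirroring the `noaa-isd-pds` example in the docs:

```xml
<!-- Hypothetical auth-keys.xml overrides; consult HADOOP-19057 for the exact values. -->
<property>
  <name>fs.s3a.scale.test.csvfile</name>
  <value>s3a://noaa-cors-pds/raw/2023/001/akse/AKSE001a.23_.gz</value>
</property>
<property>
  <name>fs.s3a.bucket.noaa-cors-pds.endpoint.region</name>
  <value>us-east-1</value>
</property>
<!-- Assumed: anonymous access to the public bucket, as done for noaa-isd-pds elsewhere. -->
<property>
  <name>fs.s3a.bucket.noaa-cors-pds.aws.credentials.provider</name>
  <value>org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider</value>
</property>
```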
@@ -585,7 +585,7 @@ If an operation fails with an `AccessDeniedException`, then the role does not ha
 the permission for the S3 Operation invoked during the call.

 ```
-> hadoop fs -touch s3a://landsat-pds/a
+> hadoop fs -touch s3a://noaa-isd-pds/a

 java.nio.file.AccessDeniedException: a: Writing Object on a:
 software.amazon.awssdk.services.s3.model.S3Exception: Access Denied
@@ -111,9 +111,9 @@ Specific buckets can have auditing disabled, even when it is enabled globally.

 ```xml
 <property>
-<name>fs.s3a.bucket.landsat-pds.audit.enabled</name>
+<name>fs.s3a.bucket.noaa-isd-pds.audit.enabled</name>
 <value>false</value>
-<description>Do not audit landsat bucket operations</description>
+<description>Do not audit bucket operations</description>
 </property>
 ```

@@ -342,9 +342,9 @@ either globally or for specific buckets:
 </property>

 <property>
-<name>fs.s3a.bucket.landsat-pds.audit.referrer.enabled</name>
+<name>fs.s3a.bucket.noaa-isd-pds.audit.referrer.enabled</name>
 <value>false</value>
-<description>Do not add the referrer header to landsat operations</description>
+<description>Do not add the referrer header to operations</description>
 </property>
 ```

@@ -747,7 +747,7 @@ For example, for any job executed through Hadoop MapReduce, the Job ID can be us
 ### `Filesystem does not have support for 'magic' committer`

 ```
-org.apache.hadoop.fs.s3a.commit.PathCommitException: `s3a://landsat-pds': Filesystem does not have support for 'magic' committer enabled
+org.apache.hadoop.fs.s3a.commit.PathCommitException: `s3a://noaa-isd-pds': Filesystem does not have support for 'magic' committer enabled
 in configuration option fs.s3a.committer.magic.enabled
 ```

@@ -760,42 +760,15 @@ Remove all global/per-bucket declarations of `fs.s3a.bucket.magic.enabled` or se

 ```xml
 <property>
-<name>fs.s3a.bucket.landsat-pds.committer.magic.enabled</name>
+<name>fs.s3a.bucket.noaa-isd-pds.committer.magic.enabled</name>
 <value>true</value>
 </property>
 ```

 Tip: you can verify that a bucket supports the magic committer through the
-`hadoop s3guard bucket-info` command:
+`hadoop s3guard bucket-info` command.
-
-```
-> hadoop s3guard bucket-info -magic s3a://landsat-pds/
-Location: us-west-2
-
-S3A Client
-Signing Algorithm: fs.s3a.signing-algorithm=(unset)
-Endpoint: fs.s3a.endpoint=s3.amazonaws.com
-Encryption: fs.s3a.encryption.algorithm=none
-Input seek policy: fs.s3a.experimental.input.fadvise=normal
-Change Detection Source: fs.s3a.change.detection.source=etag
-Change Detection Mode: fs.s3a.change.detection.mode=server
-
-S3A Committers
-The "magic" committer is supported in the filesystem
-S3A Committer factory class: mapreduce.outputcommitter.factory.scheme.s3a=org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory
-S3A Committer name: fs.s3a.committer.name=magic
-Store magic committer integration: fs.s3a.committer.magic.enabled=true
-
-Security
-Delegation token support is disabled
-
-Directory Markers
-The directory marker policy is "keep"
-Available Policies: delete, keep, authoritative
-Authoritative paths: fs.s3a.authoritative.path=
-```

 ### Error message: "File being created has a magic path, but the filesystem has magic file support disabled"

 A file is being written to a path which is used for "magic" files,
@@ -284,14 +284,13 @@ a bucket.
 The up to date list of regions is [Available online](https://docs.aws.amazon.com/general/latest/gr/s3.html).

 This list can be used to specify the endpoint of individual buckets, for example
-for buckets in the central and EU/Ireland endpoints.
+for buckets in the us-west-2 and EU/Ireland endpoints.


 ```xml
 <property>
-<name>fs.s3a.bucket.landsat-pds.endpoint.region</name>
+<name>fs.s3a.bucket.us-west-2-dataset.endpoint.region</name>
 <value>us-west-2</value>
-<description>The region for s3a://landsat-pds URLs</description>
 </property>

 <property>
@@ -354,9 +353,9 @@ The boolean option `fs.s3a.endpoint.fips` (default `false`) switches the S3A con
 For a single bucket:
 ```xml
 <property>
-<name>fs.s3a.bucket.landsat-pds.endpoint.fips</name>
+<name>fs.s3a.bucket.noaa-isd-pds.endpoint.fips</name>
 <value>true</value>
-<description>Use the FIPS endpoint for the landsat dataset</description>
+<description>Use the FIPS endpoint for the NOAA dataset</description>
 </property>
 ```

@@ -188,7 +188,7 @@ If it was deployed unbonded, the DT Binding is asked to create a new DT.

 It is up to the binding what it includes in the token identifier, and how it obtains them.
 This new token identifier is included in a token which has a "canonical service name" of
-the URI of the filesystem (e.g "s3a://landsat-pds").
+the URI of the filesystem (e.g "s3a://noaa-isd-pds").

 The issued/reissued token identifier can be marshalled and reused.

@@ -481,8 +481,8 @@ This will fetch the token and save it to the named file (here, `tokens.bin`),
 even if Kerberos is disabled.

 ```bash
-# Fetch a token for the AWS landsat-pds bucket and save it to tokens.bin
-$ hdfs fetchdt --webservice s3a://landsat-pds/ tokens.bin
+# Fetch a token for the AWS noaa-isd-pds bucket and save it to tokens.bin
+$ hdfs fetchdt --webservice s3a://noaa-isd-pds/ tokens.bin
 ```

 If the command fails with `ERROR: Failed to fetch token` it means the

@@ -498,11 +498,11 @@ host on which it was created.
 ```bash
 $ bin/hdfs fetchdt --print tokens.bin

-Token (S3ATokenIdentifier{S3ADelegationToken/Session; uri=s3a://landsat-pds;
+Token (S3ATokenIdentifier{S3ADelegationToken/Session; uri=s3a://noaa-isd-pds;
 timestamp=1541683947569; encryption=EncryptionSecrets{encryptionMethod=SSE_S3};
 Created on vm1.local/192.168.99.1 at time 2018-11-08T13:32:26.381Z.};
 Session credentials for user AAABWL expires Thu Nov 08 14:02:27 GMT 2018; (valid))
-for s3a://landsat-pds
+for s3a://noaa-isd-pds
 ```
 The "(valid)" annotation means that the AWS credentials are considered "valid":
 there is both a username and a secret.

@@ -513,11 +513,11 @@ If delegation support is enabled, it also prints the current
 hadoop security level.

 ```bash
-$ hadoop s3guard bucket-info s3a://landsat-pds/
+$ hadoop s3guard bucket-info s3a://noaa-isd-pds/

-Filesystem s3a://landsat-pds
+Filesystem s3a://noaa-isd-pds
 Location: us-west-2
-Filesystem s3a://landsat-pds is not using S3Guard
+Filesystem s3a://noaa-isd-pds is not using S3Guard
 The "magic" committer is not supported

 S3A Client

@@ -314,9 +314,8 @@ All releases of Hadoop which have been updated to be marker aware will support t
 Example: `s3guard bucket-info -markers aware` on a compatible release.

 ```
-> hadoop s3guard bucket-info -markers aware s3a://landsat-pds/
-Filesystem s3a://landsat-pds
-Location: us-west-2
+> hadoop s3guard bucket-info -markers aware s3a://noaa-isd-pds/
+Filesystem s3a://noaa-isd-pds

 ...

@@ -326,13 +325,14 @@ Directory Markers
 Authoritative paths: fs.s3a.authoritative.path=
+The S3A connector is compatible with buckets where directory markers are not deleted

 ...
 ```

 The same command will fail on older releases, because the `-markers` option
 is unknown

 ```
-> hadoop s3guard bucket-info -markers aware s3a://landsat-pds/
+> hadoop s3guard bucket-info -markers aware s3a://noaa-isd-pds/
 Illegal option -markers
 Usage: hadoop bucket-info [OPTIONS] s3a://BUCKET
 provide/check information about a specific bucket

@@ -354,9 +354,8 @@ Generic options supported are:
 A specific policy check verifies that the connector is configured as desired

 ```
-> hadoop s3guard bucket-info -markers keep s3a://landsat-pds/
-Filesystem s3a://landsat-pds
-Location: us-west-2
+> hadoop s3guard bucket-info -markers keep s3a://noaa-isd-pds/
+Filesystem s3a://noaa-isd-pds

 ...

@@ -371,9 +370,8 @@ When probing for a specific policy, the error code "46" is returned if the activ
 does not match that requested:

 ```
-> hadoop s3guard bucket-info -markers delete s3a://landsat-pds/
-Filesystem s3a://landsat-pds
-Location: us-west-2
+> hadoop s3guard bucket-info -markers delete s3a://noaa-isd-pds/
+Filesystem s3a://noaa-isd-pds

 S3A Client
 Signing Algorithm: fs.s3a.signing-algorithm=(unset)

@@ -398,7 +396,7 @@ Directory Markers
 Authoritative paths: fs.s3a.authoritative.path=

 2021-11-22 16:03:59,175 [main] INFO util.ExitUtil (ExitUtil.java:terminate(210))
--Exiting with status 46: 46: Bucket s3a://landsat-pds: required marker polic is
+-Exiting with status 46: 46: Bucket s3a://noaa-isd-pds: required marker polic is
 "keep" but actual policy is "delete"

 ```

@@ -450,10 +448,10 @@ Audit the path and fail if any markers were found.


 ```
-> hadoop s3guard markers -limit 8000 -audit s3a://landsat-pds/
+> hadoop s3guard markers -limit 8000 -audit s3a://noaa-isd-pds/

-The directory marker policy of s3a://landsat-pds is "Keep"
-2020-08-05 13:42:56,079 [main] INFO tools.MarkerTool (DurationInfo.java:<init>(77)) - Starting: marker scan s3a://landsat-pds/
+The directory marker policy of s3a://noaa-isd-pds is "Keep"
+2020-08-05 13:42:56,079 [main] INFO tools.MarkerTool (DurationInfo.java:<init>(77)) - Starting: marker scan s3a://noaa-isd-pds/
 Scanned 1,000 objects
 Scanned 2,000 objects
 Scanned 3,000 objects

@@ -463,8 +461,8 @@ Scanned 6,000 objects
 Scanned 7,000 objects
 Scanned 8,000 objects
 Limit of scan reached - 8,000 objects
-2020-08-05 13:43:01,184 [main] INFO tools.MarkerTool (DurationInfo.java:close(98)) - marker scan s3a://landsat-pds/: duration 0:05.107s
-No surplus directory markers were found under s3a://landsat-pds/
+2020-08-05 13:43:01,184 [main] INFO tools.MarkerTool (DurationInfo.java:close(98)) - marker scan s3a://noaa-isd-pds/: duration 0:05.107s
+No surplus directory markers were found under s3a://noaa-isd-pds/
 Listing limit reached before completing the scan
 2020-08-05 13:43:01,187 [main] INFO util.ExitUtil (ExitUtil.java:terminate(210)) - Exiting with status 3:
 ```

@@ -616,15 +616,14 @@ header.x-amz-version-id="KcDOVmznIagWx3gP1HlDqcZvm1mFWZ2a"
 A file with no-encryption (on a bucket without versioning but with intelligent tiering):

 ```
-bin/hadoop fs -getfattr -d s3a://landsat-pds/scene_list.gz
+bin/hadoop fs -getfattr -d s3a://noaa-cors-pds/raw/2024/001/akse/AKSE001x.24_.gz

-# file: s3a://landsat-pds/scene_list.gz
-header.Content-Length="45603307"
-header.Content-Type="application/octet-stream"
-header.ETag="39c34d489777a595b36d0af5726007db"
-header.Last-Modified="Wed Aug 29 01:45:15 BST 2018"
-header.x-amz-storage-class="INTELLIGENT_TIERING"
-header.x-amz-version-id="null"
+# file: s3a://noaa-cors-pds/raw/2024/001/akse/AKSE001x.24_.gz
+header.Content-Length="524671"
+header.Content-Type="binary/octet-stream"
+header.ETag=""3e39531220fbd3747d32cf93a79a7a0c""
+header.Last-Modified="Tue Jan 02 00:15:13 GMT 2024"
+header.x-amz-server-side-encryption="AES256"
 ```

 ###<a name="changing-encryption"></a> Use `rename()` to encrypt files with new keys

@@ -503,7 +503,7 @@ explicitly opened up for broader access.
 ```bash
 hadoop fs -ls \
 -D fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider \
-s3a://landsat-pds/
+s3a://noaa-isd-pds/
 ```

 1. Allowing anonymous access to an S3 bucket compromises

@@ -1630,11 +1630,11 @@ a session key:
 </property>
 ```

-Finally, the public `s3a://landsat-pds/` bucket can be accessed anonymously:
+Finally, the public `s3a://noaa-isd-pds/` bucket can be accessed anonymously:

 ```xml
 <property>
-<name>fs.s3a.bucket.landsat-pds.aws.credentials.provider</name>
+<name>fs.s3a.bucket.noaa-isd-pds.aws.credentials.provider</name>
 <value>org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider</value>
 </property>
 ```

@@ -447,7 +447,8 @@ An example of this is covered in [HADOOP-13871](https://issues.apache.org/jira/b

 1. For public data, use `curl`:

-curl -O https://landsat-pds.s3.amazonaws.com/scene_list.gz
+curl -O https://noaa-cors-pds.s3.amazonaws.com/raw/2023/001/akse/AKSE001a.23_.gz

 1. Use `nettop` to monitor a processes connections.


@@ -696,7 +697,7 @@ via `FileSystem.get()` or `Path.getFileSystem()`.
 The cache, `FileSystem.CACHE` will, for each user, cachec one instance of a filesystem
 for a given URI.
 All calls to `FileSystem.get` for a cached FS for a URI such
-as `s3a://landsat-pds/` will return that singe single instance.
+as `s3a://noaa-isd-pds/` will return that singe single instance.

 FileSystem instances are created on-demand for the cache,
 and will be done in each thread which requests an instance.

@@ -720,7 +721,7 @@ can be created simultaneously for different object stores/distributed
 filesystems.

 For example, a value of four would put an upper limit on the number
-of wasted instantiations of a connector for the `s3a://landsat-pds/`
+of wasted instantiations of a connector for the `s3a://noaa-isd-pds/`
 bucket.

 ```xml

@@ -260,22 +260,20 @@ define the target region in `auth-keys.xml`.
 ### <a name="csv"></a> CSV Data Tests

 The `TestS3AInputStreamPerformance` tests require read access to a multi-MB
-text file. The default file for these tests is one published by amazon,
-[s3a://landsat-pds.s3.amazonaws.com/scene_list.gz](http://landsat-pds.s3.amazonaws.com/scene_list.gz).
-This is a gzipped CSV index of other files which amazon serves for open use.
+text file. The default file for these tests is a public one.
+`s3a://noaa-cors-pds/raw/2023/001/akse/AKSE001a.23_.gz`
+from the [NOAA Continuously Operating Reference Stations (CORS) Network (NCN)](https://registry.opendata.aws/noaa-ncn/)

+Historically it was required to be a `csv.gz` file to validate S3 Select
+support. Now that S3 Select support has been removed, other large files
+may be used instead.
+However, future versions may want to read a CSV file again, so testers
+should still reference one.

 The path to this object is set in the option `fs.s3a.scale.test.csvfile`,

 ```xml
 <property>
 <name>fs.s3a.scale.test.csvfile</name>
-<value>s3a://landsat-pds/scene_list.gz</value>
+<value>s3a://noaa-cors-pds/raw/2023/001/akse/AKSE001a.23_.gz</value>
 </property>
 ```

@@ -285,6 +283,7 @@ is hosted in Amazon's US-east datacenter.
 1. If the data cannot be read for any reason then the test will fail.
 1. If the property is set to a different path, then that data must be readable
 and "sufficiently" large.
+1. If a `.gz` file, expect decompression-related test failures.

 (the reason the space or newline is needed is to add "an empty entry"; an empty
 `<value/>` would be considered undefined and pick up the default)

@@ -292,14 +291,13 @@ and "sufficiently" large.

 If using a test file in a different AWS S3 region then
 a bucket-specific region must be defined.
-For the default test dataset, hosted in the `landsat-pds` bucket, this is:
+For the default test dataset, hosted in the `noaa-cors-pds` bucket, this is:

 ```xml
-<property>
-<name>fs.s3a.bucket.landsat-pds.endpoint.region</name>
-<value>us-west-2</value>
-<description>The region for s3a://landsat-pds</description>
-</property>
+<property>
+<name>fs.s3a.bucket.noaa-cors-pds.endpoint.region</name>
+<value>us-east-1</value>
+</property>
 ```

 ### <a name="access"></a> Testing Access Point Integration

@@ -857,7 +855,7 @@ the tests become skipped, rather than fail with a trace which is really a false
 The ordered test case mechanism of `AbstractSTestS3AHugeFiles` is probably
 the most elegant way of chaining test setup/teardown.

-Regarding reusing existing data, we tend to use the landsat archive of
+Regarding reusing existing data, we tend to use the noaa-cors-pds archive of
 AWS US-East for our testing of input stream operations. This doesn't work
 against other regions, or with third party S3 implementations. Thus the
 URL can be overridden for testing elsewhere.

@@ -40,10 +40,10 @@
 import org.slf4j.LoggerFactory;

 import static org.apache.hadoop.fs.s3a.Constants.*;
-import static org.apache.hadoop.fs.s3a.S3ATestUtils.getCSVTestPath;
 import static org.apache.hadoop.fs.s3a.S3ATestUtils.removeBaseAndBucketOverrides;
 import static org.apache.hadoop.fs.s3a.auth.delegation.DelegationConstants.DELEGATION_TOKEN_BINDING;
 import static org.apache.hadoop.fs.s3a.impl.InstantiationIOException.CONSTRUCTOR_EXCEPTION;
+import static org.apache.hadoop.fs.s3a.test.PublicDatasetTestUtils.getExternalData;
 import static org.apache.hadoop.test.LambdaTestUtils.intercept;
 import static org.junit.Assert.*;

@@ -207,7 +207,7 @@ public void testBadCredentialsWithRemap() throws Exception {
 @Test
 public void testAnonymousProvider() throws Exception {
 Configuration conf = createConf(AnonymousAWSCredentialsProvider.class);
-Path testFile = getCSVTestPath(conf);
+Path testFile = getExternalData(conf);
 try (FileSystem fs = FileSystem.newInstance(testFile.toUri(), conf)) {
 Assertions.assertThat(fs)
 .describedAs("Filesystem")

@@ -22,7 +22,6 @@
 import software.amazon.awssdk.services.s3.model.S3Error;

 import org.assertj.core.api.Assertions;
-import org.junit.Assume;

 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.LocatedFileStatus;

@@ -47,6 +46,7 @@
 import static org.apache.hadoop.fs.s3a.S3ATestUtils.createFiles;
 import static org.apache.hadoop.fs.s3a.S3ATestUtils.isBulkDeleteEnabled;
 import static org.apache.hadoop.fs.s3a.test.ExtraAssertions.failIf;
+import static org.apache.hadoop.fs.s3a.test.PublicDatasetTestUtils.requireDefaultExternalData;
 import static org.apache.hadoop.test.LambdaTestUtils.*;
 import static org.apache.hadoop.util.functional.RemoteIterators.mappingRemoteIterator;
 import static org.apache.hadoop.util.functional.RemoteIterators.toList;

@@ -156,31 +156,22 @@ public void testMultiObjectDeleteSomeFiles() throws Throwable {
 timer.end("removeKeys");
 }

-
-private Path maybeGetCsvPath() {
-Configuration conf = getConfiguration();
-String csvFile = conf.getTrimmed(KEY_CSVTEST_FILE, DEFAULT_CSVTEST_FILE);
-Assume.assumeTrue("CSV test file is not the default",
-DEFAULT_CSVTEST_FILE.equals(csvFile));
-return new Path(csvFile);
-}
-
 /**
 * Test low-level failure handling with low level delete request.
 */
 @Test
 public void testMultiObjectDeleteNoPermissions() throws Throwable {
-describe("Delete the landsat CSV file and expect it to fail");
-Path csvPath = maybeGetCsvPath();
-S3AFileSystem fs = (S3AFileSystem) csvPath.getFileSystem(
+describe("Delete the external file and expect it to fail");
+Path path = requireDefaultExternalData(getConfiguration());
+S3AFileSystem fs = (S3AFileSystem) path.getFileSystem(
 getConfiguration());
 // create a span, expect it to be activated.
 fs.getAuditSpanSource().createSpan(StoreStatisticNames.OP_DELETE,
-csvPath.toString(), null);
+path.toString(), null);
 List<ObjectIdentifier> keys
 = buildDeleteRequest(
 new String[]{
-fs.pathToKey(csvPath),
+fs.pathToKey(path),
 "missing-key.csv"
 });
 MultiObjectDeleteException ex = intercept(

@@ -193,10 +184,10 @@ public void testMultiObjectDeleteNoPermissions() throws Throwable {
 final String undeletedFiles = undeleted.stream()
 .map(Path::toString)
 .collect(Collectors.joining(", "));
-failIf(undeleted.size() != 2,
-"undeleted list size wrong: " + undeletedFiles,
-ex);
-assertTrue("no CSV in " +undeletedFiles, undeleted.contains(csvPath));
+Assertions.assertThat(undeleted)
+.describedAs("undeleted files")
+.hasSize(2)
+.contains(path);
 }

 /**

@@ -205,12 +196,12 @@ public void testMultiObjectDeleteNoPermissions() throws Throwable {
 */
 @Test
 public void testSingleObjectDeleteNoPermissionsTranslated() throws Throwable {
-describe("Delete the landsat CSV file and expect it to fail");
-Path csvPath = maybeGetCsvPath();
-S3AFileSystem fs = (S3AFileSystem) csvPath.getFileSystem(
+describe("Delete the external file and expect it to fail");
+Path path = requireDefaultExternalData(getConfiguration());
+S3AFileSystem fs = (S3AFileSystem) path.getFileSystem(
 getConfiguration());
 AccessDeniedException aex = intercept(AccessDeniedException.class,
-() -> fs.delete(csvPath, false));
+() -> fs.delete(path, false));
 Throwable cause = aex.getCause();
 failIf(cause == null, "no nested exception", aex);
 }

@@ -19,8 +19,9 @@
 package org.apache.hadoop.fs.s3a;

 import java.io.File;
-import java.net.URI;
+import java.util.UUID;

+import org.assertj.core.api.Assertions;
 import org.junit.Before;
 import org.junit.Test;
 import org.slf4j.Logger;

@@ -30,15 +31,16 @@
 import org.apache.hadoop.fs.FSDataInputStream;
 import org.apache.hadoop.fs.FileStatus;
 import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.LocalFileSystem;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.fs.contract.ContractTestUtils;
 import org.apache.hadoop.fs.permission.FsAction;
 import org.apache.hadoop.fs.s3a.performance.AbstractS3ACostTest;

 import static org.apache.hadoop.fs.s3a.Constants.BUFFER_DIR;
-import static org.apache.hadoop.fs.s3a.Constants.PREFETCH_BLOCK_DEFAULT_SIZE;
 import static org.apache.hadoop.fs.s3a.Constants.PREFETCH_BLOCK_SIZE_KEY;
 import static org.apache.hadoop.fs.s3a.Constants.PREFETCH_ENABLED_KEY;
+import static org.apache.hadoop.fs.s3a.test.PublicDatasetTestUtils.getExternalData;
 import static org.apache.hadoop.io.IOUtils.cleanupWithLogger;

 /**

@@ -49,11 +51,21 @@ public class ITestS3APrefetchingCacheFiles extends AbstractS3ACostTest {
 private static final Logger LOG =
 LoggerFactory.getLogger(ITestS3APrefetchingCacheFiles.class);

+/** use a small file size so small source files will still work. */
+public static final int BLOCK_SIZE = 128 * 1024;

+public static final int PREFETCH_OFFSET = 10240;

 private Path testFile;

+/** The FS with the external file. */
 private FileSystem fs;

 private int prefetchBlockSize;
 private Configuration conf;

+private String bufferDir;

 public ITestS3APrefetchingCacheFiles() {
 super(true);
 }
@@ -63,35 +75,31 @@ public void setUp() throws Exception {
 super.setup();
 // Sets BUFFER_DIR by calling S3ATestUtils#prepareTestConfiguration
 conf = createConfiguration();
-String testFileUri = S3ATestUtils.getCSVTestFile(conf);
-
-testFile = new Path(testFileUri);
-prefetchBlockSize = conf.getInt(PREFETCH_BLOCK_SIZE_KEY, PREFETCH_BLOCK_DEFAULT_SIZE);
-fs = getFileSystem();
-fs.initialize(new URI(testFileUri), conf);
+testFile = getExternalData(conf);
+prefetchBlockSize = conf.getInt(PREFETCH_BLOCK_SIZE_KEY, BLOCK_SIZE);
+fs = FileSystem.get(testFile.toUri(), conf);
 }

 @Override
 public Configuration createConfiguration() {
 Configuration configuration = super.createConfiguration();
 S3ATestUtils.removeBaseAndBucketOverrides(configuration, PREFETCH_ENABLED_KEY);
+S3ATestUtils.removeBaseAndBucketOverrides(configuration, PREFETCH_BLOCK_SIZE_KEY);
 configuration.setBoolean(PREFETCH_ENABLED_KEY, true);
+// use a small block size unless explicitly set in the test config.
+configuration.setInt(PREFETCH_BLOCK_SIZE_KEY, BLOCK_SIZE);
+// patch buffer dir with a unique path for test isolation.
+final String bufferDirBase = configuration.get(BUFFER_DIR);
+bufferDir = bufferDirBase + "/" + UUID.randomUUID();
+configuration.set(BUFFER_DIR, bufferDir);
 return configuration;
 }

 @Override
 public synchronized void teardown() throws Exception {
 super.teardown();
-File tmpFileDir = new File(conf.get(BUFFER_DIR));
-File[] tmpFiles = tmpFileDir.listFiles();
-if (tmpFiles != null) {
-for (File filePath : tmpFiles) {
-String path = filePath.getPath();
-if (path.endsWith(".bin") && path.contains("fs-cache-")) {
-filePath.delete();
-}
-}
+if (bufferDir != null) {
+new File(bufferDir).delete();
+}
 cleanupWithLogger(LOG, fs);
 fs = null;
@@ -111,34 +119,35 @@ public void testCacheFileExistence() throws Throwable {
 try (FSDataInputStream in = fs.open(testFile)) {
 byte[] buffer = new byte[prefetchBlockSize];

-in.read(buffer, 0, prefetchBlockSize - 10240);
-in.seek(prefetchBlockSize * 2);
-in.read(buffer, 0, prefetchBlockSize);
+// read a bit less than a block
+in.readFully(0, buffer, 0, prefetchBlockSize - PREFETCH_OFFSET);
+// read at least some of a second block
+in.read(prefetchBlockSize * 2, buffer, 0, prefetchBlockSize);


 File tmpFileDir = new File(conf.get(BUFFER_DIR));
-assertTrue("The dir to keep cache files must exist", tmpFileDir.exists());
+final LocalFileSystem localFs = FileSystem.getLocal(conf);
+Path bufferDirPath = new Path(tmpFileDir.toURI());
+ContractTestUtils.assertIsDirectory(localFs, bufferDirPath);
 File[] tmpFiles = tmpFileDir
 .listFiles((dir, name) -> name.endsWith(".bin") && name.contains("fs-cache-"));
-boolean isCacheFileForBlockFound = tmpFiles != null && tmpFiles.length > 0;
-if (!isCacheFileForBlockFound) {
-LOG.warn("No cache files found under " + tmpFileDir);
-}
-assertTrue("File to cache block data must exist", isCacheFileForBlockFound);
+Assertions.assertThat(tmpFiles)
+.describedAs("Cache files not found under %s", tmpFileDir)
+.isNotEmpty();


 for (File tmpFile : tmpFiles) {
 Path path = new Path(tmpFile.getAbsolutePath());
-try (FileSystem localFs = FileSystem.getLocal(conf)) {
-FileStatus stat = localFs.getFileStatus(path);
-ContractTestUtils.assertIsFile(path, stat);
-assertEquals("File length not matching with prefetchBlockSize", prefetchBlockSize,
-stat.getLen());
-assertEquals("User permissions should be RW", FsAction.READ_WRITE,
-stat.getPermission().getUserAction());
-assertEquals("Group permissions should be NONE", FsAction.NONE,
-stat.getPermission().getGroupAction());
-assertEquals("Other permissions should be NONE", FsAction.NONE,
-stat.getPermission().getOtherAction());
-}
+FileStatus stat = localFs.getFileStatus(path);
+ContractTestUtils.assertIsFile(path, stat);
+assertEquals("File length not matching with prefetchBlockSize", prefetchBlockSize,
+stat.getLen());
+assertEquals("User permissions should be RW", FsAction.READ_WRITE,
+stat.getPermission().getUserAction());
+assertEquals("Group permissions should be NONE", FsAction.NONE,
+stat.getPermission().getGroupAction());
+assertEquals("Other permissions should be NONE", FsAction.NONE,
+stat.getPermission().getOtherAction());
 }
 }
 }

@@ -111,14 +111,16 @@ public interface S3ATestConstants {
 String KEY_CSVTEST_FILE = S3A_SCALE_TEST + "csvfile";

 /**
-* The landsat bucket: {@value}.
+* Default path for the multi MB test file: {@value}.
+* @deprecated retrieve via {@link PublicDatasetTestUtils}.
 */
-String LANDSAT_BUCKET = "s3a://landsat-pds/";
+@Deprecated
+String DEFAULT_CSVTEST_FILE = PublicDatasetTestUtils.DEFAULT_EXTERNAL_FILE;

 /**
-* Default path for the multi MB test file: {@value}.
+* Example path for unit tests; this is never accessed: {@value}.
 */
-String DEFAULT_CSVTEST_FILE = LANDSAT_BUCKET + "scene_list.gz";
+String UNIT_TEST_EXAMPLE_PATH = "s3a://example/data/";

 /**
 * Configuration key for an existing object in a requester pays bucket: {@value}.

@@ -105,6 +105,8 @@
 import static org.apache.hadoop.fs.s3a.impl.CallableSupplier.submit;
 import static org.apache.hadoop.fs.s3a.impl.CallableSupplier.waitForCompletion;
 import static org.apache.hadoop.fs.s3a.impl.S3ExpressStorage.STORE_CAPABILITY_S3_EXPRESS_STORAGE;
+import static org.apache.hadoop.fs.s3a.test.PublicDatasetTestUtils.getExternalData;
+import static org.apache.hadoop.fs.s3a.test.PublicDatasetTestUtils.requireDefaultExternalDataFile;
 import static org.apache.hadoop.test.GenericTestUtils.buildPaths;
 import static org.apache.hadoop.util.Preconditions.checkNotNull;
 import static org.apache.hadoop.fs.CommonConfigurationKeysPublic.HADOOP_SECURITY_CREDENTIAL_PROVIDER_PATH;

@@ -405,22 +407,22 @@ public static String getTestProperty(Configuration conf,
 * Get the test CSV file; assume() that it is not empty.
 * @param conf test configuration
 * @return test file.
+* @deprecated Retained only to assist cherrypicking patches
 */
+@Deprecated
 public static String getCSVTestFile(Configuration conf) {
-String csvFile = conf
-.getTrimmed(KEY_CSVTEST_FILE, DEFAULT_CSVTEST_FILE);
-Assume.assumeTrue("CSV test file is not the default",
-isNotEmpty(csvFile));
-return csvFile;
+return getExternalData(conf).toUri().toString();
 }

 /**
 * Get the test CSV path; assume() that it is not empty.
 * @param conf test configuration
 * @return test file as a path.
+* @deprecated Retained only to assist cherrypicking patches
 */
+@Deprecated
 public static Path getCSVTestPath(Configuration conf) {
-return new Path(getCSVTestFile(conf));
+return getExternalData(conf);
 }

 /**

@@ -429,12 +431,11 @@ public static Path getCSVTestPath(Configuration conf) {
 * read only).
 * @return test file.
 * @param conf test configuration
+* @deprecated Retained only to assist cherrypicking patches
 */
+@Deprecated
 public static String getLandsatCSVFile(Configuration conf) {
-String csvFile = getCSVTestFile(conf);
-Assume.assumeTrue("CSV test file is not the default",
-DEFAULT_CSVTEST_FILE.equals(csvFile));
-return csvFile;
+return requireDefaultExternalDataFile(conf);
 }
 /**
 * Get the test CSV file; assume() that it is not modified (i.e. we haven't

@@ -442,9 +443,11 @@ public static String getLandsatCSVFile(Configuration conf) {
 * read only).
 * @param conf test configuration
 * @return test file as a path.
+* @deprecated Retained only to assist cherrypicking patches
 */
+@Deprecated
 public static Path getLandsatCSVPath(Configuration conf) {
-return new Path(getLandsatCSVFile(conf));
+return getExternalData(conf);
 }

 /**

@@ -54,37 +54,34 @@
 import org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException;
 import org.apache.hadoop.fs.s3a.auth.delegation.CountInvocationsProvider;
 import org.apache.hadoop.fs.s3a.impl.InstantiationIOException;
+import org.apache.hadoop.fs.s3a.test.PublicDatasetTestUtils;
 import org.apache.hadoop.io.retry.RetryPolicy;
 import org.apache.hadoop.util.Sets;

 import static org.apache.hadoop.fs.s3a.Constants.ASSUMED_ROLE_CREDENTIALS_PROVIDER;
 import static org.apache.hadoop.fs.s3a.Constants.AWS_CREDENTIALS_PROVIDER;
 import static org.apache.hadoop.fs.s3a.Constants.AWS_CREDENTIALS_PROVIDER_MAPPING;
-import static org.apache.hadoop.fs.s3a.S3ATestConstants.DEFAULT_CSVTEST_FILE;
 import static org.apache.hadoop.fs.s3a.S3ATestUtils.authenticationContains;
 import static org.apache.hadoop.fs.s3a.S3ATestUtils.buildClassListString;
-import static org.apache.hadoop.fs.s3a.S3ATestUtils.getCSVTestPath;
 import static org.apache.hadoop.fs.s3a.auth.CredentialProviderListFactory.STANDARD_AWS_PROVIDERS;
 import static org.apache.hadoop.fs.s3a.auth.CredentialProviderListFactory.buildAWSProviderList;
 import static org.apache.hadoop.fs.s3a.auth.CredentialProviderListFactory.createAWSCredentialProviderList;
 import static org.apache.hadoop.fs.s3a.impl.InstantiationIOException.DOES_NOT_IMPLEMENT;
+import static org.apache.hadoop.fs.s3a.test.PublicDatasetTestUtils.getExternalData;
 import static org.apache.hadoop.test.LambdaTestUtils.intercept;
 import static org.apache.hadoop.test.LambdaTestUtils.interceptFuture;
 import static org.junit.Assert.assertEquals;
 import static org.junit.Assert.assertFalse;
 import static org.junit.Assert.assertNotNull;
 import static org.junit.Assert.assertTrue;

 /**
 * Unit tests for {@link Constants#AWS_CREDENTIALS_PROVIDER} logic.
 */
-public class TestS3AAWSCredentialsProvider {
+public class TestS3AAWSCredentialsProvider extends AbstractS3ATestBase {

 /**
-* URI of the landsat images.
+* URI of the test file: this must be anonymously accessible.
+* As these are unit tests no actual connection to the store is made.
 */
 private static final URI TESTFILE_URI = new Path(
-DEFAULT_CSVTEST_FILE).toUri();
+PublicDatasetTestUtils.DEFAULT_EXTERNAL_FILE).toUri();

 private static final Logger LOG = LoggerFactory.getLogger(TestS3AAWSCredentialsProvider.class);

@@ -127,7 +124,7 @@ public void testInstantiationChain() throws Throwable {
 TemporaryAWSCredentialsProvider.NAME
 + ", \t" + SimpleAWSCredentialsProvider.NAME
 + " ,\n " + AnonymousAWSCredentialsProvider.NAME);
-Path testFile = getCSVTestPath(conf);
+Path testFile = getExternalData(conf);

 AWSCredentialProviderList list = createAWSCredentialProviderList(
 testFile.toUri(), conf);

@@ -586,7 +583,7 @@ protected AwsCredentials createCredentials(Configuration config) throws IOExcept
 @Test
 public void testConcurrentAuthentication() throws Throwable {
 Configuration conf = createProviderConfiguration(SlowProvider.class.getName());
-Path testFile = getCSVTestPath(conf);
+Path testFile = getExternalData(conf);

 AWSCredentialProviderList list = createAWSCredentialProviderList(testFile.toUri(), conf);

@@ -656,7 +653,7 @@ protected AwsCredentials createCredentials(Configuration config) throws IOExcept
 @Test
 public void testConcurrentAuthenticationError() throws Throwable {
 Configuration conf = createProviderConfiguration(ErrorProvider.class.getName());
-Path testFile = getCSVTestPath(conf);
+Path testFile = getExternalData(conf);

 AWSCredentialProviderList list = createAWSCredentialProviderList(testFile.toUri(), conf);
 ErrorProvider provider = (ErrorProvider) list.getProviders().get(0);

@@ -39,9 +39,9 @@
 import org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider;
 import org.apache.hadoop.fs.s3a.auth.IAMInstanceCredentialsProvider;
 import org.apache.hadoop.fs.s3a.impl.InstantiationIOException;
+import org.apache.hadoop.fs.s3a.test.PublicDatasetTestUtils;

 import static org.apache.hadoop.fs.s3a.Constants.AWS_CREDENTIALS_PROVIDER;
-import static org.apache.hadoop.fs.s3a.S3ATestConstants.DEFAULT_CSVTEST_FILE;
 import static org.apache.hadoop.fs.s3a.auth.CredentialProviderListFactory.ANONYMOUS_CREDENTIALS_V1;
 import static org.apache.hadoop.fs.s3a.auth.CredentialProviderListFactory.EC2_CONTAINER_CREDENTIALS_V1;
 import static org.apache.hadoop.fs.s3a.auth.CredentialProviderListFactory.ENVIRONMENT_CREDENTIALS_V1;

@@ -56,10 +56,10 @@
 public class TestV1CredentialsProvider {

 /**
-* URI of the landsat images.
+* URI of the test file.
 */
 private static final URI TESTFILE_URI = new Path(
-DEFAULT_CSVTEST_FILE).toUri();
+PublicDatasetTestUtils.DEFAULT_EXTERNAL_FILE).toUri();

 private static final Logger LOG = LoggerFactory.getLogger(TestV1CredentialsProvider.class);

@@ -46,7 +46,6 @@
 import org.apache.hadoop.fs.s3a.AWSBadRequestException;
 import org.apache.hadoop.fs.s3a.AbstractS3ATestBase;
 import org.apache.hadoop.fs.s3a.S3AFileSystem;
-import org.apache.hadoop.fs.s3a.S3ATestConstants;
 import org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider;
 import org.apache.hadoop.fs.s3a.commit.CommitConstants;
 import org.apache.hadoop.fs.s3a.commit.files.PendingSet;

@@ -68,6 +67,7 @@
 import static org.apache.hadoop.fs.s3a.auth.RoleTestUtils.forbidden;
 import static org.apache.hadoop.fs.s3a.auth.RoleTestUtils.newAssumedRoleConfig;
 import static org.apache.hadoop.fs.s3a.s3guard.S3GuardToolTestHelper.exec;
+import static org.apache.hadoop.fs.s3a.test.PublicDatasetTestUtils.requireAnonymousDataPath;
 import static org.apache.hadoop.fs.statistics.IOStatisticsLogging.ioStatisticsSourceToString;
 import static org.apache.hadoop.io.IOUtils.cleanupWithLogger;
 import static org.apache.hadoop.test.GenericTestUtils.assertExceptionContains;

@@ -115,7 +115,7 @@ protected Configuration createConfiguration() {
 public void setup() throws Exception {
 super.setup();
 assumeRoleTests();
-uri = new URI(S3ATestConstants.DEFAULT_CSVTEST_FILE);
+uri = requireAnonymousDataPath(getConfiguration()).toUri();
 }

 @Override

@@ -58,6 +58,8 @@
 import static org.apache.hadoop.fs.s3a.auth.delegation.DelegationConstants.*;
 import static org.apache.hadoop.fs.s3a.auth.delegation.MiniKerberizedHadoopCluster.assertSecurityEnabled;
 import static org.apache.hadoop.fs.s3a.auth.delegation.MiniKerberizedHadoopCluster.closeUserFileSystems;
+import static org.apache.hadoop.fs.s3a.test.PublicDatasetTestUtils.getOrcData;
+import static org.apache.hadoop.fs.s3a.test.PublicDatasetTestUtils.requireAnonymousDataPath;

 /**
 * Submit a job with S3 delegation tokens.

@@ -106,10 +108,17 @@ public class ITestDelegatedMRJob extends AbstractDelegationIT {

 private Path destPath;

-private static final Path EXTRA_JOB_RESOURCE_PATH
-= new Path("s3a://osm-pds/planet/planet-latest.orc");
+/**
+* Path of the extra job resource; set up in
+* {@link #createConfiguration()}.
+*/
+private Path extraJobResourcePath;

-public static final URI jobResource = EXTRA_JOB_RESOURCE_PATH.toUri();
+/**
+* URI of the extra job resource; set up in
+* {@link #createConfiguration()}.
+*/
+private URI jobResourceUri;

 /**
 * Test array for parameterized test runs.

@@ -161,7 +170,9 @@ protected YarnConfiguration createConfiguration() {
 conf.setInt(YarnConfiguration.RESOURCEMANAGER_CONNECT_RETRY_INTERVAL_MS,
 10_000);

-String host = jobResource.getHost();
+extraJobResourcePath = getOrcData(conf);
+jobResourceUri = extraJobResourcePath.toUri();
+String host = jobResourceUri.getHost();
 // and fix to the main endpoint if the caller has moved
 conf.set(
 String.format("fs.s3a.bucket.%s.endpoint", host), "");

@@ -229,9 +240,9 @@ protected int getTestTimeoutMillis() {

 @Test
 public void testCommonCrawlLookup() throws Throwable {
-FileSystem resourceFS = EXTRA_JOB_RESOURCE_PATH.getFileSystem(
+FileSystem resourceFS = extraJobResourcePath.getFileSystem(
 getConfiguration());
-FileStatus status = resourceFS.getFileStatus(EXTRA_JOB_RESOURCE_PATH);
+FileStatus status = resourceFS.getFileStatus(extraJobResourcePath);
 LOG.info("Extra job resource is {}", status);
 assertTrue("Not encrypted: " + status, status.isEncrypted());
 }

@@ -241,9 +252,9 @@ public void testJobSubmissionCollectsTokens() throws Exception {
 describe("Mock Job test");
 JobConf conf = new JobConf(getConfiguration());

-// the input here is the landsat file; which lets
+// the input here is the external file; which lets
 // us differentiate source URI from dest URI
-Path input = new Path(DEFAULT_CSVTEST_FILE);
+Path input = requireAnonymousDataPath(getConfiguration());
 final FileSystem sourceFS = input.getFileSystem(conf);


@@ -272,7 +283,7 @@ public void testJobSubmissionCollectsTokens() throws Exception {
 // This is to actually stress the terasort code for which
 // the yarn ResourceLocalizationService was having problems with
 // fetching resources from.
-URI partitionUri = new URI(EXTRA_JOB_RESOURCE_PATH.toString() +
+URI partitionUri = new URI(extraJobResourcePath.toString() +
 "#_partition.lst");
 job.addCacheFile(partitionUri);

@@ -302,7 +313,7 @@ public void testJobSubmissionCollectsTokens() throws Exception {
 // look up the destination token
 lookupToken(submittedCredentials, fs.getUri(), tokenKind);
 lookupToken(submittedCredentials,
-EXTRA_JOB_RESOURCE_PATH.getFileSystem(conf).getUri(), tokenKind);
+extraJobResourcePath.getFileSystem(conf).getUri(), tokenKind);
 }

 }

@@ -53,8 +53,7 @@ public Text getTokenKind() {

 /**
 * This verifies that the granted credentials only access the target bucket
-* by using the credentials in a new S3 client to query the AWS-owned landsat
-* bucket.
+* by using the credentials in a new S3 client to query the public data bucket.
 * @param delegatedFS delegated FS with role-restricted access.
 * @throws Exception failure
 */

@@ -62,7 +61,7 @@ public Text getTokenKind() {
 protected void verifyRestrictedPermissions(final S3AFileSystem delegatedFS)
 throws Exception {
 intercept(AccessDeniedException.class,
-() -> readLandsatMetadata(delegatedFS));
+() -> readExternalDatasetMetadata(delegatedFS));
 }

 }

@@ -79,6 +79,7 @@
 import static org.apache.hadoop.fs.s3a.auth.delegation.MiniKerberizedHadoopCluster.ALICE;
 import static org.apache.hadoop.fs.s3a.auth.delegation.MiniKerberizedHadoopCluster.assertSecurityEnabled;
 import static org.apache.hadoop.fs.s3a.auth.delegation.S3ADelegationTokens.lookupS3ADelegationToken;
+import static org.apache.hadoop.fs.s3a.test.PublicDatasetTestUtils.requireAnonymousDataPath;
 import static org.apache.hadoop.test.LambdaTestUtils.doAs;
 import static org.apache.hadoop.test.LambdaTestUtils.intercept;
 import static org.hamcrest.Matchers.containsString;

@@ -344,7 +345,7 @@ public void testDelegatedFileSystem() throws Throwable {
 // TODO: Check what should happen here. Calling headObject() on the root path fails in V2,
 // with the error that key cannot be empty.
 // fs.getObjectMetadata(new Path("/"));
-readLandsatMetadata(fs);
+readExternalDatasetMetadata(fs);

 URI uri = fs.getUri();
 // create delegation tokens from the test suites FS.

@@ -463,13 +464,13 @@ protected void executeDelegatedFSOperations(final S3AFileSystem delegatedFS,
 }

 /**
-* Session tokens can read the landsat bucket without problems.
+* Session tokens can read the external bucket without problems.
 * @param delegatedFS delegated FS
 * @throws Exception failure
 */
 protected void verifyRestrictedPermissions(final S3AFileSystem delegatedFS)
 throws Exception {
-readLandsatMetadata(delegatedFS);
+readExternalDatasetMetadata(delegatedFS);
 }

 @Test

@@ -582,7 +583,7 @@ public void testDelegationBindingMismatch2() throws Throwable {

 /**
 * This verifies that the granted credentials only access the target bucket
-* by using the credentials in a new S3 client to query the AWS-owned landsat
+* by using the credentials in a new S3 client to query the external
 * bucket.
 * @param delegatedFS delegated FS with role-restricted access.
 * @throws AccessDeniedException if the delegated FS's credentials can't

@@ -590,17 +591,17 @@ public void testDelegationBindingMismatch2() throws Throwable {
 * @return result of the HEAD
 * @throws Exception failure
 */
-protected HeadBucketResponse readLandsatMetadata(final S3AFileSystem delegatedFS)
+protected HeadBucketResponse readExternalDatasetMetadata(final S3AFileSystem delegatedFS)
 throws Exception {
 AWSCredentialProviderList testingCreds
 = delegatedFS.getS3AInternals().shareCredentials("testing");

-URI landsat = new URI(DEFAULT_CSVTEST_FILE);
+URI external = requireAnonymousDataPath(getConfiguration()).toUri();
 DefaultS3ClientFactory factory
 = new DefaultS3ClientFactory();
 Configuration conf = delegatedFS.getConf();
 factory.setConf(conf);
-String host = landsat.getHost();
+String host = external.getHost();
 S3ClientFactory.S3ClientCreationParameters parameters = null;
 parameters = new S3ClientFactory.S3ClientCreationParameters()
 .withCredentialSet(testingCreds)

@@ -609,7 +610,7 @@ protected HeadBucketResponse readExternalDatasetMetadata(final S3AFileSystem del
 .newStatisticsFromAwsSdk())
 .withUserAgentSuffix("ITestSessionDelegationInFilesystem");

-S3Client s3 = factory.createS3Client(landsat, parameters);
+S3Client s3 = factory.createS3Client(external, parameters);

 return Invoker.once("HEAD", host,
 () -> s3.headBucket(b -> b.bucket(host)));

@@ -24,10 +24,10 @@
 import org.junit.Test;

 import org.apache.hadoop.fs.s3a.S3AEncryptionMethods;
-import org.apache.hadoop.fs.s3a.S3ATestConstants;
 import org.apache.hadoop.fs.s3a.S3ATestUtils;
 import org.apache.hadoop.fs.s3a.auth.MarshalledCredentialBinding;
 import org.apache.hadoop.fs.s3a.auth.MarshalledCredentials;
+import org.apache.hadoop.fs.s3a.test.PublicDatasetTestUtils;
 import org.apache.hadoop.io.Text;
 import org.apache.hadoop.security.UserGroupInformation;
 import org.apache.hadoop.security.token.SecretManager;

@@ -44,11 +44,11 @@
 */
 public class TestS3ADelegationTokenSupport {

-private static URI landsatUri;
+private static URI externalUri;

 @BeforeClass
 public static void classSetup() throws Exception {
-landsatUri = new URI(S3ATestConstants.DEFAULT_CSVTEST_FILE);
+externalUri = new URI(PublicDatasetTestUtils.DEFAULT_EXTERNAL_FILE);
 }

 @Test

@@ -74,7 +74,7 @@ public void testSessionTokenDecode() throws Throwable {
 = new SessionTokenIdentifier(SESSION_TOKEN_KIND,
 alice,
 renewer,
-new URI("s3a://landsat-pds/"),
+new URI("s3a://anything/"),
 new MarshalledCredentials("a", "b", ""),
 new EncryptionSecrets(S3AEncryptionMethods.SSE_S3, ""),
 "origin");

@@ -116,7 +116,7 @@ public void testSessionTokenIdentifierRoundTrip() throws Throwable {
 SESSION_TOKEN_KIND,
 new Text(),
 renewer,
-landsatUri,
+externalUri,
 new MarshalledCredentials("a", "b", "c"),
 new EncryptionSecrets(), "");

@@ -135,7 +135,7 @@ public void testSessionTokenIdentifierRoundTripNoRenewer() throws Throwable {
 SESSION_TOKEN_KIND,
 new Text(),
 null,
-landsatUri,
+externalUri,
 new MarshalledCredentials("a", "b", "c"),
 new EncryptionSecrets(), "");

@@ -151,7 +151,7 @@ public void testSessionTokenIdentifierRoundTripNoRenewer() throws Throwable {
 @Test
 public void testRoleTokenIdentifierRoundTrip() throws Throwable {
 RoleTokenIdentifier id = new RoleTokenIdentifier(
-landsatUri,
+externalUri,
 new Text(),
 new Text(),
 new MarshalledCredentials("a", "b", "c"),

@@ -170,7 +170,7 @@ public void testRoleTokenIdentifierRoundTrip() throws Throwable {
 public void testFullTokenIdentifierRoundTrip() throws Throwable {
 Text renewer = new Text("renewerName");
 FullCredentialsTokenIdentifier id = new FullCredentialsTokenIdentifier(
-landsatUri,
+externalUri,
 new Text(),
 renewer,
 new MarshalledCredentials("a", "b", ""),

@@ -26,6 +26,7 @@
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.test.HadoopTestBase;

+import static org.apache.hadoop.fs.s3a.S3ATestConstants.UNIT_TEST_EXAMPLE_PATH;
 import static org.apache.hadoop.fs.s3a.commit.staging.Paths.*;
 import static org.apache.hadoop.test.LambdaTestUtils.intercept;

@@ -81,7 +82,7 @@ private void assertUUIDAdded(String path, String expected) {
 assertEquals("from " + path, expected, addUUID(path, "UUID"));
 }

-private static final String DATA = "s3a://landsat-pds/data/";
+private static final String DATA = UNIT_TEST_EXAMPLE_PATH;
 private static final Path BASE = new Path(DATA);

 @Test

@@ -22,14 +22,17 @@
 import java.io.ByteArrayInputStream;
 import java.io.ByteArrayOutputStream;
 import java.io.InputStreamReader;
+import java.net.URI;
 import java.util.ArrayList;
 import java.util.Arrays;
 import java.util.List;

 import org.junit.Test;

 import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.fs.s3a.S3AFileSystem;
+import org.apache.hadoop.fs.s3a.test.PublicDatasetTestUtils;
 import org.apache.hadoop.test.LambdaTestUtils;
 import org.apache.hadoop.util.StringUtils;

@@ -40,7 +43,6 @@
 import static org.apache.hadoop.fs.s3a.MultipartTestUtils.clearAnyUploads;
 import static org.apache.hadoop.fs.s3a.MultipartTestUtils.countUploadsAt;
 import static org.apache.hadoop.fs.s3a.MultipartTestUtils.createPartUpload;
-import static org.apache.hadoop.fs.s3a.S3ATestUtils.getLandsatCSVFile;
 import static org.apache.hadoop.fs.s3a.S3ATestUtils.removeBaseAndBucketOverrides;
 import static org.apache.hadoop.fs.s3a.s3guard.S3GuardTool.BucketInfo;
 import static org.apache.hadoop.fs.s3a.s3guard.S3GuardTool.E_BAD_STATE;

@@ -57,36 +59,32 @@ public class ITestS3GuardTool extends AbstractS3GuardToolTestBase {
 "-force", "-verbose"};

-@Test
-public void testLandsatBucketUnguarded() throws Throwable {
-run(BucketInfo.NAME,
-"-" + BucketInfo.UNGUARDED_FLAG,
-getLandsatCSVFile(getConfiguration()));
-}
-
-@Test
-public void testLandsatBucketRequireGuarded() throws Throwable {
-runToFailure(E_BAD_STATE,
-BucketInfo.NAME,
-"-" + BucketInfo.GUARDED_FLAG,
-getLandsatCSVFile(
-ITestS3GuardTool.this.getConfiguration()));
-}
-
 @Test
-public void testLandsatBucketRequireUnencrypted() throws Throwable {
+public void testExternalBucketRequireUnencrypted() throws Throwable {
 removeBaseAndBucketOverrides(getConfiguration(), S3_ENCRYPTION_ALGORITHM);
 run(BucketInfo.NAME,
 "-" + BucketInfo.ENCRYPTION_FLAG, "none",
-getLandsatCSVFile(getConfiguration()));
+externalBucket());
 }

+/**
+* Get the external bucket; this is of the default external file.
+* If not set to the default value, the test will be skipped.
+* @return the bucket of the default external file.
+*/
+private String externalBucket() {
+Configuration conf = getConfiguration();
+Path result = PublicDatasetTestUtils.requireDefaultExternalData(conf);
+final URI uri = result.toUri();
+final String bucket = uri.getScheme() + "://" + uri.getHost();
+return bucket;
+}

 @Test
-public void testLandsatBucketRequireEncrypted() throws Throwable {
+public void testExternalBucketRequireEncrypted() throws Throwable {
 runToFailure(E_BAD_STATE,
 BucketInfo.NAME,
 "-" + BucketInfo.ENCRYPTION_FLAG,
-"AES256", getLandsatCSVFile(
-ITestS3GuardTool.this.getConfiguration()));
+"AES256", externalBucket());
 }

 @Test

@@ -33,6 +33,7 @@
 import org.apache.hadoop.test.AbstractHadoopTestBase;

 import static org.apache.hadoop.fs.s3a.Constants.AUTHORITATIVE_PATH;
+import static org.apache.hadoop.fs.s3a.S3ATestConstants.UNIT_TEST_EXAMPLE_PATH;
 import static org.assertj.core.api.Assertions.assertThat;

 /**

@@ -71,7 +72,7 @@ public void testResolutionWithFQP() throws Throwable {
 @Test
 public void testOtherBucket() throws Throwable {
 assertAuthPaths(l("/one/",
-"s3a://landsat-pds/",
+UNIT_TEST_EXAMPLE_PATH,
 BASE + "/two/"),
 "/one/", "/two/");
 }

@@ -79,7 +80,7 @@ public void testOtherBucket() throws Throwable {
 @Test
 public void testOtherScheme() throws Throwable {
 assertAuthPaths(l("/one/",
-"s3a://landsat-pds/",
+UNIT_TEST_EXAMPLE_PATH,
 "http://bucket/two/"),
 "/one/");
 }

@ -30,6 +30,7 @@
import org.apache.hadoop.fs.s3a.S3AInputStream;
import org.apache.hadoop.fs.s3a.S3ATestUtils;
import org.apache.hadoop.fs.s3a.statistics.S3AInputStreamStatistics;
import org.apache.hadoop.fs.s3a.test.PublicDatasetTestUtils;
import org.apache.hadoop.fs.statistics.IOStatistics;
import org.apache.hadoop.fs.statistics.IOStatisticsSnapshot;
import org.apache.hadoop.fs.statistics.MeanStatistic;
@ -112,7 +113,9 @@ public void openFS() throws IOException {
    Configuration conf = getConf();
    conf.setInt(SOCKET_SEND_BUFFER, 16 * 1024);
    conf.setInt(SOCKET_RECV_BUFFER, 16 * 1024);
    String testFile = conf.getTrimmed(KEY_CSVTEST_FILE, DEFAULT_CSVTEST_FILE);
    // look up the test file, no requirement to be set.
    String testFile = conf.getTrimmed(KEY_CSVTEST_FILE,
        PublicDatasetTestUtils.DEFAULT_EXTERNAL_FILE);
    if (testFile.isEmpty()) {
      assumptionMessage = "Empty test property: " + KEY_CSVTEST_FILE;
      LOG.warn(assumptionMessage);
@ -394,6 +397,9 @@ private void executeDecompression(long readahead,
    CompressionCodecFactory factory
        = new CompressionCodecFactory(getConf());
    CompressionCodec codec = factory.getCodec(testData);
    Assertions.assertThat(codec)
        .describedAs("No codec found for %s", testData)
        .isNotNull();
    long bytesRead = 0;
    int lines = 0;

@ -525,12 +531,18 @@ private ContractTestUtils.NanoTimer executeRandomIO(S3AInputPolicy policy,
    describe("Random IO with policy \"%s\"", policy);
    byte[] buffer = new byte[_1MB];
    long totalBytesRead = 0;

    final long len = testDataStatus.getLen();
    in = openTestFile(policy, 0);
    ContractTestUtils.NanoTimer timer = new ContractTestUtils.NanoTimer();
    for (int[] action : RANDOM_IO_SEQUENCE) {
      int position = action[0];
      long position = action[0];
      int range = action[1];
      // if a read goes past EOF, fail with details
      // this will happen if the test datafile is too small.
      Assertions.assertThat(position + range)
          .describedAs("readFully(pos=%d range=%d) of %s",
              position, range, testDataStatus)
          .isLessThanOrEqualTo(len);
      in.readFully(position, buffer, 0, range);
      totalBytesRead += range;
    }

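The new assertion is plain bounds arithmetic on the positioned read; a minimal illustrative sketch follows, where the file length, offset and range are made-up numbers rather than the real `RANDOM_IO_SEQUENCE` entries or the external file's size:

```java
// Illustrative only: readFully(pos, buffer, 0, range) stays inside the file
// when pos + range <= length; otherwise the AssertJ check above fails first,
// reporting the position, range and file status in its message.
long fileLength = 400L * 1024 * 1024;  // hypothetical external file length
long pos = 399L * 1024 * 1024;         // hypothetical read offset
int range = 1024 * 1024;               // hypothetical read length
boolean withinFile = pos + range <= fileLength;  // true here; false means the test file is too short
```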
@ -22,61 +22,30 @@

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.s3a.AbstractS3ATestBase;
import org.apache.hadoop.fs.s3a.S3AFileSystem;
import org.apache.hadoop.fs.statistics.IOStatistics;
import org.apache.hadoop.fs.s3a.performance.AbstractS3ACostTest;

import static org.apache.hadoop.fs.s3a.Constants.DEFAULT_ENDPOINT;
import static org.apache.hadoop.fs.s3a.Constants.ENDPOINT;
import static org.apache.hadoop.fs.s3a.S3ATestUtils.getLandsatCSVPath;
import static org.apache.hadoop.fs.s3a.Constants.FS_S3A_CREATE_PERFORMANCE;
import static org.apache.hadoop.fs.s3a.Statistic.STORE_IO_REQUEST;
import static org.apache.hadoop.fs.statistics.IOStatisticAssertions.assertThatStatisticCounter;

/**
 * Verify that AWS SDK statistics are wired up.
 * This test tries to read data from US-east-1 and us-west-2 buckets
 * so as to be confident that the nuances of region mapping
 * are handled correctly (HADOOP-13551).
 * The statistics are probed to verify that the wiring up is complete.
 */
public class ITestAWSStatisticCollection extends AbstractS3ATestBase {
public class ITestAWSStatisticCollection extends AbstractS3ACostTest {

  private static final Path COMMON_CRAWL_PATH
      = new Path("s3a://osm-pds/planet/planet-latest.orc");

  @Test
  public void testLandsatStatistics() throws Throwable {
    final Configuration conf = getConfiguration();
    // skips the tests if the landsat path isn't the default.
    Path path = getLandsatCSVPath(conf);
    conf.set(ENDPOINT, DEFAULT_ENDPOINT);
    conf.unset("fs.s3a.bucket.landsat-pds.endpoint");

    try (S3AFileSystem fs = (S3AFileSystem) path.getFileSystem(conf)) {
      fs.getS3AInternals().getObjectMetadata(path);
      IOStatistics iostats = fs.getIOStatistics();
      assertThatStatisticCounter(iostats,
          STORE_IO_REQUEST.getSymbol())
          .isGreaterThanOrEqualTo(1);
    }
  @Override
  public Configuration createConfiguration() {
    final Configuration conf = super.createConfiguration();
    conf.setBoolean(FS_S3A_CREATE_PERFORMANCE, true);
    return conf;
  }

  @Test
  public void testCommonCrawlStatistics() throws Throwable {
    final Configuration conf = getConfiguration();
    // skips the tests if the landsat path isn't the default.
    getLandsatCSVPath(conf);

    Path path = COMMON_CRAWL_PATH;
    conf.set(ENDPOINT, DEFAULT_ENDPOINT);

    try (S3AFileSystem fs = (S3AFileSystem) path.getFileSystem(conf)) {
      fs.getS3AInternals().getObjectMetadata(path);
      IOStatistics iostats = fs.getIOStatistics();
      assertThatStatisticCounter(iostats,
          STORE_IO_REQUEST.getSymbol())
          .isGreaterThanOrEqualTo(1);
    }
  public void testSDKMetricsCostOfGetFileStatusOnFile() throws Throwable {
    describe("performing getFileStatus on a file");
    Path simpleFile = file(methodPath());
    // and repeat on the file looking at AWS wired up stats
    verifyMetrics(() -> getFileSystem().getFileStatus(simpleFile),
        with(STORE_IO_REQUEST, 1));
  }

}

@ -18,9 +18,13 @@

package org.apache.hadoop.fs.s3a.test;

import org.junit.Assume;

import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.s3a.S3ATestConstants;
import org.apache.hadoop.fs.s3a.S3ATestUtils;

import static org.apache.hadoop.fs.s3a.S3ATestConstants.KEY_BUCKET_WITH_MANY_OBJECTS;
@ -69,6 +73,77 @@ private PublicDatasetTestUtils() {}
  private static final String DEFAULT_BUCKET_WITH_MANY_OBJECTS
      = "s3a://usgs-landsat/collection02/level-1/";

  /**
   * ORC dataset: {@value}.
   */
  private static final Path ORC_DATA = new Path("s3a://osm-pds/planet/planet-latest.orc");

  /**
   * Provide a Path for some ORC data.
   *
   * @param conf Hadoop configuration
   * @return S3A FS URI
   */
  public static Path getOrcData(Configuration conf) {
    return ORC_DATA;
  }

  /**
   * Default path for the external test file: {@value}.
   * This must be: gzipped, large enough for the performance
   * tests and in a read-only bucket with anonymous access.
   */
  public static final String DEFAULT_EXTERNAL_FILE =
      "s3a://noaa-cors-pds/raw/2023/017/ohfh/OHFH017d.23_.gz";

  /**
   * Get the external test file.
   * <p>
   * This must be: gzipped, large enough for the performance
   * tests and in a read-only bucket with anonymous access.
   * @param conf configuration
   * @return a dataset which meets the requirements.
   */
  public static Path getExternalData(Configuration conf) {
    return new Path(fetchFromConfig(conf,
        S3ATestConstants.KEY_CSVTEST_FILE, DEFAULT_EXTERNAL_FILE));
  }

  /**
   * Get the anonymous dataset.
   * @param conf configuration
   * @return a dataset which supports anonymous access.
   */
  public static Path requireAnonymousDataPath(Configuration conf) {
    return requireDefaultExternalData(conf);
  }

  /**
   * Get the external test file; assume() that it is not modified (i.e. we haven't
   * switched to a new storage infrastructure where the bucket is no longer
   * read only).
   * @param conf test configuration
   * @return test file.
   */
  public static String requireDefaultExternalDataFile(Configuration conf) {
    String filename = getExternalData(conf).toUri().toString();
    Assume.assumeTrue("External test file is not the default",
        DEFAULT_EXTERNAL_FILE.equals(filename));
    return filename;
  }

  /**
   * Get the external test file; assume() that it is not modified (i.e. we haven't
   * switched to a new storage infrastructure where the bucket is no longer
   * read only).
   * @param conf test configuration
   * @return test file as a path.
   */
  public static Path requireDefaultExternalData(Configuration conf) {
    return new Path(requireDefaultExternalDataFile(conf));
  }

  /**
   * Provide a URI for a directory containing many objects.
   *
@ -97,6 +172,13 @@ public static String getRequesterPaysObject(Configuration conf) {
        KEY_REQUESTER_PAYS_FILE, DEFAULT_REQUESTER_PAYS_FILE);
  }

  /**
   * Fetch a trimmed configuration value, require it to be non-empty.
   * @param conf configuration file
   * @param key key
   * @param defaultValue default value.
   * @return the resolved value.
   */
  private static String fetchFromConfig(Configuration conf, String key, String defaultValue) {
    String value = conf.getTrimmed(key, defaultValue);

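As a rough usage sketch (the wrapping class and `main` method are hypothetical; only the `PublicDatasetTestUtils` helpers and the `fs.s3a.scale.test.csvfile` key come from this patch): `getExternalData()` honours whatever the property points at, while `requireDefaultExternalData()` additionally assume()s the default, so a caller is skipped rather than failed when the property has been overridden.

```java
// Hypothetical caller, for illustration only.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.s3a.test.PublicDatasetTestUtils;

public class ExternalFileLookupExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();

    // Resolves fs.s3a.scale.test.csvfile, falling back to DEFAULT_EXTERNAL_FILE.
    Path externalFile = PublicDatasetTestUtils.getExternalData(conf);

    // Same lookup, but raises a JUnit assumption failure (test skipped) if the
    // property no longer points at the default read-only, anonymous-access file.
    Path defaultFile = PublicDatasetTestUtils.requireDefaultExternalData(conf);

    System.out.println(externalFile + " / " + defaultFile);
  }
}
```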
@ -30,37 +30,57 @@
    <final>false</final>
  </property>

  <!-- Per-bucket configurations: landsat-pds -->
  <!--
    Test file for some scale tests.

    A CSV file in this bucket was used for testing S3 select.
    Although this feature has been removed (HADOOP-18830),
    it is still used in some tests as a large file to read
    in a bucket without write permissions.
    These tests do not need a CSV file.
    and as a file in a bucket without write permissions.
    The original file s3a://landsat-pds/scene_list.gz is
    on a now-inaccessible bucket.
  -->
  <!--
    This is defined in PublicDatasetTestUtils;
    if needed for older builds, this can be copied into
    auth-keys along with the other bucket binding information,
    which is all exclusively defined here.

  <property>
    <name>fs.s3a.bucket.landsat-pds.endpoint.region</name>
    <value>us-west-2</value>
    <description>The region for s3a://landsat-pds</description>
    <name>fs.s3a.scale.test.csvfile</name>
    <value>s3a://noaa-cors-pds/raw/2024/001/akse/AKSE001x.24_.gz</value>
    <description>file used in scale tests</description>
  </property>
  -->

  <property>
    <name>fs.s3a.bucket.noaa-cors-pds.endpoint.region</name>
    <value>us-east-1</value>
  </property>

  <property>
    <name>fs.s3a.bucket.landsat-pds.multipart.purge</name>
    <name>fs.s3a.bucket.noaa-isd-pds.multipart.purge</name>
    <value>false</value>
    <description>Don't try to purge uploads in the read-only bucket, as
    it will only create log noise.</description>
  </property>

  <property>
    <name>fs.s3a.bucket.landsat-pds.probe</name>
    <name>fs.s3a.bucket.noaa-isd-pds.probe</name>
    <value>0</value>
    <description>Let's postpone existence checks to the first IO operation</description>
  </property>

  <property>
    <name>fs.s3a.bucket.landsat-pds.audit.add.referrer.header</name>
    <name>fs.s3a.bucket.noaa-isd-pds.audit.add.referrer.header</name>
    <value>false</value>
    <description>Do not add the referrer header to landsat operations</description>
    <description>Do not add the referrer header</description>
  </property>

  <property>
    <name>fs.s3a.bucket.noaa-isd-pds.prefetch.block.size</name>
    <value>128k</value>
    <description>Use a small prefetch size so tests fetch multiple blocks</description>
  </property>

  <!-- Per-bucket configurations: usgs-landsat -->

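For older branches that still pick the test file up from configuration, a sketch of the equivalent `auth-keys.xml` addition implied by the comment block above (values copied from the commented-out example and the `noaa-cors-pds` region entry in this file; adjust if the JIRA recommends different settings):

```xml
<!-- Sketch only: settings an older build could copy into auth-keys.xml. -->
<property>
  <name>fs.s3a.scale.test.csvfile</name>
  <value>s3a://noaa-cors-pds/raw/2024/001/akse/AKSE001x.24_.gz</value>
</property>

<property>
  <name>fs.s3a.bucket.noaa-cors-pds.endpoint.region</name>
  <value>us-east-1</value>
</property>
```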