HADOOP-17409. Remove s3guard from S3A module (#3534)

Completely removes S3Guard support from the S3A codebase.

If the connector is configured to use any metastore other than
the null and local stores (i.e. DynamoDB is selected), the s3a client
will raise an exception and refuse to initialize.

This is to ensure that there is no mix of S3Guard-enabled and disabled
deployments with the same configuration but different Hadoop releases
- it must be turned off completely.
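As a hedged illustration of the new fail-fast behaviour (the bucket name is a placeholder and the exception is caught broadly, since the exact exception type is not stated above):

```java
// Hedged sketch: after this change, selecting the DynamoDB metastore is
// expected to make S3AFileSystem initialization fail. Bucket name is a
// placeholder; the exception is caught broadly as the exact type is not
// stated in the commit message.
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class S3GuardRejectionCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.s3a.metadatastore.impl",
        "org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore");
    try {
      FileSystem.get(new URI("s3a://example-bucket/"), conf).close();
    } catch (IOException e) {
      // Expected path: the connector refuses to initialize with S3Guard configured.
      System.out.println("S3A initialization rejected: " + e);
    }
  }
}
```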

The "hadoop s3guard" command has been retained -but the supported
subcommands have been reduced to those which are not purely S3Guard
related: "bucket-info" and "uploads".

This is a major change in terms of the number of files
changed; before cherry-picking subsequent s3a patches into
older releases, this patch will probably need backporting
first.

Goodbye S3Guard, your work is done. Time to die.

Contributed by Steve Loughran.
Steve Loughran 2022-01-17 18:08:57 +00:00 committed by GitHub
parent a94e9fcbde
commit 14ba19af06
218 changed files with 1217 additions and 36694 deletions

View File

@ -1229,7 +1229,7 @@
com.amazonaws.auth.AWSCredentialsProvider.
When S3A delegation tokens are not enabled, this list will be used
to directly authenticate with S3 and DynamoDB services.
to directly authenticate with S3 and other AWS services.
When S3A Delegation tokens are enabled, depending upon the delegation
token binding it may be used
to communicate with the STS endpoint to request session/role
@ -1686,180 +1686,18 @@
</description>
</property>
<property>
<name>fs.s3a.metadatastore.authoritative</name>
<value>false</value>
<description>
When true, allow MetadataStore implementations to act as source of
truth for getting file status and directory listings. Even if this
is set to true, MetadataStore implementations may choose not to
return authoritative results. If the configured MetadataStore does
not support being authoritative, this setting will have no effect.
</description>
</property>
<property>
<name>fs.s3a.metadatastore.metadata.ttl</name>
<value>15m</value>
<description>
This value sets how long an entry in a MetadataStore is valid.
</description>
</property>
<property>
<name>fs.s3a.metadatastore.impl</name>
<value>org.apache.hadoop.fs.s3a.s3guard.NullMetadataStore</value>
<description>
Fully-qualified name of the class that implements the MetadataStore
to be used by s3a. The default class, NullMetadataStore, has no
effect: s3a will continue to treat the backing S3 service as the one
and only source of truth for file and directory metadata.
</description>
</property>
<property>
<name>fs.s3a.metadatastore.fail.on.write.error</name>
<value>true</value>
<description>
When true (default), FileSystem write operations generate
org.apache.hadoop.fs.s3a.MetadataPersistenceException if the metadata
cannot be saved to the metadata store. When false, failures to save to
metadata store are logged at ERROR level, but the overall FileSystem
write operation succeeds.
</description>
</property>
<property>
<name>fs.s3a.s3guard.cli.prune.age</name>
<value>86400000</value>
<description>
Default age (in milliseconds) after which to prune metadata from the
metadatastore when the prune command is run. Can be overridden on the
command-line.
</description>
</property>
<property>
<name>fs.s3a.impl</name>
<value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
<description>The implementation class of the S3A Filesystem</description>
</property>
<property>
<name>fs.s3a.s3guard.ddb.region</name>
<value></value>
<description>
AWS DynamoDB region to connect to. An up-to-date list is
provided in the AWS Documentation: regions and endpoints. Without this
property, the S3Guard will operate table in the associated S3 bucket region.
</description>
</property>
<property>
<name>fs.s3a.s3guard.ddb.table</name>
<value></value>
<description>
The DynamoDB table name to operate. Without this property, the respective
S3 bucket name will be used.
</description>
</property>
<property>
<name>fs.s3a.s3guard.ddb.table.create</name>
<value>false</value>
<description>
If true, the S3A client will create the table if it does not already exist.
</description>
</property>
<property>
<name>fs.s3a.s3guard.ddb.table.capacity.read</name>
<value>0</value>
<description>
Provisioned throughput requirements for read operations in terms of capacity
units for the DynamoDB table. This config value will only be used when
creating a new DynamoDB table.
If set to 0 (the default), new tables are created with "per-request" capacity.
If a positive integer is provided for this and the write capacity, then
a table with "provisioned capacity" will be created.
You can change the capacity of an existing provisioned-capacity table
through the "s3guard set-capacity" command.
</description>
</property>
<property>
<name>fs.s3a.s3guard.ddb.table.capacity.write</name>
<value>0</value>
<description>
Provisioned throughput requirements for write operations in terms of
capacity units for the DynamoDB table.
If set to 0 (the default), new tables are created with "per-request" capacity.
Refer to related configuration option fs.s3a.s3guard.ddb.table.capacity.read
</description>
</property>
<property>
<name>fs.s3a.s3guard.ddb.table.sse.enabled</name>
<value>false</value>
<description>
Whether server-side encryption (SSE) is enabled or disabled on the table.
By default it's disabled, meaning SSE is set to AWS owned CMK.
</description>
</property>
<property>
<name>fs.s3a.s3guard.ddb.table.sse.cmk</name>
<value/>
<description>
The KMS Customer Master Key (CMK) used for the KMS encryption on the table.
To specify a CMK, this config value can be its key ID, Amazon Resource Name
(ARN), alias name, or alias ARN. Users only need to provide this config if
the key is different from the default DynamoDB KMS Master Key, which is
alias/aws/dynamodb.
</description>
</property>
<property>
<name>fs.s3a.s3guard.ddb.max.retries</name>
<value>9</value>
<description>
Max retries on throttled/incompleted DynamoDB operations
before giving up and throwing an IOException.
Each retry is delayed with an exponential
backoff timer which starts at 100 milliseconds and approximately
doubles each time. The minimum wait before throwing an exception is
sum(100, 200, 400, 800, .. 100*2^N-1 ) == 100 * ((2^N)-1)
</description>
</property>
<property>
<name>fs.s3a.s3guard.ddb.throttle.retry.interval</name>
<value>100ms</value>
<description>
Initial interval to retry after a request is throttled events;
the back-off policy is exponential until the number of retries of
fs.s3a.s3guard.ddb.max.retries is reached.
</description>
</property>
<property>
<name>fs.s3a.s3guard.ddb.background.sleep</name>
<value>25ms</value>
<description>
Length (in milliseconds) of pause between each batch of deletes when
pruning metadata. Prevents prune operations (which can typically be low
priority background operations) from overly interfering with other I/O
operations.
</description>
</property>
<property>
<name>fs.s3a.retry.limit</name>
<value>7</value>
<description>
Number of times to retry any repeatable S3 client request on failure,
excluding throttling requests and S3Guard inconsistency resolution.
excluding throttling requests.
</description>
</property>
@ -1868,7 +1706,7 @@
<value>500ms</value>
<description>
Initial retry interval when retrying operations for any reason other
than S3 throttle errors and S3Guard inconsistency resolution.
than S3 throttle errors.
</description>
</property>
@ -1891,27 +1729,6 @@
</description>
</property>
<property>
<name>fs.s3a.s3guard.consistency.retry.limit</name>
<value>7</value>
<description>
Number of times to retry attempts to read/open/copy files when
S3Guard believes a specific version of the file to be available,
but the S3 request does not find any version of a file, or a different
version.
</description>
</property>
<property>
<name>fs.s3a.s3guard.consistency.retry.interval</name>
<value>2s</value>
<description>
Initial interval between attempts to retry operations while waiting for S3
to become consistent with the S3Guard data.
An exponential back-off is used here: every failure doubles the delay.
</description>
</property>
<property>
<name>fs.s3a.committer.name</name>
<value>file</value>

View File

@ -137,7 +137,8 @@ internal state stores:
* The internal MapReduce state data will remain compatible across minor releases within the same major version to facilitate rolling upgrades while MapReduce workloads execute.
* HDFS maintains metadata about the data stored in HDFS in a private, internal format that is versioned. In the event of an incompatible change, the store's version number will be incremented. When upgrading an existing cluster, the metadata store will automatically be upgraded if possible. After the metadata store has been upgraded, it is always possible to reverse the upgrade process.
* The AWS S3A guard keeps a private, internal metadata store that is versioned. Incompatible changes will cause the version number to be incremented. If an upgrade requires reformatting the store, it will be indicated in the release notes.
* The AWS S3A guard kept a private, internal metadata store.
Now that the feature has been removed, the store is obsolete and can be deleted.
* The YARN resource manager keeps a private, internal state store of application and scheduler information that is versioned. Incompatible changes will cause the version number to be incremented. If an upgrade requires reformatting the store, it will be indicated in the release notes.
* The YARN node manager keeps a private, internal state store of application information that is versioned. Incompatible changes will cause the version number to be incremented. If an upgrade requires reformatting the store, it will be indicated in the release notes.
* The YARN federation service keeps a private, internal state store of application and cluster information that is versioned. Incompatible changes will cause the version number to be incremented. If an upgrade requires reformatting the store, it will be indicated in the release notes.

View File

@ -477,19 +477,12 @@ rolled back to the older layout.
##### AWS S3A Guard Metadata
For each operation in the Hadoop S3 client (s3a) that reads or modifies
file metadata, a shadow copy of that file metadata is stored in a separate
metadata store, which offers HDFS-like consistency for the metadata, and may
also provide faster lookups for things like file status or directory listings.
S3A guard tables are created with a version marker which indicates
compatibility.
The S3Guard metastore used to store metadata in DynamoDB tables;
as such it had to maintain a compatibility strategy.
Now that S3Guard is removed, the tables are not needed.
###### Policy
The S3A guard metadata schema SHALL be considered
[Private](./InterfaceClassification.html#Private) and
[Unstable](./InterfaceClassification.html#Unstable). Any incompatible change
to the schema MUST result in the version number of the schema being incremented.
Applications configured to use an S3A metadata store other than
the "null" store will fail.
##### YARN Resource Manager State Store

View File

@ -343,7 +343,7 @@ stores pretend that they are a FileSystem, a FileSystem with the same
features and operations as HDFS. This is &mdash;ultimately&mdash;a pretence:
they have different characteristics and occasionally the illusion fails.
1. **Consistency**. Object stores are generally *Eventually Consistent*: it
1. **Consistency**. Object stores may be *Eventually Consistent*: it
can take time for changes to objects &mdash;creation, deletion and updates&mdash;
to become visible to all callers. Indeed, there is no guarantee a change is
immediately visible to the client which just made the change. As an example,
@ -447,10 +447,6 @@ Object stores have an even vaguer view of time, which can be summarized as
* The timestamp is likely to be in UTC or the TZ of the object store. If the
client is in a different timezone, the timestamp of objects may be ahead or
behind that of the client.
* Object stores with cached metadata databases (for example: AWS S3 with
an in-memory or a DynamoDB metadata store) may have timestamps generated
from the local system clock, rather than that of the service.
This is an optimization to avoid round-trip calls to the object stores.
+ A file's modification time is often the same as its creation time.
+ The `FileSystem.setTimes()` operation to set file timestamps *may* be ignored.
* `FileSystem.chmod()` may update modification times (example: Azure `wasb://`).

View File

@ -203,16 +203,6 @@ in both the task configuration and as a Java option.
Existing configs that already specify both are not affected by this change.
See the full release notes of MAPREDUCE-5785 for more details.
S3Guard: Consistency and Metadata Caching for the S3A filesystem client
---------------------
[HADOOP-13345](https://issues.apache.org/jira/browse/HADOOP-13345) adds an
optional feature to the S3A client of Amazon S3 storage: the ability to use
a DynamoDB table as a fast and consistent store of file and directory
metadata.
See [S3Guard](./hadoop-aws/tools/hadoop-aws/s3guard.html) for more details.
HDFS Router-Based Federation
---------------------
HDFS Router-Based Federation adds a RPC routing layer that provides a federated

View File

@ -29,20 +29,6 @@
<Bug pattern="RCN_REDUNDANT_NULLCHECK_OF_NONNULL_VALUE" />
</Match>
<!--
This extends the serializable S3Object, so findbug checks
serializability. It is never serialized however, so its
warnings are false positives.
-->
<Match>
<Class name="org.apache.hadoop.fs.s3a.InconsistentS3Object" />
<Bug pattern="SE_TRANSIENT_FIELD_NOT_RESTORED" />
</Match>
<Match>
<Class name="org.apache.hadoop.fs.s3a.InconsistentS3Object" />
<Bug pattern="SE_NO_SERIALVERSIONID" />
</Match>
<!--
findbugs gets confused by lambda expressions in synchronized methods
and considers references to fields to be unsynchronized.

View File

@ -45,10 +45,7 @@
<fs.s3a.scale.test.huge.partitionsize>unset</fs.s3a.scale.test.huge.partitionsize>
<!-- Timeout in seconds for scale tests.-->
<fs.s3a.scale.test.timeout>3600</fs.s3a.scale.test.timeout>
<!-- are scale tests enabled ? -->
<fs.s3a.s3guard.test.enabled>false</fs.s3a.s3guard.test.enabled>
<fs.s3a.s3guard.test.authoritative>false</fs.s3a.s3guard.test.authoritative>
<fs.s3a.s3guard.test.implementation>local</fs.s3a.s3guard.test.implementation>
<!-- Set a longer timeout for integration test (in milliseconds) -->
<test.integration.timeout>200000</test.integration.timeout>
@ -166,10 +163,6 @@
<fs.s3a.scale.test.huge.filesize>${fs.s3a.scale.test.huge.filesize}</fs.s3a.scale.test.huge.filesize>
<fs.s3a.scale.test.huge.huge.partitionsize>${fs.s3a.scale.test.huge.partitionsize}</fs.s3a.scale.test.huge.huge.partitionsize>
<fs.s3a.scale.test.timeout>${fs.s3a.scale.test.timeout}</fs.s3a.scale.test.timeout>
<!-- S3Guard -->
<fs.s3a.s3guard.test.enabled>${fs.s3a.s3guard.test.enabled}</fs.s3a.s3guard.test.enabled>
<fs.s3a.s3guard.test.authoritative>${fs.s3a.s3guard.test.authoritative}</fs.s3a.s3guard.test.authoritative>
<fs.s3a.s3guard.test.implementation>${fs.s3a.s3guard.test.implementation}</fs.s3a.s3guard.test.implementation>
<fs.s3a.directory.marker.retention>${fs.s3a.directory.marker.retention}</fs.s3a.directory.marker.retention>
<test.default.timeout>${test.integration.timeout}</test.default.timeout>
@ -193,14 +186,10 @@
<exclude>**/ITestS3AFileContextStatistics.java</exclude>
<exclude>**/ITestS3AEncryptionSSEC*.java</exclude>
<exclude>**/ITestS3AHuge*.java</exclude>
<!-- this sets out to overlaod DynamoDB, so must be run standalone -->
<exclude>**/ITestDynamoDBMetadataStoreScale.java</exclude>
<!-- Terasort MR jobs spawn enough processes that they use up all RAM -->
<exclude>**/ITestTerasort*.java</exclude>
<!-- Root marker tool tests -->
<exclude>**/ITestMarkerToolRootOperations.java</exclude>
<!-- operations across the metastore -->
<exclude>**/ITestS3GuardDDBRootOperations.java</exclude>
<!-- leave this until the end for better statistics -->
<exclude>**/ITestAggregateIOStatistics.java</exclude>
</excludes>
@ -223,10 +212,6 @@
<fs.s3a.scale.test.huge.filesize>${fs.s3a.scale.test.huge.filesize}</fs.s3a.scale.test.huge.filesize>
<fs.s3a.scale.test.huge.huge.partitionsize>${fs.s3a.scale.test.huge.partitionsize}</fs.s3a.scale.test.huge.huge.partitionsize>
<fs.s3a.scale.test.timeout>${fs.s3a.scale.test.timeout}</fs.s3a.scale.test.timeout>
<!-- S3Guard -->
<fs.s3a.s3guard.test.enabled>${fs.s3a.s3guard.test.enabled}</fs.s3a.s3guard.test.enabled>
<fs.s3a.s3guard.test.implementation>${fs.s3a.s3guard.test.implementation}</fs.s3a.s3guard.test.implementation>
<fs.s3a.s3guard.test.authoritative>${fs.s3a.s3guard.test.authoritative}</fs.s3a.s3guard.test.authoritative>
<!-- Markers-->
<fs.s3a.directory.marker.retention>${fs.s3a.directory.marker.retention}</fs.s3a.directory.marker.retention>
<fs.s3a.directory.marker.audit>${fs.s3a.directory.marker.audit}</fs.s3a.directory.marker.audit>
@ -239,8 +224,6 @@
<include>**/ITestS3AHuge*.java</include>
<!-- SSE encrypted files confuse everything else -->
<include>**/ITestS3AEncryptionSSEC*.java</include>
<!-- this sets out to overlaod DynamoDB, so must be run standalone -->
<include>**/ITestDynamoDBMetadataStoreScale.java</include>
<!-- the terasort tests both work with a file in the same path in -->
<!-- the local FS. Running them sequentially guarantees isolation -->
<!-- and that they don't conflict with the other MR jobs for RAM -->
@ -249,9 +232,8 @@
<!-- MUST be run before the other root ops so there's
more likelihood of files in the bucket -->
<include>**/ITestMarkerToolRootOperations.java</include>
<!-- operations across the metastore -->
<!-- operations on the root dir -->
<include>**/ITestS3AContractRootDir.java</include>
<include>**/ITestS3GuardDDBRootOperations.java</include>
<!-- leave this until the end for better statistics -->
<include>**/ITestAggregateIOStatistics.java</include>
</includes>
@ -286,10 +268,6 @@
<fs.s3a.scale.test.enabled>${fs.s3a.scale.test.enabled}</fs.s3a.scale.test.enabled>
<fs.s3a.scale.test.huge.filesize>${fs.s3a.scale.test.huge.filesize}</fs.s3a.scale.test.huge.filesize>
<fs.s3a.scale.test.timeout>${fs.s3a.scale.test.timeout}</fs.s3a.scale.test.timeout>
<!-- S3Guard -->
<fs.s3a.s3guard.test.enabled>${fs.s3a.s3guard.test.enabled}</fs.s3a.s3guard.test.enabled>
<fs.s3a.s3guard.test.implementation>${fs.s3a.s3guard.test.implementation}</fs.s3a.s3guard.test.implementation>
<fs.s3a.s3guard.test.authoritative>${fs.s3a.s3guard.test.authoritative}</fs.s3a.s3guard.test.authoritative>
<!-- Markers-->
<fs.s3a.directory.marker.retention>${fs.s3a.directory.marker.retention}</fs.s3a.directory.marker.retention>
<fs.s3a.directory.marker.audit>${fs.s3a.directory.marker.audit}</fs.s3a.directory.marker.audit>
@ -316,46 +294,6 @@
</properties>
</profile>
<!-- Turn on S3Guard tests-->
<profile>
<id>s3guard</id>
<activation>
<property>
<name>s3guard</name>
</property>
</activation>
<properties >
<fs.s3a.s3guard.test.enabled>true</fs.s3a.s3guard.test.enabled>
</properties>
</profile>
<!-- Switch to DynamoDB for S3Guard. Has no effect unless S3Guard is enabled -->
<profile>
<id>dynamo</id>
<activation>
<property>
<name>dynamo</name>
</property>
</activation>
<properties >
<fs.s3a.s3guard.test.implementation>dynamo</fs.s3a.s3guard.test.implementation>
</properties>
</profile>
<!-- Switch S3Guard from Authoritative=false to true
Has no effect unless S3Guard is enabled -->
<profile>
<id>auth</id>
<activation>
<property>
<name>auth</name>
</property>
</activation>
<properties >
<fs.s3a.s3guard.test.authoritative>true</fs.s3a.s3guard.test.authoritative>
</properties>
</profile>
<!-- Directory marker retention options, all from the -Dmarkers value-->
<profile>
<id>keep-markers</id>

View File

@ -25,11 +25,14 @@
import java.util.concurrent.TimeUnit;
/**
* All the constants used with the {@link S3AFileSystem}.
* Constants used with the {@link S3AFileSystem}.
*
* Some of the strings are marked as {@code Unstable}. This means
* that they may be unsupported in future; at which point they will be marked
* that they may be Unsupported in future; at which point they will be marked
* as deprecated and simply ignored.
*
* All S3Guard related constants are marked as Deprecated and either ignored (ddb config)
* or rejected (setting the metastore to anything other than the null store)
*/
@InterfaceAudience.Public
@InterfaceStability.Evolving
@ -130,7 +133,7 @@ private Constants() {
/**
* JSON policy containing the policy to apply to the role: {@value}.
* This is not used for delegation tokens, which generate the policy
* automatically, and restrict it to the S3, KMS and S3Guard services
* automatically, and restrict it to the S3 and KMS services
* needed.
*/
public static final String ASSUMED_ROLE_POLICY =
@ -494,20 +497,17 @@ private Constants() {
public static final String CUSTOM_SIGNERS = "fs.s3a.custom.signers";
/**
* There's 3 parameters that can be used to specify a non-default signing
* Multiple parameters can be used to specify a non-default signing
* algorithm.<br>
* fs.s3a.signing-algorithm - This property has existed for the longest time.
* If specified, without either of the other 2 properties being specified,
* this signing algorithm will be used for S3 and DDB (S3Guard). <br>
* The other 2 properties override this value for S3 or DDB. <br>
* If specified, without other properties being specified,
* this signing algorithm will be used for all services. <br>
* Another property overrides this value for S3. <br>
* fs.s3a.s3.signing-algorithm - Allows overriding the S3 Signing algorithm.
* This does not affect DDB. Specifying this property without specifying
* Specifying this property without specifying
* fs.s3a.signing-algorithm will only update the signing algorithm for S3
* requests, and the default will be used for DDB.<br>
* fs.s3a.ddb.signing-algorithm - Allows overriding the DDB Signing algorithm.
* This does not affect S3. Specifying this property without specifying
* fs.s3a.signing-algorithm will only update the signing algorithm for
* DDB requests, and the default will be used for S3.
* requests.
* {@code fs.s3a.sts.signing-algorithm}: algorithm to use for STS interaction.
*/
public static final String SIGNING_ALGORITHM = "fs.s3a.signing-algorithm";
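A hedged sketch of how the per-service overrides described above could be set programmatically; the signer name "AWSS3V4SignerType" is a standard AWS SDK v1 signer used purely as a placeholder value.

```java
// Hedged sketch of the signing overrides described above; real deployments
// would typically vary the signer per service or register a custom signer.
import org.apache.hadoop.conf.Configuration;

public final class SigningConfigSketch {
  public static Configuration signingConf() {
    Configuration conf = new Configuration();
    conf.set("fs.s3a.signing-algorithm", "AWSS3V4SignerType");     // fallback for all services
    conf.set("fs.s3a.s3.signing-algorithm", "AWSS3V4SignerType");  // S3 requests only
    conf.set("fs.s3a.sts.signing-algorithm", "AWSS3V4SignerType"); // STS interaction only
    return conf;
  }
}
```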
@ -515,6 +515,7 @@ private Constants() {
"fs.s3a." + Constants.AWS_SERVICE_IDENTIFIER_S3.toLowerCase()
+ ".signing-algorithm";
@Deprecated
public static final String SIGNING_ALGORITHM_DDB =
"fs.s3a." + Constants.AWS_SERVICE_IDENTIFIER_DDB.toLowerCase()
+ "signing-algorithm";
@ -540,13 +541,23 @@ private Constants() {
public static final String USER_AGENT_PREFIX = "fs.s3a.user.agent.prefix";
/** Whether or not to allow MetadataStore to be source of truth for a path prefix */
/**
* Paths considered "authoritative".
* When S3Guard was supported, this skipped checks to S3 on directory listings.
* It is also used to optionally disable marker retention purely on these
* paths - a feature which is still retained/available.
*/
public static final String AUTHORITATIVE_PATH = "fs.s3a.authoritative.path";
public static final String[] DEFAULT_AUTHORITATIVE_PATH = {};
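A hedged sketch of the retained use of authoritative paths for marker retention; the path values are placeholders and the "authoritative" retention policy value is assumed from the existing marker-retention feature.

```java
// Hedged sketch: with S3Guard removed, fs.s3a.authoritative.path is only
// consulted for directory marker retention on the listed path prefixes.
import org.apache.hadoop.conf.Configuration;

public final class AuthoritativePathMarkerSketch {
  public static Configuration markerConf() {
    Configuration conf = new Configuration();
    // keep directory markers only under the declared "authoritative" paths
    conf.set("fs.s3a.directory.marker.retention", "authoritative");
    conf.set("fs.s3a.authoritative.path", "/tables,/warehouse");
    return conf;
  }
}
```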
/** Whether or not to allow MetadataStore to be source of truth. */
/**
* Whether or not to allow MetadataStore to be source of truth.
* @deprecated no longer supported
*/
@Deprecated
public static final String METADATASTORE_AUTHORITATIVE =
"fs.s3a.metadatastore.authoritative";
@Deprecated
public static final boolean DEFAULT_METADATASTORE_AUTHORITATIVE = false;
/**
@ -565,13 +576,16 @@ private Constants() {
/**
* How long a directory listing in the MS is considered as authoritative.
* @deprecated no longer supported
*/
@Deprecated
public static final String METADATASTORE_METADATA_TTL =
"fs.s3a.metadatastore.metadata.ttl";
/**
* Default TTL in milliseconds: 15 minutes.
*/
@Deprecated
public static final long DEFAULT_METADATASTORE_METADATA_TTL =
TimeUnit.MINUTES.toMillis(15);
@ -635,202 +649,117 @@ private Constants() {
@InterfaceAudience.Private
public static final int MAX_MULTIPART_COUNT = 10000;
/* Constants. */
/*
* Obsolete S3Guard-related options, retained purely because this file
* is @Public/@Evolving.
*/
@Deprecated
public static final String S3_METADATA_STORE_IMPL =
"fs.s3a.metadatastore.impl";
/**
* Whether to fail when there is an error writing to the metadata store.
*/
@Deprecated
public static final String FAIL_ON_METADATA_WRITE_ERROR =
"fs.s3a.metadatastore.fail.on.write.error";
/**
* Default value ({@value}) for FAIL_ON_METADATA_WRITE_ERROR.
*/
@Deprecated
public static final boolean FAIL_ON_METADATA_WRITE_ERROR_DEFAULT = true;
/** Minimum period of time (in milliseconds) to keep metadata (may only be
* applied when a prune command is manually run).
*/
@InterfaceStability.Unstable
@Deprecated
public static final String S3GUARD_CLI_PRUNE_AGE =
"fs.s3a.s3guard.cli.prune.age";
/**
* The region of the DynamoDB service.
*
* This config has no default value. If the user does not set this, the
* S3Guard will operate table in the associated S3 bucket region.
*/
@Deprecated
public static final String S3GUARD_DDB_REGION_KEY =
"fs.s3a.s3guard.ddb.region";
/**
* The DynamoDB table name to use.
*
* This config has no default value. If the user does not set this, the
* S3Guard implementation will use the respective S3 bucket name.
*/
@Deprecated
public static final String S3GUARD_DDB_TABLE_NAME_KEY =
"fs.s3a.s3guard.ddb.table";
/**
* A prefix for adding tags to the DDB Table upon creation.
*
* For example:
* fs.s3a.s3guard.ddb.table.tag.mytag
*/
@Deprecated
public static final String S3GUARD_DDB_TABLE_TAG =
"fs.s3a.s3guard.ddb.table.tag.";
/**
* Whether to create the DynamoDB table if the table does not exist.
* Value: {@value}.
*/
@Deprecated
public static final String S3GUARD_DDB_TABLE_CREATE_KEY =
"fs.s3a.s3guard.ddb.table.create";
/**
* Read capacity when creating a table.
* When it and the write capacity are both "0", a per-request table is
* created.
* Value: {@value}.
*/
@Deprecated
public static final String S3GUARD_DDB_TABLE_CAPACITY_READ_KEY =
"fs.s3a.s3guard.ddb.table.capacity.read";
/**
* Default read capacity when creating a table.
* Value: {@value}.
*/
@Deprecated
public static final long S3GUARD_DDB_TABLE_CAPACITY_READ_DEFAULT = 0;
/**
* Write capacity when creating a table.
* When it and the read capacity are both "0", a per-request table is
* created.
* Value: {@value}.
*/
@Deprecated
public static final String S3GUARD_DDB_TABLE_CAPACITY_WRITE_KEY =
"fs.s3a.s3guard.ddb.table.capacity.write";
/**
* Default write capacity when creating a table.
* Value: {@value}.
*/
@Deprecated
public static final long S3GUARD_DDB_TABLE_CAPACITY_WRITE_DEFAULT = 0;
/**
* Whether server-side encryption (SSE) is enabled or disabled on the table.
* By default it's disabled, meaning SSE is set to AWS owned CMK.
* @see com.amazonaws.services.dynamodbv2.model.SSESpecification#setEnabled
*/
@Deprecated
public static final String S3GUARD_DDB_TABLE_SSE_ENABLED =
"fs.s3a.s3guard.ddb.table.sse.enabled";
/**
* The KMS Master Key (CMK) used for the KMS encryption on the table.
*
* To specify a CMK, this config value can be its key ID, Amazon Resource
* Name (ARN), alias name, or alias ARN. Users only provide this config
* if the key is different from the default DynamoDB KMS Master Key, which is
* alias/aws/dynamodb.
*/
@Deprecated
public static final String S3GUARD_DDB_TABLE_SSE_CMK =
"fs.s3a.s3guard.ddb.table.sse.cmk";
/**
* The maximum put or delete requests per BatchWriteItem request.
*
* Refer to Amazon API reference for this limit.
*/
@Deprecated
public static final int S3GUARD_DDB_BATCH_WRITE_REQUEST_LIMIT = 25;
@Deprecated
public static final String S3GUARD_DDB_MAX_RETRIES =
"fs.s3a.s3guard.ddb.max.retries";
/**
* Max retries on batched/throttled DynamoDB operations before giving up and
* throwing an IOException. Default is {@value}. See core-default.xml for
* more detail.
*/
@Deprecated
public static final int S3GUARD_DDB_MAX_RETRIES_DEFAULT =
DEFAULT_MAX_ERROR_RETRIES;
@Deprecated
public static final String S3GUARD_DDB_THROTTLE_RETRY_INTERVAL =
"fs.s3a.s3guard.ddb.throttle.retry.interval";
@Deprecated
public static final String S3GUARD_DDB_THROTTLE_RETRY_INTERVAL_DEFAULT =
"100ms";
/**
* Period of time (in milliseconds) to sleep between batches of writes.
* Currently only applies to prune operations, as they are naturally a
* lower priority than other operations.
*/
@Deprecated
@InterfaceStability.Unstable
public static final String S3GUARD_DDB_BACKGROUND_SLEEP_MSEC_KEY =
"fs.s3a.s3guard.ddb.background.sleep";
@Deprecated
public static final int S3GUARD_DDB_BACKGROUND_SLEEP_MSEC_DEFAULT = 25;
/**
* The default "Null" metadata store: {@value}.
*/
@Deprecated
public static final String S3GUARD_METASTORE_NULL
= "org.apache.hadoop.fs.s3a.s3guard.NullMetadataStore";
/**
* Use Local memory for the metadata: {@value}.
* This is not coherent across processes and must be used for testing only.
*/
@Deprecated
@InterfaceStability.Unstable
public static final String S3GUARD_METASTORE_LOCAL
= "org.apache.hadoop.fs.s3a.s3guard.LocalMetadataStore";
/**
* Maximum number of records in LocalMetadataStore.
*/
@InterfaceStability.Unstable
@Deprecated
public static final String S3GUARD_METASTORE_LOCAL_MAX_RECORDS =
"fs.s3a.s3guard.local.max_records";
@Deprecated
public static final int DEFAULT_S3GUARD_METASTORE_LOCAL_MAX_RECORDS = 256;
/**
* Time to live in milliseconds in LocalMetadataStore.
* If zero, time-based expiration is disabled.
*/
@InterfaceStability.Unstable
@Deprecated
public static final String S3GUARD_METASTORE_LOCAL_ENTRY_TTL =
"fs.s3a.s3guard.local.ttl";
@Deprecated
public static final int DEFAULT_S3GUARD_METASTORE_LOCAL_ENTRY_TTL
= 60 * 1000;
/**
* Use DynamoDB for the metadata: {@value}.
*/
@Deprecated
public static final String S3GUARD_METASTORE_DYNAMO
= "org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore";
/**
* The warn level if S3Guard is disabled.
*/
@Deprecated
public static final String S3GUARD_DISABLED_WARN_LEVEL
= "fs.s3a.s3guard.disabled.warn.level";
@Deprecated
public static final String DEFAULT_S3GUARD_DISABLED_WARN_LEVEL =
"SILENT";
/**
* Inconsistency (visibility delay) injection settings.
* No longer used.
*/
@InterfaceStability.Unstable
@Deprecated
public static final String FAIL_INJECT_INCONSISTENCY_KEY =
"fs.s3a.failinject.inconsistency.key.substring";
@InterfaceStability.Unstable
@Deprecated
public static final String FAIL_INJECT_INCONSISTENCY_MSEC =
"fs.s3a.failinject.inconsistency.msec";
@InterfaceStability.Unstable
@Deprecated
public static final String FAIL_INJECT_INCONSISTENCY_PROBABILITY =
"fs.s3a.failinject.inconsistency.probability";
@ -990,17 +919,20 @@ private Constants() {
* Number of times to retry any repeatable S3 client request on failure,
* excluding throttling requests: {@value}.
*/
@Deprecated
public static final String S3GUARD_CONSISTENCY_RETRY_LIMIT =
"fs.s3a.s3guard.consistency.retry.limit";
/**
* Default retry limit: {@value}.
*/
@Deprecated
public static final int S3GUARD_CONSISTENCY_RETRY_LIMIT_DEFAULT = 7;
/**
* Initial retry interval: {@value}.
*/
@Deprecated
public static final String S3GUARD_CONSISTENCY_RETRY_INTERVAL =
"fs.s3a.s3guard.consistency.retry.interval";
@ -1010,10 +942,12 @@ private Constants() {
* each probe can cause the S3 load balancers to retain any 404 in
* its cache for longer. See HADOOP-16490.
*/
@Deprecated
public static final String S3GUARD_CONSISTENCY_RETRY_INTERVAL_DEFAULT =
"2s";
public static final String AWS_SERVICE_IDENTIFIER_S3 = "S3";
@Deprecated
public static final String AWS_SERVICE_IDENTIFIER_DDB = "DDB";
public static final String AWS_SERVICE_IDENTIFIER_STS = "STS";

View File

@ -28,11 +28,6 @@
/**
* Simple object which stores current failure injection settings.
* "Delaying a key" can mean:
* - Removing it from the S3 client's listings while delay is in effect.
* - Causing input stream reads to fail.
* - Causing the S3 side of getFileStatus(), i.e.
* AmazonS3#getObjectMetadata(), to throw FileNotFound.
*/
public class FailureInjectionPolicy {
/**
@ -40,29 +35,9 @@ public class FailureInjectionPolicy {
*/
public static final String DEFAULT_DELAY_KEY_SUBSTRING = "DELAY_LISTING_ME";
/**
* How many seconds affected keys will have delayed visibility.
* This should probably be a config value.
*/
public static final long DEFAULT_DELAY_KEY_MSEC = 5 * 1000;
public static final float DEFAULT_DELAY_KEY_PROBABILITY = 1.0f;
/** Special config value since we can't store empty strings in XML. */
public static final String MATCH_ALL_KEYS = "*";
private static final Logger LOG =
LoggerFactory.getLogger(InconsistentAmazonS3Client.class);
/** Empty string matches all keys. */
private String delayKeySubstring;
/** Probability to delay visibility of a matching key. */
private float delayKeyProbability;
/** Time in milliseconds to delay visibility of newly modified object. */
private long delayKeyMsec;
/**
* Probability of throttling a request.
*/
@ -75,33 +50,10 @@ public class FailureInjectionPolicy {
public FailureInjectionPolicy(Configuration conf) {
this.delayKeySubstring = conf.get(FAIL_INJECT_INCONSISTENCY_KEY,
DEFAULT_DELAY_KEY_SUBSTRING);
// "" is a substring of all strings, use it to match all keys.
if (this.delayKeySubstring.equals(MATCH_ALL_KEYS)) {
this.delayKeySubstring = "";
}
this.delayKeyProbability = validProbability(
conf.getFloat(FAIL_INJECT_INCONSISTENCY_PROBABILITY,
DEFAULT_DELAY_KEY_PROBABILITY));
this.delayKeyMsec = conf.getLong(FAIL_INJECT_INCONSISTENCY_MSEC,
DEFAULT_DELAY_KEY_MSEC);
this.setThrottleProbability(conf.getFloat(FAIL_INJECT_THROTTLE_PROBABILITY,
0.0f));
}
public String getDelayKeySubstring() {
return delayKeySubstring;
}
public float getDelayKeyProbability() {
return delayKeyProbability;
}
public long getDelayKeyMsec() {
return delayKeyMsec;
}
public float getThrottleProbability() {
return throttleProbability;
}
@ -126,25 +78,10 @@ public static boolean trueWithProbability(float p) {
return Math.random() < p;
}
/**
* Should we delay listing visibility for this key?
* @param key key which is being put
* @return true if we should delay
*/
public boolean shouldDelay(String key) {
float p = getDelayKeyProbability();
boolean delay = key.contains(getDelayKeySubstring());
delay = delay && trueWithProbability(p);
LOG.debug("{}, p={} -> {}", key, p, delay);
return delay;
}
@Override
public String toString() {
return String.format("FailureInjectionPolicy:" +
" %s msec delay, substring %s, delay probability %s;" +
" throttle probability %s" + "; failure limit %d",
delayKeyMsec, delayKeySubstring, delayKeyProbability,
throttleProbability, failureLimit);
}

View File

@ -18,13 +18,8 @@
package org.apache.hadoop.fs.s3a;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.Collectors;
import com.amazonaws.AmazonClientException;
import com.amazonaws.AmazonServiceException;
@ -60,12 +55,13 @@
import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
/**
* A wrapper around {@link com.amazonaws.services.s3.AmazonS3} that injects
* inconsistency and/or errors. Used for testing S3Guard.
* Currently only delays listing visibility, not affecting GET.
* failures.
* It used to also inject inconsistency, but this was removed with S3Guard;
* what is retained is the ability to throttle AWS operations and for the
* input stream to be inconsistent.
*/
@InterfaceAudience.Private
@InterfaceStability.Unstable
@ -81,38 +77,6 @@ public class InconsistentAmazonS3Client extends AmazonS3Client {
*/
private final AtomicLong failureCounter = new AtomicLong(0);
/**
* Composite of data we need to track about recently deleted objects:
* when it was deleted (same was with recently put objects) and the object
* summary (since we should keep returning it for sometime after its
* deletion).
*/
private static class Delete {
private Long time;
private S3ObjectSummary summary;
Delete(Long time, S3ObjectSummary summary) {
this.time = time;
this.summary = summary;
}
public Long time() {
return time;
}
public S3ObjectSummary summary() {
return summary;
}
}
/**
* Map of key to delay -> time it was deleted + object summary (object summary
* is null for prefixes.
*/
private Map<String, Delete> delayedDeletes = new HashMap<>();
/** Map of key to delay -> time it was created. */
private Map<String, Long> delayedPutKeys = new HashMap<>();
/**
* Instantiate.
@ -130,19 +94,6 @@ public InconsistentAmazonS3Client(AWSCredentialsProvider credentials,
policy = new FailureInjectionPolicy(conf);
}
/**
* Clear any accumulated inconsistency state. Used by tests to make paths
* visible again.
* @param fs S3AFileSystem under test
* @throws Exception on failure
*/
public static void clearInconsistency(S3AFileSystem fs) throws Exception {
AmazonS3 s3 = fs.getAmazonS3ClientForTesting("s3guard");
InconsistentAmazonS3Client ic = InconsistentAmazonS3Client.castFrom(s3);
ic.clearInconsistency();
}
/**
* A way for tests to patch in a different fault injection policy at runtime.
* @param fs filesystem under test
@ -166,17 +117,6 @@ public String toString() {
policy, failureCounter.get());
}
/**
* Clear all oustanding inconsistent keys. After calling this function,
* listings should behave normally (no failure injection), until additional
* keys are matched for delay, e.g. via putObject(), deleteObject().
*/
public void clearInconsistency() {
LOG.info("clearing all delayed puts / deletes");
delayedDeletes.clear();
delayedPutKeys.clear();
}
/**
* Convenience function for test code to cast from supertype.
* @param c supertype to cast from
@ -199,12 +139,6 @@ public DeleteObjectsResult deleteObjects(DeleteObjectsRequest
deleteObjectsRequest)
throws AmazonClientException, AmazonServiceException {
maybeFail();
LOG.info("registering bulk delete of objects");
for (DeleteObjectsRequest.KeyVersion keyVersion :
deleteObjectsRequest.getKeys()) {
registerDeleteObject(keyVersion.getKey(),
deleteObjectsRequest.getBucketName());
}
return super.deleteObjects(deleteObjectsRequest);
}
@ -214,7 +148,6 @@ public void deleteObject(DeleteObjectRequest deleteObjectRequest)
String key = deleteObjectRequest.getKey();
LOG.debug("key {}", key);
maybeFail();
registerDeleteObject(key, deleteObjectRequest.getBucketName());
super.deleteObject(deleteObjectRequest);
}
@ -224,7 +157,6 @@ public PutObjectResult putObject(PutObjectRequest putObjectRequest)
throws AmazonClientException, AmazonServiceException {
LOG.debug("key {}", putObjectRequest.getKey());
maybeFail();
registerPutObject(putObjectRequest);
return super.putObject(putObjectRequest);
}
@ -233,283 +165,17 @@ public PutObjectResult putObject(PutObjectRequest putObjectRequest)
public ObjectListing listObjects(ListObjectsRequest listObjectsRequest)
throws AmazonClientException, AmazonServiceException {
maybeFail();
return innerlistObjects(listObjectsRequest);
return super.listObjects(listObjectsRequest);
}
/**
* Run the list object call without any failure probability.
* This stops a very aggressive failure rate from completely overloading
* the retry logic.
* @param listObjectsRequest request
* @return listing
* @throws AmazonClientException failure
*/
private ObjectListing innerlistObjects(ListObjectsRequest listObjectsRequest)
throws AmazonClientException, AmazonServiceException {
LOG.debug("prefix {}", listObjectsRequest.getPrefix());
ObjectListing listing = super.listObjects(listObjectsRequest);
listing = filterListObjects(listing);
listing = restoreListObjects(listObjectsRequest, listing);
return listing;
}
/* We should only need to override these versions of listObjects() */
/* consistent listing with possibility of failing. */
@Override
public ListObjectsV2Result listObjectsV2(ListObjectsV2Request request)
throws AmazonClientException, AmazonServiceException {
maybeFail();
return innerListObjectsV2(request);
return super.listObjectsV2(request);
}
/**
* Non failing V2 list object request.
* @param request request
* @return result.
*/
private ListObjectsV2Result innerListObjectsV2(ListObjectsV2Request request) {
LOG.debug("prefix {}", request.getPrefix());
ListObjectsV2Result listing = super.listObjectsV2(request);
listing = filterListObjectsV2(listing);
listing = restoreListObjectsV2(request, listing);
return listing;
}
private void addSummaryIfNotPresent(List<S3ObjectSummary> list,
S3ObjectSummary item) {
// Behavior of S3ObjectSummary
String key = item.getKey();
if (list.stream().noneMatch((member) -> member.getKey().equals(key))) {
LOG.debug("Reinstate summary {}", key);
list.add(item);
}
}
/**
* Add prefix of child to given list. The added prefix will be equal to
* ancestor plus one directory past ancestor. e.g.:
* if ancestor is "/a/b/c" and child is "/a/b/c/d/e/file" then "a/b/c/d" is
* added to list.
* @param prefixes list to add to
* @param ancestor path we are listing in
* @param child full path to get prefix from
*/
private void addPrefixIfNotPresent(List<String> prefixes, String ancestor,
String child) {
Path prefixCandidate = new Path(child).getParent();
Path ancestorPath = new Path(ancestor);
Preconditions.checkArgument(child.startsWith(ancestor), "%s does not " +
"start with %s", child, ancestor);
while (!prefixCandidate.isRoot()) {
Path nextParent = prefixCandidate.getParent();
if (nextParent.equals(ancestorPath)) {
String prefix = prefixCandidate.toString();
if (!prefixes.contains(prefix)) {
LOG.debug("Reinstate prefix {}", prefix);
prefixes.add(prefix);
}
return;
}
prefixCandidate = nextParent;
}
}
/**
* Checks that the parent key is an ancestor of the child key.
* @param parent key that may be the parent.
* @param child key that may be the child.
* @param recursive if false, only return true for direct children. If
* true, any descendant will count.
* @return true if parent is an ancestor of child
*/
private boolean isDescendant(String parent, String child, boolean recursive) {
if (recursive) {
if (!parent.endsWith("/")) {
parent = parent + "/";
}
return child.startsWith(parent);
} else {
Path actualParentPath = new Path(child).getParent();
Path expectedParentPath = new Path(parent);
// children which are directory markers are excluded here
return actualParentPath.equals(expectedParentPath)
&& !child.endsWith("/");
}
}
/**
* Simulate eventual consistency of delete for this list operation: Any
* recently-deleted keys will be added.
* @param request List request
* @param rawListing listing returned from underlying S3
* @return listing with recently-deleted items restored
*/
private ObjectListing restoreListObjects(ListObjectsRequest request,
ObjectListing rawListing) {
List<S3ObjectSummary> outputList = rawListing.getObjectSummaries();
List<String> outputPrefixes = rawListing.getCommonPrefixes();
// recursive list has no delimiter, returns everything that matches a
// prefix.
boolean recursiveObjectList = !("/".equals(request.getDelimiter()));
String prefix = request.getPrefix();
restoreDeleted(outputList, outputPrefixes, recursiveObjectList, prefix);
return new CustomObjectListing(rawListing, outputList, outputPrefixes);
}
/**
* V2 list API variant of
* {@link #restoreListObjects(ListObjectsRequest, ObjectListing)}.
* @param request original v2 list request
* @param result raw s3 result
*/
private ListObjectsV2Result restoreListObjectsV2(ListObjectsV2Request request,
ListObjectsV2Result result) {
List<S3ObjectSummary> outputList = result.getObjectSummaries();
List<String> outputPrefixes = result.getCommonPrefixes();
// recursive list has no delimiter, returns everything that matches a
// prefix.
boolean recursiveObjectList = !("/".equals(request.getDelimiter()));
String prefix = request.getPrefix();
restoreDeleted(outputList, outputPrefixes, recursiveObjectList, prefix);
return new CustomListObjectsV2Result(result, outputList, outputPrefixes);
}
/**
* Main logic for
* {@link #restoreListObjects(ListObjectsRequest, ObjectListing)} and
* the v2 variant above.
* @param summaries object summary list to modify.
* @param prefixes prefix list to modify
* @param recursive true if recursive list request
* @param prefix prefix for original list request
*/
private void restoreDeleted(List<S3ObjectSummary> summaries,
List<String> prefixes, boolean recursive, String prefix) {
// Go through all deleted keys
for (String key : new HashSet<>(delayedDeletes.keySet())) {
Delete delete = delayedDeletes.get(key);
if (isKeyDelayed(delete.time(), key)) {
if (isDescendant(prefix, key, recursive)) {
if (delete.summary() != null) {
addSummaryIfNotPresent(summaries, delete.summary());
}
}
// Non-recursive list has delimiter: will return rolled-up prefixes for
// all keys that are not direct children
if (!recursive) {
if (isDescendant(prefix, key, true)) {
addPrefixIfNotPresent(prefixes, prefix, key);
}
}
} else {
// Clean up any expired entries
LOG.debug("Remove expired key {}", key);
delayedDeletes.remove(key);
}
}
}
private ObjectListing filterListObjects(ObjectListing rawListing) {
// Filter object listing
List<S3ObjectSummary> outputList = filterSummaries(
rawListing.getObjectSummaries());
// Filter prefixes (directories)
List<String> outputPrefixes = filterPrefixes(
rawListing.getCommonPrefixes());
return new CustomObjectListing(rawListing, outputList, outputPrefixes);
}
private ListObjectsV2Result filterListObjectsV2(ListObjectsV2Result raw) {
// Filter object listing
List<S3ObjectSummary> outputList = filterSummaries(
raw.getObjectSummaries());
// Filter prefixes (directories)
List<String> outputPrefixes = filterPrefixes(raw.getCommonPrefixes());
return new CustomListObjectsV2Result(raw, outputList, outputPrefixes);
}
private List<S3ObjectSummary> filterSummaries(
List<S3ObjectSummary> summaries) {
List<S3ObjectSummary> outputList = new ArrayList<>();
for (S3ObjectSummary s : summaries) {
String key = s.getKey();
if (!isKeyDelayed(delayedPutKeys.get(key), key)) {
outputList.add(s);
}
}
return outputList;
}
private List<String> filterPrefixes(List<String> prefixes) {
return prefixes.stream()
.filter(key -> !isKeyDelayed(delayedPutKeys.get(key), key))
.collect(Collectors.toList());
}
private boolean isKeyDelayed(Long enqueueTime, String key) {
if (enqueueTime == null) {
LOG.debug("no delay for key {}", key);
return false;
}
long currentTime = System.currentTimeMillis();
long deadline = enqueueTime + policy.getDelayKeyMsec();
if (currentTime >= deadline) {
delayedDeletes.remove(key);
LOG.debug("no longer delaying {}", key);
return false;
} else {
LOG.info("delaying {}", key);
return true;
}
}
private void registerDeleteObject(String key, String bucket) {
if (policy.shouldDelay(key)) {
Delete delete = delayedDeletes.get(key);
if (delete != null && isKeyDelayed(delete.time(), key)) {
// there is already an entry in the delayed delete list,
// so ignore the operation
LOG.debug("Ignoring delete of already deleted object");
} else {
// Record summary so we can add it back for some time post-deletion
ListObjectsRequest request = new ListObjectsRequest()
.withBucketName(bucket)
.withPrefix(key);
S3ObjectSummary summary = innerlistObjects(request).getObjectSummaries()
.stream()
.filter(result -> result.getKey().equals(key))
.findFirst()
.orElse(null);
delayedDeletes.put(key, new Delete(System.currentTimeMillis(),
summary));
}
}
}
private void registerPutObject(PutObjectRequest req) {
String key = req.getKey();
if (policy.shouldDelay(key)) {
enqueueDelayedPut(key);
}
}
/**
* Record this key as something that should not become visible in
* listObject replies for a while, to simulate eventual list consistency.
* @param key key to delay visibility of
*/
private void enqueueDelayedPut(String key) {
LOG.debug("delaying put of {}", key);
delayedPutKeys.put(key, System.currentTimeMillis());
}
@Override
public CompleteMultipartUploadResult completeMultipartUpload(
@ -542,10 +208,6 @@ public MultipartUploadListing listMultipartUploads(
return super.listMultipartUploads(listMultipartUploadsRequest);
}
public long getDelayKeyMsec() {
return policy.getDelayKeyMsec();
}
/**
* Set the probability of throttling a request.
* @param throttleProbability the probability of a request being throttled.
@ -565,7 +227,7 @@ private void maybeFail(String errorMsg, int statusCode)
throws AmazonClientException {
// code structure here is to line up for more failures later
AmazonServiceException ex = null;
if (policy.trueWithProbability(policy.getThrottleProbability())) {
if (FailureInjectionPolicy.trueWithProbability(policy.getThrottleProbability())) {
// throttle the request
ex = new AmazonServiceException(errorMsg
+ " count = " + (failureCounter.get() + 1), null);
@ -599,18 +261,16 @@ public void setFailureLimit(int limit) {
@Override
public S3Object getObject(GetObjectRequest var1) throws SdkClientException,
AmazonServiceException {
maybeFail("file not found", 404);
S3Object o = super.getObject(var1);
LOG.debug("Wrapping in InconsistentS3Object for key {}", var1.getKey());
return new InconsistentS3Object(o, policy);
maybeFail();
return super.getObject(var1);
}
@Override
public S3Object getObject(String bucketName, String key)
throws SdkClientException, AmazonServiceException {
S3Object o = super.getObject(bucketName, key);
LOG.debug("Wrapping in InconsistentS3Object for key {}", key);
return new InconsistentS3Object(o, policy);
maybeFail();
return super.getObject(bucketName, key);
}
/** Since ObjectListing is immutable, we just override it with wrapper. */

View File

@ -42,6 +42,7 @@ protected AmazonS3 buildAmazonS3Client(
final ClientConfiguration awsConf,
final S3ClientCreationParameters parameters) {
LOG.warn("** FAILURE INJECTION ENABLED. Do not run in production! **");
LOG.warn("List inconsistency is no longer emulated; only throttling and read errors");
InconsistentAmazonS3Client s3
= new InconsistentAmazonS3Client(
parameters.getCredentialSet(), awsConf, getConf());
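For testing only, a hedged sketch of wiring in the reduced failure-injection client now that list inconsistency is gone; the factory and probability key names are assumed to keep their pre-existing values.

```java
// Test-only, hedged sketch: enable the failure-injecting S3 client factory.
// Never use in production; list inconsistency is no longer emulated.
import org.apache.hadoop.conf.Configuration;

public final class FailureInjectionSketch {
  public static Configuration faultInjectingConf() {
    Configuration conf = new Configuration();
    conf.set("fs.s3a.s3.client.factory.impl",
        "org.apache.hadoop.fs.s3a.InconsistentS3ClientFactory");
    // roughly 10% of requests throttled; only throttling and read errors remain
    conf.setFloat("fs.s3a.failinject.throttle.probability", 0.1f);
    return conf;
  }
}
```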

View File

@ -1,232 +0,0 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.hadoop.fs.s3a;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import com.amazonaws.services.s3.internal.AmazonS3ExceptionBuilder;
import com.amazonaws.services.s3.model.AmazonS3Exception;
import com.amazonaws.services.s3.model.ObjectMetadata;
import com.amazonaws.services.s3.model.S3Object;
import com.amazonaws.services.s3.model.S3ObjectInputStream;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
/**
* Wrapper around S3Object so we can do failure injection on
* getObjectContent() and S3ObjectInputStream.
* See also {@link InconsistentAmazonS3Client}.
*/
@SuppressWarnings({"NonSerializableFieldInSerializableClass", "serial"})
public class InconsistentS3Object extends S3Object {
// This should be configurable, probably.
public static final int MAX_READ_FAILURES = 100;
private static int readFailureCounter = 0;
private transient S3Object wrapped;
private transient FailureInjectionPolicy policy;
private final static transient Logger LOG = LoggerFactory.getLogger(
InconsistentS3Object.class);
public InconsistentS3Object(S3Object wrapped, FailureInjectionPolicy policy) {
this.wrapped = wrapped;
this.policy = policy;
}
@Override
public S3ObjectInputStream getObjectContent() {
return new InconsistentS3InputStream(wrapped.getObjectContent());
}
@Override
public String toString() {
return "InconsistentS3Object wrapping: " + wrapped.toString();
}
@Override
public ObjectMetadata getObjectMetadata() {
return wrapped.getObjectMetadata();
}
@Override
public void setObjectMetadata(ObjectMetadata metadata) {
wrapped.setObjectMetadata(metadata);
}
@Override
public void setObjectContent(S3ObjectInputStream objectContent) {
wrapped.setObjectContent(objectContent);
}
@Override
public void setObjectContent(InputStream objectContent) {
wrapped.setObjectContent(objectContent);
}
@Override
public String getBucketName() {
return wrapped.getBucketName();
}
@Override
public void setBucketName(String bucketName) {
wrapped.setBucketName(bucketName);
}
@Override
public String getKey() {
return wrapped.getKey();
}
@Override
public void setKey(String key) {
wrapped.setKey(key);
}
@Override
public String getRedirectLocation() {
return wrapped.getRedirectLocation();
}
@Override
public void setRedirectLocation(String redirectLocation) {
wrapped.setRedirectLocation(redirectLocation);
}
@Override
public Integer getTaggingCount() {
return wrapped.getTaggingCount();
}
@Override
public void setTaggingCount(Integer taggingCount) {
wrapped.setTaggingCount(taggingCount);
}
@Override
public void close() throws IOException {
wrapped.close();
}
@Override
public boolean isRequesterCharged() {
return wrapped.isRequesterCharged();
}
@Override
public void setRequesterCharged(boolean isRequesterCharged) {
wrapped.setRequesterCharged(isRequesterCharged);
}
private AmazonS3Exception mockException(String msg, int httpResponse) {
AmazonS3ExceptionBuilder builder = new AmazonS3ExceptionBuilder();
builder.setErrorMessage(msg);
builder.setStatusCode(httpResponse); // this is the important part
builder.setErrorCode(String.valueOf(httpResponse));
return builder.build();
}
/**
* Insert a failiure injection point for a read call.
* @throw IOException, as codepath is on InputStream, not other SDK call.
*/
private void readFailpoint(int off, int len) throws IOException {
if (shouldInjectFailure(getKey())) {
String error = String.format(
"read(b, %d, %d) on key %s failed: injecting error %d/%d" +
" for test.", off, len, getKey(), readFailureCounter,
MAX_READ_FAILURES);
throw new FileNotFoundException(error);
}
}
/**
* Insert a failiure injection point for an InputStream skip() call.
* @throw IOException, as codepath is on InputStream, not other SDK call.
*/
private void skipFailpoint(long len) throws IOException {
if (shouldInjectFailure(getKey())) {
String error = String.format(
"skip(%d) on key %s failed: injecting error %d/%d for test.",
len, getKey(), readFailureCounter, MAX_READ_FAILURES);
throw new FileNotFoundException(error);
}
}
private boolean shouldInjectFailure(String key) {
if (policy.shouldDelay(key) &&
readFailureCounter < MAX_READ_FAILURES) {
readFailureCounter++;
return true;
}
return false;
}
/**
* Wraps S3ObjectInputStream and implements failure injection.
*/
protected class InconsistentS3InputStream extends S3ObjectInputStream {
private S3ObjectInputStream wrapped;
public InconsistentS3InputStream(S3ObjectInputStream wrapped) {
// seems awkward to have the stream wrap itself.
super(wrapped, wrapped.getHttpRequest());
this.wrapped = wrapped;
}
@Override
public void abort() {
wrapped.abort();
}
@Override
public int available() throws IOException {
return wrapped.available();
}
@Override
public void close() throws IOException {
wrapped.close();
}
@Override
public long skip(long n) throws IOException {
skipFailpoint(n);
return wrapped.skip(n);
}
@Override
public int read() throws IOException {
LOG.debug("read() for key {}", getKey());
readFailpoint(0, 1);
return wrapped.read();
}
@Override
public int read(byte[] b, int off, int len) throws IOException {
LOG.debug("read(b, {}, {}) for key {}", off, len, getKey());
readFailpoint(off, len);
return wrapped.read(b, off, len);
}
}
}

View File

@ -21,11 +21,11 @@
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.Optional;
import java.util.concurrent.Future;
import javax.annotation.Nullable;
import com.amazonaws.AmazonClientException;
import com.amazonaws.SdkBaseException;
import org.apache.hadoop.util.Preconditions;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
@ -34,7 +34,9 @@
import org.apache.hadoop.io.retry.RetryPolicy;
import org.apache.hadoop.util.DurationInfo;
import org.apache.hadoop.util.functional.CallableRaisingIOE;
import org.apache.hadoop.util.functional.FutureIO;
import org.apache.hadoop.util.functional.InvocationRaisingIOE;
import org.apache.hadoop.util.Preconditions;
/**
* Class to provide lambda expression invocation of AWS operations.
@ -137,6 +139,30 @@ public static void once(String action, String path,
});
}
/**
*
* Wait for a future, translating AmazonClientException into an IOException.
* @param action action to execute (used in error messages)
* @param path path of work (used in error messages)
* @param future future to await for
* @param <T> type of return value
* @return the result of the function call
* @throws IOException any IOE raised, or translated exception
* @throws RuntimeException any other runtime exception
*/
@Retries.OnceTranslated
public static <T> T onceInTheFuture(String action,
String path,
final Future<T> future)
throws IOException {
try (DurationInfo ignored = new DurationInfo(LOG, false, "%s", action)) {
return FutureIO.awaitFuture(future);
} catch (AmazonClientException e) {
throw S3AUtils.translateException(action, path, e);
}
}
/**
* Execute an operation and ignore all raised IOExceptions; log at INFO;
* full stack only at DEBUG.
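onceInTheFuture() waits for an asynchronous result and turns both interruption and AWS client failures into IOExceptions for the caller. A stand-alone sketch of that await-and-translate shape using plain JDK futures (the real method delegates to FutureIO.awaitFuture() and S3AUtils.translateException() rather than the hand-written catches here):

import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Sketch of the await-and-translate pattern behind onceInTheFuture(). */
public final class AwaitDemo {
  public static void main(String[] args) throws IOException {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    Future<String> future = pool.submit(() -> "listing-page-1");
    try {
      // FutureIO.awaitFuture() does this unwrapping in the real code.
      System.out.println(future.get());
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new InterruptedIOException("interrupted awaiting listing");
    } catch (ExecutionException e) {
      // the real method hands an AmazonClientException to translateException()
      throw new IOException("listObjects failed", e.getCause());
    } finally {
      pool.shutdown();
    }
  }
}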

View File

@ -18,15 +18,10 @@
package org.apache.hadoop.fs.s3a;
import javax.annotation.Nullable;
import com.amazonaws.AmazonClientException;
import com.amazonaws.services.s3.model.S3ObjectSummary;
import org.apache.hadoop.classification.VisibleForTesting;
import org.apache.commons.lang3.tuple.Triple;
import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.VisibleForTesting;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
@ -35,10 +30,6 @@
import org.apache.hadoop.fs.s3a.impl.AbstractStoreOperation;
import org.apache.hadoop.fs.s3a.impl.ListingOperationCallbacks;
import org.apache.hadoop.fs.s3a.impl.StoreContext;
import org.apache.hadoop.fs.s3a.s3guard.DirListingMetadata;
import org.apache.hadoop.fs.s3a.s3guard.MetadataStoreListFilesIterator;
import org.apache.hadoop.fs.s3a.s3guard.PathMetadata;
import org.apache.hadoop.fs.s3a.s3guard.S3Guard;
import org.apache.hadoop.fs.statistics.IOStatistics;
import org.apache.hadoop.fs.statistics.IOStatisticsSource;
import org.apache.hadoop.fs.statistics.impl.IOStatisticsStore;
@ -48,30 +39,21 @@
import org.slf4j.Logger;
import java.io.Closeable;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.ListIterator;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.StringJoiner;
import static org.apache.hadoop.fs.impl.FutureIOSupport.awaitFuture;
import static org.apache.hadoop.fs.s3a.Constants.S3N_FOLDER_SUFFIX;
import static org.apache.hadoop.fs.s3a.Invoker.onceInTheFuture;
import static org.apache.hadoop.fs.s3a.S3AUtils.ACCEPT_ALL;
import static org.apache.hadoop.fs.s3a.S3AUtils.createFileStatus;
import static org.apache.hadoop.fs.s3a.S3AUtils.maybeAddTrailingSlash;
import static org.apache.hadoop.fs.s3a.S3AUtils.objectRepresentsDirectory;
import static org.apache.hadoop.fs.s3a.S3AUtils.stringify;
import static org.apache.hadoop.fs.s3a.S3AUtils.translateException;
import static org.apache.hadoop.fs.s3a.auth.RoleModel.pathToKey;
import static org.apache.hadoop.fs.statistics.StoreStatisticNames.OBJECT_CONTINUE_LIST_REQUEST;
import static org.apache.hadoop.fs.statistics.StoreStatisticNames.OBJECT_LIST_REQUEST;
@ -136,30 +118,6 @@ public static RemoteIterator<S3AFileStatus> toProvidedFileStatusIterator(
Listing.ACCEPT_ALL_BUT_S3N::accept);
}
/**
* Create a FileStatus iterator against a path, with a given list object
* request.
*
* @param listPath path of the listing
* @param request initial request to make
* @param filter the filter on which paths to accept
* @param acceptor the class/predicate to decide which entries to accept
* in the listing based on the full file status.
* @param span audit span for this iterator
* @return the iterator
* @throws IOException IO Problems
*/
@Retries.RetryRaw
public FileStatusListingIterator createFileStatusListingIterator(
Path listPath,
S3ListRequest request,
PathFilter filter,
Listing.FileStatusAcceptor acceptor,
AuditSpan span) throws IOException {
return createFileStatusListingIterator(listPath, request, filter, acceptor,
null, span);
}
/**
* Create a FileStatus iterator against a path, with a given
* list object request.
@ -168,8 +126,6 @@ public FileStatusListingIterator createFileStatusListingIterator(
* @param filter the filter on which paths to accept
* @param acceptor the class/predicate to decide which entries to accept
* in the listing based on the full file status.
* @param providedStatus the provided list of file status, which may contain
* items that are not listed from source.
* @param span audit span for this iterator
* @return the iterator
* @throws IOException IO Problems
@ -179,14 +135,12 @@ public FileStatusListingIterator createFileStatusListingIterator(
Path listPath,
S3ListRequest request,
PathFilter filter,
Listing.FileStatusAcceptor acceptor,
RemoteIterator<S3AFileStatus> providedStatus,
FileStatusAcceptor acceptor,
AuditSpan span) throws IOException {
return new FileStatusListingIterator(
createObjectListingIterator(listPath, request, span),
filter,
acceptor,
providedStatus);
acceptor);
}
/**
@ -219,28 +173,6 @@ public RemoteIterator<S3ALocatedFileStatus> createLocatedFileStatusIterator(
listingOperationCallbacks::toLocatedFileStatus);
}
/**
* Create an located status iterator that wraps another to filter out a set
* of recently deleted items.
* @param iterator an iterator over the remote located status entries.
* @param tombstones set of paths that are recently deleted and should be
* filtered.
* @return a new remote iterator.
*/
@VisibleForTesting
RemoteIterator<S3ALocatedFileStatus> createTombstoneReconcilingIterator(
RemoteIterator<S3ALocatedFileStatus> iterator,
@Nullable Set<Path> tombstones) {
if (tombstones == null || tombstones.isEmpty()) {
// no need to filter.
return iterator;
} else {
return filteringRemoteIterator(
iterator,
candidate -> !tombstones.contains(candidate.getPath()));
}
}
/**
* Create a remote iterator from a single status entry.
* @param status status
@ -256,20 +188,14 @@ public RemoteIterator<S3ALocatedFileStatus> createSingleStatusIterator(
* @param path input path.
* @param recursive recursive listing?
* @param acceptor file status filter
* @param collectTombstones should tombstones be collected from S3Guard?
* @param forceNonAuthoritativeMS forces metadata store to act like non
* authoritative. This is useful when
* listFiles output is used by import tool.
* @param span audit span for this iterator
* @return an iterator over listing.
* @throws IOException any exception.
*/
public RemoteIterator<S3ALocatedFileStatus> getListFilesAssumingDir(
Path path,
boolean recursive, Listing.FileStatusAcceptor acceptor,
boolean collectTombstones,
boolean forceNonAuthoritativeMS,
AuditSpan span) throws IOException {
Path path,
boolean recursive, FileStatusAcceptor acceptor,
AuditSpan span) throws IOException {
String key = maybeAddTrailingSlash(pathToKey(path));
String delimiter = recursive ? null : "/";
@ -279,82 +205,19 @@ public RemoteIterator<S3ALocatedFileStatus> getListFilesAssumingDir(
LOG.debug("Requesting all entries under {} with delimiter '{}'",
key, delimiter);
}
final RemoteIterator<S3AFileStatus> cachedFilesIterator;
final Set<Path> tombstones;
boolean allowAuthoritative = listingOperationCallbacks
.allowAuthoritative(path);
if (recursive) {
final PathMetadata pm = getStoreContext()
.getMetadataStore()
.get(path, true);
if (pm != null) {
if (pm.isDeleted()) {
OffsetDateTime deletedAt = OffsetDateTime
.ofInstant(Instant.ofEpochMilli(
pm.getFileStatus().getModificationTime()),
ZoneOffset.UTC);
throw new FileNotFoundException("Path " + path + " is recorded as " +
"deleted by S3Guard at " + deletedAt);
}
}
MetadataStoreListFilesIterator metadataStoreListFilesIterator =
new MetadataStoreListFilesIterator(
getStoreContext().getMetadataStore(),
pm,
allowAuthoritative);
tombstones = metadataStoreListFilesIterator.listTombstones();
// if all of the below is true
// - authoritative access is allowed for this metadatastore
// for this directory,
// - all the directory listings are authoritative on the client
// - the caller does not force non-authoritative access
// return the listing without any further s3 access
if (!forceNonAuthoritativeMS &&
allowAuthoritative &&
metadataStoreListFilesIterator.isRecursivelyAuthoritative()) {
S3AFileStatus[] statuses = S3AUtils.iteratorToStatuses(
metadataStoreListFilesIterator, tombstones);
cachedFilesIterator = createProvidedFileStatusIterator(
statuses, ACCEPT_ALL, acceptor);
return createLocatedFileStatusIterator(cachedFilesIterator);
}
cachedFilesIterator = metadataStoreListFilesIterator;
} else {
DirListingMetadata meta =
S3Guard.listChildrenWithTtl(
getStoreContext().getMetadataStore(),
path,
listingOperationCallbacks.getUpdatedTtlTimeProvider(),
allowAuthoritative);
if (meta != null) {
tombstones = meta.listTombstones();
} else {
tombstones = null;
}
cachedFilesIterator = createProvidedFileStatusIterator(
S3Guard.dirMetaToStatuses(meta), ACCEPT_ALL, acceptor);
if (allowAuthoritative && meta != null && meta.isAuthoritative()) {
// metadata listing is authoritative, so return it directly
return createLocatedFileStatusIterator(cachedFilesIterator);
}
}
return createTombstoneReconcilingIterator(
createLocatedFileStatusIterator(
createFileStatusListingIterator(path,
listingOperationCallbacks
.createListObjectsRequest(key,
delimiter,
span),
ACCEPT_ALL,
acceptor,
cachedFilesIterator,
span)),
collectTombstones ? tombstones : null);
return createLocatedFileStatusIterator(
createFileStatusListingIterator(path,
listingOperationCallbacks
.createListObjectsRequest(key,
delimiter,
span),
ACCEPT_ALL,
acceptor,
span));
}
/**
* Generate list located status for a directory.
* Also performing tombstone reconciliation for guarded directories.
* @param dir directory to check.
* @param filter a path filter.
* @param span audit span for this iterator
@ -365,51 +228,14 @@ public RemoteIterator<S3ALocatedFileStatus> getLocatedFileStatusIteratorForDir(
Path dir, PathFilter filter, AuditSpan span) throws IOException {
span.activate();
final String key = maybeAddTrailingSlash(pathToKey(dir));
final Listing.FileStatusAcceptor acceptor =
new Listing.AcceptAllButSelfAndS3nDirs(dir);
boolean allowAuthoritative = listingOperationCallbacks
.allowAuthoritative(dir);
DirListingMetadata meta =
S3Guard.listChildrenWithTtl(getStoreContext().getMetadataStore(),
dir,
listingOperationCallbacks
.getUpdatedTtlTimeProvider(),
allowAuthoritative);
if (meta != null) {
// there's metadata
// convert to an iterator
final RemoteIterator<S3AFileStatus> cachedFileStatusIterator =
createProvidedFileStatusIterator(
S3Guard.dirMetaToStatuses(meta), filter, acceptor);
// if the dir is authoritative and the data considers itself
// to be authorititative.
if (allowAuthoritative && meta.isAuthoritative()) {
// return the list
return createLocatedFileStatusIterator(cachedFileStatusIterator);
} else {
// merge the datasets
return createTombstoneReconcilingIterator(
createLocatedFileStatusIterator(
createFileStatusListingIterator(dir,
listingOperationCallbacks
.createListObjectsRequest(key, "/", span),
filter,
acceptor,
cachedFileStatusIterator,
span)),
meta.listTombstones());
}
} else {
// Unguarded
return createLocatedFileStatusIterator(
createFileStatusListingIterator(dir,
listingOperationCallbacks
.createListObjectsRequest(key, "/", span),
filter,
acceptor,
span));
}
return createLocatedFileStatusIterator(
createFileStatusListingIterator(dir,
listingOperationCallbacks
.createListObjectsRequest(key, "/", span),
filter,
new AcceptAllButSelfAndS3nDirs(dir),
span));
}
/**
@ -417,10 +243,11 @@ public RemoteIterator<S3ALocatedFileStatus> getLocatedFileStatusIteratorForDir(
* to be a non-empty directory.
* @param path input path.
* @param span audit span for this iterator
* @return Triple of file statuses, metaData, auth flag.
* @return iterator of file statuses.
* @throws IOException Any IO problems.
*/
public Triple<RemoteIterator<S3AFileStatus>, DirListingMetadata, Boolean>
@Retries.RetryRaw
public RemoteIterator<S3AFileStatus>
getFileStatusesAssumingNonEmptyDir(Path path, final AuditSpan span)
throws IOException {
String key = pathToKey(path);
@ -428,39 +255,16 @@ public RemoteIterator<S3ALocatedFileStatus> getLocatedFileStatusIteratorForDir(
key = key + '/';
}
boolean allowAuthoritative = listingOperationCallbacks
.allowAuthoritative(path);
DirListingMetadata dirMeta =
S3Guard.listChildrenWithTtl(
getStoreContext().getMetadataStore(),
path,
listingOperationCallbacks.getUpdatedTtlTimeProvider(),
allowAuthoritative);
// In auth mode return directly with auth flag.
if (allowAuthoritative && dirMeta != null && dirMeta.isAuthoritative()) {
RemoteIterator<S3AFileStatus> mfsItr = createProvidedFileStatusIterator(
S3Guard.dirMetaToStatuses(dirMeta),
ACCEPT_ALL,
Listing.ACCEPT_ALL_BUT_S3N);
return Triple.of(mfsItr,
dirMeta, Boolean.TRUE);
}
S3ListRequest request = createListObjectsRequest(key, "/", span);
LOG.debug("listStatus: doing listObjects for directory {}", key);
FileStatusListingIterator filesItr = createFileStatusListingIterator(
path,
request,
ACCEPT_ALL,
new Listing.AcceptAllButSelfAndS3nDirs(path),
span);
// return the results obtained from s3.
return Triple.of(
filesItr,
dirMeta,
Boolean.FALSE);
return createFileStatusListingIterator(
path,
request,
ACCEPT_ALL,
new AcceptAllButSelfAndS3nDirs(path),
span);
}
public S3ListRequest createListObjectsRequest(String key,
@ -542,8 +346,6 @@ class FileStatusListingIterator
/** Iterator over the current set of results. */
private ListIterator<S3AFileStatus> statusBatchIterator;
private final Map<Path, S3AFileStatus> providedStatus;
private Iterator<S3AFileStatus> providedStatusIterator;
/**
* Create an iterator over file status entries.
@ -551,27 +353,17 @@ class FileStatusListingIterator
* @param filter the filter on which paths to accept
* @param acceptor the class/predicate to decide which entries to accept
* in the listing based on the full file status.
* @param providedStatus the provided list of file status, which may contain
* items that are not listed from source.
* @throws IOException IO Problems
*/
@Retries.RetryTranslated
FileStatusListingIterator(ObjectListingIterator source,
PathFilter filter,
FileStatusAcceptor acceptor,
@Nullable RemoteIterator<S3AFileStatus> providedStatus)
FileStatusAcceptor acceptor)
throws IOException {
this.source = source;
this.filter = filter;
this.acceptor = acceptor;
this.providedStatus = new HashMap<>();
for (; providedStatus != null && providedStatus.hasNext();) {
final S3AFileStatus status = providedStatus.next();
Path path = status.getPath();
if (filter.accept(path) && acceptor.accept(status)) {
this.providedStatus.put(path, status);
}
}
// build the first set of results. This will not trigger any
// remote IO, assuming the source iterator is in its initial
// iteration
@ -586,26 +378,17 @@ class FileStatusListingIterator
* Lastly, return true if the {@code providedStatusIterator}
* has left items.
* @return true if a call to {@link #next()} will succeed.
* @throws IOException
* @throws IOException IO Problems
*/
@Override
@Retries.RetryTranslated
public boolean hasNext() throws IOException {
return sourceHasNext() || providedStatusIterator.hasNext();
return sourceHasNext();
}
@Retries.RetryTranslated
private boolean sourceHasNext() throws IOException {
if (statusBatchIterator.hasNext() || requestNextBatch()) {
return true;
} else {
// turn to file status that are only in provided list
if (providedStatusIterator == null) {
LOG.debug("Start iterating the provided status.");
providedStatusIterator = providedStatus.values().iterator();
}
return false;
}
return statusBatchIterator.hasNext() || requestNextBatch();
}
@Override
@ -614,25 +397,8 @@ public S3AFileStatus next() throws IOException {
final S3AFileStatus status;
if (sourceHasNext()) {
status = statusBatchIterator.next();
// We remove from provided map the file status listed by S3 so that
// this does not return duplicate items.
// The provided status is returned as it is assumed to have the better
// metadata (i.e. the eTag and versionId from S3Guard)
S3AFileStatus provided = providedStatus.remove(status.getPath());
if (provided != null) {
LOG.debug(
"Removed and returned the status from provided file status {}",
status);
return provided;
}
} else {
if (providedStatusIterator.hasNext()) {
status = providedStatusIterator.next();
LOG.debug("Returning provided file status {}", status);
} else {
throw new NoSuchElementException();
}
throw new NoSuchElementException();
}
return status;
}
@ -865,24 +631,20 @@ public S3ListResult next() throws IOException {
// clear the firstListing flag for future calls.
firstListing = false;
// Calculating the result of last async list call.
objects = awaitFuture(s3ListResultFuture);
objects = onceInTheFuture("listObjects()", listPath.toString(), s3ListResultFuture);
fetchNextBatchAsyncIfPresent();
} else {
try {
if (objectsPrev!= null && !objectsPrev.isTruncated()) {
// nothing more to request: fail.
throw new NoSuchElementException("No more results in listing of "
+ listPath);
}
// Calculating the result of last async list call.
objects = awaitFuture(s3ListResultFuture);
// Requesting next batch of results.
fetchNextBatchAsyncIfPresent();
listingCount++;
LOG.debug("New listing status: {}", this);
} catch (AmazonClientException e) {
throw translateException("listObjects()", listPath, e);
if (objectsPrev!= null && !objectsPrev.isTruncated()) {
// nothing more to request: fail.
throw new NoSuchElementException("No more results in listing of "
+ listPath);
}
// Calculating the result of last async list call.
objects = onceInTheFuture("listObjects()", listPath.toString(), s3ListResultFuture);
// Requesting next batch of results.
fetchNextBatchAsyncIfPresent();
listingCount++;
LOG.debug("New listing status: {}", this);
}
// Storing the current result to be used by hasNext() call.
objectsPrev = objects;
@ -891,9 +653,8 @@ public S3ListResult next() throws IOException {
/**
* If there are more listings present, call for next batch async.
* @throws IOException
*/
private void fetchNextBatchAsyncIfPresent() throws IOException {
private void fetchNextBatchAsyncIfPresent() {
if (objects.isTruncated()) {
LOG.debug("[{}], Requesting next {} objects under {}",
listingCount, maxKeys, listPath);
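With the metadata-store branches removed, listing becomes a single paged walk over S3 results: drain the current batch, then ask for the next page until the store reports no more. A self-contained sketch of that iterator shape (PageSource stands in for the asynchronous LIST calls and is not the real callback interface):

import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Sketch of the batch-draining iterator pattern used by the listing code. */
class PagedIterator<T> implements Iterator<T> {

  /** Stand-in for the paged LIST call; an empty page means no more results. */
  interface PageSource<T> {
    List<T> nextPage();
  }

  private final PageSource<T> source;
  private Iterator<T> batch = Collections.emptyIterator();
  private boolean exhausted;

  PagedIterator(PageSource<T> source) {
    this.source = source;
  }

  @Override
  public boolean hasNext() {
    while (!batch.hasNext() && !exhausted) {
      List<T> page = source.nextPage();
      exhausted = page.isEmpty();
      batch = page.iterator();
    }
    return batch.hasNext();
  }

  @Override
  public T next() {
    if (!hasNext()) {
      throw new NoSuchElementException("No more results in listing");
    }
    return batch.next();
  }

  public static void main(String[] args) {
    Iterator<List<Integer>> pages = Arrays.asList(
        Arrays.asList(1, 2), Arrays.asList(3)).iterator();
    PagedIterator<Integer> it = new PagedIterator<>(
        () -> pages.hasNext() ? pages.next() : Collections.<Integer>emptyList());
    while (it.hasNext()) {
      System.out.println(it.next());   // prints 1, 2, 3
    }
  }
}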

View File

@ -1,40 +0,0 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.hadoop.fs.s3a;
import org.apache.hadoop.fs.PathIOException;
/**
* Indicates the metadata associated with the given Path could not be persisted
* to the metadata store (e.g. S3Guard / DynamoDB). When this occurs, the
* file itself has been successfully written to S3, but the metadata may be out
* of sync. The metadata can be corrected with the "s3guard import" command
* provided by {@link org.apache.hadoop.fs.s3a.s3guard.S3GuardTool}.
*/
public class MetadataPersistenceException extends PathIOException {
/**
* Constructs a MetadataPersistenceException.
* @param path path of the affected file
* @param cause cause of the issue
*/
public MetadataPersistenceException(String path, Throwable cause) {
super(path, cause);
}
}

View File

@ -25,7 +25,7 @@
/**
* Indicates the S3 object is out of sync with the expected version. Thrown in
* cases such as when the object is updated while an {@link S3AInputStream} is
* open, or when a file expected was never found.
* open, or when a file to be renamed disappeared during the operation.
*/
@SuppressWarnings("serial")
@InterfaceAudience.Public
@ -36,18 +36,10 @@ public class RemoteFileChangedException extends PathIOException {
"Constraints of request were unsatisfiable";
/**
* While trying to get information on a file known to S3Guard, the
* file never became visible in S3.
*/
public static final String FILE_NEVER_FOUND =
"File to rename not found on guarded S3 store after repeated attempts";
/**
* The file wasn't found in rename after a single attempt -the unguarded
* codepath.
 * The file disappeared during a rename between LIST and COPY.
*/
public static final String FILE_NOT_FOUND_SINGLE_ATTEMPT =
"File to rename not found on unguarded S3 store";
"File to rename disappeared during the rename operation.";
/**
* Constructs a RemoteFileChangedException.

View File

@ -674,7 +674,7 @@ private void handleSyncableInvocation() {
}
// downgrading.
WARN_ON_SYNCABLE.warn("Application invoked the Syncable API against"
+ " stream writing to {}. This is unsupported",
+ " stream writing to {}. This is Unsupported",
key);
// and log at debug
LOG.debug("Downgrading Syncable call", ex);

View File

@ -366,10 +366,6 @@ public boolean seekToNewSource(long targetPos) throws IOException {
@Retries.RetryTranslated
private void lazySeek(long targetPos, long len) throws IOException {
// With S3Guard, the metadatastore gave us metadata for the file in
// open(), so we use a slightly different retry policy, but only on initial
// open. After that, an exception generally means the file has changed
// and there is no point retrying anymore.
Invoker invoker = context.getReadInvoker();
invoker.maybeRetry(streamStatistics.getOpenOperations() == 0,
"lazySeek", pathStr, true,
@ -397,7 +393,7 @@ private void incrementBytesRead(long bytesRead) {
}
@Override
@Retries.RetryTranslated // Some retries only happen w/ S3Guard, as intended.
@Retries.RetryTranslated
public synchronized int read() throws IOException {
checkNotClosed();
if (this.contentLength == 0 || (nextReadPos >= contentLength)) {
@ -410,10 +406,6 @@ public synchronized int read() throws IOException {
return -1;
}
// With S3Guard, the metadatastore gave us metadata for the file in
// open(), so we use a slightly different retry policy.
// read() may not be likely to fail, but reopen() does a GET which
// certainly could.
Invoker invoker = context.getReadInvoker();
int byteRead = invoker.retry("read", pathStr, true,
() -> {
@ -478,7 +470,7 @@ private void onReadFailure(IOException ioe, boolean forceAbort) {
* @throws IOException if there are other problems
*/
@Override
@Retries.RetryTranslated // Some retries only happen w/ S3Guard, as intended.
@Retries.RetryTranslated
public synchronized int read(byte[] buf, int off, int len)
throws IOException {
checkNotClosed();
@ -499,10 +491,6 @@ public synchronized int read(byte[] buf, int off, int len)
return -1;
}
// With S3Guard, the metadatastore gave us metadata for the file in
// open(), so we use a slightly different retry policy.
// read() may not be likely to fail, but reopen() does a GET which
// certainly could.
Invoker invoker = context.getReadInvoker();
streamStatistics.readOperationStarted(nextReadPos, len);
@ -766,7 +754,7 @@ public String toString() {
*
*/
@Override
@Retries.RetryTranslated // Some retries only happen w/ S3Guard, as intended.
@Retries.RetryTranslated
public void readFully(long position, byte[] buffer, int offset, int length)
throws IOException {
checkNotClosed();
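The read path now uses a single invoker for every stream: each read is wrapped in a retry so that the GET issued by a reopen can be reattempted. A minimal stand-alone sketch of that retry wrapper (the attempt count and linear backoff are illustrative, not the configured S3A retry policy):

import java.io.IOException;
import java.io.InterruptedIOException;

/** Sketch of the retry-wrapped read pattern delegated to Invoker.retry(). */
final class RetryingRead {

  /** Stand-in for the lambda passed to the invoker. */
  interface ReadOp {
    int read() throws IOException;
  }

  static int readWithRetries(ReadOp op, int attempts) throws IOException {
    if (attempts < 1) {
      throw new IllegalArgumentException("attempts must be >= 1");
    }
    IOException last = null;
    for (int i = 0; i < attempts; i++) {
      try {
        return op.read();
      } catch (IOException e) {
        last = e;                        // a real policy would classify the failure
        try {
          Thread.sleep(100L * (i + 1));  // illustrative linear backoff
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          throw new InterruptedIOException("interrupted during read retry");
        }
      }
    }
    throw last;
  }

  public static void main(String[] args) throws IOException {
    int[] calls = {0};
    int value = readWithRetries(() -> {
      if (calls[0]++ == 0) {
        throw new IOException("transient failure");   // first attempt fails
      }
      return 42;
    }, 3);
    System.out.println(value);   // 42, after one retry
  }
}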

View File

@ -27,7 +27,6 @@
import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.s3a.s3guard.MetastoreInstrumentation;
import org.apache.hadoop.fs.s3a.statistics.BlockOutputStreamStatistics;
import org.apache.hadoop.fs.s3a.statistics.ChangeTrackerStatistics;
import org.apache.hadoop.fs.s3a.statistics.CommitterStatistics;
@ -164,13 +163,7 @@ public class S3AInstrumentation implements Closeable, MetricsSource,
private final MetricsRegistry registry =
new MetricsRegistry("s3aFileSystem").setContext(CONTEXT);
private final MutableQuantiles putLatencyQuantile;
private final MutableQuantiles throttleRateQuantile;
private final MutableQuantiles s3GuardThrottleRateQuantile;
/** Instantiate this without caring whether or not S3Guard is enabled. */
private final S3GuardInstrumentation s3GuardInstrumentation
= new S3GuardInstrumentation();
/**
* This is the IOStatistics store for the S3AFileSystem
@ -224,10 +217,6 @@ public S3AInstrumentation(URI name) {
//todo need a config for the quantiles interval?
int interval = 1;
putLatencyQuantile = quantiles(S3GUARD_METADATASTORE_PUT_PATH_LATENCY,
"ops", "latency", interval);
s3GuardThrottleRateQuantile = quantiles(S3GUARD_METADATASTORE_THROTTLE_RATE,
"events", "frequency (Hz)", interval);
throttleRateQuantile = quantiles(STORE_IO_THROTTLE_RATE,
"events", "frequency (Hz)", interval);
@ -677,15 +666,6 @@ public S3AInputStreamStatistics newInputStreamStatistics(
return new InputStreamStatistics(filesystemStatistics);
}
/**
* Create a MetastoreInstrumentation instrumentation instance.
* There's likely to be at most one instance of this per FS instance.
* @return the S3Guard instrumentation point.
*/
public MetastoreInstrumentation getS3GuardInstrumentation() {
return s3GuardInstrumentation;
}
/**
* Create a new instance of the committer statistics.
* @return a new committer statistics instance
@ -703,9 +683,7 @@ public void close() {
synchronized (METRICS_SYSTEM_LOCK) {
// it is critical to close each quantile, as they start a scheduled
// task in a shared thread pool.
putLatencyQuantile.stop();
throttleRateQuantile.stop();
s3GuardThrottleRateQuantile.stop();
metricsSystem.unregisterSource(metricsSourceName);
metricsSourceActiveCounter--;
int activeSources = metricsSourceActiveCounter;
@ -1617,64 +1595,6 @@ public String toString() {
}
}
/**
* Instrumentation exported to S3Guard.
*/
private final class S3GuardInstrumentation
implements MetastoreInstrumentation {
@Override
public void initialized() {
incrementCounter(S3GUARD_METADATASTORE_INITIALIZATION, 1);
}
@Override
public void storeClosed() {
}
@Override
public void throttled() {
// counters are incremented by owner.
}
@Override
public void retrying() {
// counters are incremented by owner.
}
@Override
public void recordsDeleted(int count) {
incrementCounter(S3GUARD_METADATASTORE_RECORD_DELETES, count);
}
@Override
public void recordsRead(int count) {
incrementCounter(S3GUARD_METADATASTORE_RECORD_READS, count);
}
@Override
public void recordsWritten(int count) {
incrementCounter(S3GUARD_METADATASTORE_RECORD_WRITES, count);
}
@Override
public void directoryMarkedAuthoritative() {
incrementCounter(
S3GUARD_METADATASTORE_AUTHORITATIVE_DIRECTORIES_UPDATED,
1);
}
@Override
public void entryAdded(final long durationNanos) {
addValueToQuantiles(
S3GUARD_METADATASTORE_PUT_PATH_LATENCY,
durationNanos);
incrementCounter(S3GUARD_METADATASTORE_PUT_PATH_REQUEST, 1);
}
}
/**
* Instrumentation exported to S3A Committers.
* The S3AInstrumentation metrics and
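The close() hunk above matters because every quantile owns a periodic task on a shared scheduler; a quantile that is never stopped keeps doing work after the filesystem is closed. A self-contained sketch of that lifecycle (SampledGauge and its pool are illustrative, not the Hadoop metrics classes):

import java.io.Closeable;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Sketch: a metric that samples on a shared scheduler and must be stopped. */
class SampledGauge implements Closeable {
  private static final ScheduledExecutorService POOL =
      Executors.newScheduledThreadPool(1);

  private final ScheduledFuture<?> task;

  SampledGauge(Runnable snapshot, long intervalSeconds) {
    task = POOL.scheduleAtFixedRate(snapshot, intervalSeconds,
        intervalSeconds, TimeUnit.SECONDS);
  }

  @Override
  public void close() {
    task.cancel(false);   // the analogue of quantile.stop() in the code above
  }
}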

View File

@ -38,66 +38,36 @@
@SuppressWarnings("visibilitymodifier")
public class S3AOpContext extends ActiveOperationContext {
final boolean isS3GuardEnabled;
final Invoker invoker;
@Nullable final FileSystem.Statistics stats;
@Nullable final Invoker s3guardInvoker;
/** FileStatus for "destination" path being operated on. */
protected final FileStatus dstFileStatus;
/**
* Alternate constructor that allows passing in two invokers, the common
* one, and another with the S3Guard Retry Policy.
* @param isS3GuardEnabled true if s3Guard is active
* Constructor.
* @param invoker invoker, which contains retry policy
* @param s3guardInvoker s3guard-specific retry policy invoker
* @param stats optional stats object
* @param instrumentation instrumentation to use
* @param dstFileStatus file status from existence check
*/
public S3AOpContext(boolean isS3GuardEnabled, Invoker invoker,
@Nullable Invoker s3guardInvoker,
public S3AOpContext(Invoker invoker,
@Nullable FileSystem.Statistics stats,
S3AStatisticsContext instrumentation,
FileStatus dstFileStatus) {
super(newOperationId(),
instrumentation,
null);
instrumentation
);
Preconditions.checkNotNull(invoker, "Null invoker arg");
Preconditions.checkNotNull(instrumentation, "Null instrumentation arg");
Preconditions.checkNotNull(dstFileStatus, "Null dstFileStatus arg");
this.isS3GuardEnabled = isS3GuardEnabled;
Preconditions.checkArgument(!isS3GuardEnabled || s3guardInvoker != null,
"S3Guard invoker required: S3Guard is enabled.");
this.invoker = invoker;
this.s3guardInvoker = s3guardInvoker;
this.stats = stats;
this.dstFileStatus = dstFileStatus;
}
/**
* Constructor using common invoker and retry policy.
* @param isS3GuardEnabled true if s3Guard is active
* @param invoker invoker, which contains retry policy
* @param stats optional stats object
* @param instrumentation instrumentation to use
* @param dstFileStatus file status from existence check
*/
public S3AOpContext(boolean isS3GuardEnabled,
Invoker invoker,
@Nullable FileSystem.Statistics stats,
S3AStatisticsContext instrumentation,
FileStatus dstFileStatus) {
this(isS3GuardEnabled, invoker, null, stats, instrumentation,
dstFileStatus);
}
public boolean isS3GuardEnabled() {
return isS3GuardEnabled;
}
public Invoker getInvoker() {
return invoker;
}
@ -107,11 +77,6 @@ public FileSystem.Statistics getStats() {
return stats;
}
@Nullable
public Invoker getS3guardInvoker() {
return s3guardInvoker;
}
public FileStatus getDstFileStatus() {
return dstFileStatus;
}

View File

@ -61,9 +61,7 @@ public class S3AReadOpContext extends S3AOpContext {
/**
* Instantiate.
* @param path path of read
* @param isS3GuardEnabled true iff S3Guard is enabled.
* @param invoker invoker for normal retries.
* @param s3guardInvoker S3Guard-specific retry invoker.
* @param stats Fileystem statistics (may be null)
* @param instrumentation statistics context
* @param dstFileStatus target file status
@ -74,9 +72,7 @@ public class S3AReadOpContext extends S3AOpContext {
*/
public S3AReadOpContext(
final Path path,
boolean isS3GuardEnabled,
Invoker invoker,
@Nullable Invoker s3guardInvoker,
@Nullable FileSystem.Statistics stats,
S3AStatisticsContext instrumentation,
FileStatus dstFileStatus,
@ -85,7 +81,7 @@ public S3AReadOpContext(
final long readahead,
final AuditSpan auditSpan) {
super(isS3GuardEnabled, invoker, s3guardInvoker, stats, instrumentation,
super(invoker, stats, instrumentation,
dstFileStatus);
this.path = checkNotNull(path);
this.auditSpan = auditSpan;
@ -98,17 +94,10 @@ public S3AReadOpContext(
/**
* Get invoker to use for read operations.
* When S3Guard is enabled we use the S3Guard invoker,
* which deals with things like FileNotFoundException
* differently.
* @return invoker to use for read codepaths
*/
public Invoker getReadInvoker() {
if (isS3GuardEnabled) {
return s3guardInvoker;
} else {
return invoker;
}
return invoker;
}
/**

View File

@ -31,7 +31,6 @@
import java.util.concurrent.TimeUnit;
import com.amazonaws.AmazonClientException;
import com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputExceededException;
import org.apache.hadoop.util.Preconditions;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
@ -77,7 +76,6 @@
* untranslated exceptions, as well as the translated ones.
* @see <a href="http://docs.aws.amazon.com/AmazonS3/latest/API/ErrorResponses.html">S3 Error responses</a>
* @see <a href="http://docs.aws.amazon.com/AmazonS3/latest/dev/ErrorBestPractices.html">Amazon S3 Error Best Practices</a>
* @see <a href="http://docs.aws.amazon.com/amazondynamodb/latest/APIReference/CommonErrors.html">Dynamo DB Commmon errors</a>
*/
@SuppressWarnings("visibilitymodifier") // I want a struct of finals, for real.
public class S3ARetryPolicy implements RetryPolicy {
@ -191,10 +189,6 @@ protected Map<Class<? extends Exception>, RetryPolicy> createExceptionMap() {
policyMap.put(UnknownStoreException.class, fail);
policyMap.put(InvalidRequestException.class, fail);
// metadata stores should do retries internally when it makes sense
// so there is no point doing another layer of retries after that
policyMap.put(MetadataPersistenceException.class, fail);
// once the file has changed, trying again is not going to help
policyMap.put(RemoteFileChangedException.class, fail);
@ -234,11 +228,6 @@ protected Map<Class<? extends Exception>, RetryPolicy> createExceptionMap() {
policyMap.put(AWSS3IOException.class, retryIdempotentCalls);
policyMap.put(SocketTimeoutException.class, retryIdempotentCalls);
// Dynamo DB exceptions
// asking for more than you should get. It's a retry but should be logged
// trigger sleep
policyMap.put(ProvisionedThroughputExceededException.class, throttlePolicy);
return policyMap;
}
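The exception map decides, per exception class, whether an operation fails fast, is retried for idempotent calls, or backs off under throttling; with S3Guard gone the MetadataPersistenceException and DynamoDB entries simply drop out. A minimal sketch of that class-keyed dispatch (Action and the sample entries are illustrative; the real map holds org.apache.hadoop.io.retry.RetryPolicy instances):

import java.io.FileNotFoundException;
import java.net.SocketTimeoutException;
import java.util.HashMap;
import java.util.Map;

/** Sketch of an exception-class-to-policy lookup like createExceptionMap(). */
final class RetryDecider {

  enum Action { FAIL, RETRY_IDEMPOTENT, THROTTLE }

  private final Map<Class<? extends Exception>, Action> policyMap = new HashMap<>();

  RetryDecider() {
    // a couple of sample entries mirroring the shape of the real map
    policyMap.put(FileNotFoundException.class, Action.FAIL);
    policyMap.put(SocketTimeoutException.class, Action.RETRY_IDEMPOTENT);
  }

  Action decide(Exception e) {
    // exact-class lookup; unknown exceptions fall back to failing fast here
    return policyMap.getOrDefault(e.getClass(), Action.FAIL);
  }

  public static void main(String[] args) {
    RetryDecider decider = new RetryDecider();
    System.out.println(decider.decide(new SocketTimeoutException()));   // RETRY_IDEMPOTENT
    System.out.println(decider.decide(new IllegalStateException()));    // FAIL
  }
}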

View File

@ -27,10 +27,6 @@
import com.amazonaws.auth.AWSCredentialsProvider;
import com.amazonaws.auth.EnvironmentVariableCredentialsProvider;
import com.amazonaws.retry.RetryUtils;
import com.amazonaws.services.dynamodbv2.model.AmazonDynamoDBException;
import com.amazonaws.services.dynamodbv2.model.LimitExceededException;
import com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputExceededException;
import com.amazonaws.services.dynamodbv2.model.ResourceNotFoundException;
import com.amazonaws.services.s3.model.AmazonS3Exception;
import com.amazonaws.services.s3.model.MultiObjectDeleteException;
import com.amazonaws.services.s3.model.S3ObjectSummary;
@ -92,7 +88,6 @@
import static org.apache.hadoop.fs.s3a.impl.InternalConstants.CSE_PADDING_LENGTH;
import static org.apache.hadoop.fs.s3a.impl.MultiObjectDeleteSupport.translateDeleteException;
import static org.apache.hadoop.io.IOUtils.cleanupWithLogger;
import static org.apache.hadoop.util.functional.RemoteIterators.filteringRemoteIterator;
/**
* Utility methods for S3A code.
@ -165,7 +160,6 @@ private S3AUtils() {
*
* @see <a href="http://docs.aws.amazon.com/AmazonS3/latest/API/ErrorResponses.html">S3 Error responses</a>
* @see <a href="http://docs.aws.amazon.com/AmazonS3/latest/dev/ErrorBestPractices.html">Amazon S3 Error Best Practices</a>
* @see <a href="http://docs.aws.amazon.com/amazondynamodb/latest/APIReference/CommonErrors.html">Dynamo DB Commmon errors</a>
* @param operation operation
* @param path path operated on (must not be null)
* @param exception amazon exception raised
@ -215,11 +209,6 @@ public static IOException translateException(@Nullable String operation,
}
return new AWSClientIOException(message, exception);
} else {
if (exception instanceof AmazonDynamoDBException) {
// special handling for dynamo DB exceptions
return translateDynamoDBException(path, message,
(AmazonDynamoDBException)exception);
}
IOException ioe;
AmazonServiceException ase = (AmazonServiceException) exception;
// this exception is non-null if the service exception is an s3 one
@ -403,8 +392,7 @@ private static InterruptedIOException translateInterruptedException(
/**
* Is the exception an instance of a throttling exception. That
* is an AmazonServiceException with a 503 response, any
* exception from DynamoDB for limits exceeded, an
* is an AmazonServiceException with a 503 response, an
* {@link AWSServiceThrottledException},
* or anything which the AWS SDK's RetryUtils considers to be
* a throttling exception.
@ -413,8 +401,6 @@ private static InterruptedIOException translateInterruptedException(
*/
public static boolean isThrottleException(Exception ex) {
return ex instanceof AWSServiceThrottledException
|| ex instanceof ProvisionedThroughputExceededException
|| ex instanceof LimitExceededException
|| (ex instanceof AmazonServiceException
&& 503 == ((AmazonServiceException)ex).getStatusCode())
|| (ex instanceof SdkBaseException
@ -433,49 +419,6 @@ public static boolean isMessageTranslatableToEOF(SdkBaseException ex) {
ex.toString().contains(EOF_READ_DIFFERENT_LENGTH);
}
/**
* Translate a DynamoDB exception into an IOException.
*
* @param path path in the DDB
* @param message preformatted message for the exception
* @param ddbException exception
* @return an exception to throw.
*/
public static IOException translateDynamoDBException(final String path,
final String message,
final AmazonDynamoDBException ddbException) {
if (isThrottleException(ddbException)) {
return new AWSServiceThrottledException(message, ddbException);
}
if (ddbException instanceof ResourceNotFoundException) {
return (FileNotFoundException) new FileNotFoundException(message)
.initCause(ddbException);
}
final int statusCode = ddbException.getStatusCode();
final String errorCode = ddbException.getErrorCode();
IOException result = null;
// 400 gets used a lot by DDB
if (statusCode == 400) {
switch (errorCode) {
case "AccessDeniedException":
result = (IOException) new AccessDeniedException(
path,
null,
ddbException.toString())
.initCause(ddbException);
break;
default:
result = new AWSBadRequestException(message, ddbException);
}
}
if (result == null) {
result = new AWSServiceIOException(message, ddbException);
}
return result;
}
/**
* Get low level details of an amazon exception for logging; multi-line.
* @param e exception
@ -1258,7 +1201,7 @@ public static ClientConfiguration createAwsConf(Configuration conf,
* @param conf The Hadoop configuration
* @param bucket Optional bucket to use to look up per-bucket proxy secrets
* @param awsServiceIdentifier a string representing the AWS service (S3,
* DDB, etc) for which the ClientConfiguration is being created.
* etc) for which the ClientConfiguration is being created.
* @return new AWS client configuration
* @throws IOException problem creating AWS client configuration
*/
@ -1275,9 +1218,6 @@ public static ClientConfiguration createAwsConf(Configuration conf,
case AWS_SERVICE_IDENTIFIER_S3:
configKey = SIGNING_ALGORITHM_S3;
break;
case AWS_SERVICE_IDENTIFIER_DDB:
configKey = SIGNING_ALGORITHM_DDB;
break;
case AWS_SERVICE_IDENTIFIER_STS:
configKey = SIGNING_ALGORITHM_STS;
break;
@ -1443,21 +1383,16 @@ private static void initUserAgent(Configuration conf,
/**
* Convert the data of an iterator of {@link S3AFileStatus} to
* an array. Given tombstones are filtered out. If the iterator
* does return any item, an empty array is returned.
* an array.
* @param iterator a non-null iterator
* @param tombstones possibly empty set of tombstones
* @return a possibly-empty array of file status entries
* @throws IOException failure
*/
public static S3AFileStatus[] iteratorToStatuses(
RemoteIterator<S3AFileStatus> iterator, Set<Path> tombstones)
RemoteIterator<S3AFileStatus> iterator)
throws IOException {
// this will close the span afterwards
RemoteIterator<S3AFileStatus> source = filteringRemoteIterator(iterator,
st -> !tombstones.contains(st.getPath()));
S3AFileStatus[] statuses = RemoteIterators
.toArray(source, new S3AFileStatus[0]);
.toArray(iterator, new S3AFileStatus[0]);
return statuses;
}
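With the DynamoDB throughput exceptions gone, isThrottleException() reduces to the SDK retry-utils check plus a test for a 503 service response. A self-contained sketch of that status-code test (ServiceFault is a stand-in for AmazonServiceException, not the SDK class):

/** Sketch of treating a 503 service response as a throttling event. */
final class ThrottleCheck {

  /** Stand-in for AmazonServiceException carrying an HTTP status code. */
  static class ServiceFault extends Exception {
    private final int statusCode;
    ServiceFault(int statusCode) { this.statusCode = statusCode; }
    int getStatusCode() { return statusCode; }
  }

  static boolean isThrottle(Exception ex) {
    return ex instanceof ServiceFault
        && ((ServiceFault) ex).getStatusCode() == 503;
  }

  public static void main(String[] args) {
    System.out.println(isThrottle(new ServiceFault(503)));   // true
    System.out.println(isThrottle(new ServiceFault(404)));   // false
  }
}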

View File

@ -1,76 +0,0 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.hadoop.fs.s3a;
import java.io.FileNotFoundException;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.retry.RetryPolicy;
import static org.apache.hadoop.fs.s3a.Constants.S3GUARD_CONSISTENCY_RETRY_INTERVAL;
import static org.apache.hadoop.fs.s3a.Constants.S3GUARD_CONSISTENCY_RETRY_INTERVAL_DEFAULT;
import static org.apache.hadoop.fs.s3a.Constants.S3GUARD_CONSISTENCY_RETRY_LIMIT;
import static org.apache.hadoop.fs.s3a.Constants.S3GUARD_CONSISTENCY_RETRY_LIMIT_DEFAULT;
import static org.apache.hadoop.io.retry.RetryPolicies.retryUpToMaximumCountWithProportionalSleep;
/**
* Slightly-modified retry policy for cases when the file is present in the
* MetadataStore, but may be still throwing FileNotFoundException from S3.
*/
public class S3GuardExistsRetryPolicy extends S3ARetryPolicy {
private static final Logger LOG = LoggerFactory.getLogger(
S3GuardExistsRetryPolicy.class);
/**
* Instantiate.
* @param conf configuration to read.
*/
public S3GuardExistsRetryPolicy(Configuration conf) {
super(conf);
}
@Override
protected Map<Class<? extends Exception>, RetryPolicy> createExceptionMap() {
Map<Class<? extends Exception>, RetryPolicy> b = super.createExceptionMap();
Configuration conf = getConfiguration();
// base policy from configuration
int limit = conf.getInt(S3GUARD_CONSISTENCY_RETRY_LIMIT,
S3GUARD_CONSISTENCY_RETRY_LIMIT_DEFAULT);
long interval = conf.getTimeDuration(S3GUARD_CONSISTENCY_RETRY_INTERVAL,
S3GUARD_CONSISTENCY_RETRY_INTERVAL_DEFAULT,
TimeUnit.MILLISECONDS);
RetryPolicy retryPolicy = retryUpToMaximumCountWithProportionalSleep(
limit,
interval,
TimeUnit.MILLISECONDS);
LOG.debug("Retrying on recoverable S3Guard table/S3 inconsistencies {}"
+ " times with an initial interval of {}ms", limit, interval);
b.put(FileNotFoundException.class, retryPolicy);
b.put(RemoteFileChangedException.class, retryPolicy);
return b;
}
}

View File

@ -20,7 +20,6 @@
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import com.amazonaws.services.s3.model.ListObjectsV2Result;
@ -28,9 +27,6 @@
import com.amazonaws.services.s3.model.S3ObjectSummary;
import org.slf4j.Logger;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.s3a.impl.ContextAccessors;
/**
* API version-independent container for S3 List responses.
*/
@ -101,25 +97,6 @@ public List<String> getCommonPrefixes() {
}
}
/**
* Is the list of object summaries empty
* after accounting for tombstone markers (if provided)?
* @param accessors callback for key to path mapping.
* @param tombstones Set of tombstone markers, or null if not applicable.
* @return false if summaries contains objects not accounted for by
* tombstones.
*/
public boolean isEmptyOfObjects(
final ContextAccessors accessors,
final Set<Path> tombstones) {
if (tombstones == null) {
return getObjectSummaries().isEmpty();
}
return isEmptyOfKeys(accessors,
objectSummaryKeys(),
tombstones);
}
/**
* Get the list of keys in the object summary.
* @return a possibly empty list
@ -131,55 +108,22 @@ private List<String> objectSummaryKeys() {
}
/**
* Does this listing have prefixes or objects after entries with
* tombstones have been stripped?
* @param accessors callback for key to path mapping.
* @param tombstones Set of tombstone markers, or null if not applicable.
* @return true if the reconciled list is non-empty
* Does this listing have prefixes or objects?
* @return true if the result is non-empty
*/
public boolean hasPrefixesOrObjects(
final ContextAccessors accessors,
final Set<Path> tombstones) {
public boolean hasPrefixesOrObjects() {
return !isEmptyOfKeys(accessors, getCommonPrefixes(), tombstones)
|| !isEmptyOfObjects(accessors, tombstones);
}
/**
* Helper function to determine if a collection of keys is empty
* after accounting for tombstone markers (if provided).
* @param accessors callback for key to path mapping.
* @param keys Collection of path (prefixes / directories or keys).
* @param tombstones Set of tombstone markers, or null if not applicable.
* @return true if the list is considered empty.
*/
public boolean isEmptyOfKeys(
final ContextAccessors accessors,
final Collection<String> keys,
final Set<Path> tombstones) {
if (tombstones == null) {
return keys.isEmpty();
}
for (String key : keys) {
Path qualified = accessors.keyToPath(key);
if (!tombstones.contains(qualified)) {
return false;
}
}
return true;
return !(getCommonPrefixes()).isEmpty()
|| !getObjectSummaries().isEmpty();
}
/**
* Does this listing represent an empty directory?
* @param contextAccessors callback for key to path mapping.
* @param dirKey directory key
* @param tombstones Set of tombstone markers, or null if not applicable.
* @return true if the list is considered empty.
*/
public boolean representsEmptyDirectory(
final ContextAccessors contextAccessors,
final String dirKey,
final Set<Path> tombstones) {
final String dirKey) {
// If looking for an empty directory, the marker must exist but
// no children.
// So the listing must contain the marker entry only as an object,
@ -190,7 +134,7 @@ public boolean representsEmptyDirectory(
}
/**
* Dmp the result at debug level.
* Dump the result at debug level.
* @param log log to use
*/
public void logAtDebug(Logger log) {
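With tombstones removed, representsEmptyDirectory() only needs to ask whether the listing is exactly the directory marker with no children. A hedged, self-contained sketch of that check (flat String keys instead of the SDK listing types):

import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Sketch: an empty directory lists as its own marker key and nothing else. */
final class EmptyDirCheck {

  static boolean representsEmptyDirectory(List<String> objectKeys,
      List<String> commonPrefixes, String dirKey) {
    return commonPrefixes.isEmpty()
        && objectKeys.size() == 1
        && objectKeys.get(0).equals(dirKey);
  }

  public static void main(String[] args) {
    List<String> none = Collections.emptyList();
    System.out.println(representsEmptyDirectory(
        Arrays.asList("data/logs/"), none, "data/logs/"));    // true: marker only
    System.out.println(representsEmptyDirectory(
        Arrays.asList("data/logs/", "data/logs/part-0000"),
        none, "data/logs/"));                                 // false: has a child
  }
}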

View File

@ -455,47 +455,6 @@ public enum Statistic {
"Duration Tracking of files uploaded from a local staging path",
TYPE_DURATION),
/* S3guard stats */
S3GUARD_METADATASTORE_PUT_PATH_REQUEST(
"s3guard_metadatastore_put_path_request",
"S3Guard metadata store put one metadata path request",
TYPE_COUNTER),
S3GUARD_METADATASTORE_PUT_PATH_LATENCY(
"s3guard_metadatastore_put_path_latency",
"S3Guard metadata store put one metadata path latency",
TYPE_QUANTILE),
S3GUARD_METADATASTORE_INITIALIZATION(
"s3guard_metadatastore_initialization",
"S3Guard metadata store initialization times",
TYPE_COUNTER),
S3GUARD_METADATASTORE_RECORD_DELETES(
"s3guard_metadatastore_record_deletes",
"S3Guard metadata store records deleted",
TYPE_COUNTER),
S3GUARD_METADATASTORE_RECORD_READS(
"s3guard_metadatastore_record_reads",
"S3Guard metadata store records read",
TYPE_COUNTER),
S3GUARD_METADATASTORE_RECORD_WRITES(
"s3guard_metadatastore_record_writes",
"S3Guard metadata store records written",
TYPE_COUNTER),
S3GUARD_METADATASTORE_RETRY("s3guard_metadatastore_retry",
"S3Guard metadata store retry events",
TYPE_COUNTER),
S3GUARD_METADATASTORE_THROTTLED("s3guard_metadatastore_throttled",
"S3Guard metadata store throttled events",
TYPE_COUNTER),
S3GUARD_METADATASTORE_THROTTLE_RATE(
"s3guard_metadatastore_throttle_rate",
"S3Guard metadata store throttle rate",
TYPE_QUANTILE),
S3GUARD_METADATASTORE_AUTHORITATIVE_DIRECTORIES_UPDATED(
"s3guard_metadatastore_authoritative_directories_updated",
"S3Guard metadata store authoritative directories updated from S3",
TYPE_COUNTER),
/* General Store operations */
STORE_EXISTS_PROBE(StoreStatisticNames.STORE_EXISTS_PROBE,
"Store Existence Probe",

View File

@ -53,8 +53,6 @@
import org.apache.hadoop.fs.s3a.api.RequestFactory;
import org.apache.hadoop.fs.s3a.impl.StoreContext;
import org.apache.hadoop.fs.s3a.statistics.S3AStatisticsContext;
import org.apache.hadoop.fs.s3a.s3guard.BulkOperationState;
import org.apache.hadoop.fs.s3a.s3guard.S3Guard;
import org.apache.hadoop.fs.s3a.select.SelectBinding;
import org.apache.hadoop.fs.store.audit.AuditSpan;
import org.apache.hadoop.fs.store.audit.AuditSpanSource;
@ -82,7 +80,7 @@
* upload process.</li>
* <li>Other low-level access to S3 functions, for private use.</li>
* <li>Failure handling, including converting exceptions to IOEs.</li>
* <li>Integration with instrumentation and S3Guard.</li>
* <li>Integration with instrumentation.</li>
* <li>Evolution to add more low-level operations, such as S3 select.</li>
* </ul>
*
@ -324,7 +322,7 @@ public String initiateMultiPartUpload(String destKey) throws IOException {
/**
* Finalize a multipart PUT operation.
* This completes the upload, and, if that works, calls
* {@link S3AFileSystem#finishedWrite(String, long, String, String, BulkOperationState)}
* {@link S3AFileSystem#finishedWrite(String, long, String, String)}
* to update the filesystem.
* Retry policy: retrying, translated.
* @param destKey destination of the commit
@ -332,7 +330,6 @@ public String initiateMultiPartUpload(String destKey) throws IOException {
* @param partETags list of partial uploads
* @param length length of the upload
* @param retrying retrying callback
* @param operationState (nullable) operational state for a bulk update
* @return the result of the operation.
* @throws IOException on problems.
*/
@ -342,8 +339,7 @@ private CompleteMultipartUploadResult finalizeMultipartUpload(
String uploadId,
List<PartETag> partETags,
long length,
Retried retrying,
@Nullable BulkOperationState operationState) throws IOException {
Retried retrying) throws IOException {
if (partETags.isEmpty()) {
throw new PathIOException(destKey,
"No upload parts in multipart upload");
@ -361,7 +357,7 @@ private CompleteMultipartUploadResult finalizeMultipartUpload(
request);
});
owner.finishedWrite(destKey, length, uploadResult.getETag(),
uploadResult.getVersionId(), operationState);
uploadResult.getVersionId());
return uploadResult;
}
}
@ -397,8 +393,8 @@ public CompleteMultipartUploadResult completeMPUwithRetries(
uploadId,
partETags,
length,
(text, e, r, i) -> errorCount.incrementAndGet(),
null);
(text, e, r, i) -> errorCount.incrementAndGet()
);
}
/**
@ -587,16 +583,14 @@ public UploadResult uploadObject(PutObjectRequest putObjectRequest)
* Relies on retry code in filesystem
* @throws IOException on problems
* @param destKey destination key
* @param operationState operational state for a bulk update
*/
@Retries.OnceTranslated
public void revertCommit(String destKey,
@Nullable BulkOperationState operationState) throws IOException {
public void revertCommit(String destKey) throws IOException {
once("revert commit", destKey,
withinAuditSpan(getAuditSpan(), () -> {
Path destPath = owner.keyToQualifiedPath(destKey);
owner.deleteObjectAtPath(destPath,
destKey, true, operationState);
destKey, true);
owner.maybeCreateFakeParentDirectory(destPath);
}));
}
@ -610,7 +604,6 @@ public void revertCommit(String destKey,
* @param uploadId multipart operation Id
* @param partETags list of partial uploads
* @param length length of the upload
* @param operationState operational state for a bulk update
* @return the result of the operation.
* @throws IOException if problems arose which could not be retried, or
* the retry count was exceeded
@ -620,8 +613,7 @@ public CompleteMultipartUploadResult commitUpload(
String destKey,
String uploadId,
List<PartETag> partETags,
long length,
@Nullable BulkOperationState operationState)
long length)
throws IOException {
checkNotNull(uploadId);
checkNotNull(partETags);
@ -631,32 +623,8 @@ public CompleteMultipartUploadResult commitUpload(
uploadId,
partETags,
length,
Invoker.NO_OP,
operationState);
}
/**
* Initiate a commit operation through any metastore.
* @param path path under which the writes will all take place.
* @return an possibly null operation state from the metastore.
* @throws IOException failure to instantiate.
*/
public BulkOperationState initiateCommitOperation(
Path path) throws IOException {
return initiateOperation(path, BulkOperationState.OperationType.Commit);
}
/**
* Initiate a commit operation through any metastore.
* @param path path under which the writes will all take place.
* @param operationType operation to initiate
* @return an possibly null operation state from the metastore.
* @throws IOException failure to instantiate.
*/
public BulkOperationState initiateOperation(final Path path,
final BulkOperationState.OperationType operationType) throws IOException {
return S3Guard.initiateBulkWrite(owner.getMetadataStore(),
operationType, path);
Invoker.NO_OP
);
}
/**
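finalizeMultipartUpload() completes the upload and then notifies the filesystem through finishedWrite(), now without any BulkOperationState. A rough sketch of the underlying completion call using the v1 SDK directly (the client, bucket and key are caller-supplied; this is a simplified stand-in, not the S3A helper itself):

import java.util.List;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.CompleteMultipartUploadRequest;
import com.amazonaws.services.s3.model.CompleteMultipartUploadResult;
import com.amazonaws.services.s3.model.PartETag;

/** Sketch of the completion step behind commitUpload()/finalizeMultipartUpload(). */
final class MultipartCommitSketch {

  static CompleteMultipartUploadResult complete(AmazonS3 s3, String bucket,
      String key, String uploadId, List<PartETag> partETags) {
    if (partETags.isEmpty()) {
      // mirrors the "No upload parts in multipart upload" check above
      throw new IllegalArgumentException("No upload parts in multipart upload");
    }
    return s3.completeMultipartUpload(
        new CompleteMultipartUploadRequest(bucket, key, uploadId, partETags));
    // the real code then calls owner.finishedWrite(destKey, length, eTag, versionId)
  }
}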

View File

@ -43,7 +43,6 @@
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathIOException;
import org.apache.hadoop.fs.s3a.s3guard.BulkOperationState;
import org.apache.hadoop.fs.store.audit.AuditSpanSource;
import org.apache.hadoop.util.functional.CallableRaisingIOE;
@ -261,11 +260,9 @@ UploadResult uploadObject(PutObjectRequest putObjectRequest)
* Relies on retry code in filesystem
* @throws IOException on problems
* @param destKey destination key
* @param operationState operational state for a bulk update
*/
@Retries.OnceTranslated
void revertCommit(String destKey,
@Nullable BulkOperationState operationState) throws IOException;
void revertCommit(String destKey) throws IOException;
/**
* This completes a multipart upload to the destination key via
@ -276,7 +273,6 @@ void revertCommit(String destKey,
* @param uploadId multipart operation Id
* @param partETags list of partial uploads
* @param length length of the upload
* @param operationState operational state for a bulk update
* @return the result of the operation.
* @throws IOException if problems arose which could not be retried, or
* the retry count was exceeded
@ -286,29 +282,9 @@ CompleteMultipartUploadResult commitUpload(
String destKey,
String uploadId,
List<PartETag> partETags,
long length,
@Nullable BulkOperationState operationState)
long length)
throws IOException;
/**
* Initiate a commit operation through any metastore.
* @param path path under which the writes will all take place.
* @return an possibly null operation state from the metastore.
* @throws IOException failure to instantiate.
*/
BulkOperationState initiateCommitOperation(
Path path) throws IOException;
/**
* Initiate a commit operation through any metastore.
* @param path path under which the writes will all take place.
* @param operationType operation to initiate
* @return an possibly null operation state from the metastore.
* @throws IOException failure to instantiate.
*/
BulkOperationState initiateOperation(Path path,
BulkOperationState.OperationType operationType) throws IOException;
/**
* Upload part of a multi-partition file.
* @param request request

View File

@ -45,7 +45,6 @@
* don't expect to be able to parse everything.
* It can generate simple models.
* @see <a href="https://docs.aws.amazon.com/AmazonS3/latest/dev/s3-arn-format.html">Example S3 Policies</a>
* @see <a href="https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/api-permissions-reference.html">Dynamno DB Permissions</a>
*/
@InterfaceAudience.LimitedPrivate("Tests")
@InterfaceStability.Unstable

View File

@ -32,7 +32,7 @@
/**
* Operations, statements and policies covering the operations
* needed to work with S3 and S3Guard.
* needed to work with S3.
*/
@InterfaceAudience.LimitedPrivate("Tests")
@InterfaceStability.Unstable
@ -200,7 +200,7 @@ private RolePolicies() {
/**
* Actions needed to read a file in S3 through S3A, excluding
* S3Guard and SSE-KMS.
* SSE-KMS.
*/
private static final String[] S3_PATH_READ_OPERATIONS =
new String[]{
@ -213,7 +213,6 @@ private RolePolicies() {
* <ol>
* <li>bucket-level operations</li>
* <li>SSE-KMS key operations</li>
* <li>DynamoDB operations for S3Guard.</li>
* </ol>
* As this excludes the bucket list operations, it is not sufficient
* to read from a bucket on its own.
@ -230,7 +229,6 @@ private RolePolicies() {
* Policies which can be applied to bucket resources for read operations.
* <ol>
* <li>SSE-KMS key operations</li>
* <li>DynamoDB operations for S3Guard.</li>
* </ol>
*/
public static final String[] S3_BUCKET_READ_OPERATIONS =
@ -242,7 +240,7 @@ private RolePolicies() {
/**
* Actions needed to write data to an S3A Path.
* This includes the appropriate read operations, but
* not SSE-KMS or S3Guard support.
* not SSE-KMS support.
*/
public static final List<String> S3_PATH_RW_OPERATIONS =
Collections.unmodifiableList(Arrays.asList(new String[]{
@ -258,7 +256,7 @@ private RolePolicies() {
* This is purely the extra operations needed for writing atop
* of the read operation set.
* Deny these and a path is still readable, but not writeable.
* Excludes: bucket-ARN, SSE-KMS and S3Guard permissions.
* Excludes: bucket-ARN and SSE-KMS permissions.
*/
public static final List<String> S3_PATH_WRITE_OPERATIONS =
Collections.unmodifiableList(Arrays.asList(new String[]{
@ -270,7 +268,7 @@ private RolePolicies() {
/**
* Actions needed for R/W IO from the root of a bucket.
* Excludes: bucket-ARN, SSE-KMS and S3Guard permissions.
* Excludes: bucket-ARN and SSE-KMS permissions.
*/
public static final List<String> S3_ROOT_RW_OPERATIONS =
Collections.unmodifiableList(Arrays.asList(new String[]{
@ -281,79 +279,9 @@ private RolePolicies() {
S3_ABORT_MULTIPART_UPLOAD,
}));
/**
* All DynamoDB operations: {@value}.
*/
public static final String DDB_ALL_OPERATIONS = "dynamodb:*";
/**
* Operations needed for DDB/S3Guard Admin.
* For now: make this {@link #DDB_ALL_OPERATIONS}.
*/
public static final String DDB_ADMIN = DDB_ALL_OPERATIONS;
/**
* Permission for DDB describeTable() operation: {@value}.
* This is used during initialization.
*/
public static final String DDB_DESCRIBE_TABLE = "dynamodb:DescribeTable";
/**
* Permission to query the DDB table: {@value}.
*/
public static final String DDB_QUERY = "dynamodb:Query";
/**
* Permission for DDB operation to get a record: {@value}.
*/
public static final String DDB_GET_ITEM = "dynamodb:GetItem";
/**
* Permission for DDB write record operation: {@value}.
*/
public static final String DDB_PUT_ITEM = "dynamodb:PutItem";
/**
* Permission for DDB update single item operation: {@value}.
*/
public static final String DDB_UPDATE_ITEM = "dynamodb:UpdateItem";
/**
* Permission for DDB delete operation: {@value}.
*/
public static final String DDB_DELETE_ITEM = "dynamodb:DeleteItem";
/**
* Permission for DDB operation: {@value}.
*/
public static final String DDB_BATCH_GET_ITEM = "dynamodb:BatchGetItem";
/**
* Batch write permission for DDB: {@value}.
*/
public static final String DDB_BATCH_WRITE_ITEM = "dynamodb:BatchWriteItem";
/**
* All DynamoDB tables: {@value}.
*/
public static final String ALL_DDB_TABLES = "arn:aws:dynamodb:*";
/**
* Statement to allow all DDB access.
*/
public static final Statement STATEMENT_ALL_DDB =
allowAllDynamoDBOperations(ALL_DDB_TABLES);
/**
* Statement to allow all client operations needed for S3Guard,
* but none of the admin operations.
*/
public static final Statement STATEMENT_S3GUARD_CLIENT =
allowS3GuardClientOperations(ALL_DDB_TABLES);
/**
* Allow all S3 Operations.
* This does not cover DDB or S3-KMS
* This does not cover S3-KMS
*/
public static final Statement STATEMENT_ALL_S3 = statement(true,
S3_ALL_BUCKETS,
@ -368,36 +296,6 @@ private RolePolicies() {
S3_ALL_BUCKETS,
S3_GET_BUCKET_LOCATION);
/**
* Policy for all S3 and S3Guard operations, and SSE-KMS.
*/
public static final Policy ALLOW_S3_AND_SGUARD = policy(
STATEMENT_ALL_S3,
STATEMENT_ALL_DDB,
STATEMENT_ALLOW_SSE_KMS_RW,
STATEMENT_ALL_S3_GET_BUCKET_LOCATION
);
public static Statement allowS3GuardClientOperations(String tableArn) {
return statement(true,
tableArn,
DDB_BATCH_GET_ITEM,
DDB_BATCH_WRITE_ITEM,
DDB_DELETE_ITEM,
DDB_DESCRIBE_TABLE,
DDB_GET_ITEM,
DDB_PUT_ITEM,
DDB_QUERY,
DDB_UPDATE_ITEM
);
}
public static Statement allowAllDynamoDBOperations(String tableArn) {
return statement(true,
tableArn,
DDB_ALL_OPERATIONS);
}
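For context, a hedged sketch of how these now-removed DynamoDB helpers were combined with the S3 statements, using the policy()/statement() helpers and constants visible in this file; the variable names are illustrative and the "after" form is an approximation of what remains once the DDB statement is dropped.

    // Illustrative only: the pre-change full-access policy bundled S3 and DynamoDB statements.
    Policy oldStyle = policy(
        STATEMENT_ALL_S3,                              // all S3 object operations
        allowAllDynamoDBOperations(ALL_DDB_TABLES),    // dynamodb:* on every table (now removed)
        STATEMENT_ALLOW_SSE_KMS_RW,
        STATEMENT_ALL_S3_GET_BUCKET_LOCATION);

    // After S3Guard removal the DynamoDB statement simply disappears:
    Policy newStyle = policy(
        STATEMENT_ALL_S3,
        STATEMENT_ALLOW_SSE_KMS_RW,
        STATEMENT_ALL_S3_GET_BUCKET_LOCATION);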
/**
* From an S3 bucket name, build an ARN to refer to it.
* @param bucket bucket name.

View File

@ -31,8 +31,8 @@
* The permissions requested are from the perspective of
* S3A filesystem operations on the data, <i>not</i> the simpler
* model of "permissions on the the remote service".
* As an example, to use S3Guard effectively, the client needs full CRUD
* access to the table, even for {@link AccessLevel#READ}.
* As an example, AWS-KMS encryption permissions must
* also be requested.
*/
public interface AWSPolicyProvider {

View File

@ -210,7 +210,7 @@ public abstract AWSCredentialProviderList deployUnbonded()
/**
* Bind to the token identifier, returning the credential providers to use
* for the owner to talk to S3, DDB and related AWS Services.
* for the owner to talk to S3 and related AWS Services.
* @param retrievedIdentifier the unmarshalled data
* @return non-empty list of AWS credential providers to use for
* authenticating this client with AWS services.

View File

@ -128,8 +128,7 @@ public class S3ADelegationTokens extends AbstractDTService {
/**
* The access policies we want for operations.
* There's no attempt to ask for "admin" permissions here, e.g.
* those to manipulate S3Guard tables.
* There's no attempt to ask for "admin" permissions here.
*/
protected static final EnumSet<AWSPolicyProvider.AccessLevel> ACCESS_POLICY
= EnumSet.of(
@ -420,8 +419,6 @@ public Token<AbstractS3ATokenIdentifier> createDelegationToken(
requireServiceStarted();
checkArgument(encryptionSecrets != null,
"Null encryption secrets");
// this isn't done in in advance as it needs S3Guard initialized in the
// filesystem before it can generate complete policies.
List<RoleModel.Statement> statements = getPolicyProvider()
.listAWSPolicyRules(ACCESS_POLICY);
Optional<RoleModel.Policy> rolePolicy =

View File

@ -441,7 +441,7 @@ protected boolean requiresDelayedCommitOutputInFileSystem() {
return false;
}
/**
* Task recovery considered unsupported: Warn and fail.
* Task recovery considered Unsupported: Warn and fail.
* @param taskContext Context of the task whose output is being recovered
* @throws IOException always.
*/
@ -457,7 +457,7 @@ public void recoverTask(TaskAttemptContext taskContext) throws IOException {
* if the job requires a success marker on a successful job,
* create the file {@link CommitConstants#_SUCCESS}.
*
* While the classic committers create a 0-byte file, the S3Guard committers
* While the classic committers create a 0-byte file, the S3A committers
* PUT up the contents of a {@link SuccessData} file.
* @param context job context
* @param pending the pending commits
@ -481,7 +481,7 @@ protected void maybeCreateSuccessMarkerFromCommits(JobContext context,
* if the job requires a success marker on a successful job,
* create the file {@link CommitConstants#_SUCCESS}.
*
* While the classic committers create a 0-byte file, the S3Guard committers
* While the classic committers create a 0-byte file, the S3A committers
* PUT up the contents of a {@link SuccessData} file.
* @param context job context
* @param filenames list of filenames.

View File

@ -18,7 +18,6 @@
package org.apache.hadoop.fs.s3a.commit;
import javax.annotation.Nullable;
import java.io.Closeable;
import java.io.File;
import java.io.FileNotFoundException;
@ -39,7 +38,6 @@
import org.slf4j.LoggerFactory;
import org.apache.commons.lang3.tuple.Pair;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
@ -55,12 +53,10 @@
import org.apache.hadoop.fs.s3a.impl.AbstractStoreOperation;
import org.apache.hadoop.fs.s3a.impl.HeaderProcessing;
import org.apache.hadoop.fs.s3a.impl.InternalConstants;
import org.apache.hadoop.fs.s3a.s3guard.BulkOperationState;
import org.apache.hadoop.fs.s3a.statistics.CommitterStatistics;
import org.apache.hadoop.fs.statistics.DurationTracker;
import org.apache.hadoop.fs.statistics.IOStatistics;
import org.apache.hadoop.fs.statistics.IOStatisticsSource;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.DurationInfo;
import org.apache.hadoop.util.Progressable;
@ -70,7 +66,6 @@
import static org.apache.hadoop.fs.s3a.Statistic.COMMITTER_MATERIALIZE_FILE;
import static org.apache.hadoop.fs.s3a.Statistic.COMMITTER_STAGE_FILE_UPLOAD;
import static org.apache.hadoop.fs.s3a.commit.CommitConstants.*;
import static org.apache.hadoop.fs.s3a.Constants.*;
import static org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDuration;
import static org.apache.hadoop.util.functional.RemoteIterators.cleanupRemoteIterator;
@ -171,13 +166,11 @@ public IOStatistics getIOStatistics() {
/**
* Commit the operation, throwing an exception on any failure.
* @param commit commit to execute
* @param operationState S3Guard state of ongoing operation.
* @throws IOException on a failure
*/
private void commitOrFail(
final SinglePendingCommit commit,
final BulkOperationState operationState) throws IOException {
commit(commit, commit.getFilename(), operationState).maybeRethrow();
final SinglePendingCommit commit) throws IOException {
commit(commit, commit.getFilename()).maybeRethrow();
}
/**
@ -185,13 +178,11 @@ private void commitOrFail(
* and converted to an outcome.
* @param commit entry to commit
* @param origin origin path/string for outcome text
* @param operationState S3Guard state of ongoing operation.
* @return the outcome
*/
private MaybeIOE commit(
final SinglePendingCommit commit,
final String origin,
final BulkOperationState operationState) {
final String origin) {
LOG.debug("Committing single commit {}", commit);
MaybeIOE outcome;
String destKey = "unknown destination";
@ -203,7 +194,7 @@ private MaybeIOE commit(
commit.validate();
destKey = commit.getDestinationKey();
long l = trackDuration(statistics, COMMITTER_MATERIALIZE_FILE.getSymbol(),
() -> innerCommit(commit, operationState));
() -> innerCommit(commit));
LOG.debug("Successful commit of file length {}", l);
outcome = MaybeIOE.NONE;
statistics.commitCompleted(commit.getLength());
@ -226,20 +217,18 @@ private MaybeIOE commit(
/**
* Inner commit operation.
* @param commit entry to commit
* @param operationState S3Guard state of ongoing operation.
* @return bytes committed.
* @throws IOException failure
*/
private long innerCommit(
final SinglePendingCommit commit,
final BulkOperationState operationState) throws IOException {
final SinglePendingCommit commit) throws IOException {
// finalize the commit
writeOperations.commitUpload(
commit.getDestinationKey(),
commit.getUploadId(),
toPartEtags(commit.getEtags()),
commit.getLength(),
operationState);
commit.getLength()
);
return commit.getLength();
}
@ -439,14 +428,6 @@ public void createSuccessMarker(Path outputPath,
if (addMetrics) {
addFileSystemStatistics(successData.getMetrics());
}
// add any diagnostics
Configuration conf = fs.getConf();
successData.addDiagnostic(S3_METADATA_STORE_IMPL,
conf.getTrimmed(S3_METADATA_STORE_IMPL, ""));
successData.addDiagnostic(METADATASTORE_AUTHORITATIVE,
conf.getTrimmed(METADATASTORE_AUTHORITATIVE, "false"));
successData.addDiagnostic(AUTHORITATIVE_PATH,
conf.getTrimmed(AUTHORITATIVE_PATH, ""));
// now write
Path markerPath = new Path(outputPath, _SUCCESS);
@ -461,14 +442,12 @@ public void createSuccessMarker(Path outputPath,
/**
* Revert a pending commit by deleting the destination.
* @param commit pending commit
* @param operationState nullable operational state for a bulk update
* @throws IOException failure
*/
public void revertCommit(SinglePendingCommit commit,
BulkOperationState operationState) throws IOException {
public void revertCommit(SinglePendingCommit commit) throws IOException {
LOG.info("Revert {}", commit);
try {
writeOperations.revertCommit(commit.getDestinationKey(), operationState);
writeOperations.revertCommit(commit.getDestinationKey());
} finally {
statistics.commitReverted();
}
@ -617,7 +596,7 @@ public void jobCompleted(boolean success) {
* @throws IOException failure.
*/
public CommitContext initiateCommitOperation(Path path) throws IOException {
return new CommitContext(writeOperations.initiateCommitOperation(path));
return new CommitContext();
}
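A rough usage sketch of the simplified commit path, assuming a CommitOperations instance and a list of loaded SinglePendingCommit entries are in scope; the method names are those visible in this class, everything else is illustrative.

    // Sketch: commit a batch of pending uploads with the post-S3Guard CommitContext.
    // No metastore bulk-operation state is needed any more.
    try (CommitOperations.CommitContext commitContext =
             commitOps.initiateCommitOperation(outputPath)) {
      for (SinglePendingCommit pending : pendingCommits) {
        // throws IOException on the first failure
        commitContext.commitOrFail(pending);
      }
    }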
/**
@ -647,11 +626,7 @@ public static Optional<Long> extractMagicFileLength(FileSystem fs, Path path)
* Commit context.
*
* It is used to manage the final commit sequence where files become
* visible. It contains a {@link BulkOperationState} field, which, if
* there is a metastore, will be requested from the store so that it
* can track multiple creation operations within the same overall operation.
* This will be null if there is no metastore, or the store chooses not
* to provide one.
* visible.
*
* This can only be created through {@link #initiateCommitOperation(Path)}.
*
@ -660,40 +635,34 @@ public static Optional<Long> extractMagicFileLength(FileSystem fs, Path path)
*/
public final class CommitContext implements Closeable {
/**
* State of any metastore.
*/
private final BulkOperationState operationState;
/**
* Create.
* @param operationState any S3Guard bulk state.
*/
private CommitContext(@Nullable final BulkOperationState operationState) {
this.operationState = operationState;
private CommitContext() {
}
/**
* Commit the operation, throwing an exception on any failure.
* See {@link CommitOperations#commitOrFail(SinglePendingCommit, BulkOperationState)}.
* See {@link CommitOperations#commitOrFail(SinglePendingCommit)}.
* @param commit commit to execute
* @throws IOException on a failure
*/
public void commitOrFail(SinglePendingCommit commit) throws IOException {
CommitOperations.this.commitOrFail(commit, operationState);
CommitOperations.this.commitOrFail(commit);
}
/**
* Commit a single pending commit; exceptions are caught
* and converted to an outcome.
* See {@link CommitOperations#commit(SinglePendingCommit, String, BulkOperationState)}.
* See {@link CommitOperations#commit(SinglePendingCommit, String)}.
* @param commit entry to commit
* @param origin origin path/string for outcome text
* @return the outcome
*/
public MaybeIOE commit(SinglePendingCommit commit,
String origin) {
return CommitOperations.this.commit(commit, origin, operationState);
return CommitOperations.this.commit(commit, origin);
}
/**
@ -708,13 +677,13 @@ public void abortSingleCommit(final SinglePendingCommit commit)
}
/**
* See {@link CommitOperations#revertCommit(SinglePendingCommit, BulkOperationState)}.
* See {@link CommitOperations#revertCommit(SinglePendingCommit)}.
* @param commit pending commit
* @throws IOException failure
*/
public void revertCommit(final SinglePendingCommit commit)
throws IOException {
CommitOperations.this.revertCommit(commit, operationState);
CommitOperations.this.revertCommit(commit);
}
/**
@ -733,14 +702,12 @@ public void abortMultipartCommit(
@Override
public void close() throws IOException {
IOUtils.cleanupWithLogger(LOG, operationState);
}
@Override
public String toString() {
final StringBuilder sb = new StringBuilder(
"CommitContext{");
sb.append("operationState=").append(operationState);
sb.append('}');
return sb.toString();
}

View File

@ -47,7 +47,7 @@
* <ol>
* <li>File length == 0: classic {@code FileOutputCommitter}.</li>
* <li>Loadable as {@link SuccessData}:
* A s3guard committer with name in the {@link #committer} field.</li>
* An S3A committer with name in the {@link #committer} field.</li>
* <li>Not loadable? Something else.</li>
* </ol>
*
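A hedged sketch of that three-way check, of the kind a test might use to validate committer output. SuccessData.load and getCommitter are assumptions based on the {@link #committer} field mentioned above, not verified API.

    // Sketch of the check described above.
    FileStatus st = fs.getFileStatus(successPath);
    if (st.getLen() == 0) {
      // classic FileOutputCommitter: empty _SUCCESS marker
    } else {
      // assumed: SuccessData.load() deserializes the JSON _SUCCESS file,
      // throwing if the contents are not parseable as SuccessData.
      SuccessData data = SuccessData.load(fs, successPath);
      String committer = data.getCommitter();   // assumed getter for the committer field
    }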

View File

@ -18,12 +18,9 @@
package org.apache.hadoop.fs.s3a.impl;
import javax.annotation.Nullable;
import java.util.Objects;
import java.util.concurrent.atomic.AtomicLong;
import org.apache.hadoop.fs.s3a.s3guard.BulkOperationState;
import org.apache.hadoop.fs.s3a.statistics.S3AStatisticsContext;
/**
@ -41,19 +38,12 @@ public class ActiveOperationContext {
*/
private final S3AStatisticsContext statisticsContext;
/**
* S3Guard bulk operation state, if (currently) set.
*/
@Nullable private BulkOperationState bulkOperationState;
public ActiveOperationContext(
final long operationId,
final S3AStatisticsContext statisticsContext,
@Nullable final BulkOperationState bulkOperationState) {
final S3AStatisticsContext statisticsContext) {
this.operationId = operationId;
this.statisticsContext = Objects.requireNonNull(statisticsContext,
"null statistics context");
this.bulkOperationState = bulkOperationState;
}
@Override
@ -61,16 +51,10 @@ public String toString() {
final StringBuilder sb = new StringBuilder(
"ActiveOperation{");
sb.append("operationId=").append(operationId);
sb.append(", bulkOperationState=").append(bulkOperationState);
sb.append('}');
return sb.toString();
}
@Nullable
public BulkOperationState getBulkOperationState() {
return bulkOperationState;
}
public long getOperationId() {
return operationId;
}
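A minimal sketch of constructing the slimmed-down context after this change; the operation id and statistics context are assumed to be supplied by the owning filesystem.

    // Post-change constructor: no BulkOperationState parameter any more.
    ActiveOperationContext context = new ActiveOperationContext(
        operationId,          // unique id for the operation
        statisticsContext);   // S3AStatisticsContext from the owning FS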

View File

@ -44,7 +44,7 @@
/**
* Change tracking for input streams: the version ID or etag of the object is
* tracked and compared on open/re-open. An initial version ID or etag may or
* may not be available, depending on usage (e.g. if S3Guard is utilized).
* may not be available.
*
* Self-contained for testing and use in different streams.
*/
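A hedged sketch of the comparison this class performs on re-open; the exception type and method shape are illustrative rather than the exact implementation.

    // Illustrative change-detection check: compare the etag/versionId recorded when
    // the stream was first opened with the attributes returned by a later GET.
    private void checkUnchanged(String expectedEtag, String actualEtag, String uri)
        throws IOException {
      if (expectedEtag != null && !expectedEtag.equals(actualEtag)) {
        // the real code raises a more specific exception; IOException keeps
        // this sketch self-contained.
        throw new IOException("Object at " + uri + " changed during read:"
            + " expected etag " + expectedEtag + " but found " + actualEtag);
      }
    }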

View File

@ -18,7 +18,6 @@
package org.apache.hadoop.fs.s3a.impl;
import javax.annotation.Nullable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
@ -40,10 +39,6 @@
import org.apache.hadoop.fs.s3a.S3AFileStatus;
import org.apache.hadoop.fs.s3a.S3ALocatedFileStatus;
import org.apache.hadoop.fs.s3a.Tristate;
import org.apache.hadoop.fs.s3a.s3guard.BulkOperationState;
import org.apache.hadoop.fs.s3a.s3guard.MetadataStore;
import org.apache.hadoop.fs.s3a.s3guard.S3Guard;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.DurationInfo;
import static org.apache.hadoop.fs.store.audit.AuditingFunctions.callableWithinAuditSpan;
@ -53,69 +48,22 @@
/**
* Implementation of the delete() operation.
* <p>
* How S3Guard/Store inconsistency is handled:
* <ol>
* <li>
* The list operation does not ask for tombstone markers; objects
* under tombstones will be found and deleted.
* The {@code extraFilesDeleted} counter will be incremented here.
* </li>
* <li>
* That may result in recently deleted files being found and
* duplicate delete requests issued. This is mostly harmless.
* </li>
* <li>
* If a path is considered authoritative on the client, so only S3Guard
* is used for listings, we wrap up the delete with a scan of raw S3.
* This will find and eliminate OOB additions.
* </li>
* <li>
* Exception 1: simple directory markers of the form PATH + "/".
* These are treated as a signal that there are no children; no
* listing is made.
* </li>
* <li>
* Exception 2: delete(path, true) where path has a tombstone in S3Guard.
* Here the delete is downgraded to a no-op even before this operation
* is created. Thus: no listings of S3.
* </li>
* </ol>
* If this class is logged at debug, requests will be audited:
* the response to a bulk delete call will be reviewed to see if there
* were fewer files deleted than requested; that will be printed
* at WARN level. This is independent of handling rejected delete
* requests which raise exceptions -those are processed lower down.
* <p>
* Performance tuning:
* <p>
* The operation to POST a delete request (or issue many individual
* DELETE calls) then update the S3Guard table is done in an async
* operation so that it can overlap with the LIST calls for data.
* However, only one single operation is queued at a time.
* <p>
* Executing more than one batch delete is possible, it just
* adds complexity in terms of error handling as well as in
* the datastructures used to track outstanding operations.
* This issues only one bulk delete at a time,
* intending to update S3Guard after every request succeeded.
* Now that S3Guard has been removed, it
* would be possible to issue multiple delete calls
* in parallel.
* If this is done, then it may be good to experiment with different
* page sizes. The default value is
* {@link InternalConstants#MAX_ENTRIES_TO_DELETE}, the maximum a single
* POST permits.
* <p>
* 1. Smaller pages executed in parallel may have different
* performance characteristics when deleting very large directories,
* because it will be the DynamoDB calls which will come to dominate.
* Smaller pages executed in parallel may have different
* performance characteristics when deleting very large directories.
* Any exploration of options here MUST be done with performance
* measurements taken from test runs in EC2 against local DDB and S3 stores,
* measurements taken from test runs in EC2 against local S3 stores,
* so as to ensure network latencies do not skew the results.
* <p>
* 2. Note that as the DDB thread/connection pools will be shared across
* all active delete operations, speedups will be minimal unless
* those pools are large enough to cope the extra load.
* <p>
* There are also some opportunities to explore in
* {@code DynamoDBMetadataStore} with batching delete requests
* in the DDB APIs.
*/
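A stripped-down sketch of the paging pattern described above: keys are buffered up to the page size and only one bulk delete is in flight at a time. The SDK types are the v1 classes already imported by this file; the s3 client, bucket and keysToRemove variables are assumptions.

    // Sketch: one in-flight bulk DELETE per page of keys (page size capped at the
    // S3 maximum of 1000 entries per POST).
    List<DeleteObjectsRequest.KeyVersion> page = new ArrayList<>(pageSize);
    CompletableFuture<Void> pendingDelete = null;
    for (String key : keysToRemove) {
      page.add(new DeleteObjectsRequest.KeyVersion(key));
      if (page.size() == pageSize) {
        if (pendingDelete != null) {
          pendingDelete.join();                 // only one outstanding request
        }
        List<DeleteObjectsRequest.KeyVersion> batch = new ArrayList<>(page);
        pendingDelete = CompletableFuture.runAsync(() ->
            s3.deleteObjects(new DeleteObjectsRequest(bucket).withKeys(batch)));
        page.clear();
      }
    }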
public class DeleteOperation extends ExecutingStoreOperation<Boolean> {
@ -142,11 +90,6 @@ public class DeleteOperation extends ExecutingStoreOperation<Boolean> {
*/
private final int pageSize;
/**
* Metastore -never null but may be the NullMetadataStore.
*/
private final MetadataStore metadataStore;
/**
* Executor for async operations.
*/
@ -157,35 +100,16 @@ public class DeleteOperation extends ExecutingStoreOperation<Boolean> {
*/
private List<DeleteEntry> keys;
/**
* List of paths built up for incremental deletion on tree delete.
* At the end of the entire delete the full tree is scanned in S3Guard
* and tombstones added. For this reason this list of paths <i>must not</i>
* include directory markers, as that will break the scan.
*/
private List<Path> paths;
/**
* The single async delete operation, or null.
*/
private CompletableFuture<Void> deleteFuture;
/**
* Bulk Operation state if this is a bulk operation.
*/
private BulkOperationState operationState;
/**
* Counter of deleted files.
*/
private long filesDeleted;
/**
* Counter of files found in the S3 Store during a raw scan of the store
* after the previous listing was in auth-mode.
*/
private long extraFilesDeleted;
/**
* Constructor.
* @param context store context
@ -208,7 +132,6 @@ public DeleteOperation(final StoreContext context,
&& pageSize <= InternalConstants.MAX_ENTRIES_TO_DELETE,
"page size out of range: %s", pageSize);
this.pageSize = pageSize;
metadataStore = context.getMetadataStore();
executor = MoreExecutors.listeningDecorator(
context.createThrottledExecutor(1));
}
@ -217,10 +140,6 @@ public long getFilesDeleted() {
return filesDeleted;
}
public long getExtraFilesDeleted() {
return extraFilesDeleted;
}
/**
* Delete a file or directory tree.
* <p>
@ -230,10 +149,14 @@ public long getExtraFilesDeleted() {
* Only one delete at a time is submitted, however, to reduce the
* complexity of recovering from failures.
* <p>
* The DynamoDB store deletes paths in parallel itself, so that
* potentially slow part of the process is somewhat speeded up.
* The extra parallelization here is to list files from the store/DDB while
* that delete operation is in progress.
* With S3Guard removed, the problem of updating any
* DynamoDB store has gone away -delete calls could now be issued
* in parallel.
* However, rate limiting may be required to keep write load
* below the throttling point. Every entry in a single
* bulk delete call counts as a single write request -overloading
* an S3 partition with delete calls has been a problem in
* the past.
*
* @return true, except in the corner cases of root directory deletion
* @throws PathIsNotEmptyDirectoryException if the path is a dir and this
@ -294,9 +217,7 @@ public Boolean execute() throws IOException {
* Delete a directory tree.
* <p>
* This is done by asking the filesystem for a list of all objects under
* the directory path, without using any S3Guard tombstone markers to hide
* objects which may be returned in S3 listings but which are considered
* deleted.
* the directory path.
* <p>
* Once the first {@link #pageSize} worth of objects has been listed, a batch
* delete is queued for execution in a separate thread; subsequent batches
@ -304,9 +225,6 @@ public Boolean execute() throws IOException {
* being deleted in the separate thread.
* <p>
* After all listed objects are queued for deletion,
* if the path is considered authoritative in the client, a final scan
* of S3 <i>without S3Guard</i> is executed, so as to find and delete
* any out-of-band objects in the tree.
* @param path directory path
* @param dirKey directory key
* @throws IOException failure
@ -314,12 +232,7 @@ public Boolean execute() throws IOException {
@Retries.RetryTranslated
protected void deleteDirectoryTree(final Path path,
final String dirKey) throws IOException {
// create an operation state so that the store can manage the bulk
// operation if it needs to
operationState = S3Guard.initiateBulkWrite(
metadataStore,
BulkOperationState.OperationType.Delete,
path);
try (DurationInfo ignored =
new DurationInfo(LOG, false, "deleting %s", dirKey)) {
@ -327,11 +240,11 @@ protected void deleteDirectoryTree(final Path path,
resetDeleteList();
deleteFuture = null;
// list files including any under tombstones through S3Guard
// list files
LOG.debug("Getting objects for directory prefix {} to delete", dirKey);
final RemoteIterator<S3ALocatedFileStatus> locatedFiles =
callbacks.listFilesAndDirectoryMarkers(path, status,
false, true);
true);
// iterate through and delete. The next() call will block when a new S3
// page is required; thus any active delete submitted to the executor
@ -345,50 +258,6 @@ protected void deleteDirectoryTree(final Path path,
submitNextBatch();
maybeAwaitCompletion(deleteFuture);
// if s3guard is authoritative we follow up with a bulk list and
// delete process on S3 this helps recover from any situation where S3
// and S3Guard have become inconsistent.
// This is only needed for auth paths; by performing the previous listing
// without tombstone filtering, any files returned by the non-auth
// S3 list which were hidden under tombstones will have been found
// and deleted.
if (callbacks.allowAuthoritative(path)) {
LOG.debug("Path is authoritatively guarded;"
+ " listing files on S3 for completeness");
// let the ongoing delete finish to avoid duplicates
final RemoteIterator<S3AFileStatus> objects =
callbacks.listObjects(path, dirKey);
// iterate through and delete. The next() call will block when a new S3
// page is required; thus any active delete submitted to the executor
// will run in parallel with this.
while (objects.hasNext()) {
// get the next entry in the listing.
extraFilesDeleted++;
S3AFileStatus next = objects.next();
LOG.debug("Found Unlisted entry {}", next);
queueForDeletion(deletionKey(next), null,
next.isDirectory());
}
if (extraFilesDeleted > 0) {
LOG.debug("Raw S3 Scan found {} extra file(s) to delete",
extraFilesDeleted);
// there is no more data:
// await any ongoing operation
submitNextBatch();
maybeAwaitCompletion(deleteFuture);
}
}
// final cleanup of the directory tree in the metastore, including the
// directory entry itself.
try (DurationInfo ignored2 =
new DurationInfo(LOG, false, "Delete metastore")) {
metadataStore.deleteSubtree(path, operationState);
}
} finally {
IOUtils.cleanupWithLogger(LOG, operationState);
}
LOG.debug("Delete \"{}\" completed; deleted {} objects", path,
filesDeleted);
@ -412,7 +281,7 @@ private String deletionKey(final S3AFileStatus stat) {
*/
private void queueForDeletion(
final S3AFileStatus stat) throws IOException {
queueForDeletion(deletionKey(stat), stat.getPath(), stat.isDirectory());
queueForDeletion(deletionKey(stat), stat.isDirectory());
}
/**
@ -422,21 +291,13 @@ private void queueForDeletion(
* complete.
*
* @param key key to delete
* @param deletePath nullable path of the key
* @param isDirMarker is the entry a directory?
* @throws IOException failure of the previous batch of deletions.
*/
private void queueForDeletion(final String key,
@Nullable final Path deletePath,
boolean isDirMarker) throws IOException {
LOG.debug("Adding object to delete: \"{}\"", key);
keys.add(new DeleteEntry(key, isDirMarker));
if (deletePath != null) {
if (!isDirMarker) {
paths.add(deletePath);
}
}
if (keys.size() == pageSize) {
submitNextBatch();
}
@ -455,7 +316,7 @@ private void submitNextBatch()
maybeAwaitCompletion(deleteFuture);
// delete the current page of keys and paths
deleteFuture = submitDelete(keys, paths);
deleteFuture = submitDelete(keys);
// reset the references so a new list can be built up.
resetDeleteList();
}
@ -466,7 +327,6 @@ private void submitNextBatch()
*/
private void resetDeleteList() {
keys = new ArrayList<>(pageSize);
paths = new ArrayList<>(pageSize);
}
/**
@ -484,33 +344,28 @@ private void deleteObjectAtPath(
throws IOException {
LOG.debug("delete: {} {}", (isFile ? "file" : "dir marker"), key);
filesDeleted++;
callbacks.deleteObjectAtPath(path, key, isFile, operationState);
callbacks.deleteObjectAtPath(path, key, isFile);
}
/**
* Delete a single page of keys and optionally the metadata.
* For a large page, it is the metadata size which dominates.
* It's possible to invoke this with empty lists of keys or paths.
* If both lists are empty no work is submitted and null is returned.
* Delete a single page of keys.
* If the list is empty no work is submitted and null is returned.
*
* @param keyList keys to delete.
* @param pathList paths to update the metastore with.
* @return the submitted future or null
*/
private CompletableFuture<Void> submitDelete(
final List<DeleteEntry> keyList,
final List<Path> pathList) {
final List<DeleteEntry> keyList) {
if (keyList.isEmpty() && pathList.isEmpty()) {
if (keyList.isEmpty()) {
return null;
}
filesDeleted += keyList.size();
return submit(executor,
callableWithinAuditSpan(
getAuditSpan(), () -> {
asyncDeleteAction(operationState,
asyncDeleteAction(
keyList,
pathList,
LOG.isDebugEnabled());
return null;
}));
@ -520,26 +375,21 @@ private CompletableFuture<Void> submitDelete(
* The action called in the asynchronous thread to delete
* the keys from S3 and paths from S3Guard.
*
* @param state ongoing operation state
* @param keyList keys to delete.
* @param pathList paths to update the metastore with.
* @param auditDeletedKeys should the results be audited and undeleted
* entries logged?
* @throws IOException failure
*/
@Retries.RetryTranslated
private void asyncDeleteAction(
final BulkOperationState state,
final List<DeleteEntry> keyList,
final List<Path> pathList,
final boolean auditDeletedKeys)
throws IOException {
List<DeleteObjectsResult.DeletedObject> deletedObjects = new ArrayList<>();
try (DurationInfo ignored =
new DurationInfo(LOG, false,
"Delete page of %d keys", keyList.size())) {
DeleteObjectsResult result = null;
List<Path> undeletedObjects = new ArrayList<>();
DeleteObjectsResult result;
if (!keyList.isEmpty()) {
// first delete the files.
List<DeleteObjectsRequest.KeyVersion> files = keyList.stream()
@ -552,8 +402,6 @@ private void asyncDeleteAction(
() -> callbacks.removeKeys(
files,
false,
undeletedObjects,
state,
!auditDeletedKeys));
if (result != null) {
deletedObjects.addAll(result.getDeletedObjects());
@ -564,26 +412,17 @@ private void asyncDeleteAction(
.map(e -> e.keyVersion)
.collect(Collectors.toList());
LOG.debug("Deleting of {} directory markers", dirs.size());
// This is invoked with deleteFakeDir = true, so
// S3Guard is not updated.
// This is invoked with deleteFakeDir.
result = Invoker.once("Remove S3 Dir Markers",
status.getPath().toString(),
() -> callbacks.removeKeys(
dirs,
true,
undeletedObjects,
state,
!auditDeletedKeys));
if (result != null) {
deletedObjects.addAll(result.getDeletedObjects());
}
}
if (!pathList.isEmpty()) {
// delete file paths only. This stops tombstones
// being added until the final directory cleanup
// (HADOOP-17244)
metadataStore.deletePaths(pathList, state);
}
if (auditDeletedKeys) {
// audit the deleted keys
if (deletedObjects.size() != keyList.size()) {
@ -605,8 +444,11 @@ private void asyncDeleteAction(
}
/**
* Deletion entry; dir marker state is tracked to control S3Guard
* update policy.
* Deletion entry; dir marker state is tracked to allow
* delete requests to be split into file
* and marker delete phases.
* Without S3Guard, the split is only used
* to choose which statistics to update.
*/
private static final class DeleteEntry {
private final DeleteObjectsRequest.KeyVersion keyVersion;

View File

@ -137,17 +137,6 @@ private InternalConstants() {
*/
public static final int CSE_PADDING_LENGTH = 16;
/**
* Error message to indicate S3-CSE is incompatible with S3Guard.
*/
public static final String CSE_S3GUARD_INCOMPATIBLE = "S3-CSE cannot be "
+ "used with S3Guard";
/**
* Error message to indicate Access Points are incompatible with S3Guard.
*/
public static final String AP_S3GUARD_INCOMPATIBLE = "Access Points cannot be used with S3Guard";
/**
* Error message to indicate Access Points are required to be used for S3 access.
*/

View File

@ -28,7 +28,6 @@
import org.apache.hadoop.fs.s3a.S3ALocatedFileStatus;
import org.apache.hadoop.fs.s3a.S3ListRequest;
import org.apache.hadoop.fs.s3a.S3ListResult;
import org.apache.hadoop.fs.s3a.s3guard.ITtlTimeProvider;
import org.apache.hadoop.fs.statistics.DurationTrackerFactory;
import org.apache.hadoop.fs.store.audit.AuditSpan;
@ -44,37 +43,33 @@ public interface ListingOperationCallbacks {
* Initiate a {@code listObjectsAsync} operation, incrementing metrics
* in the process.
*
* Retry policy: retry untranslated.
* Retry policy: failures will come from the future.
* @param request request to initiate
* @param trackerFactory tracker with statistics to update
* @param span audit span for this operation
* @return the results
* @throws IOException if the retry invocation raises one (it shouldn't).
*/
@Retries.RetryRaw
CompletableFuture<S3ListResult> listObjectsAsync(
S3ListRequest request,
DurationTrackerFactory trackerFactory,
AuditSpan span)
throws IOException;
AuditSpan span);
/**
* List the next set of objects.
* Retry policy: retry untranslated.
* Retry policy: failures will come from the future.
* @param request last list objects request to continue
* @param prevResult last paged result to continue from
* @param trackerFactory tracker with statistics to update
* @param span audit span for the IO
* @return the next result object
* @throws IOException none, just there for retryUntranslated.
*/
@Retries.RetryRaw
CompletableFuture<S3ListResult> continueListObjectsAsync(
S3ListRequest request,
S3ListResult prevResult,
DurationTrackerFactory trackerFactory,
AuditSpan span)
throws IOException;
AuditSpan span);
/**
* Build a {@link S3ALocatedFileStatus} from a {@link FileStatus} instance.
@ -116,19 +111,4 @@ S3ListRequest createListObjectsRequest(
*/
int getMaxKeys();
/**
* Get the updated time provider for the current fs instance.
* @return implementation of {@link ITtlTimeProvider}
*/
ITtlTimeProvider getUpdatedTtlTimeProvider();
/**
* Is the path for this instance considered authoritative on the client,
* that is: will listing/status operations only be handled by the metastore,
* with no fallback to S3.
* @param p path
* @return true iff the path is authoritative on the client.
*/
boolean allowAuthoritative(Path p);
}
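Since failures now surface from the returned future rather than from the call itself, callers unwrap them when they block. A hedged sketch using plain CompletableFuture handling rather than any specific Hadoop helper; request, trackerFactory and span are assumed to be in scope.

    // Sketch: consume the async listing; IOExceptions raised inside the future
    // surface as the cause of an ExecutionException when the caller blocks.
    CompletableFuture<S3ListResult> future =
        callbacks.listObjectsAsync(request, trackerFactory, span);
    try {
      S3ListResult result = future.get();
    } catch (ExecutionException e) {
      Throwable cause = e.getCause();
      throw (cause instanceof IOException)
          ? (IOException) cause
          : new IOException(cause);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new InterruptedIOException("interrupted waiting for listing");
    }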

View File

@ -116,8 +116,7 @@ public Boolean execute() throws IOException {
// if we get here there is no directory at the destination.
// so create one.
String key = getStoreContext().pathToKey(dir);
// this will create the marker file, delete the parent entries
// and update S3Guard
// Create the marker file, maybe delete the parent entries
callbacks.createFakeDirectory(key);
return true;
}

View File

@ -20,50 +20,27 @@
import java.io.IOException;
import java.nio.file.AccessDeniedException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Objects;
import java.util.function.Function;
import java.util.stream.Collectors;
import com.amazonaws.services.s3.model.DeleteObjectsRequest;
import com.amazonaws.services.s3.model.MultiObjectDeleteException;
import org.apache.hadoop.classification.VisibleForTesting;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.apache.commons.lang3.tuple.Pair;
import org.apache.commons.lang3.tuple.Triple;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.s3a.AWSS3IOException;
import org.apache.hadoop.fs.s3a.S3AFileSystem;
import org.apache.hadoop.fs.s3a.Tristate;
import org.apache.hadoop.fs.s3a.s3guard.BulkOperationState;
import org.apache.hadoop.fs.s3a.s3guard.MetadataStore;
import org.apache.hadoop.fs.s3a.s3guard.PathMetadata;
import static org.apache.hadoop.util.Preconditions.checkNotNull;
/**
* Support for Multi Object Deletion.
* This used to be a complex piece of code, as it was required to
* update S3Guard.
* Now all that is left is the exception extraction for better
* reporting.
*/
public final class MultiObjectDeleteSupport extends AbstractStoreOperation {
public final class MultiObjectDeleteSupport {
private static final Logger LOG = LoggerFactory.getLogger(
MultiObjectDeleteSupport.class);
private final BulkOperationState operationState;
/**
* Initiate with a store context.
* @param context store context.
* @param operationState any ongoing bulk operation.
*/
public MultiObjectDeleteSupport(final StoreContext context,
final BulkOperationState operationState) {
super(context);
this.operationState = operationState;
private MultiObjectDeleteSupport() {
}
/**
@ -89,7 +66,7 @@ public static IOException translateDeleteException(
final MultiObjectDeleteException deleteException) {
List<MultiObjectDeleteException.DeleteError> errors
= deleteException.getErrors();
LOG.warn("Bulk delete operation failed to delete all objects;"
LOG.info("Bulk delete operation failed to delete all objects;"
+ " failure count = {}",
errors.size());
final StringBuilder result = new StringBuilder(
@ -104,7 +81,7 @@ public static IOException translateDeleteException(
? (" (" + error.getVersionId() + ")")
: ""),
error.getMessage());
LOG.warn(item);
LOG.info(item);
result.append(item);
if (exitCode == null || exitCode.isEmpty() || ACCESS_DENIED.equals(code)) {
exitCode = code;
@ -117,301 +94,4 @@ public static IOException translateDeleteException(
return new AWSS3IOException(result.toString(), deleteException);
}
}
/**
* Process a multi object delete exception by building two paths from
* the delete request: one of all deleted files, one of all undeleted values.
* The latter are those rejected in the delete call.
* @param deleteException the delete exception.
* @param keysToDelete the keys in the delete request
* @return tuple of (undeleted, deleted) paths.
*/
public Pair<List<KeyPath>, List<KeyPath>> splitUndeletedKeys(
final MultiObjectDeleteException deleteException,
final Collection<DeleteObjectsRequest.KeyVersion> keysToDelete) {
LOG.debug("Processing delete failure; keys to delete count = {};"
+ " errors in exception {}; successful deletions = {}",
keysToDelete.size(),
deleteException.getErrors().size(),
deleteException.getDeletedObjects().size());
// convert the collection of keys being deleted into paths
final List<KeyPath> pathsBeingDeleted = keysToKeyPaths(keysToDelete);
// Take this list of paths
// extract all undeleted entries contained in the exception and
// then remove them from the original list.
List<KeyPath> undeleted = removeUndeletedPaths(deleteException,
pathsBeingDeleted,
getStoreContext()::keyToPath);
return Pair.of(undeleted, pathsBeingDeleted);
}
/**
* Given a list of delete requests, convert them all to paths.
* @param keysToDelete list of keys for the delete operation.
* @return the paths.
*/
public List<Path> keysToPaths(
final Collection<DeleteObjectsRequest.KeyVersion> keysToDelete) {
return toPathList(keysToKeyPaths(keysToDelete));
}
/**
* Given a list of delete requests, convert them all to keypaths.
* @param keysToDelete list of keys for the delete operation.
* @return list of keypath entries
*/
public List<KeyPath> keysToKeyPaths(
final Collection<DeleteObjectsRequest.KeyVersion> keysToDelete) {
return convertToKeyPaths(keysToDelete,
getStoreContext()::keyToPath);
}
/**
* Given a list of delete requests, convert them all to paths.
* @param keysToDelete list of keys for the delete operation.
* @param qualifier path qualifier
* @return the paths.
*/
public static List<KeyPath> convertToKeyPaths(
final Collection<DeleteObjectsRequest.KeyVersion> keysToDelete,
final Function<String, Path> qualifier) {
List<KeyPath> l = new ArrayList<>(keysToDelete.size());
for (DeleteObjectsRequest.KeyVersion kv : keysToDelete) {
String key = kv.getKey();
Path p = qualifier.apply(key);
boolean isDir = key.endsWith("/");
l.add(new KeyPath(key, p, isDir));
}
return l;
}
/**
* Process a delete failure by removing from the metastore all entries
* which where deleted, as inferred from the delete failures exception
* and the original list of files to delete declares to have been deleted.
* @param deleteException the delete exception.
* @param keysToDelete collection of keys which had been requested.
* @param retainedMarkers list built up of retained markers.
* @return a tuple of (undeleted, deleted, failures)
*/
public Triple<List<Path>, List<Path>, List<Pair<Path, IOException>>>
processDeleteFailure(
final MultiObjectDeleteException deleteException,
final List<DeleteObjectsRequest.KeyVersion> keysToDelete,
final List<Path> retainedMarkers) {
final MetadataStore metadataStore =
checkNotNull(getStoreContext().getMetadataStore(),
"context metadatastore");
final List<Pair<Path, IOException>> failures = new ArrayList<>();
final Pair<List<KeyPath>, List<KeyPath>> outcome =
splitUndeletedKeys(deleteException, keysToDelete);
List<KeyPath> deleted = outcome.getRight();
List<Path> deletedPaths = new ArrayList<>();
List<KeyPath> undeleted = outcome.getLeft();
retainedMarkers.clear();
List<Path> undeletedPaths = toPathList((List<KeyPath>) undeleted);
// sort shorter keys first,
// so that if the left key is longer than the first it is considered
// smaller, so appears in the list first.
// thus when we look for a dir being empty, we know it holds
deleted.sort((l, r) -> r.getKey().length() - l.getKey().length());
// now go through and delete from S3Guard all paths listed in
// the result which are either files or directories with
// no children.
deleted.forEach(kp -> {
Path path = kp.getPath();
try{
boolean toDelete = true;
if (kp.isDirectoryMarker()) {
// its a dir marker, which could be an empty dir
// (which is then tombstoned), or a non-empty dir, which
// is not tombstoned.
// for this to be handled, we have to have removed children
// from the store first, which relies on the sort
PathMetadata pmentry = metadataStore.get(path, true);
if (pmentry != null && !pmentry.isDeleted()) {
toDelete = pmentry.getFileStatus().isEmptyDirectory()
== Tristate.TRUE;
} else {
toDelete = false;
}
}
if (toDelete) {
LOG.debug("Removing deleted object from S3Guard Store {}", path);
metadataStore.delete(path, operationState);
} else {
LOG.debug("Retaining S3Guard directory entry {}", path);
retainedMarkers.add(path);
}
} catch (IOException e) {
// trouble: we failed to delete the far end entry
// try with the next one.
// if this is a big network failure, this is going to be noisy.
LOG.warn("Failed to update S3Guard store with deletion of {}", path);
failures.add(Pair.of(path, e));
}
// irrespective of the S3Guard outcome, it is declared as deleted, as
// it is no longer in the S3 store.
deletedPaths.add(path);
});
if (LOG.isDebugEnabled()) {
undeleted.forEach(p -> LOG.debug("Deleted {}", p));
}
return Triple.of(undeletedPaths, deletedPaths, failures);
}
/**
* Given a list of keypaths, convert to a list of paths.
* @param keyPaths source list
* @return a list of paths
*/
public static List<Path> toPathList(final List<KeyPath> keyPaths) {
return keyPaths.stream()
.map(KeyPath::getPath)
.collect(Collectors.toList());
}
/**
* Build a list of undeleted paths from a {@code MultiObjectDeleteException}.
* Outside of unit tests, the qualifier function should be
* {@link S3AFileSystem#keyToQualifiedPath(String)}.
* @param deleteException the delete exception.
* @param qualifierFn function to qualify paths
* @return the possibly empty list of paths.
*/
@VisibleForTesting
public static List<Path> extractUndeletedPaths(
final MultiObjectDeleteException deleteException,
final Function<String, Path> qualifierFn) {
return toPathList(extractUndeletedKeyPaths(deleteException, qualifierFn));
}
/**
* Build a list of undeleted paths from a {@code MultiObjectDeleteException}.
* Outside of unit tests, the qualifier function should be
* {@link S3AFileSystem#keyToQualifiedPath(String)}.
* @param deleteException the delete exception.
* @param qualifierFn function to qualify paths
* @return the possibly empty list of paths.
*/
@VisibleForTesting
public static List<KeyPath> extractUndeletedKeyPaths(
final MultiObjectDeleteException deleteException,
final Function<String, Path> qualifierFn) {
List<MultiObjectDeleteException.DeleteError> errors
= deleteException.getErrors();
return errors.stream()
.map((error) -> {
String key = error.getKey();
Path path = qualifierFn.apply(key);
boolean isDir = key.endsWith("/");
return new KeyPath(key, path, isDir);
})
.collect(Collectors.toList());
}
/**
* Process a {@code MultiObjectDeleteException} by
* removing all undeleted paths from the list of paths being deleted.
* The original list is updated, and so becomes the list of successfully
* deleted paths.
* @param deleteException the delete exception.
* @param pathsBeingDeleted list of paths which were being deleted.
* This has all undeleted paths removed, leaving only those deleted.
* @return the list of undeleted entries
*/
@VisibleForTesting
static List<KeyPath> removeUndeletedPaths(
final MultiObjectDeleteException deleteException,
final Collection<KeyPath> pathsBeingDeleted,
final Function<String, Path> qualifier) {
// get the undeleted values
List<KeyPath> undeleted = extractUndeletedKeyPaths(deleteException,
qualifier);
// and remove them from the undeleted list, matching on key
for (KeyPath undel : undeleted) {
pathsBeingDeleted.removeIf(kp -> kp.getPath().equals(undel.getPath()));
}
return undeleted;
}
/**
* A delete operation failed.
* Currently just returns the list of all paths.
* @param ex exception.
* @param keysToDelete the keys which were being deleted.
* @return all paths which were not deleted.
*/
public List<Path> processDeleteFailureGenericException(Exception ex,
final List<DeleteObjectsRequest.KeyVersion> keysToDelete) {
return keysToPaths(keysToDelete);
}
/**
* Representation of a (key, path) which couldn't be deleted;
* the dir marker flag is inferred from the key suffix.
* <p>
* Added because Pairs of Lists of Triples was just too complex
* for Java code.
* </p>
*/
public static final class KeyPath {
/** Key in bucket. */
private final String key;
/** Full path. */
private final Path path;
/** Is this a directory marker? */
private final boolean directoryMarker;
public KeyPath(final String key,
final Path path,
final boolean directoryMarker) {
this.key = key;
this.path = path;
this.directoryMarker = directoryMarker;
}
public String getKey() {
return key;
}
public Path getPath() {
return path;
}
public boolean isDirectoryMarker() {
return directoryMarker;
}
@Override
public String toString() {
return "KeyPath{" +
"key='" + key + '\'' +
", path=" + path +
", directoryMarker=" + directoryMarker +
'}';
}
/**
* Equals test is on key alone.
*/
@Override
public boolean equals(final Object o) {
if (this == o) {
return true;
}
if (o == null || getClass() != o.getClass()) {
return false;
}
KeyPath keyPath = (KeyPath) o;
return key.equals(keyPath.key);
}
@Override
public int hashCode() {
return Objects.hash(key);
}
}
}
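A sketch of how a caller now uses what is left of this class: catch the SDK exception from a bulk delete and translate it for reporting. The exact parameter list of translateDeleteException is partly outside this excerpt, so the (message, exception) ordering here is an assumption.

    // Sketch: translate a bulk-delete failure into an IOException for the caller.
    try {
      s3.deleteObjects(deleteRequest);
    } catch (MultiObjectDeleteException e) {
      // assumed signature: translateDeleteException(String message, MultiObjectDeleteException)
      throw MultiObjectDeleteSupport.translateDeleteException(
          "delete of " + deleteRequest.getBucketName(), e);
    }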

View File

@ -118,7 +118,7 @@ void configureSocketFactory(ClientConfiguration awsConf,
* See also {@code com.amazonaws.services.s3.model.Region.fromValue()}
* for its conversion logic.
* @param region region from S3 call.
* @return the region to use in DDB etc.
* @return the region to use in AWS services.
*/
public static String fixBucketRegion(final String region) {
return region == null || region.equals("US")

View File

@ -37,7 +37,6 @@
import org.apache.hadoop.fs.s3a.S3ALocatedFileStatus;
import org.apache.hadoop.fs.s3a.S3AReadOpContext;
import org.apache.hadoop.fs.s3a.S3ObjectAttributes;
import org.apache.hadoop.fs.s3a.s3guard.BulkOperationState;
/**
* These are all the callbacks which the {@link RenameOperation}
@ -87,21 +86,18 @@ S3AReadOpContext createReadContext(
void finishRename(Path sourceRenamed, Path destCreated) throws IOException;
/**
* Delete an object, also updating the metastore.
* Delete an object.
* This call does <i>not</i> create any mock parent entries.
* Retry policy: retry untranslated; delete considered idempotent.
* @param path path to delete
* @param key key of entry
* @param isFile is the path a file (used for instrumentation only)
* @param operationState (nullable) operational state for a bulk update
* @throws AmazonClientException problems working with S3
* @throws IOException IO failure in the metastore
* @throws IOException from invoker signature only -should not be raised.
*/
@Retries.RetryTranslated
void deleteObjectAtPath(Path path,
String key,
boolean isFile,
BulkOperationState operationState)
boolean isFile)
throws IOException;
/**
@ -109,7 +105,6 @@ void deleteObjectAtPath(Path path,
*
* @param path path to list from
* @param status optional status of path to list.
* @param collectTombstones should tombstones be collected from S3Guard?
* @param includeSelf should the listing include this path if present?
* @return an iterator.
* @throws IOException failure
@ -118,7 +113,6 @@ void deleteObjectAtPath(Path path,
RemoteIterator<S3ALocatedFileStatus> listFilesAndDirectoryMarkers(
Path path,
S3AFileStatus status,
boolean collectTombstones,
boolean includeSelf) throws IOException;
/**
@ -140,17 +134,10 @@ CopyResult copyFile(String srcKey,
throws IOException;
/**
* Remove keys from the store, updating the metastore on a
* partial delete represented as a MultiObjectDeleteException failure by
* deleting all those entries successfully deleted and then rethrowing
* the MultiObjectDeleteException.
* Remove keys from the store.
* @param keysToDelete collection of keys to delete on the s3-backend.
* if empty, no request is made of the object store.
* @param deleteFakeDir indicates whether this is for deleting fake dirs.
* @param undeletedObjectsOnFailure List which will be built up of all
* files that were not deleted. This happens even as an exception
* is raised.
* @param operationState bulk operation state
* @param quiet should a bulk query be quiet, or should its result list
* all deleted keys
* @return the deletion result if a multi object delete was invoked
@ -162,28 +149,16 @@ CopyResult copyFile(String srcKey,
* @throws AmazonClientException amazon-layer failure.
* @throws IOException other IO Exception.
*/
@Retries.RetryMixed
@Retries.RetryRaw
DeleteObjectsResult removeKeys(
List<DeleteObjectsRequest.KeyVersion> keysToDelete,
boolean deleteFakeDir,
List<Path> undeletedObjectsOnFailure,
BulkOperationState operationState,
boolean quiet)
throws MultiObjectDeleteException, AmazonClientException,
IOException;
/**
* Is the path for this instance considered authoritative on the client,
* that is: will listing/status operations only be handled by the metastore,
* with no fallback to S3.
* @param p path
* @return true iff the path is authoritative on the client.
*/
boolean allowAuthoritative(Path p);
/**
* Create an iterator over objects in S3 only; S3Guard
* is not involved.
* Create an iterator over objects in S3.
* The listing includes the key itself, if found.
* @param path path of the listing.
* @param key object key

View File

@ -26,14 +26,14 @@
import java.util.concurrent.atomic.AtomicLong;
import com.amazonaws.AmazonClientException;
import com.amazonaws.SdkBaseException;
import com.amazonaws.services.s3.model.DeleteObjectsRequest;
import com.amazonaws.services.s3.transfer.model.CopyResult;
import org.apache.hadoop.util.Lists;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;
import org.apache.hadoop.fs.s3a.Invoker;
import org.apache.hadoop.fs.s3a.RenameFailedException;
import org.apache.hadoop.fs.s3a.Retries;
import org.apache.hadoop.fs.s3a.S3AFileStatus;
@ -41,23 +41,17 @@
import org.apache.hadoop.fs.s3a.S3AReadOpContext;
import org.apache.hadoop.fs.s3a.S3ObjectAttributes;
import org.apache.hadoop.fs.s3a.Tristate;
import org.apache.hadoop.fs.s3a.s3guard.MetadataStore;
import org.apache.hadoop.fs.s3a.s3guard.RenameTracker;
import org.apache.hadoop.util.DurationInfo;
import org.apache.hadoop.util.OperationDuration;
import static org.apache.hadoop.fs.s3a.S3AUtils.translateException;
import static org.apache.hadoop.fs.store.audit.AuditingFunctions.callableWithinAuditSpan;
import static org.apache.hadoop.util.Preconditions.checkNotNull;
import static org.apache.hadoop.fs.s3a.Constants.FS_S3A_BLOCK_SIZE;
import static org.apache.hadoop.fs.s3a.S3AUtils.objectRepresentsDirectory;
import static org.apache.hadoop.fs.s3a.impl.CallableSupplier.submit;
import static org.apache.hadoop.fs.s3a.impl.CallableSupplier.waitForCompletion;
import static org.apache.hadoop.fs.s3a.impl.InternalConstants.DEFAULT_BLOCKSIZE;
import static org.apache.hadoop.fs.s3a.impl.InternalConstants.RENAME_PARALLEL_LIMIT;
/**
* A parallelized rename operation which updates the metastore in the
* process, through whichever {@link RenameTracker} the store provides.
* A parallelized rename operation.
* <p></p>
* The parallel execution is in groups of size
* {@link InternalConstants#RENAME_PARALLEL_LIMIT}; it is only
@ -68,9 +62,6 @@
* is initiated.
* If it succeeds, the rename continues with the next group of files.
* <p></p>
* The RenameTracker has the task of keeping the metastore up to date
* as the rename proceeds.
* <p></p>
* Directory Markers which have child entries are never copied; only those
* which represent empty directories are copied in the rename.
* The {@link DirMarkerTracker} tracks which markers must be copied, and
@ -121,11 +112,6 @@ public class RenameOperation extends ExecutingStoreOperation<Long> {
*/
private final int pageSize;
/**
* Rename tracker.
*/
private RenameTracker renameTracker;
/**
* List of active copies.
*/
@ -138,14 +124,6 @@ public class RenameOperation extends ExecutingStoreOperation<Long> {
private final List<DeleteObjectsRequest.KeyVersion> keysToDelete =
new ArrayList<>();
/**
* List of paths to delete, which will be passed to the rename
* tracker after the deletion succeeds.
*/
private final List<Path> pathsToDelete = new ArrayList<>();
private final long blocksize;
/**
* Initiate the rename.
*
@ -177,8 +155,6 @@ public RenameOperation(
this.destKey = destKey;
this.destStatus = destStatus;
this.callbacks = callbacks;
blocksize = storeContext.getConfiguration()
.getLongBytes(FS_S3A_BLOCK_SIZE, DEFAULT_BLOCKSIZE);
this.pageSize = pageSize;
}
@ -212,13 +188,6 @@ private void completeActiveCopies(String reason) throws IOException {
* Only queuing objects here whose copy operation has
* been submitted and so is in that thread pool.
* </li>
* <li>
* If a path is supplied, then after the delete is executed
* (and completes) the rename tracker from S3Guard will be
* told of its deletion. Do not set this for directory
* markers with children, as it may mistakenly add
* tombstones into the table.
* </li>
* </ol>
* This method must only be called from the primary thread.
* @param path path to the object.
@ -226,9 +195,6 @@ private void completeActiveCopies(String reason) throws IOException {
*/
private void queueToDelete(Path path, String key) {
LOG.debug("Queueing to delete {}", path);
if (path != null) {
pathsToDelete.add(path);
}
keysToDelete.add(new DeleteObjectsRequest.KeyVersion(key));
}
@ -272,28 +238,15 @@ private void completeActiveCopiesAndDeleteSources(String reason)
throws IOException {
completeActiveCopies(reason);
removeSourceObjects(
keysToDelete,
pathsToDelete);
keysToDelete
);
// now reset the lists.
keysToDelete.clear();
pathsToDelete.clear();
}
@Retries.RetryMixed
public Long execute() throws IOException {
executeOnlyOnce();
final StoreContext storeContext = getStoreContext();
final MetadataStore metadataStore = checkNotNull(
storeContext.getMetadataStore(),
"No metadata store in context");
// Validation completed: time to begin the operation.
// The store-specific rename tracker is used to keep the store
// to date with the in-progress operation.
// for the null store, these are all no-ops.
renameTracker = metadataStore.initiateRenameOperation(
storeContext,
sourcePath, sourceStatus, destPath);
// The path to whichever file or directory is created by the
// rename. When deleting markers all parents of
@ -317,21 +270,12 @@ public Long execute() throws IOException {
try {
completeActiveCopies("failure handling");
} catch (IOException e) {
// a failure to update the metastore after a rename failure is what
// we'd see on a network problem, expired credentials and other
// unrecoverable errors.
// Downgrading to warn because an exception is already
// about to be thrown.
LOG.warn("While completing all active copies", e);
}
// notify the rename tracker of the failure
throw renameTracker.renameFailed(ex);
throw convertToIOException(ex);
}
// At this point the rename has completed successfully in the S3 store.
// Tell the metastore this fact and let it complete its changes
renameTracker.completeRename();
callbacks.finishRename(sourcePath, destCreated);
return bytesCopied.get();
}
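In outline, the post-S3Guard execute() path reduces to the steps below. This is a descriptive sketch using the method and field names visible in this class, not the literal method body; in particular the destCreated assignment for the directory case is an approximation.

    // Sketch of the simplified flow: copy, delete sources in pages, finish.
    final Path destCreated;
    if (sourceStatus.isDirectory()) {
      recursiveDirectoryRename();       // parallel copies + paged source deletes
      destCreated = destPath;           // assumed: the renamed directory root
    } else {
      destCreated = renameFileToDest(); // single copy, then delete the source key
    }
    callbacks.finishRename(sourcePath, destCreated);
    return bytesCopied.get();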
@ -362,19 +306,15 @@ protected Path renameFileToDest() throws IOException {
// destination either does not exist or is a file to overwrite.
LOG.debug("rename: renaming file {} to {}", sourcePath,
copyDestinationPath);
copySourceAndUpdateTracker(
sourcePath,
copySource(
sourceKey,
sourceAttributes,
readContext,
copyDestinationPath,
copyDestinationKey,
false);
copyDestinationKey);
bytesCopied.addAndGet(sourceStatus.getLen());
// delete the source
callbacks.deleteObjectAtPath(sourcePath, sourceKey, true, null);
// and update the tracker
renameTracker.sourceObjectsDeleted(Lists.newArrayList(sourcePath));
callbacks.deleteObjectAtPath(sourcePath, sourceKey, true);
return copyDestinationPath;
}
@ -402,15 +342,12 @@ protected void recursiveDirectoryRename() throws IOException {
if (destStatus != null
&& destStatus.isEmptyDirectory() == Tristate.TRUE) {
// delete unnecessary fake directory at the destination.
// this MUST be done before anything else so that
// rollback code doesn't get confused and insert a tombstone
// marker.
LOG.debug("Deleting fake directory marker at destination {}",
destStatus.getPath());
// Although the dir marker policy doesn't always need to do this,
// it's simplest just to be consistent here.
// note: updates the metastore as well as S3.
callbacks.deleteObjectAtPath(destStatus.getPath(), dstKey, false, null);
callbacks.deleteObjectAtPath(destStatus.getPath(), dstKey, false);
}
Path parentPath = storeContext.keyToPath(srcKey);
@ -423,7 +360,6 @@ protected void recursiveDirectoryRename() throws IOException {
final RemoteIterator<S3ALocatedFileStatus> iterator =
callbacks.listFilesAndDirectoryMarkers(parentPath,
sourceStatus,
true,
true);
while (iterator.hasNext()) {
// get the next entry in the listing.
@ -464,7 +400,7 @@ protected void recursiveDirectoryRename() throws IOException {
queueToDelete(childSourcePath, key);
// now begin the single copy
CompletableFuture<Path> copy = initiateCopy(child, key,
childSourcePath, newDestKey, childDestPath);
newDestKey, childDestPath);
activeCopies.add(copy);
bytesCopied.addAndGet(sourceStatus.getLen());
}
@ -483,9 +419,6 @@ protected void recursiveDirectoryRename() throws IOException {
// have been deleted.
completeActiveCopiesAndDeleteSources("final copy and delete");
// We moved all the children, now move the top-level dir
// Empty directory should have been added as the object summary
renameTracker.moveSourceDirectory();
}
/**
@ -511,7 +444,6 @@ private void endOfLoopActions() throws IOException {
/**
* Process all directory markers at the end of the rename.
* All leaf markers are queued to be copied in the store;
* this updates the metastore tracker as it does so.
* <p></p>
* Why not simply create new markers? All the metadata
* gets copied too, so if there was anything relevant then
@ -553,7 +485,6 @@ private OperationDuration copyEmptyDirectoryMarkers(
"copying %d leaf markers with %d surplus not copied",
leafMarkers.size(), surplus.size());
for (DirMarkerTracker.Marker entry: leafMarkers.values()) {
Path source = entry.getPath();
String key = entry.getKey();
String newDestKey =
dstKey + key.substring(srcKey.length());
@ -564,7 +495,6 @@ private OperationDuration copyEmptyDirectoryMarkers(
initiateCopy(
entry.getStatus(),
key,
source,
newDestKey,
childDestPath));
queueToDelete(entry);
@ -579,7 +509,6 @@ private OperationDuration copyEmptyDirectoryMarkers(
* Initiate a copy operation in the executor.
* @param source status of the source object.
* @param key source key
* @param childSourcePath source as a path.
* @param newDestKey destination key
* @param childDestPath destination path.
* @return the future.
@ -587,7 +516,6 @@ private OperationDuration copyEmptyDirectoryMarkers(
protected CompletableFuture<Path> initiateCopy(
final S3ALocatedFileStatus source,
final String key,
final Path childSourcePath,
final String newDestKey,
final Path childDestPath) {
S3ObjectAttributes sourceAttributes =
@ -599,113 +527,67 @@ protected CompletableFuture<Path> initiateCopy(
// queue the copy operation for execution in the thread pool
return submit(getStoreContext().getExecutor(),
callableWithinAuditSpan(getAuditSpan(), () ->
copySourceAndUpdateTracker(
childSourcePath,
copySource(
key,
sourceAttributes,
callbacks.createReadContext(source),
childDestPath,
newDestKey,
true)));
newDestKey)));
}
/**
* This is invoked to copy a file or directory marker then update the
* rename operation on success.
* This is invoked to copy a file or directory marker.
* It may be called in its own thread.
* @param sourceFile source path of the copy; may have a trailing / on it.
* @param srcKey source key
* @param srcAttributes status of the source object
* @param destination destination as a qualified path.
* @param destinationKey destination key
* @param addAncestors should ancestors be added to the metastore?
* @return the destination path.
* @throws IOException failure
*/
@Retries.RetryTranslated
private Path copySourceAndUpdateTracker(
final Path sourceFile,
private Path copySource(
final String srcKey,
final S3ObjectAttributes srcAttributes,
final S3AReadOpContext readContext,
final Path destination,
final String destinationKey,
final boolean addAncestors) throws IOException {
final String destinationKey) throws IOException {
long len = srcAttributes.getLen();
CopyResult copyResult;
try (DurationInfo ignored = new DurationInfo(LOG, false,
"Copy file from %s to %s (length=%d)", srcKey, destinationKey, len)) {
copyResult = callbacks.copyFile(srcKey, destinationKey,
callbacks.copyFile(srcKey, destinationKey,
srcAttributes, readContext);
}
if (objectRepresentsDirectory(srcKey)) {
renameTracker.directoryMarkerCopied(
sourceFile,
destination,
addAncestors);
} else {
S3ObjectAttributes destAttributes = new S3ObjectAttributes(
destination,
copyResult,
srcAttributes.getServerSideEncryptionAlgorithm(),
srcAttributes.getServerSideEncryptionKey(),
len);
renameTracker.fileCopied(
sourceFile,
srcAttributes,
destAttributes,
destination,
blocksize,
addAncestors);
}
return destination;
}
/**
* Remove source objects and update the metastore by way of
* the rename tracker.
* Remove source objects.
* @param keys list of keys to delete
* @param paths list of paths matching the keys to delete 1:1.
* @throws IOException failure
*/
@Retries.RetryTranslated
private void removeSourceObjects(
final List<DeleteObjectsRequest.KeyVersion> keys,
final List<Path> paths)
final List<DeleteObjectsRequest.KeyVersion> keys)
throws IOException {
List<Path> undeletedObjects = new ArrayList<>();
try {
// remove the keys
// remove the keys
// list what is being deleted for the interest of anyone
// who is trying to debug why objects are no longer there.
if (LOG.isDebugEnabled()) {
LOG.debug("Initiating delete operation for {} objects", keys.size());
for (DeleteObjectsRequest.KeyVersion key : keys) {
LOG.debug(" {} {}", key.getKey(),
key.getVersion() != null ? key.getVersion() : "");
}
// list what is being deleted for the interest of anyone
// who is trying to debug why objects are no longer there.
if (LOG.isDebugEnabled()) {
LOG.debug("Initiating delete operation for {} objects", keys.size());
for (DeleteObjectsRequest.KeyVersion key : keys) {
LOG.debug(" {} {}", key.getKey(),
key.getVersion() != null ? key.getVersion() : "");
}
// this will update the metastore on a failure, but on
// a successful operation leaves the store as is.
callbacks.removeKeys(
keys,
false,
undeletedObjects,
renameTracker.getOperationState(),
true);
// and clear the list.
} catch (AmazonClientException | IOException e) {
// Failed.
// Notify the rename tracker.
// removeKeys will have already purged the metastore of
// all keys it has known to delete; this is just a final
// bit of housekeeping and a chance to tune exception
// reporting.
// The returned IOE is rethrown.
throw renameTracker.deleteFailed(e, paths, undeletedObjects);
}
renameTracker.sourceObjectsDeleted(paths);
Invoker.once("rename " + sourcePath + " to " + destPath,
sourcePath.toString(), () ->
callbacks.removeKeys(
keys,
false,
true));
}
/**
@ -724,4 +606,22 @@ private String maybeAddTrailingSlash(String key) {
}
}
/**
* Convert a passed in exception (expected to be an IOE or AWS exception)
* into an IOException.
* @param ex exception caught
* @return the exception to throw in the failure handler.
*/
protected IOException convertToIOException(final Exception ex) {
if (ex instanceof IOException) {
return (IOException) ex;
} else if (ex instanceof SdkBaseException) {
return translateException("rename " + sourcePath + " to " + destPath,
sourcePath.toString(),
(SdkBaseException) ex);
} else {
// should never happen, but for completeness
return new IOException(ex);
}
}
}
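
For illustration only (not part of the patch): with the rename tracker gone, the failure handler simply funnels whatever the copy phase threw through convertToIOException() before rethrowing. A minimal sketch of that path, reusing names from the code above:

    try {
      // copy phase: parallel COPY requests followed by a bulk delete of the sources
      completeActiveCopiesAndDeleteSources("rename");
    } catch (SdkBaseException | IOException ex) {
      // SDK exceptions become translated IOExceptions; IOEs pass through unchanged
      throw convertToIOException(ex);
    }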

View File

@ -38,13 +38,12 @@
import com.amazonaws.services.s3.model.PartETag;
import com.amazonaws.services.s3.model.UploadPartRequest;
import com.amazonaws.services.s3.model.UploadPartResult;
import org.apache.hadoop.classification.VisibleForTesting;
import org.apache.hadoop.thirdparty.com.google.common.base.Charsets;
import org.apache.hadoop.util.Preconditions;
import org.apache.commons.lang3.StringUtils;
import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;
import org.apache.hadoop.classification.VisibleForTesting;
import org.apache.hadoop.fs.BBPartHandle;
import org.apache.hadoop.fs.BBUploadHandle;
import org.apache.hadoop.fs.PartHandle;
@ -55,8 +54,8 @@
import org.apache.hadoop.fs.impl.AbstractMultipartUploader;
import org.apache.hadoop.fs.s3a.WriteOperations;
import org.apache.hadoop.fs.s3a.statistics.S3AMultipartUploaderStatistics;
import org.apache.hadoop.fs.s3a.s3guard.BulkOperationState;
import org.apache.hadoop.fs.statistics.IOStatistics;
import org.apache.hadoop.util.Preconditions;
import static org.apache.hadoop.fs.statistics.IOStatisticsLogging.ioStatisticsToString;
@ -80,16 +79,6 @@ class S3AMultipartUploader extends AbstractMultipartUploader {
private final S3AMultipartUploaderStatistics statistics;
/**
* Bulk state; demand created and then retained.
*/
private BulkOperationState operationState;
/**
* Was an operation state requested but not returned?
*/
private boolean noOperationState;
/**
* Instantiate; this is called by the builder.
* @param builder builder
@ -109,14 +98,6 @@ class S3AMultipartUploader extends AbstractMultipartUploader {
this.statistics = Objects.requireNonNull(statistics);
}
@Override
public void close() throws IOException {
if (operationState != null) {
operationState.close();
}
super.close();
}
@Override
public IOStatistics getIOStatistics() {
return statistics.getIOStatistics();
@ -133,22 +114,6 @@ public String toString() {
return sb.toString();
}
/**
* Retrieve the operation state; create one on demand if needed
* <i>and there has been no unsuccessful attempt to create one.</i>
* @return an active operation state.
* @throws IOException failure
*/
private synchronized BulkOperationState retrieveOperationState()
throws IOException {
if (operationState == null && !noOperationState) {
operationState = writeOperations.initiateOperation(getBasePath(),
BulkOperationState.OperationType.Upload);
noOperationState = operationState == null;
}
return operationState;
}
@Override
public CompletableFuture<UploadHandle> startUpload(
final Path filePath)
@ -238,7 +203,6 @@ public CompletableFuture<PathHandle> complete(
"Duplicate PartHandles");
// retrieve/create operation state for scalability of completion.
final BulkOperationState state = retrieveOperationState();
long finalLen = totalLength;
return context.submit(new CompletableFuture<>(),
() -> {
@ -247,8 +211,8 @@ public CompletableFuture<PathHandle> complete(
key,
uploadIdStr,
eTags,
finalLen,
state);
finalLen
);
byte[] eTag = result.getETag().getBytes(Charsets.UTF_8);
statistics.uploadCompleted();

View File

@ -40,8 +40,6 @@
import org.apache.hadoop.fs.s3a.S3AStorageStatistics;
import org.apache.hadoop.fs.s3a.Statistic;
import org.apache.hadoop.fs.s3a.statistics.S3AStatisticsContext;
import org.apache.hadoop.fs.s3a.s3guard.ITtlTimeProvider;
import org.apache.hadoop.fs.s3a.s3guard.MetadataStore;
import org.apache.hadoop.fs.store.audit.ActiveThreadSpanSource;
import org.apache.hadoop.fs.store.audit.AuditSpan;
import org.apache.hadoop.fs.store.audit.AuditSpanSource;
@ -111,21 +109,8 @@ public class StoreContext implements ActiveThreadSpanSource<AuditSpan> {
/** List algorithm. */
private final boolean useListV1;
/**
* To allow this context to be passed down to the metastore, this field
* will be null until initialized.
*/
private final MetadataStore metadataStore;
private final ContextAccessors contextAccessors;
/**
* Source of time.
*/
/** Time source for S3Guard TTLs. */
private final ITtlTimeProvider timeProvider;
/** Operation Auditor. */
private final AuditSpanSource<AuditSpanS3A> auditor;
@ -149,10 +134,8 @@ public class StoreContext implements ActiveThreadSpanSource<AuditSpan> {
final S3AInputPolicy inputPolicy,
final ChangeDetectionPolicy changeDetectionPolicy,
final boolean multiObjectDeleteEnabled,
final MetadataStore metadataStore,
final boolean useListV1,
final ContextAccessors contextAccessors,
final ITtlTimeProvider timeProvider,
final AuditSpanSource<AuditSpanS3A> auditor,
final boolean isCSEEnabled) {
this.fsURI = fsURI;
@ -171,10 +154,8 @@ public class StoreContext implements ActiveThreadSpanSource<AuditSpan> {
this.inputPolicy = inputPolicy;
this.changeDetectionPolicy = changeDetectionPolicy;
this.multiObjectDeleteEnabled = multiObjectDeleteEnabled;
this.metadataStore = metadataStore;
this.useListV1 = useListV1;
this.contextAccessors = contextAccessors;
this.timeProvider = timeProvider;
this.auditor = auditor;
this.isCSEEnabled = isCSEEnabled;
}
@ -224,10 +205,6 @@ public boolean isMultiObjectDeleteEnabled() {
return multiObjectDeleteEnabled;
}
public MetadataStore getMetadataStore() {
return metadataStore;
}
public boolean isUseListV1() {
return useListV1;
}
@ -368,14 +345,6 @@ public String getBucketLocation() throws IOException {
return contextAccessors.getBucketLocation();
}
/**
* Get the time provider.
* @return the time source.
*/
public ITtlTimeProvider getTimeProvider() {
return timeProvider;
}
/**
* Build the full S3 key for a request from the status entry,
* possibly adding a "/" if it represents directory and it does

View File

@ -27,8 +27,6 @@
import org.apache.hadoop.fs.s3a.S3AStorageStatistics;
import org.apache.hadoop.fs.s3a.audit.AuditSpanS3A;
import org.apache.hadoop.fs.s3a.statistics.S3AStatisticsContext;
import org.apache.hadoop.fs.s3a.s3guard.ITtlTimeProvider;
import org.apache.hadoop.fs.s3a.s3guard.MetadataStore;
import org.apache.hadoop.fs.store.audit.AuditSpanSource;
import org.apache.hadoop.security.UserGroupInformation;
@ -63,14 +61,10 @@ public class StoreContextBuilder {
private boolean multiObjectDeleteEnabled = true;
private MetadataStore metadataStore;
private boolean useListV1 = false;
private ContextAccessors contextAccessors;
private ITtlTimeProvider timeProvider;
private AuditSpanSource<AuditSpanS3A> auditor;
private boolean isCSEEnabled;
@ -147,12 +141,6 @@ public StoreContextBuilder setMultiObjectDeleteEnabled(
return this;
}
public StoreContextBuilder setMetadataStore(
final MetadataStore store) {
this.metadataStore = store;
return this;
}
public StoreContextBuilder setUseListV1(
final boolean useV1) {
this.useListV1 = useV1;
@ -165,12 +153,6 @@ public StoreContextBuilder setContextAccessors(
return this;
}
public StoreContextBuilder setTimeProvider(
final ITtlTimeProvider provider) {
this.timeProvider = provider;
return this;
}
/**
* Set builder value.
* @param value new value
@ -193,7 +175,6 @@ public StoreContextBuilder setEnableCSE(
return this;
}
@SuppressWarnings("deprecation")
public StoreContext build() {
return new StoreContext(fsURI,
bucket,
@ -208,10 +189,8 @@ public StoreContext build() {
inputPolicy,
changeDetectionPolicy,
multiObjectDeleteEnabled,
metadataStore,
useListV1,
contextAccessors,
timeProvider,
auditor,
isCSEEnabled);
}
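
A minimal sketch (not from the patch, and assuming the builder's default constructor) of how the slimmed-down builder is now driven; only setters visible in this hunk are shown, and the remaining fields are assumed to be populated through their corresponding set* methods before build() is called:

    StoreContext context = new StoreContextBuilder()
        .setMultiObjectDeleteEnabled(true)
        .setUseListV1(false)
        .setContextAccessors(contextAccessors)   // supplied by the owning filesystem
        .setEnableCSE(false)
        // ... other fields (fsURI, bucket, executor, auditor, ...) set likewise
        .build();                                // no metastore or TTL provider any more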

View File

@ -1,223 +0,0 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.hadoop.fs.s3a.s3guard;
import javax.annotation.Nullable;
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.s3a.S3AFileSystem;
import org.apache.hadoop.service.launcher.AbstractLaunchableService;
import org.apache.hadoop.service.launcher.ServiceLaunchException;
import static org.apache.hadoop.service.launcher.LauncherExitCodes.EXIT_FAIL;
import static org.apache.hadoop.service.launcher.LauncherExitCodes.EXIT_USAGE;
/**
* Entry point for S3Guard diagnostics operations against DynamoDB tables.
*/
public class AbstractS3GuardDynamoDBDiagnostic
extends AbstractLaunchableService {
private S3AFileSystem filesystem;
private DynamoDBMetadataStore store;
private URI uri;
private List<String> arguments;
/**
* Constructor.
* @param name entry point name.
*/
public AbstractS3GuardDynamoDBDiagnostic(final String name) {
super(name);
}
/**
* Constructor. If the store is set then that is the store for the operation,
* otherwise the filesystem's binding is used instead.
* @param name entry point name.
* @param filesystem filesystem
* @param store optional metastore.
* @param uri URI. Must be set if filesystem == null.
*/
public AbstractS3GuardDynamoDBDiagnostic(
final String name,
@Nullable final S3AFileSystem filesystem,
@Nullable final DynamoDBMetadataStore store,
@Nullable final URI uri) {
super(name);
this.store = store;
this.filesystem = filesystem;
if (store == null) {
require(filesystem != null, "No filesystem or URI");
bindStore(filesystem);
}
if (uri == null) {
require(filesystem != null, "No filesystem or URI");
setUri(filesystem.getUri());
} else {
setUri(uri);
}
}
/**
* Require a condition to hold, otherwise an exception is thrown.
* @param condition condition to be true
* @param error text on failure.
* @throws ServiceLaunchException if the condition is not met
*/
protected static void require(boolean condition, String error) {
if (!condition) {
throw failure(error);
}
}
/**
* Generate a failure exception for throwing.
* @param message message
* @param ex optional nested exception.
* @return an exception to throw
*/
protected static ServiceLaunchException failure(String message,
Throwable ex) {
return new ServiceLaunchException(EXIT_FAIL, message, ex);
}
/**
* Generate a failure exception for throwing.
* @param message message
* @return an exception to throw
*/
protected static ServiceLaunchException failure(String message) {
return new ServiceLaunchException(EXIT_FAIL, message);
}
@Override
public Configuration bindArgs(final Configuration config,
final List<String> args)
throws Exception {
this.arguments = args;
return super.bindArgs(config, args);
}
/**
* Get the argument list.
* @return the argument list.
*/
protected List<String> getArguments() {
return arguments;
}
/**
* Bind to the store from a CLI argument.
* @param fsURI filesystem URI
* @throws IOException failure
*/
protected void bindFromCLI(String fsURI)
throws IOException {
Configuration conf = getConfig();
setUri(fsURI);
FileSystem fs = FileSystem.get(getUri(), conf);
require(fs instanceof S3AFileSystem,
"Not an S3A Filesystem: " + fsURI);
filesystem = (S3AFileSystem) fs;
bindStore(filesystem);
setUri(fs.getUri());
}
/**
* Binds the {@link #store} field to the metastore of
* the filesystem -which must have a DDB metastore.
* @param fs filesystem to bind the store to.
*/
private void bindStore(final S3AFileSystem fs) {
require(fs.hasMetadataStore(),
"Filesystem has no metadata store: " + fs.getUri());
MetadataStore ms = fs.getMetadataStore();
require(ms instanceof DynamoDBMetadataStore,
"Filesystem " + fs.getUri()
+ " does not have a DynamoDB metadata store: " + ms);
store = (DynamoDBMetadataStore) ms;
}
protected DynamoDBMetadataStore getStore() {
return store;
}
public S3AFileSystem getFilesystem() {
return filesystem;
}
public URI getUri() {
return uri;
}
public void setUri(final URI uri) {
String fsURI = uri.toString();
if (!fsURI.endsWith("/")) {
setUri(fsURI);
} else {
this.uri = uri;
}
}
/**
* Set the URI from a string; will add a "/" if needed.
* @param fsURI filesystem URI.
* @throws RuntimeException if the fsURI parameter is not a valid URI.
*/
public void setUri(String fsURI) {
if (fsURI != null) {
if (!fsURI.endsWith("/")) {
fsURI += "/";
}
try {
setUri(new URI(fsURI));
} catch (URISyntaxException e) {
throw new RuntimeException(e);
}
}
}
/**
* Get the list of arguments, after validating the list size.
* @param argMin minimum number of entries.
* @param argMax maximum number of entries.
* @param usage Usage message.
* @return the argument list, which will be in the range.
* @throws ServiceLaunchException if the argument list is not valid.
*/
protected List<String> getArgumentList(final int argMin,
final int argMax,
final String usage) {
List<String> arg = getArguments();
if (arg == null || arg.size() < argMin || arg.size() > argMax) {
// no arguments: usage message
throw new ServiceLaunchException(EXIT_USAGE, usage);
}
return arg;
}
}
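
One detail worth calling out from setUri(String) above: filesystem URIs are normalised to always carry a trailing slash. A tiny sketch of the effect, using the concrete DumpS3GuardDynamoTable subclass that appears later in this patch and an invented bucket name:

    AbstractS3GuardDynamoDBDiagnostic diag = new DumpS3GuardDynamoTable();
    diag.setUri("s3a://example-bucket");   // no trailing slash supplied
    URI normalised = diag.getUri();        // now "s3a://example-bucket/"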

View File

@ -1,255 +0,0 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.hadoop.fs.s3a.s3guard;
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Queue;
import org.apache.hadoop.classification.VisibleForTesting;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.apache.commons.lang3.tuple.Pair;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathIOException;
import org.apache.hadoop.fs.s3a.impl.AbstractStoreOperation;
import org.apache.hadoop.fs.s3a.impl.StoreContext;
import org.apache.hadoop.service.launcher.LauncherExitCodes;
import org.apache.hadoop.util.DurationInfo;
import org.apache.hadoop.util.ExitCodeProvider;
import org.apache.hadoop.util.ExitUtil;
import static org.apache.hadoop.service.launcher.LauncherExitCodes.EXIT_NOT_ACCEPTABLE;
/**
* Audit a directory tree for being authoritative.
* One aspect of the audit to be aware of: the root directory is
* always considered authoritative, even though, strictly speaking, it has no
* matching entry in any of the stores.
*/
public class AuthoritativeAuditOperation extends AbstractStoreOperation {
private static final Logger LOG = LoggerFactory.getLogger(
AuthoritativeAuditOperation.class);
/**
* Exception error code when a path is non-auth in the DB.
*/
public static final int ERROR_ENTRY_NOT_AUTH_IN_DDB = EXIT_NOT_ACCEPTABLE;
/**
* Exception error code when a path is not configured to be
* auth in the S3A FS Config: {@value}.
*/
public static final int ERROR_PATH_NOT_AUTH_IN_FS = 5;
/**
* Exception error string: {@value}.
*/
public static final String E_NONAUTH
= "Directory is not marked as authoritative in the S3Guard store";
/** The metastore to audit. */
private final DynamoDBMetadataStore metastore;
/** require all directories to be authoritative. */
private final boolean requireAuthoritative;
/**
* Verbose switch.
*/
private final boolean verbose;
/**
* Constructor.
* @param storeContext store context.
* @param metastore metastore
* @param requireAuthoritative require all directories to be authoritative
* @param verbose verbose output
*/
public AuthoritativeAuditOperation(
final StoreContext storeContext,
final DynamoDBMetadataStore metastore,
final boolean requireAuthoritative,
final boolean verbose) {
super(storeContext);
this.metastore = metastore;
this.requireAuthoritative = requireAuthoritative;
this.verbose = verbose;
}
/**
* Examine the path metadata and verify that the dir is authoritative.
* @param md metadata.
* @param requireAuth require all directories to be authoritative
* @throws NonAuthoritativeDirException if it is !auth and requireAuth=true.
*/
private void verifyAuthDir(final DDBPathMetadata md,
final boolean requireAuth)
throws PathIOException {
final Path path = md.getFileStatus().getPath();
boolean isAuth = path.isRoot() || md.isAuthoritativeDir();
if (!isAuth && requireAuth) {
throw new NonAuthoritativeDirException(path);
}
}
/**
* Examine the path metadata, declare whether it should be queued for
* recursive scanning.
* @param md metadata.
* @return true if it is a dir to scan.
*/
private boolean isDirectory(PathMetadata md) {
return !md.getFileStatus().isFile();
}
/**
* Audit the tree.
* @param path qualified path to scan
* @return tuple(dirs scanned, nonauth dirs found)
* @throws IOException IO failure
* @throws ExitUtil.ExitException if a non-auth dir was found.
*/
public Pair<Integer, Integer> audit(Path path) throws IOException {
try (DurationInfo ignored =
new DurationInfo(LOG, "Audit %s", path)) {
return executeAudit(path, requireAuthoritative, true);
}
}
/**
* Audit the tree.
* This is the internal code which throws a NonAuthoritativePathException
* on failures; tests may use it.
* @param path path to scan
* @param requireAuth require all directories to be authoritative
* @param recursive recurse?
* @return tuple(dirs scanned, nonauth dirs found)
* @throws IOException IO failure
* @throws NonAuthoritativeDirException if a non-auth dir was found.
*/
@VisibleForTesting
Pair<Integer, Integer> executeAudit(
final Path path,
final boolean requireAuth,
final boolean recursive) throws IOException {
int dirs = 0;
int nonauth = 0;
final Queue<DDBPathMetadata> queue = new ArrayDeque<>();
final boolean isRoot = path.isRoot();
final DDBPathMetadata baseData = metastore.get(path);
if (baseData == null) {
throw new ExitUtil.ExitException(LauncherExitCodes.EXIT_NOT_FOUND,
"No S3Guard entry for path " + path);
}
if (isRoot || isDirectory(baseData)) {
// we have the root entry or a directory to scan
queue.add(baseData);
} else {
LOG.info("Path represents file");
return Pair.of(0, 0);
}
while (!queue.isEmpty()) {
dirs++;
final DDBPathMetadata dir = queue.poll();
final Path p = dir.getFileStatus().getPath();
LOG.debug("Directory {}", dir.prettyPrint());
// log a message about the dir state, with root treated specially
if (!p.isRoot()) {
if (!dir.isAuthoritativeDir()) {
LOG.warn("Directory {} is not authoritative", p);
nonauth++;
verifyAuthDir(dir, requireAuth);
} else {
LOG.info("Directory {}", p);
}
} else {
// this is done to avoid the confusing message about root not being
// authoritative
LOG.info("Root directory {}", p);
}
// list its children
if (recursive) {
final DirListingMetadata entry = metastore.listChildren(p);
if (entry != null) {
final Collection<PathMetadata> listing = entry.getListing();
int files = 0, subdirs = 0;
for (PathMetadata e : listing) {
if (isDirectory(e)) {
// queue for auditing
queue.add((DDBPathMetadata) e);
subdirs++;
} else {
files++;
}
}
if (verbose && (files > 0 || subdirs > 0)) {
LOG.info(" files {}; directories {}", files, subdirs);
}
} else {
LOG.info("Directory {} has been deleted", dir);
}
}
}
// end of scan
if (dirs == 1 && isRoot) {
LOG.info("The store has no directories to scan");
} else {
LOG.info("Scanned {} directories - {} were not marked as authoritative",
dirs, nonauth);
}
return Pair.of(dirs, nonauth);
}
/**
* A directory was found which was non-authoritative.
* The exit code for this operation is
* {@link LauncherExitCodes#EXIT_NOT_ACCEPTABLE} -This is what the S3Guard
* command will return.
*/
public static final class NonAuthoritativeDirException
extends PathIOException implements ExitCodeProvider {
/**
* Instantiate.
* @param path the path which is non-authoritative.
*/
private NonAuthoritativeDirException(final Path path) {
super(path.toString(), E_NONAUTH);
}
@Override
public int getExitCode() {
return ERROR_ENTRY_NOT_AUTH_IN_DDB;
}
@Override
public String toString() {
return getMessage();
}
}
}
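
A hedged sketch of how this audit was typically driven (storeContext, ddbStore, fs and LOG are assumed bindings from an already initialised S3A filesystem backed by a DynamoDB metastore):

    AuthoritativeAuditOperation audit = new AuthoritativeAuditOperation(
        storeContext, ddbStore, /* requireAuthoritative */ true, /* verbose */ true);
    Pair<Integer, Integer> result = audit.audit(fs.makeQualified(new Path("/tables")));
    // left = directories scanned, right = directories not marked authoritative
    LOG.info("Scanned {} directories, {} non-auth", result.getLeft(), result.getRight());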

View File

@ -1,110 +0,0 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.hadoop.fs.s3a.s3guard;
import java.io.Closeable;
import java.io.IOException;
/**
* This represents state which may be passed to bulk IO operations
* to enable them to store information about the state of the ongoing
* operation across invocations.
* <p>
* A bulk operation state <i>MUST</i> only be used for the single store
* from which it was created, and <i>MUST</i> only be used for the duration of a single
* bulk update operation.
* <p>
* Passing in the state is to allow the stores to maintain state about
* updates they have already made to their store during this single operation:
* a cache of what has happened. It is not a list of operations to be applied.
* If a list of operations to perform is built up (e.g. during rename)
* that is the duty of the caller, not this state.
* <p>
* After the operation has completed, it <i>MUST</i> be closed so
* as to guarantee that all state is released.
*/
public class BulkOperationState implements Closeable {
private final OperationType operation;
/**
* Constructor.
* @param operation the type of the operation.
*/
public BulkOperationState(final OperationType operation) {
this.operation = operation;
}
/**
* Get the operation type.
* @return the operation type.
*/
public OperationType getOperation() {
return operation;
}
@Override
public void close() throws IOException {
}
/**
* Enumeration of operations which can be performed in bulk.
* This can be used by the stores however they want.
* One special aspect: renames are to be done through a {@link RenameTracker}.
* Callers will be blocked from initiating a rename through
* {@code S3Guard#initiateBulkWrite()}
*/
public enum OperationType {
/** Writing data. */
Put,
/**
* Rename: add and delete.
* After the rename, the tree under the destination path
* can be tagged as authoritative.
*/
Rename,
/** Pruning: deleting entries and updating parents. */
Prune,
/** Commit operation. */
Commit,
/** Deletion operation. */
Delete,
/** FSCK operation. */
Fsck,
/**
* Bulk directory tree import.
* After an import, the entire tree under the path has been
* enumerated and should be tagged as authoritative.
*/
Import,
/**
* Listing update.
*/
Listing,
/**
* Mkdir operation.
*/
Mkdir,
/**
* Multipart upload operation.
*/
Upload
}
}
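
The contract in the javadoc above boils down to: one state object per store and per bulk operation, closed when the operation ends. A minimal sketch, assuming the enclosing method is allowed to throw IOException:

    try (BulkOperationState state =
             new BulkOperationState(BulkOperationState.OperationType.Put)) {
      // perform the bulk update against the single originating store,
      // passing `state` into each metastore call so it can cache what it has done
    }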

View File

@ -1,80 +0,0 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.hadoop.fs.s3a.s3guard;
import org.apache.hadoop.fs.s3a.S3AFileStatus;
import org.apache.hadoop.fs.s3a.Tristate;
/**
* {@code DDBPathMetadata} wraps {@link PathMetadata} and adds the
* isAuthoritativeDir flag to provide support for authoritative directory
* listings in {@link DynamoDBMetadataStore}.
*/
public class DDBPathMetadata extends PathMetadata {
private boolean isAuthoritativeDir;
public DDBPathMetadata(PathMetadata pmd) {
super(pmd.getFileStatus(), pmd.isEmptyDirectory(), pmd.isDeleted(),
pmd.getLastUpdated());
this.isAuthoritativeDir = false;
this.setLastUpdated(pmd.getLastUpdated());
}
public DDBPathMetadata(S3AFileStatus fileStatus) {
super(fileStatus);
this.isAuthoritativeDir = false;
}
public DDBPathMetadata(S3AFileStatus fileStatus, Tristate isEmptyDir,
boolean isDeleted, long lastUpdated) {
super(fileStatus, isEmptyDir, isDeleted, lastUpdated);
this.isAuthoritativeDir = false;
}
public DDBPathMetadata(S3AFileStatus fileStatus, Tristate isEmptyDir,
boolean isDeleted, boolean isAuthoritativeDir, long lastUpdated) {
super(fileStatus, isEmptyDir, isDeleted, lastUpdated);
this.isAuthoritativeDir = isAuthoritativeDir;
}
public boolean isAuthoritativeDir() {
return isAuthoritativeDir;
}
public void setAuthoritativeDir(boolean authoritativeDir) {
isAuthoritativeDir = authoritativeDir;
}
@Override
public boolean equals(Object o) {
return super.equals(o);
}
@Override public int hashCode() {
return super.hashCode();
}
@Override public String toString() {
return "DDBPathMetadata{" +
"isAuthoritativeDir=" + isAuthoritativeDir +
", PathMetadata=" + super.toString() +
'}';
}
}

View File

@ -1,188 +0,0 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.hadoop.fs.s3a.s3guard;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import com.amazonaws.SdkBaseException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.s3a.S3ObjectAttributes;
import org.apache.hadoop.fs.s3a.Tristate;
import org.apache.hadoop.fs.s3a.impl.StoreContext;
import org.apache.hadoop.util.DurationInfo;
import static org.apache.hadoop.fs.s3a.s3guard.S3Guard.addMoveAncestors;
import static org.apache.hadoop.fs.s3a.s3guard.S3Guard.addMoveDir;
import static org.apache.hadoop.fs.s3a.s3guard.S3Guard.addMoveFile;
/**
* This is the rename updating strategy originally used:
* a collection of source paths and a list of destinations are created,
* then updated at the end (possibly slow).
* <p>
* It is not currently instantiated by any of the active trackers,
* but is preserved to show that the original rename strategy
* can be implemented via the tracker model.
*/
public class DelayedUpdateRenameTracker extends RenameTracker {
private final MetadataStore metadataStore;
private final Collection<Path> sourcePaths = new HashSet<>();
private final List<PathMetadata> destMetas = new ArrayList<>();
private final List<Path> deletedPaths = new ArrayList<>();
public DelayedUpdateRenameTracker(
final StoreContext storeContext,
final MetadataStore metadataStore,
final Path sourceRoot,
final Path dest,
final BulkOperationState operationState) {
super("DelayedUpdateRenameTracker", storeContext, metadataStore,
sourceRoot, dest, operationState);
this.metadataStore = storeContext.getMetadataStore();
}
@Override
public synchronized void fileCopied(
final Path sourcePath,
final S3ObjectAttributes sourceAttributes,
final S3ObjectAttributes destAttributes,
final Path destPath,
final long blockSize,
final boolean addAncestors) throws IOException {
addMoveFile(metadataStore,
sourcePaths,
destMetas,
sourcePath,
destPath,
sourceAttributes.getLen(),
blockSize,
getOwner(),
destAttributes.getETag(),
destAttributes.getVersionId());
// Ancestor directories may not be listed, so we explicitly add them
if (addAncestors) {
addMoveAncestors(metadataStore,
sourcePaths,
destMetas,
getSourceRoot(),
sourcePath,
destPath,
getOwner());
}
}
@Override
public synchronized void directoryMarkerCopied(final Path sourcePath,
final Path destPath,
final boolean addAncestors) throws IOException {
addMoveDir(metadataStore, sourcePaths, destMetas,
sourcePath,
destPath, getOwner());
// Ancestor directories may not be listed, so we explicitly add them
if (addAncestors) {
addMoveAncestors(metadataStore,
sourcePaths,
destMetas,
getSourceRoot(),
sourcePath,
destPath,
getOwner());
}
}
@Override
public synchronized void moveSourceDirectory() throws IOException {
if (!sourcePaths.contains(getSourceRoot())) {
addMoveDir(metadataStore, sourcePaths, destMetas,
getSourceRoot(),
getDest(), getOwner());
}
}
@Override
public synchronized void sourceObjectsDeleted(
final Collection<Path> paths) throws IOException {
// add to the list of deleted paths.
deletedPaths.addAll(paths);
}
@Override
public void completeRename() throws IOException {
metadataStore.move(sourcePaths, destMetas, getOperationState());
super.completeRename();
}
@Override
public IOException renameFailed(final Exception ex) {
LOG.warn("Rename has failed; updating s3guard with destination state");
try (DurationInfo ignored = new DurationInfo(LOG,
"Cleaning up deleted paths")) {
// the destination paths are updated; the source is left alone.
metadataStore.move(new ArrayList<>(0), destMetas, getOperationState());
for (Path deletedPath : deletedPaths) {
// this is not ideal in that it may leave parent stuff around.
metadataStore.delete(deletedPath, getOperationState());
}
deleteParentPaths();
} catch (IOException | SdkBaseException e) {
LOG.warn("Ignoring error raised in AWS SDK ", e);
}
return super.renameFailed(ex);
}
/**
* Delete all the parent paths we know to be empty (by walking up the tree
* deleting as appropriate).
* @throws IOException failure
*/
private void deleteParentPaths() throws IOException {
Set<Path> parentPaths = new HashSet<>();
for (Path deletedPath : deletedPaths) {
Path parent = deletedPath.getParent();
if (!parent.equals(getSourceRoot())) {
parentPaths.add(parent);
}
}
// now there's a set of parent paths. We now want to
// get them ordered by depth, so that deeper entries come first
// that way: when we check for a parent path existing we can
// see if it really is empty.
List<Path> parents = new ArrayList<>(parentPaths);
parents.sort(PathOrderComparators.TOPMOST_PATH_LAST);
for (Path parent : parents) {
PathMetadata md = metadataStore.get(parent, true);
if (md != null && md.isEmptyDirectory() == Tristate.TRUE) {
// if we're confident that this is empty: delete it.
metadataStore.delete(parent, getOperationState());
}
}
}
}
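
For context, a hedged sketch of the tracker lifecycle that RenameOperation used to drive (all variable bindings assumed; this is precisely the choreography the patch removes):

    RenameTracker tracker = metadataStore.initiateRenameOperation(
        storeContext, sourcePath, sourceStatus, destPath);
    tracker.fileCopied(srcFile, srcAttributes, destAttributes,
        destFile, blockSize, /* addAncestors */ true);
    tracker.sourceObjectsDeleted(Lists.newArrayList(srcFile));
    tracker.completeRename();   // flush the queued moves into the metastore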

View File

@ -1,142 +0,0 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.hadoop.fs.s3a.s3guard;
import java.io.IOException;
import java.util.Collection;
import java.util.LinkedList;
import java.util.NoSuchElementException;
import java.util.Queue;
import org.apache.hadoop.util.Preconditions;
import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;
import org.apache.hadoop.fs.s3a.S3AFileStatus;
/**
* {@code DescendantsIterator} is a {@link RemoteIterator} that implements
* pre-ordering breadth-first traversal (BFS) of a path and all of its
* descendants recursively. After visiting each path, that path's direct
* children are discovered by calling {@link MetadataStore#listChildren(Path)}.
* Each iteration returns the next direct child, and if that child is a
* directory, also pushes it onto a queue to discover its children later.
*
* For example, assume the consistent store contains metadata representing this
* file system structure:
*
* <pre>
* /dir1
* |-- dir2
* | |-- file1
* | `-- file2
* `-- dir3
* |-- dir4
* | `-- file3
* |-- dir5
* | `-- file4
* `-- dir6
* </pre>
*
* Consider this code sample:
* <pre>
* final PathMetadata dir1 = get(new Path("/dir1"));
* for (DescendantsIterator descendants = new DescendantsIterator(dir1);
* descendants.hasNext(); ) {
* final FileStatus status = descendants.next().getFileStatus();
* System.out.printf("%s %s%n", status.isDirectory() ? 'D' : 'F',
* status.getPath());
* }
* </pre>
*
* The output is:
* <pre>
* D /dir1
* D /dir1/dir2
* D /dir1/dir3
* F /dir1/dir2/file1
* F /dir1/dir2/file2
* D /dir1/dir3/dir4
* D /dir1/dir3/dir5
* F /dir1/dir3/dir4/file3
* F /dir1/dir3/dir5/file4
* D /dir1/dir3/dir6
* </pre>
*/
@InterfaceAudience.Private
@InterfaceStability.Evolving
public class DescendantsIterator implements RemoteIterator<S3AFileStatus> {
private final MetadataStore metadataStore;
private final Queue<PathMetadata> queue = new LinkedList<>();
/**
* Creates a new {@code DescendantsIterator}.
*
* @param ms the associated {@link MetadataStore}
* @param meta base path for descendants iteration, which will be the first
* returned during iteration (except root). Null makes empty iterator.
* @throws IOException if errors happen during metadata store listing
*/
public DescendantsIterator(MetadataStore ms, PathMetadata meta)
throws IOException {
Preconditions.checkNotNull(ms);
this.metadataStore = ms;
if (meta != null) {
final Path path = meta.getFileStatus().getPath();
if (path.isRoot()) {
DirListingMetadata rootListing = ms.listChildren(path);
if (rootListing != null) {
rootListing = rootListing.withoutTombstones();
queue.addAll(rootListing.getListing());
}
} else {
queue.add(meta);
}
}
}
@Override
public boolean hasNext() throws IOException {
return !queue.isEmpty();
}
@Override
public S3AFileStatus next() throws IOException {
if (!hasNext()) {
throw new NoSuchElementException("No more descendants.");
}
PathMetadata next;
next = queue.poll();
if (next.getFileStatus().isDirectory()) {
final Path path = next.getFileStatus().getPath();
DirListingMetadata meta = metadataStore.listChildren(path);
if (meta != null) {
Collection<PathMetadata> more = meta.withoutTombstones().getListing();
if (!more.isEmpty()) {
queue.addAll(more);
}
}
}
return next.getFileStatus();
}
}

View File

@ -1,372 +0,0 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.hadoop.fs.s3a.s3guard;
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.hadoop.util.Preconditions;
import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.s3a.S3AFileStatus;
import org.apache.hadoop.fs.s3a.Tristate;
/**
* {@code DirListingMetadata} models a directory listing stored in a
* {@link MetadataStore}. Instances of this class are mutable and thread-safe.
*/
@InterfaceAudience.Private
@InterfaceStability.Evolving
public class DirListingMetadata extends ExpirableMetadata {
/**
* Convenience parameter for passing into constructor.
*/
public static final Collection<PathMetadata> EMPTY_DIR =
Collections.emptyList();
private final Path path;
/** Using a map for fast find / remove with large directories. */
private Map<Path, PathMetadata> listMap = new ConcurrentHashMap<>();
private boolean isAuthoritative;
/**
* Create a directory listing metadata container.
*
* @param path Path of the directory. If this path has a host component, then
* all paths added later via {@link #put(PathMetadata)} must also have
* the same host.
* @param listing Entries in the directory.
* @param isAuthoritative true iff listing is the full contents of the
* directory, and the calling client reports that this may be cached as
* the full and authoritative listing of all files in the directory.
* @param lastUpdated last updated time on which expiration is based.
*/
public DirListingMetadata(Path path, Collection<PathMetadata> listing,
boolean isAuthoritative, long lastUpdated) {
checkPathAbsolute(path);
this.path = path;
if (listing != null) {
for (PathMetadata entry : listing) {
Path childPath = entry.getFileStatus().getPath();
checkChildPath(childPath);
listMap.put(childPath, entry);
}
}
this.isAuthoritative = isAuthoritative;
this.setLastUpdated(lastUpdated);
}
public DirListingMetadata(Path path, Collection<PathMetadata> listing,
boolean isAuthoritative) {
this(path, listing, isAuthoritative, 0);
}
/**
* Copy constructor.
* @param d the existing {@link DirListingMetadata} object.
*/
public DirListingMetadata(DirListingMetadata d) {
path = d.path;
isAuthoritative = d.isAuthoritative;
this.setLastUpdated(d.getLastUpdated());
listMap = new ConcurrentHashMap<>(d.listMap);
}
/**
* @return {@code Path} of the directory that contains this listing.
*/
public Path getPath() {
return path;
}
/**
* @return entries in the directory
*/
public Collection<PathMetadata> getListing() {
return Collections.unmodifiableCollection(listMap.values());
}
/**
* List all tombstones.
* @return all tombstones in the listing.
*/
public Set<Path> listTombstones() {
Set<Path> tombstones = new HashSet<>();
for (PathMetadata meta : listMap.values()) {
if (meta.isDeleted()) {
tombstones.add(meta.getFileStatus().getPath());
}
}
return tombstones;
}
/**
* Get the directory listing excluding tombstones.
* Returns a new DirListingMetadata instance, without the tombstones -the
* lastUpdated field is copied from this instance.
* @return a new DirListingMetadata without the tombstones.
*/
public DirListingMetadata withoutTombstones() {
Collection<PathMetadata> filteredList = new ArrayList<>();
for (PathMetadata meta : listMap.values()) {
if (!meta.isDeleted()) {
filteredList.add(meta);
}
}
return new DirListingMetadata(path, filteredList, isAuthoritative,
this.getLastUpdated());
}
/**
* @return number of entries tracked. This is not the same as the number
* of entries in the actual directory unless {@link #isAuthoritative()} is
* true.
* It will also include any tombstones.
*/
public int numEntries() {
return listMap.size();
}
/**
* @return true iff this directory listing is full and authoritative within
* the scope of the {@code MetadataStore} that returned it.
*/
public boolean isAuthoritative() {
return isAuthoritative;
}
/**
* Is the underlying directory known to be empty?
* @return FALSE if directory is known to have a child entry, TRUE if
* directory is known to be empty, UNKNOWN otherwise.
*/
public Tristate isEmpty() {
if (getListing().isEmpty()) {
if (isAuthoritative()) {
return Tristate.TRUE;
} else {
// This listing is empty, but may not be full list of underlying dir.
return Tristate.UNKNOWN;
}
} else { // not empty listing
// There exists at least one child, dir not empty.
return Tristate.FALSE;
}
}
/**
* Marks this directory listing as full and authoritative.
* @param authoritative see {@link #isAuthoritative()}.
*/
public void setAuthoritative(boolean authoritative) {
this.isAuthoritative = authoritative;
}
/**
* Lookup entry within this directory listing. This may return null if the
* {@code MetadataStore} only tracks a partial set of the directory entries.
* In the case where {@link #isAuthoritative()} is true, however, this
* function returns null iff the directory is known not to contain the listing
* at the given path (within the scope of the {@code MetadataStore} that returned
* it).
*
* @param childPath path of entry to look for.
* @return entry, or null if it is not present or not being tracked.
*/
public PathMetadata get(Path childPath) {
checkChildPath(childPath);
return listMap.get(childPath);
}
/**
* Replace an entry with a tombstone.
* @param childPath path of entry to replace.
*/
public void markDeleted(Path childPath, long lastUpdated) {
checkChildPath(childPath);
listMap.put(childPath, PathMetadata.tombstone(childPath, lastUpdated));
}
/**
* Remove entry from this directory.
*
* @param childPath path of entry to remove.
*/
public void remove(Path childPath) {
checkChildPath(childPath);
listMap.remove(childPath);
}
/**
* Add an entry to the directory listing. If this listing already contains a
* {@code FileStatus} with the same path, it will be replaced.
*
* @param childPathMetadata entry to add to this directory listing.
* @return true if the status was added or replaced with a new value. False
* if the same FileStatus value was already present.
*/
public boolean put(PathMetadata childPathMetadata) {
Preconditions.checkNotNull(childPathMetadata,
"childPathMetadata must be non-null");
final S3AFileStatus fileStatus = childPathMetadata.getFileStatus();
Path childPath = childStatusToPathKey(fileStatus);
PathMetadata newValue = childPathMetadata;
PathMetadata oldValue = listMap.put(childPath, childPathMetadata);
return oldValue == null || !oldValue.equals(newValue);
}
@Override
public String toString() {
return "DirListingMetadata{" +
"path=" + path +
", listMap=" + listMap +
", isAuthoritative=" + isAuthoritative +
", lastUpdated=" + this.getLastUpdated() +
'}';
}
/**
* Remove expired entries from the listing based on TTL.
* @param ttl the ttl time
* @param now the current time
* @return the expired values.
*/
public synchronized List<PathMetadata> removeExpiredEntriesFromListing(
long ttl, long now) {
List<PathMetadata> expired = new ArrayList<>();
final Iterator<Map.Entry<Path, PathMetadata>> iterator =
listMap.entrySet().iterator();
while (iterator.hasNext()) {
final Map.Entry<Path, PathMetadata> entry = iterator.next();
// we filter iff the lastUpdated is not 0 and the entry is expired
PathMetadata metadata = entry.getValue();
if (metadata.getLastUpdated() != 0
&& (metadata.getLastUpdated() + ttl) <= now) {
expired.add(metadata);
iterator.remove();
}
}
return expired;
}
/**
* Log contents to supplied StringBuilder in a pretty fashion.
* @param sb target StringBuilder
*/
public void prettyPrint(StringBuilder sb) {
sb.append(String.format("DirMeta %-20s %-18s",
path.toString(),
isAuthoritative ? "Authoritative" : "Not Authoritative"));
for (Map.Entry<Path, PathMetadata> entry : listMap.entrySet()) {
sb.append("\n key: ").append(entry.getKey()).append(": ");
entry.getValue().prettyPrint(sb);
}
sb.append("\n");
}
public String prettyPrint() {
StringBuilder sb = new StringBuilder();
prettyPrint(sb);
return sb.toString();
}
/**
* Checks that child path is valid.
* @param childPath path to check.
*/
private void checkChildPath(Path childPath) {
checkPathAbsolute(childPath);
// If this dir's path has host (and thus scheme), so must its children
URI parentUri = path.toUri();
URI childUri = childPath.toUri();
if (parentUri.getHost() != null) {
Preconditions.checkNotNull(childUri.getHost(), "Expected non-null URI " +
"host: %s", childUri);
Preconditions.checkArgument(
childUri.getHost().equals(parentUri.getHost()),
"childUri %s and parentUri %s must have the same host",
childUri, parentUri);
Preconditions.checkNotNull(childUri.getScheme(), "No scheme in path %s",
childUri);
}
Preconditions.checkArgument(!childPath.isRoot(),
"childPath cannot be the root path: %s", childPath);
Preconditions.checkArgument(parentUri.getPath().equals(
childPath.getParent().toUri().getPath()),
"childPath %s must be a child of %s", childPath, path);
}
/**
* For Paths that are handed in directly, we assert they are in consistent
* format with checkPath(). For paths that are supplied embedded in
* FileStatus, we attempt to fill in missing scheme and host, when this
* DirListingMetadata is associated with one.
*
* @return Path suitable for consistent hashtable lookups
* @throws NullPointerException null status argument
* @throws IllegalArgumentException bad status values or failure to
* create a URI.
*/
private Path childStatusToPathKey(FileStatus status) {
Path p = status.getPath();
Preconditions.checkNotNull(p, "Child status' path cannot be null");
Preconditions.checkArgument(!p.isRoot(),
"childPath cannot be the root path: %s", p);
Preconditions.checkArgument(p.getParent().equals(path),
"childPath %s must be a child of %s", p, path);
URI uri = p.toUri();
URI parentUri = path.toUri();
// If FileStatus' path is missing host, but should have one, add it.
if (uri.getHost() == null && parentUri.getHost() != null) {
try {
return new Path(new URI(parentUri.getScheme(), parentUri.getHost(),
uri.getPath(), uri.getFragment()));
} catch (URISyntaxException e) {
throw new IllegalArgumentException("FileStatus path invalid with" +
" added " + parentUri.getScheme() + "://" + parentUri.getHost() +
" added", e);
}
}
return p;
}
private void checkPathAbsolute(Path p) {
Preconditions.checkNotNull(p, "path must be non-null");
Preconditions.checkArgument(p.isAbsolute(), "path must be absolute: %s", p);
}
}
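
A small usage sketch (not part of the patch; dirPath and childStatus are assumed, and PathMetadata is constructed directly from an S3AFileStatus) showing how a listing was built up and queried:

    DirListingMetadata listing = new DirListingMetadata(
        dirPath, DirListingMetadata.EMPTY_DIR, /* isAuthoritative */ false);
    listing.put(new PathMetadata(childStatus));  // add or replace one child entry
    listing.setAuthoritative(true);              // caller asserts the listing is complete
    Tristate empty = listing.isEmpty();          // FALSE: at least one child is present
    DirListingMetadata live = listing.withoutTombstones();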

View File

@ -1,792 +0,0 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.hadoop.fs.s3a.s3guard;
import javax.annotation.Nullable;
import java.io.Closeable;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.PrintWriter;
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Date;
import java.util.Deque;
import java.util.List;
import com.amazonaws.services.dynamodbv2.xspec.ExpressionSpecBuilder;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.apache.commons.lang3.tuple.Pair;
import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathIOException;
import org.apache.hadoop.fs.RemoteIterator;
import org.apache.hadoop.fs.s3a.Listing;
import org.apache.hadoop.fs.s3a.S3AFileStatus;
import org.apache.hadoop.fs.s3a.S3AFileSystem;
import org.apache.hadoop.fs.s3a.S3ALocatedFileStatus;
import org.apache.hadoop.fs.s3a.S3ListRequest;
import org.apache.hadoop.fs.store.audit.AuditSpan;
import org.apache.hadoop.service.Service;
import org.apache.hadoop.service.launcher.LauncherExitCodes;
import org.apache.hadoop.service.launcher.ServiceLaunchException;
import org.apache.hadoop.service.launcher.ServiceLauncher;
import org.apache.hadoop.util.DurationInfo;
import org.apache.hadoop.util.ExitUtil;
import static org.apache.hadoop.util.Preconditions.checkNotNull;
import static org.apache.hadoop.fs.s3a.S3AUtils.ACCEPT_ALL;
/**
* This is a low-level diagnostics entry point which does a CSV/TSV dump of
* the DDB state.
* As it also lists the filesystem, it actually changes the state of the store
* during the operation.
*/
@InterfaceAudience.Private
@InterfaceStability.Unstable
public class DumpS3GuardDynamoTable extends AbstractS3GuardDynamoDBDiagnostic {
private static final Logger LOG =
LoggerFactory.getLogger(DumpS3GuardDynamoTable.class);
/**
* Application name.
*/
public static final String NAME = "DumpS3GuardDynamoTable";
/**
* Usage.
*/
private static final String USAGE_MESSAGE = NAME
+ " <filesystem> <dest-file>";
/**
* Suffix for the flat list: {@value}.
*/
public static final String FLAT_CSV = "-flat.csv";
/**
* Suffix for the raw S3 dump: {@value}.
*/
public static final String RAW_CSV = "-s3.csv";
/**
* Suffix for the DDB scan: {@value}.
*/
public static final String SCAN_CSV = "-scan.csv";
/**
* Suffix for the second DDB scan: {@value}.
*/
public static final String SCAN2_CSV = "-scan-2.csv";
/**
* Suffix for the treewalk scan of the S3A Filesystem: {@value}.
*/
public static final String TREE_CSV = "-tree.csv";
/**
* Suffix for a recursive treewalk through the metastore: {@value}.
*/
public static final String STORE_CSV = "-store.csv";
/**
* Path in the local filesystem to save the data.
*/
private String destPath;
private Pair<Long, Long> scanEntryResult;
private Pair<Long, Long> secondScanResult;
private long rawObjectStoreCount;
private long listStatusCount;
private long treewalkCount;
/**
* Instantiate.
* @param name application name.
*/
public DumpS3GuardDynamoTable(final String name) {
super(name);
}
/**
* Instantiate with default name.
*/
public DumpS3GuardDynamoTable() {
this(NAME);
}
/**
* Bind to a specific FS + store.
* @param fs filesystem
* @param store metastore to use
* @param destFile the base filename for output
* @param uri URI of store -only needed if FS is null.
*/
public DumpS3GuardDynamoTable(
final S3AFileSystem fs,
final DynamoDBMetadataStore store,
final File destFile,
final URI uri) {
super(NAME, fs, store, uri);
this.destPath = destFile.getAbsolutePath();
}
/**
* Bind to the argument list, including validating the CLI.
* @throws Exception failure.
*/
@Override
protected void serviceStart() throws Exception {
if (getStore() == null) {
List<String> arg = getArgumentList(2, 2, USAGE_MESSAGE);
bindFromCLI(arg.get(0));
destPath = arg.get(1);
}
}
/**
* Dump the filesystem and the metastore.
* @return the exit code.
* @throws ServiceLaunchException on failure.
* @throws IOException IO failure.
*/
@Override
public int execute() throws ServiceLaunchException, IOException {
try {
final File scanFile = new File(
destPath + SCAN_CSV).getCanonicalFile();
File parentDir = scanFile.getParentFile();
if (!parentDir.mkdirs() && !parentDir.isDirectory()) {
throw new PathIOException(parentDir.toString(),
"Could not create destination directory");
}
try (CsvFile csv = new CsvFile(scanFile);
DurationInfo ignored = new DurationInfo(LOG,
"scanFile dump to %s", scanFile)) {
scanEntryResult = scanMetastore(csv);
}
if (getFilesystem() != null) {
Path basePath = getFilesystem().qualify(new Path(getUri()));
final File destFile = new File(destPath + STORE_CSV)
.getCanonicalFile();
LOG.info("Writing Store details to {}", destFile);
try (CsvFile csv = new CsvFile(destFile);
DurationInfo ignored = new DurationInfo(LOG, "List metastore")) {
LOG.info("Base path: {}", basePath);
dumpMetastore(csv, basePath);
}
// these operations all update the metastore as they list,
// that is: they are side-effecting.
final File treewalkFile = new File(destPath + TREE_CSV)
.getCanonicalFile();
try (CsvFile csv = new CsvFile(treewalkFile);
DurationInfo ignored = new DurationInfo(LOG,
"Treewalk to %s", treewalkFile)) {
treewalkCount = treewalkFilesystem(csv, basePath);
}
final File flatlistFile = new File(
destPath + FLAT_CSV).getCanonicalFile();
try (CsvFile csv = new CsvFile(flatlistFile);
DurationInfo ignored = new DurationInfo(LOG,
"Flat list to %s", flatlistFile)) {
listStatusCount = listStatusFilesystem(csv, basePath);
}
final File rawFile = new File(
destPath + RAW_CSV).getCanonicalFile();
try (CsvFile csv = new CsvFile(rawFile);
DurationInfo ignored = new DurationInfo(LOG,
"Raw dump to %s", rawFile)) {
rawObjectStoreCount = dumpRawS3ObjectStore(csv);
}
final File scanFile2 = new File(
destPath + SCAN2_CSV).getCanonicalFile();
try (CsvFile csv = new CsvFile(scanFile2);
DurationInfo ignored = new DurationInfo(LOG,
"scanFile dump to %s", scanFile2)) {
secondScanResult = scanMetastore(csv);
}
}
return LauncherExitCodes.EXIT_SUCCESS;
} catch (IOException | RuntimeException e) {
LOG.error("failure", e);
throw e;
}
}
/**
* Push all elements of a list to a queue, such that the first entry
* on the list becomes the head of the queue.
* @param queue queue to update
* @param entries list of entries to add.
* @param <T> type of queue
*/
private <T> void pushAll(Deque<T> queue, List<T> entries) {
Collections.reverse(entries);
for (T t : entries) {
queue.push(t);
}
}
/**
* Dump the filesystem via a treewalk.
* If metastore entries mark directories as deleted, this
* walk will not explore them.
* @param csv destination.
* @param base base path.
* @return number of entries found.
* @throws IOException IO failure.
*/
protected long treewalkFilesystem(
final CsvFile csv,
final Path base) throws IOException {
ArrayDeque<Path> queue = new ArrayDeque<>();
queue.add(base);
long count = 0;
while (!queue.isEmpty()) {
Path path = queue.pop();
count++;
FileStatus[] fileStatuses;
try {
fileStatuses = getFilesystem().listStatus(path);
} catch (FileNotFoundException e) {
LOG.warn("File {} was not found", path);
continue;
}
// entries
for (FileStatus fileStatus : fileStatuses) {
csv.entry((S3AFileStatus) fileStatus);
}
// scan through the list, building up a reverse list of all directories
// found.
List<Path> dirs = new ArrayList<>(fileStatuses.length);
for (FileStatus fileStatus : fileStatuses) {
if (fileStatus.isDirectory()
&& !(fileStatus.getPath().equals(path))) {
// directory: add to the end of the queue.
dirs.add(fileStatus.getPath());
} else {
// file: just increment the count
count++;
}
}
// now push the dirs list in reverse
// so that they have been added in the sort order as returned.
pushAll(queue, dirs);
}
return count;
}
/**
* Dump the filesystem via a recursive listStatus call.
* @param csv destination.
* @param path base path for the recursive listing.
* @return number of entries found.
* @throws IOException IO failure.
*/
protected long listStatusFilesystem(
final CsvFile csv,
final Path path) throws IOException {
long count = 0;
RemoteIterator<S3ALocatedFileStatus> iterator = getFilesystem()
.listFilesAndEmptyDirectories(path, true);
while (iterator.hasNext()) {
S3ALocatedFileStatus status = iterator.next();
csv.entry(status.toS3AFileStatus());
count++;
}
return count;
}
/**
* Dump the raw S3 Object Store.
* @param csv destination.
* @return number of entries found.
* @throws IOException IO failure.
*/
protected long dumpRawS3ObjectStore(
final CsvFile csv) throws IOException {
S3AFileSystem fs = getFilesystem();
long count = 0;
Path rootPath = fs.qualify(new Path("/"));
try (AuditSpan span = fs.createSpan("DumpS3GuardDynamoTable",
rootPath.toString(), null)) {
Listing listing = fs.getListing();
S3ListRequest request = listing.createListObjectsRequest("", null, span);
count = 0;
RemoteIterator<S3AFileStatus> st =
listing.createFileStatusListingIterator(rootPath, request,
ACCEPT_ALL,
new Listing.AcceptAllButSelfAndS3nDirs(rootPath),
span);
while (st.hasNext()) {
count++;
S3AFileStatus next = st.next();
LOG.debug("[{}] {}", count, next);
csv.entry(next);
}
LOG.info("entry count: {}", count);
}
return count;
}
/**
* list children under the metastore from a base path, through
* a recursive query + walk strategy.
* @param csv dest
* @param basePath base path
* @throws IOException failure.
*/
protected void dumpMetastore(final CsvFile csv,
final Path basePath) throws IOException {
dumpStoreEntries(csv, getStore().listChildren(basePath));
}
/**
* Recursive Store Dump.
* @param csv open CSV file.
* @param dir directory listing
* @return (directories, files)
* @throws IOException failure
*/
private Pair<Long, Long> dumpStoreEntries(
CsvFile csv,
DirListingMetadata dir) throws IOException {
ArrayDeque<DirListingMetadata> queue = new ArrayDeque<>();
queue.add(dir);
long files = 0, dirs = 1;
while (!queue.isEmpty()) {
DirListingMetadata next = queue.pop();
List<DDBPathMetadata> childDirs = new ArrayList<>();
Collection<PathMetadata> listing = next.getListing();
// sort by name
List<PathMetadata> sorted = new ArrayList<>(listing);
sorted.sort(new PathOrderComparators.PathMetadataComparator(
(l, r) -> l.compareTo(r)));
for (PathMetadata pmd : sorted) {
DDBPathMetadata ddbMd = (DDBPathMetadata) pmd;
dumpEntry(csv, ddbMd);
if (ddbMd.getFileStatus().isDirectory()) {
childDirs.add(ddbMd);
dirs++;
} else {
files++;
}
}
List<DirListingMetadata> childMD = new ArrayList<>(childDirs.size());
for (DDBPathMetadata childDir : childDirs) {
childMD.add(getStore().listChildren(
childDir.getFileStatus().getPath()));
}
pushAll(queue, childMD);
}
return Pair.of(dirs, files);
}
/**
* Dump a single entry, and log it.
* @param csv CSV output file.
* @param md metadata to log.
*/
private void dumpEntry(CsvFile csv, DDBPathMetadata md) {
LOG.debug("{}", md.prettyPrint());
csv.entry(md);
}
/**
* Scan the metastore for all entries and dump them.
* There's no attempt to sort the output.
* @param csv file
* @return tuple of (live entries, tombstones).
*/
private Pair<Long, Long> scanMetastore(CsvFile csv) {
S3GuardTableAccess tableAccess = new S3GuardTableAccess(getStore());
ExpressionSpecBuilder builder = new ExpressionSpecBuilder();
Iterable<DDBPathMetadata> results =
getStore().wrapWithRetries(tableAccess.scanMetadata(builder));
long live = 0;
long tombstone = 0;
for (DDBPathMetadata md : results) {
if (!(md instanceof S3GuardTableAccess.VersionMarker)) {
// print it
csv.entry(md);
if (md.isDeleted()) {
tombstone++;
} else {
live++;
}
}
}
return Pair.of(live, tombstone);
}
public Pair<Long, Long> getScanEntryResult() {
return scanEntryResult;
}
public Pair<Long, Long> getSecondScanResult() {
return secondScanResult;
}
public long getRawObjectStoreCount() {
return rawObjectStoreCount;
}
public long getListStatusCount() {
return listStatusCount;
}
public long getTreewalkCount() {
return treewalkCount;
}
/**
* Convert a timestamp in milliseconds to a human-readable string.
* @param millis epoch time in millis
* @return a string for the CSV file.
*/
private static String stringify(long millis) {
return new Date(millis).toString();
}
/**
* This is the JVM entry point for the service launcher.
*
* Converts the arguments to a list, then invokes
* {@link #serviceMain(List, AbstractS3GuardDynamoDBDiagnostic)}.
* @param args command line arguments.
*/
public static void main(String[] args) {
try {
serviceMain(Arrays.asList(args), new DumpS3GuardDynamoTable());
} catch (ExitUtil.ExitException e) {
ExitUtil.terminate(e);
}
}
/**
* The real main function, which takes the arguments as a list.
* Argument 0 MUST be the service classname
* @param argsList the list of arguments
* @param service service to launch.
*/
static void serviceMain(
final List<String> argsList,
final AbstractS3GuardDynamoDBDiagnostic service) {
ServiceLauncher<Service> serviceLauncher =
new ServiceLauncher<>(service.getName());
ExitUtil.ExitException ex = serviceLauncher.launchService(
new Configuration(),
service,
argsList,
false,
true);
if (ex != null) {
throw ex;
}
}
/**
* Entry point to dump the metastore and s3 store world views
* <p>
* Both the FS and the store will be dumped: the store is scanned
* before and after the sequence to show what changes were made to
* the store during the list operation.
* @param fs fs to dump. If null a store must be provided.
* @param store store to dump (fallback to FS)
* @param conf configuration to use (fallback to fs)
* @param destFile base name of the output files.
* @param uri URI of store -only needed if FS is null.
* @throws ExitUtil.ExitException failure.
* @return the store
*/
public static DumpS3GuardDynamoTable dumpStore(
@Nullable final S3AFileSystem fs,
@Nullable DynamoDBMetadataStore store,
@Nullable Configuration conf,
final File destFile,
@Nullable URI uri) throws ExitUtil.ExitException {
ServiceLauncher<Service> serviceLauncher =
new ServiceLauncher<>(NAME);
if (conf == null) {
conf = checkNotNull(fs, "No filesystem").getConf();
}
if (store == null) {
store = (DynamoDBMetadataStore) checkNotNull(fs, "No filesystem")
.getMetadataStore();
}
DumpS3GuardDynamoTable dump = new DumpS3GuardDynamoTable(fs,
store,
destFile,
uri);
ExitUtil.ExitException ex = serviceLauncher.launchService(
conf,
dump,
Collections.emptyList(),
false,
true);
if (ex != null && ex.getExitCode() != 0) {
throw ex;
}
LOG.info("Results:");
Pair<Long, Long> r = dump.getScanEntryResult();
LOG.info("Metastore entries: {}", r);
LOG.info("Metastore scan total {}, entries {}, tombstones {}",
r.getLeft() + r.getRight(),
r.getLeft(),
r.getRight());
LOG.info("S3 count {}", dump.getRawObjectStoreCount());
LOG.info("Treewalk Count {}", dump.getTreewalkCount());
LOG.info("List Status Count {}", dump.getListStatusCount());
r = dump.getSecondScanResult();
if (r != null) {
LOG.info("Second metastore scan total {}, entries {}, tombstones {}",
r.getLeft() + r.getRight(),
r.getLeft(),
r.getRight());
}
return dump;
}
/**
* Writer for generating test CSV files.
*
* Quotes are managed by passing in a long whose specific bits control
* whether or not a column is quoted, bit 0 for column 0, etc.
*
* There is no escaping of values here.
*/
private static final class CsvFile implements Closeable {
/** constant to quote all columns. */
public static final long ALL_QUOTES = 0x7fffffff;
/** least significant bit is used for first column; 1 means 'quote'. */
public static final int ROW_QUOTE_MAP = 0b1110_1001_1111;
/** quote nothing: {@value}. */
public static final long NO_QUOTES = 0;
private final Path path;
private final PrintWriter out;
private final String separator;
private final String eol;
private final String quote;
/**
* Create.
* @param path filesystem path.
* @param out output writer.
* @param separator separator of entries.
* @param eol EOL marker.
* @param quote quote marker.
* @throws IOException failure.
*/
private CsvFile(
final Path path,
final PrintWriter out,
final String separator,
final String eol,
final String quote) throws IOException {
this.separator = checkNotNull(separator);
this.eol = checkNotNull(eol);
this.quote = checkNotNull(quote);
this.path = path;
this.out = checkNotNull(out);
header();
}
/**
* Create a writer to a file, with UTF-8 output and the standard
* options of the TSV file.
* @param file destination file.
* @throws IOException failure.
*/
private CsvFile(File file) throws IOException {
this(null,
new PrintWriter(file, "UTF-8"), "\t", "\n", "\"");
}
/**
* Close the file, if not already done.
* @throws IOException on a failure.
*/
@Override
public synchronized void close() throws IOException {
if (out != null) {
out.close();
}
}
public Path getPath() {
return path;
}
public String getSeparator() {
return separator;
}
public String getEol() {
return eol;
}
/**
* Write a row.
* Entries are quoted if the bit for that column is true.
* @param quotes quote policy: every bit defines the rule for that element
* @param columns columns to write
* @return self for ease of chaining.
*/
public CsvFile row(long quotes, Object... columns) {
checkNotNull(out);
for (int i = 0; i < columns.length; i++) {
if (i != 0) {
out.write(separator);
}
boolean toQuote = (quotes & 1) == 1;
// unsigned right shift to make next column flag @ position 0
quotes = quotes >>> 1;
if (toQuote) {
out.write(quote);
}
Object column = columns[i];
out.write(column != null ? column.toString() : "");
if (toQuote) {
out.write(quote);
}
}
out.write(eol);
return this;
}
/**
* Write a line.
* @param line line to print
* @return self for ease of chaining.
*/
public CsvFile line(String line) {
out.write(line);
out.write(eol);
return this;
}
/**
* Get the output stream.
* @return the stream.
*/
public PrintWriter getOut() {
return out;
}
/**
* Print the header.
*/
void header() {
row(CsvFile.ALL_QUOTES,
"type",
"deleted",
"path",
"is_auth_dir",
"is_empty_dir",
"len",
"updated",
"updated_s",
"last_modified",
"last_modified_s",
"etag",
"version");
}
/**
* Add a metadata entry.
* @param md metadata.
*/
void entry(DDBPathMetadata md) {
S3AFileStatus fileStatus = md.getFileStatus();
row(ROW_QUOTE_MAP,
fileStatus.isDirectory() ? "dir" : "file",
md.isDeleted(),
fileStatus.getPath().toString(),
md.isAuthoritativeDir(),
md.isEmptyDirectory().name(),
fileStatus.getLen(),
md.getLastUpdated(),
stringify(md.getLastUpdated()),
fileStatus.getModificationTime(),
stringify(fileStatus.getModificationTime()),
fileStatus.getETag(),
fileStatus.getVersionId());
}
/**
* filesystem entry: no metadata.
* @param fileStatus file status
*/
void entry(S3AFileStatus fileStatus) {
row(ROW_QUOTE_MAP,
fileStatus.isDirectory() ? "dir" : "file",
"false",
fileStatus.getPath().toString(),
"",
fileStatus.isEmptyDirectory().name(),
fileStatus.getLen(),
"",
"",
fileStatus.getModificationTime(),
stringify(fileStatus.getModificationTime()),
fileStatus.getETag(),
fileStatus.getVersionId());
}
}
}
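As a hedged usage sketch of the diagnostics entry point above: invoke dumpStore() against a guarded filesystem and let it derive the metastore and URI. The bucket URI and output prefix below are hypothetical.

import java.io.File;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.s3a.S3AFileSystem;
import org.apache.hadoop.fs.s3a.s3guard.DumpS3GuardDynamoTable;

public class DumpStoreExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // hypothetical bucket guarded by a DynamoDB metastore
    S3AFileSystem fs = (S3AFileSystem) new Path("s3a://example-bucket/")
        .getFileSystem(conf);
    // writes <prefix>-scan.csv, -store.csv, -tree.csv, -flat.csv,
    // -s3.csv and -scan-2.csv next to the given prefix
    DumpS3GuardDynamoTable.dumpStore(fs, null, conf,
        new File("/tmp/s3guard-dump"), null);
  }
}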

View File

@ -1,136 +0,0 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.hadoop.fs.s3a.s3guard;
import java.io.IOException;
import com.amazonaws.ClientConfiguration;
import com.amazonaws.auth.AWSCredentialsProvider;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import org.apache.hadoop.util.Preconditions;
import org.apache.commons.lang3.StringUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.conf.Configurable;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.s3a.Constants;
import org.apache.hadoop.fs.s3a.S3AUtils;
import static org.apache.hadoop.fs.s3a.Constants.S3GUARD_DDB_REGION_KEY;
/**
* Interface to create a DynamoDB client.
*
* Implementations are Configurable, so configuration can be set and retrieved.
*/
@InterfaceAudience.Private
public interface DynamoDBClientFactory extends Configurable {
Logger LOG = LoggerFactory.getLogger(DynamoDBClientFactory.class);
/**
* Create a DynamoDB client object from configuration.
*
* The DynamoDB client to create does not have to relate to any S3 buckets.
* All information needed to create a DynamoDB client is from the hadoop
* configuration. Specifically, if the region is not configured, it will use the
* provided region parameter. If the region is neither configured nor provided,
* an error is raised.
*
* @param defaultRegion the default region of the AmazonDynamoDB client
* @param bucket Optional bucket to use to look up per-bucket proxy secrets
* @param credentials credentials to use for authentication.
* @return a new DynamoDB client
* @throws IOException if any IO error happens
*/
AmazonDynamoDB createDynamoDBClient(final String defaultRegion,
final String bucket,
final AWSCredentialsProvider credentials) throws IOException;
/**
* The default implementation for creating an AmazonDynamoDB.
*/
class DefaultDynamoDBClientFactory extends Configured
implements DynamoDBClientFactory {
@Override
public AmazonDynamoDB createDynamoDBClient(String defaultRegion,
final String bucket,
final AWSCredentialsProvider credentials)
throws IOException {
Preconditions.checkNotNull(getConf(),
"Should have been configured before usage");
final Configuration conf = getConf();
final ClientConfiguration awsConf = S3AUtils
.createAwsConf(conf, bucket, Constants.AWS_SERVICE_IDENTIFIER_DDB);
final String region = getRegion(conf, defaultRegion);
LOG.debug("Creating DynamoDB client in region {}", region);
return AmazonDynamoDBClientBuilder.standard()
.withCredentials(credentials)
.withClientConfiguration(awsConf)
.withRegion(region)
.build();
}
/**
* Helper method to get and validate the AWS region for DynamoDBClient.
*
* @param conf configuration
* @param defaultRegion the default region
* @return configured region or else the provided default region
* @throws IOException if the region is not valid
*/
static String getRegion(Configuration conf, String defaultRegion)
throws IOException {
String region = conf.getTrimmed(S3GUARD_DDB_REGION_KEY);
if (StringUtils.isEmpty(region)) {
region = defaultRegion;
}
try {
Regions.fromName(region);
} catch (IllegalArgumentException | NullPointerException e) {
throw new IOException("Invalid region specified: " + region + "; " +
"Region can be configured with " + S3GUARD_DDB_REGION_KEY + ": " +
validRegionsString());
}
return region;
}
private static String validRegionsString() {
final String delimiter = ", ";
Regions[] regions = Regions.values();
StringBuilder sb = new StringBuilder();
for (int i = 0; i < regions.length; i++) {
if (i > 0) {
sb.append(delimiter);
}
sb.append(regions[i].getName());
}
return sb.toString();
}
}
}
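A hedged sketch of how such a factory would typically be resolved from configuration and used; the factory configuration key, region and bucket names here are assumptions for illustration, not taken from this patch.

import com.amazonaws.auth.AWSCredentialsProvider;
import com.amazonaws.auth.DefaultAWSCredentialsProviderChain;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.s3a.s3guard.DynamoDBClientFactory;
import org.apache.hadoop.util.ReflectionUtils;

public class ClientFactoryExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // assumed configuration key selecting the factory implementation
    Class<? extends DynamoDBClientFactory> cls = conf.getClass(
        "fs.s3a.s3guard.ddb.client.factory.impl",
        DynamoDBClientFactory.DefaultDynamoDBClientFactory.class,
        DynamoDBClientFactory.class);
    DynamoDBClientFactory factory = ReflectionUtils.newInstance(cls, conf);
    AWSCredentialsProvider credentials = new DefaultAWSCredentialsProviderChain();
    // illustrative default region and bucket
    AmazonDynamoDB ddb = factory.createDynamoDBClient(
        "eu-west-1", "example-bucket", credentials);
    System.out.println(ddb.listTables());
  }
}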

View File

@ -1,756 +0,0 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.hadoop.fs.s3a.s3guard;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InterruptedIOException;
import java.nio.file.AccessDeniedException;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import com.amazonaws.AmazonClientException;
import com.amazonaws.SdkBaseException;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.document.DynamoDB;
import com.amazonaws.services.dynamodbv2.document.Item;
import com.amazonaws.services.dynamodbv2.document.PrimaryKey;
import com.amazonaws.services.dynamodbv2.document.PutItemOutcome;
import com.amazonaws.services.dynamodbv2.document.Table;
import com.amazonaws.services.dynamodbv2.model.AmazonDynamoDBException;
import com.amazonaws.services.dynamodbv2.model.BillingMode;
import com.amazonaws.services.dynamodbv2.model.CreateTableRequest;
import com.amazonaws.services.dynamodbv2.model.ListTagsOfResourceRequest;
import com.amazonaws.services.dynamodbv2.model.ProvisionedThroughput;
import com.amazonaws.services.dynamodbv2.model.ProvisionedThroughputDescription;
import com.amazonaws.services.dynamodbv2.model.ResourceInUseException;
import com.amazonaws.services.dynamodbv2.model.ResourceNotFoundException;
import com.amazonaws.services.dynamodbv2.model.SSESpecification;
import com.amazonaws.services.dynamodbv2.model.ScanRequest;
import com.amazonaws.services.dynamodbv2.model.ScanResult;
import com.amazonaws.services.dynamodbv2.model.TableDescription;
import com.amazonaws.services.dynamodbv2.model.Tag;
import com.amazonaws.services.dynamodbv2.model.TagResourceRequest;
import com.amazonaws.waiters.WaiterTimedOutException;
import org.apache.hadoop.classification.VisibleForTesting;
import org.apache.hadoop.util.Preconditions;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.s3a.AWSClientIOException;
import org.apache.hadoop.fs.s3a.Invoker;
import org.apache.hadoop.fs.s3a.Retries;
import org.apache.hadoop.io.retry.RetryPolicies;
import org.apache.hadoop.io.retry.RetryPolicy;
import static java.lang.String.valueOf;
import static org.apache.commons.lang3.StringUtils.isEmpty;
import static org.apache.hadoop.fs.s3a.Constants.S3GUARD_DDB_TABLE_CAPACITY_READ_DEFAULT;
import static org.apache.hadoop.fs.s3a.Constants.S3GUARD_DDB_TABLE_CAPACITY_READ_KEY;
import static org.apache.hadoop.fs.s3a.Constants.S3GUARD_DDB_TABLE_CAPACITY_WRITE_DEFAULT;
import static org.apache.hadoop.fs.s3a.Constants.S3GUARD_DDB_TABLE_CAPACITY_WRITE_KEY;
import static org.apache.hadoop.fs.s3a.Constants.S3GUARD_DDB_TABLE_CREATE_KEY;
import static org.apache.hadoop.fs.s3a.Constants.S3GUARD_DDB_TABLE_SSE_CMK;
import static org.apache.hadoop.fs.s3a.Constants.S3GUARD_DDB_TABLE_SSE_ENABLED;
import static org.apache.hadoop.fs.s3a.Constants.S3GUARD_DDB_TABLE_TAG;
import static org.apache.hadoop.fs.s3a.S3AUtils.lookupPassword;
import static org.apache.hadoop.fs.s3a.S3AUtils.translateDynamoDBException;
import static org.apache.hadoop.fs.s3a.S3AUtils.translateException;
import static org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.E_ON_DEMAND_NO_SET_CAPACITY;
import static org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.VERSION;
import static org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.VERSION_MARKER_ITEM_NAME;
import static org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.VERSION_MARKER_TAG_NAME;
import static org.apache.hadoop.fs.s3a.s3guard.PathMetadataDynamoDBTranslation.attributeDefinitions;
import static org.apache.hadoop.fs.s3a.s3guard.PathMetadataDynamoDBTranslation.createVersionMarker;
import static org.apache.hadoop.fs.s3a.s3guard.PathMetadataDynamoDBTranslation.createVersionMarkerPrimaryKey;
import static org.apache.hadoop.fs.s3a.s3guard.PathMetadataDynamoDBTranslation.extractCreationTimeFromMarker;
import static org.apache.hadoop.fs.s3a.s3guard.PathMetadataDynamoDBTranslation.extractVersionFromMarker;
import static org.apache.hadoop.fs.s3a.s3guard.PathMetadataDynamoDBTranslation.keySchema;
/**
* Manages DynamoDB tables for the S3Guard DynamoDB-based metadata store.
* Factored out from DynamoDBMetadataStore.
*/
public class DynamoDBMetadataStoreTableManager {
public static final Logger LOG = LoggerFactory.getLogger(
DynamoDBMetadataStoreTableManager.class);
/** Error: version marker not found in table but the table is not empty. */
public static final String E_NO_VERSION_MARKER_AND_NOT_EMPTY
= "S3Guard table lacks version marker, and it is not empty.";
/** Error: version mismatch. */
public static final String E_INCOMPATIBLE_TAG_VERSION
= "Database table is from an incompatible S3Guard version based on table TAG.";
/** Error: version mismatch. */
public static final String E_INCOMPATIBLE_ITEM_VERSION
= "Database table is from an incompatible S3Guard version based on table ITEM.";
/** The AWS managed CMK for DynamoDB server side encryption. */
public static final String SSE_DEFAULT_MASTER_KEY = "alias/aws/dynamodb";
/** Invoker for IO. Until configured properly, use try-once. */
private Invoker invoker = new Invoker(RetryPolicies.TRY_ONCE_THEN_FAIL,
Invoker.NO_OP
);
final private AmazonDynamoDB amazonDynamoDB;
final private DynamoDB dynamoDB;
final private String tableName;
final private String region;
final private Configuration conf;
final private Invoker readOp;
final private RetryPolicy batchWriteRetryPolicy;
private Table table;
private String tableArn;
public DynamoDBMetadataStoreTableManager(DynamoDB dynamoDB,
String tableName,
String region,
AmazonDynamoDB amazonDynamoDB,
Configuration conf,
Invoker readOp,
RetryPolicy batchWriteCapacityExceededEvents) {
this.dynamoDB = dynamoDB;
this.amazonDynamoDB = amazonDynamoDB;
this.tableName = tableName;
this.region = region;
this.conf = conf;
this.readOp = readOp;
this.batchWriteRetryPolicy = batchWriteCapacityExceededEvents;
}
/**
* Create a table if it does not exist and wait for it to become active.
*
* If a table with the intended name already exists, then it uses that table.
* Otherwise, it will automatically create the table if the config
* {@link org.apache.hadoop.fs.s3a.Constants#S3GUARD_DDB_TABLE_CREATE_KEY} is
* enabled. The DynamoDB table creation API is asynchronous. This method waits
* for the table to become active after sending the creation request, so
* overall, this method is synchronous, and the table is guaranteed to exist
* after this method returns successfully.
*
* The wait for a table becoming active is Retry+Translated; it can fail
* while a table is not yet ready.
*
* @throws IOException if table does not exist and auto-creation is disabled;
* or table is being deleted, or any other I/O exception occurred.
*/
@VisibleForTesting
@Retries.RetryTranslated
Table initTable() throws IOException {
table = dynamoDB.getTable(tableName);
try {
try {
LOG.debug("Binding to table {}", tableName);
TableDescription description = table.describe();
LOG.debug("Table state: {}", description);
tableArn = description.getTableArn();
final String status = description.getTableStatus();
switch (status) {
case "CREATING":
LOG.debug("Table {} in region {} is being created/updated. This may"
+ " indicate that the table is being operated by another "
+ "concurrent thread or process. Waiting for active...",
tableName, region);
waitForTableActive(table);
break;
case "DELETING":
throw new FileNotFoundException("DynamoDB table "
+ "'" + tableName + "' is being "
+ "deleted in region " + region);
case "UPDATING":
// table being updated; it can still be used.
LOG.debug("Table is being updated.");
break;
case "ACTIVE":
break;
default:
throw new IOException("Unknown DynamoDB table status " + status
+ ": tableName='" + tableName + "', region=" + region);
}
verifyVersionCompatibility();
final Item versionMarker = getVersionMarkerItem();
Long created = extractCreationTimeFromMarker(versionMarker);
LOG.debug("Using existing DynamoDB table {} in region {} created {}",
tableName, region, (created != null) ? new Date(created) : null);
} catch (ResourceNotFoundException rnfe) {
if (conf.getBoolean(S3GUARD_DDB_TABLE_CREATE_KEY, false)) {
long readCapacity = conf.getLong(S3GUARD_DDB_TABLE_CAPACITY_READ_KEY,
S3GUARD_DDB_TABLE_CAPACITY_READ_DEFAULT);
long writeCapacity = conf.getLong(
S3GUARD_DDB_TABLE_CAPACITY_WRITE_KEY,
S3GUARD_DDB_TABLE_CAPACITY_WRITE_DEFAULT);
ProvisionedThroughput capacity;
if (readCapacity > 0 && writeCapacity > 0) {
capacity = new ProvisionedThroughput(
readCapacity,
writeCapacity);
} else {
// at least one capacity value is <= 0
// verify they are both exactly zero
Preconditions.checkArgument(
readCapacity == 0 && writeCapacity == 0,
"S3Guard table read capacity %d and and write capacity %d"
+ " are inconsistent", readCapacity, writeCapacity);
// and set the capacity to null for per-request billing.
capacity = null;
}
createTable(capacity);
} else {
throw (FileNotFoundException) new FileNotFoundException(
"DynamoDB table '" + tableName + "' does not "
+ "exist in region " + region +
"; auto-creation is turned off")
.initCause(rnfe);
}
}
} catch (AmazonClientException e) {
throw translateException("initTable", tableName, e);
}
return table;
}
protected void tagTableWithVersionMarker() throws AmazonDynamoDBException {
try {
TagResourceRequest tagResourceRequest = new TagResourceRequest()
.withResourceArn(table.getDescription().getTableArn())
.withTags(newVersionMarkerTag());
amazonDynamoDB.tagResource(tagResourceRequest);
} catch (AmazonDynamoDBException e) {
LOG.debug("Exception during tagging table: {}", e.getMessage(), e);
}
}
protected static Item getVersionMarkerFromTags(Table table,
AmazonDynamoDB addb) throws IOException {
List<Tag> tags = null;
try {
final TableDescription description = table.describe();
ListTagsOfResourceRequest listTagsOfResourceRequest =
new ListTagsOfResourceRequest()
.withResourceArn(description.getTableArn());
tags = addb.listTagsOfResource(listTagsOfResourceRequest).getTags();
} catch (ResourceNotFoundException e) {
LOG.error("Table: {} not found.", table.getTableName());
throw e;
} catch (AmazonDynamoDBException e) {
LOG.debug("Exception while getting tags from the dynamo table: {}",
e.getMessage(), e);
throw translateDynamoDBException(table.getTableName(),
"Retrieving tags.", e);
}
if (tags == null) {
return null;
}
final Optional<Tag> first = tags.stream()
.filter(tag -> tag.getKey().equals(VERSION_MARKER_TAG_NAME)).findFirst();
if (first.isPresent()) {
final Tag vmTag = first.get();
return createVersionMarker(
vmTag.getKey(), Integer.parseInt(vmTag.getValue()), 0
);
} else {
return null;
}
}
/**
* Create a table, wait for it to become active, then add the version
* marker.
* Creating and setting up the table isn't wrapped by any retry operations;
* the wait for a table to become available is RetryTranslated.
* The tags are added to the table during creation, not after creation.
* We can assume that tagging and creating the table is a single atomic
* operation.
*
* @param capacity capacity to provision. If null: create a per-request
* table.
* @throws IOException on any failure.
* @throws InterruptedIOException if the wait was interrupted
*/
@Retries.OnceMixed
private void createTable(ProvisionedThroughput capacity) throws IOException {
try {
String mode;
CreateTableRequest request = new CreateTableRequest()
.withTableName(tableName)
.withKeySchema(keySchema())
.withAttributeDefinitions(attributeDefinitions())
.withSSESpecification(getSseSpecFromConfig())
.withTags(getTableTagsFromConfig());
if (capacity != null) {
mode = String.format("with provisioned read capacity %d and"
+ " write capacity %s",
capacity.getReadCapacityUnits(), capacity.getWriteCapacityUnits());
request.withProvisionedThroughput(capacity);
} else {
mode = "with pay-per-request billing";
request.withBillingMode(BillingMode.PAY_PER_REQUEST);
}
LOG.info("Creating non-existent DynamoDB table {} in region {} {}",
tableName, region, mode);
table = dynamoDB.createTable(request);
LOG.debug("Awaiting table becoming active");
} catch (ResourceInUseException e) {
LOG.warn("ResourceInUseException while creating DynamoDB table {} "
+ "in region {}. This may indicate that the table was "
+ "created by another concurrent thread or process.",
tableName, region);
}
waitForTableActive(table);
putVersionMarkerItemToTable();
}
/**
* Get DynamoDB table server side encryption (SSE) settings from configuration.
*/
private SSESpecification getSseSpecFromConfig() {
final SSESpecification sseSpecification = new SSESpecification();
boolean enabled = conf.getBoolean(S3GUARD_DDB_TABLE_SSE_ENABLED, false);
if (!enabled) {
// Do not set other options if SSE is disabled. Otherwise it will throw
// ValidationException.
return sseSpecification;
}
sseSpecification.setEnabled(Boolean.TRUE);
String cmk = null;
try {
// Get DynamoDB table SSE CMK from a configuration/credential provider.
cmk = lookupPassword("", conf, S3GUARD_DDB_TABLE_SSE_CMK);
} catch (IOException e) {
LOG.error("Cannot retrieve " + S3GUARD_DDB_TABLE_SSE_CMK, e);
}
if (isEmpty(cmk)) {
// Using Amazon managed default master key for DynamoDB table
return sseSpecification;
}
if (SSE_DEFAULT_MASTER_KEY.equals(cmk)) {
LOG.warn("Ignoring default DynamoDB table KMS Master Key {}",
SSE_DEFAULT_MASTER_KEY);
} else {
sseSpecification.setSSEType("KMS");
sseSpecification.setKMSMasterKeyId(cmk);
}
return sseSpecification;
}
/**
* Return tags from configuration and the version marker for adding to
* dynamo table during creation.
*/
@Retries.OnceRaw
public List<Tag> getTableTagsFromConfig() {
List<Tag> tags = new ArrayList<>();
// from configuration
Map<String, String> tagProperties =
conf.getPropsWithPrefix(S3GUARD_DDB_TABLE_TAG);
for (Map.Entry<String, String> tagMapEntry : tagProperties.entrySet()) {
Tag tag = new Tag().withKey(tagMapEntry.getKey())
.withValue(tagMapEntry.getValue());
tags.add(tag);
}
// add the version marker
tags.add(newVersionMarkerTag());
return tags;
}
/**
* Create a new version marker tag.
* @return a new version marker tag
*/
private static Tag newVersionMarkerTag() {
return new Tag().withKey(VERSION_MARKER_TAG_NAME).withValue(valueOf(VERSION));
}
/**
* Verify that a table version is compatible with this S3Guard client.
*
* Checks for consistency between the version marker as the item and tag.
*
* <pre>
* 1. If the table lacks both version markers AND it's empty,
* both markers will be added.
* If the table is not empty the check throws IOException
* 2. If there's no version marker ITEM, the compatibility with the TAG
* will be checked, and the version marker ITEM will be added if the
* TAG version is compatible.
* If the TAG version is not compatible, the check throws IOException
* 3. If there's no version marker TAG, the compatibility with the ITEM
* version marker will be checked, and the version marker ITEM will be
* added if the ITEM version is compatible.
* If the ITEM version is not compatible, the check throws IOException
* 4. If the TAG and ITEM versions are both present then both will be checked
* for compatibility. If the ITEM or TAG version marker is not compatible,
* the check throws IOException
* </pre>
*
* @throws IOException on any incompatibility
*/
@VisibleForTesting
protected void verifyVersionCompatibility() throws IOException {
final Item versionMarkerItem = getVersionMarkerItem();
Item versionMarkerFromTag = null;
boolean canReadDdbTags = true;
try {
versionMarkerFromTag = getVersionMarkerFromTags(table, amazonDynamoDB);
} catch (AccessDeniedException e) {
LOG.debug("Can not read tags of table.");
canReadDdbTags = false;
}
LOG.debug("versionMarkerItem: {}; versionMarkerFromTag: {}",
versionMarkerItem, versionMarkerFromTag);
if (versionMarkerItem == null && versionMarkerFromTag == null) {
if (!isEmptyTable(tableName, amazonDynamoDB)) {
LOG.error("Table is not empty but missing the version maker. Failing.");
throw new IOException(E_NO_VERSION_MARKER_AND_NOT_EMPTY
+ " Table: " + tableName);
}
if (canReadDdbTags) {
LOG.info("Table {} contains no version marker item and tag. " +
"The table is empty, so the version marker will be added " +
"as TAG and ITEM.", tableName);
putVersionMarkerItemToTable();
tagTableWithVersionMarker();
}
if (!canReadDdbTags) {
LOG.info("Table {} contains no version marker item and the tags are not readable. " +
"The table is empty, so the ITEM version marker will be added .", tableName);
putVersionMarkerItemToTable();
}
}
if (versionMarkerItem == null && versionMarkerFromTag != null) {
final int tagVersionMarker =
extractVersionFromMarker(versionMarkerFromTag);
throwExceptionOnVersionMismatch(tagVersionMarker, tableName,
E_INCOMPATIBLE_TAG_VERSION);
LOG.info("Table {} contains no version marker ITEM but contains " +
"compatible version marker TAG. Restoring the version marker " +
"item from tag.", tableName);
putVersionMarkerItemToTable();
}
if (versionMarkerItem != null && versionMarkerFromTag == null
&& canReadDdbTags) {
final int itemVersionMarker =
extractVersionFromMarker(versionMarkerItem);
throwExceptionOnVersionMismatch(itemVersionMarker, tableName,
E_INCOMPATIBLE_ITEM_VERSION);
LOG.info("Table {} contains no version marker TAG but contains " +
"compatible version marker ITEM. Restoring the version marker " +
"item from item.", tableName);
tagTableWithVersionMarker();
}
if (versionMarkerItem != null && versionMarkerFromTag != null) {
final int tagVersionMarker =
extractVersionFromMarker(versionMarkerFromTag);
final int itemVersionMarker =
extractVersionFromMarker(versionMarkerItem);
throwExceptionOnVersionMismatch(tagVersionMarker, tableName,
E_INCOMPATIBLE_TAG_VERSION);
throwExceptionOnVersionMismatch(itemVersionMarker, tableName,
E_INCOMPATIBLE_ITEM_VERSION);
LOG.debug("Table {} contains correct version marker TAG and ITEM.",
tableName);
}
}
private static boolean isEmptyTable(String tableName, AmazonDynamoDB aadb) {
final ScanRequest req = new ScanRequest().withTableName(
tableName).withLimit(1);
final ScanResult result = aadb.scan(req);
return result.getCount() == 0;
}
private static void throwExceptionOnVersionMismatch(int actual,
String tableName,
String exMsg) throws IOException {
if (VERSION != actual) {
throw new IOException(exMsg + " Table " + tableName
+ " Expected version: " + VERSION + " actual tag version: " +
actual);
}
}
/**
* Add version marker to the dynamo table.
*/
@Retries.OnceRaw
private void putVersionMarkerItemToTable() {
final Item marker = createVersionMarker(VERSION_MARKER_ITEM_NAME, VERSION,
System.currentTimeMillis());
putItem(marker);
}
/**
* Wait for table being active.
* @param t table to block on.
* @throws IOException IO problems
* @throws InterruptedIOException if the wait was interrupted
* @throws IllegalArgumentException if an exception was raised in the waiter
*/
@Retries.RetryTranslated
private void waitForTableActive(Table t) throws IOException {
invoker.retry("Waiting for active state of table " + tableName,
null,
true,
() -> {
try {
t.waitForActive();
} catch (IllegalArgumentException ex) {
throw translateTableWaitFailure(tableName, ex);
} catch (InterruptedException e) {
LOG.warn("Interrupted while waiting for table {} in region {}"
+ " active",
tableName, region, e);
Thread.currentThread().interrupt();
throw (InterruptedIOException)
new InterruptedIOException("DynamoDB table '"
+ tableName + "' is not active yet in region " + region)
.initCause(e);
}
});
}
/**
* Handle a table wait failure by extracting any inner cause and
* converting it, or, if it cannot be converted, wrapping
* the IllegalArgumentException in an IOE.
*
* @param name name of the table
* @param e exception
* @return an IOE to raise.
*/
@VisibleForTesting
static IOException translateTableWaitFailure(
final String name, IllegalArgumentException e) {
final SdkBaseException ex = extractInnerException(e);
if (ex != null) {
if (ex instanceof WaiterTimedOutException) {
// a timeout waiting for state change: extract the
// message from the outer exception, but translate
// the inner one for the throttle policy.
return new AWSClientIOException(e.getMessage(), ex);
} else {
return translateException(e.getMessage(), name, ex);
}
} else {
return new IOException(e);
}
}
/**
* Take an {@code IllegalArgumentException} raised by a DDB operation
* and if it contains an inner SDK exception, unwrap it.
* @param ex exception.
* @return the inner AWS exception or null.
*/
public static SdkBaseException extractInnerException(
IllegalArgumentException ex) {
if (ex.getCause() instanceof SdkBaseException) {
return (SdkBaseException) ex.getCause();
} else {
return null;
}
}
/**
* Get the version mark item in the existing DynamoDB table.
*
* As the version marker item may be created by another concurrent thread or
* process, we sleep and retry a limited number of times if the lookup returns
* with a null value.
* DDB throttling is always retried.
*/
@VisibleForTesting
@Retries.RetryTranslated
protected Item getVersionMarkerItem() throws IOException {
final PrimaryKey versionMarkerKey =
createVersionMarkerPrimaryKey(VERSION_MARKER_ITEM_NAME);
int retryCount = 0;
// look for a version marker, with usual throttling/failure retries.
Item versionMarker = queryVersionMarker(versionMarkerKey);
while (versionMarker == null) {
// The marker was null.
// Two possibilities
// 1. This isn't a S3Guard table.
// 2. This is a S3Guard table under construction; another thread/process
// is about to write/actively writing the version marker.
// So that state #2 is handled, batchWriteRetryPolicy is used to manage
// retries.
// This will mean that if the cause is actually #1, failure will not
// be immediate. As this will ultimately result in a failure to
// init S3Guard and the S3A FS, this isn't going to be a performance
// bottleneck -simply a slightly slower failure report than would otherwise
// be seen.
// "if your settings are broken, performance is not your main issue"
try {
RetryPolicy.RetryAction action = batchWriteRetryPolicy.shouldRetry(null,
retryCount, 0, true);
if (action.action == RetryPolicy.RetryAction.RetryDecision.FAIL) {
break;
} else {
LOG.warn("No version marker found in the DynamoDB table: {}. " +
"Sleeping {} ms before next retry", tableName, action.delayMillis);
Thread.sleep(action.delayMillis);
}
} catch (Exception e) {
throw new IOException("initTable: Unexpected exception " + e, e);
}
retryCount++;
versionMarker = queryVersionMarker(versionMarkerKey);
}
return versionMarker;
}
/**
* Issue the query to get the version marker, with throttling for overloaded
* DDB tables.
* @param versionMarkerKey key to look up
* @return the marker
* @throws IOException failure
*/
@Retries.RetryTranslated
private Item queryVersionMarker(final PrimaryKey versionMarkerKey)
throws IOException {
return readOp.retry("getVersionMarkerItem",
VERSION_MARKER_ITEM_NAME, true,
() -> table.getItem(versionMarkerKey));
}
/**
* PUT a single item to the table.
* @param item item to put
* @return the outcome.
*/
@Retries.OnceRaw
private PutItemOutcome putItem(Item item) {
LOG.debug("Putting item {}", item);
return table.putItem(item);
}
/**
* Provision the table with given read and write capacity units.
* Call will fail if the table is busy, or the new values match the current
* ones.
* <p>
* Until the AWS SDK lets us switch a table to on-demand, an attempt to
* set the I/O capacity to zero will fail.
* @param readCapacity read units: must be greater than zero
* @param writeCapacity write units: must be greater than zero
* @throws IOException on a failure
*/
@Retries.RetryTranslated
void provisionTable(Long readCapacity, Long writeCapacity)
throws IOException {
if (readCapacity == 0 || writeCapacity == 0) {
// table is pay on demand
throw new IOException(E_ON_DEMAND_NO_SET_CAPACITY);
}
final ProvisionedThroughput toProvision = new ProvisionedThroughput()
.withReadCapacityUnits(readCapacity)
.withWriteCapacityUnits(writeCapacity);
invoker.retry("ProvisionTable", tableName, true,
() -> {
final ProvisionedThroughputDescription p =
table.updateTable(toProvision).getProvisionedThroughput();
LOG.info("Provision table {} in region {}: readCapacityUnits={}, "
+ "writeCapacityUnits={}",
tableName, region, p.getReadCapacityUnits(),
p.getWriteCapacityUnits());
});
}
@Retries.RetryTranslated
public void destroy() throws IOException {
if (table == null) {
LOG.info("In destroy(): no table to delete");
return;
}
LOG.info("Deleting DynamoDB table {} in region {}", tableName, region);
Preconditions.checkNotNull(dynamoDB, "Not connected to DynamoDB");
try {
invoker.retry("delete", null, true,
() -> table.delete());
table.waitForDelete();
} catch (IllegalArgumentException ex) {
throw new TableDeleteTimeoutException(tableName,
"Timeout waiting for the table " + getTableArn()
+ " to be deleted", ex);
} catch (FileNotFoundException rnfe) {
LOG.info("FileNotFoundException while deleting DynamoDB table {} in "
+ "region {}. This may indicate that the table does not exist, "
+ "or has been deleted by another concurrent thread or process.",
tableName, region);
} catch (InterruptedException ie) {
Thread.currentThread().interrupt();
LOG.warn("Interrupted while waiting for DynamoDB table {} being deleted",
tableName, ie);
throw new InterruptedIOException("Table " + tableName
+ " in region " + region + " has not been deleted");
}
}
@Retries.RetryTranslated
@VisibleForTesting
void provisionTableBlocking(Long readCapacity, Long writeCapacity)
throws IOException {
provisionTable(readCapacity, writeCapacity);
waitForTableActive(table);
}
public Table getTable() {
return table;
}
public String getTableArn() {
return tableArn;
}
}
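The version-compatibility rule enforced by verifyVersionCompatibility() boils down to a single check per marker, sketched below; the VERSION value and messages are stand-ins, not the real constants.

import java.io.IOException;

final class VersionCheckSketch {
  /** Stand-in for the client's schema version constant. */
  static final int VERSION = 100;

  /** Fail unless the marker (item or tag) carries exactly the client version. */
  static void checkCompatible(Integer markerVersion, String tableName)
      throws IOException {
    if (markerVersion == null) {
      throw new IOException("S3Guard table lacks version marker: " + tableName);
    }
    if (markerVersion != VERSION) {
      throw new IOException("Incompatible S3Guard version in table " + tableName
          + ": expected " + VERSION + ", found " + markerVersion);
    }
  }
}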

View File

@ -1,39 +0,0 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.hadoop.fs.s3a.s3guard;
/**
* Expirable Metadata abstract class is for storing the field needed for
* metadata classes in S3Guard that could be expired with TTL.
*/
public abstract class ExpirableMetadata {
private long lastUpdated = 0;
public long getLastUpdated() {
return lastUpdated;
}
public void setLastUpdated(long lastUpdated) {
this.lastUpdated = lastUpdated;
}
public boolean isExpired(long ttl, long now) {
return (lastUpdated + ttl) <= now;
}
}
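A worked example of the expiry rule above, with arbitrary timestamps.

public class ExpiryExample {
  public static void main(String[] args) {
    long ttl = 15 * 60 * 1000L;             // e.g. a 15 minute TTL
    long lastUpdated = 1_600_000_000_000L;  // arbitrary epoch timestamp, ms
    long now = lastUpdated + ttl;           // exactly at the boundary
    // matches isExpired(): the boundary itself counts as expired
    System.out.println((lastUpdated + ttl) <= now);  // true
  }
}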

View File

@ -1,47 +0,0 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.hadoop.fs.s3a.s3guard;
/**
* This interface is defined for handling TTL expiry of metadata in S3Guard.
*
* TTL can be tested by implementing this interface and setting it as
* {@code S3Guard.ttlTimeProvider}. By doing this, getNow() can return any
* preferred value and flaky tests can be avoided. By default getNow()
* returns the current time in milliseconds since the epoch.
*
* Time is measured in milliseconds.
*/
public interface ITtlTimeProvider {
/**
* The current time in milliseconds.
* Assuming this calls System.currentTimeMillis(), this is a native IO call
* and so should be invoked sparingly (i.e. evaluate before any loop, rather
* than inside).
* @return the current time.
*/
long getNow();
/**
* The TTL of the metadata.
* @return time in millis after which metadata is considered out of date.
*/
long getMetadataTtl();
}
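As the javadoc suggests, a fixed clock makes TTL behaviour deterministic in tests; the implementation below is a hypothetical sketch, not part of the removed code.

import org.apache.hadoop.fs.s3a.s3guard.ITtlTimeProvider;

/** Hypothetical fixed-clock provider for deterministic TTL tests. */
class FixedTtlTimeProvider implements ITtlTimeProvider {
  private final long now;
  private final long metadataTtl;

  FixedTtlTimeProvider(long now, long metadataTtl) {
    this.now = now;
    this.metadataTtl = metadataTtl;
  }

  @Override
  public long getNow() {
    return now;
  }

  @Override
  public long getMetadataTtl() {
    return metadataTtl;
  }
}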

View File

@ -1,272 +0,0 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.hadoop.fs.s3a.s3guard;
import javax.annotation.Nullable;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;
import org.apache.hadoop.util.Preconditions;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;
import org.apache.hadoop.fs.s3a.S3AFileStatus;
import org.apache.hadoop.fs.s3a.S3AFileSystem;
import org.apache.hadoop.fs.s3a.S3ALocatedFileStatus;
import org.apache.hadoop.fs.s3a.impl.ExecutingStoreOperation;
import org.apache.hadoop.util.DurationInfo;
/**
* Import a directory tree into the metastore.
* This code was moved from S3GuardTool and enhanced to mark
* the destination tree as authoritative.
*/
class ImportOperation extends ExecutingStoreOperation<Long> {
private static final Logger LOG = LoggerFactory.getLogger(
ImportOperation.class);
/**
* Source file system: must not be guarded.
*/
private final S3AFileSystem filesystem;
/**
* Destination metadata store.
*/
private final MetadataStore store;
/**
* Source entry: File or directory.
*/
private final S3AFileStatus status;
/**
* If importing the directory tree -should it be marked
* authoritative afterwards?
*/
private final boolean authoritative;
private final boolean verbose;
/**
* For DDB the BulkOperation tracking eliminates the need for this cache,
* but it is retained here for the local store and to allow for
* ease of moving to operations which may update the store in parallel with
* writing.
*/
private final Set<Path> dirCache = new HashSet<>();
/**
* Import.
* @param filesystem Unguarded FS to scan.
* @param store store to update
* @param status source status
* @param authoritative should the imported tree be marked as authoritative
* @param verbose Verbose output
*/
ImportOperation(final S3AFileSystem filesystem,
final MetadataStore store,
final S3AFileStatus status,
final boolean authoritative,
final boolean verbose) {
super(filesystem.createStoreContext());
this.verbose = verbose;
Preconditions.checkState(!filesystem.hasMetadataStore(),
"Source filesystem for import has a metadata store");
this.filesystem = filesystem;
this.store = store;
this.status = status;
this.authoritative = authoritative;
}
private S3AFileSystem getFilesystem() {
return filesystem;
}
private MetadataStore getStore() {
return store;
}
private FileStatus getStatus() {
return status;
}
@Override
public Long execute() throws IOException {
final long items;
if (status.isFile()) {
PathMetadata meta = new PathMetadata(status);
getStore().put(meta, null);
items = 1;
} else {
try (DurationInfo ignored =
new DurationInfo(LOG, "Importing %s", getStatus().getPath())) {
items = importDir();
}
}
return items;
}
/**
* Recursively import every path under path.
* @return number of items inserted into MetadataStore
* @throws IOException on I/O errors.
*/
private long importDir() throws IOException {
Preconditions.checkArgument(status.isDirectory());
long totalCountOfEntriesWritten = 0;
final Path basePath = status.getPath();
final MetadataStore ms = getStore();
LOG.info("Importing directory {}", basePath);
try (BulkOperationState operationState = ms
.initiateBulkWrite(
BulkOperationState.OperationType.Import,
basePath)) {
long countOfFilesWritten = 0;
long countOfDirsWritten = 0;
RemoteIterator<S3ALocatedFileStatus> it = getFilesystem()
.listFilesAndEmptyDirectoriesForceNonAuth(basePath, true);
while (it.hasNext()) {
S3ALocatedFileStatus located = it.next();
S3AFileStatus child;
final Path path = located.getPath();
final boolean isDirectory = located.isDirectory();
if (isDirectory) {
child = DynamoDBMetadataStore.makeDirStatus(path,
located.getOwner());
dirCache.add(path);
// and update the dir count
countOfDirsWritten++;
} else {
child = located.toS3AFileStatus();
}
int parentsWritten = putParentsIfNotPresent(child, operationState);
LOG.debug("Wrote {} parent entries", parentsWritten);
// We don't blindly overwrite any existing file entry in S3Guard with a
// new one, because that may lose the version information.
// Instead we merge them.
if (!isDirectory) {
final PathMetadata existingEntry = S3Guard.getWithTtl(ms, path, null,
false, true);
if (existingEntry != null) {
final S3AFileStatus existingStatus = existingEntry.getFileStatus();
if (existingStatus.isFile()) {
// source is also a file.
// we only worry about an update if the timestamp is different,
final String existingEtag = existingStatus.getETag();
final String childEtag = child.getETag();
if (child.getModificationTime()
!= existingStatus.getModificationTime()
|| existingStatus.getLen() != child.getLen()
|| existingEtag == null
|| !existingEtag.equals(childEtag)) {
// files are potentially different, though a modtime change
// can just be a clock skew problem
// so if the etag is unchanged, we propagate any versionID
if (childEtag.equals(existingEtag)) {
// copy over any version ID.
child.setVersionId(existingStatus.getVersionId());
}
} else {
// modtime, length and etag all match; no update needed
child = null;
}
}
}
if (child != null) {
countOfFilesWritten++;
}
}
if (child != null) {
// there's an entry to add.
// log entry spaced to same width
String t = isDirectory ? "Dir " : "File";
if (verbose) {
LOG.info("{} {}", t, path);
} else {
LOG.debug("{} {}", t, path);
}
S3Guard.putWithTtl(
ms,
new PathMetadata(child),
getFilesystem().getTtlTimeProvider(),
operationState);
totalCountOfEntriesWritten++;
}
}
LOG.info("Updated S3Guard with {} files and {} directory entries",
countOfFilesWritten, countOfDirsWritten);
// here all entries are imported.
// tell the store that everything should be marked as auth
if (authoritative) {
LOG.info("Marking directory tree {} as authoritative",
basePath);
ms.markAsAuthoritative(basePath, operationState);
}
}
return totalCountOfEntriesWritten;
}
/**
* Put parents into metastore and cache if the parents are not present.
*
* There's duplication here with S3Guard DDB ancestor state, but this
* is designed to work across implementations.
* @param fileStatus the file or an empty directory.
* @param operationState store's bulk update state.
* @return number of entries written.
* @throws IOException on I/O errors.
*/
private int putParentsIfNotPresent(FileStatus fileStatus,
@Nullable BulkOperationState operationState) throws IOException {
Preconditions.checkNotNull(fileStatus);
Path parent = fileStatus.getPath().getParent();
int count = 0;
while (parent != null) {
if (dirCache.contains(parent)) {
return count;
}
final ITtlTimeProvider timeProvider
= getFilesystem().getTtlTimeProvider();
final PathMetadata pmd = S3Guard.getWithTtl(getStore(), parent,
timeProvider, false, true);
if (pmd == null || pmd.isDeleted()) {
S3AFileStatus dir = DynamoDBMetadataStore.makeDirStatus(parent,
fileStatus.getOwner());
S3Guard.putWithTtl(getStore(), new PathMetadata(dir),
timeProvider,
operationState);
count++;
}
dirCache.add(parent);
parent = parent.getParent();
}
return count;
}
}
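For orientation, here is a minimal usage sketch. The class ImportSketch and its method are hypothetical helpers invented for illustration (they are not part of this change); they assume the same package so the package-private ImportOperation constructor is visible.

package org.apache.hadoop.fs.s3a.s3guard;

import java.io.IOException;

import org.apache.hadoop.fs.s3a.S3AFileStatus;
import org.apache.hadoop.fs.s3a.S3AFileSystem;

/** Illustrative sketch only; not part of the original patch. */
final class ImportSketch {

  private ImportSketch() {
  }

  /**
   * Import the tree rooted at {@code sourceStatus} into {@code store},
   * marking it authoritative afterwards.
   * @return number of entries written to the metadata store.
   */
  static long importTree(
      final S3AFileSystem unguardedFs,
      final MetadataStore store,
      final S3AFileStatus sourceStatus) throws IOException {
    // The ImportOperation constructor rejects a source filesystem
    // which has a metadata store of its own.
    ImportOperation importer = new ImportOperation(
        unguardedFs, store, sourceStatus,
        true,   // mark the imported tree as authoritative
        false); // non-verbose logging
    return importer.execute();
  }
}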

View File

@ -1,84 +0,0 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.hadoop.fs.s3a.s3guard;
import javax.annotation.Nullable;
/**
* LocalMetadataEntry is used to store entries in the cache of
* LocalMetadataStore. Either pathMetadata or dirListingMetadata may be null.
* The entry is mutable.
*/
public final class LocalMetadataEntry {
@Nullable
private PathMetadata pathMetadata;
@Nullable
private DirListingMetadata dirListingMetadata;
LocalMetadataEntry() {
}
LocalMetadataEntry(PathMetadata pmd){
pathMetadata = pmd;
dirListingMetadata = null;
}
LocalMetadataEntry(DirListingMetadata dlm){
pathMetadata = null;
dirListingMetadata = dlm;
}
public PathMetadata getFileMeta() {
return pathMetadata;
}
public DirListingMetadata getDirListingMeta() {
return dirListingMetadata;
}
public boolean hasPathMeta() {
return this.pathMetadata != null;
}
public boolean hasDirMeta() {
return this.dirListingMetadata != null;
}
public void setPathMetadata(PathMetadata pathMetadata) {
this.pathMetadata = pathMetadata;
}
public void setDirListingMetadata(DirListingMetadata dirListingMetadata) {
this.dirListingMetadata = dirListingMetadata;
}
@Override public String toString() {
StringBuilder sb = new StringBuilder();
sb.append("LocalMetadataEntry{");
if(pathMetadata != null) {
sb.append("pathMetadata=" + pathMetadata.getFileStatus().getPath());
}
if(dirListingMetadata != null){
sb.append("; dirListingMetadata=" + dirListingMetadata.getPath());
}
sb.append("}");
return sb.toString();
}
}

View File

@ -1,651 +0,0 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.hadoop.fs.s3a.s3guard;
import javax.annotation.Nullable;
import org.apache.hadoop.classification.VisibleForTesting;
import org.apache.hadoop.util.Preconditions;
import org.apache.hadoop.thirdparty.com.google.common.cache.Cache;
import org.apache.hadoop.thirdparty.com.google.common.cache.CacheBuilder;
import org.apache.commons.lang3.StringUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.s3a.S3AFileStatus;
import org.apache.hadoop.fs.s3a.Tristate;
import org.apache.hadoop.fs.s3a.impl.StoreContext;
import org.apache.hadoop.security.UserGroupInformation;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.io.IOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import static org.apache.hadoop.fs.s3a.Constants.*;
/**
* This is a local, in-memory implementation of MetadataStore.
* This is <i>not</i> a coherent cache across processes. It is only
* locally-coherent.
*
* The purpose of this is for unit and integration testing.
* It could also be used to accelerate local-only operations where only one
* process is operating on a given object store, or multiple processes are
* accessing a read-only storage bucket.
*
* This MetadataStore does not enforce filesystem rules such as disallowing
* non-recursive removal of non-empty directories. It is assumed the caller
* already performs these sorts of checks.
*
* Contains one cache internally with time-based eviction.
* A brief usage sketch follows the class below.
*/
public class LocalMetadataStore implements MetadataStore {
public static final Logger LOG = LoggerFactory.getLogger(MetadataStore.class);
/** Contains directory and file listings. */
private Cache<Path, LocalMetadataEntry> localCache;
private FileSystem fs;
/* Null iff this FS does not have an associated URI host. */
private String uriHost;
private String username;
private ITtlTimeProvider ttlTimeProvider;
@Override
public void initialize(FileSystem fileSystem,
ITtlTimeProvider ttlTp) throws IOException {
Preconditions.checkNotNull(fileSystem);
fs = fileSystem;
URI fsURI = fs.getUri();
uriHost = fsURI.getHost();
if (uriHost != null && uriHost.equals("")) {
uriHost = null;
}
initialize(fs.getConf(), ttlTp);
}
@Override
public void initialize(Configuration conf, ITtlTimeProvider ttlTp)
throws IOException {
Preconditions.checkNotNull(conf);
int maxRecords = conf.getInt(S3GUARD_METASTORE_LOCAL_MAX_RECORDS,
DEFAULT_S3GUARD_METASTORE_LOCAL_MAX_RECORDS);
if (maxRecords < 4) {
maxRecords = 4;
}
int ttl = conf.getInt(S3GUARD_METASTORE_LOCAL_ENTRY_TTL,
DEFAULT_S3GUARD_METASTORE_LOCAL_ENTRY_TTL);
CacheBuilder builder = CacheBuilder.newBuilder().maximumSize(maxRecords);
if (ttl >= 0) {
builder.expireAfterAccess(ttl, TimeUnit.MILLISECONDS);
}
localCache = builder.build();
username = UserGroupInformation.getCurrentUser().getShortUserName();
this.ttlTimeProvider = ttlTp;
}
@Override
public String toString() {
final StringBuilder sb = new StringBuilder(
"LocalMetadataStore{");
sb.append("uriHost='").append(uriHost).append('\'');
sb.append('}');
return sb.toString();
}
@Override
public void delete(Path p,
final BulkOperationState operationState)
throws IOException {
doDelete(p, false, true);
}
@Override
public void forgetMetadata(Path p) throws IOException {
doDelete(p, false, false);
}
@Override
public void deleteSubtree(Path path,
final BulkOperationState operationState)
throws IOException {
doDelete(path, true, true);
}
private synchronized void doDelete(Path p, boolean recursive,
boolean tombstone) {
Path path = standardize(p);
// Delete entry from file cache, then from cached parent directory, if any
deleteCacheEntries(path, tombstone);
if (recursive) {
// Remove all entries that have this dir as path prefix.
deleteEntryByAncestor(path, localCache, tombstone, ttlTimeProvider);
}
}
@Override
public void deletePaths(final Collection<Path> paths,
@Nullable final BulkOperationState operationState) throws IOException {
for (Path path : paths) {
doDelete(path, false, true);
}
}
@Override
public synchronized PathMetadata get(Path p) throws IOException {
return get(p, false);
}
@Override
public PathMetadata get(Path p, boolean wantEmptyDirectoryFlag)
throws IOException {
Path path = standardize(p);
synchronized (this) {
PathMetadata m = getFileMeta(path);
if (wantEmptyDirectoryFlag && m != null &&
m.getFileStatus().isDirectory()) {
m.setIsEmptyDirectory(isEmptyDirectory(p));
}
if (LOG.isDebugEnabled()) {
LOG.debug("get({}) -> {}", path, m == null ? "null" : m.prettyPrint());
}
return m;
}
}
/**
* Determine if directory is empty.
* Call with lock held.
* @param p a Path, already filtered through standardize()
* @return TRUE / FALSE if known empty / not-empty, UNKNOWN otherwise.
*/
private Tristate isEmptyDirectory(Path p) {
DirListingMetadata dlm = getDirListingMeta(p);
return dlm.withoutTombstones().isEmpty();
}
@Override
public synchronized DirListingMetadata listChildren(Path p) throws
IOException {
Path path = standardize(p);
DirListingMetadata listing = getDirListingMeta(path);
if (LOG.isDebugEnabled()) {
LOG.debug("listChildren({}) -> {}", path,
listing == null ? "null" : listing.prettyPrint());
}
if (listing != null) {
// Make a copy so callers can mutate without affecting our state
return new DirListingMetadata(listing);
}
return null;
}
@Override
public void move(@Nullable Collection<Path> pathsToDelete,
@Nullable Collection<PathMetadata> pathsToCreate,
@Nullable final BulkOperationState operationState) throws IOException {
LOG.info("Move {} to {}", pathsToDelete, pathsToCreate);
if (pathsToCreate == null) {
pathsToCreate = Collections.emptyList();
}
if (pathsToDelete == null) {
pathsToDelete = Collections.emptyList();
}
// I feel dirty for using reentrant lock. :-|
synchronized (this) {
// 1. Delete pathsToDelete
for (Path meta : pathsToDelete) {
LOG.debug("move: deleting metadata {}", meta);
delete(meta, null);
}
// 2. Create new destination path metadata
for (PathMetadata meta : pathsToCreate) {
LOG.debug("move: adding metadata {}", meta);
put(meta, null);
}
// 3. We now know full contents of all dirs in destination subtree
for (PathMetadata meta : pathsToCreate) {
FileStatus status = meta.getFileStatus();
if (status == null || status.isDirectory()) {
continue;
}
DirListingMetadata dir = listChildren(status.getPath());
if (dir != null) { // could be evicted already
dir.setAuthoritative(true);
}
}
}
}
@Override
public void put(final PathMetadata meta) throws IOException {
put(meta, null);
}
@Override
public void put(PathMetadata meta,
final BulkOperationState operationState) throws IOException {
Preconditions.checkNotNull(meta);
S3AFileStatus status = meta.getFileStatus();
Path path = standardize(status.getPath());
synchronized (this) {
/* Add entry for this file. */
if (LOG.isDebugEnabled()) {
LOG.debug("put {} -> {}", path, meta.prettyPrint());
}
LocalMetadataEntry entry = localCache.getIfPresent(path);
if(entry == null){
entry = new LocalMetadataEntry(meta);
} else {
entry.setPathMetadata(meta);
}
/* Directory case:
* We also make sure we have an entry in the dirCache, so subsequent
* listStatus(path) at least see the directory.
*
* If we had a boolean flag argument "isNew", we would know whether this
* is an existing directory the client discovered via getFileStatus(),
* or if it is a newly-created directory. In the latter case, we would
* be able to mark the directory as authoritative (fully-cached),
* saving round trips to underlying store for subsequent listStatus()
*/
// only create DirListingMetadata if the entry does not have one
if (status.isDirectory() && !entry.hasDirMeta()) {
DirListingMetadata dlm =
new DirListingMetadata(path, DirListingMetadata.EMPTY_DIR, false);
entry.setDirListingMetadata(dlm);
}
localCache.put(path, entry);
/* Update cached parent dir. */
Path parentPath = path.getParent();
if (parentPath != null) {
LocalMetadataEntry parentMeta = localCache.getIfPresent(parentPath);
// Create empty parent LocalMetadataEntry if it doesn't exist
if (parentMeta == null){
parentMeta = new LocalMetadataEntry();
localCache.put(parentPath, parentMeta);
}
// If there is no directory metadata on the parent entry, create
// an empty one
if (!parentMeta.hasDirMeta()) {
DirListingMetadata parentDirMeta =
new DirListingMetadata(parentPath, DirListingMetadata.EMPTY_DIR,
false);
parentDirMeta.setLastUpdated(meta.getLastUpdated());
parentMeta.setDirListingMetadata(parentDirMeta);
}
// Add the child pathMetadata to the listing
parentMeta.getDirListingMeta().put(meta);
// Mark the listing entry as deleted if the meta is set to deleted
if(meta.isDeleted()) {
parentMeta.getDirListingMeta().markDeleted(path,
ttlTimeProvider.getNow());
}
}
}
}
@Override
public synchronized void put(DirListingMetadata meta,
final List<Path> unchangedEntries,
final BulkOperationState operationState) throws IOException {
if (LOG.isDebugEnabled()) {
LOG.debug("put dirMeta {}", meta.prettyPrint());
}
LocalMetadataEntry entry =
localCache.getIfPresent(standardize(meta.getPath()));
if (entry == null) {
localCache.put(standardize(meta.getPath()), new LocalMetadataEntry(meta));
} else {
entry.setDirListingMetadata(meta);
}
put(meta.getListing(), null);
}
public synchronized void put(Collection<? extends PathMetadata> metas,
final BulkOperationState operationState) throws
IOException {
for (PathMetadata meta : metas) {
put(meta, operationState);
}
}
@Override
public void close() throws IOException {
}
@Override
public void destroy() throws IOException {
if (localCache != null) {
localCache.invalidateAll();
}
}
@Override
public void prune(PruneMode pruneMode, long cutoff) throws IOException{
prune(pruneMode, cutoff, "");
}
@Override
public synchronized long prune(PruneMode pruneMode, long cutoff,
String keyPrefix) {
// prune files
AtomicLong count = new AtomicLong();
// filter path_metadata (files), filter expired, remove expired
localCache.asMap().entrySet().stream()
.filter(entry -> entry.getValue().hasPathMeta())
.filter(entry -> expired(pruneMode,
entry.getValue().getFileMeta(), cutoff, keyPrefix))
.forEach(entry -> {
localCache.invalidate(entry.getKey());
count.incrementAndGet();
});
// prune dirs
// filter DIR_LISTING_METADATA, remove expired, remove authoritative bit
localCache.asMap().entrySet().stream()
.filter(entry -> entry.getValue().hasDirMeta())
.forEach(entry -> {
Path path = entry.getKey();
DirListingMetadata metadata = entry.getValue().getDirListingMeta();
Collection<PathMetadata> oldChildren = metadata.getListing();
Collection<PathMetadata> newChildren = new LinkedList<>();
for (PathMetadata child : oldChildren) {
if (!expired(pruneMode, child, cutoff, keyPrefix)) {
newChildren.add(child);
} else {
count.incrementAndGet();
}
}
removeAuthoritativeFromParent(path, oldChildren, newChildren);
});
return count.get();
}
private void removeAuthoritativeFromParent(Path path,
Collection<PathMetadata> oldChildren,
Collection<PathMetadata> newChildren) {
if (newChildren.size() != oldChildren.size()) {
DirListingMetadata dlm =
new DirListingMetadata(path, newChildren, false);
localCache.put(path, new LocalMetadataEntry(dlm));
if (!path.isRoot()) {
DirListingMetadata parent = getDirListingMeta(path.getParent());
if (parent != null) {
parent.setAuthoritative(false);
}
}
}
}
private boolean expired(PruneMode pruneMode, PathMetadata metadata,
long cutoff, String keyPrefix) {
final S3AFileStatus status = metadata.getFileStatus();
final URI statusUri = status.getPath().toUri();
// remove the protocol from path string to be able to compare
String bucket = statusUri.getHost();
String statusTranslatedPath = "";
if(bucket != null && !bucket.isEmpty()){
// if there's a bucket (a well-defined host in the URI), the pathToParentKey
// can be used to get the path from the status
statusTranslatedPath =
PathMetadataDynamoDBTranslation.pathToParentKey(status.getPath());
} else {
// if there's no bucket in the path the pathToParentKey will fail, so
// this is the fallback to get the path from status
statusTranslatedPath = statusUri.getPath();
}
boolean expired;
switch (pruneMode) {
case ALL_BY_MODTIME:
// Note: S3 doesn't track modification time on directories, so for
// consistency with the DynamoDB implementation we ignore that here
expired = status.getModificationTime() < cutoff && !status.isDirectory()
&& statusTranslatedPath.startsWith(keyPrefix);
break;
case TOMBSTONES_BY_LASTUPDATED:
expired = metadata.getLastUpdated() < cutoff && metadata.isDeleted()
&& statusTranslatedPath.startsWith(keyPrefix);
break;
default:
throw new UnsupportedOperationException("Unsupported prune mode: "
+ pruneMode);
}
return expired;
}
@VisibleForTesting
static void deleteEntryByAncestor(Path ancestor,
Cache<Path, LocalMetadataEntry> cache, boolean tombstone,
ITtlTimeProvider ttlTimeProvider) {
cache.asMap().entrySet().stream()
.filter(entry -> isAncestorOf(ancestor, entry.getKey()))
.forEach(entry -> {
LocalMetadataEntry meta = entry.getValue();
Path path = entry.getKey();
if(meta.hasDirMeta()){
cache.invalidate(path);
} else if(tombstone && meta.hasPathMeta()){
final PathMetadata pmTombstone = PathMetadata.tombstone(path,
ttlTimeProvider.getNow());
meta.setPathMetadata(pmTombstone);
} else {
cache.invalidate(path);
}
});
}
/**
* @return true if 'ancestor' is an ancestor directory of path 'f'.
* All paths here are absolute. A directory does not count as its own ancestor.
*/
private static boolean isAncestorOf(Path ancestor, Path f) {
String aStr = ancestor.toString();
if (!ancestor.isRoot()) {
aStr += "/";
}
String fStr = f.toString();
return (fStr.startsWith(aStr));
}
/**
* Update the local cache to reflect deletion of the given path. Call with
* the lock held.
*/
private void deleteCacheEntries(Path path, boolean tombstone) {
LocalMetadataEntry entry = localCache.getIfPresent(path);
// If there's no entry, delete should silently succeed
// (based on MetadataStoreTestBase#testDeleteNonExisting)
if(entry == null){
LOG.warn("Delete: path {} is missing from cache.", path);
return;
}
// Remove target file entry
LOG.debug("delete file entry for {}", path);
if(entry.hasPathMeta()){
if (tombstone) {
PathMetadata pmd = PathMetadata.tombstone(path,
ttlTimeProvider.getNow());
entry.setPathMetadata(pmd);
} else {
entry.setPathMetadata(null);
}
}
// If this path is a dir, remove its listing
if(entry.hasDirMeta()) {
LOG.debug("removing listing of {}", path);
entry.setDirListingMetadata(null);
}
// If the entry is empty (contains no dirMeta or pathMeta) remove it from
// the cache.
if(!entry.hasDirMeta() && !entry.hasPathMeta()){
localCache.invalidate(path);
}
/* Remove this path from parent's dir listing */
Path parent = path.getParent();
if (parent != null) {
DirListingMetadata dir = getDirListingMeta(parent);
if (dir != null) {
LOG.debug("removing parent's entry for {} ", path);
if (tombstone) {
dir.markDeleted(path, ttlTimeProvider.getNow());
} else {
dir.remove(path);
}
}
}
}
/**
* Return a "standardized" version of a path so we always have a consistent
* hash value. Also asserts the path is absolute, and contains host
* component.
* @param p input Path
* @return standardized version of Path, suitable for hash key
*/
private Path standardize(Path p) {
Preconditions.checkArgument(p.isAbsolute(), "Path must be absolute");
URI uri = p.toUri();
if (uriHost != null) {
Preconditions.checkArgument(StringUtils.isNotEmpty(uri.getHost()));
}
return p;
}
@Override
public Map<String, String> getDiagnostics() throws IOException {
Map<String, String> map = new HashMap<>();
map.put("name", "local://metadata");
map.put("uriHost", uriHost);
map.put("description", "Local in-VM metadata store for testing");
map.put(MetadataStoreCapabilities.PERSISTS_AUTHORITATIVE_BIT,
Boolean.toString(true));
return map;
}
@Override
public void updateParameters(Map<String, String> parameters)
throws IOException {
}
PathMetadata getFileMeta(Path p){
LocalMetadataEntry entry = localCache.getIfPresent(p);
if(entry != null && entry.hasPathMeta()){
return entry.getFileMeta();
} else {
return null;
}
}
DirListingMetadata getDirListingMeta(Path p){
LocalMetadataEntry entry = localCache.getIfPresent(p);
if(entry != null && entry.hasDirMeta()){
return entry.getDirListingMeta();
} else {
return null;
}
}
@Override
public RenameTracker initiateRenameOperation(final StoreContext storeContext,
final Path source,
final S3AFileStatus sourceStatus, final Path dest) throws IOException {
return new ProgressiveRenameTracker(storeContext, this, source, dest,
null);
}
@Override
public synchronized void setTtlTimeProvider(ITtlTimeProvider ttlTimeProvider) {
this.ttlTimeProvider = ttlTimeProvider;
}
@Override
public synchronized void addAncestors(final Path qualifiedPath,
@Nullable final BulkOperationState operationState) throws IOException {
Collection<PathMetadata> newDirs = new ArrayList<>();
Path parent = qualifiedPath.getParent();
while (!parent.isRoot()) {
PathMetadata directory = get(parent);
if (directory == null || directory.isDeleted()) {
S3AFileStatus status = new S3AFileStatus(Tristate.FALSE, parent,
username);
PathMetadata meta = new PathMetadata(status, Tristate.FALSE, false,
ttlTimeProvider.getNow());
newDirs.add(meta);
} else {
break;
}
parent = parent.getParent();
}
if (!newDirs.isEmpty()) {
put(newDirs, operationState);
}
}
}
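A minimal sketch of the test-style round trip this store supports: initialize from a Configuration, put a single file entry, read it back. The class LocalStoreSketch, the bucket path and the caller-supplied ITtlTimeProvider are assumptions for illustration only.

package org.apache.hadoop.fs.s3a.s3guard;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.s3a.S3AFileStatus;

/** Illustrative sketch only; not part of the original patch. */
final class LocalStoreSketch {

  private LocalStoreSketch() {
  }

  /**
   * Write one file entry to an in-memory store and read it back.
   * @param conf configuration supplying cache size and entry TTL
   * @param timeProvider time provider supplied by the caller
   * @return the entry just written; null would mean "unknown path"
   */
  static PathMetadata roundTrip(
      final Configuration conf,
      final ITtlTimeProvider timeProvider) throws IOException {
    try (MetadataStore store = new LocalMetadataStore()) {
      store.initialize(conf, timeProvider);
      Path path = new Path("s3a://example-bucket/dir/file.txt");
      S3AFileStatus status = new S3AFileStatus(0,
          System.currentTimeMillis(), path, 0, null, null, null);
      store.put(new PathMetadata(status), null);
      return store.get(path);
    }
  }
}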

View File

@ -1,438 +0,0 @@
/**
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.hadoop.fs.s3a.s3guard;
import javax.annotation.Nullable;
import java.io.Closeable;
import java.io.IOException;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import org.apache.hadoop.classification.VisibleForTesting;
import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.s3a.Retries;
import org.apache.hadoop.fs.s3a.Retries.RetryTranslated;
import org.apache.hadoop.fs.s3a.S3AFileStatus;
import org.apache.hadoop.fs.s3a.impl.StoreContext;
/**
* {@code MetadataStore} defines the set of operations that any metadata store
* implementation must provide. Note that all {@link Path} objects provided
* to methods must be absolute, not relative paths.
* Implementations must implement any retries needed internally, such that
* transient errors are generally recovered from without throwing exceptions
* from this API.
*/
@InterfaceAudience.Private
@InterfaceStability.Evolving
public interface MetadataStore extends Closeable {
/**
* Performs one-time initialization of the metadata store.
*
* @param fs {@code FileSystem} associated with the MetadataStore
* @param ttlTimeProvider the time provider to use for metadata expiry
* @throws IOException if there is an error
*/
void initialize(FileSystem fs, ITtlTimeProvider ttlTimeProvider)
throws IOException;
/**
* Performs one-time initialization of the metadata store via configuration.
* @see #initialize(FileSystem, ITtlTimeProvider)
* @param conf Configuration.
* @param ttlTimeProvider the time provider to use for metadata expiry
* @throws IOException if there is an error
*/
void initialize(Configuration conf,
ITtlTimeProvider ttlTimeProvider) throws IOException;
/**
* Deletes exactly one path, leaving a tombstone to prevent lingering,
* inconsistent copies of it from being listed.
*
* Deleting an entry with a tombstone needs a
* {@link org.apache.hadoop.fs.s3a.s3guard.S3Guard.TtlTimeProvider} because
* the lastUpdated field of the record has to be updated to <pre>now</pre>.
*
* @param path the path to delete
* @param operationState (nullable) operational state for a bulk update
* @throws IOException if there is an error
*/
void delete(Path path,
@Nullable BulkOperationState operationState)
throws IOException;
/**
* Removes the record of exactly one path. Does not leave a tombstone (see
* {@link MetadataStore#delete(Path, BulkOperationState)}). It is currently
* intended for testing only, and a need to use it as part of normal
* FileSystem usage is not anticipated.
*
* @param path the path to delete
* @throws IOException if there is an error
*/
@VisibleForTesting
void forgetMetadata(Path path) throws IOException;
/**
* Deletes the entire sub-tree rooted at the given path, leaving tombstones
* to prevent lingering, inconsistent copies of it from being listed.
*
* In addition to affecting future calls to {@link #get(Path)},
* implementations must also update any stored {@code DirListingMetadata}
* objects which track the parent of this file.
*
* Deleting a subtree with a tombstone needs a
* {@link org.apache.hadoop.fs.s3a.s3guard.S3Guard.TtlTimeProvider} because
* the lastUpdated field of all records has to be updated to <pre>now</pre>.
*
* @param path the root of the sub-tree to delete
* @param operationState (nullable) operational state for a bulk update
* @throws IOException if there is an error
*/
@Retries.RetryTranslated
void deleteSubtree(Path path,
@Nullable BulkOperationState operationState)
throws IOException;
/**
* Delete the paths.
* There's no attempt to order the paths: they are
* deleted in the order passed in.
* @param paths paths to delete.
* @param operationState Nullable operation state
* @throws IOException failure
*/
@RetryTranslated
void deletePaths(Collection<Path> paths,
@Nullable BulkOperationState operationState)
throws IOException;
/**
* Gets metadata for a path.
*
* @param path the path to get
* @return metadata for {@code path}, {@code null} if not found
* @throws IOException if there is an error
*/
PathMetadata get(Path path) throws IOException;
/**
* Gets metadata for a path. Alternate method that includes a hint
* whether or not the MetadataStore should do work to compute the value for
* {@link PathMetadata#isEmptyDirectory()}. Since determining emptiness
* may be an expensive operation, this can save wasted work.
*
* @param path the path to get
* @param wantEmptyDirectoryFlag Set to true to give a hint to the
* MetadataStore that it should try to compute the empty directory flag.
* @return metadata for {@code path}, {@code null} if not found
* @throws IOException if there is an error
*/
PathMetadata get(Path path, boolean wantEmptyDirectoryFlag)
throws IOException;
/**
* Lists metadata for all direct children of a path.
*
* @param path the path to list
* @return metadata for all direct children of {@code path} which are being
* tracked by the MetadataStore, or {@code null} if the path was not found
* in the MetadataStore.
* @throws IOException if there is an error
*/
@Retries.RetryTranslated
DirListingMetadata listChildren(Path path) throws IOException;
/**
* This adds all new ancestors of a path as directories.
* <p>
* Important: to propagate TTL information, any new ancestors added
* must have their last updated timestamps set through
* {@link S3Guard#patchLastUpdated(Collection, ITtlTimeProvider)}.
* @param qualifiedPath path to update
* @param operationState (nullable) operational state for a bulk update
* @throws IOException failure
*/
@RetryTranslated
void addAncestors(Path qualifiedPath,
@Nullable BulkOperationState operationState) throws IOException;
/**
* Record the effects of a {@link FileSystem#rename(Path, Path)} in the
* MetadataStore. Clients provide explicit enumeration of the affected
* paths (recursively), before and after the rename.
*
* This operation is not atomic, unless specific implementations claim
* otherwise.
*
* On the need to provide an enumeration of directory trees instead of just
* source and destination paths:
* Since a MetadataStore does not have to track all metadata for the
* underlying storage system, and a new MetadataStore may be created on an
* existing underlying filesystem, this move() may be the first time the
* MetadataStore sees the affected paths. Therefore, simply providing src
* and destination paths may not be enough to record the deletions (under
* src path) and creations (at destination) that are happening during the
* rename().
*
* @param pathsToDelete Collection of all paths that were removed from the
* source directory tree of the move.
* @param pathsToCreate Collection of all PathMetadata for the new paths
* that were created at the destination of the rename().
* @param operationState Any ongoing state supplied to the rename tracker
* which is to be passed in with each move operation.
* @throws IOException if there is an error
*/
void move(@Nullable Collection<Path> pathsToDelete,
@Nullable Collection<PathMetadata> pathsToCreate,
@Nullable BulkOperationState operationState) throws IOException;
/**
* Saves metadata for exactly one path.
*
* Implementations may pre-create all the path's ancestors automatically.
* Implementations must update any {@code DirListingMetadata} objects which
* track the immediate parent of this file.
*
* @param meta the metadata to save
* @throws IOException if there is an error
*/
@RetryTranslated
void put(PathMetadata meta) throws IOException;
/**
* Saves metadata for exactly one path, potentially
* using any bulk operation state to eliminate duplicate work.
*
* Implementations may pre-create all the path's ancestors automatically.
* Implementations must update any {@code DirListingMetadata} objects which
* track the immediate parent of this file.
*
* @param meta the metadata to save
* @param operationState operational state for a bulk update
* @throws IOException if there is an error
*/
@RetryTranslated
void put(PathMetadata meta,
@Nullable BulkOperationState operationState) throws IOException;
/**
* Saves metadata for any number of paths.
*
* Semantics are otherwise the same as single-path puts.
*
* @param metas the metadata to save
* @param operationState (nullable) operational state for a bulk update
* @throws IOException if there is an error
*/
void put(Collection<? extends PathMetadata> metas,
@Nullable BulkOperationState operationState) throws IOException;
/**
* Save directory listing metadata. Callers may save a partial directory
* listing for a given path, or may store a complete and authoritative copy
* of the directory listing. {@code MetadataStore} implementations may
* subsequently keep track of all modifications to the directory contents at
* this path, and return authoritative results from subsequent calls to
* {@link #listChildren(Path)}. See {@link DirListingMetadata}.
*
* Any authoritative results returned are only authoritative for the scope
* of the {@code MetadataStore}: A per-process {@code MetadataStore}, for
* example, would only show results visible to that process, potentially
* missing metadata updates (create, delete) made to the same path by
* another process.
*
* To optimize updates and avoid overwriting existing entries which
* may contain extra data, entries in the list of unchangedEntries may
* be excluded. That is: the listing metadata has the full list of
* what it believes are children, but implementations can opt to ignore
* some.
* @param meta Directory listing metadata.
* @param unchangedEntries list of entries in the dir listing which have
* not changed since the directory listing was last scanned and saved to S3Guard.
* @param operationState operational state for a bulk update
* @throws IOException if there is an error
*/
void put(DirListingMetadata meta,
final List<Path> unchangedEntries,
@Nullable BulkOperationState operationState) throws IOException;
/**
* Destroy all resources associated with the metadata store.
*
* The destroyed resources can be DynamoDB tables, MySQL databases/tables, or
* HDFS directories. Any operations after calling this method may possibly
* fail.
*
* This operation is idempotent.
*
* @throws IOException if there is an error
*/
void destroy() throws IOException;
/**
* Prune method with two modes of operation:
* <ul>
* <li>
* {@link PruneMode#ALL_BY_MODTIME}
* Clear any metadata older than a specified mod_time from the store.
* Note that this modification time is the S3 modification time from the
* object's metadata - from the object store.
* Implementations MUST clear file metadata, and MAY clear directory
* metadata (s3a itself does not track modification time for directories).
* Implementations may also choose to throw UnsupportedOperationException
* instead. Note that modification times must be in UTC, as returned by
* System.currentTimeMillis at the time of modification.
* </li>
* </ul>
*
* <ul>
* <li>
* {@link PruneMode#TOMBSTONES_BY_LASTUPDATED}
* Clear any tombstone updated earlier than a specified time from the
* store. Note that this last_updated is the time when the metadata
* entry was last updated and maintained by the metadata store.
* Implementations MUST clear file metadata, and MAY clear directory
* metadata (s3a itself does not track modification time for directories).
* Implementations may also choose to throw UnsupportedOperationException
* instead. Note that last_updated must be in UTC, as returned by
* System.currentTimeMillis at the time of modification.
* </li>
* </ul>
*
* @param pruneMode Prune Mode
* @param cutoff Oldest time to allow (UTC)
* @throws IOException if there is an error
* @throws UnsupportedOperationException if not implemented
*/
void prune(PruneMode pruneMode, long cutoff) throws IOException,
UnsupportedOperationException;
/**
* Same as {@link MetadataStore#prune(PruneMode, long)}, but with an
* additional keyPrefix parameter to filter the pruned keys with a prefix.
*
* @param pruneMode Prune Mode
* @param cutoff Oldest time in milliseconds to allow (UTC)
* @param keyPrefix The prefix for the keys that should be removed
* @throws IOException if there is an error
* @throws UnsupportedOperationException if not implemented
* @return the number of pruned entries
*/
long prune(PruneMode pruneMode, long cutoff, String keyPrefix)
throws IOException, UnsupportedOperationException;
/**
* Get any diagnostics information from a store, as a list of (key, value)
* tuples for display. Arbitrary values; no guarantee of stability.
* These are for debugging and testing only.
* @return a map of strings.
* @throws IOException if there is an error
*/
Map<String, String> getDiagnostics() throws IOException;
/**
* Tune/update parameters for an existing table.
* @param parameters map of params to change.
* @throws IOException if there is an error
*/
void updateParameters(Map<String, String> parameters) throws IOException;
/**
* Mark all directories created/touched in an operation as authoritative.
* The metastore can now update that path with any authoritative
* flags it chooses.
* The store may therefore assume that the operation is complete.
* This holds for rename and needs to be documented for import.
* @param dest destination path.
* @param operationState active state.
* @throws IOException failure.
* @return the number of directories marked.
*/
default int markAsAuthoritative(Path dest,
BulkOperationState operationState)
throws IOException {
return 0;
}
/**
* Modes of operation for prune.
* For details see {@link MetadataStore#prune(PruneMode, long)}
*/
enum PruneMode {
ALL_BY_MODTIME,
TOMBSTONES_BY_LASTUPDATED
}
/**
* Start a rename operation.
*
* @param storeContext store context.
* @param source source path
* @param sourceStatus status of the source file/dir
* @param dest destination path.
* @return the rename tracker
* @throws IOException Failure.
*/
RenameTracker initiateRenameOperation(
StoreContext storeContext,
Path source,
S3AFileStatus sourceStatus,
Path dest)
throws IOException;
/**
* Initiate a bulk update and create an operation state for it.
* This may then be passed into put operations.
* @param operation the type of the operation.
* @param dest path under which updates will be explicitly put.
* @return null or a store-specific state to pass into the put operations.
* @throws IOException failure
*/
default BulkOperationState initiateBulkWrite(
BulkOperationState.OperationType operation,
Path dest) throws IOException {
return new BulkOperationState(operation);
}
/**
* The TtlTimeProvider has to be set during initialization of the
* metadata store, but this method can be used in tests to change the
* instance at runtime.
*
* @param ttlTimeProvider the new time provider.
*/
void setTtlTimeProvider(ITtlTimeProvider ttlTimeProvider);
/**
* Get the MetastoreInstrumentation for this store; the result must not be null.
* @return any store instrumentation.
*/
default MetastoreInstrumentation getInstrumentation() {
return new MetastoreInstrumentationImpl();
}
}
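To make the bulk-update and prune contracts above concrete, here is a hedged sketch; the class BulkWriteSketch and its methods are invented for illustration, with the store, entries and cutoff assumed to be supplied by the caller.

package org.apache.hadoop.fs.s3a.s3guard;

import java.io.IOException;
import java.util.Collection;

import org.apache.hadoop.fs.Path;

/** Illustrative sketch only; not part of the original patch. */
final class BulkWriteSketch {

  private BulkWriteSketch() {
  }

  /**
   * Save a batch of entries under a destination path as one bulk
   * operation, then optionally mark the destination as authoritative.
   */
  static void bulkPut(
      final MetadataStore store,
      final Path dest,
      final Collection<PathMetadata> entries,
      final boolean authoritative) throws IOException {
    try (BulkOperationState state = store.initiateBulkWrite(
        BulkOperationState.OperationType.Import, dest)) {
      for (PathMetadata entry : entries) {
        store.put(entry, state);
      }
      if (authoritative) {
        store.markAsAuthoritative(dest, state);
      }
    }
  }

  /**
   * Remove tombstones whose last_updated time is older than the cutoff,
   * restricted to keys under the given prefix.
   * @return the number of entries pruned.
   */
  static long pruneOldTombstones(
      final MetadataStore store,
      final long cutoffMillis,
      final String keyPrefix) throws IOException {
    return store.prune(MetadataStore.PruneMode.TOMBSTONES_BY_LASTUPDATED,
        cutoffMillis, keyPrefix);
  }
}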

View File

@ -1,43 +0,0 @@
/**
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.hadoop.fs.s3a.s3guard;
import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;
/**
* All the capability constants used for the
* {@link MetadataStore} implementations.
*/
@InterfaceAudience.Public
@InterfaceStability.Evolving
public final class MetadataStoreCapabilities {
private MetadataStoreCapabilities(){
}
/**
* This capability tells if the metadata store supports authoritative
* directories. Used in {@link MetadataStore#getDiagnostics()} as a key
* for this capability. The value can be boolean true or false.
* If the Map.get() returns null for this key, that is interpreted as false.
*/
public static final String PERSISTS_AUTHORITATIVE_BIT =
"persist.authoritative.bit";
}
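As a small illustration (the class and method below are invented, not part of the patch), the capability is read from a store's diagnostics map; Boolean.parseBoolean treats a missing (null) value as false, matching the javadoc above.

package org.apache.hadoop.fs.s3a.s3guard;

import java.io.IOException;

/** Illustrative sketch only; not part of the original patch. */
final class CapabilitySketch {

  private CapabilitySketch() {
  }

  /** @return true iff the store reports persisting the authoritative bit. */
  static boolean persistsAuthoritativeBit(final MetadataStore store)
      throws IOException {
    // A missing key yields null, which parseBoolean treats as false.
    return Boolean.parseBoolean(store.getDiagnostics()
        .get(MetadataStoreCapabilities.PERSISTS_AUTHORITATIVE_BIT));
  }
}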

View File

@ -1,205 +0,0 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.hadoop.fs.s3a.s3guard;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.Queue;
import java.util.Set;
import org.apache.hadoop.util.Preconditions;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;
import org.apache.hadoop.fs.s3a.S3AFileStatus;
/**
* {@code MetadataStoreListFilesIterator} is a {@link RemoteIterator} that
* is similar to {@code DescendantsIterator} but does not return directories
* that have (or may have) children, and will also provide access to the set of
* tombstones to allow recently deleted S3 objects to be filtered out from a
* corresponding request. In other words, it returns tombstones and the same
* set of objects that should exist in S3: empty directories and files, but not
* other directories whose existence is inferred from them.
*
* For example, assume the consistent store contains metadata representing this
* file system structure:
*
* <pre>
* /dir1
* |-- dir2
* | |-- file1
* | `-- file2
* `-- dir3
* |-- dir4
* | `-- file3
* |-- dir5
* | `-- file4
* `-- dir6
* </pre>
*
* Consider this code sample:
* <pre>
* final PathMetadata dir1 = get(new Path("/dir1"));
* for (MetadataStoreListFilesIterator files =
* new MetadataStoreListFilesIterator(dir1); files.hasNext(); ) {
* final FileStatus status = files.next().getFileStatus();
* System.out.printf("%s %s%n", status.isDirectory() ? 'D' : 'F',
* status.getPath());
* }
* </pre>
*
* The output is:
* <pre>
* F /dir1/dir2/file1
* F /dir1/dir2/file2
* F /dir1/dir3/dir4/file3
* F /dir1/dir3/dir5/file4
* D /dir1/dir3/dir6
* </pre>
*/
@InterfaceAudience.Private
@InterfaceStability.Evolving
public class MetadataStoreListFilesIterator implements
RemoteIterator<S3AFileStatus> {
public static final Logger LOG = LoggerFactory.getLogger(
MetadataStoreListFilesIterator.class);
private final boolean allowAuthoritative;
private final MetadataStore metadataStore;
private final Set<Path> tombstones = new HashSet<>();
private final boolean recursivelyAuthoritative;
private Iterator<S3AFileStatus> leafNodesIterator = null;
public MetadataStoreListFilesIterator(MetadataStore ms, PathMetadata meta,
boolean allowAuthoritative) throws IOException {
Preconditions.checkNotNull(ms);
this.metadataStore = ms;
this.allowAuthoritative = allowAuthoritative;
this.recursivelyAuthoritative = prefetch(meta);
}
/**
* Walks the listing tree, starting from given metadata path. All
* encountered files and empty directories are added to
* {@code leafNodesIterator} unless a directory seems to be empty
* and at least one of the following conditions hold:
* <ul>
* <li>
* The directory listing is not marked authoritative
* </li>
* <li>
* Authoritative mode is not allowed
* </li>
* </ul>
* @param meta starting point for tree walk
* @return {@code true} if all encountered directory listings
* are marked as authoritative
* @throws IOException if listing children in the metadata store fails
*/
private boolean prefetch(PathMetadata meta) throws IOException {
final Queue<PathMetadata> queue = new LinkedList<>();
final Collection<S3AFileStatus> leafNodes = new ArrayList<>();
boolean allListingsAuthoritative = true;
if (meta != null) {
final Path path = meta.getFileStatus().getPath();
if (path.isRoot()) {
DirListingMetadata rootListing = metadataStore.listChildren(path);
if (rootListing != null) {
if (!rootListing.isAuthoritative()) {
allListingsAuthoritative = false;
}
tombstones.addAll(rootListing.listTombstones());
queue.addAll(rootListing.withoutTombstones().getListing());
}
} else {
queue.add(meta);
}
} else {
allListingsAuthoritative = false;
}
while(!queue.isEmpty()) {
PathMetadata nextMetadata = queue.poll();
S3AFileStatus nextStatus = nextMetadata.getFileStatus();
if (nextStatus.isFile()) {
// All files are leaf nodes by definition
leafNodes.add(nextStatus);
continue;
}
if (nextStatus.isDirectory()) {
final Path path = nextStatus.getPath();
DirListingMetadata children = metadataStore.listChildren(path);
if (children != null) {
if (!children.isAuthoritative()) {
allListingsAuthoritative = false;
}
tombstones.addAll(children.listTombstones());
Collection<PathMetadata> liveChildren =
children.withoutTombstones().getListing();
if (!liveChildren.isEmpty()) {
// If it's a directory, has children, not all deleted, then we
// add the children to the queue and move on to the next node
queue.addAll(liveChildren);
continue;
} else if (allowAuthoritative && children.isAuthoritative()) {
leafNodes.add(nextStatus);
}
} else {
// we do not have a listing, so directory definitely non-authoritative
allListingsAuthoritative = false;
}
}
// Directories that *might* be empty are ignored for now, since we
// cannot confirm that they are empty without incurring other costs.
// Users of this class can still discover empty directories via S3's
// fake directories, subject to the same consistency semantics as before.
// The only other possibility is a symlink, which is unsupported on S3A.
}
leafNodesIterator = leafNodes.iterator();
return allListingsAuthoritative;
}
@Override
public boolean hasNext() {
return leafNodesIterator.hasNext();
}
@Override
public S3AFileStatus next() {
return leafNodesIterator.next();
}
public boolean isRecursivelyAuthoritative() {
return recursivelyAuthoritative;
}
public Set<Path> listTombstones() {
return tombstones;
}
}

View File

@ -1,70 +0,0 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.hadoop.fs.s3a.s3guard;
/**
* Instrumentation exported to S3Guard.
*/
public interface MetastoreInstrumentation {
/** Initialized event. */
void initialized();
/** Store has been closed. */
void storeClosed();
/**
* Throttled request.
*/
void throttled();
/**
* S3Guard is retrying after a (retryable) failure.
*/
void retrying();
/**
* Records have been deleted.
* @param count the number of records deleted.
*/
void recordsDeleted(int count);
/**
* Records have been read.
* @param count the number of records read
*/
void recordsRead(int count);
/**
* Records have been written (including tombstones).
* @param count number of records written.
*/
void recordsWritten(int count);
/**
* A directory has been tagged as authoritative.
*/
void directoryMarkedAuthoritative();
/**
* An entry was added.
* @param durationNanos time to add
*/
void entryAdded(long durationNanos);
}

View File

@ -1,72 +0,0 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.hadoop.fs.s3a.s3guard;
/**
* A no-op implementation of {@link MetastoreInstrumentation}
* which allows metastores to always return an instance
* when requested.
*/
public class MetastoreInstrumentationImpl implements MetastoreInstrumentation {
@Override
public void initialized() {
}
@Override
public void storeClosed() {
}
@Override
public void throttled() {
}
@Override
public void retrying() {
}
@Override
public void recordsDeleted(final int count) {
}
@Override
public void recordsRead(final int count) {
}
@Override
public void recordsWritten(final int count) {
}
@Override
public void directoryMarkedAuthoritative() {
}
@Override
public void entryAdded(final long durationNanos) {
}
}

View File

@ -1,192 +0,0 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.hadoop.fs.s3a.s3guard;
import javax.annotation.Nullable;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.s3a.S3AFileStatus;
import org.apache.hadoop.fs.s3a.S3ObjectAttributes;
import org.apache.hadoop.fs.s3a.impl.StoreContext;
import java.io.IOException;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
/**
* A no-op implementation of MetadataStore. Clients that use this
* implementation should behave the same as they would without any
* MetadataStore.
*/
public class NullMetadataStore implements MetadataStore {
@Override
public void initialize(FileSystem fs, ITtlTimeProvider ttlTimeProvider)
throws IOException {
}
@Override
public void initialize(Configuration conf, ITtlTimeProvider ttlTimeProvider)
throws IOException {
}
@Override
public void close() throws IOException {
}
@Override
public void delete(Path path,
final BulkOperationState operationState)
throws IOException {
}
@Override
public void forgetMetadata(Path path) throws IOException {
}
@Override
public void deleteSubtree(Path path,
final BulkOperationState operationState)
throws IOException {
}
@Override
public void deletePaths(final Collection<Path> paths,
@Nullable final BulkOperationState operationState) throws IOException {
}
@Override
public PathMetadata get(Path path) throws IOException {
return null;
}
@Override
public PathMetadata get(Path path, boolean wantEmptyDirectoryFlag)
throws IOException {
return null;
}
@Override
public DirListingMetadata listChildren(Path path) throws IOException {
return null;
}
@Override
public void move(Collection<Path> pathsToDelete,
Collection<PathMetadata> pathsToCreate,
final BulkOperationState operationState) throws IOException {
}
@Override
public void put(final PathMetadata meta) throws IOException {
}
@Override
public void put(PathMetadata meta,
final BulkOperationState operationState) throws IOException {
}
@Override
public void put(Collection<? extends PathMetadata> meta,
final BulkOperationState operationState) throws IOException {
}
@Override
public void put(DirListingMetadata meta,
final List<Path> unchangedEntries,
final BulkOperationState operationState) throws IOException {
}
@Override
public void destroy() throws IOException {
}
@Override
public void prune(PruneMode pruneMode, long cutoff) {
}
@Override
public long prune(PruneMode pruneMode, long cutoff, String keyPrefix) {
return 0;
}
@Override
public String toString() {
return "NullMetadataStore";
}
@Override
public Map<String, String> getDiagnostics() throws IOException {
Map<String, String> map = new HashMap<>();
map.put("name", "Null Metadata Store");
map.put("description", "This is not a real metadata store");
return map;
}
@Override
public void updateParameters(Map<String, String> parameters)
throws IOException {
}
@Override
public RenameTracker initiateRenameOperation(final StoreContext storeContext,
final Path source,
final S3AFileStatus sourceStatus,
final Path dest)
throws IOException {
return new NullRenameTracker(storeContext, source, dest, this);
}
@Override
public void setTtlTimeProvider(ITtlTimeProvider ttlTimeProvider) {
}
@Override
public void addAncestors(final Path qualifiedPath,
@Nullable final BulkOperationState operationState) throws IOException {
}
private static final class NullRenameTracker extends RenameTracker {
private NullRenameTracker(
final StoreContext storeContext,
final Path source,
final Path dest,
MetadataStore metadataStore) {
super("NullRenameTracker", storeContext, metadataStore, source, dest,
null);
}
@Override
public void fileCopied(final Path childSource,
final S3ObjectAttributes sourceAttributes,
final S3ObjectAttributes destAttributes,
final Path destPath,
final long blockSize,
final boolean addAncestors) throws IOException {
}
}
}

View File

@ -1,196 +0,0 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.hadoop.fs.s3a.s3guard;
import org.apache.hadoop.util.Preconditions;
import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.s3a.S3AFileStatus;
import org.apache.hadoop.fs.s3a.Tristate;
/**
* {@code PathMetadata} models path metadata stored in the
* {@link MetadataStore}. Constructors without a lastUpdated parameter
* implicitly set that field to 0, to make it explicit that it defaults
* to 0 unless set otherwise.
*/
@InterfaceAudience.Private
@InterfaceStability.Evolving
public class PathMetadata extends ExpirableMetadata {
private S3AFileStatus fileStatus;
private Tristate isEmptyDirectory;
private boolean isDeleted;
/**
* Create a tombstone from the current time.
* Setting the lastUpdated field is mandatory: it records when the entry
* was deleted, and tombstone expiry is based on it.
*
* @param path path to tombstone
* @param lastUpdated last updated time on which expiration is based.
* @return the entry.
*/
public static PathMetadata tombstone(Path path, long lastUpdated) {
S3AFileStatus s3aStatus = new S3AFileStatus(0,
System.currentTimeMillis(), path, 0, null,
null, null);
return new PathMetadata(s3aStatus, Tristate.UNKNOWN, true, lastUpdated);
}
/**
* Creates a new {@code PathMetadata} containing given {@code FileStatus}.
* lastUpdated field will be updated to 0 implicitly in this constructor.
*
* @param fileStatus file status containing an absolute path.
*/
public PathMetadata(S3AFileStatus fileStatus) {
this(fileStatus, Tristate.UNKNOWN, false, 0);
}
/**
* Creates a new {@code PathMetadata} containing given {@code FileStatus}.
*
* @param fileStatus file status containing an absolute path.
* @param lastUpdated last updated time on which expiration is based.
*/
public PathMetadata(S3AFileStatus fileStatus, long lastUpdated) {
this(fileStatus, Tristate.UNKNOWN, false, lastUpdated);
}
/**
* Creates a new {@code PathMetadata}.
* lastUpdated field will be updated to 0 implicitly in this constructor.
*
* @param fileStatus file status containing an absolute path.
* @param isEmptyDir empty directory {@link Tristate}
*/
public PathMetadata(S3AFileStatus fileStatus, Tristate isEmptyDir) {
this(fileStatus, isEmptyDir, false, 0);
}
/**
* Creates a new {@code PathMetadata}.
* lastUpdated field will be updated to 0 implicitly in this constructor.
*
* @param fileStatus file status containing an absolute path.
* @param isEmptyDir empty directory {@link Tristate}
* @param isDeleted deleted / tombstoned flag
*/
public PathMetadata(S3AFileStatus fileStatus, Tristate isEmptyDir,
boolean isDeleted) {
this(fileStatus, isEmptyDir, isDeleted, 0);
}
/**
* Creates a new {@code PathMetadata}.
*
* @param fileStatus file status containing an absolute path.
* @param isEmptyDir empty directory {@link Tristate}
* @param isDeleted deleted / tombstoned flag
* @param lastUpdated last updated time on which expiration is based.
*/
public PathMetadata(S3AFileStatus fileStatus, Tristate isEmptyDir, boolean
isDeleted, long lastUpdated) {
Preconditions.checkNotNull(fileStatus, "fileStatus must be non-null");
Preconditions.checkNotNull(fileStatus.getPath(), "fileStatus path must be" +
" non-null");
Preconditions.checkArgument(fileStatus.getPath().isAbsolute(), "path must" +
" be absolute");
Preconditions.checkArgument(lastUpdated >=0, "lastUpdated parameter must "
+ "be greater or equal to 0.");
this.fileStatus = fileStatus;
this.isEmptyDirectory = isEmptyDir;
this.isDeleted = isDeleted;
this.setLastUpdated(lastUpdated);
}
/**
* @return {@code FileStatus} contained in this {@code PathMetadata}.
*/
public final S3AFileStatus getFileStatus() {
return fileStatus;
}
/**
* Query if a directory is empty.
* @return Tristate.TRUE if this is known to be an empty directory,
* Tristate.FALSE if known to not be empty, and Tristate.UNKNOWN if the
* MetadataStore does have enough information to determine either way.
*/
public Tristate isEmptyDirectory() {
return isEmptyDirectory;
}
void setIsEmptyDirectory(Tristate isEmptyDirectory) {
this.isEmptyDirectory = isEmptyDirectory;
fileStatus.setIsEmptyDirectory(isEmptyDirectory);
}
public boolean isDeleted() {
return isDeleted;
}
void setIsDeleted(boolean isDeleted) {
this.isDeleted = isDeleted;
}
@Override
public boolean equals(Object o) {
if (!(o instanceof PathMetadata)) {
return false;
}
return this.fileStatus.equals(((PathMetadata)o).fileStatus);
}
@Override
public int hashCode() {
return fileStatus.hashCode();
}
@Override
public String toString() {
return "PathMetadata{" +
"fileStatus=" + fileStatus +
"; isEmptyDirectory=" + isEmptyDirectory +
"; isDeleted=" + isDeleted +
"; lastUpdated=" + super.getLastUpdated() +
'}';
}
/**
* Log contents to supplied StringBuilder in a pretty fashion.
* @param sb target StringBuilder
*/
public void prettyPrint(StringBuilder sb) {
sb.append(String.format("%-5s %-20s %-7d %-8s %-6s %-20s %-20s",
fileStatus.isDirectory() ? "dir" : "file",
fileStatus.getPath().toString(), fileStatus.getLen(),
isEmptyDirectory.name(), isDeleted,
fileStatus.getETag(), fileStatus.getVersionId()));
sb.append(fileStatus);
}
public String prettyPrint() {
StringBuilder sb = new StringBuilder();
prettyPrint(sb);
return sb.toString();
}
}
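
For reference, a minimal sketch of how this now-removed class was typically constructed and queried; the path, owner, etag and version values are placeholders, and it assumes the public constructors shown above are on the classpath.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.s3a.S3AFileStatus;
import org.apache.hadoop.fs.s3a.Tristate;
import org.apache.hadoop.fs.s3a.s3guard.PathMetadata;

public class PathMetadataSketch {
  public static void main(String[] args) {
    // Placeholder file status: length, modification time, path, block size,
    // owner, etag and versionId are illustrative values only.
    S3AFileStatus status = new S3AFileStatus(1024L, System.currentTimeMillis(),
        new Path("/example-bucket/data/file1"), 64 * 1024 * 1024,
        "alice", "etag-1", "version-1");
    // lastUpdated is implicitly 0 when the short constructors are used.
    PathMetadata meta = new PathMetadata(status, Tristate.FALSE);
    System.out.println(meta.prettyPrint());
    // A tombstone records a deletion; expiry is based on lastUpdated.
    PathMetadata tombstone = PathMetadata.tombstone(
        new Path("/example-bucket/data/old"), System.currentTimeMillis());
    System.out.println(tombstone.isDeleted());   // true
  }
}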

View File

@ -1,425 +0,0 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.hadoop.fs.s3a.s3guard;
import java.io.IOException;
import java.net.URI;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import com.amazonaws.services.dynamodbv2.document.Item;
import com.amazonaws.services.dynamodbv2.document.KeyAttribute;
import com.amazonaws.services.dynamodbv2.document.PrimaryKey;
import com.amazonaws.services.dynamodbv2.model.AttributeDefinition;
import com.amazonaws.services.dynamodbv2.model.KeySchemaElement;
import com.amazonaws.services.dynamodbv2.model.KeyType;
import com.amazonaws.services.dynamodbv2.model.ScalarAttributeType;
import org.apache.hadoop.classification.VisibleForTesting;
import org.apache.hadoop.util.Preconditions;
import org.apache.commons.lang3.StringUtils;
import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.s3a.Constants;
import org.apache.hadoop.fs.s3a.S3AFileStatus;
import org.apache.hadoop.fs.s3a.Tristate;
/**
* Defines methods for translating between domain model objects and their
* representations in the DynamoDB schema.
*/
@InterfaceAudience.Private
@InterfaceStability.Evolving
@VisibleForTesting
public final class PathMetadataDynamoDBTranslation {
/** The HASH key name of each item. */
@VisibleForTesting
static final String PARENT = "parent";
/** The RANGE key name of each item. */
@VisibleForTesting
static final String CHILD = "child";
@VisibleForTesting
static final String IS_DIR = "is_dir";
@VisibleForTesting
static final String MOD_TIME = "mod_time";
@VisibleForTesting
static final String FILE_LENGTH = "file_length";
@VisibleForTesting
static final String BLOCK_SIZE = "block_size";
static final String IS_DELETED = "is_deleted";
static final String IS_AUTHORITATIVE = "is_authoritative";
static final String LAST_UPDATED = "last_updated";
static final String ETAG = "etag";
static final String VERSION_ID = "version_id";
/** Used while testing backward compatibility. */
@VisibleForTesting
static final Set<String> IGNORED_FIELDS = new HashSet<>();
/** Table version field {@value} in version marker item. */
@VisibleForTesting
static final String TABLE_VERSION = "table_version";
/** Table creation timestamp field {@value} in version marker item. */
@VisibleForTesting
static final String TABLE_CREATED = "table_created";
/** The version marker field is invalid. */
static final String E_NOT_VERSION_MARKER = "Not a version marker: ";
/**
* Returns the key schema for the DynamoDB table.
*
* @return DynamoDB key schema
*/
static Collection<KeySchemaElement> keySchema() {
return Arrays.asList(
new KeySchemaElement(PARENT, KeyType.HASH),
new KeySchemaElement(CHILD, KeyType.RANGE));
}
/**
* Returns the attribute definitions for the DynamoDB table.
*
* @return DynamoDB attribute definitions
*/
static Collection<AttributeDefinition> attributeDefinitions() {
return Arrays.asList(
new AttributeDefinition(PARENT, ScalarAttributeType.S),
new AttributeDefinition(CHILD, ScalarAttributeType.S));
}
/**
* Converts a DynamoDB item to a {@link DDBPathMetadata}.
*
* @param item DynamoDB item to convert
* @return {@code item} converted to a {@link DDBPathMetadata}
*/
static DDBPathMetadata itemToPathMetadata(Item item, String username) {
if (item == null) {
return null;
}
String parentStr = item.getString(PARENT);
Preconditions.checkNotNull(parentStr, "No parent entry in item %s", item);
String childStr = item.getString(CHILD);
Preconditions.checkNotNull(childStr, "No child entry in item %s", item);
// Skip table version markers, which are the only non-absolute paths stored.
Path rawPath = new Path(parentStr, childStr);
if (!rawPath.isAbsoluteAndSchemeAuthorityNull()) {
return null;
}
Path parent = new Path(Constants.FS_S3A + ":/" + parentStr + "/");
Path path = new Path(parent, childStr);
boolean isDir = item.hasAttribute(IS_DIR) && item.getBoolean(IS_DIR);
boolean isAuthoritativeDir = false;
final S3AFileStatus fileStatus;
long lastUpdated = 0;
if (isDir) {
isAuthoritativeDir = !IGNORED_FIELDS.contains(IS_AUTHORITATIVE)
&& item.hasAttribute(IS_AUTHORITATIVE)
&& item.getBoolean(IS_AUTHORITATIVE);
fileStatus = DynamoDBMetadataStore.makeDirStatus(path, username);
} else {
long len = item.hasAttribute(FILE_LENGTH) ? item.getLong(FILE_LENGTH) : 0;
long modTime = item.hasAttribute(MOD_TIME) ? item.getLong(MOD_TIME) : 0;
long block = item.hasAttribute(BLOCK_SIZE) ? item.getLong(BLOCK_SIZE) : 0;
String eTag = item.getString(ETAG);
String versionId = item.getString(VERSION_ID);
fileStatus = new S3AFileStatus(
len, modTime, path, block, username, eTag, versionId);
}
lastUpdated =
!IGNORED_FIELDS.contains(LAST_UPDATED)
&& item.hasAttribute(LAST_UPDATED)
? item.getLong(LAST_UPDATED) : 0;
boolean isDeleted =
item.hasAttribute(IS_DELETED) && item.getBoolean(IS_DELETED);
return new DDBPathMetadata(fileStatus, Tristate.UNKNOWN, isDeleted,
isAuthoritativeDir, lastUpdated);
}
/**
* Converts a {@link DDBPathMetadata} to a DynamoDB item.
*
* The {@code IS_AUTHORITATIVE} and {@code LAST_UPDATED} attributes are
* omitted when listed in {@code IGNORED_FIELDS}.
*
* @param meta {@link DDBPathMetadata} to convert
* @return {@code meta} converted to DynamoDB item
*/
static Item pathMetadataToItem(DDBPathMetadata meta) {
Preconditions.checkNotNull(meta);
final S3AFileStatus status = meta.getFileStatus();
final Item item = new Item().withPrimaryKey(pathToKey(status.getPath()));
if (status.isDirectory()) {
item.withBoolean(IS_DIR, true);
if (!IGNORED_FIELDS.contains(IS_AUTHORITATIVE)) {
item.withBoolean(IS_AUTHORITATIVE, meta.isAuthoritativeDir());
}
} else {
item.withLong(FILE_LENGTH, status.getLen())
.withLong(MOD_TIME, status.getModificationTime())
.withLong(BLOCK_SIZE, status.getBlockSize());
if (status.getETag() != null) {
item.withString(ETAG, status.getETag());
}
if (status.getVersionId() != null) {
item.withString(VERSION_ID, status.getVersionId());
}
}
item.withBoolean(IS_DELETED, meta.isDeleted());
if(!IGNORED_FIELDS.contains(LAST_UPDATED)) {
item.withLong(LAST_UPDATED, meta.getLastUpdated());
}
return item;
}
/**
* The version marker has a primary key whose PARENT is {@code name};
* this MUST NOT be a value which represents an absolute path.
* @param name name of the version marker
* @param version version number
* @param timestamp creation timestamp
* @return an item representing a version marker.
*/
static Item createVersionMarker(String name, int version, long timestamp) {
return new Item().withPrimaryKey(createVersionMarkerPrimaryKey(name))
.withInt(TABLE_VERSION, version)
.withLong(TABLE_CREATED, timestamp);
}
/**
* Create the primary key of the version marker.
* @param name key name
* @return the key to use when registering or resolving version markers
*/
static PrimaryKey createVersionMarkerPrimaryKey(String name) {
return new PrimaryKey(PARENT, name, CHILD, name);
}
/**
* Extract the version from a version marker item.
* @param marker version marker item
* @return the extracted version field
* @throws IOException if the item is not a version marker
*/
static int extractVersionFromMarker(Item marker) throws IOException {
if (marker.hasAttribute(TABLE_VERSION)) {
return marker.getInt(TABLE_VERSION);
} else {
throw new IOException(E_NOT_VERSION_MARKER + marker);
}
}
/**
* Extract the creation time, if present.
* @param marker version marker item
* @return the creation time, or null
* @throws IOException if the item is not a version marker
*/
static Long extractCreationTimeFromMarker(Item marker) {
if (marker.hasAttribute(TABLE_CREATED)) {
return marker.getLong(TABLE_CREATED);
} else {
return null;
}
}
/**
* Converts a collection of {@link DDBPathMetadata} to a collection of
* DynamoDB items.
*
* @see #pathMetadataToItem(DDBPathMetadata)
*/
static Item[] pathMetadataToItem(Collection<DDBPathMetadata> metas) {
if (metas == null) {
return null;
}
final Item[] items = new Item[metas.size()];
int i = 0;
for (DDBPathMetadata meta : metas) {
items[i++] = pathMetadataToItem(meta);
}
return items;
}
/**
* Converts a {@link Path} to a DynamoDB equality condition on that path as
* parent, suitable for querying all direct children of the path.
*
* @param path the path; can not be null
* @return DynamoDB equality condition on {@code path} as parent
*/
static KeyAttribute pathToParentKeyAttribute(Path path) {
return new KeyAttribute(PARENT, pathToParentKey(path));
}
/**
* e.g. {@code pathToParentKey(s3a://bucket/path/a) -> /bucket/path/a}
* @param path path to convert
* @return string for parent key
*/
@VisibleForTesting
public static String pathToParentKey(Path path) {
Preconditions.checkNotNull(path);
Preconditions.checkArgument(path.isUriPathAbsolute(),
"Path not absolute: '%s'", path);
URI uri = path.toUri();
String bucket = uri.getHost();
Preconditions.checkArgument(!StringUtils.isEmpty(bucket),
"Path missing bucket %s", path);
String pKey = "/" + bucket + uri.getPath();
// Strip trailing slash
if (pKey.endsWith("/")) {
pKey = pKey.substring(0, pKey.length() - 1);
}
return pKey;
}
/**
* Converts a {@link Path} to a DynamoDB key, suitable for getting the item
* matching the path.
*
* @param path the path; can not be null
* @return DynamoDB key for item matching {@code path}
*/
static PrimaryKey pathToKey(Path path) {
Preconditions.checkArgument(!path.isRoot(),
"Root path is not mapped to any PrimaryKey");
String childName = path.getName();
PrimaryKey key = new PrimaryKey(PARENT,
pathToParentKey(path.getParent()), CHILD,
childName);
for (KeyAttribute attr : key.getComponents()) {
String name = attr.getName();
Object v = attr.getValue();
Preconditions.checkNotNull(v,
"Null value for DynamoDB attribute \"%s\"", name);
Preconditions.checkState(!((String)v).isEmpty(),
"Empty string value for DynamoDB attribute \"%s\"", name);
}
return key;
}
/**
* Converts a collection of {@link Path} to a collection of DynamoDB keys.
*
* @see #pathToKey(Path)
*/
static PrimaryKey[] pathToKey(Collection<Path> paths) {
if (paths == null) {
return null;
}
final PrimaryKey[] keys = new PrimaryKey[paths.size()];
int i = 0;
for (Path p : paths) {
keys[i++] = pathToKey(p);
}
return keys;
}
/**
* There is no need to instantiate this class.
*/
private PathMetadataDynamoDBTranslation() {
}
/**
* Convert a collection of metadata entries to a list
* of DDBPathMetadata entries.
* If the sources are already DDBPathMetadata instances, they
* are copied directly into the new list, otherwise new
* instances are created.
* @param pathMetadatas source data
* @return the converted list.
*/
static List<DDBPathMetadata> pathMetaToDDBPathMeta(
Collection<? extends PathMetadata> pathMetadatas) {
return pathMetadatas.stream().map(p ->
(p instanceof DDBPathMetadata)
? (DDBPathMetadata) p
: new DDBPathMetadata(p))
.collect(Collectors.toList());
}
/**
* Convert an item's (parent, child) key to a string value
* for logging. There is no validation of the item.
* @param item item.
* @return an s3a:// prefixed string.
*/
static String itemPrimaryKeyToString(Item item) {
String parent = item.getString(PARENT);
String child = item.getString(CHILD);
return "s3a://" + parent + "/" + child;
}
/**
* Convert an item's (parent, child) key to a string value
* for logging. There is no validation of the item.
* @param item item.
* @return an s3a:// prefixed string.
*/
static String primaryKeyToString(PrimaryKey item) {
Collection<KeyAttribute> c = item.getComponents();
String parent = "";
String child = "";
for (KeyAttribute attr : c) {
switch (attr.getName()) {
case PARENT:
parent = attr.getValue().toString();
break;
case CHILD:
child = attr.getValue().toString();
break;
default:
}
}
return "s3a://" + parent + "/" + child;
}
/**
* Create an empty dir marker which, when passed to the
* DDB metastore, is considered authoritative.
* @param status file status
* @return path metadata.
*/
static PathMetadata authoritativeEmptyDirectoryMarker(
final S3AFileStatus status) {
return new DDBPathMetadata(status, Tristate.TRUE,
false, true, 0);
}
}
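
A small sketch of the key mapping described above, using the one public helper, pathToParentKey(); the bucket and object names are placeholders. The remaining translation helpers were package-private, so the (parent, child) item key is shown only in comments.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.s3a.s3guard.PathMetadataDynamoDBTranslation;

public class DdbKeyMappingSketch {
  public static void main(String[] args) {
    // Placeholder path: s3a://example-bucket/data/part-0000
    Path path = new Path("s3a://example-bucket/data/part-0000");
    // The HASH key of an item is "/" + bucket + parent object path,
    // minus any trailing slash; the RANGE key is the final path element.
    String parent =
        PathMetadataDynamoDBTranslation.pathToParentKey(path.getParent());
    System.out.println("parent = " + parent);          // /example-bucket/data
    System.out.println("child  = " + path.getName());  // part-0000
  }
}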

View File

@ -1,133 +0,0 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.hadoop.fs.s3a.s3guard;
import java.io.Serializable;
import java.util.Comparator;
import org.apache.hadoop.fs.Path;
/**
* Comparator of path ordering for sorting collections.
*
* The definition of "topmost" is:
* <ol>
* <li>The depth of a path is the primary comparator.</li>
* <li>Root is topmost, "0"</li>
* <li>If two paths are of equal depth, {@link Path#compareTo(Path)}
* is used. This delegates to URI compareTo.</li>
* <li>Repeated sorts do not change the order.</li>
* </ol>
*/
final class PathOrderComparators {
private PathOrderComparators() {
}
/**
* The shallowest paths come first.
* This is to be used when adding entries.
*/
static final Comparator<Path> TOPMOST_PATH_FIRST
= new TopmostFirst();
/**
* The leaves come first.
* This is to be used when deleting entries.
*/
static final Comparator<Path> TOPMOST_PATH_LAST
= new TopmostLast();
/**
* The shallowest paths come first.
* This is to be used when adding entries.
*/
static final Comparator<PathMetadata> TOPMOST_PM_FIRST
= new PathMetadataComparator(TOPMOST_PATH_FIRST);
/**
* The leaves come first.
* This is to be used when deleting entries.
*/
static final Comparator<PathMetadata> TOPMOST_PM_LAST
= new PathMetadataComparator(TOPMOST_PATH_LAST);
private static class TopmostFirst implements Comparator<Path>, Serializable {
@Override
public int compare(Path pathL, Path pathR) {
// exit fast on equal values.
if (pathL.equals(pathR)) {
return 0;
}
int depthL = pathL.depth();
int depthR = pathR.depth();
if (depthL < depthR) {
// left is higher up than the right.
return -1;
}
if (depthR < depthL) {
// right is higher up than the left
return 1;
}
// and if they are of equal depth, use the "classic" comparator
// of paths.
return pathL.compareTo(pathR);
}
}
/**
* Compare the topmost last.
* For some reason the .reverse() option wasn't giving the
* correct outcome.
*/
private static final class TopmostLast extends TopmostFirst {
@Override
public int compare(final Path pathL, final Path pathR) {
int compare = super.compare(pathL, pathR);
if (compare < 0) {
return 1;
}
if (compare > 0) {
return -1;
}
return 0;
}
}
/**
* Compare on path status.
*/
static final class PathMetadataComparator implements
Comparator<PathMetadata>, Serializable {
private final Comparator<Path> inner;
PathMetadataComparator(final Comparator<Path> inner) {
this.inner = inner;
}
@Override
public int compare(final PathMetadata o1, final PathMetadata o2) {
return inner.compare(o1.getFileStatus().getPath(),
o2.getFileStatus().getPath());
}
}
}
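
Because the comparators above are package-private, here is a minimal sketch of the same ordering expressed with a plain Comparator: depth first, then Path.compareTo() for ties, so parents always sort before their children.

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import org.apache.hadoop.fs.Path;

public class TopmostFirstSketch {
  public static void main(String[] args) {
    // Equivalent of TOPMOST_PATH_FIRST: shallower paths sort earlier,
    // equal depths fall back to Path.compareTo (URI ordering).
    Comparator<Path> topmostFirst = (l, r) -> {
      int byDepth = Integer.compare(l.depth(), r.depth());
      return byDepth != 0 ? byDepth : l.compareTo(r);
    };
    List<Path> paths = new ArrayList<>();
    paths.add(new Path("/a/b/c"));
    paths.add(new Path("/a"));
    paths.add(new Path("/a/b"));
    paths.sort(topmostFirst);
    System.out.println(paths);  // [/a, /a/b, /a/b/c]: create parents first
    // Deletions used the inverse order, so leaves are removed before parents.
  }
}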

View File

@ -1,247 +0,0 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.hadoop.fs.s3a.s3guard;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.s3a.S3ObjectAttributes;
import org.apache.hadoop.fs.s3a.impl.StoreContext;
import org.apache.hadoop.util.DurationInfo;
import static org.apache.hadoop.util.Preconditions.checkArgument;
import static org.apache.hadoop.fs.s3a.s3guard.S3Guard.addMoveAncestors;
import static org.apache.hadoop.fs.s3a.s3guard.S3Guard.addMoveDir;
/**
* This rename tracker progressively updates the metadata store
* as it proceeds, during the parallelized copy operation.
* <p>
* Algorithm
* <ol>
* <li>
* As {@code RenameTracker.fileCopied()} callbacks
* are raised, the metastore is updated with the new file entry.
* </li>
* <li>
* Including parent entries, as appropriate.
* </li>
* <li>
* All directories which have been created are tracked locally,
* to avoid needing to read the store; this is a thread-safe structure.
* </li>
* <li>
* The actual update is performed out of any synchronized block.
* </li>
* <li>
* When deletes are executed, the store is also updated.
* </li>
* <li>
* And at the completion of a successful rename, the source directory
* is also removed.
* </li>
* </ol>
* <pre>
*
* </pre>
*/
public class ProgressiveRenameTracker extends RenameTracker {
/**
* The collection of paths to delete; this is added as individual files
* are renamed.
* <p>
* The metastore is only updated with these entries after the DELETE
* call containing these paths succeeds.
* <p>
* If the DELETE fails; the filesystem will use
* {@code MultiObjectDeleteSupport} to remove all successfully deleted
* entries from the metastore.
*/
private final Collection<Path> pathsToDelete = new HashSet<>();
public ProgressiveRenameTracker(
final StoreContext storeContext,
final MetadataStore metadataStore,
final Path sourceRoot,
final Path dest,
final BulkOperationState operationState) {
super("ProgressiveRenameTracker",
storeContext, metadataStore, sourceRoot, dest, operationState);
}
/**
* When a file is copied, any ancestors
* are calculated and then the store is updated with
* the destination entries.
* <p>
* The source entries are added to the {@link #pathsToDelete} list.
* @param sourcePath path of source
* @param sourceAttributes status of source.
* @param destAttributes destination attributes
* @param destPath destination path.
* @param blockSize block size.
* @param addAncestors should ancestors be added?
* @throws IOException failure
*/
@Override
public void fileCopied(
final Path sourcePath,
final S3ObjectAttributes sourceAttributes,
final S3ObjectAttributes destAttributes,
final Path destPath,
final long blockSize,
final boolean addAncestors) throws IOException {
// build the list of entries to add in a synchronized block.
final List<PathMetadata> entriesToAdd = new ArrayList<>(1);
LOG.debug("Updating store with copied file {}", sourcePath);
MetadataStore store = getMetadataStore();
synchronized (this) {
checkArgument(!pathsToDelete.contains(sourcePath),
"File being renamed is already processed %s", destPath);
// create the file metadata and update the lists
// the pathsToDelete field is incremented with the new source path,
// for deletion after the DELETE operation succeeds;
// the entriesToAdd variable is filled in with all entries
// to add within this method
S3Guard.addMoveFile(
store,
pathsToDelete,
entriesToAdd,
sourcePath,
destPath,
sourceAttributes.getLen(),
blockSize,
getOwner(),
destAttributes.getETag(),
destAttributes.getVersionId());
LOG.debug("New metastore entry : {}", entriesToAdd.get(0));
if (addAncestors) {
// add all new ancestors to the lists
addMoveAncestors(
store,
pathsToDelete,
entriesToAdd,
getSourceRoot(),
sourcePath,
destPath,
getOwner());
}
}
// outside the lock, the entriesToAdd variable has all the new entries to
// create. ...so update the store.
// no entries are deleted at this point.
try (DurationInfo ignored = new DurationInfo(LOG, false,
"Adding new metastore entries")) {
store.move(null, entriesToAdd, getOperationState());
}
}
/**
* A directory marker has been added.
* Add the new entry and record the source path as another entry to delete.
* @param sourcePath status of source.
* @param destPath destination path.
* @param addAncestors should ancestors be added?
* @throws IOException failure.
*/
@Override
public void directoryMarkerCopied(
final Path sourcePath,
final Path destPath,
final boolean addAncestors) throws IOException {
// this list is created on demand.
final List<PathMetadata> entriesToAdd = new ArrayList<>(1);
MetadataStore store = getMetadataStore();
synchronized (this) {
addMoveDir(store,
pathsToDelete,
entriesToAdd,
sourcePath,
destPath,
getOwner());
// Ancestor directories may not be listed, so we explicitly add them
if (addAncestors) {
addMoveAncestors(store,
pathsToDelete,
entriesToAdd,
getSourceRoot(),
sourcePath,
destPath,
getOwner());
}
}
// outside the lock, the entriesToAdd list has all new files to create.
// ...so update the store.
try (DurationInfo ignored = new DurationInfo(LOG, false,
"adding %s metastore entries", entriesToAdd.size())) {
store.move(null, entriesToAdd, getOperationState());
}
}
@Override
public synchronized void moveSourceDirectory() throws IOException {
// this moves the source directory in the metastore if it has not
// already been processed.
if (!pathsToDelete.contains(getSourceRoot())) {
final List<Path> toDelete = new ArrayList<>(1);
final List<PathMetadata> toAdd = new ArrayList<>(1);
addMoveDir(getMetadataStore(), pathsToDelete, toAdd,
getSourceRoot(),
getDest(),
getOwner());
getMetadataStore().move(toDelete, toAdd, getOperationState());
}
getMetadataStore().markAsAuthoritative(
getDest(), getOperationState());
}
/**
* As source objects are deleted, so is the list of entries.
* @param paths path of objects deleted.
* @throws IOException failure.
*/
@Override
public void sourceObjectsDeleted(
final Collection<Path> paths) throws IOException {
// delete the paths from the metastore
try (DurationInfo ignored = new DurationInfo(LOG, false,
"delete %s metastore entries", paths.size())) {
getMetadataStore().move(paths, null, getOperationState());
getMetadataStore().deletePaths(paths, getOperationState());
}
}
@Override
public synchronized void completeRename() throws IOException {
// mark dest tree as authoritative all the way down.
// finish off by deleting source directories.
sourceObjectsDeleted(pathsToDelete);
super.completeRename();
}
}

View File

@ -1,258 +0,0 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.hadoop.fs.s3a.s3guard;
import javax.annotation.Nullable;
import java.io.IOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import com.amazonaws.services.dynamodbv2.xspec.ExpressionSpecBuilder;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.apache.commons.lang3.tuple.Pair;
import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.s3a.S3AFileSystem;
import org.apache.hadoop.service.Service;
import org.apache.hadoop.service.launcher.LauncherExitCodes;
import org.apache.hadoop.service.launcher.ServiceLaunchException;
import org.apache.hadoop.service.launcher.ServiceLauncher;
import org.apache.hadoop.util.DurationInfo;
import org.apache.hadoop.util.ExitUtil;
import static org.apache.hadoop.util.Preconditions.checkNotNull;
import static org.apache.hadoop.fs.s3a.s3guard.DumpS3GuardDynamoTable.serviceMain;
import static org.apache.hadoop.fs.s3a.s3guard.PathMetadataDynamoDBTranslation.PARENT;
/**
* Purge the S3Guard table of a FileSystem from all entries related to
* that table.
* Will fail if there is no table, or the store is in auth mode.
* <pre>
* hadoop org.apache.hadoop.fs.s3a.s3guard.PurgeS3GuardDynamoTable \
* -force s3a://example-bucket/
* </pre>
*
*/
@InterfaceAudience.Private
@InterfaceStability.Unstable
public class PurgeS3GuardDynamoTable
extends AbstractS3GuardDynamoDBDiagnostic {
private static final Logger LOG =
LoggerFactory.getLogger(PurgeS3GuardDynamoTable.class);
public static final String NAME = "PurgeS3GuardDynamoTable";
/**
* Name of the force option.
*/
public static final String FORCE = "-force";
/**
* Usage message.
*/
private static final String USAGE_MESSAGE = NAME
+ " [-force] <filesystem>";
/**
* Flag which actually triggers the delete.
*/
private boolean force;
private long filesFound;
private long filesDeleted;
public PurgeS3GuardDynamoTable(final String name) {
super(name);
}
public PurgeS3GuardDynamoTable() {
this(NAME);
}
public PurgeS3GuardDynamoTable(
final S3AFileSystem filesystem,
final DynamoDBMetadataStore store,
final URI uri,
final boolean force) {
super(NAME, filesystem, store, uri);
this.force = force;
}
/**
* Bind to the argument list, including validating the CLI.
* @throws Exception failure.
*/
@Override
protected void serviceStart() throws Exception {
if (getStore() == null) {
List<String> arg = getArgumentList(1, 2, USAGE_MESSAGE);
String fsURI = arg.get(0);
if (arg.size() == 2) {
if (!arg.get(0).equals(FORCE)) {
throw new ServiceLaunchException(LauncherExitCodes.EXIT_USAGE,
USAGE_MESSAGE);
}
force = true;
fsURI = arg.get(1);
}
bindFromCLI(fsURI);
}
}
/**
* Extract the host from the FS URI, then scan and
* delete all entries from that bucket.
* @return the exit code.
* @throws ServiceLaunchException on failure.
* @throws IOException IO failure.
*/
@Override
public int execute() throws ServiceLaunchException, IOException {
URI uri = getUri();
String host = uri.getHost();
String prefix = "/" + host + "/";
DynamoDBMetadataStore ddbms = getStore();
S3GuardTableAccess tableAccess = new S3GuardTableAccess(ddbms);
ExpressionSpecBuilder builder = new ExpressionSpecBuilder();
builder.withKeyCondition(
ExpressionSpecBuilder.S(PARENT).beginsWith(prefix));
LOG.info("Scanning for entries with prefix {} to delete from {}",
prefix, ddbms);
Iterable<DDBPathMetadata> entries =
ddbms.wrapWithRetries(tableAccess.scanMetadata(builder));
List<Path> list = new ArrayList<>();
entries.iterator().forEachRemaining(e -> {
if (!(e instanceof S3GuardTableAccess.VersionMarker)) {
Path p = e.getFileStatus().getPath();
String type = e.getFileStatus().isFile() ? "file" : "directory";
boolean tombstone = e.isDeleted();
if (tombstone) {
type = "tombstone " + type;
}
LOG.info("{} {}", type, p);
list.add(p);
}
});
int count = list.size();
filesFound = count;
LOG.info("Found {} entries{}",
count,
(count == 0 ? " -nothing to purge": ""));
if (count > 0) {
if (force) {
DurationInfo duration =
new DurationInfo(LOG,
"deleting %s entries from %s",
count, ddbms.toString());
// sending this in one by one for more efficient retries
for (Path path: list) {
ddbms.getInvoker()
.retry("delete",
prefix,
true,
() -> tableAccess.delete(path));
}
duration.close();
long durationMillis = duration.value();
long timePerEntry = durationMillis / count;
LOG.info("Time per entry: {} ms", timePerEntry);
filesDeleted = count;
} else {
LOG.info("Delete process will only be executed when "
+ FORCE + " is set");
}
}
return LauncherExitCodes.EXIT_SUCCESS;
}
/**
* This is the Main entry point for the service launcher.
*
* Converts the arguments to a list, instantiates an instance of the class,
* then executes it.
* @param args command line arguments.
*/
public static void main(String[] args) {
try {
serviceMain(Arrays.asList(args), new PurgeS3GuardDynamoTable());
} catch (ExitUtil.ExitException e) {
ExitUtil.terminate(e);
}
}
/**
* API entry point to purge a bucket's entries from the metastore.
* <p>
* The store is scanned for all entries under the bucket; if {@code force}
* is set they are deleted, otherwise they are only counted.
* @param fs filesystem whose store is to be purged. If null a store must be provided.
* @param store store to purge (fallback to FS)
* @param conf configuration to use (fallback to fs)
* @param uri URI of store -only needed if FS is null.
* @param force force the actual delete
* @return (filesFound, filesDeleted)
* @throws ExitUtil.ExitException failure.
*/
@InterfaceAudience.Private
@InterfaceStability.Unstable
public static Pair<Long, Long> purgeStore(
@Nullable final S3AFileSystem fs,
@Nullable DynamoDBMetadataStore store,
@Nullable Configuration conf,
@Nullable URI uri,
boolean force) throws ExitUtil.ExitException {
ServiceLauncher<Service> serviceLauncher =
new ServiceLauncher<>(NAME);
if (conf == null) {
conf = checkNotNull(fs, "No filesystem").getConf();
}
if (store == null) {
store = (DynamoDBMetadataStore) checkNotNull(fs, "No filesystem")
.getMetadataStore();
}
PurgeS3GuardDynamoTable purge = new PurgeS3GuardDynamoTable(fs,
store,
uri,
force);
ExitUtil.ExitException ex = serviceLauncher.launchService(
conf,
purge,
Collections.emptyList(),
false,
true);
if (ex != null && ex.getExitCode() != 0) {
throw ex;
}
return Pair.of(purge.filesFound, purge.filesDeleted);
}
}
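
A hedged sketch of driving the programmatic entry point above; s3a://example-bucket/ is a placeholder, and the bucket is assumed to be bound to a DynamoDB metastore, otherwise purgeStore() will fail.

import org.apache.commons.lang3.tuple.Pair;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.s3a.S3AFileSystem;
import org.apache.hadoop.fs.s3a.s3guard.PurgeS3GuardDynamoTable;

public class PurgeSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    try (S3AFileSystem fs = (S3AFileSystem) FileSystem.newInstance(
        new Path("s3a://example-bucket/").toUri(), conf)) {
      // force=false only scans and counts; force=true actually deletes.
      Pair<Long, Long> outcome = PurgeS3GuardDynamoTable.purgeStore(
          fs, null, conf, fs.getUri(), false);
      System.out.println("found=" + outcome.getLeft()
          + ", deleted=" + outcome.getRight());
    }
  }
}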

View File

@ -1,275 +0,0 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.hadoop.fs.s3a.s3guard;
import java.io.IOException;
import java.util.Collection;
import java.util.List;
import com.amazonaws.SdkBaseException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.s3a.S3ObjectAttributes;
import org.apache.hadoop.fs.s3a.impl.StoreContext;
import org.apache.hadoop.fs.s3a.impl.AbstractStoreOperation;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.DurationInfo;
import static org.apache.hadoop.util.Preconditions.checkNotNull;
import static org.apache.hadoop.fs.s3a.S3AUtils.translateException;
/**
* A class which manages updating the metastore with the rename process
* as initiated in the S3AFilesystem rename.
* <p>
* Subclasses must provide an implementation and return it in
* {@code MetadataStore.initiateRenameOperation()}.
* <p>
* The {@link #operationState} field/constructor argument is an opaque state to
* be passed down to the metastore in its move operations; this allows the
* stores to manage ongoing state -while still being able to share
* rename tracker implementations.
* <p>
* This is to avoid performance problems wherein the progressive rename
* tracker causes the store to repeatedly create and write duplicate
* ancestor entries for every file added.
*/
public abstract class RenameTracker extends AbstractStoreOperation {
public static final Logger LOG = LoggerFactory.getLogger(
RenameTracker.class);
/** source path. */
private final Path sourceRoot;
/** destination path. */
private final Path dest;
/**
* Track the duration of this operation.
*/
private final DurationInfo durationInfo;
/**
* Generated name for strings.
*/
private final String name;
/**
* Any ongoing state supplied to the rename tracker
* which is to be passed in with each move operation.
* This must be closed at the end of the tracker's life.
*/
private final BulkOperationState operationState;
/**
* The metadata store for this tracker.
* Always non-null.
* <p>
* This is passed in separate from the store context to guarantee
* that whichever store creates a tracker is explicitly bound to that
* instance.
*/
private final MetadataStore metadataStore;
/**
* Constructor.
* @param name tracker name for logs.
* @param storeContext store context.
* @param metadataStore the store
* @param sourceRoot source path.
* @param dest destination path.
* @param operationState ongoing move state.
*/
protected RenameTracker(
final String name,
final StoreContext storeContext,
final MetadataStore metadataStore,
final Path sourceRoot,
final Path dest,
final BulkOperationState operationState) {
super(checkNotNull(storeContext));
checkNotNull(storeContext.getUsername(), "No username");
this.metadataStore = checkNotNull(metadataStore);
this.sourceRoot = checkNotNull(sourceRoot);
this.dest = checkNotNull(dest);
this.operationState = operationState;
this.name = String.format("%s (%s, %s)", name, sourceRoot, dest);
durationInfo = new DurationInfo(LOG, false,
name +" (%s, %s)", sourceRoot, dest);
}
@Override
public String toString() {
return name;
}
public Path getSourceRoot() {
return sourceRoot;
}
public Path getDest() {
return dest;
}
public String getOwner() {
return getStoreContext().getUsername();
}
public BulkOperationState getOperationState() {
return operationState;
}
/**
* Get the metadata store.
* @return a non-null store.
*/
protected MetadataStore getMetadataStore() {
return metadataStore;
}
/**
* A file has been copied.
*
* @param childSource source of the file. This may actually be different
* from the path of the sourceAttributes. (HOW?)
* @param sourceAttributes status of source.
* @param destAttributes destination attributes
* @param destPath destination path.
* @param blockSize block size.
* @param addAncestors should ancestors be added?
* @throws IOException failure.
*/
public abstract void fileCopied(
Path childSource,
S3ObjectAttributes sourceAttributes,
S3ObjectAttributes destAttributes,
Path destPath,
long blockSize,
boolean addAncestors) throws IOException;
/**
* A directory marker has been copied.
* @param sourcePath source path.
* @param destPath destination path.
* @param addAncestors should ancestors be added?
* @throws IOException failure.
*/
public void directoryMarkerCopied(
Path sourcePath,
Path destPath,
boolean addAncestors) throws IOException {
}
/**
* The delete failed.
* <p>
* By the time this is called, the metastore will already have
* been updated with the results of any partial delete failure,
* such that all files known to have been deleted will have been
* removed.
* @param e exception
* @param pathsToDelete paths which were to be deleted.
* @param undeletedObjects list of objects which were not deleted.
* @return an IOException to rethrow to the caller.
*/
public IOException deleteFailed(
final Exception e,
final List<Path> pathsToDelete,
final List<Path> undeletedObjects) {
return convertToIOException(e);
}
/**
* Top level directory move.
* This is invoked after all child entries have been copied
* @throws IOException on failure
*/
public void moveSourceDirectory() throws IOException {
}
/**
* Note that source objects have been deleted.
* The metastore will already have been updated.
* @param paths path of objects deleted.
*/
public void sourceObjectsDeleted(
final Collection<Path> paths) throws IOException {
}
/**
* Complete the operation.
* @throws IOException failure.
*/
public void completeRename() throws IOException {
IOUtils.cleanupWithLogger(LOG, operationState);
noteRenameFinished();
}
/**
* Note that the rename has finished by closing the duration info;
* this will log the duration of the operation at debug.
*/
protected void noteRenameFinished() {
durationInfo.close();
}
/**
* Rename has failed.
* <p>
* The metastore now needs to be updated with its current state
* even though the operation is incomplete.
* Implementations MUST NOT throw exceptions here, as this is going to
* be invoked in an exception handler.
* Catch and log, or catch and return/wrap.
* <p>
* The base implementation returns the IOE passed in and translates
* any AWS exception into an IOE.
* @param ex the exception which caused the failure.
* This is either an IOException or an AWS exception.
* @return an IOException to throw from the exception handler.
*/
public IOException renameFailed(Exception ex) {
LOG.debug("Rename has failed", ex);
IOUtils.cleanupWithLogger(LOG, operationState);
noteRenameFinished();
return convertToIOException(ex);
}
/**
* Convert a passed in exception (expected to be an IOE or AWS exception)
* into an IOException.
* @param ex exception caught
* @return the exception to throw in the failure handler.
*/
protected IOException convertToIOException(final Exception ex) {
if (ex instanceof IOException) {
return (IOException) ex;
} else if (ex instanceof SdkBaseException) {
return translateException("rename " + sourceRoot + " to " + dest,
sourceRoot.toString(),
(SdkBaseException) ex);
} else {
// should never happen, but for completeness
return new IOException(ex);
}
}
}

View File

@ -1,126 +0,0 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.hadoop.fs.s3a.s3guard;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import org.apache.hadoop.fs.s3a.Invoker;
import org.apache.hadoop.fs.s3a.Retries;
/**
* A collection which wraps the result of a query or scan
* with retries.
* Important: iterate through this only once; the outcome
* of repeating an iteration is "undefined"
* @param <T> type of outcome.
*/
class RetryingCollection<T> implements Iterable<T> {
/**
* Source iterable.
*/
private final Iterable<T> source;
/**
* Invoker for retries.
*/
private final Invoker invoker;
/**
* Operation name for invoker.retry messages.
*/
private final String operation;
/**
* Constructor.
* @param operation Operation name for invoker.retry messages.
* @param invoker Invoker for retries.
* @param source Source iterable.
*/
RetryingCollection(
final String operation,
final Invoker invoker,
final Iterable<T> source) {
this.operation = operation;
this.source = source;
this.invoker = invoker;
}
/**
* Demand-creates a new iterator which will retry all hasNext/next
* operations through the invoker supplied in the constructor.
* @return a new iterator.
*/
@Override
public Iterator<T> iterator() {
return new RetryingIterator(source.iterator());
}
/**
* An iterator which wraps a non-retrying iterator of scan results
* (i.e. {@code S3GuardTableAccess.DDBPathMetadataIterator}).
*/
private final class RetryingIterator implements Iterator<T> {
private final Iterator<T> iterator;
private RetryingIterator(final Iterator<T> iterator) {
this.iterator = iterator;
}
/**
* {@inheritDoc}.
* @throws UncheckedIOException for IO failure, including throttling.
*/
@Override
@Retries.RetryTranslated
public boolean hasNext() {
try {
return invoker.retry(
operation,
null,
true,
iterator::hasNext);
} catch (IOException e) {
throw new UncheckedIOException(e);
}
}
/**
* {@inheritDoc}.
* @throws UncheckedIOException for IO failure, including throttling.
*/
@Override
@Retries.RetryTranslated
public T next() {
try {
return invoker.retry(
"Scan Dynamo",
null,
true,
iterator::next);
} catch (IOException e) {
throw new UncheckedIOException(e);
}
}
}
}
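
Since RetryingCollection itself was package-private, here is a minimal sketch of the same wrapping idea using the public Invoker API; the retry policy, the Invoker.LOG_EVENT callback and the in-memory source are assumptions for illustration, not part of the class above.

import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Iterator;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.s3a.Invoker;
import org.apache.hadoop.fs.s3a.S3ARetryPolicy;

public class RetryingIterationSketch {
  public static void main(String[] args) {
    Invoker invoker = new Invoker(new S3ARetryPolicy(new Configuration()),
        Invoker.LOG_EVENT);
    Iterator<String> source = Arrays.asList("a", "b", "c").iterator();
    // Wrap hasNext()/next() in the invoker so transient failures such as
    // throttling are retried instead of surfacing to the caller.
    Iterator<String> retrying = new Iterator<String>() {
      @Override
      public boolean hasNext() {
        try {
          return invoker.retry("hasNext", null, true, source::hasNext);
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      }

      @Override
      public String next() {
        try {
          return invoker.retry("next", null, true, source::next);
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      }
    };
    retrying.forEachRemaining(System.out::println);
  }
}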

View File

@ -1,47 +0,0 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.hadoop.fs.s3a.s3guard;
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.s3a.S3ARetryPolicy;
import org.apache.hadoop.io.retry.RetryPolicy;
import static org.apache.hadoop.fs.s3a.Constants.*;
import static org.apache.hadoop.io.retry.RetryPolicies.exponentialBackoffRetry;
/**
* A Retry policy whose throttling comes from the S3Guard config options.
*/
public class S3GuardDataAccessRetryPolicy extends S3ARetryPolicy {
public S3GuardDataAccessRetryPolicy(final Configuration conf) {
super(conf);
}
protected RetryPolicy createThrottleRetryPolicy(final Configuration conf) {
return exponentialBackoffRetry(
conf.getInt(S3GUARD_DDB_MAX_RETRIES, S3GUARD_DDB_MAX_RETRIES_DEFAULT),
conf.getTimeDuration(S3GUARD_DDB_THROTTLE_RETRY_INTERVAL,
S3GUARD_DDB_THROTTLE_RETRY_INTERVAL_DEFAULT,
TimeUnit.MILLISECONDS),
TimeUnit.MILLISECONDS);
}
}
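
A small sketch of tuning the throttle retry settings this policy reads from the configuration; the numbers are illustrative, not recommendations.

import java.util.concurrent.TimeUnit;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.s3a.Constants;
import org.apache.hadoop.fs.s3a.s3guard.S3GuardDataAccessRetryPolicy;

public class ThrottleRetrySketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Illustrative values: 5 attempts, 200ms base interval for the
    // exponential backoff used on DynamoDB throttling.
    conf.setInt(Constants.S3GUARD_DDB_MAX_RETRIES, 5);
    conf.setTimeDuration(Constants.S3GUARD_DDB_THROTTLE_RETRY_INTERVAL,
        200, TimeUnit.MILLISECONDS);
    S3GuardDataAccessRetryPolicy policy =
        new S3GuardDataAccessRetryPolicy(conf);
    System.out.println(policy);
  }
}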

View File

@ -1,764 +0,0 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
* <p>
* http://www.apache.org/licenses/LICENSE-2.0
* <p>
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.hadoop.fs.s3a.s3guard;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.security.InvalidParameterException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.TimeUnit;
import com.amazonaws.services.dynamodbv2.document.Item;
import com.amazonaws.services.dynamodbv2.document.PrimaryKey;
import com.amazonaws.services.dynamodbv2.document.ScanOutcome;
import com.amazonaws.services.dynamodbv2.document.Table;
import com.amazonaws.services.dynamodbv2.document.internal.IteratorSupport;
import com.amazonaws.services.dynamodbv2.document.spec.GetItemSpec;
import com.amazonaws.services.dynamodbv2.xspec.ExpressionSpecBuilder;
import org.apache.hadoop.util.Preconditions;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.s3a.S3AFileStatus;
import org.apache.hadoop.fs.s3a.S3AFileSystem;
import org.apache.hadoop.fs.s3a.Tristate;
import org.apache.hadoop.util.StopWatch;
import static java.util.stream.Collectors.toList;
import static java.util.stream.Collectors.toSet;
import static org.apache.hadoop.fs.s3a.s3guard.PathMetadataDynamoDBTranslation.itemToPathMetadata;
import static org.apache.hadoop.fs.s3a.s3guard.PathMetadataDynamoDBTranslation.pathToKey;
import static org.apache.hadoop.fs.s3a.s3guard.PathMetadataDynamoDBTranslation.pathToParentKey;
/**
* Main class for the FSCK, factored out from S3GuardTool.
* The implementation uses a fixed DynamoDBMetadataStore as the backing store
* for metadata.
*
* Functions:
* <ul>
* <li>Checking metadata consistency between S3 and metadatastore</li>
* <li>Checking the internal metadata consistency</li>
* </ul>
*/
public class S3GuardFsck {
private static final Logger LOG = LoggerFactory.getLogger(S3GuardFsck.class);
public static final String ROOT_PATH_STRING = "/";
private final S3AFileSystem rawFS;
private final DynamoDBMetadataStore metadataStore;
private static final long MOD_TIME_RANGE = 2000L;
/**
* Creates an S3GuardFsck.
* @param fs the filesystem to compare to
* @param ms the metadatastore to compare with (DynamoDB)
*/
public S3GuardFsck(S3AFileSystem fs, MetadataStore ms)
throws InvalidParameterException {
this.rawFS = fs;
if (ms == null) {
throw new InvalidParameterException("S3A Bucket " + fs.getBucket()
+ " should be guarded by a "
+ DynamoDBMetadataStore.class.getCanonicalName());
}
this.metadataStore = (DynamoDBMetadataStore) ms;
Preconditions.checkArgument(!rawFS.hasMetadataStore(),
"Raw fs should not have a metadatastore.");
}
/**
* Compares S3 to MS.
* Performs an iterative breadth-first walk of the S3 structure from a given root.
* Creates a list of pairs (metadata in S3 and in the MetadataStore) where
* consistency or any other rule is violated.
* Uses {@link S3GuardFsckViolationHandler} to handle violations.
* The violations are listed in the {@link Violation} enum.
*
* @param p the root path to start the traversal
* @return a list of {@link ComparePair}
* @throws IOException
*/
public List<ComparePair> compareS3ToMs(Path p) throws IOException {
StopWatch stopwatch = new StopWatch();
stopwatch.start();
int scannedItems = 0;
final Path rootPath = rawFS.qualify(p);
S3AFileStatus root = (S3AFileStatus) rawFS.getFileStatus(rootPath);
final List<ComparePair> comparePairs = new ArrayList<>();
final Queue<S3AFileStatus> queue = new ArrayDeque<>();
queue.add(root);
while (!queue.isEmpty()) {
final S3AFileStatus currentDir = queue.poll();
final Path currentDirPath = currentDir.getPath();
try {
List<FileStatus> s3DirListing = Arrays.asList(
rawFS.listStatus(currentDirPath));
// Check authoritative directory flag.
compareAuthoritativeDirectoryFlag(comparePairs, currentDirPath,
s3DirListing);
// Add all descendant directories to the queue
s3DirListing.stream().filter(pm -> pm.isDirectory())
.map(S3AFileStatus.class::cast)
.forEach(pm -> queue.add(pm));
// Check file and directory metadata for consistency.
final List<S3AFileStatus> children = s3DirListing.stream()
.filter(status -> !status.isDirectory())
.map(S3AFileStatus.class::cast).collect(toList());
final List<ComparePair> compareResult =
compareS3DirContentToMs(currentDir, children);
comparePairs.addAll(compareResult);
// Increase the scanned item count:
// one for the directory, plus the number of children.
scannedItems++;
scannedItems += children.size();
} catch (FileNotFoundException e) {
LOG.error("The path has been deleted since it was queued: "
+ currentDirPath, e);
}
}
stopwatch.stop();
// Create a handler and handle each violated pairs
S3GuardFsckViolationHandler handler =
new S3GuardFsckViolationHandler(rawFS, metadataStore);
for (ComparePair comparePair : comparePairs) {
handler.logError(comparePair);
}
LOG.info("Total scan time: {}s", stopwatch.now(TimeUnit.SECONDS));
LOG.info("Scanned entries: {}", scannedItems);
return comparePairs;
}
/**
* Compare the directory contents if the listing is authoritative.
*
* @param comparePairs the list of compare pairs to add to
* if it contains a violation
* @param currentDirPath the current directory path
* @param s3DirListing the s3 directory listing to compare with
* @throws IOException
*/
private void compareAuthoritativeDirectoryFlag(List<ComparePair> comparePairs,
Path currentDirPath, List<FileStatus> s3DirListing) throws IOException {
final DirListingMetadata msDirListing =
metadataStore.listChildren(currentDirPath);
if (msDirListing != null && msDirListing.isAuthoritative()) {
ComparePair cP = new ComparePair(s3DirListing, msDirListing);
if (s3DirListing.size() != msDirListing.numEntries()) {
cP.violations.add(Violation.AUTHORITATIVE_DIRECTORY_CONTENT_MISMATCH);
} else {
final Set<Path> msPaths = msDirListing.getListing().stream()
.map(pm -> pm.getFileStatus().getPath()).collect(toSet());
final Set<Path> s3Paths = s3DirListing.stream()
.map(pm -> pm.getPath()).collect(toSet());
if (!s3Paths.equals(msPaths)) {
cP.violations.add(Violation.AUTHORITATIVE_DIRECTORY_CONTENT_MISMATCH);
}
}
if (cP.containsViolation()) {
comparePairs.add(cP);
}
}
}
/**
* Compares S3 directory content to the metadata store.
*
* @param s3CurrentDir file status of the current directory
* @param children the contents of the directory
* @return the compare pairs with violations of consistency
* @throws IOException
*/
protected List<ComparePair> compareS3DirContentToMs(
S3AFileStatus s3CurrentDir,
List<S3AFileStatus> children) throws IOException {
final Path path = s3CurrentDir.getPath();
final PathMetadata pathMetadata = metadataStore.get(path);
List<ComparePair> violationComparePairs = new ArrayList<>();
final ComparePair rootComparePair =
compareFileStatusToPathMetadata(s3CurrentDir, pathMetadata);
if (rootComparePair.containsViolation()) {
violationComparePairs.add(rootComparePair);
}
children.forEach(s3ChildMeta -> {
try {
final PathMetadata msChildMeta =
metadataStore.get(s3ChildMeta.getPath());
final ComparePair comparePair =
compareFileStatusToPathMetadata(s3ChildMeta, msChildMeta);
if (comparePair.containsViolation()) {
violationComparePairs.add(comparePair);
}
} catch (Exception e) {
LOG.error(e.getMessage(), e);
}
});
return violationComparePairs;
}
/**
* Compares a {@link S3AFileStatus} from S3 to a {@link PathMetadata}
* from the metadata store. Finds violated invariants and consistency
* issues.
*
* @param s3FileStatus the file status from S3
* @param msPathMetadata the path metadata from metadatastore
* @return {@link ComparePair} with the found issues
* @throws IOException
*/
protected ComparePair compareFileStatusToPathMetadata(
S3AFileStatus s3FileStatus,
PathMetadata msPathMetadata) throws IOException {
final Path path = s3FileStatus.getPath();
if (msPathMetadata != null) {
LOG.info("Path: {} - Length S3: {}, MS: {} " +
"- Etag S3: {}, MS: {} ",
path,
s3FileStatus.getLen(), msPathMetadata.getFileStatus().getLen(),
s3FileStatus.getETag(), msPathMetadata.getFileStatus().getETag());
} else {
LOG.info("Path: {} - Length S3: {} - Etag S3: {}, no record in MS.",
path, s3FileStatus.getLen(), s3FileStatus.getETag());
}
ComparePair comparePair = new ComparePair(s3FileStatus, msPathMetadata);
if (!path.equals(path(ROOT_PATH_STRING))) {
final Path parentPath = path.getParent();
final PathMetadata parentPm = metadataStore.get(parentPath);
if (parentPm == null) {
comparePair.violations.add(Violation.NO_PARENT_ENTRY);
} else {
if (!parentPm.getFileStatus().isDirectory()) {
comparePair.violations.add(Violation.PARENT_IS_A_FILE);
}
if (parentPm.isDeleted()) {
comparePair.violations.add(Violation.PARENT_TOMBSTONED);
}
}
} else {
LOG.debug("Entry is in the root directory, so there's no parent");
}
// If the msPathMetadata is null, we RETURN because
// there is no metadata to compare with
if (msPathMetadata == null) {
comparePair.violations.add(Violation.NO_METADATA_ENTRY);
return comparePair;
}
final S3AFileStatus msFileStatus = msPathMetadata.getFileStatus();
if (s3FileStatus.isDirectory() && !msFileStatus.isDirectory()) {
comparePair.violations.add(Violation.DIR_IN_S3_FILE_IN_MS);
}
if (!s3FileStatus.isDirectory() && msFileStatus.isDirectory()) {
comparePair.violations.add(Violation.FILE_IN_S3_DIR_IN_MS);
}
if (msPathMetadata.isDeleted()) {
comparePair.violations.add(Violation.TOMBSTONED_IN_MS_NOT_DELETED_IN_S3);
}
// Attribute check
if (s3FileStatus.getLen() != msFileStatus.getLen()) {
comparePair.violations.add(Violation.LENGTH_MISMATCH);
}
// ModTime should be in the accuracy range defined.
long modTimeDiff = Math.abs(
s3FileStatus.getModificationTime() - msFileStatus.getModificationTime()
);
if (modTimeDiff > MOD_TIME_RANGE) {
comparePair.violations.add(Violation.MOD_TIME_MISMATCH);
}
if (msPathMetadata.getFileStatus().getVersionId() == null
|| s3FileStatus.getVersionId() == null) {
LOG.debug("Missing versionIDs skipped. A HEAD request is "
+ "required for each object to get the versionID.");
} else if (!s3FileStatus.getVersionId().equals(msFileStatus.getVersionId())) {
comparePair.violations.add(Violation.VERSIONID_MISMATCH);
}
// check etag only for files, and not directories
if (!s3FileStatus.isDirectory()) {
if (msPathMetadata.getFileStatus().getETag() == null) {
comparePair.violations.add(Violation.NO_ETAG);
} else if (s3FileStatus.getETag() != null &&
!s3FileStatus.getETag().equals(msFileStatus.getETag())) {
comparePair.violations.add(Violation.ETAG_MISMATCH);
}
}
return comparePair;
}
private Path path(String s) {
return rawFS.makeQualified(new Path(s));
}
/**
* Fix violations found during check.
*
* Currently only supports handling the following violation:
* - Violation.ORPHAN_DDB_ENTRY
*
* @param violations to be handled
* @throws IOException if an error is raised while handling a violation
*/
public void fixViolations(List<ComparePair> violations) throws IOException {
S3GuardFsckViolationHandler handler =
new S3GuardFsckViolationHandler(rawFS, metadataStore);
for (ComparePair v : violations) {
if (v.getViolations().contains(Violation.ORPHAN_DDB_ENTRY)) {
try {
handler.doFix(v);
} catch (IOException e) {
LOG.error("Error during handling the violation: ", e);
throw e;
}
}
}
}
/**
* A compare pair with the pair of metadata and the list of violations.
*/
public static class ComparePair {
private final S3AFileStatus s3FileStatus;
private final PathMetadata msPathMetadata;
private final List<FileStatus> s3DirListing;
private final DirListingMetadata msDirListing;
private final Path path;
private final Set<Violation> violations = new HashSet<>();
ComparePair(S3AFileStatus status, PathMetadata pm) {
this.s3FileStatus = status;
this.msPathMetadata = pm;
this.s3DirListing = null;
this.msDirListing = null;
if (status != null) {
this.path = status.getPath();
} else {
this.path = pm.getFileStatus().getPath();
}
}
ComparePair(List<FileStatus> s3DirListing, DirListingMetadata msDirListing) {
this.s3DirListing = s3DirListing;
this.msDirListing = msDirListing;
this.s3FileStatus = null;
this.msPathMetadata = null;
this.path = msDirListing.getPath();
}
public S3AFileStatus getS3FileStatus() {
return s3FileStatus;
}
public PathMetadata getMsPathMetadata() {
return msPathMetadata;
}
public Set<Violation> getViolations() {
return violations;
}
public boolean containsViolation() {
return !violations.isEmpty();
}
public DirListingMetadata getMsDirListing() {
return msDirListing;
}
public List<FileStatus> getS3DirListing() {
return s3DirListing;
}
public Path getPath() {
return path;
}
@Override public String toString() {
return "ComparePair{" + "s3FileStatus=" + s3FileStatus
+ ", msPathMetadata=" + msPathMetadata + ", s3DirListing=" +
s3DirListing + ", msDirListing=" + msDirListing + ", path="
+ path + ", violations=" + violations + '}';
}
}
/**
* Check the DynamoDB metadatastore internally for consistency.
* <pre>
* Tasks to do here:
* - find orphan entries (entries without a parent).
* - find if a file's parent is not a directory (so the parent is a file).
* - find entries where the parent is a tombstone.
* - warn: no lastUpdated field.
* </pre>
*/
public List<ComparePair> checkDdbInternalConsistency(Path basePath)
throws IOException {
Preconditions.checkArgument(basePath.isAbsolute(), "path must be absolute");
List<ComparePair> comparePairs = new ArrayList<>();
String rootStr = basePath.toString();
LOG.info("Root for internal consistency check: {}", rootStr);
StopWatch stopwatch = new StopWatch();
stopwatch.start();
final Table table = metadataStore.getTable();
final String username = metadataStore.getUsername();
DDBTree ddbTree = new DDBTree();
/*
* I. Root node construction
* - If the root node is the real bucket root, a node is constructed instead of
* doing a query to the ddb because the bucket root is not stored.
* - If the root node is not a real bucket root then the entry is queried from
* the ddb and constructed from the result.
*/
DDBPathMetadata baseMeta;
if (!basePath.isRoot()) {
PrimaryKey rootKey = pathToKey(basePath);
final GetItemSpec spec = new GetItemSpec()
.withPrimaryKey(rootKey)
.withConsistentRead(true);
final Item baseItem = table.getItem(spec);
baseMeta = itemToPathMetadata(baseItem, username);
if (baseMeta == null) {
throw new FileNotFoundException(
"Base element metadata is null. " +
"This means the base path element is missing, or wrong path was " +
"passed as base path to the internal ddb consistency checker.");
}
} else {
baseMeta = new DDBPathMetadata(
new S3AFileStatus(Tristate.UNKNOWN, basePath, username)
);
}
DDBTreeNode root = new DDBTreeNode(baseMeta);
ddbTree.addNode(root);
ddbTree.setRoot(root);
/*
* II. Build and check the descendant tree:
* 1. Query all nodes where the prefix is the given root, and put them in the tree
* 2. Check connectivity: check if each parent is in the hashmap
* - This is done in O(n): we only need to find the parent based on the
* path with a hashmap lookup.
* - Do a test if the graph is connected - if the parent is not in the
* hashmap, we found an orphan entry.
*
* 3. Test the elements for errors:
* - File is a parent of a file.
* - Entries where the parent is tombstoned but the entries are not.
* - Warn on no lastUpdated field.
*
*/
ExpressionSpecBuilder builder = new ExpressionSpecBuilder();
builder.withCondition(
ExpressionSpecBuilder.S("parent")
.beginsWith(pathToParentKey(basePath))
);
final IteratorSupport<Item, ScanOutcome> resultIterator = table.scan(
builder.buildForScan()).iterator();
resultIterator.forEachRemaining(item -> {
final DDBPathMetadata pmd = itemToPathMetadata(item, username);
DDBTreeNode ddbTreeNode = new DDBTreeNode(pmd);
ddbTree.addNode(ddbTreeNode);
});
LOG.debug("Root: {}", ddbTree.getRoot());
for (Map.Entry<Path, DDBTreeNode> entry : ddbTree.getContentMap().entrySet()) {
final DDBTreeNode node = entry.getValue();
final ComparePair pair = new ComparePair(null, node.val);
// let's skip the root node when checking.
if (node.getVal().getFileStatus().getPath().isRoot()) {
continue;
}
if (node.getVal().getLastUpdated() == 0) {
pair.violations.add(Violation.NO_LASTUPDATED_FIELD);
}
// skip further checks on the base node, which is not the actual bucket root.
if (node.equals(ddbTree.getRoot())) {
continue;
}
final Path parent = node.getFileStatus().getPath().getParent();
final DDBTreeNode parentNode = ddbTree.getContentMap().get(parent);
if (parentNode == null) {
pair.violations.add(Violation.ORPHAN_DDB_ENTRY);
} else {
if (!node.isTombstoned() && !parentNode.isDirectory()) {
pair.violations.add(Violation.PARENT_IS_A_FILE);
}
if (!node.isTombstoned() && parentNode.isTombstoned()) {
pair.violations.add(Violation.PARENT_TOMBSTONED);
}
}
if (!pair.violations.isEmpty()) {
comparePairs.add(pair);
}
node.setParent(parentNode);
}
// Create a handler and log each pair that contains violations
S3GuardFsckViolationHandler handler =
new S3GuardFsckViolationHandler(rawFS, metadataStore);
for (ComparePair comparePair : comparePairs) {
handler.logError(comparePair);
}
stopwatch.stop();
LOG.info("Total scan time: {}s", stopwatch.now(TimeUnit.SECONDS));
LOG.info("Scanned entries: {}", ddbTree.contentMap.size());
return comparePairs;
}
/**
* DDBTree is the tree that represents the structure of items in DynamoDB.
*/
public static class DDBTree {
private final Map<Path, DDBTreeNode> contentMap = new HashMap<>();
private DDBTreeNode root;
public DDBTree() {
}
public Map<Path, DDBTreeNode> getContentMap() {
return contentMap;
}
public DDBTreeNode getRoot() {
return root;
}
public void setRoot(DDBTreeNode root) {
this.root = root;
}
public void addNode(DDBTreeNode pm) {
contentMap.put(pm.getVal().getFileStatus().getPath(), pm);
}
@Override
public String toString() {
return "DDBTree{" +
"contentMap=" + contentMap +
", root=" + root +
'}';
}
}
/**
* Tree node for DDBTree.
*/
private static final class DDBTreeNode {
private final DDBPathMetadata val;
private DDBTreeNode parent;
private final List<DDBPathMetadata> children;
private DDBTreeNode(DDBPathMetadata pm) {
this.val = pm;
this.parent = null;
this.children = new ArrayList<>();
}
public DDBPathMetadata getVal() {
return val;
}
public DDBTreeNode getParent() {
return parent;
}
public void setParent(DDBTreeNode parent) {
this.parent = parent;
}
public List<DDBPathMetadata> getChildren() {
return children;
}
public boolean isDirectory() {
return val.getFileStatus().isDirectory();
}
public S3AFileStatus getFileStatus() {
return val.getFileStatus();
}
public boolean isTombstoned() {
return val.isDeleted();
}
@Override
public String toString() {
return "DDBTreeNode{" +
"val=" + val +
", parent=" + parent +
", children=" + children +
'}';
}
}
/**
* Violation with severity and the handler.
* Defines the severity of the violation between 0-2
* where 0 is the most severe and 2 is the least severe.
*/
public enum Violation {
/**
* No entry in metadatastore.
*/
NO_METADATA_ENTRY(1,
S3GuardFsckViolationHandler.NoMetadataEntry.class),
/**
* A file or directory entry does not have a parent entry - excluding
* files and directories in the root.
*/
NO_PARENT_ENTRY(0,
S3GuardFsckViolationHandler.NoParentEntry.class),
/**
* An entry's parent is a file.
*/
PARENT_IS_A_FILE(0,
S3GuardFsckViolationHandler.ParentIsAFile.class),
/**
* A file exists under a path for which there is a
* tombstone entry in the MS.
*/
PARENT_TOMBSTONED(0,
S3GuardFsckViolationHandler.ParentTombstoned.class),
/**
* A directory in S3 is a file entry in the MS.
*/
DIR_IN_S3_FILE_IN_MS(0,
S3GuardFsckViolationHandler.DirInS3FileInMs.class),
/**
* A file in S3 is a directory in the MS.
*/
FILE_IN_S3_DIR_IN_MS(0,
S3GuardFsckViolationHandler.FileInS3DirInMs.class),
AUTHORITATIVE_DIRECTORY_CONTENT_MISMATCH(1,
S3GuardFsckViolationHandler.AuthDirContentMismatch.class),
/**
* An entry in the MS is tombstoned, but the object is not deleted on S3.
*/
TOMBSTONED_IN_MS_NOT_DELETED_IN_S3(0,
S3GuardFsckViolationHandler.TombstonedInMsNotDeletedInS3.class),
/**
* Attribute mismatch.
*/
LENGTH_MISMATCH(0,
S3GuardFsckViolationHandler.LengthMismatch.class),
MOD_TIME_MISMATCH(2,
S3GuardFsckViolationHandler.ModTimeMismatch.class),
/**
* If there's a versionID the mismatch is severe.
*/
VERSIONID_MISMATCH(0,
S3GuardFsckViolationHandler.VersionIdMismatch.class),
/**
* If there's an etag the mismatch is severe.
*/
ETAG_MISMATCH(0,
S3GuardFsckViolationHandler.EtagMismatch.class),
/**
* Don't worry too much if we don't have an etag.
*/
NO_ETAG(2,
S3GuardFsckViolationHandler.NoEtag.class),
/**
* The entry does not have a parent in ddb.
*/
ORPHAN_DDB_ENTRY(0, S3GuardFsckViolationHandler.OrphanDDBEntry.class),
/**
* The entry's lastUpdated field is empty.
*/
NO_LASTUPDATED_FIELD(2,
S3GuardFsckViolationHandler.NoLastUpdatedField.class);
private final int severity;
private final Class<? extends S3GuardFsckViolationHandler.ViolationHandler> handler;
Violation(int s,
Class<? extends S3GuardFsckViolationHandler.ViolationHandler> h) {
this.severity = s;
this.handler = h;
}
public int getSeverity() {
return severity;
}
public Class<? extends S3GuardFsckViolationHandler.ViolationHandler> getHandler() {
return handler;
}
}
}
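For reference, the severity values defined in the `Violation` enum above (0 = most severe, 2 = least severe) lend themselves to post-filtering of fsck results. A minimal, hypothetical sketch (the helper name `filterBySeverity` and its `maxSeverity` parameter are not from the original sources) using only the `ComparePair` and `Violation` accessors shown above:

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not in the original sources: keep only the compare
// pairs holding at least one violation at or below the given severity
// threshold (0 = most severe).
static List<S3GuardFsck.ComparePair> filterBySeverity(
    List<S3GuardFsck.ComparePair> pairs, int maxSeverity) {
  return pairs.stream()
      .filter(p -> p.getViolations().stream()
          .anyMatch(v -> v.getSeverity() <= maxSeverity))
      .collect(Collectors.toList());
}
```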

View File

@ -1,425 +0,0 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
* <p>
* http://www.apache.org/licenses/LICENSE-2.0
* <p>
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.hadoop.fs.s3a.s3guard;
import java.io.IOException;
import java.lang.reflect.InvocationTargetException;
import java.util.Arrays;
import java.util.List;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.s3a.S3AFileStatus;
import org.apache.hadoop.fs.s3a.S3AFileSystem;
/**
* Violation handler for the S3Guard's fsck.
*/
public class S3GuardFsckViolationHandler {
private static final Logger LOG = LoggerFactory.getLogger(
S3GuardFsckViolationHandler.class);
// The rawFs and metadataStore are kept here in preparation for the time when
// ViolationHandlers will not just log but also fix the violations, so that
// they will have access to both.
private final S3AFileSystem rawFs;
private final DynamoDBMetadataStore metadataStore;
private static final String newLine = System.getProperty("line.separator");
public enum HandleMode {
FIX, LOG
}
public S3GuardFsckViolationHandler(S3AFileSystem fs,
DynamoDBMetadataStore ddbms) {
this.metadataStore = ddbms;
this.rawFs = fs;
}
public void logError(S3GuardFsck.ComparePair comparePair) throws IOException {
if (!comparePair.containsViolation()) {
LOG.debug("There is no violation in the compare pair: {}", comparePair);
return;
}
StringBuilder sB = new StringBuilder();
sB.append(newLine)
.append("On path: ").append(comparePair.getPath()).append(newLine);
handleComparePair(comparePair, sB, HandleMode.LOG);
LOG.error(sB.toString());
}
public void doFix(S3GuardFsck.ComparePair comparePair) throws IOException {
if (!comparePair.containsViolation()) {
LOG.debug("There is no violation in the compare pair: {}", comparePair);
return;
}
StringBuilder sB = new StringBuilder();
sB.append(newLine)
.append("On path: ").append(comparePair.getPath()).append(newLine);
handleComparePair(comparePair, sB, HandleMode.FIX);
LOG.info(sB.toString());
}
/**
* Create a new instance of the relevant violation handler for each
* violation found in the compare pair and use it.
*
* @param comparePair the compare pair with violations
* @param sB StringBuilder to append error strings from violations.
* @param handleMode whether the violations should be logged or fixed.
* @throws IOException if handling a violation raises an error.
*/
protected void handleComparePair(S3GuardFsck.ComparePair comparePair,
StringBuilder sB, HandleMode handleMode) throws IOException {
for (S3GuardFsck.Violation violation : comparePair.getViolations()) {
try {
ViolationHandler handler = violation.getHandler()
.getDeclaredConstructor(S3GuardFsck.ComparePair.class)
.newInstance(comparePair);
switch (handleMode) {
case FIX:
final String fixStr = handler.fixViolation(rawFs, metadataStore);
sB.append(fixStr);
break;
case LOG:
final String errorStr = handler.getError();
sB.append(errorStr);
break;
default:
throw new UnsupportedOperationException("Unknown handleMode: " + handleMode);
}
} catch (NoSuchMethodException e) {
LOG.error("Can not find declared constructor for handler: {}",
violation.getHandler());
} catch (IllegalAccessException | InstantiationException | InvocationTargetException e) {
LOG.error("Can not instantiate handler: {}",
violation.getHandler());
}
sB.append(newLine);
}
}
/**
* Violation handler abstract class.
* This class should be extended for violation handlers.
*/
public static abstract class ViolationHandler {
private final PathMetadata pathMetadata;
private final S3AFileStatus s3FileStatus;
private final S3AFileStatus msFileStatus;
private final List<FileStatus> s3DirListing;
private final DirListingMetadata msDirListing;
public ViolationHandler(S3GuardFsck.ComparePair comparePair) {
pathMetadata = comparePair.getMsPathMetadata();
s3FileStatus = comparePair.getS3FileStatus();
if (pathMetadata != null) {
msFileStatus = pathMetadata.getFileStatus();
} else {
msFileStatus = null;
}
s3DirListing = comparePair.getS3DirListing();
msDirListing = comparePair.getMsDirListing();
}
public abstract String getError();
public PathMetadata getPathMetadata() {
return pathMetadata;
}
public S3AFileStatus getS3FileStatus() {
return s3FileStatus;
}
public S3AFileStatus getMsFileStatus() {
return msFileStatus;
}
public List<FileStatus> getS3DirListing() {
return s3DirListing;
}
public DirListingMetadata getMsDirListing() {
return msDirListing;
}
public String fixViolation(S3AFileSystem fs,
DynamoDBMetadataStore ddbms) throws IOException {
return String.format("Fixing of violation: %s is not supported yet.",
this.getClass().getSimpleName());
}
}
/**
* The violation handler when there's no matching metadata entry in the MS.
*/
public static class NoMetadataEntry extends ViolationHandler {
public NoMetadataEntry(S3GuardFsck.ComparePair comparePair) {
super(comparePair);
}
@Override
public String getError() {
return "No PathMetadata for this path in the MS.";
}
}
/**
* The violation handler when there's no parent entry.
*/
public static class NoParentEntry extends ViolationHandler {
public NoParentEntry(S3GuardFsck.ComparePair comparePair) {
super(comparePair);
}
@Override
public String getError() {
return "Entry does not have a parent entry (not root)";
}
}
/**
* The violation handler when the parent of an entry is a file.
*/
public static class ParentIsAFile extends ViolationHandler {
public ParentIsAFile(S3GuardFsck.ComparePair comparePair) {
super(comparePair);
}
@Override
public String getError() {
return "The entry's parent in the metastore database is a file.";
}
}
/**
* The violation handler when the parent of an entry is tombstoned.
*/
public static class ParentTombstoned extends ViolationHandler {
public ParentTombstoned(S3GuardFsck.ComparePair comparePair) {
super(comparePair);
}
@Override
public String getError() {
return "The entry in the metastore database has a parent entry " +
"which is a tombstone marker";
}
}
/**
* The violation handler when a directory in S3 is a file entry in the MS.
*/
public static class DirInS3FileInMs extends ViolationHandler {
public DirInS3FileInMs(S3GuardFsck.ComparePair comparePair) {
super(comparePair);
}
@Override
public String getError() {
return "A directory in S3 is a file entry in the MS";
}
}
/**
* The violation handler when a file in S3 is a directory entry in the MS.
*/
public static class FileInS3DirInMs extends ViolationHandler {
public FileInS3DirInMs(S3GuardFsck.ComparePair comparePair) {
super(comparePair);
}
@Override
public String getError() {
return "A file in S3 is a directory entry in the MS";
}
}
/**
* The violation handler when there's a directory listing content mismatch.
*/
public static class AuthDirContentMismatch extends ViolationHandler {
public AuthDirContentMismatch(S3GuardFsck.ComparePair comparePair) {
super(comparePair);
}
@Override
public String getError() {
final String str = String.format(
"The content of an authoritative directory listing does "
+ "not match the content of the S3 listing. S3: %s, MS: %s",
Arrays.asList(getS3DirListing()), getMsDirListing().getListing());
return str;
}
}
/**
* The violation handler when there's a length mismatch.
*/
public static class LengthMismatch extends ViolationHandler {
public LengthMismatch(S3GuardFsck.ComparePair comparePair) {
super(comparePair);
}
@Override public String getError() {
return String.format("File length mismatch - S3: %s, MS: %s",
getS3FileStatus().getLen(), getMsFileStatus().getLen());
}
}
/**
* The violation handler when there's a modtime mismatch.
*/
public static class ModTimeMismatch extends ViolationHandler {
public ModTimeMismatch(S3GuardFsck.ComparePair comparePair) {
super(comparePair);
}
@Override
public String getError() {
return String.format("File timestamp mismatch - S3: %s, MS: %s",
getS3FileStatus().getModificationTime(),
getMsFileStatus().getModificationTime());
}
}
/**
* The violation handler when there's a version id mismatch.
*/
public static class VersionIdMismatch extends ViolationHandler {
public VersionIdMismatch(S3GuardFsck.ComparePair comparePair) {
super(comparePair);
}
@Override
public String getError() {
return String.format("getVersionId mismatch - S3: %s, MS: %s",
getS3FileStatus().getVersionId(), getMsFileStatus().getVersionId());
}
}
/**
* The violation handler when there's an etag mismatch.
*/
public static class EtagMismatch extends ViolationHandler {
public EtagMismatch(S3GuardFsck.ComparePair comparePair) {
super(comparePair);
}
@Override
public String getError() {
return String.format("Etag mismatch - S3: %s, MS: %s",
getS3FileStatus().getETag(), getMsFileStatus().getETag());
}
}
/**
* The violation handler when there's no etag.
*/
public static class NoEtag extends ViolationHandler {
public NoEtag(S3GuardFsck.ComparePair comparePair) {
super(comparePair);
}
@Override
public String getError() {
return "No etag.";
}
}
/**
* The violation handler when an entry is tombstoned in the MS,
* but the object is not deleted in S3.
*/
public static class TombstonedInMsNotDeletedInS3 extends ViolationHandler {
public TombstonedInMsNotDeletedInS3(S3GuardFsck.ComparePair comparePair) {
super(comparePair);
}
@Override
public String getError() {
return "The entry for the path is tombstoned in the MS.";
}
}
/**
* The violation handler when there's no parent entry in the MetadataStore.
*/
public static class OrphanDDBEntry extends ViolationHandler {
public OrphanDDBEntry(S3GuardFsck.ComparePair comparePair) {
super(comparePair);
}
@Override
public String getError() {
return "The DDB entry is orphan - there is no parent in the MS.";
}
@Override
public String fixViolation(S3AFileSystem fs, DynamoDBMetadataStore ddbms)
throws IOException {
final Path path = getPathMetadata().getFileStatus().getPath();
ddbms.forgetMetadata(path);
return String.format(
"Fixing violation by removing metadata entry from the " +
"MS on path: %s", path);
}
}
/**
* The violation handler when there's no last updated field for the entry.
*/
public static class NoLastUpdatedField extends ViolationHandler {
public NoLastUpdatedField(S3GuardFsck.ComparePair comparePair) {
super(comparePair);
}
@Override
public String getError() {
return "No lastUpdated field provided for the entry.";
}
}
}
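The abstract `ViolationHandler` above is the extension point: a constructor taking a `ComparePair`, a mandatory `getError()`, and an optional `fixViolation()` override. A hedged sketch of what an extra handler could have looked like (the class name `ExampleMismatch` and its message are hypothetical, not from the original sources):

```java
/**
 * Hypothetical handler, for illustration only: it reports a violation
 * using the accessors inherited from ViolationHandler.
 */
public static class ExampleMismatch extends ViolationHandler {

  public ExampleMismatch(S3GuardFsck.ComparePair comparePair) {
    super(comparePair);
  }

  @Override
  public String getError() {
    // getS3FileStatus() may be null for directory-listing violations,
    // so a real handler would have to guard against that.
    return String.format("Example violation on path %s",
        getS3FileStatus() != null
            ? getS3FileStatus().getPath()
            : "(unknown)");
  }
}
```

A handler only takes effect once a `Violation` enum constant references its class, as in the declarations earlier in this diff.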

View File

@ -1,256 +0,0 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.hadoop.fs.s3a.s3guard;
import java.util.Collection;
import java.util.Iterator;
import com.amazonaws.services.dynamodbv2.document.Item;
import com.amazonaws.services.dynamodbv2.document.ItemCollection;
import com.amazonaws.services.dynamodbv2.document.QueryOutcome;
import com.amazonaws.services.dynamodbv2.document.ScanOutcome;
import com.amazonaws.services.dynamodbv2.document.Table;
import com.amazonaws.services.dynamodbv2.document.internal.IteratorSupport;
import com.amazonaws.services.dynamodbv2.document.spec.QuerySpec;
import com.amazonaws.services.dynamodbv2.xspec.ExpressionSpecBuilder;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.apache.commons.lang3.tuple.Pair;
import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.s3a.Retries;
import org.apache.hadoop.fs.s3a.S3AFileStatus;
import static org.apache.hadoop.util.Preconditions.checkNotNull;
import static org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore.VERSION_MARKER_ITEM_NAME;
import static org.apache.hadoop.fs.s3a.s3guard.PathMetadataDynamoDBTranslation.CHILD;
import static org.apache.hadoop.fs.s3a.s3guard.PathMetadataDynamoDBTranslation.PARENT;
import static org.apache.hadoop.fs.s3a.s3guard.PathMetadataDynamoDBTranslation.TABLE_VERSION;
import static org.apache.hadoop.fs.s3a.s3guard.PathMetadataDynamoDBTranslation.itemToPathMetadata;
import static org.apache.hadoop.fs.s3a.s3guard.PathMetadataDynamoDBTranslation.pathToKey;
/**
* Package-scoped accessor to table state in S3Guard.
* This is for maintenance, diagnostics and testing: it is <i>not</i> to
* be used otherwise.
* <ol>
* <li>
* Some of the operations here may dramatically alter the state of
* a table, so use carefully.
* </li>
* <li>
* Operations to assess consistency of a store are best executed
* against a table which is otherwise inactive.
* </li>
* <li>
* No retry/throttling or AWS to IOE logic here.
* </li>
* <li>
* If a scan or query includes the version marker in the result, it
* is converted to a {@link VersionMarker} instance.
* </li>
* </ol>
*
*/
@InterfaceAudience.Private
@InterfaceStability.Unstable
@Retries.OnceRaw
class S3GuardTableAccess {
private static final Logger LOG =
LoggerFactory.getLogger(S3GuardTableAccess.class);
/**
* Store instance to work with.
*/
private final DynamoDBMetadataStore store;
/**
* Table; retrieved from the store.
*/
private final Table table;
/**
* Construct.
* @param store store to work with.
*/
S3GuardTableAccess(final DynamoDBMetadataStore store) {
this.store = checkNotNull(store);
this.table = checkNotNull(store.getTable());
}
/**
* Username of user in store.
* @return a string.
*/
private String getUsername() {
return store.getUsername();
}
/**
* Execute a query.
* @param spec query spec.
* @return the outcome.
*/
@Retries.OnceRaw
ItemCollection<QueryOutcome> query(QuerySpec spec) {
return table.query(spec);
}
/**
* Issue a query where the result is an iterator over
* DDBPathMetadata entries.
* @param spec query spec.
* @return an iterator over path entries.
*/
@Retries.OnceRaw
Iterable<DDBPathMetadata> queryMetadata(QuerySpec spec) {
return new DDBPathMetadataCollection<>(query(spec));
}
@Retries.OnceRaw
ItemCollection<ScanOutcome> scan(ExpressionSpecBuilder spec) {
return table.scan(spec.buildForScan());
}
@Retries.OnceRaw
Iterable<DDBPathMetadata> scanMetadata(ExpressionSpecBuilder spec) {
return new DDBPathMetadataCollection<>(scan(spec));
}
@Retries.OnceRaw
void delete(Collection<Path> paths) {
paths.stream()
.map(PathMetadataDynamoDBTranslation::pathToKey)
.forEach(table::deleteItem);
}
@Retries.OnceRaw
void delete(Path path) {
table.deleteItem(pathToKey(path));
}
/**
* A collection which wraps the result of a query or scan.
* Important: iterate through this only once; the outcome
* of repeating an iteration is "undefined"
* @param <T> type of outcome.
*/
private final class DDBPathMetadataCollection<T>
implements Iterable<DDBPathMetadata> {
/**
* Query/scan result.
*/
private final ItemCollection<T> outcome;
/**
* Instantiate.
* @param outcome query/scan outcome.
*/
private DDBPathMetadataCollection(final ItemCollection<T> outcome) {
this.outcome = outcome;
}
/**
* Get the iterator.
* @return the iterator.
*/
@Override
public Iterator<DDBPathMetadata> iterator() {
return new DDBPathMetadataIterator<>(outcome.iterator());
}
}
/**
* An iterator which converts each iterated-over result of
* a query or scan into a {@code DDBPathMetadata} entry.
* @param <T> type of source.
*/
private final class DDBPathMetadataIterator<T> implements
Iterator<DDBPathMetadata> {
/**
* Iterator to invoke.
*/
private final IteratorSupport<Item, T> it;
/**
* Instantiate.
* @param it Iterator to invoke.
*/
private DDBPathMetadataIterator(final IteratorSupport<Item, T> it) {
this.it = it;
}
@Override
@Retries.OnceRaw
public boolean hasNext() {
return it.hasNext();
}
@Override
@Retries.OnceRaw
public DDBPathMetadata next() {
Item item = it.next();
Pair<String, String> key = primaryKey(item);
if (VERSION_MARKER_ITEM_NAME.equals(key.getLeft()) &&
VERSION_MARKER_ITEM_NAME.equals(key.getRight())) {
// a version marker is found, return the special type
return new VersionMarker(item);
} else {
return itemToPathMetadata(item, getUsername());
}
}
}
/**
* DDBPathMetadata subclass returned when a query returns
* the version marker.
* There is a FileStatus returned where the owner field contains
* the table version; the path is always the unqualified path "/VERSION".
* Because it is unqualified, operations which treat this as a normal
* DDB metadata entry usually fail.
*/
static final class VersionMarker extends DDBPathMetadata {
/**
* Instantiate.
* @param versionMarker the version marker.
*/
VersionMarker(Item versionMarker) {
super(new S3AFileStatus(true, new Path("/VERSION"),
"" + versionMarker.getString(TABLE_VERSION)));
}
}
/**
* Given an item, split it into the parent and child fields.
* @param item item to split.
* @return (parent, child).
*/
private static Pair<String, String> primaryKey(Item item) {
return Pair.of(item.getString(PARENT), item.getString(CHILD));
}
}
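For context, a minimal usage sketch of the accessor above. The `DynamoDBMetadataStore` instance `ddbms` and the `/example-bucket` parent prefix are assumptions for illustration; neither appears in this file, and since the class is package-private such code would have to live in `org.apache.hadoop.fs.s3a.s3guard`:

```java
// Scan every entry whose parent key starts with the given prefix and
// print the paths of tombstoned entries.
S3GuardTableAccess access = new S3GuardTableAccess(ddbms);
ExpressionSpecBuilder builder = new ExpressionSpecBuilder();
builder.withCondition(
    ExpressionSpecBuilder.S("parent").beginsWith("/example-bucket"));
for (DDBPathMetadata md : access.scanMetadata(builder)) {
  if (md.isDeleted()) {
    System.out.println("tombstone: " + md.getFileStatus().getPath());
  }
}
```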

View File

@ -1,34 +0,0 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.hadoop.fs.s3a.s3guard;
import org.apache.hadoop.fs.PathIOException;
/**
* An exception raised when a table being deleted is still present after
* the wait time is exceeded.
*/
public class TableDeleteTimeoutException extends PathIOException {
TableDeleteTimeoutException(final String path,
final String error,
final Throwable cause) {
super(path, error, cause);
}
}

View File

@ -17,10 +17,9 @@
*/
/**
* This package contains classes related to S3Guard: a feature of S3A to mask
* the eventual consistency behavior of S3 and optimize access patterns by
* coordinating with a strongly consistent external store for file system
* metadata.
* This package contained S3Guard support; now that the feature has been
* removed, its contents are limited to the public command line and some
* methods still used by the directory marker code.
*/
@InterfaceAudience.Private
@InterfaceStability.Evolving

View File

@ -337,7 +337,7 @@ public synchronized void seek(long newPos) throws IOException {
/**
* Build an exception to raise when an operation is not supported here.
* @param action action which is unsupported.
* @param action action which is Unsupported.
* @return an exception to throw.
*/
protected PathIOException unsupported(final String action) {

View File

@ -264,7 +264,7 @@ public int run(String[] args, PrintStream out)
stream = FutureIOSupport.awaitFuture(builder.build());
} catch (FileNotFoundException e) {
// the source file is missing.
throw storeNotFound(e);
throw notFound(e);
}
try {
if (toConsole) {

View File

@ -18,19 +18,11 @@
package org.apache.hadoop.fs.s3a.statistics;
import org.apache.hadoop.fs.s3a.s3guard.MetastoreInstrumentation;
/**
* This is the statistics context for ongoing operations in S3A.
*/
public interface S3AStatisticsContext extends CountersAndGauges {
/**
* Get the metastore instrumentation.
* @return an instance of the metastore statistics tracking.
*/
MetastoreInstrumentation getS3GuardInstrumentation();
/**
* Create a stream input statistics instance.
* @return the new instance

View File

@ -25,7 +25,6 @@
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.s3a.S3AInstrumentation;
import org.apache.hadoop.fs.s3a.Statistic;
import org.apache.hadoop.fs.s3a.s3guard.MetastoreInstrumentation;
import org.apache.hadoop.fs.s3a.statistics.BlockOutputStreamStatistics;
import org.apache.hadoop.fs.s3a.statistics.CommitterStatistics;
import org.apache.hadoop.fs.s3a.statistics.DelegationTokenStatistics;
@ -94,16 +93,6 @@ private FileSystem.Statistics getInstanceStatistics() {
return statisticsSource.getInstanceStatistics();
}
/**
* Get a MetastoreInstrumentation getInstrumentation() instance for this
* context.
* @return the S3Guard getInstrumentation() point.
*/
@Override
public MetastoreInstrumentation getS3GuardInstrumentation() {
return getInstrumentation().getS3GuardInstrumentation();
}
/**
* Create a stream input statistics instance.
* The FileSystem.Statistics instance of the {@link #statisticsSource}

View File

@ -22,8 +22,6 @@
import java.time.Duration;
import org.apache.hadoop.fs.s3a.Statistic;
import org.apache.hadoop.fs.s3a.s3guard.MetastoreInstrumentation;
import org.apache.hadoop.fs.s3a.s3guard.MetastoreInstrumentationImpl;
import org.apache.hadoop.fs.s3a.statistics.BlockOutputStreamStatistics;
import org.apache.hadoop.fs.s3a.statistics.ChangeTrackerStatistics;
import org.apache.hadoop.fs.s3a.statistics.CommitterStatistics;
@ -49,9 +47,6 @@
*/
public final class EmptyS3AStatisticsContext implements S3AStatisticsContext {
public static final MetastoreInstrumentation
METASTORE_INSTRUMENTATION = new MetastoreInstrumentationImpl();
public static final S3AInputStreamStatistics
EMPTY_INPUT_STREAM_STATISTICS = new EmptyInputStreamStatistics();
@ -69,11 +64,6 @@ public final class EmptyS3AStatisticsContext implements S3AStatisticsContext {
public static final StatisticsFromAwsSdk
EMPTY_STATISTICS_FROM_AWS_SDK = new EmptyStatisticsFromAwsSdk();
@Override
public MetastoreInstrumentation getS3GuardInstrumentation() {
return METASTORE_INSTRUMENTATION;
}
@Override
public S3AInputStreamStatistics newInputStreamStatistics() {
return EMPTY_INPUT_STREAM_STATISTICS;

View File

@ -73,10 +73,7 @@
import static org.apache.hadoop.service.launcher.LauncherExitCodes.EXIT_USAGE;
/**
* Audit and S3 bucket for directory markers.
* <p></p>
* This tool does not go anywhere near S3Guard; its scan bypasses any
* metastore as we are explicitly looking for marker objects.
* Audit an S3 bucket for directory markers.
*/
@InterfaceAudience.LimitedPrivate("management tools")
@InterfaceStability.Unstable
@ -818,10 +815,9 @@ pages, suffix(pages),
int end = Math.min(start + deletePageSize, size);
List<DeleteObjectsRequest.KeyVersion> page = markerKeys.subList(start,
end);
List<Path> undeleted = new ArrayList<>();
once("Remove S3 Keys",
tracker.getBasePath().toString(), () ->
operations.removeKeys(page, true, undeleted, null, false));
operations.removeKeys(page, true, false));
summary.deleteRequests++;
// and move to the start of the next page
start = end;

View File

@ -31,7 +31,6 @@
import org.apache.hadoop.fs.RemoteIterator;
import org.apache.hadoop.fs.s3a.Retries;
import org.apache.hadoop.fs.s3a.S3AFileStatus;
import org.apache.hadoop.fs.s3a.s3guard.BulkOperationState;
/**
* Operations which must be offered by the store for {@link MarkerTool}.
@ -41,8 +40,7 @@
public interface MarkerToolOperations {
/**
* Create an iterator over objects in S3 only; S3Guard
* is not involved.
* Create an iterator over objects in S3.
* The listing includes the key itself, if found.
* @param path path of the listing.
* @param key object key
@ -56,17 +54,10 @@ RemoteIterator<S3AFileStatus> listObjects(
throws IOException;
/**
* Remove keys from the store, updating the metastore on a
* partial delete represented as a MultiObjectDeleteException failure by
* deleting all those entries successfully deleted and then rethrowing
* the MultiObjectDeleteException.
* Remove keys from the store.
* @param keysToDelete collection of keys to delete on the s3-backend.
* if empty, no request is made of the object store.
* @param deleteFakeDir indicates whether this is for deleting fake dirs.
* @param undeletedObjectsOnFailure List which will be built up of all
* files that were not deleted. This happens even as an exception
* is raised.
* @param operationState bulk operation state
* @param quiet should a bulk query be quiet, or should its result list
* all deleted keys
* @return the deletion result if a multi object delete was invoked
@ -82,8 +73,6 @@ RemoteIterator<S3AFileStatus> listObjects(
DeleteObjectsResult removeKeys(
List<DeleteObjectsRequest.KeyVersion> keysToDelete,
boolean deleteFakeDir,
List<Path> undeletedObjectsOnFailure,
BulkOperationState operationState,
boolean quiet)
throws MultiObjectDeleteException, AmazonClientException,
IOException;

View File

@ -30,7 +30,6 @@
import org.apache.hadoop.fs.RemoteIterator;
import org.apache.hadoop.fs.s3a.S3AFileStatus;
import org.apache.hadoop.fs.s3a.impl.OperationCallbacks;
import org.apache.hadoop.fs.s3a.s3guard.BulkOperationState;
/**
* Implement the marker tool operations by forwarding to the
@ -59,12 +58,10 @@ public RemoteIterator<S3AFileStatus> listObjects(final Path path,
public DeleteObjectsResult removeKeys(
final List<DeleteObjectsRequest.KeyVersion> keysToDelete,
final boolean deleteFakeDir,
final List<Path> undeletedObjectsOnFailure,
final BulkOperationState operationState,
final boolean quiet)
throws MultiObjectDeleteException, AmazonClientException, IOException {
return operationCallbacks.removeKeys(keysToDelete, deleteFakeDir,
undeletedObjectsOnFailure, operationState, quiet);
quiet);
}
}

View File

@ -18,7 +18,7 @@
if ! declare -f hadoop_subcommand_s3guard >/dev/null 2>/dev/null; then
if [[ "${HADOOP_SHELL_EXECNAME}" = hadoop ]]; then
hadoop_add_subcommand "s3guard" client "manage metadata on S3"
hadoop_add_subcommand "s3guard" client "S3 Commands"
fi
# this can't be indented otherwise shelldocs won't get it

View File

@ -39,7 +39,7 @@ are, how to configure their policies, etc.
* You need a role to assume, and know its "ARN".
* You need a pair of long-lived IAM User credentials, not the root account set.
* Have the AWS CLI installed, and test that it works there.
* Give the role access to S3, and, if using S3Guard, to DynamoDB.
* Give the role access to S3.
* For working with data encrypted with SSE-KMS, the role must
have access to the appropriate KMS keys.
@ -234,9 +234,6 @@ s3:Get*
s3:ListBucket
```
When using S3Guard, the client needs the appropriate
<a href="s3guard-permissions">DynamoDB access permissions</a>
To use SSE-KMS encryption, the client needs the
<a href="sse-kms-permissions">SSE-KMS Permissions</a> to access the
KMS key(s).
@ -277,47 +274,6 @@ If the caller doesn't have these permissions, the operation will fail with an
`AccessDeniedException`: the S3 Store does not provide the specifics of
the cause of the failure.
### <a name="s3guard-permissions"></a> S3Guard Permissions
To use S3Guard, all clients must have a subset of the
[AWS DynamoDB Permissions](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/api-permissions-reference.html).
To work with buckets protected with S3Guard, the client must have
all the following rights on the DynamoDB Table used to protect that bucket.
```
dynamodb:BatchGetItem
dynamodb:BatchWriteItem
dynamodb:DeleteItem
dynamodb:DescribeTable
dynamodb:GetItem
dynamodb:PutItem
dynamodb:Query
dynamodb:UpdateItem
```
This is true, *even if the client only has read access to the data*.
For the `hadoop s3guard` table management commands, _extra_ permissions are required:
```
dynamodb:CreateTable
dynamodb:DescribeLimits
dynamodb:DeleteTable
dynamodb:Scan
dynamodb:TagResource
dynamodb:UntagResource
dynamodb:UpdateTable
```
Without these permissions, tables cannot be created, destroyed or have their IO capacity
changed through the `s3guard set-capacity` call.
The `dynamodb:Scan` permission is needed for `s3guard prune`.
The `dynamodb:CreateTable` permission is needed by a client when it tries to
create the DynamoDB table on startup, that is, when
`fs.s3a.s3guard.ddb.table.create` is `true` and the table does not already exist.
### <a name="mixed-permissions"></a> Mixed Permissions in a single S3 Bucket
Mixing permissions down the "directory tree" is limited
@ -348,10 +304,6 @@ file will exist.
For a directory copy, only a partial copy of the source data may take place
before the permission failure is raised.
*S3Guard*: if [S3Guard](s3guard.html) is used to manage the directory listings,
then after partial failures of rename/copy the DynamoDB tables can get out of sync.
### Example: Read access to the base, R/W to the path underneath
This example has the base bucket read only, and a directory underneath,
@ -818,29 +770,6 @@ Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Se
Note: the ability to read encrypted data in the store does not guarantee that the caller can encrypt new data.
It is a separate permission.
### <a name="dynamodb_exception"></a> `AccessDeniedException` + `AmazonDynamoDBException`
```
java.nio.file.AccessDeniedException: bucket1:
com.amazonaws.services.dynamodbv2.model.AmazonDynamoDBException:
User: arn:aws:sts::980678866538:assumed-role/s3guard-test-role/test is not authorized to perform:
dynamodb:DescribeTable on resource: arn:aws:dynamodb:us-west-1:980678866538:table/bucket1
(Service: AmazonDynamoDBv2; Status Code: 400;
```
The caller is trying to access an S3 bucket which uses S3Guard, but the caller
lacks the relevant DynamoDB access permissions.
The `dynamodb:DescribeTable` operation is the first one used in S3Guard to access
the DynamoDB table, so it is often the first to fail. It can be a sign
that the role has no permissions at all to access the table named in the exception,
or just that this specific permission has been omitted.
If the role policy requested for the assumed role didn't ask for any DynamoDB
permissions, this is where all attempts to work with an S3Guarded bucket will
fail. Check the value of `fs.s3a.assumed.role.policy`.
### Error `Unable to execute HTTP request`
This is a low-level networking error. Possible causes include:

View File

@ -1818,7 +1818,7 @@ directory on the job commit, so is *very* expensive, and not something which
we recommend when working with S3.
To use a S3Guard committer, it must also be identified as the Parquet committer.
To use an S3A committer, it must also be identified as the Parquet committer.
The fact that instances are dynamically instantiated somewhat complicates the process.
In early tests; we can switch committers for ORC output without making any changes
@ -1928,12 +1928,6 @@ files.
### Security Risks of all committers
#### Visibility
[Obsolete] If S3Guard is used for storing metadata, then the metadata is visible to
all users with read access. A malicious user with write access could delete
entries of newly generated files, so they would not be visible.
#### Malicious Serialized Data

View File

@ -755,10 +755,10 @@ in configuration option fs.s3a.committer.magic.enabled
The Job is configured to use the magic committer, but the S3A bucket has not been explicitly
declared as supporting it.
The Job is configured to use the magic committer, but the S3A bucket has not been explicitly declared as supporting it.
This can be done for those buckets which are known to be consistent, either
because [S3Guard](s3guard.html) is used to provide consistency,
or because the S3-compatible filesystem is known to be strongly consistent.
As this is now true by default, this error will only surface with a configuration which has explicitly disabled it.
Remove all global/per-bucket declarations of `fs.s3a.bucket.magic.enabled` or set them to `true`
```xml
<property>
@ -767,29 +767,35 @@ or because the S3-compatible filesystem is known to be strongly consistent.
</property>
```
Tip: you can verify that a bucket supports the magic committer through the
`hadoop s3guard bucket-info` command:
```
> hadoop s3guard bucket-info -magic s3a://landsat-pds/
Filesystem s3a://landsat-pds
Location: us-west-2
Filesystem s3a://landsat-pds is not using S3Guard
The "magic" committer is not supported
S3A Client
Signing Algorithm: fs.s3a.signing-algorithm=(unset)
Endpoint: fs.s3a.endpoint=s3.amazonaws.com
Encryption: fs.s3a.server-side-encryption-algorithm=none
Input seek policy: fs.s3a.experimental.input.fadvise=normal
Change Detection Source: fs.s3a.change.detection.source=etag
Change Detection Mode: fs.s3a.change.detection.mode=server
Delegation token support is disabled
2019-05-17 13:53:38,245 [main] INFO util.ExitUtil (ExitUtil.java:terminate(210)) -
Exiting with status 46: 46: The magic committer is not enabled for s3a://landsat-pds
Signing Algorithm: fs.s3a.signing-algorithm=(unset)
Endpoint: fs.s3a.endpoint=s3.amazonaws.com
Encryption: fs.s3a.encryption.algorithm=none
Input seek policy: fs.s3a.experimental.input.fadvise=normal
Change Detection Source: fs.s3a.change.detection.source=etag
Change Detection Mode: fs.s3a.change.detection.mode=server
S3A Committers
The "magic" committer is supported in the filesystem
S3A Committer factory class: mapreduce.outputcommitter.factory.scheme.s3a=org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory
S3A Committer name: fs.s3a.committer.name=magic
Store magic committer integration: fs.s3a.committer.magic.enabled=true
Security
Delegation token support is disabled
Directory Markers
The directory marker policy is "delete"
Available Policies: delete, keep, authoritative
Authoritative paths: fs.s3a.authoritative.path=
```
### Error message: "File being created has a magic path, but the filesystem has magic file support disabled"
@ -802,11 +808,6 @@ This message should not appear through the committer itself &mdash;it will
fail with the error message in the previous section, but may arise
if other applications are attempting to create files under the path `/__magic/`.
Make sure the filesystem meets the requirements of the magic committer
(a consistent S3A filesystem through S3Guard or the S3 service itself),
and set the `fs.s3a.committer.magic.enabled` flag to indicate that magic file
writes are supported.
### `FileOutputCommitter` appears to be still used (from logs or delays in commits)

View File

@ -91,7 +91,7 @@ of:
These credentials are obtained from the AWS Secure Token Service (STS) when the the token is issued.
* A set of AWS session credentials binding the user to a specific AWS IAM Role,
further restricted to only access the S3 bucket and matching S3Guard DynamoDB table.
further restricted to only access the S3 bucket.
Again, these credentials are requested when the token is issued.
@ -404,7 +404,6 @@ Else: as with session delegation tokens, an STS client is created. This time
set to restrict access to:
* CRUD access the specific bucket a token is being requested for
* CRUD access to the contents of any S3Guard DDB used (not admin rights though).
* access to all KMS keys (assumption: AWS KMS is where restrictions are set up)
*Example Generated Role Policy*
@ -428,12 +427,7 @@ set to restrict access to:
"Effect" : "Allow",
"Action" : [ "kms:Decrypt", "kms:GenerateDataKey" ],
"Resource" : "arn:aws:kms:*"
}, {
"Sid" : "9",
"Effect" : "Allow",
"Action" : [ "dynamodb:BatchGetItem", "dynamodb:BatchWriteItem", "dynamodb:DeleteItem", "dynamodb:DescribeTable", "dynamodb:GetItem", "dynamodb:PutItem", "dynamodb:Query", "dynamodb:UpdateItem" ],
"Resource" : "arn:aws:dynamodb:eu-west-1:980678866fff:table/example-bucket"
} ]
}]
}
```

Some files were not shown because too many files have changed in this diff.