Steve Loughran 49df838995
HADOOP-16697. Tune/audit S3A authoritative mode.
Contains:

HADOOP-16474. S3Guard ProgressiveRenameTracker to mark destination
              directory as authoritative on success.
HADOOP-16684. S3Guard bucket info to list a bit more about
              authoritative paths.
HADOOP-16722. S3GuardTool to support FilterFileSystem.

This patch improves the marking of newly created/imported directory
trees in S3Guard DynamoDB tables as authoritative.

Specific changes:

 * Renamed directories are marked as authoritative if the entire
   operation succeeded (HADOOP-16474).
 * When updating parent table entries as part of any table write,
   there's no overwriting of their authoritative flag.

s3guard import changes:

* new -verbose flag to print out what is going on.

* The "s3guard import" command lets you declare that a directory tree
is to be marked as authoritative:

  hadoop s3guard import -authoritative -verbose s3a://bucket/path

When importing a listing and a file is found, the import tool queries
the metastore and only updates the entry if the file is different from
before, where different == new timestamp, etag, or length. S3Guard can get
timestamp differences due to clock skew in PUT operations.

As the recursive list performed by the import command doesn't retrieve the
versionID, the existing entry may in fact be more complete.
When updating an existing entry due to clock skew, the existing version ID
is propagated to the new entry (note: the etags must match; this is needed
to deal with inconsistent listings).
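
The update rule above can be sketched roughly as follows. This is an
illustration only, not the code in the patch: the Entry class and the
isDifferent/entryToWrite helpers are hypothetical names invented for
the sketch, and the real import tool works against the S3Guard
metastore classes rather than this stand-in.

```java
import java.util.Objects;

/** Sketch of the import update decision; all names here are hypothetical. */
public class ImportDecisionSketch {

  /** Stand-in for a metastore file entry (not an S3Guard class). */
  public static final class Entry {
    public final long modTime;
    public final long length;
    public final String etag;
    /** null when the source (e.g. a recursive listing) did not return one. */
    public final String versionId;

    public Entry(long modTime, long length, String etag, String versionId) {
      this.modTime = modTime;
      this.length = length;
      this.etag = etag;
      this.versionId = versionId;
    }
  }

  /** "Different" means a new timestamp, etag, or length. */
  public static boolean isDifferent(Entry existing, Entry listed) {
    return existing.modTime != listed.modTime
        || existing.length != listed.length
        || !Objects.equals(existing.etag, listed.etag);
  }

  /**
   * Decide what to write for a listed file.
   * @return the entry to store, or null if the existing entry can be kept.
   */
  public static Entry entryToWrite(Entry existing, Entry listed) {
    if (existing == null) {
      return listed;                 // nothing stored yet: always import
    }
    if (!isDifferent(existing, listed)) {
      return null;                   // unchanged: leave the entry alone
    }
    boolean sameEtag = existing.etag != null
        && existing.etag.equals(listed.etag);
    if (listed.versionId == null && sameEtag) {
      // Same object, different metadata (e.g. clock skew): keep the
      // version ID that the listing could not supply.
      return new Entry(listed.modTime, listed.length, listed.etag,
          existing.versionId);
    }
    return listed;                   // genuinely different file
  }

  public static void main(String[] args) {
    Entry stored = new Entry(1000L, 10L, "abc", "v1");
    Entry skewed = new Entry(2000L, 10L, "abc", null);
    Entry result = entryToWrite(stored, skewed);
    System.out.println("versionId kept: " + result.versionId);
  }
}
```

The etag check is what makes carrying the version ID forward safe in
this sketch: if the etags differ, the listed object is treated as a
different file and the old version ID is dropped.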

There is a new s3guard command to audit an S3Guard bucket/path's
authoritative state:

  hadoop s3guard authoritative -check-config s3a://bucket/path

This is primarily for testing/auditing.

The s3guard bucket-info command also provides some more details on the
authoritative state of a store (HADOOP-16684).

Change-Id: I58001341c04f6f3597fcb4fcb1581ccefeb77d91
2020-01-10 11:11:56 +00:00
.github HADOOP-15184. Add GitHub pull request template. (#1419) 2019-09-11 11:10:11 +09:00
dev-support YARN-10054. Upgrade yarn to 1.21.1 in Dockerfile. (#1777) 2019-12-23 14:08:14 +09:00
hadoop-assemblies HADOOP-16654:Delete hadoop-ozone and hadoop-hdds subprojects from apache trunk 2019-11-15 14:53:28 -05:00
hadoop-build-tools HADOOP-16771. Update checkstyle to 8.26 and maven-checkstyle-plugin to 3.1.0. Contributed by Andras Bokor. 2019-12-20 13:10:26 +09:00
hadoop-client-modules HADOOP-16614. Add aarch64 support for dependent leveldbjni. 2019-10-24 11:45:57 -04:00
hadoop-cloud-storage-project HADOOP-16702. Move documentation of hadoop-cos to under src directory. 2019-11-12 17:47:17 +09:00
hadoop-common-project HADOOP-16697. Tune/audit S3A authoritative mode. 2020-01-10 11:11:56 +00:00
hadoop-dist HDFS-14639. [Dynamometer] Remove unnecessary duplicate directory from the distribution. Contributed by Erik Krogen. 2019-07-29 13:50:14 -07:00
hadoop-hdfs-project HDFS-15110. HttpFS: post requests are not supported for path "/". Contributed by hemanthboyina. 2020-01-10 17:53:19 +09:00
hadoop-mapreduce-project YARN-9018. Add functionality to AuxiliaryLocalPathHandler to return all locations to read for a given path. Contributed by Kuhu Shukla (kshukla) 2020-01-09 17:18:44 +00:00
hadoop-maven-plugins HADOOP-16331. Fix ASF License check in pom.xml 2019-05-29 17:25:13 +09:00
hadoop-minicluster HADOOP-16331. Fix ASF License check in pom.xml 2019-05-29 17:25:13 +09:00
hadoop-project YARN-10071. Sync Mockito version with other modules 2020-01-10 17:41:04 +09:00
hadoop-project-dist Make upstream aware of 3.2.1 release. 2019-09-23 06:20:54 +00:00
hadoop-submarine SUBMARINE-45. Can't specify queue by using the parameter --queue. Contributed by Ayush Saxena, Zac Zhou. 2019-08-15 13:18:29 +08:00
hadoop-tools HADOOP-16697. Tune/audit S3A authoritative mode. 2020-01-10 11:11:56 +00:00
hadoop-yarn-project YARN-10071. Sync Mockito version with other modules 2020-01-10 17:41:04 +09:00
licenses HADOOP-15958. Revisiting LICENSE and NOTICE files. 2019-08-27 13:47:12 +09:00
licenses-binary HADOOP-15993. Upgrade Kafka to 2.4.0 in hadoop-kafka module. (#1796) 2020-01-09 16:24:58 +09:00
.gitattributes HADOOP-13598. Add eol=lf for unix format files in .gitattributes. Contributed by Yiqun Lin. 2016-09-14 11:14:31 +09:00
.gitignore HDDS-1115. Provide ozone specific top-level pom.xml. 2019-02-24 14:40:52 -08:00
BUILDING.txt HADOOP-16744. Fix building instruction to enable zstd. (#1736) 2019-12-06 15:25:20 +09:00
Jenkinsfile HADOOP-16110 Upgrade to yetus 0.11.1 and use emoji vote on github pre commit (#1527). Contributed by Duo Zhang. 2019-11-19 14:21:49 +05:30
LICENSE-binary HADOOP-15993. Upgrade Kafka to 2.4.0 in hadoop-kafka module. (#1796) 2020-01-09 16:24:58 +09:00
LICENSE.txt YARN-9561. Add C changes for the new RuncContainerRuntime. Contributed by Eric Badger 2019-12-09 01:25:10 +00:00
NOTICE-binary HADOOP-15958. Revisiting LICENSE and NOTICE files. 2019-08-27 13:47:12 +09:00
NOTICE.txt HADOOP-15958. Revisiting LICENSE and NOTICE files. 2019-08-27 13:47:12 +09:00
pom.xml HADOOP-16771. Update checkstyle to 8.26 and maven-checkstyle-plugin to 3.1.0. Contributed by Andras Bokor. 2019-12-20 13:10:26 +09:00
README.txt HADOOP-15958. Revisiting LICENSE and NOTICE files. 2019-08-27 13:47:12 +09:00
start-build-env.sh HADOOP-16240. start-build-env.sh can consume all disk space during image creation. 2019-04-10 08:48:11 -07:00

For the latest information about Hadoop, please visit our website at:

   http://hadoop.apache.org/

and our wiki, at:

   https://cwiki.apache.org/confluence/display/HADOOP/