Steve Loughran 9221704f85
HADOOP-16490. Avoid/handle cached 404s during S3A file creation.
Contributed by Steve Loughran.

This patch avoids issuing any HEAD path request when creating a file with overwrite=true,
so 404s will not end up in the S3 load balancers unless someone calls getFileStatus/exists/isFile
in their own code.

The Hadoop FsShell CommandWithDestination class is modified to not register uncreated files
for deleteOnExit(), because that calls exists() and so can place the 404 in the cache, even
after S3A is patched to not do it itself.

Because S3Guard knows when a file should be present, it adds a special FileNotFound retry policy
which is configurable independently of the other retry policies; it is also exponential, but with
different parameters. This is because every HEAD request will refresh any 404 cached in
the S3 load balancers. It's not enough to retry: we have to leave a suitable gap between
attempts to (hopefully) ensure any cached entry will be gone.

The options and values are:

fs.s3a.s3guard.consistency.retry.interval: 2s
fs.s3a.s3guard.consistency.retry.limit: 7
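
To give a feel for the gaps these defaults imply, here is a minimal sketch of an exponential
schedule. It assumes each wait simply doubles, starting from the 2s interval, for 7 attempts;
the policy S3A actually applies may randomize or otherwise shape the sleep times, so read the
figures as rough orders of magnitude only. The BackoffSketch class and its methods are
hypothetical and are not part of Hadoop.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration only; not part of Hadoop. */
final class BackoffSketch {

  /**
   * Build a plain doubling schedule: base, 2*base, 4*base, ...
   * Assumption: no jitter and no upper bound on the delay.
   */
  static long[] delaysMillis(long baseMillis, int limit) {
    long[] delays = new long[limit];
    long delay = baseMillis;
    for (int i = 0; i < limit; i++) {
      delays[i] = delay;
      delay *= 2;
    }
    return delays;
  }

  /** Sum of all waits: the worst-case time spent sleeping before giving up. */
  static long totalMillis(long[] delays) {
    return Arrays.stream(delays).sum();
  }

  public static void main(String[] args) {
    // Values taken from the options listed above: interval 2s, limit 7.
    long[] delays = delaysMillis(2000L, 7);
    System.out.println(Arrays.toString(delays));
    System.out.println(totalMillis(delays) + " ms total");
  }
}
```

Under these assumptions the waits grow from 2s to 128s, a little over four minutes in total,
which shows why the limit and the interval are worth tuning together.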

The S3A copy() method used during rename() raises a RemoteFileChangedException which is not caught,
so it is not downgraded to a return value of false. Thus, when a rename is unrecoverable, the failure
is propagated to the caller.

Copy operations without S3Guard lack the confidence that the file exists, so they don't retry the same way:
they fail fast with a different error message. However, because create(path, overwrite=true) no
longer does HEAD path, we can at least be confident that S3A itself is not creating those cached
404 markers.

Change-Id: Ia7807faad8b9a8546836cb19f816cccf17cca26d
2019-09-11 16:46:25 +01:00
.github HADOOP-15184. Add GitHub pull request template. (#1419) 2019-09-11 11:10:11 +09:00
dev-support HADOOP-15958. Revisiting LICENSE and NOTICE files. 2019-08-27 13:47:12 +09:00
hadoop-assemblies HADOOP-16534. Exclude submarine from hadoop source build. (#1356) 2019-09-03 17:40:38 +05:30
hadoop-build-tools HADOOP-16331. Fix ASF License check in pom.xml 2019-05-29 17:25:13 +09:00
hadoop-client-modules HADOOP-15998. Ensure jar validation works on Windows. 2019-08-29 23:09:04 -05:00
hadoop-cloud-storage-project HADOOP-16331. Fix ASF License check in pom.xml 2019-05-29 17:25:13 +09:00
hadoop-common-project HADOOP-16490. Avoid/handle cached 404s during S3A file creation. 2019-09-11 16:46:25 +01:00
hadoop-dist HDFS-14639. [Dynamometer] Remove unnecessary duplicate directory from the distribution. Contributed by Erik Krogen. 2019-07-29 13:50:14 -07:00
hadoop-hdds HDDS-2048: State check during container state transition in datanode should be lock protected (#1375) 2019-09-10 14:14:52 +05:30
hadoop-hdfs-project HDFS-14838. RBF: Display RPC (instead of HTTP) Port Number in RBF web UI. Contributed by Xieming Li 2019-09-11 16:54:08 +09:00
hadoop-mapreduce-project HADOOP-16549. Remove Unsupported SSL/TLS Versions from Docs/Properties. Contributed by Daisuke Kobayashi. 2019-09-10 10:51:47 +08:00
hadoop-maven-plugins HADOOP-16331. Fix ASF License check in pom.xml 2019-05-29 17:25:13 +09:00
hadoop-minicluster HADOOP-16331. Fix ASF License check in pom.xml 2019-05-29 17:25:13 +09:00
hadoop-ozone HDDS-2103. TestContainerReplication fails due to unhealthy container (#1421) 2019-09-11 19:49:10 +05:30
hadoop-project HADOOP-16542. Update commons-beanutils version to 1.9.4. Contributed by kevin su. 2019-09-10 19:58:34 +08:00
hadoop-project-dist HADOOP-16331. Fix ASF License check in pom.xml 2019-05-29 17:25:13 +09:00
hadoop-submarine SUBMARINE-45. Can't specify queue by using the parameter --queue. Contributed by Ayush Saxena, Zac Zhou. 2019-08-15 13:18:29 +08:00
hadoop-tools HADOOP-16490. Avoid/handle cached 404s during S3A file creation. 2019-09-11 16:46:25 +01:00
hadoop-yarn-project YARN-9824. Fall back to configured queue ordering policy class name 2019-09-10 15:19:07 -07:00
licenses HADOOP-15958. Revisiting LICENSE and NOTICE files. 2019-08-27 13:47:12 +09:00
licenses-binary HADOOP-15958. Revisiting LICENSE and NOTICE files. 2019-08-27 13:47:12 +09:00
.gitattributes HADOOP-13598. Add eol=lf for unix format files in .gitattributes. Contributed by Yiqun Lin. 2016-09-14 11:14:31 +09:00
.gitignore HDDS-1115. Provide ozone specific top-level pom.xml. 2019-02-24 14:40:52 -08:00
BUILDING.txt HADOOP-16263. Update BUILDING.txt with macOS native build instructions. Contributed by Siyao Meng. 2019-06-11 15:04:59 -07:00
Jenkinsfile HADOOP-16183. Use latest Yetus to support ozone specific build process 2019-05-02 16:48:30 +02:00
LICENSE-binary HADOOP-15958. Revisiting LICENSE and NOTICE files. 2019-08-27 13:47:12 +09:00
LICENSE.txt HADOOP-15958. Revisiting LICENSE and NOTICE files. 2019-08-27 13:47:12 +09:00
NOTICE-binary HADOOP-15958. Revisiting LICENSE and NOTICE files. 2019-08-27 13:47:12 +09:00
NOTICE.txt HADOOP-15958. Revisiting LICENSE and NOTICE files. 2019-08-27 13:47:12 +09:00
pom.ozone.xml HDDS-2077. Add maven-gpg-plugin.version to pom.ozone.xml. (#1396) 2019-09-04 15:28:59 +05:30
pom.xml HADOOP-15184. Add GitHub pull request template. (#1419) 2019-09-11 11:10:11 +09:00
README.txt HADOOP-15958. Revisiting LICENSE and NOTICE files. 2019-08-27 13:47:12 +09:00
start-build-env.sh HADOOP-16240. start-build-env.sh can consume all disk space during image creation. 2019-04-10 08:48:11 -07:00

For the latest information about Hadoop, please visit our website at:

   http://hadoop.apache.org/

and our wiki, at:

   https://cwiki.apache.org/confluence/display/HADOOP/