hadoop/hadoop-tools/hadoop-distcp/src
Steve Loughran 20eec95867
HADOOP-16932. distcp copy calls getFileStatus() needlessly and can fail against S3 (#1936)
Contributed by Steve Loughran.

After a file is uploaded, this strips out all the -p preservation options
which were already processed as part of that upload. Only then does it
decide whether or not to query the far end for the status of the
(existing/uploaded) file to see if any other attributes need changing.
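
The filtering step described above can be sketched as follows. This is a
minimal illustration, not the code from the patch: the class PreserveSketch,
the Attr enum, and the choice of which attributes count as "applied at upload"
are all assumptions made for the example.

```java
import java.util.EnumSet;

public class PreserveSketch {
    /** Hypothetical attribute flags, loosely modelled on distcp's -p options. */
    enum Attr { REPLICATION, BLOCK_SIZE, CHECKSUM_TYPE, USER, GROUP, PERMISSION }

    /** Assumption: these attributes are already set when the file is created/uploaded. */
    static final EnumSet<Attr> APPLIED_AT_UPLOAD =
        EnumSet.of(Attr.REPLICATION, Attr.BLOCK_SIZE, Attr.CHECKSUM_TYPE);

    /** Returns the requested attributes that still need work after the upload. */
    static EnumSet<Attr> remaining(EnumSet<Attr> requested) {
        EnumSet<Attr> rest = EnumSet.copyOf(requested); // do not mutate the caller's set
        rest.removeAll(APPLIED_AT_UPLOAD);
        return rest;
    }

    /** The destination status is only worth fetching if something is left to preserve. */
    static boolean needsDestinationStatus(EnumSet<Attr> requested) {
        return !remaining(requested).isEmpty();
    }

    public static void main(String[] args) {
        // Only the block size was requested: no status probe is needed.
        System.out.println(needsDestinationStatus(EnumSet.of(Attr.BLOCK_SIZE)));
        // Ownership was also requested: the probe is still required.
        System.out.println(needsDestinationStatus(EnumSet.of(Attr.BLOCK_SIZE, Attr.USER)));
    }
}
```

With only upload-time attributes requested, the sketch skips the status
query entirely, which is the case that matters for a default -pb copy to S3.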

This will avoid 404 caching-related issues in S3, wherein a newly created
file can have a 404 entry in the S3 load balancer's cache from the
probes for the file's existence prior to the upload.

It partially addresses a regression caused by HADOOP-8143,
"Change distcp to have -pb on by default", which caused HADOOP-13145,
"In DistCp, prevent unnecessary getFileStatus call when not preserving
metadata", to resurface.
2020-04-07 17:55:55 +01:00
main HADOOP-16932. distcp copy calls getFileStatus() needlessly and can fail against S3 (#1936) 2020-04-07 17:55:55 +01:00
site HDFS-14788. Use dynamic regex filter to ignore copy of source files in Distcp. 2020-01-06 19:10:39 +00:00
test HADOOP-16932. distcp copy calls getFileStatus() needlessly and can fail against S3 (#1936) 2020-04-07 17:55:55 +01:00