20eec95867
Contributed by Steve Loughran. This strips out all the -p preservation options which have already been processed when uploading a file before deciding whether or not to query the far end for the status of the (existing/uploaded) file to see if any other attributes need changing. This will avoid 404 caching-related issues in S3, wherein a newly created file can have a 404 entry in the S3 load balancer's cache from the probes for the file's existence prior to the upload. It partially addresses a regression caused by HADOOP-8143, "Change distcp to have -pb on by default" that causes a resurfacing of HADOOP-13145, "In DistCp, prevent unnecessary getFileStatus call when not preserving metadata" |
||
---|---|---|
.. | ||
src | ||
pom.xml | ||
README |
DistCp (distributed copy) is a tool used for large inter/intra-cluster copying. It uses Map/Reduce to effect its distribution, error handling and recovery, and reporting. It expands a list of files and directories into input to map tasks, each of which will copy a partition of the files specified in the source list.