HDFS-16556. Fix typos in distcp (#4217)

GuoPhilipse 2022-04-23 02:01:20 +08:00 committed by GitHub
parent f84b88dd6b
commit 214f369073

@@ -49,7 +49,7 @@ Overview
[The erstwhile implementation of DistCp]
(http://hadoop.apache.org/docs/r1.2.1/distcp.html) has its share of quirks
-and drawbacks, both in its usage, as well as its extensibility and
+and drawbacks, both in its usage and its extensibility and
performance. The purpose of the DistCp refactor was to fix these
shortcomings, enabling it to be used and extended programmatically. New
paradigms have been introduced to improve runtime and setup performance,
@@ -179,7 +179,7 @@ $H3 Update and Overwrite
hdfs://nn2:8020/target/10 32
hdfs://nn2:8020/target/20 64
-Will effect:
+The result will be:
hdfs://nn2:8020/target/1 32
hdfs://nn2:8020/target/2 32
@@ -190,7 +190,7 @@ $H3 Update and Overwrite
because it doesn't exist at the target. `10` and `20` are overwritten since
the contents don't match the source.
-If `-update` is used, `1` is skipped because the file-length and contents match. `2` is copied because it doesnt exist at the target. `10` and `20` are overwritten since the contents dont match the source. However, if `-append` is additionally used, then only `10` is overwritten (source length less than destination) and `20` is appended with the change in file (if the files match up to the destination's original length).
+If `-update` is used, `1` is skipped because the file-length and contents match. `2` is copied because it doesn't exist at the target. `10` and `20` are overwritten since the contents dont match the source. However, if `-append` is additionally used, then only `10` is overwritten (source length less than destination) and `20` is appended with the change in file (if the files match up to the destination's original length).
If `-overwrite` is used, `1` is overwritten as well.
@@ -269,7 +269,7 @@ $H4 Experiment 1: Syncing diff of two adjacent snapshots
$H4 Experiment 2: syncing diff of two non-adjacent snapshots
-First do a clean up from Experiment 1.
+First do a cleanup from Experiment 1.
hdfs dfs -rm -skipTrash /dst/1.txt
@@ -514,7 +514,7 @@ $H3 InputFormats and MapReduce Components
* A file with the same name exists at target, but `-overwrite` is
specified.
* A file with the same name exists at target, but differs in block-size
-(and block-size needs to be preserved.
+and block-size needs to be preserved.
* **CopyCommitter:** This class is responsible for the commit-phase of the
DistCp job, including:
@@ -576,7 +576,7 @@ $H3 MapReduce and other side-effects
map on a re-execution will be marked as "skipped".
* If a map fails `mapreduce.map.maxattempts` times, the remaining map tasks
will be killed (unless `-i` is set).
-* If `mapreduce.map.speculative` is set set final and true, the result of the
+* If `mapreduce.map.speculative` is set to be true, the result of the
copy is undefined.
$H3 DistCp and Object Stores
@@ -691,7 +691,7 @@ Frequently Asked Questions
directory is copied over, rather than the source-directory itself. This
behaviour is consistent with the legacy DistCp implementation as well.
-2. **How does the new DistCp differ in semantics from the Legacy DistCp?**
+2. **How does the new DistCp differs in semantics from the Legacy DistCp?**
* Files that are skipped during copy used to also have their
file-attributes (permissions, owner/group info, etc.) unchanged, when