HDFS-10974. Document replication factor for EC files. Contributed by Yiqun Lin.
parent
c9b7ce9273
commit
8c591b8d19
@@ -43,7 +43,8 @@ public static void registerCommands(CommandFactory factory) {
|
|||||||
public static final String DESCRIPTION =
|
public static final String DESCRIPTION =
|
||||||
"Set the replication level of a file. If <path> is a directory " +
|
"Set the replication level of a file. If <path> is a directory " +
|
||||||
"then the command recursively changes the replication factor of " +
|
"then the command recursively changes the replication factor of " +
|
||||||
"all files under the directory tree rooted at <path>.\n" +
|
"all files under the directory tree rooted at <path>. " +
|
||||||
|
"The EC files will be ignored here.\n" +
|
||||||
"-w: It requests that the command waits for the replication " +
|
"-w: It requests that the command waits for the replication " +
|
||||||
"to complete. This can potentially take a very long time.\n" +
|
"to complete. This can potentially take a very long time.\n" +
|
||||||
"-R: It is accepted for backwards compatibility. It has no effect.";
|
"-R: It is accepted for backwards compatibility. It has no effect.";
|
||||||
|
@@ -647,7 +647,7 @@ setrep
|
|||||||
|
|
||||||
Usage: `hadoop fs -setrep [-R] [-w] <numReplicas> <path> `
|
Usage: `hadoop fs -setrep [-R] [-w] <numReplicas> <path> `
|
||||||
|
|
||||||
Changes the replication factor of a file. If *path* is a directory then the command recursively changes the replication factor of all files under the directory tree rooted at *path*.
|
Changes the replication factor of a file. If *path* is a directory then the command recursively changes the replication factor of all files under the directory tree rooted at *path*. The EC files will be ignored when executing this command.
|
||||||
|
|
||||||
Options:
|
Options:
|
||||||
|
|
||||||
|
@@ -778,7 +778,7 @@
|
|||||||
</comparator>
|
</comparator>
|
||||||
<comparator>
|
<comparator>
|
||||||
<type>RegexpComparator</type>
|
<type>RegexpComparator</type>
|
||||||
<expected-output>^\s*rooted at <path>\.( )*</expected-output>
|
<expected-output>^\s*rooted at <path>\. The EC files will be ignored here\.( )*</expected-output>
|
||||||
</comparator>
|
</comparator>
|
||||||
<comparator>
|
<comparator>
|
||||||
<type>RegexpComparator</type>
|
<type>RegexpComparator</type>
|
||||||
|
@@ -23,6 +23,7 @@ Purpose
|
|||||||
However, for warm and cold datasets with relatively low I/O activities, additional block replicas are rarely accessed during normal operations, but still consume the same amount of resources as the first replica.
|
However, for warm and cold datasets with relatively low I/O activities, additional block replicas are rarely accessed during normal operations, but still consume the same amount of resources as the first replica.
|
||||||
|
|
||||||
Therefore, a natural improvement is to use Erasure Coding (EC) in place of replication, which provides the same level of fault-tolerance with much less storage space. In typical Erasure Coding (EC) setups, the storage overhead is no more than 50%.
|
Therefore, a natural improvement is to use Erasure Coding (EC) in place of replication, which provides the same level of fault-tolerance with much less storage space. In typical Erasure Coding (EC) setups, the storage overhead is no more than 50%.
|
||||||
|
Replication factor of an EC file is meaningless. It is always 1 and cannot be changed via -setrep command.
|
||||||
|
|
||||||
Background
|
Background
|
||||||
----------
|
----------
|
||||||
|
@@ -217,7 +217,7 @@ Command Line Options
|
|||||||
|
|
||||||
Flag | Description | Notes
|
Flag | Description | Notes
|
||||||
----------------- | ------------------------------------ | --------
|
----------------- | ------------------------------------ | --------
|
||||||
`-p[rbugpcaxt]` | Preserve r: replication number b: block size u: user g: group p: permission c: checksum-type a: ACL x: XAttr t: timestamp | When `-update` is specified, status updates will **not** be synchronized unless the file sizes also differ (i.e. unless the file is re-created). If -pa is specified, DistCp preserves the permissions also because ACLs are a super-set of permissions.
|
`-p[rbugpcaxt]` | Preserve r: replication number b: block size u: user g: group p: permission c: checksum-type a: ACL x: XAttr t: timestamp | When `-update` is specified, status updates will **not** be synchronized unless the file sizes also differ (i.e. unless the file is re-created). If -pa is specified, DistCp preserves the permissions also because ACLs are a super-set of permissions. The option -pr is only valid if both source and target directory are not erasure coded.
|
||||||
`-i` | Ignore failures | As explained in the Appendix, this option will keep more accurate statistics about the copy than the default case. It also preserves logs from failed copies, which can be valuable for debugging. Finally, a failing map will not cause the job to fail before all splits are attempted.
|
`-i` | Ignore failures | As explained in the Appendix, this option will keep more accurate statistics about the copy than the default case. It also preserves logs from failed copies, which can be valuable for debugging. Finally, a failing map will not cause the job to fail before all splits are attempted.
|
||||||
`-log <logdir>` | Write logs to \<logdir\> | DistCp keeps logs of each file it attempts to copy as map output. If a map fails, the log output will not be retained if it is re-executed.
|
`-log <logdir>` | Write logs to \<logdir\> | DistCp keeps logs of each file it attempts to copy as map output. If a map fails, the log output will not be retained if it is re-executed.
|
||||||
`-m <num_maps>` | Maximum number of simultaneous copies | Specify the number of maps to copy data. Note that more maps may not necessarily improve throughput.
|
`-m <num_maps>` | Maximum number of simultaneous copies | Specify the number of maps to copy data. Note that more maps may not necessarily improve throughput.
|
||||||
|
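
For context on the "no more than 50%" storage overhead quoted in the erasure coding hunk above, the arithmetic can be sketched as follows. The Reed-Solomon layouts RS(6,3) and RS(10,4) are assumed examples only; this commit does not name a specific scheme. Here $k$ is the number of data units and $m$ the number of parity units per stripe, and overhead is the extra storage relative to the original data.

```latex
\text{overhead} = \frac{m}{k}, \qquad
\text{RS}(6,3):\ \frac{3}{6} = 50\%, \qquad
\text{RS}(10,4):\ \frac{4}{10} = 40\%, \qquad
\text{3x replication}:\ \frac{2}{1} = 200\%
```

Under these assumptions, RS(6,3) tolerates the loss of any three units per stripe, while 3x replication tolerates the loss of two replicas. That is consistent with the claim of comparable fault tolerance at much lower storage cost, and it also suggests why a per-file replication factor carries no meaning for an EC file.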