HDFS-5841. Update HDFS caching documentation with new changes. (wang)

git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/trunk@1562649 13f79535-47bb-0310-9956-ffa450edef68
This commit is contained in:
Andrew Wang 2014-01-30 00:17:16 +00:00
parent c96d078033
commit 799ad3d670
3 changed files with 124 additions and 80 deletions

View File

@ -559,6 +559,8 @@ Release 2.3.0 - UNRELEASED
HDFS-5788. listLocatedStatus response can be very large. (Nathan Roberts HDFS-5788. listLocatedStatus response can be very large. (Nathan Roberts
via kihwal) via kihwal)
HDFS-5841. Update HDFS caching documentation with new changes. (wang)
OPTIMIZATIONS OPTIMIZATIONS
HDFS-5239. Allow FSNamesystem lock fairness to be configurable (daryn) HDFS-5239. Allow FSNamesystem lock fairness to be configurable (daryn)

View File

@ -620,7 +620,7 @@ public String getLongUsage() {
"directives being added to the pool. This can be specified in " + "directives being added to the pool. This can be specified in " +
"seconds, minutes, hours, and days, e.g. 120s, 30m, 4h, 2d. " + "seconds, minutes, hours, and days, e.g. 120s, 30m, 4h, 2d. " +
"Valid units are [smhd]. By default, no maximum is set. " + "Valid units are [smhd]. By default, no maximum is set. " +
"This can also be manually specified by \"never\"."); "A value of \"never\" specifies that there is no limit.");
return getShortUsage() + "\n" + return getShortUsage() + "\n" +
"Add a new cache pool.\n\n" + "Add a new cache pool.\n\n" +
listing.toString(); listing.toString();

View File

@ -22,110 +22,140 @@ Centralized Cache Management in HDFS
%{toc|section=1|fromDepth=2|toDepth=4} %{toc|section=1|fromDepth=2|toDepth=4}
* {Background} * {Overview}
Normally, HDFS relies on the operating system to cache data it reads from disk. <Centralized cache management> in HDFS is an explicit caching mechanism that
However, HDFS can also be configured to use centralized cache management. Under allows users to specify <paths> to be cached by HDFS. The NameNode will
centralized cache management, the HDFS NameNode itself decides which blocks communicate with DataNodes that have the desired blocks on disk, and instruct
should be cached, and where they should be cached. them to cache the blocks in off-heap caches.
Centralized cache management has several advantages. First of all, it Centralized cache management in HDFS has many significant advantages.
prevents frequently used block files from being evicted from memory. This is
particularly important when the size of the working set exceeds the size of [[1]] Explicit pinning prevents frequently used data from being evicted from
main memory, which is true for many big data applications. Secondly, when memory. This is particularly important when the size of the working set
HDFS decides what should be cached, it can let clients know about this exceeds the size of main memory, which is common for many HDFS workloads.
information through the getFileBlockLocations API. Finally, when the DataNode
knows a block is locked into memory, it can provide access to that block via [[1]] Because DataNode caches are managed by the NameNode, applications can
mmap. query the set of cached block locations when making task placement decisions.
Co-locating a task with a cached block replica improves read performance.
[[1]] When block has been cached by a DataNode, clients can use a new ,
more-efficient, zero-copy read API. Since checksum verification of cached
data is done once by the DataNode, clients can incur essentially zero
overhead when using this new API.
[[1]] Centralized caching can improve overall cluster memory utilization.
When relying on the OS buffer cache at each DataNode, repeated reads of
a block will result in all <n> replicas of the block being pulled into
buffer cache. With centralized cache management, a user can explicitly pin
only <m> of the <n> replicas, saving <n-m> memory.
* {Use Cases} * {Use Cases}
Centralized cache management is most useful for files which are accessed very Centralized cache management is useful for files that accessed repeatedly.
often. For example, a "fact table" in Hive which is often used in joins is a For example, a small <fact table> in Hive which is often used for joins is a
good candidate for caching. On the other hand, when running a classic good candidate for caching. On the other hand, caching the input of a <
"word count" MapReduce job which counts the number of words in each one year reporting query> is probably less useful, since the
document, there may not be any good candidates for caching, since all the historical data might only be read once.
files may be accessed exactly once.
Centralized cache management is also useful for mixed workloads with
performance SLAs. Caching the working set of a high-priority workload
insures that it does not contend for disk I/O with a low-priority workload.
* {Architecture} * {Architecture}
[images/caching.png] Caching Architecture [images/caching.png] Caching Architecture
With centralized cache management, the NameNode coordinates all caching In this architecture, the NameNode is responsible for coordinating all the
across the cluster. It receives cache information from each DataNode via the DataNode off-heap caches in the cluster. The NameNode periodically receives
cache report, a periodic message that describes all the blocks IDs cached on a <cache report> from each DataNode which describes all the blocks cached
a given DataNode. The NameNode will reply to DataNode heartbeat messages on a given DN. The NameNode manages DataNode caches by piggybacking cache and
with commands telling it which blocks to cache and which to uncache. uncache commands on the DataNode heartbeat.
The NameNode stores a set of path cache directives, which tell it which files The NameNode queries its set of <cache directives> to determine
to cache. The NameNode also stores a set of cache pools, which are groups of which paths should be cached. Cache directives are persistently stored in the
cache directives. These directives and pools are persisted to the edit log fsimage and edit log, and can be added, removed, and modified via Java and
and fsimage, and will be loaded if the cluster is restarted. command-line APIs. The NameNode also stores a set of <cache pools>,
which are administrative entities used to group cache directives together for
resource management and enforcing permissions.
Periodically, the NameNode rescans the namespace, to see which blocks need to The NameNode periodically rescans the namespace and active cache directives
be cached based on the current set of path cache directives. Rescans are also to determine which blocks need to be cached or uncached and assign caching
triggered by relevant user actions, such as adding or removing a cache work to DataNodes. Rescans can also be triggered by user actions like adding
directive or removing a cache pool. or removing a cache directive or removing a cache pool.
Cache directives also may specific a numeric cache replication, which is the
number of replicas to cache. This number may be equal to or smaller than the
file's block replication. If multiple cache directives cover the same file
with different cache replication settings, then the highest cache replication
setting is applied.
We do not currently cache blocks which are under construction, corrupt, or We do not currently cache blocks which are under construction, corrupt, or
otherwise incomplete. If a cache directive covers a symlink, the symlink otherwise incomplete. If a cache directive covers a symlink, the symlink
target is not cached. target is not cached.
Caching is currently done on a per-file basis, although we would like to add Caching is currently done on the file or directory-level. Block and sub-block
block-level granularity in the future. caching is an item of future work.
* {Interface} * {Concepts}
The NameNode stores a list of "cache directives." These directives contain a ** {Cache directive}
path as well as the number of times blocks in that path should be replicated.
Paths can be either directories or files. If the path specifies a file, that A <cache directive> defines a path that should be cached. Paths can be either
file is cached. If the path specifies a directory, all the files in the directories or files. Directories are cached non-recursively, meaning only
directory will be cached. However, this process is not recursive-- only the files in the first-level listing of the directory.
direct children of the directory will be cached.
** {hdfs cacheadmin Shell} Directives also specify additional parameters, such as the cache replication
factor and expiration time. The replication factor specifies the number of
block replicas to cache. If multiple cache directives refer to the same file,
the maximum cache replication factor is applied.
Path cache directives can be created by the <<<hdfs cacheadmin The expiration time is specified on the command line as a <time-to-live
-addDirective>>> command and removed via the <<<hdfs cacheadmin (TTL)>, a relative expiration time in the future. After a cache directive
-removeDirective>>> command. To list the current path cache directives, use expires, it is no longer considered by the NameNode when making caching
<<<hdfs cacheadmin -listDirectives>>>. Each path cache directive has a decisions.
unique 64-bit ID number which will not be reused if it is deleted. To remove
all path cache directives with a specified path, use <<<hdfs cacheadmin
-removeDirectives>>>.
Directives are grouped into "cache pools." Each cache pool gets a share of ** {Cache pool}
the cluster's resources. Additionally, cache pools are used for
authentication. Cache pools have a mode, user, and group, similar to regular
files. The same authentication rules are applied as for normal files. So, for
example, if the mode is 0777, any user can add or remove directives from the
cache pool. If the mode is 0644, only the owner can write to the cache pool,
but anyone can read from it. And so forth.
Cache pools are identified by name. They can be created by the <<<hdfs A <cache pool> is an administrative entity used to manage groups of cache
cacheAdmin -addPool>>> command, modified by the <<<hdfs cacheadmin directives. Cache pools have UNIX-like <permissions>, which restrict which
-modifyPool>>> command, and removed via the <<<hdfs cacheadmin users and groups have access to the pool. Write permissions allow users to
-removePool>>> command. To list the current cache pools, use <<<hdfs add and remove cache directives to the pool. Read permissions allow users to
cacheAdmin -listPools>>> list the cache directives in a pool, as well as additional metadata. Execute
permissions are unused.
Cache pools are also used for resource management. Pools can enforce a
maximum <limit>, which restricts the number of bytes that can be cached in
aggregate by directives in the pool. Normally, the sum of the pool limits
will approximately equal the amount of aggregate memory reserved for
HDFS caching on the cluster. Cache pools also track a number of statistics
to help cluster users determine what is and should be cached.
Pools also can enforce a maximum time-to-live. This restricts the maximum
expiration time of directives being added to the pool.
* {<<<cacheadmin>>> command-line interface}
On the command-line, administrators and users can interact with cache pools
and directives via the <<<hdfs cacheadmin>>> subcommand.
Cache directives are identified by a unique, non-repeating 64-bit integer ID.
IDs will not be reused even if a cache directive is later removed.
Cache pools are identified by a unique string name.
** {Cache directive commands}
*** {addDirective} *** {addDirective}
Usage: <<<hdfs cacheadmin -addDirective -path <path> -replication <replication> -pool <pool-name> >>> Usage: <<<hdfs cacheadmin -addDirective -path <path> -pool <pool-name> [-force] [-replication <replication>] [-ttl <time-to-live>]>>>
Add a new cache directive. Add a new cache directive.
*--+--+ *--+--+
\<path\> | A path to cache. The path can be a directory or a file. \<path\> | A path to cache. The path can be a directory or a file.
*--+--+ *--+--+
\<pool-name\> | The pool to which the directive will be added. You must have write permission on the cache pool in order to add new directives.
*--+--+
-force | Skips checking of cache pool resource limits.
*--+--+
\<replication\> | The cache replication factor to use. Defaults to 1. \<replication\> | The cache replication factor to use. Defaults to 1.
*--+--+ *--+--+
\<pool-name\> | The pool to which the directive will be added. You must have write permission on the cache pool in order to add new directives. \<time-to-live\> | How long the directive is valid. Can be specified in minutes, hours, and days, e.g. 30m, 4h, 2d. Valid units are [smhd]. "never" indicates a directive that never expires. If unspecified, the directive never expires.
*--+--+ *--+--+
*** {removeDirective} *** {removeDirective}
@ -150,7 +180,7 @@ Centralized Cache Management in HDFS
*** {listDirectives} *** {listDirectives}
Usage: <<<hdfs cacheadmin -listDirectives [-path <path>] [-pool <pool>] >>> Usage: <<<hdfs cacheadmin -listDirectives [-stats] [-path <path>] [-pool <pool>]>>>
List cache directives. List cache directives.
@ -159,10 +189,14 @@ Centralized Cache Management in HDFS
*--+--+ *--+--+
\<pool\> | List only path cache directives in that pool. \<pool\> | List only path cache directives in that pool.
*--+--+ *--+--+
-stats | List path-based cache directive statistics.
*--+--+
** {Cache pool commands}
*** {addPool} *** {addPool}
Usage: <<<hdfs cacheadmin -addPool <name> [-owner <owner>] [-group <group>] [-mode <mode>] [-weight <weight>] >>> Usage: <<<hdfs cacheadmin -addPool <name> [-owner <owner>] [-group <group>] [-mode <mode>] [-limit <limit>] [-maxTtl <maxTtl>>>>
Add a new cache pool. Add a new cache pool.
@ -175,12 +209,14 @@ Centralized Cache Management in HDFS
*--+--+ *--+--+
\<mode\> | UNIX-style permissions for the pool. Permissions are specified in octal, e.g. 0755. By default, this is set to 0755. \<mode\> | UNIX-style permissions for the pool. Permissions are specified in octal, e.g. 0755. By default, this is set to 0755.
*--+--+ *--+--+
\<weight\> | Weight of the pool. This is a relative measure of the importance of the pool used during cache resource management. By default, it is set to 100. \<limit\> | The maximum number of bytes that can be cached by directives in this pool, in aggregate. By default, no limit is set.
*--+--+
\<maxTtl\> | The maximum allowed time-to-live for directives being added to the pool. This can be specified in seconds, minutes, hours, and days, e.g. 120s, 30m, 4h, 2d. Valid units are [smhd]. By default, no maximum is set. A value of \"never\" specifies that there is no limit.
*--+--+ *--+--+
*** {modifyPool} *** {modifyPool}
Usage: <<<hdfs cacheadmin -modifyPool <name> [-owner <owner>] [-group <group>] [-mode <mode>] [-weight <weight>] >>> Usage: <<<hdfs cacheadmin -modifyPool <name> [-owner <owner>] [-group <group>] [-mode <mode>] [-limit <limit>] [-maxTtl <maxTtl>]>>>
Modifies the metadata of an existing cache pool. Modifies the metadata of an existing cache pool.
@ -193,7 +229,9 @@ Centralized Cache Management in HDFS
*--+--+ *--+--+
\<mode\> | Unix-style permissions of the pool in octal. \<mode\> | Unix-style permissions of the pool in octal.
*--+--+ *--+--+
\<weight\> | Weight of the pool. \<limit\> | Maximum number of bytes that can be cached by this pool.
*--+--+
\<maxTtl\> | The maximum allowed time-to-live for directives being added to the pool.
*--+--+ *--+--+
*** {removePool} *** {removePool}
@ -208,11 +246,13 @@ Centralized Cache Management in HDFS
*** {listPools} *** {listPools}
Usage: <<<hdfs cacheadmin -listPools [name] >>> Usage: <<<hdfs cacheadmin -listPools [-stats] [<name>]>>>
Display information about one or more cache pools, e.g. name, owner, group, Display information about one or more cache pools, e.g. name, owner, group,
permissions, etc. permissions, etc.
*--+--+
-stats | Display additional cache pool statistics.
*--+--+ *--+--+
\<name\> | If specified, list only the named cache pool. \<name\> | If specified, list only the named cache pool.
*--+--+ *--+--+
@ -244,10 +284,12 @@ Centralized Cache Management in HDFS
* dfs.datanode.max.locked.memory * dfs.datanode.max.locked.memory
The DataNode will treat this as the maximum amount of memory it can use for This determines the maximum amount of memory a DataNode will use for caching.
its cache. When setting this value, please remember that you will need space The "locked-in-memory size" ulimit (<<<ulimit -l>>>) of the DataNode user
in memory for other things, such as the Java virtual machine (JVM) itself also needs to be increased to match this parameter (see below section on
and the operating system's page cache. {{OS Limits}}). When setting this value, please remember that you will need
space in memory for other things as well, such as the DataNode and
application JVM heaps and the operating system page cache.
*** Optional *** Optional