HDFS-5386. Add feature documentation for datanode caching. Contributed by Colin Patrick McCabe.

git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/trunk@1537243 13f79535-47bb-0310-9956-ffa450edef68
Chris Nauroth 2013-10-30 19:01:13 +00:00
parent 9c29b86cab
commit 6aeaef9276
5 changed files with 322 additions and 0 deletions


@@ -188,6 +188,9 @@ Trunk (Unreleased)
HDFS-5378. In CacheReport, don't send genstamp and length on the wire
(Contributed by Colin Patrick McCabe)
HDFS-5386. Add feature documentation for datanode caching.
(Colin Patrick McCabe via cnauroth)
OPTIMIZATIONS
HDFS-5349. DNA_CACHE and DNA_UNCACHE should be by blockId only. (cmccabe)


@@ -1486,6 +1486,24 @@
</description>
</property>
<property>
<name>dfs.namenode.list.cache.descriptors.num.responses</name>
<value>100</value>
<description>
This value controls the number of cache descriptors that the NameNode will
send over the wire in response to a listDirectives RPC.
</description>
</property>
<property>
<name>dfs.namenode.list.cache.pools.num.responses</name>
<value>100</value>
<description>
This value controls the number of cache pools that the NameNode will
send over the wire in response to a listPools RPC.
</description>
</property>
<property>
<name>dfs.namenode.path.based.cache.refresh.interval.ms</name>
<value>300000</value>


@@ -0,0 +1,300 @@
~~ Licensed under the Apache License, Version 2.0 (the "License");
~~ you may not use this file except in compliance with the License.
~~ You may obtain a copy of the License at
~~
~~ http://www.apache.org/licenses/LICENSE-2.0
~~
~~ Unless required by applicable law or agreed to in writing, software
~~ distributed under the License is distributed on an "AS IS" BASIS,
~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~~ See the License for the specific language governing permissions and
~~ limitations under the License. See accompanying LICENSE file.
---
Hadoop Distributed File System-${project.version} - Centralized Cache Management in HDFS
---
---
${maven.build.timestamp}
Centralized Cache Management in HDFS
\[ {{{./index.html}Go Back}} \]
%{toc|section=1|fromDepth=2|toDepth=4}
* {Background}
Normally, HDFS relies on the operating system to cache data it reads from disk.
However, HDFS can also be configured to use centralized cache management. Under
centralized cache management, the HDFS NameNode itself decides which blocks
should be cached, and where they should be cached.
Centralized cache management has several advantages. First of all, it
prevents frequently used block files from being evicted from memory. This is
particularly important when the size of the working set exceeds the size of
main memory, which is true for many big data applications. Secondly, when
HDFS decides what should be cached, it can expose that information to clients
through the getFileBlockLocations API. Finally, when the DataNode
knows a block is locked into memory, it can provide access to that block via
mmap.
* {Use Cases}
Centralized cache management is most useful for files which are accessed very
often. For example, a "fact table" in Hive which is often used in joins is a
good candidate for caching. On the other hand, when running a classic
"word count" MapReduce job which counts the number of words in each
document, there may not be any good candidates for caching, since all the
files may be accessed exactly once.
* {Architecture}
[images/caching.png] Caching Architecture
With centralized cache management, the NameNode coordinates all caching
across the cluster. It receives cache information from each DataNode via the
cache report, a periodic message that describes all the block IDs cached on
a given DataNode. The NameNode will reply to DataNode heartbeat messages
with commands telling it which blocks to cache and which to uncache.
The NameNode stores a set of path cache directives, which tell it which files
to cache. The NameNode also stores a set of cache pools, which are groups of
cache directives. These directives and pools are persisted to the edit log
and fsimage, and will be loaded if the cluster is restarted.
Periodically, the NameNode rescans the namespace, to see which blocks need to
be cached based on the current set of path cache directives. Rescans are also
triggered by relevant user actions, such as adding or removing a cache
directive or removing a cache pool.
Cache directives may also specify a numeric cache replication, which is the
number of replicas to cache. This number may be equal to or smaller than the
file's block replication. If multiple cache directives cover the same file
with different cache replication settings, then the highest cache replication
setting is applied.
We do not currently cache blocks which are under construction, corrupt, or
otherwise incomplete. If a cache directive covers a symlink, the symlink
target is not cached.
Caching is currently done on a per-file basis, although we would like to add
block-level granularity in the future.
* {Interface}
The NameNode stores a list of "cache directives." These directives contain a
path as well as the number of times blocks in that path should be replicated.
Paths can be either directories or files. If the path specifies a file, that
file is cached. If the path specifies a directory, all the files in the
directory will be cached. However, this process is not recursive; only the
direct children of the directory will be cached.
** {hdfs cacheadmin Shell}
Path cache directives can be created by the <<<hdfs cacheadmin
-addDirective>>> command and removed via the <<<hdfs cacheadmin
-removeDirective>>> command. To list the current path cache directives, use
<<<hdfs cacheadmin -listDirectives>>>. Each path cache directive has a
unique 64-bit ID number which will not be reused if it is deleted. To remove
all path cache directives with a specified path, use <<<hdfs cacheadmin
-removeDirectives>>>.
Directives are grouped into "cache pools." Each cache pool gets a share of
the cluster's resources. Additionally, cache pools are used for
authentication. Cache pools have a mode, user, and group, similar to regular
files. The same authentication rules are applied as for normal files. So, for
example, if the mode is 0777, any user can add or remove directives from the
cache pool. If the mode is 0644, only the owner can write to the cache pool,
but anyone can read from it. And so forth.
Cache pools are identified by name. They can be created by the <<<hdfs
cacheadmin -addPool>>> command, modified by the <<<hdfs cacheadmin
-modifyPool>>> command, and removed via the <<<hdfs cacheadmin
-removePool>>> command. To list the current cache pools, use <<<hdfs
cacheadmin -listPools>>>.
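As a brief sketch of how pools and directives fit together (the pool name,
path, and settings below are hypothetical examples, not recommendations):

----
# Create a pool, then add a directive to it, then confirm what is cached.
hdfs cacheadmin -addPool reportingPool -mode 0755 -weight 100
hdfs cacheadmin -addDirective -path /user/hive/fact -replication 2 -pool reportingPool
hdfs cacheadmin -listDirectives
----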
*** {addDirective}
Usage: <<<hdfs cacheadmin -addDirective -path <path> -replication <replication> -pool <pool-name> >>>
Add a new PathBasedCache directive.
*--+--+
\<path\> | A path to cache. The path can be a directory or a file.
*--+--+
\<replication\> | The cache replication factor to use. Defaults to 1.
*--+--+
\<pool-name\> | The pool to which the directive will be added. You must have write permission on the cache pool in order to add new directives.
*--+--+
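For example, using a hypothetical path and pool name, a directive that caches
two replicas of each file under a directory might look like:

----
hdfs cacheadmin -addDirective -path /user/hive/fact -replication 2 -pool reportingPool
----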
*** {removeDirective}
Usage: <<<hdfs cacheadmin -removeDirective <id> >>>
Remove a cache directive.
*--+--+
\<id\> | The id of the cache directive to remove. You must have write permission on the pool of the directive in order to remove it. To see a list of PathBasedCache directive IDs, use the -listDirectives command.
*--+--+
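For example, assuming <<<-listDirectives>>> reported a directive with ID 1:

----
hdfs cacheadmin -removeDirective 1
----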
*** {removeDirectives}
Usage: <<<hdfs cacheadmin -removeDirectives <path> >>>
Remove every cache directive with the specified path.
*--+--+
\<path\> | The path of the cache directives to remove. You must have write permission on the pool of the directive in order to remove it. To see a list of cache directives, use the -listDirectives command.
*--+--+
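For example, to drop every directive covering a hypothetical path:

----
hdfs cacheadmin -removeDirectives /user/hive/fact
----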
*** {listDirectives}
Usage: <<<hdfs cacheadmin -listDirectives [-path <path>] [-pool <pool>] >>>
List PathBasedCache directives.
*--+--+
\<path\> | List only PathBasedCache directives with this path. Note that if there is a PathBasedCache directive for <path> in a cache pool that we don't have read access for, it will not be listed.
*--+--+
\<pool\> | List only path cache directives in that pool.
*--+--+
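For example, to list only the directives in a hypothetical pool:

----
hdfs cacheadmin -listDirectives -pool reportingPool
----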
*** {addPool}
Usage: <<<hdfs cacheadmin -addPool <name> [-owner <owner>] [-group <group>] [-mode <mode>] [-weight <weight>] >>>
Add a new cache pool.
*--+--+
\<name\> | Name of the new pool.
*--+--+
\<owner\> | Username of the owner of the pool. Defaults to the current user.
*--+--+
\<group\> | Group of the pool. Defaults to the primary group name of the current user.
*--+--+
\<mode\> | UNIX-style permissions for the pool. Permissions are specified in octal, e.g. 0755. By default, this is set to 0755.
*--+--+
\<weight\> | Weight of the pool. This is a relative measure of the importance of the pool used during cache resource management. By default, it is set to 100.
*--+--+
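For example (the owner, group, and settings here are illustrative assumptions):

----
hdfs cacheadmin -addPool reportingPool -owner hive -group analysts -mode 0750 -weight 200
----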
*** {modifyPool}
Usage: <<<hdfs cacheadmin -modifyPool <name> [-owner <owner>] [-group <group>] [-mode <mode>] [-weight <weight>] >>>
Modifies the metadata of an existing cache pool.
*--+--+
\<name\> | Name of the pool to modify.
*--+--+
\<owner\> | Username of the owner of the pool.
*--+--+
\<group\> | Groupname of the group of the pool.
*--+--+
\<mode\> | Unix-style permissions of the pool in octal.
*--+--+
\<weight\> | Weight of the pool.
*--+--+
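For example, to loosen the mode and raise the weight of the hypothetical pool
from the earlier examples:

----
hdfs cacheadmin -modifyPool reportingPool -mode 0775 -weight 300
----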
*** {removePool}
Usage: <<<hdfs cacheadmin -removePool <name> >>>
Remove a cache pool. This also uncaches paths associated with the pool.
*--+--+
\<name\> | Name of the cache pool to remove.
*--+--+
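For example, removing the hypothetical pool from the earlier examples:

----
hdfs cacheadmin -removePool reportingPool
----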
*** {listPools}
Usage: <<<hdfs cacheadmin -listPools [name] >>>
Display information about one or more cache pools, e.g. name, owner, group,
permissions, etc.
*--+--+
\<name\> | If specified, list only the named cache pool.
*--+--+
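For example, to display only the hypothetical pool used above:

----
hdfs cacheadmin -listPools reportingPool
----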
*** {help}
Usage: <<<hdfs cacheadmin -help <command-name> >>>
Get detailed help about a command.
*--+--+
\<command-name\> | The command for which to get detailed help. If no command is specified, print detailed help for all commands.
*--+--+
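For example, to print detailed help for a single command:

----
hdfs cacheadmin -help addDirective
----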
* {Configuration}
** {Native Libraries}
In order to lock block files into memory, the DataNode relies on native JNI
code found in <<<libhadoop.so>>>. Be sure to
{{{../hadoop-common/NativeLibraries.html}enable JNI}} if you are using HDFS
centralized cache management.
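One way to verify that <<<libhadoop.so>>> is available (assuming your release
includes the <<<checknative>>> subcommand) is:

----
hadoop checknative -a
----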
** {Configuration Properties}
*** Required
Be sure to configure the following (a sketch of the corresponding
<<<hdfs-site.xml>>> entries follows this list):
* dfs.namenode.caching.enabled
This must be set to true to enable caching. If this is false, the NameNode
will ignore cache reports, and will not ask DataNodes to cache
blocks.
* dfs.datanode.max.locked.memory
The DataNode will treat this as the maximum amount of memory it can use for
its cache. When setting this value, please remember that you will need space
in memory for other things, such as the Java virtual machine (JVM) itself
and the operating system's page cache.
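A minimal <<<hdfs-site.xml>>> sketch of these two settings (the 64 MB value
below is only an illustration, not a recommendation):

----
<property>
  <name>dfs.namenode.caching.enabled</name>
  <value>true</value>
</property>
<property>
  <name>dfs.datanode.max.locked.memory</name>
  <!-- Illustrative example only: 64 MB, specified in bytes. -->
  <value>67108864</value>
</property>
----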
*** Optional
The following properties are not required, but may be specified for tuning
(a combined <<<hdfs-site.xml>>> sketch follows this list):
* dfs.namenode.path.based.cache.refresh.interval.ms
The NameNode will use this as the number of milliseconds between subsequent
path cache rescans. Each rescan recalculates which blocks should be cached,
and which DataNode holding a replica of each block should cache it.
By default, this parameter is set to 300000, which is five minutes.
* dfs.datanode.fsdatasetcache.max.threads.per.volume
The DataNode will use this as the maximum number of threads per volume to
use for caching new data.
By default, this parameter is set to 4.
* dfs.cachereport.intervalMsec
The DataNode will use this as the number of milliseconds between sending a
full report of its cache state to the NameNode.
By default, this parameter is set to 10000, which is 10 seconds.
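Restated as an <<<hdfs-site.xml>>> sketch, with each property at its
documented default:

----
<property>
  <name>dfs.namenode.path.based.cache.refresh.interval.ms</name>
  <value>300000</value>
</property>
<property>
  <name>dfs.datanode.fsdatasetcache.max.threads.per.volume</name>
  <value>4</value>
</property>
<property>
  <name>dfs.cachereport.intervalMsec</name>
  <value>10000</value>
</property>
----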
** {OS Limits}
If you get the error "Cannot start datanode because the configured max
locked memory size... is more than the datanode's available RLIMIT_MEMLOCK
ulimit," that means that the operating system is imposing a lower limit
on the amount of memory that you can lock than what you have configured. To
fix this, you must adjust the ulimit -l value that the DataNode runs with.
Usually, this value is configured in <<</etc/security/limits.conf>>>.
However, it will vary depending on what operating system and distribution
you are using.
You will know that you have correctly configured this value when you can run
<<<ulimit -l>>> from the shell and get back either a higher value than what
you have configured with <<<dfs.datanode.max.locked.memory>>>, or the string
"unlimited," indicating that there is no limit. Note that it's typical for
<<<ulimit -l>>> to output the memory lock limit in KB, but
<<<dfs.datanode.max.locked.memory>>> must be specified in bytes.
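For example, on a typical Linux system, assuming the DataNode runs as a
hypothetical <<<hdfs>>> user:

----
# Check the current lock limit (usually printed in KB).
ulimit -l
# Raise it in /etc/security/limits.conf (assumed "hdfs" user), for example:
#   hdfs  -  memlock  unlimited
# Start a new login session for the change to take effect.
----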

(New binary file, 27 KiB: the caching architecture diagram referenced above
as images/caching.png; not shown.)


@@ -80,6 +80,7 @@
<item name="HttpFS Gateway" href="hadoop-hdfs-httpfs/index.html"/> <item name="HttpFS Gateway" href="hadoop-hdfs-httpfs/index.html"/>
<item name="Short Circuit Local Reads" <item name="Short Circuit Local Reads"
href="hadoop-project-dist/hadoop-hdfs/ShortCircuitLocalReads.html"/> href="hadoop-project-dist/hadoop-hdfs/ShortCircuitLocalReads.html"/>
<item name="Centralized Cache Management" href="hadoop-project-dist/hadoop-hdfs/CentralizedCacheManagement.html"/>
<item name="HDFS NFS Gateway" href="hadoop-project-dist/hadoop-hdfs/HdfsNfsGateway.html"/> <item name="HDFS NFS Gateway" href="hadoop-project-dist/hadoop-hdfs/HdfsNfsGateway.html"/>
</menu> </menu>