Extended Attributes in HDFS

Overview

Extended attributes (abbreviated as xattrs) are a filesystem feature that allows user applications to associate additional metadata with a file or directory. Unlike system-level inode metadata such as file permissions or modification time, extended attributes are not interpreted by the system and are instead used by applications to store additional information about an inode. Extended attributes can be used, for instance, to specify the character encoding of a plain-text document.

HDFS extended attributes

Extended attributes in HDFS are modeled after extended attributes in Linux (see the Linux manpage for attr(5) and related documentation). An extended attribute is a name-value pair, with a string name and binary value. Xattr names must also be prefixed with a namespace. For example, an xattr named myXattr in the user namespace would be specified as user.myXattr. Multiple xattrs can be associated with a single inode.
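
The naming rule above can be illustrated with a minimal Python sketch that splits a full xattr name at the first dot. The function and type names (`split_xattr_name`, `XAttrName`) are hypothetical and are not part of any HDFS API; the sketch also does not check the namespace against the list of valid namespaces.

```python
from typing import NamedTuple


class XAttrName(NamedTuple):
    """Hypothetical container for the two parts of a full xattr name."""
    namespace: str
    name: str


def split_xattr_name(full_name: str) -> XAttrName:
    """Split e.g. 'user.myXattr' into ('user', 'myXattr').

    Only the first dot separates the namespace from the name, so any
    later dots remain part of the name.
    """
    namespace, sep, name = full_name.partition(".")
    if not sep or not namespace or not name:
        raise ValueError(f"expected '<namespace>.<name>', got {full_name!r}")
    return XAttrName(namespace, name)


if __name__ == "__main__":
    print(split_xattr_name("user.myXattr"))
```

Splitting on the first dot only means a name such as security.hdfs.unreadable.by.superuser keeps its inner dots in the name part.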

Namespaces and Permissions

In HDFS, there are five valid namespaces: user, trusted, system, security, and raw. Each of these namespaces has different access restrictions.

The user namespace is the namespace that will commonly be used by client applications. Access to extended attributes in the user namespace is controlled by the corresponding file permissions.

The trusted namespace is available only to HDFS superusers.

The system namespace is reserved for implementing internal HDFS features. This namespace is not accessible through userspace methods.

The security namespace is reserved for internal HDFS use. This namespace is generally not accessible through userspace methods. One particular use of security is the security.hdfs.unreadable.by.superuser extended attribute. This xattr can only be set on files, and it will prevent the superuser from reading the file's contents. The superuser can still read and modify file metadata, such as the owner, permissions, etc. This xattr can be set and accessed by any user, assuming normal filesystem permissions. This xattr is also write-once, and cannot be removed once set. This xattr does not allow a value to be set.

The raw namespace is reserved for internal system attributes that sometimes need to be exposed. Like system namespace attributes, they are not visible to the user except when getXAttr/getXAttrs is called on a file or directory in the /.reserved/raw HDFS directory hierarchy. These attributes can only be accessed by the superuser. An example of where raw namespace extended attributes are used is the distcp utility. Encryption zone metadata is stored in raw.* extended attributes, so as long as the administrator uses /.reserved/raw pathnames in source and target, the encrypted files in the encryption zones are transparently copied.
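
As a rough summary of the namespace rules above, the following Python sketch maps a namespace to a coarse access verdict. It is a deliberate simplification and not how HDFS implements its checks: the function name, the verdict labels, and the `via_reserved_raw` parameter are all hypothetical, and the special case of security.hdfs.unreadable.by.superuser is ignored.

```python
# Verdict labels are hypothetical, chosen only for this sketch.
ALLOWED = "allowed"
DENIED = "denied"
BY_FILE_PERMISSIONS = "by-file-permissions"


def xattr_access(namespace: str, is_superuser: bool,
                 via_reserved_raw: bool = False) -> str:
    """Return a coarse verdict for userspace access to a namespace."""
    if namespace == "user":
        # Governed by the permissions of the file or directory itself.
        return BY_FILE_PERMISSIONS
    if namespace == "trusted":
        return ALLOWED if is_superuser else DENIED
    if namespace in ("system", "security"):
        # Simplification: treats both as internal-only and ignores the
        # security.hdfs.unreadable.by.superuser exception.
        return DENIED
    if namespace == "raw":
        # Superuser only, and only through /.reserved/raw paths.
        return ALLOWED if (is_superuser and via_reserved_raw) else DENIED
    raise ValueError(f"unknown namespace: {namespace!r}")


if __name__ == "__main__":
    for ns in ("user", "trusted", "system", "security", "raw"):
        print(ns, xattr_access(ns, is_superuser=True, via_reserved_raw=True))
```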

Interacting with extended attributes

The Hadoop shell has support for interacting with extended attributes via hadoop fs -getfattr and hadoop fs -setfattr. These commands are styled after the Linux getfattr(1) and setfattr(1) commands.

getfattr

hadoop fs -getfattr [-R] -n name | -d [-e en] <path>

Displays the extended attribute names and values (if any) for a file or directory.

-R              Recursively list the attributes for all files and directories.
-n name         Dump the named extended attribute value.
-d              Dump all extended attribute values associated with pathname.
-e <encoding>   Encode values after retrieving them. Valid encodings are "text", "hex", and "base64". Values encoded as text strings are enclosed in double quotes ("), and values encoded as hexadecimal and base64 are prefixed with 0x and 0s, respectively.
<path>          The file or directory.

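The display conventions for the -e option described above can be sketched in Python as follows. The function name `encode_xattr_value` is hypothetical, and decoding the value as UTF-8 for the "text" encoding is an assumption of this sketch rather than something stated here.

```python
import base64


def encode_xattr_value(value: bytes, encoding: str = "text") -> str:
    """Render a binary xattr value in one of the three display encodings."""
    if encoding == "text":
        # Assumption: the value is valid UTF-8 text.
        return '"' + value.decode("utf-8") + '"'
    if encoding == "hex":
        return "0x" + value.hex()
    if encoding == "base64":
        return "0s" + base64.b64encode(value).decode("ascii")
    raise ValueError(f"unknown encoding: {encoding!r}")


if __name__ == "__main__":
    for enc in ("text", "hex", "base64"):
        print(enc, encode_xattr_value(b"abc", enc))
```
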
setfattr

hadoop fs -setfattr -n name [-v value] | -x name <path>

Sets an extended attribute name and value for a file or directory, or removes an extended attribute when -x is given.

-n name     The extended attribute name.
-v value    The extended attribute value. There are three different encoding methods for the value. If the argument is enclosed in double quotes, then the value is the string inside the quotes. If the argument is prefixed with 0x or 0X, then it is taken as a hexadecimal number. If the argument begins with 0s or 0S, then it is taken as a base64 encoding.
-x name     Remove the extended attribute.
<path>      The file or directory.

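The three encoding rules for the -v argument can be sketched in Python as follows. The function name `decode_xattr_value` is hypothetical. Treating an argument that matches none of the three forms as plain UTF-8 text is an assumption of this sketch, since that case is not specified above.

```python
import base64


def decode_xattr_value(arg: str) -> bytes:
    """Turn a -v style argument into the bytes it denotes."""
    if len(arg) >= 2 and arg[0] == '"' and arg[-1] == '"':
        # Quoted form: the value is the string inside the quotes.
        return arg[1:-1].encode("utf-8")
    prefix = arg[:2].lower()
    if prefix == "0x":
        # Hexadecimal form: 0x or 0X followed by hex digits.
        return bytes.fromhex(arg[2:])
    if prefix == "0s":
        # Base64 form: 0s or 0S followed by base64 text.
        return base64.b64decode(arg[2:], validate=True)
    # Assumption: anything else is taken as plain UTF-8 text.
    return arg.encode("utf-8")


if __name__ == "__main__":
    for arg in ('"abc"', "0x616263", "0sYWJj"):
        print(arg, decode_xattr_value(arg))
```
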
Configuration options

HDFS supports extended attributes out of the box, without additional configuration. Administrators may want to review the options that limit the number of xattrs per inode and the size of each xattr, since xattrs increase the on-disk and in-memory space consumption of an inode.

  • dfs.namenode.xattrs.enabled

    Whether support for extended attributes is enabled on the NameNode. By default, extended attributes are enabled.

  • dfs.namenode.fs-limits.max-xattrs-per-inode

    The maximum number of extended attributes per inode. By default, this limit is 32.

  • dfs.namenode.fs-limits.max-xattr-size

    The maximum combined size of the name and value of an extended attribute in bytes. By default, this limit is 16384 bytes.
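
To make the two limits concrete, here is a minimal Python sketch of a check against the default values listed above. It is not the NameNode's implementation: the function and constant names are hypothetical, and whether the namespace prefix counts toward the size, or whether every namespace counts toward the per-inode limit, is not stated here. The sketch simply counts the name as passed and every existing entry.

```python
from typing import Mapping

# Defaults quoted in this section.
MAX_XATTRS_PER_INODE = 32   # dfs.namenode.fs-limits.max-xattrs-per-inode
MAX_XATTR_SIZE = 16384      # dfs.namenode.fs-limits.max-xattr-size (bytes)


def check_xattr_limits(existing: Mapping[str, bytes],
                       name: str, value: bytes) -> None:
    """Raise ValueError if setting name=value would exceed either limit."""
    # Combined size of name and value, in bytes.
    size = len(name.encode("utf-8")) + len(value)
    if size > MAX_XATTR_SIZE:
        raise ValueError(f"xattr size {size} exceeds {MAX_XATTR_SIZE} bytes")
    # Replacing an existing xattr does not increase the count.
    if name not in existing and len(existing) + 1 > MAX_XATTRS_PER_INODE:
        raise ValueError(f"more than {MAX_XATTRS_PER_INODE} xattrs on inode")


if __name__ == "__main__":
    check_xattr_limits({}, "user.myXattr", b"some value")
    print("within limits")
```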