Returns true if the key is deprecated and false otherwise.
Get the value of the name property; null if no such property exists. If the key is deprecated, it returns the value of the first key which replaces the deprecated key and is not null. Values are processed for variable expansion before being returned. As a side effect, get loads the properties from the sources the first time it is called (lazy initialization).
@param name the property name, which will be trimmed before the value is looked up.
@return the value of the name property or its replacing property, or null if no such property exists.
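A minimal usage sketch (the resource name and property keys below are hypothetical):

  import org.apache.hadoop.conf.Configuration;

  public class GetExample {
    public static void main(String[] args) {
      Configuration conf = new Configuration();
      conf.addResource("my-app-site.xml");                 // hypothetical resource; loaded lazily on first get()
      String baseDir = conf.get("my.app.base.dir");        // null if the property is not set anywhere
      String tmpDir  = conf.get("my.app.tmp.dir", "/tmp"); // falls back to the supplied default
      System.out.println(baseDir + " " + tmpDir);
    }
  }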
Returns true if the property name exists but has no value.
Get the value of the name property as a String; null if no such property exists. If the key is deprecated, it returns the value of the first key which replaces the deprecated key and is not null. Values are processed for variable expansion before being returned.
@param name the property name.
@return the value of the name property or its replacing property, or null if no such property exists.
Get the value of the name property as a trimmed String; defaultValue if no such property exists. See {@link Configuration#getTrimmed(String)} for more details.
@param name the property name.
@param defaultValue the property default value.
@return the value of the name property, or defaultValue if it is not set.
Returns the value of the name property or its replacing property, or null if no such property exists.
Set the value of the name property. If name is deprecated or there is a deprecated name associated with it, the value is set for both names. The name will be trimmed before it is put into the configuration.
@param name property name.
@param value property value.
Set the value of the name property. If name is deprecated, the value is also set for the keys that replace the deprecated key. The name will be trimmed before it is put into the configuration.
@param name property name.
@param value property value.
@param source the place that this configuration value came from (for debugging).
@throws IllegalArgumentException when the value or name is null.
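A short sketch of set, with hypothetical property names:

  // inside a method
  Configuration conf = new Configuration();
  conf.set("my.app.owner", "alice");                         // plain set; the name is trimmed first
  conf.set("my.app.owner", "alice", "set in MyApp#init()");  // variant that records a source for debugging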
Get the value of the name property; if no such property exists, defaultValue is returned.
@param name property name, which will be trimmed before the value is looked up.
@param defaultValue default value.
@return property value, or defaultValue if the property doesn't exist.
Get the value of the name property as an int. If no such property exists, the provided default value is returned; if the specified value is not a valid int, an error is thrown.
@param name property name.
@param defaultValue default value.
@throws NumberFormatException when the value is invalid
@return property value as an int, or defaultValue.
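For example (hypothetical key):

  Configuration conf = new Configuration();
  conf.set("my.app.retries", "3");
  int retries = conf.getInt("my.app.retries", 5);    // 3
  int missing = conf.getInt("my.app.not.set", 5);    // 5, the default
  // A non-numeric value such as "abc" would make getInt throw NumberFormatException.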
Get the value of the name property as a set of comma-delimited int values. If no such property exists, an empty array is returned.
@param name property name
@return property value interpreted as an array of comma-delimited int values
Set the value of the name property to an int.
@param name property name.
@param value int value of the property.
Get the value of the name property as a long. If no such property exists, the provided default value is returned; if the specified value is not a valid long, an error is thrown.
@param name property name.
@param defaultValue default value.
@throws NumberFormatException when the value is invalid
@return property value as a long, or defaultValue.
Get the value of the name property as a long, or as a human-readable size. If no such property exists, the provided default value is returned; if the specified value is not a valid long or human-readable format, an error is thrown. The following suffixes may be used (case insensitive): k (kilo), m (mega), g (giga), t (tera), p (peta), e (exa).
@param name property name.
@param defaultValue default value.
@throws NumberFormatException when the value is invalid
@return property value as a long, or defaultValue.
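A sketch of the human-readable form, assuming the getLongBytes accessor described above (the key is hypothetical):

  Configuration conf = new Configuration();
  conf.set("my.cache.size", "2g");                        // 2 gigabytes
  long bytes = conf.getLongBytes("my.cache.size", 1024L); // 2 * 1024 * 1024 * 1024
  long dflt  = conf.getLongBytes("my.other.size", 1024L); // default when unset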
Set the value of the name property to a long.
@param name property name.
@param value long value of the property.
Get the value of the name property as a float. If no such property exists, the provided default value is returned; if the specified value is not a valid float, an error is thrown.
@param name property name.
@param defaultValue default value.
@throws NumberFormatException when the value is invalid
@return property value as a float, or defaultValue.
Set the value of the name property to a float.
@param name property name.
@param value property value.
Get the value of the name property as a double. If no such property exists, the provided default value is returned; if the specified value is not a valid double, an error is thrown.
@param name property name.
@param defaultValue default value.
@throws NumberFormatException when the value is invalid
@return property value as a double, or defaultValue.
Set the value of the name property to a double.
@param name property name.
@param value property value.
Get the value of the name property as a boolean. If no such property is specified, or if the specified value is not a valid boolean, then defaultValue is returned.
@param name property name.
@param defaultValue default value.
@return property value as a boolean, or defaultValue.
Set the value of the name property to a boolean.
@param name property name.
@param value boolean value of the property.
Set the value of the name property to the given value. This is equivalent to set(<name>, value.toString()).
@param name property name
@param value new value

Set the value of name to the given time duration. This is equivalent to set(<name>, value + <time suffix>).
@param name Property name
@param value Time duration
@param unit Unit of time
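A sketch of the time-duration accessors; getTimeDuration is the matching getter, the key is hypothetical, and java.util.concurrent.TimeUnit is assumed to be imported:

  Configuration conf = new Configuration();
  conf.setTimeDuration("my.app.timeout", 30, TimeUnit.SECONDS);   // stored as "30s"
  long ms = conf.getTimeDuration("my.app.timeout", 10000L, TimeUnit.MILLISECONDS);  // 30000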
Get the value of the name property as a Pattern. If no such property is specified, or if the specified value is not a valid Pattern, then defaultValue is returned. Note that the returned value is NOT trimmed by this method.
@param name property name
@param defaultValue default value
@return property value as a compiled Pattern, or defaultValue
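For example (hypothetical key; java.util.regex.Pattern assumed imported):

  Configuration conf = new Configuration();
  conf.set("my.app.include.regex", "part-.*\\.gz");
  Pattern p = conf.getPattern("my.app.include.regex", Pattern.compile(".*"));
  boolean matched = p.matcher("part-00000.gz").matches();  // true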
Get the comma-delimited values of the name property as a collection of Strings. If no such property is specified, an empty collection is returned. This is an optimized version of {@link #getStrings(String)}.
@param name property name.
@return property value as a collection of Strings.
Get the comma-delimited values of the name property as an array of Strings. If no such property is specified, null is returned.
@param name property name.
@return property value as an array of Strings, or null.
Get the comma-delimited values of the name property as an array of Strings. If no such property is specified, the default value is returned.
@param name property name.
@param defaultValue The default value
@return property value as an array of Strings, or the default value.
Get the comma-delimited values of the name property as a collection of Strings, trimmed of leading and trailing whitespace. If no such property is specified, an empty Collection is returned.
@param name property name.
@return property value as a collection of Strings, or an empty Collection
Get the comma-delimited values of the name property as an array of Strings, trimmed of leading and trailing whitespace. If no such property is specified, an empty array is returned.
@param name property name.
@return property value as an array of trimmed Strings, or an empty array.
Get the comma-delimited values of the name property as an array of Strings, trimmed of leading and trailing whitespace. If no such property is specified, the default value is returned.
@param name property name.
@param defaultValue The default value
@return property value as an array of trimmed Strings, or the default value.
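For example, whitespace around the comma-separated items is dropped (hypothetical key):

  Configuration conf = new Configuration();
  conf.set("my.app.hosts", " host1 , host2 ,host3 ");
  String[] hosts = conf.getTrimmedStrings("my.app.hosts");          // {"host1", "host2", "host3"}
  String[] dflt  = conf.getTrimmedStrings("my.app.not.set", "h0");  // {"h0"}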
Get the socket address as an InetSocketAddress. If hostProperty is null, addressProperty will be used. This is useful for cases where we want to differentiate between the host bind address and the address clients should use to establish a connection.
@param hostProperty bind host property name.
@param addressProperty address property name.
@param defaultAddressValue the default value
@param defaultPort the default port
@return InetSocketAddress
Get the socket address for the name property as an InetSocketAddress.
@param name property name.
@param defaultAddress the default value
@param defaultPort the default port
@return InetSocketAddress
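A sketch of the address accessors, with hypothetical property names (java.net.InetSocketAddress assumed imported):

  Configuration conf = new Configuration();
  conf.set("my.service.address", "node1.example.com:8020");
  InetSocketAddress addr =
      conf.getSocketAddr("my.service.address", "0.0.0.0", 8020);
  // Bind-host variant: prefer my.service.bind-host when it is set,
  // otherwise fall back to my.service.address.
  InetSocketAddress bind =
      conf.getSocketAddr("my.service.bind-host", "my.service.address", "0.0.0.0", 8020);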
Set the socket address for the name property as a host:port string.
Set the socket address a client can use to connect to the service, as a host:port string. The wildcard address is replaced with the local host's address. If the host and address properties are configured, the host component of the address will be combined with the port component of the addr to generate the address. This allows optional control over which host name is used in multi-home bind-host cases where a host can have multiple names.
@param hostProperty the bind-host configuration name
@param addressProperty the service address configuration name
@param defaultAddressValue the service default address configuration value
@param addr InetSocketAddress of the service listener
@return InetSocketAddress for clients to connect
Set the socket address a client can use to connect for the name property as a host:port string. The wildcard address is replaced with the local host's address.
@param name property name.
@param addr InetSocketAddress of a listener to store in the given property
@return InetSocketAddress for clients to connect
Get the value of the name property as an array of Class. The value of the property specifies a list of comma-separated class names. If no such property is specified, then defaultValue is returned.
@param name the property name.
@param defaultValue default value.
@return property value as a Class[], or defaultValue.
Get the value of the name property as a Class. If no such property is specified, then defaultValue is returned.
@param name the conf key name.
@param defaultValue default value.
@return property value as a Class, or defaultValue.
Get the value of the name property as a Class implementing the interface specified by xface. If no such property is specified, then defaultValue is returned. An exception is thrown if the returned class does not implement the named interface.
@param name the conf key name.
@param defaultValue default value.
@param xface the interface implemented by the named class.
@return property value as a Class, or defaultValue.
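A sketch, assuming a user-defined interface and implementation (MyCodec and FastCodec are hypothetical names):

  // Assume: public interface MyCodec { ... } and public class FastCodec implements MyCodec { ... }
  Configuration conf = new Configuration();
  conf.setClass("my.app.codec.impl", FastCodec.class, MyCodec.class);
  Class<? extends MyCodec> cls =
      conf.getClass("my.app.codec.impl", FastCodec.class, MyCodec.class);
  MyCodec codec = org.apache.hadoop.util.ReflectionUtils.newInstance(cls, conf);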
Get a List of objects implementing the interface specified by xface. An exception is thrown if any of the classes does not exist, or if it does not implement the named interface.
@param name the property name.
@param xface the interface implemented by the classes named by name.
@return a List of objects implementing xface.
Set the value of the name property to the name of theClass implementing the given interface xface. An exception is thrown if theClass does not implement the interface xface.
@param name property name.
@param theClass property value.
@param xface the interface implemented by the named class.
{ "property": { "key" : "key1", "value" : "value1", "isFinal" : "key1.isFinal", "resource" : "key1.resource" } }
{ "properties" : [ { key : "key1", value : "value1", isFinal : "key1.isFinal", resource : "key1.resource" }, { key : "key2", value : "value2", isFinal : "ke2.isFinal", resource : "key2.resource" } ] }
@param config the configuration @param propertyName property name @param out the Writer to write to @throws IOException @throws IllegalArgumentException when property name is not empty and the property is not found in configuration]]>
@param config the configuration @param out the Writer to write to @throws IOException]]>
false to turn it off.
Configurations are specified by resources. A resource contains a set of name/value pairs as XML data. Each resource is named by either a String or by a {@link Path}. If named by a String, then the classpath is examined for a file with that name. If named by a Path, then the local filesystem is examined directly, without referring to the classpath.
Unless explicitly turned off, Hadoop by default specifies two resources, loaded in order from the classpath: core-default.xml (read-only defaults) and core-site.xml (site-specific configuration).
Configuration parameters may be declared final. Once a resource declares a value final, no subsequently-loaded resource can alter that value. For example, one might define a final parameter with:
<property>
<name>dfs.hosts.include</name>
<value>/etc/hadoop/conf/hosts.include</value>
<final>true</final>
</property>
Administrators typically define parameters as final in core-site.xml for values that user applications may not alter.
Value strings are first processed for variable expansion. The available properties are: other properties defined in this Configuration; environment variables, referenced with the env. prefix; and Java System properties.
For example, if a configuration resource contains the following property definitions:
<property>
<name>basedir</name>
<value>/user/${user.name}</value>
</property>
<property>
<name>tempdir</name>
<value>${basedir}/tmp</value>
</property>
<property>
<name>otherdir</name>
<value>${env.BASE_DIR}/other</value>
</property>
When conf.get("tempdir") is called, then ${basedir} will be resolved to another property in this Configuration, while ${user.name} would then ordinarily be resolved to the value of the System property with that name.
When conf.get("otherdir") is called, then ${env.BASE_DIR} will be resolved to the value of the ${BASE_DIR} environment variable. It supports ${env.NAME:-default} and ${env.NAME-default} notations. The former is resolved to "default" if ${NAME} environment variable is undefined or its value is empty. The latter behaves the same way only if ${NAME} is undefined.
By default, a warning is logged for any deprecated configuration parameter; these warnings can be suppressed by configuring log4j.logger.org.apache.hadoop.conf.Configuration.deprecation in the log4j.properties file.
Optionally, related properties can be tagged together by using tag attributes. System tags are defined by the hadoop.tags.system property. Users can define their own custom tags in the hadoop.tags.custom property.
For example, we can tag an existing property as:
<property>
<name>dfs.replication</name>
<value>3</value>
<tag>HDFS,REQUIRED</tag>
</property>
<property>
<name>dfs.data.transfer.protection</name>
<value>3</value>
<tag>HDFS,SECURITY</tag>
</property>
Properties marked with tags can be retrieved with conf.getAllPropertiesByTag("HDFS") or conf.getAllPropertiesByTags(Arrays.asList("YARN","SECURITY")).
KeyProvider implementations must be thread safe.
Thrown when the given uri is not supported.
BlockLocation(offset: 0, length: 3 * BLOCK_SIZE, hosts: {"host1:9866", "host2:9866", "host3:9866", "host4:9866", "host5:9866"})
Please refer to {@link FileSystem#getFileBlockLocations(FileStatus, long, long)} or {@link FileContext#getFileBlockLocations(Path, long, long)} for more examples.
In the case of an exception, the state of the buffer (the contents of the buffer, the {@code buf.position()}, the {@code buf.limit()}, etc.) is undefined, and callers should be prepared to recover from this eventuality.
Callers should use {@link StreamCapabilities#hasCapability(String)} with {@link StreamCapabilities#PREADBYTEBUFFER} to check if the underlying stream supports this interface, otherwise they might get a {@link UnsupportedOperationException}.
Implementations should treat 0-length requests as legitimate, and must not signal an error upon their receipt.
This does not change the current offset of a file, and is thread-safe.
@param position position within file
@param buf the ByteBuffer to receive the results of the read operation.
@return the number of bytes read, possibly zero, or -1 if the end of stream is reached
@throws IOException if there is some error performing the read
In the case of an exception, the state of the buffer (the contents of the buffer, the {@code buf.position()}, the {@code buf.limit()}, etc.) is undefined, and callers should be prepared to recover from this eventuality.
Callers should use {@link StreamCapabilities#hasCapability(String)} with {@link StreamCapabilities#READBYTEBUFFER} to check if the underlying stream supports this interface, otherwise they might get a {@link UnsupportedOperationException}.
Implementations should treat 0-length requests as legitimate, and must not signal an error upon their receipt.
@param buf the ByteBuffer to receive the results of the read operation.
@return the number of bytes read, possibly zero, or -1 if the end of stream is reached
@throws IOException if there is some error performing the read
Use the CreateFlag as follows, combining flags in an EnumSet, for example EnumSet.of(CreateFlag.CREATE, CreateFlag.APPEND):
Thrown when the file system for absOrFqPath could not be instantiated.
@throws InvalidPathException If path f is not valid
@throws FileAlreadyExistsException If file f already exists
@throws FileNotFoundException If parent of f does not exist and createParent is false
@throws ParentNotDirectoryException If parent of f is not a directory.
@throws UnsupportedFileSystemException If file system for f is not supported
@throws IOException If an I/O error occurred
Exceptions applicable to file systems accessed over RPC:
@throws RpcClientException If an exception occurred in the RPC client
@throws RpcServerException If an exception occurred in the RPC server
@throws UnexpectedServerException If server implementation throws undeclared exception to RPC server
RuntimeExceptions:
@throws InvalidPathException If path f is not valid
@throws FileNotFoundException If parent of dir does not exist and createParent is false
@throws ParentNotDirectoryException If parent of dir is not a directory
@throws UnsupportedFileSystemException If file system for dir is not supported
@throws IOException If an I/O error occurred
Exceptions applicable to file systems accessed over RPC:
@throws RpcClientException If an exception occurred in the RPC client
@throws UnexpectedServerException If server implementation throws undeclared exception to RPC server
RuntimeExceptions:
@throws InvalidPathException If path dir is not valid
@throws UnsupportedFileSystemException If file system for f is not supported
@throws IOException If an I/O error occurred
Exceptions applicable to file systems accessed over RPC:
@throws RpcClientException If an exception occurred in the RPC client
@throws RpcServerException If an exception occurred in the RPC server
@throws UnexpectedServerException If server implementation throws undeclared exception to RPC server
RuntimeExceptions:
@throws InvalidPathException If path f is invalid

@throws UnsupportedFileSystemException If file system for f is not supported
@throws IOException If an I/O error occurred
Exceptions applicable to file systems accessed over RPC:
@throws RpcClientException If an exception occurred in the RPC client
@throws RpcServerException If an exception occurred in the RPC server
@throws UnexpectedServerException If server implementation throws undeclared exception to RPC server

@throws UnsupportedFileSystemException If file system for f is not supported
@throws IOException If an I/O error occurred
Exceptions applicable to file systems accessed over RPC:
@throws RpcClientException If an exception occurred in the RPC client
@throws RpcServerException If an exception occurred in the RPC server
@throws UnexpectedServerException If server implementation throws undeclared exception to RPC server
@return true if the file has been truncated to the desired newLength and is immediately available to be reused for write operations such as append, or false if a background process of adjusting the length of the last block has been started, and clients should wait for it to complete before proceeding with further file updates.
@throws AccessControlException If access is denied
@throws FileNotFoundException If file f does not exist
@throws UnsupportedFileSystemException If file system for f is not supported
@throws IOException If an I/O error occurred
Exceptions applicable to file systems accessed over RPC:
@throws RpcClientException If an exception occurred in the RPC client
@throws RpcServerException If an exception occurred in the RPC server
@throws UnexpectedServerException If server implementation throws undeclared exception to RPC server
If the OVERWRITE option is not passed as an argument, rename fails if the dst already exists.
If the OVERWRITE option is passed as an argument, rename overwrites the dst if it is a file or an empty directory. Rename fails if dst is a non-empty directory.
Note that atomicity of rename is dependent on the file system implementation. Please refer to the file system documentation for details.
@param src path to be renamed
@param dst new path after rename
@throws AccessControlException If access is denied
@throws FileAlreadyExistsException If dst already exists and options has {@link Options.Rename#OVERWRITE} option false.
@throws FileNotFoundException If src does not exist
@throws ParentNotDirectoryException If parent of dst is not a directory
@throws UnsupportedFileSystemException If file system for src and dst is not supported
@throws IOException If an I/O error occurred
Exceptions applicable to file systems accessed over RPC:
@throws RpcClientException If an exception occurred in the RPC client
@throws RpcServerException If an exception occurred in the RPC server
@throws UnexpectedServerException If server implementation throws undeclared exception to RPC server
@throws UnsupportedFileSystemException If file system for f is not supported
@throws IOException If an I/O error occurred
Exceptions applicable to file systems accessed over RPC:
@throws RpcClientException If an exception occurred in the RPC client
@throws RpcServerException If an exception occurred in the RPC server
@throws UnexpectedServerException If server implementation throws undeclared exception to RPC server

@throws UnsupportedFileSystemException If file system for f is not supported
@throws IOException If an I/O error occurred
Exceptions applicable to file systems accessed over RPC:
@throws RpcClientException If an exception occurred in the RPC client
@throws RpcServerException If an exception occurred in the RPC server
@throws UnexpectedServerException If server implementation throws undeclared exception to RPC server
RuntimeExceptions:
@throws HadoopIllegalArgumentException If username or groupname is invalid.
@throws UnsupportedFileSystemException If file system for f is not supported
@throws IOException If an I/O error occurred
Exceptions applicable to file systems accessed over RPC:
@throws RpcClientException If an exception occurred in the RPC client
@throws RpcServerException If an exception occurred in the RPC server
@throws UnexpectedServerException If server implementation throws undeclared exception to RPC server

@throws UnsupportedFileSystemException If file system for f is not supported
@throws IOException If an I/O error occurred
Exceptions applicable to file systems accessed over RPC:
@throws RpcClientException If an exception occurred in the RPC client
@throws RpcServerException If an exception occurred in the RPC server
@throws UnexpectedServerException If server implementation throws undeclared exception to RPC server

@throws UnsupportedFileSystemException If file system for f is not supported
@throws IOException If an I/O error occurred
Exceptions applicable to file systems accessed over RPC:
@throws RpcClientException If an exception occurred in the RPC client
@throws RpcServerException If an exception occurred in the RPC server
@throws UnexpectedServerException If server implementation throws undeclared exception to RPC server

@throws UnsupportedFileSystemException If file system for f is not supported
@throws IOException If an I/O error occurred

@throws UnsupportedFileSystemException If file system for f is not supported
@throws IOException If the given path does not refer to a symlink or an I/O error occurred

@throws UnsupportedFileSystemException If file system for f is not supported
@throws IOException If an I/O error occurred
Exceptions applicable to file systems accessed over RPC:
@throws RpcClientException If an exception occurred in the RPC client
@throws RpcServerException If an exception occurred in the RPC server
@throws UnexpectedServerException If server implementation throws undeclared exception to RPC server
@throws FileAlreadyExistsException If file link already exists
@throws FileNotFoundException If target does not exist
@throws ParentNotDirectoryException If parent of link is not a directory.
@throws UnsupportedFileSystemException If file system for target or link is not supported
@throws IOException If an I/O error occurred
@throws UnsupportedFileSystemException If file system for f is not supported
@throws IOException If an I/O error occurred
Exceptions applicable to file systems accessed over RPC:
@throws RpcClientException If an exception occurred in the RPC client
@throws RpcServerException If an exception occurred in the RPC server
@throws UnexpectedServerException If server implementation throws undeclared exception to RPC server

@throws UnsupportedFileSystemException If file system for f is not supported
@throws IOException If an I/O error occurred
Exceptions applicable to file systems accessed over RPC:
@throws RpcClientException If an exception occurred in the RPC client
@throws RpcServerException If an exception occurred in the RPC server
@throws UnexpectedServerException If server implementation throws undeclared exception to RPC server
Hadoop also supports working-directory-relative names, which are paths relative to the current working directory (similar to Unix). The working directory can be in a different file system than the default FS.
Thus, Hadoop path names can be specified as one of the following:
The file system related server-side defaults are:
The default implementation throws UnsupportedOperationException.
@return the protocol scheme for this FileSystem.
@throws UnsupportedOperationException if the operation is unsupported (default).
BlockLocation( { "localhost:9866" }, { "localhost" }, 0, file.getLen())
In HDFS, if the file is three-replicated, the returned array contains elements like:
  BlockLocation(offset: 0, length: BLOCK_SIZE, hosts: {"host1:9866", "host2:9866", "host3:9866"})
  BlockLocation(offset: BLOCK_SIZE, length: BLOCK_SIZE, hosts: {"host2:9866", "host3:9866", "host4:9866"})
And if a file is erasure-coded, the returned BlockLocations are logical block groups. Suppose we have an RS_3_2 coded file (3 data units and 2 parity units).
1. If the file size is less than one stripe size, say 2 * CELL_SIZE, then there will be one BlockLocation returned, with 0 offset, actual file size and 4 hosts (2 data blocks and 2 parity blocks) hosting the actual blocks.
3. If the file size is less than one group size but greater than one stripe size, then there will be one BlockLocation returned, with 0 offset, actual file size with 5 hosts (3 data blocks and 2 parity blocks) hosting the actual blocks.
4. If the file size is greater than one group size, 3 * BLOCK_SIZE + 123 for example, then the result will be like:
  BlockLocation(offset: 0, length: 3 * BLOCK_SIZE, hosts: {"host1:9866", "host2:9866", "host3:9866", "host4:9866", "host5:9866"})
  BlockLocation(offset: 3 * BLOCK_SIZE, length: 123, hosts: {"host1:9866", "host4:9866", "host5:9866"})
@param file FileStatus to get data from
@param start offset into the given file
@param len length for which to get locations for
@throws IOException IO failure
If OVERWRITE option is not passed as an argument, rename fails if the dst already exists.
If OVERWRITE option is passed as an argument, rename overwrites the dst if it is a file or an empty directory. Rename fails if dst is a non-empty directory.
Note that atomicity of rename is dependent on the file system implementation. Please refer to the file system documentation for details. This default implementation is non-atomic.
This method is deprecated since it is a temporary method added to support the transition from FileSystem to FileContext for user applications.
@param src path to be renamed
@param dst new path after rename
@throws FileNotFoundException src path does not exist, or the parent path of dst does not exist.
@throws FileAlreadyExistsException dest path exists and is a file
@throws ParentNotDirectoryException if the parent path of dest is not a directory
@throws IOException on failure
@return true if the file has been truncated to the desired newLength and is immediately available to be reused for write operations such as append, or false if a background process of adjusting the length of the last block has been started, and clients should wait for it to complete before proceeding with further file updates.
@throws IOException IO failure
@throws UnsupportedOperationException if the operation is unsupported (default).
Will not return null. Expect IOException upon access error.
@param f given path
@return the statuses of the files/directories in the given path
@throws FileNotFoundException when the path does not exist
@throws IOException see specific implementation
A filename pattern is composed of regular characters and special pattern matching characters, which are:
The local implementation is {@link LocalFileSystem} and the distributed implementation is DistributedFileSystem. There are other implementations for object stores and (outside the Apache Hadoop codebase) third-party filesystems.
Notes
FilterFileSystem itself simply overrides all methods of FileSystem with versions that pass all requests to the contained file system. Subclasses of FilterFileSystem may further override some of these methods and may also provide additional methods and fields.
file
Returns true if and only if pathname should be included.
ftp
Consult the filesystem specification document for the requirements of an implementation of this interface.
This is designed to be affordable to use in log statements.
@param source source of statistics - may be null.
@return an object whose toString() operation returns the current values.

This is for use in log statements where the cost of creating the entry is low; it is affordable to use in log statements.
@param statistics statistics to stringify - may be null.
@return an object whose toString() operation returns the current values.
It is annotated for correct serialization with jackson2.
The instance can be serialized, and its {@code toString()} method lists all the values.
@param statistics statistics
@return a snapshot of the current values.
If a statistic has 0 samples then it is considered to be empty.
All 'empty' statistics are equivalent, independent of the sum value.
For non-empty statistics, sum and sample values must match for equality.
It is serializable and annotated for correct serializations with jackson2.
Thread safety. The operations to add/copy sample data, are thread safe.
So is the {@link #mean()} method. This ensures that when used to aggregate statistics, the aggregate value and sample count are set and evaluated consistently.
Other methods are marked as synchronized because Findbugs overreacts to the idea that some operations to update sum and sample count are synchronized, but that things like equals are not.
Fencing is configured by the operator as an ordered list of methods to attempt. Each method will be tried in turn, and the next in the list will only be attempted if the previous one fails. See {@link NodeFencer} for more information.
If an implementation also implements {@link Configurable} then its setConf method will be called upon instantiation.
Compared with ObjectWritable, this class is much more effective, because ObjectWritable will append the class declaration as a String into the output file in every Key-Value pair.
Generic Writable implements the {@link Configurable} interface, so that it will be configured by the framework. The configuration is passed to the wrapped objects implementing the {@link Configurable} interface before deserialization.
How to use it: the subclass implements getTypes(), which defines the classes that will be wrapped in GenericObject in the application:
  public class GenericObject extends GenericWritable {
    private static Class[] CLASSES = {
      ClassType1.class,
      ClassType2.class,
      ClassType3.class,
    };
    protected Class[] getTypes() {
      return CLASSES;
    }
  }
Attention: the classes defined in the getTypes() method must implement the Writable interface.
@since Nov 8, 2006
A map is a directory containing two files: the data file, containing all keys and values in the map, and a smaller index file, containing a fraction of the keys. The fraction is determined by {@link Writer#getIndexInterval()}.
The index file is read entirely into memory. Thus key implementations should try to keep themselves small.
Map files are created by adding entries in-order. To maintain a large database, perform updates by copying the previous version of a database and merging in a sorted change list, to create a new version of the database in a new file. Sorting large change lists can be done with {@link SequenceFile.Sorter}.
SequenceFile provides {@link SequenceFile.Writer}, {@link SequenceFile.Reader} and {@link Sorter} classes for writing, reading and sorting respectively.
There are three SequenceFile Writers based on the {@link CompressionType} used to compress key/value pairs:
1. Writer: Uncompressed records.
2. RecordCompressWriter: Record-compressed files, only compress values.
3. BlockCompressWriter: Block-compressed files, both keys and values are collected in 'blocks' separately and compressed. The size of the 'block' is configurable.
The actual compression algorithm used to compress key and/or values can be specified by using the appropriate {@link CompressionCodec}.
The recommended way is to use the static createWriter methods provided by SequenceFile to choose the preferred format.
The {@link SequenceFile.Reader} acts as the bridge and can read any of the above SequenceFile formats.
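A minimal write/read sketch using the createWriter options API (the path and key/value types are illustrative; assumes the options-based API of recent Hadoop releases):

  Configuration conf = new Configuration();
  Path file = new Path("/tmp/example.seq");
  try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
      SequenceFile.Writer.file(file),
      SequenceFile.Writer.keyClass(LongWritable.class),
      SequenceFile.Writer.valueClass(Text.class))) {
    writer.append(new LongWritable(1L), new Text("first record"));
  }
  try (SequenceFile.Reader reader =
      new SequenceFile.Reader(conf, SequenceFile.Reader.file(file))) {
    LongWritable key = new LongWritable();
    Text value = new Text();
    while (reader.next(key, value)) {
      System.out.println(key + "\t" + value);
    }
  }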
Essentially there are 3 different formats for SequenceFiles depending on the CompressionType specified. All of them share a common header described below.
The header includes the name of the CompressionCodec class which is used for compression of keys and/or values (if compression is enabled).
A sync-marker is written every few 100 kilobytes or so.
A sync-marker is written every few 100 kilobytes or so.
The compressed blocks of key lengths and value lengths consist of the actual lengths of individual keys/values encoded in ZeroCompressedInteger format.
@see CompressionCodec

Finds the first occurrence of the search string in the backing buffer, starting at position start. The starting position is measured in bytes and the return value is in terms of byte position in the buffer. The backing buffer is not converted to a string for this operation.
@return byte position of the first occurrence of the search string in the UTF-8 buffer or -1 if not found
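For example:

  Text t = new Text("hadoop");
  int i = t.find("do");     // 2, the byte offset where "do" starts
  int j = t.find("o", 4);   // 4, the second 'o', since the search starts at byte position 4
  int k = t.find("xyz");    // -1, not found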
new byte[0]).
Also includes utilities for serializing/deserializing a string, coding/decoding a string, checking if a byte array contains valid UTF-8 code, and calculating the length of an encoded string.
Serialize the fields of this object to out.
@param out DataOutput to serialize this object into.
@throws IOException if an I/O error occurs

Deserialize the fields of this object from in. For efficiency, implementations should attempt to re-use storage in the existing object where possible.
@param in DataInput to deserialize this object from.
@throws IOException if an I/O error occurs
Any key or value type in the Hadoop Map-Reduce framework implements this interface.
Implementations typically implement a static read(DataInput) method which constructs a new instance, calls {@link #readFields(DataInput)} and returns the instance.
Example:
  public class MyWritable implements Writable {
    // Some data
    private int counter;
    private long timestamp;

    // Default constructor to allow (de)serialization
    MyWritable() { }

    public void write(DataOutput out) throws IOException {
      out.writeInt(counter);
      out.writeLong(timestamp);
    }

    public void readFields(DataInput in) throws IOException {
      counter = in.readInt();
      timestamp = in.readLong();
    }

    public static MyWritable read(DataInput in) throws IOException {
      MyWritable w = new MyWritable();
      w.readFields(in);
      return w;
    }
  }
WritableComparables can be compared to each other, typically via Comparators. Any type which is to be used as a key in the Hadoop Map-Reduce framework should implement this interface.
Note that hashCode() is frequently used in Hadoop to partition keys. It's important that your implementation of hashCode() returns the same result across different instances of the JVM. Note also that the default hashCode() implementation in Object does not satisfy this property.
Example:
  public class MyWritableComparable implements WritableComparable<MyWritableComparable> {
    // Some data
    private int counter;
    private long timestamp;

    public void write(DataOutput out) throws IOException {
      out.writeInt(counter);
      out.writeLong(timestamp);
    }

    public void readFields(DataInput in) throws IOException {
      counter = in.readInt();
      timestamp = in.readLong();
    }

    public int compareTo(MyWritableComparable o) {
      int thisValue = this.counter;
      int thatValue = o.counter;
      return (thisValue < thatValue ? -1 : (thisValue == thatValue ? 0 : 1));
    }

    public int hashCode() {
      final int prime = 31;
      int result = 1;
      result = prime * result + counter;
      result = prime * result + (int) (timestamp ^ (timestamp >>> 32));
      return result;
    }
  }
One may optimize compare-intensive operations by overriding {@link #compare(byte[],int,int,byte[],int,int)}. Static utility methods are provided to assist in optimized implementations of this method.
Get a Compressor for the given CompressionCodec from the pool, or a new one.
@param codec the CompressionCodec for which to get the Compressor
@param conf the Configuration object which contains confs for creating or reinit the compressor
@return Compressor for the given CompressionCodec from the pool or a new one

Get a Decompressor for the given CompressionCodec from the pool, or a new one.
@param codec the CompressionCodec for which to get the Decompressor
@return Decompressor for the given CompressionCodec from the pool or a new one
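A usage sketch; compressors must always be handed back to the pool (the output path is illustrative):

  Configuration conf = new Configuration();
  GzipCodec codec = new GzipCodec();
  codec.setConf(conf);                       // GzipCodec needs a Configuration
  Compressor compressor = CodecPool.getCompressor(codec, conf);
  try (OutputStream out =
      codec.createOutputStream(new FileOutputStream("/tmp/data.gz"), compressor)) {
    out.write("hello".getBytes(StandardCharsets.UTF_8));
  } finally {
    CodecPool.returnCompressor(compressor);  // return the compressor to the pool
  }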
The codec alias is the short class name (without the package name). If the short class name ends with 'Codec', then there are two aliases for the codec: the complete short class name and the short class name without the 'Codec' ending. For example, for the 'GzipCodec' codec class name the aliases are 'gzip' and 'gzipcodec'.
@param codecName the canonical class name of the codec
@return the codec object

The codec alias is the short class name (without the package name). If the short class name ends with 'Codec', then there are two aliases for the codec: the complete short class name and the short class name without the 'Codec' ending. For example, for the 'GzipCodec' codec class name the aliases are 'gzip' and 'gzipcodec'.
@param codecName the canonical class name of the codec
@return the codec class
The data in b[] must remain unmodified until the caller is explicitly notified--via {@link #needsInput()}--that the buffer may be safely modified. With this requirement, an extra buffer-copy can be avoided.
@param b Input data
@param off Start offset
@param len Length
Returns true if the input data buffer is empty and {@link #setInput(byte[], int, int)} should be called in order to provide more input.

Returns true if a preset dictionary is needed for decompression.

Indicates a concatenated data stream when finished() returns true and {@link #getRemaining()} returns a positive value. finished() will be reset with the {@link #reset()} method.
@return true if the end of the decompressed data output stream has been reached.

When finished() returns true and getRemaining() returns a zero value, the end of the data stream has been reached and it is not a concatenated data stream.
@return The number of bytes remaining in the compressed data buffer.

The flag is reset to false when reset() is called.
The behavior of TFile can be customized by the following variables through Configuration:
Suggestions on performance optimization.
To add a new serialization framework write an implementation of {@link org.apache.hadoop.io.serializer.Serialization} and add its name to the "io.serializations" property.
Use {@link org.apache.hadoop.io.serializer.avro.AvroSpecificSerialization} for serialization of classes generated by Avro's 'specific' compiler.
Use {@link org.apache.hadoop.io.serializer.avro.AvroReflectSerialization} for other classes. {@link org.apache.hadoop.io.serializer.avro.AvroReflectSerialization} works for any class which is either in the package list configured via {@link org.apache.hadoop.io.serializer.avro.AvroReflectSerialization#AVRO_REFLECT_PACKAGES} or implements the {@link org.apache.hadoop.io.serializer.avro.AvroReflectSerializable} interface.
All metrics will be logged to a file in the current interval's directory in a file named <hostname>.log, where <hostname> is the name of the host on which the metrics logging process is running. The base path is set by the <prefix>.sink.<instance>.basepath property. The time zone used to create the current interval's directory name is GMT. If the basepath property isn't specified, it will default to "/tmp", which is the temp directory on whatever default file system is configured for the cluster.
The <prefix>.sink.<instance>.ignore-error property controls whether an exception is thrown when an error is encountered writing a log file. The default value is true. When set to false, file errors are quietly swallowed.
The roll-interval property sets the amount of time before rolling the directory. The default value is 1 hour. The roll interval may not be less than 1 minute. The property's value should be given as number unit, where number is an integer value, and unit is a valid unit. Valid units are minute, hour, and day. The units are case insensitive and may be abbreviated or plural. If no units are specified, hours are assumed. For example, "2", "2h", "2 hour", and "2 hours" are all valid ways to specify two hours.
The roll-offset-interval-millis property sets the upper bound on a random time interval (in milliseconds) that is used to delay before the initial roll. All subsequent rolls will happen an integer number of roll intervals after the initial roll, hence retaining the original offset. The purpose of this property is to insert some variance in the roll times so that large clusters using this sink on every node don't cause a performance impact on HDFS by rolling simultaneously. The default value is 30000 (30s). When writing to HDFS, as a rule of thumb, the roll offset in millis should be no less than the number of sink instances times 5.
The primary use of this class is for logging to HDFS. As it uses {@link org.apache.hadoop.fs.FileSystem} to access the target file system, however, it can be used to write to the local file system, Amazon S3, or any other supported file system. The base path for the sink will determine the file system used. An unqualified path will write to the default file system set by the configuration.
Not all file systems support the ability to append to files. In file systems without the ability to append to files, only one writer can write to a file at a time. To allow for concurrent writes from multiple daemons on a single host, the source property is used to set unique headers for the log files. The property should be set to the name of the source daemon, e.g. namenode. The value of the source property should typically be the same as the property's prefix. If this property is not set, the source is taken to be unknown.
Instead of appending to an existing file, by default the sink will create a new file with a suffix of ".<n>", where n is the next lowest integer that isn't already used in a file name, similar to the Hadoop daemon logs. NOTE: the file with the highest sequence number is the newest file, unlike the Hadoop daemon logs.
For file systems that allow append, the sink supports appending to the existing file instead. If the allow-append property is set to true, the sink will instead append to the existing file on file systems that support appends. By default, the allow-append property is false.
Note that when writing to HDFS with allow-append set to true, there is a minimum acceptable number of data nodes. If the number of data nodes drops below that minimum, the append will succeed, but reading the data will fail with an IOException in the DataStreamer class. The minimum number of data nodes required for a successful append is generally 2 or 3.
Note also that when writing to HDFS, the file size information is not updated until the file is closed (at the end of the interval) even though the data is being written successfully. This is a known HDFS limitation that exists because of the performance cost of updating the metadata. See HDFS-5478.
When using this sink in a secure (Kerberos) environment, two additional properties must be set: keytab-key and principal-key. keytab-key should contain the key by which the keytab file can be found in the configuration, for example, yarn.nodemanager.keytab. principal-key should contain the key by which the principal can be found in the configuration, for example, yarn.nodemanager.principal.
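A hypothetical hadoop-metrics2.properties fragment tying the options above together (the prefix, instance name, paths and values are illustrative only):

  namenode.sink.rolling.class=org.apache.hadoop.metrics2.sink.RollingFileSystemSink
  namenode.sink.rolling.basepath=hdfs://nameservice/metrics
  namenode.sink.rolling.roll-interval=1h
  namenode.sink.rolling.roll-offset-interval-millis=30000
  namenode.sink.rolling.source=namenode
  namenode.sink.rolling.allow-append=true
  namenode.sink.rolling.ignore-error=false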
*.sink.statsd.class=org.apache.hadoop.metrics2.sink.StatsDSink
[prefix].sink.statsd.server.host=
[prefix].sink.statsd.server.port=
[prefix].sink.statsd.skip.hostname=true|false (optional)
[prefix].sink.statsd.service.name=NameNode (name you want for service)
This class does not extend the Configured base class, and should not be changed to do so, as it causes problems for subclasses. The constructor of Configured calls the {@link #setConf(Configuration)} method, which will call into the subclasses before they have been fully constructed.
The mapping work is delegated to an inner RawScriptBasedMapping that performs the work: reading the configuration parameters, executing any defined script, handling errors and such like. The outer class extends {@link CachedDNSToSwitchMapping} to cache the delegated queries.
This DNS mapper's {@link #isSingleSwitch()} predicate returns true if and only if a script is defined.
This class uses the configuration parameter {@code net.topology.table.file.name} to locate the mapping file.
Calls to {@link #resolve(List)} will look up the address as defined in the mapping file. If no entry corresponding to the address is found, the value {@code /default-rack} is returned.
An instance of the default {@link DelegationTokenAuthenticator} will be used.
If null, the default one will be used.
If null, the default one will be used.
@param connConfigurator a connection configurator.

Returns TRUE if the token is transmitted in the URL query string, FALSE if the delegation token is transmitted using the {@link DelegationTokenAuthenticator#DELEGATION_TOKEN_HEADER} HTTP header.
FALSE if the delegation token is transmitted using the {@link DelegationTokenAuthenticator#DELEGATION_TOKEN_HEADER} HTTP header.
If the doAs parameter is not NULL, the request will be done on behalf of the specified doAs user.
@param url the URL to connect to. Only HTTP/S URLs are supported.
@param token the authentication token being used for the user.
@param doAs user to do the request on behalf of; if NULL the request is done as self.
@return an authenticated {@link HttpURLConnection}.
@throws IOException if an IO error occurred.
@throws AuthenticationException if an authentication exception occurred.
The authentication mechanisms supported by default are Hadoop Simple authentication (also known as pseudo authentication) and Kerberos SPNEGO authentication.
Additional authentication mechanisms can be supported via {@link DelegationTokenAuthenticator} implementations.
The default {@link DelegationTokenAuthenticator} is the {@link KerberosDelegationTokenAuthenticator} class which supports automatic fallback from Kerberos SPNEGO to Hadoop Simple authentication via the {@link PseudoDelegationTokenAuthenticator} class.
AuthenticatedURL instances are not thread-safe.
It falls back to the {@link PseudoDelegationTokenAuthenticator} if the HTTP endpoint does not trigger a SPNEGO authentication.
This mimics the model of Hadoop Simple authentication trusting the {@link UserGroupInformation#getCurrentUser()} value.
The service state is checked before the operation begins. This process is not thread safe.
@param service a service or null
This permits implementations to change the configuration before the init operation. As the ServiceLauncher only creates an instance of the base {@link Configuration} class, it is recommended to instantiate any subclass (such as YarnConfiguration) that injects new resources.
@param config the initial configuration built up by the service launcher.
@param args list of arguments passed to the command line after any launcher-specific commands have been stripped.
@return the configuration to init the service with. Recommended: pass down the config parameter with any changes
@throws Exception any problem
If an exception is raised, the policy is:
Approximate HTTP equivalent: {@code 505: Version Not Supported}
Many of the exit codes are designed to resemble HTTP error codes, squashed into a single byte. For example 44, "not found", is the equivalent of 404. The various 2XX HTTP error codes aren't followed; the Unix standard of "0" for success is used.
0-10: general command issues
30-39: equivalent to the 3XX responses, where those responses are considered errors by the application.
40-49: client-side/CLI/config problems
50-59: service-side problems.
60+: application specific error codes
If the last argument is a throwable, it becomes the cause of the exception. It will also be used as a parameter for the format.
@param exitCode exit code
@param format format for message to use in exception
@param args list of arguments
Clients and/or applications can use the provided Progressable to explicitly report progress to the Hadoop framework. This is especially important for operations which take a significant amount of time since, in lieu of the reported progress, the framework has to assume that an error has occurred and time-out the operation.
Returns the Class of the given object.
".sh"
otherwise.
@param parent File parent directory
@param basename String script file basename
@return File referencing the script in the directory]]>
".sh"
otherwise.
@param basename String script file basename
@return String script file name]]>
Iterates through all currently running Shell processes and destroys them one by one. This method is thread safe.
@deprecated use one of the exception-raising getter methods, specifically {@link #getWinUtilsPath()} or {@link #getWinUtilsFile()}
A base class for running a Unix command such as du or df. It also offers facilities to gate commands by time intervals.
Returns the ShutdownHookManager singleton.
TimeUnit
The JVM runs ShutdownHooks in a non-deterministic order or in parallel. This class registers a single JVM shutdownHook and runs all the shutdownHooks registered to it (to this class) in order based on their priority. Unless a hook was registered with a shutdown explicitly set through {@link #addShutdownHook(Runnable, int, long, TimeUnit)}, the shutdown time allocated to it is set by the configuration option {@link CommonConfigurationKeysPublic#SERVICE_SHUTDOWN_TIMEOUT} in {@code core-site.xml}, with a default value of {@link CommonConfigurationKeysPublic#SERVICE_SHUTDOWN_TIMEOUT_DEFAULT} seconds.
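A sketch of registering a hook with an explicit timeout (java.util.concurrent.TimeUnit assumed imported):

  ShutdownHookManager.get().addShutdownHook(
      () -> System.out.println("flushing state before JVM exit"),
      10,                      // priority: higher-priority hooks run earlier
      30, TimeUnit.SECONDS);   // per-hook shutdown timeout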
Tool is the standard for any Map-Reduce tool/application. The tool/application should delegate the handling of standard command-line options to {@link ToolRunner#run(Tool, String[])} and only handle its custom arguments.
Here is how a typical Tool is implemented:
  public class MyApp extends Configured implements Tool {
    public int run(String[] args) throws Exception {
      // Configuration processed by ToolRunner
      Configuration conf = getConf();
      // Create a JobConf using the processed conf
      JobConf job = new JobConf(conf, MyApp.class);
      // Process custom command-line options
      Path in = new Path(args[1]);
      Path out = new Path(args[2]);
      // Specify various job-specific parameters
      job.setJobName("my-app");
      job.setInputPath(in);
      job.setOutputPath(out);
      job.setMapperClass(MyMapper.class);
      job.setReducerClass(MyReducer.class);
      // Submit the job, then poll for progress until the job is complete
      RunningJob runningJob = JobClient.runJob(job);
      if (runningJob.isSuccessful()) {
        return 0;
      } else {
        return 1;
      }
    }
    public static void main(String[] args) throws Exception {
      // Let ToolRunner handle generic command-line options
      int res = ToolRunner.run(new Configuration(), new MyApp(), args);
      System.exit(res);
    }
  }
@see GenericOptionsParser
@see ToolRunner
Runs the given Tool by {@link Tool#run(String[])}, after parsing the given generic arguments. Uses the given Configuration, or builds one if it is null. Sets the Tool's configuration with the possibly modified version of the conf.
@param conf Configuration for the Tool.
@param tool Tool to run.
@param args command-line arguments to the tool.
@return exit code of the {@link Tool#run(String[])} method.
Runs the Tool with its Configuration. Equivalent to run(tool.getConf(), tool, args).
@param tool Tool to run.
@param args command-line arguments to the tool.
@return exit code of the {@link Tool#run(String[])} method.
ToolRunner can be used to run classes implementing the Tool interface. It works in conjunction with {@link GenericOptionsParser} to parse the generic hadoop command line arguments and modifies the Configuration of the Tool. The application-specific options are passed along without being modified.
@see Tool
@see GenericOptionsParser
The Bloom filter is a data structure that was introduced in 1970 and that has been adopted by the networking research community in the past decade thanks to the bandwidth efficiencies that it offers for the transmission of set membership information between networked hosts. A sender encodes the information into a bit vector, the Bloom filter, that is more compact than a conventional representation. Computation and space costs for construction are linear in the number of elements. The receiver uses the filter to test whether various elements are members of the set. Though the filter will occasionally return a false positive, it will never return a false negative. When creating the filter, the sender can choose its desired point in a trade-off between the false positive rate and the size.
Originally created by European Commission One-Lab Project 034819.
@see Filter The general behavior of a filter
@see Space/Time Trade-Offs in Hash Coding with Allowable Errors
Invariant: nothing happens if the specified key does not belong to this counter Bloom filter.
@param key The key to remove.
NOTE: due to the bucket size of this filter, inserting the same key more than 15 times will cause an overflow at all filter positions associated with this key, and it will significantly increase the error rate for this and other keys. For this reason the filter can only be used to store small count values 0 <= N << 15.
@param key key to be tested
@return 0 if the key is not present. Otherwise, a positive value v will be returned such that v == count with probability equal to the error rate of this filter, and v > count otherwise. Additionally, if the filter experienced an underflow as a result of a {@link #delete(Key)} operation, the return value may be lower than the count with the probability of the false negative rate of such a filter.
A counting Bloom filter is an improvement to a standard Bloom filter as it allows dynamic additions and deletions of set membership information. This is achieved through the use of a counting vector instead of a bit vector.
Originally created by European Commission One-Lab Project 034819.
@see Filter The general behavior of a filter
@see Summary cache: a scalable wide-area web cache sharing protocol
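A small sketch (the vector size and hash count are illustrative; tune them for the expected number of keys and error rate):

  import org.apache.hadoop.util.bloom.CountingBloomFilter;
  import org.apache.hadoop.util.bloom.Key;
  import org.apache.hadoop.util.hash.Hash;

  CountingBloomFilter filter = new CountingBloomFilter(1024, 3, Hash.MURMUR_HASH);
  Key key = new Key("user-42".getBytes(java.nio.charset.StandardCharsets.UTF_8));
  filter.add(key);
  boolean present = filter.membershipTest(key);  // true; Bloom filters never give false negatives
  filter.delete(key);                            // counting filters also support removal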
A dynamic Bloom filter (DBF) makes use of an s * m bit matrix, but each of the s rows is a standard Bloom filter. The creation process of a DBF is iterative. At the start, the DBF is a 1 * m bit matrix, i.e., it is composed of a single standard Bloom filter. It assumes that nr elements are recorded in the initial bit vector, where nr <= n (n is the cardinality of the set A to record in the filter).
As the size of A grows during the execution of the application, several keys must be inserted in the DBF. When inserting a key into the DBF, one must first get an active Bloom filter in the matrix. A Bloom filter is active when the number of recorded keys, nr, is strictly less than the current cardinality of A, n. If an active Bloom filter is found, the key is inserted and nr is incremented by one. On the other hand, if there is no active Bloom filter, a new one is created (i.e., a new row is added to the matrix) according to the current size of A, the element is added to this new Bloom filter, and the nr value of this new Bloom filter is set to one. A given key is said to belong to the DBF if the k positions are set to one in one of the matrix rows.
Originally created by European Commission One-Lab Project 034819.
@see Filter The general behavior of a filter
@see BloomFilter A Bloom filter
@see Theory and Network Applications of Dynamic Bloom Filters
Invariant: if the false positive is null, nothing happens.
@param key The false positive key to add.
It allows the removal of selected false positives at the cost of introducing random false negatives, and with the benefit of eliminating some random false positives at the same time.
Originally created by European Commission One-Lab Project 034819.
@see Filter The general behavior of a filter
@see BloomFilter A Bloom filter
@see RemoveScheme The different selective clearing algorithms
@see Retouched Bloom Filters: Allowing Networked Applications to Trade Off Selected False Positives Against False Negatives
One key feature is that the {@link #awaitFuture(Future)} and {@link #awaitFuture(Future, long, TimeUnit)} calls will extract and rethrow exceptions raised in the future's execution, including extracting the inner IOException of any {@code UncheckedIOException} raised in the future. This makes it somewhat easier to execute IOException-raising code inside futures.
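A sketch of awaitFuture unwrapping an IOException raised inside a future (the path is illustrative; executor and Hadoop classes assumed imported):

  ExecutorService pool = Executors.newSingleThreadExecutor();
  Future<FSDataInputStream> future = pool.submit(() -> {
    FileSystem fs = FileSystem.get(new Configuration());
    return fs.open(new Path("/tmp/example.txt"));   // may throw IOException
  });
  try (FSDataInputStream in = FutureIO.awaitFuture(future)) {
    // An IOException raised inside the future is rethrown here as an IOException,
    // not wrapped in an ExecutionException.
    System.out.println(in.read());
  } finally {
    pool.shutdown();
  }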