The most important difference is that unlike GFS, Hadoop DFS files have strictly one writer at any one time. Bytes are always appended to the end of the writer's stream. There is no notion of "record appends" or "mutations" that are then checked or reordered. Writers simply emit a byte stream. That byte stream is guaranteed to be stored in the order written.
]]>The client will then have to contact one of the indicated DataNodes to obtain the actual data. @param src file name @param offset range start offset @param length range length @return file length and array of blocks with their locations @throws IOException @throws UnresolvedLinkException if the path contains a symlink. @throws FileNotFoundException if the path does not exist.]]>
Once created, the file is visible and available for reading by other clients. However, other clients cannot {@link #delete(String, boolean)}, re-create, or {@link #rename(String, String)} it until the file is completed, or until its lease expires.
Blocks have a maximum size. Clients that intend to create multi-block files must also use {@link #addBlock(String, String, Block, DatanodeInfo[])}. @param src path of the file being created. @param masked masked permission. @param clientName name of the current client. @param flag indicates whether the file should be overwritten if it already exists, created if it does not exist, or appended to. @param createParent create missing parent directory if true @param replication block replication factor. @param blockSize maximum block size. @throws AccessControlException if permission to create the file is denied by the system. As usual, on the client side the exception will be wrapped into {@link org.apache.hadoop.ipc.RemoteException}. @throws QuotaExceededException if the file creation violates any quota restriction @throws IOException if other errors occur. @throws UnresolvedLinkException if the path contains a symlink. @throws AlreadyBeingCreatedException if the file is already being created. @throws NSQuotaExceededException if the namespace quota is exceeded.]]>
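The RPC interface itself is not runnable outside a cluster, but the block-splitting arithmetic it implies can be sketched. The following helper is hypothetical (the class name `BlockMath` is not part of HDFS); it only illustrates how many `addBlock` calls a multi-block file implies given a maximum block size.

```java
// Hypothetical helper (not HDFS code): how many blocks a file of a given
// length occupies when each block holds at most blockSize bytes.
public class BlockMath {
    public static long blockCount(long fileLength, long blockSize) {
        if (fileLength == 0) {
            return 0;                      // an empty file occupies no blocks
        }
        // ceiling division: the last block may be only partially filled
        return (fileLength + blockSize - 1) / blockSize;
    }
}
```

For example, a 200 MB file with a 64 MB block size occupies four blocks, the last of which holds only 8 MB.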
Without the OVERWRITE option, rename fails if dst already exists. With the OVERWRITE option, rename overwrites dst if it is a file or an empty directory; rename fails if dst is a non-empty directory.
This implementation of rename is atomic.
@param src existing file or directory name. @param dst new name. @param options Rename options @throws IOException if rename failed @throws UnresolvedLinkException if the path contains a symlink.]]>
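The rename rules above can be captured in a small decision table. The sketch below is a hypothetical model (the class and enum names are our own, not HDFS code) that encodes the outcome for each kind of destination:

```java
// Hypothetical model (not HDFS code) of the rename rules described above.
public class RenameRules {
    enum Dst { ABSENT, FILE, EMPTY_DIR, NONEMPTY_DIR }

    /** Returns true if rename(src, dst) would succeed for the given dst state. */
    static boolean renameSucceeds(Dst dst, boolean overwrite) {
        switch (dst) {
            case ABSENT:       return true;        // nothing in the way
            case FILE:
            case EMPTY_DIR:    return overwrite;   // succeeds only with OVERWRITE
            case NONEMPTY_DIR: return false;       // always fails
            default:           return false;
        }
    }
}
```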
Safe mode is entered automatically at name node startup. Safe mode can also be entered manually using {@link #setSafeMode(FSConstants.SafeModeAction) setSafeMode(SafeModeAction.SAFEMODE_ENTER)}.
At startup the name node accepts data node reports, collecting information about block locations. In order to leave safe mode it needs to collect a configurable percentage, called the threshold, of blocks that satisfy the minimal replication condition. The minimal replication condition is that each block must have at least dfs.namenode.replication.min replicas. When the threshold is reached the name node extends safe mode for a configurable amount of time to let the remaining data nodes check in before it starts replicating missing blocks. Then the name node leaves safe mode.
If safe mode is turned on manually using {@link #setSafeMode(FSConstants.SafeModeAction) setSafeMode(SafeModeAction.SAFEMODE_ENTER)} then the name node stays in safe mode until it is manually turned off using {@link #setSafeMode(FSConstants.SafeModeAction) setSafeMode(SafeModeAction.SAFEMODE_LEAVE)}. The current state of the name node can be verified using {@link #setSafeMode(FSConstants.SafeModeAction) setSafeMode(SafeModeAction.SAFEMODE_GET)}.
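The exit test described above reduces to a fraction check. The following is a minimal sketch, not NameNode code; the class name `SafeModeCheck` and the parameter names are our own:

```java
// Hypothetical sketch (not NameNode code) of the safe-mode exit test: safe
// mode may be left once the fraction of blocks that satisfy the minimal
// replication condition reaches the configured threshold.
public class SafeModeCheck {
    /**
     * @param blocksTotal total number of blocks known to the name node
     * @param blocksSafe  blocks with at least dfs.namenode.replication.min replicas
     * @param threshold   configured fraction, e.g. 0.999
     */
    static boolean thresholdReached(long blocksTotal, long blocksSafe, double threshold) {
        if (blocksTotal == 0) {
            return true;  // an empty namespace is trivially safe
        }
        return (double) blocksSafe / blocksTotal >= threshold;
    }
}
```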
SYNOPSIS
To start:
    bin/start-balancer.sh [-threshold]
    Example: bin/start-balancer.sh
                 start the balancer with a default threshold of 10%
             bin/start-balancer.sh -threshold 5
                 start the balancer with a threshold of 5%
To stop:
    bin/stop-balancer.sh
DESCRIPTION
The threshold parameter is a fraction in the range of (0%, 100%) with a default value of 10%. The threshold sets a target for whether the cluster is balanced. A cluster is balanced if, for each datanode, the utilization of the node (ratio of used space at the node to total capacity of the node) differs from the utilization of the cluster (ratio of used space in the cluster to total capacity of the cluster) by no more than the threshold value. The smaller the threshold, the more balanced a cluster will become. It takes more time to run the balancer for small threshold values. Also, for a very small threshold the cluster may not be able to reach the balanced state when applications write and delete files concurrently.
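The balance criterion above is a simple comparison of two utilization ratios. The sketch below is a hypothetical illustration, not Balancer code; the class name and parameters are our own. Utilizations and the threshold are expressed as fractions in [0, 1]:

```java
// Hypothetical illustration (not Balancer code) of the balance criterion:
// a datanode is balanced when its utilization is within `threshold` of the
// cluster-wide utilization.
public class BalanceCheck {
    static boolean isBalanced(double nodeUsed, double nodeCapacity,
                              double clusterUsed, double clusterCapacity,
                              double threshold) {
        double nodeUtil = nodeUsed / nodeCapacity;         // node's used/capacity ratio
        double clusterUtil = clusterUsed / clusterCapacity; // cluster's used/capacity ratio
        return Math.abs(nodeUtil - clusterUtil) <= threshold;
    }
}
```

For example, with the cluster 50% utilized and the default 10% threshold, a node at 55% utilization is balanced but a node at 75% is not.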
The tool moves blocks from highly utilized datanodes to poorly utilized datanodes iteratively. In each iteration a datanode moves or receives no more than the lesser of 10 GB or the threshold fraction of its capacity. Each iteration runs no more than 20 minutes. At the end of each iteration, the balancer obtains updated datanode information from the namenode.
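The per-iteration cap described above can be sketched as a one-line computation. This is a hypothetical helper, not Balancer code; the names are our own:

```java
// Hypothetical sketch (not Balancer code): per iteration, a datanode moves or
// receives no more than min(10 GB, threshold * capacity) bytes.
public class IterationLimit {
    static final long TEN_GB = 10L * 1024 * 1024 * 1024;

    static long maxBytesPerIteration(long capacityBytes, double threshold) {
        long thresholdBytes = (long) (threshold * capacityBytes);
        return Math.min(TEN_GB, thresholdBytes);
    }
}
```

With the default 10% threshold, a 1 TB datanode is capped at 10 GB per iteration (the fixed limit dominates), while a 50 GB datanode is capped at 5 GB (the threshold fraction dominates).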
A system property that limits the balancer's use of bandwidth is defined in the default configuration file:
<property>
  <name>dfs.balance.bandwidthPerSec</name>
  <value>1048576</value>
  <description>Specifies the maximum bandwidth that each datanode can utilize
  for the balancing purpose in terms of the number of bytes per second.</description>
</property>
This property determines the maximum speed at which a block will be moved from one datanode to another. The default value is 1MB/s. The higher the bandwidth, the faster a cluster can reach the balanced state, but with greater competition with application processes. If an administrator changes the value of this property in the configuration file, the change is observed when HDFS is next restarted.
MONITORING BALANCER PROGRESS
After the balancer is started, an output file name where the balancer progress will be recorded is printed on the screen. The administrator can monitor the running of the balancer by reading the output file. The output shows the balancer's status iteration by iteration. In each iteration it prints the starting time, the iteration number, the total number of bytes that have been moved in the previous iterations, the total number of bytes that are left to move in order for the cluster to be balanced, and the number of bytes that are being moved in this iteration. Normally "Bytes Already Moved" is increasing while "Bytes Left To Move" is decreasing.
Running multiple instances of the balancer in an HDFS cluster is prohibited by the tool.
The balancer automatically exits when any of the following five conditions is satisfied:
Upon exit, the balancer returns an exit code and prints one of the following messages to the output file, corresponding to the above exit reasons:
The administrator can interrupt the execution of the balancer at any time by running the command "stop-balancer.sh" on the machine where the balancer is running.]]>
false otherwise. @throws IOException @see StorageDirectory#lock()]]>
Local storage can reside in multiple directories. Each directory should contain the same VERSION file as the others. During startup Hadoop servers (name-node and data-nodes) read their local storage information from them.
The servers hold a lock for each storage directory while they run, so that other nodes are not able to start up sharing the same storage. The locks are released when the servers stop (normally or abnormally).]]>
If locking is supported, we guarantee exclusive access to the storage directory. Otherwise, no guarantee is given. @throws IOException if locking fails]]>
For the metrics that are sampled and averaged, one must specify a metrics context that does periodic update calls. Most metrics contexts do. The default Null metrics context, however, does NOT. So if you aren't using any other metrics context, you can turn on the viewing and averaging of sampled metrics by specifying the following two lines in the hadoop-metrics.properties file:
dfs.class=org.apache.hadoop.metrics.spi.NullContextWithUpdateThread
dfs.period=10
Note that the metrics are collected regardless of the context used. The context with the update thread is used only to average the data periodically. Impl details: we use a dynamic mbean that gets the list of metrics from the metrics registry passed as an argument to the constructor.]]>
{@link #blocksRead}.inc()]]>
Finally, the namenode returns its namespaceID as the registrationID for the datanodes. namespaceID is a persistent attribute of the name space. The registrationID is checked every time the datanode communicates with the namenode. Datanodes with an inappropriate registrationID are rejected. If the namenode stops and then restarts, it can restore its namespaceID and will continue serving the datanodes that have previously registered with it, without restarting the whole cluster. @see org.apache.hadoop.hdfs.server.datanode.DataNode#register()]]>
ugi=<ugi in RPC>
ip=<remote IP>
cmd=<command>
src=<src path>
dst=<dst path (optional)>
perm=<permissions (optional)>
]]>
zero in the conf. @param conf configuration @throws IOException]]>
{@link #filesTotal}.set()]]>
dfs.class=org.apache.hadoop.metrics.spi.NullContextWithUpdateThread
dfs.period=10
Note that the metrics are collected regardless of the context used. The context with the update thread is used only to average the data periodically. Impl details: we use a dynamic mbean that gets the list of metrics from the metrics registry passed as an argument to the constructor.]]>
{@link #syncs}.inc()]]>
size.
@see org.apache.hadoop.hdfs.server.balancer.Balancer
@param datanode a data node
@param size requested size
@return a list of blocks & their locations
@throws RemoteException if size is less than or equal to 0 or datanode does not exist]]>