~~ Licensed under the Apache License, Version 2.0 (the "License");
~~ you may not use this file except in compliance with the License.
~~ You may obtain a copy of the License at
~~
~~ http://www.apache.org/licenses/LICENSE-2.0
~~
~~ Unless required by applicable law or agreed to in writing, software
~~ distributed under the License is distributed on an "AS IS" BASIS,
~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~~ See the License for the specific language governing permissions and
~~ limitations under the License. See accompanying LICENSE file.

---
HDFS Architecture
---
Dhruba Borthakur
---
${maven.build.timestamp}

%{toc|section=1|fromDepth=0}

HDFS Architecture

Introduction

The Hadoop Distributed File System (HDFS) is a distributed file system
designed to run on commodity hardware. It has many similarities with
existing distributed file systems. However, the differences from other
distributed file systems are significant. HDFS is highly fault-tolerant
and is designed to be deployed on low-cost hardware. HDFS provides high
throughput access to application data and is suitable for applications
that have large data sets. HDFS relaxes a few POSIX requirements to
enable streaming access to file system data. HDFS was originally built
as infrastructure for the Apache Nutch web search engine project. HDFS
is part of the Apache Hadoop Core project. The project URL is
{{http://hadoop.apache.org/}}.

Assumptions and Goals

Hardware Failure

Hardware failure is the norm rather than the exception. An HDFS
instance may consist of hundreds or thousands of server machines, each
storing part of the file system’s data. The fact that there are a huge
number of components and that each component has a non-trivial
probability of failure means that some component of HDFS is always
non-functional. Therefore, detection of faults and quick, automatic
recovery from them is a core architectural goal of HDFS.

Streaming Data Access

Applications that run on HDFS need streaming access to their data sets.
They are not general purpose applications that typically run on general
purpose file systems. HDFS is designed more for batch processing rather
than interactive use by users. The emphasis is on high throughput of
data access rather than low latency of data access. POSIX imposes many
hard requirements that are not needed for applications that are
targeted for HDFS. POSIX semantics in a few key areas has been traded
to increase data throughput rates.

Large Data Sets

Applications that run on HDFS have large data sets. A typical file in
HDFS is gigabytes to terabytes in size. Thus, HDFS is tuned to support
large files. It should provide high aggregate data bandwidth and scale
to hundreds of nodes in a single cluster. It should support tens of
millions of files in a single instance.

Simple Coherency Model

HDFS applications need a write-once-read-many access model for files. A
file once created, written, and closed need not be changed. This
assumption simplifies data coherency issues and enables high throughput
data access. A Map/Reduce application or a web crawler application fits
perfectly with this model. There is a plan to support appending-writes
to files in the future.

“Moving Computation is Cheaper than Moving Data”

A computation requested by an application is much more efficient if it
is executed near the data it operates on. This is especially true when
the size of the data set is huge. This minimizes network congestion and
increases the overall throughput of the system. The assumption is that
it is often better to migrate the computation closer to where the data
is located rather than moving the data to where the application is
running. HDFS provides interfaces for applications to move themselves
closer to where the data is located.

Portability Across Heterogeneous Hardware and Software Platforms

HDFS has been designed to be easily portable from one platform to
another. This facilitates widespread adoption of HDFS as a platform of
choice for a large set of applications.

NameNode and DataNodes

HDFS has a master/slave architecture. An HDFS cluster consists of a
single NameNode, a master server that manages the file system namespace
and regulates access to files by clients. In addition, there are a
number of DataNodes, usually one per node in the cluster, which manage
storage attached to the nodes that they run on. HDFS exposes a file
system namespace and allows user data to be stored in files.
Internally, a file is split into one or more blocks and these blocks
are stored in a set of DataNodes. The NameNode executes file system
namespace operations like opening, closing, and renaming files and
directories. It also determines the mapping of blocks to DataNodes. The
DataNodes are responsible for serving read and write requests from the
file system’s clients. The DataNodes also perform block creation,
deletion, and replication upon instruction from the NameNode.

[images/hdfsarchitecture.png] HDFS Architecture

The NameNode and DataNode are pieces of software designed to run on
commodity machines. These machines typically run a GNU/Linux operating
system (OS). HDFS is built using the Java language; any machine that
supports Java can run the NameNode or the DataNode software. Usage of
the highly portable Java language means that HDFS can be deployed on a
wide range of machines. A typical deployment has a dedicated machine
that runs only the NameNode software. Each of the other machines in the
cluster runs one instance of the DataNode software. The architecture
does not preclude running multiple DataNodes on the same machine but in
a real deployment that is rarely the case.

The existence of a single NameNode in a cluster greatly simplifies the
architecture of the system. The NameNode is the arbitrator and
repository for all HDFS metadata. The system is designed in such a way
that user data never flows through the NameNode.
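
Because the NameNode owns the block-to-DataNode mapping, a client can ask it
where the blocks of a file live without reading any data. The following is a
minimal sketch using the FileSystem Java API described later in this document;
it assumes a reachable cluster and an existing file at the example path
<<</foodir/myfile.txt>>>, and omits error handling.

+---+
import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListBlockLocations {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();   // reads core-site.xml / hdfs-site.xml
    FileSystem fs = FileSystem.get(conf);       // client handle to the cluster
    Path file = new Path("/foodir/myfile.txt"); // example file

    // A pure metadata query answered by the NameNode; no file data is read.
    FileStatus status = fs.getFileStatus(file);
    BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());

    for (BlockLocation block : blocks) {
      System.out.println("offset " + block.getOffset()
          + " -> DataNodes " + Arrays.toString(block.getHosts()));
    }
    fs.close();
  }
}
+---+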

The File System Namespace

HDFS supports a traditional hierarchical file organization. A user or
an application can create directories and store files inside these
directories. The file system namespace hierarchy is similar to most
other existing file systems; one can create and remove files, move a
file from one directory to another, or rename a file. HDFS does not yet
implement user quotas or access permissions. HDFS does not support hard
links or soft links. However, the HDFS architecture does not preclude
implementing these features.

The NameNode maintains the file system namespace. Any change to the
file system namespace or its properties is recorded by the NameNode. An
application can specify the number of replicas of a file that should be
maintained by HDFS. The number of copies of a file is called the
replication factor of that file. This information is stored by the
NameNode.
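
Each of the namespace operations mentioned above is available through the
FileSystem Java API and is recorded by the NameNode. The sketch below is
illustrative only; the paths are examples and a reachable cluster is assumed.

+---+
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class NamespaceOperations {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());

    fs.mkdirs(new Path("/foodir"));                    // create a directory
    fs.create(new Path("/foodir/a.txt")).close();      // create an empty file
    fs.rename(new Path("/foodir/a.txt"),
              new Path("/foodir/b.txt"));              // move/rename within the namespace
    fs.delete(new Path("/foodir/b.txt"), false);       // remove a single file

    fs.close();
  }
}
+---+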

Data Replication

HDFS is designed to reliably store very large files across machines in
a large cluster. It stores each file as a sequence of blocks; all
blocks in a file except the last block are the same size. The blocks of
a file are replicated for fault tolerance. The block size and
replication factor are configurable per file. An application can
specify the number of replicas of a file. The replication factor can be
specified at file creation time and can be changed later. Files in HDFS
are write-once and have strictly one writer at any time.
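
As a concrete illustration, the replication factor and block size can be
passed to the FileSystem API when a file is created, and the replication
factor can be changed afterwards. This is a sketch only; the path, the 64 MB
block size and the factors of 3 and 2 are example values.

+---+
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PerFileReplication {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path file = new Path("/foodir/data.bin");

    // create(path, overwrite, bufferSize, replication, blockSize)
    FSDataOutputStream out =
        fs.create(file, true, 4096, (short) 3, 64L * 1024 * 1024);
    out.write("example payload".getBytes("UTF-8"));
    out.close();                          // the file is completed on close

    // The replication factor can be lowered or raised later.
    fs.setReplication(file, (short) 2);
    fs.close();
  }
}
+---+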

The NameNode makes all decisions regarding replication of blocks. It
periodically receives a Heartbeat and a Blockreport from each of the
DataNodes in the cluster. Receipt of a Heartbeat implies that the
DataNode is functioning properly. A Blockreport contains a list of all
blocks on a DataNode.

[images/hdfsdatanodes.png] HDFS DataNodes

Replica Placement: The First Baby Steps

The placement of replicas is critical to HDFS reliability and
performance. Optimizing replica placement distinguishes HDFS from most
other distributed file systems. This is a feature that needs lots of
tuning and experience. The purpose of a rack-aware replica placement
policy is to improve data reliability, availability, and network
bandwidth utilization. The current implementation for the replica
placement policy is a first effort in this direction. The short-term
goals of implementing this policy are to validate it on production
systems, learn more about its behavior, and build a foundation to test
and research more sophisticated policies.

Large HDFS instances run on a cluster of computers that commonly spread
across many racks. Communication between two nodes in different racks
has to go through switches. In most cases, network bandwidth between
machines in the same rack is greater than network bandwidth between
machines in different racks.

The NameNode determines the rack id each DataNode belongs to via the
process outlined in {{{../hadoop-common/ClusterSetup.html#Hadoop+Rack+Awareness}Hadoop Rack Awareness}}. A simple but non-optimal policy
is to place replicas on unique racks. This prevents losing data when an
entire rack fails and allows use of bandwidth from multiple racks when
reading data. This policy evenly distributes replicas in the cluster
which makes it easy to balance load on component failure. However, this
policy increases the cost of writes because a write needs to transfer
blocks to multiple racks.

For the common case, when the replication factor is three, HDFS’s
placement policy is to put one replica on one node in the local rack,
another on a different node in the local rack, and the last on a
different node in a different rack. This policy cuts the inter-rack
write traffic which generally improves write performance. The chance of
rack failure is far less than that of node failure; this policy does
not impact data reliability and availability guarantees. However, it
does reduce the aggregate network bandwidth used when reading data
since a block is placed in only two unique racks rather than three.
With this policy, the replicas of a file do not evenly distribute
across the racks. One third of replicas are on one node, two thirds of
replicas are on one rack, and the other third are evenly distributed
across the remaining racks. This policy improves write performance
without compromising data reliability or read performance.

The current, default replica placement policy described here is a work
in progress.

Replica Selection

To minimize global bandwidth consumption and read latency, HDFS tries
to satisfy a read request from a replica that is closest to the reader.
If there exists a replica on the same rack as the reader node, then
that replica is preferred to satisfy the read request. If an HDFS
cluster spans multiple data centers, then a replica that is resident in
the local data center is preferred over any remote replica.

Safemode

On startup, the NameNode enters a special state called Safemode.
Replication of data blocks does not occur when the NameNode is in the
Safemode state. The NameNode receives Heartbeat and Blockreport
messages from the DataNodes. A Blockreport contains the list of data
blocks that a DataNode is hosting. Each block has a specified minimum
number of replicas. A block is considered safely replicated when the
minimum number of replicas of that data block has checked in with the
NameNode. After a configurable percentage of safely replicated data
blocks checks in with the NameNode (plus an additional 30 seconds), the
NameNode exits the Safemode state. It then determines the list of data
blocks (if any) that still have fewer than the specified number of
replicas. The NameNode then replicates these blocks to other DataNodes.
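
A client or administration tool can query this state without changing it, for
example with <<<bin/hadoop dfsadmin -safemode get>>>. The Java sketch below
does the same thing, assuming the default file system is HDFS; the
<<<DistributedFileSystem>>> class and the <<<SafeModeAction>>> enum names
follow the Hadoop 2.x client API and may be spelled differently in other
releases.

+---+
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.HdfsConstants;

public class SafemodeCheck {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    DistributedFileSystem dfs = (DistributedFileSystem) fs;

    // SAFEMODE_GET only reports the current state; it does not change it.
    boolean inSafemode =
        dfs.setSafeMode(HdfsConstants.SafeModeAction.SAFEMODE_GET);
    System.out.println("NameNode in Safemode: " + inSafemode);
    fs.close();
  }
}
+---+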

The Persistence of File System Metadata

The HDFS namespace is stored by the NameNode. The NameNode uses a
transaction log called the EditLog to persistently record every change
that occurs to file system metadata. For example, creating a new file
in HDFS causes the NameNode to insert a record into the EditLog
indicating this. Similarly, changing the replication factor of a file
causes a new record to be inserted into the EditLog. The NameNode uses
a file in its local host OS file system to store the EditLog. The
entire file system namespace, including the mapping of blocks to files
and file system properties, is stored in a file called the FsImage. The
FsImage is stored as a file in the NameNode’s local file system too.

The NameNode keeps an image of the entire file system namespace and
file Blockmap in memory. This key metadata item is designed to be
compact, such that a NameNode with 4 GB of RAM is plenty to support a
huge number of files and directories. When the NameNode starts up, it
reads the FsImage and EditLog from disk, applies all the transactions
from the EditLog to the in-memory representation of the FsImage, and
flushes out this new version into a new FsImage on disk. It can then
truncate the old EditLog because its transactions have been applied to
the persistent FsImage. This process is called a checkpoint. In the
current implementation, a checkpoint only occurs when the NameNode
starts up. Work is in progress to support periodic checkpointing in the
near future.

The DataNode stores HDFS data in files in its local file system. The
DataNode has no knowledge about HDFS files. It stores each block of
HDFS data in a separate file in its local file system. The DataNode
does not create all files in the same directory. Instead, it uses a
heuristic to determine the optimal number of files per directory and
creates subdirectories appropriately. It is not optimal to create all
local files in the same directory because the local file system might
not be able to efficiently support a huge number of files in a single
directory. When a DataNode starts up, it scans through its local file
system, generates a list of all HDFS data blocks that correspond to
each of these local files and sends this report to the NameNode: this
is the Blockreport.

The Communication Protocols

All HDFS communication protocols are layered on top of the TCP/IP
protocol. A client establishes a connection to a configurable TCP port
on the NameNode machine. It talks the ClientProtocol with the NameNode.
The DataNodes talk to the NameNode using the DataNode Protocol. A
Remote Procedure Call (RPC) abstraction wraps both the Client Protocol
and the DataNode Protocol. By design, the NameNode never initiates any
RPCs. Instead, it only responds to RPC requests issued by DataNodes or
clients.
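
From the client side, the NameNode address and RPC port are ordinary
configuration. The sketch below shows one way to point a client at a specific
NameNode; the host name and port are examples, and the property name
<<<fs.defaultFS>>> is the Hadoop 2.x spelling (older releases use
<<<fs.default.name>>>).

+---+
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class ConnectToNameNode {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // All ClientProtocol RPCs from this client go to this address.
    conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");

    FileSystem fs = FileSystem.get(conf);
    System.out.println("Connected to " + fs.getUri());
    fs.close();
  }
}
+---+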

Robustness

The primary objective of HDFS is to store data reliably even in the
presence of failures. The three common types of failures are NameNode
failures, DataNode failures and network partitions.

Data Disk Failure, Heartbeats and Re-Replication

Each DataNode sends a Heartbeat message to the NameNode periodically. A
network partition can cause a subset of DataNodes to lose connectivity
with the NameNode. The NameNode detects this condition by the absence
of a Heartbeat message. The NameNode marks DataNodes without recent
Heartbeats as dead and does not forward any new IO requests to them.
Any data that was registered to a dead DataNode is not available to
HDFS any more. DataNode death may cause the replication factor of some
blocks to fall below their specified value. The NameNode constantly
tracks which blocks need to be replicated and initiates replication
whenever necessary. The necessity for re-replication may arise due to
many reasons: a DataNode may become unavailable, a replica may become
corrupted, a hard disk on a DataNode may fail, or the replication
factor of a file may be increased.

Cluster Rebalancing

The HDFS architecture is compatible with data rebalancing schemes. A
scheme might automatically move data from one DataNode to another if
the free space on a DataNode falls below a certain threshold. In the
event of a sudden high demand for a particular file, a scheme might
dynamically create additional replicas and rebalance other data in the
cluster. These types of data rebalancing schemes are not yet
implemented.

Data Integrity

It is possible that a block of data fetched from a DataNode arrives
corrupted. This corruption can occur because of faults in a storage
device, network faults, or buggy software. The HDFS client software
implements checksum checking on the contents of HDFS files. When a
client creates an HDFS file, it computes a checksum of each block of
the file and stores these checksums in a separate hidden file in the
same HDFS namespace. When a client retrieves file contents it verifies
that the data it received from each DataNode matches the checksum
stored in the associated checksum file. If not, then the client can opt
to retrieve that block from another DataNode that has a replica of that
block.
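
This verification happens automatically inside the client library. The sketch
below shows the two checksum-related calls an application is most likely to
touch; it assumes an existing file at the example path and a Hadoop client on
the classpath.

+---+
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ChecksumDemo {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path file = new Path("/foodir/myfile.txt");

    // An end-to-end checksum of the whole file, derived from the stored
    // per-block checksums; handy for comparing two copies of a file.
    FileChecksum checksum = fs.getFileChecksum(file);
    System.out.println(file + " checksum: " + checksum);

    // Reads verify checksums by default; verification can be switched off
    // for this FileSystem instance, e.g. to salvage a damaged file.
    fs.setVerifyChecksum(false);
    fs.close();
  }
}
+---+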

Metadata Disk Failure

The FsImage and the EditLog are central data structures of HDFS. A
corruption of these files can cause the HDFS instance to be
non-functional. For this reason, the NameNode can be configured to
support maintaining multiple copies of the FsImage and EditLog. Any
update to either the FsImage or EditLog causes each of the FsImages and
EditLogs to get updated synchronously. This synchronous updating of
multiple copies of the FsImage and EditLog may degrade the rate of
namespace transactions per second that a NameNode can support. However,
this degradation is acceptable because even though HDFS applications
are very data intensive in nature, they are not metadata intensive.
When a NameNode restarts, it selects the latest consistent FsImage and
EditLog to use.

The NameNode machine is a single point of failure for an HDFS cluster.
If the NameNode machine fails, manual intervention is necessary.
Currently, automatic restart and failover of the NameNode software to
another machine is not supported.

Snapshots

Snapshots support storing a copy of data at a particular instant of
time. One usage of the snapshot feature may be to roll back a corrupted
HDFS instance to a previously known good point in time. HDFS does not
currently support snapshots but will in a future release.

Data Organization

Data Blocks

HDFS is designed to support very large files. Applications that are
compatible with HDFS are those that deal with large data sets. These
applications write their data only once but they read it one or more
times and require these reads to be satisfied at streaming speeds. HDFS
supports write-once-read-many semantics on files. A typical block size
used by HDFS is 64 MB. Thus, an HDFS file is chopped up into 64 MB
chunks, and if possible, each chunk will reside on a different
DataNode.

Staging

A client request to create a file does not reach the NameNode
immediately. In fact, initially the HDFS client caches the file data
into a temporary local file. Application writes are transparently
redirected to this temporary local file. When the local file
accumulates data worth over one HDFS block size, the client contacts
the NameNode. The NameNode inserts the file name into the file system
hierarchy and allocates a data block for it. The NameNode responds to
the client request with the identity of the DataNode and the
destination data block. Then the client flushes the block of data from
the local temporary file to the specified DataNode. When a file is
closed, the remaining un-flushed data in the temporary local file is
transferred to the DataNode. The client then tells the NameNode that
the file is closed. At this point, the NameNode commits the file
creation operation into a persistent store. If the NameNode dies before
the file is closed, the file is lost.

The above approach has been adopted after careful consideration of
target applications that run on HDFS. These applications need streaming
writes to files. If a client writes to a remote file directly without
any client side buffering, the network speed and the congestion in the
network impacts throughput considerably. This approach is not without
precedent. Earlier distributed file systems, e.g. AFS, have used client
side caching to improve performance. A POSIX requirement has been
relaxed to achieve higher performance of data uploads.

Replication Pipelining

When a client is writing data to an HDFS file, its data is first
written to a local file as explained in the previous section. Suppose
the HDFS file has a replication factor of three. When the local file
accumulates a full block of user data, the client retrieves a list of
DataNodes from the NameNode. This list contains the DataNodes that will
host a replica of that block. The client then flushes the data block to
the first DataNode. The first DataNode starts receiving the data in
small portions (4 KB), writes each portion to its local repository and
transfers that portion to the second DataNode in the list. The second
DataNode, in turn, starts receiving each portion of the data block,
writes that portion to its repository and then flushes that portion to
the third DataNode. Finally, the third DataNode writes the data to its
local repository. Thus, a DataNode can be receiving data from the
previous one in the pipeline and at the same time forwarding data to
the next one in the pipeline. In this way, the data is pipelined from
one DataNode to the next.

Accessibility

HDFS can be accessed from applications in many different ways.
Natively, HDFS provides a
{{{http://hadoop.apache.org/docs/current/api/}FileSystem Java API}}
for applications to use. A C language wrapper for this Java API is also
available. In addition, an HTTP browser can also be used to browse the files
of an HDFS instance. Work is in progress to expose HDFS through the WebDAV
protocol.
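
The following sketch shows a round trip through the FileSystem Java API
mentioned above: write a small file, then stream it back. The path and
contents are examples and error handling is omitted.

+---+
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsReadWrite {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path file = new Path("/foodir/hello.txt");

    // Write: data is buffered on the client side and the file is
    // completed when the output stream is closed.
    FSDataOutputStream out = fs.create(file, true);
    out.write("Hello, HDFS\n".getBytes("UTF-8"));
    out.close();

    // Read: stream the contents back to stdout.
    FSDataInputStream in = fs.open(file);
    IOUtils.copyBytes(in, System.out, 4096, true);  // also closes the stream
    fs.close();
  }
}
+---+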

FS Shell

HDFS allows user data to be organized in the form of files and
directories. It provides a commandline interface called FS shell that
lets a user interact with the data in HDFS. The syntax of this command
set is similar to other shells (e.g. bash, csh) that users are already
familiar with. Here are some sample action/command pairs:

*---------+---------+
|| Action | Command
*---------+---------+
| Create a directory named <<</foodir>>> | <<<bin/hadoop dfs -mkdir /foodir>>>
*---------+---------+
| Remove a directory named <<</foodir>>> | <<<bin/hadoop dfs -rmr /foodir>>>
*---------+---------+
| View the contents of a file named <<</foodir/myfile.txt>>> | <<<bin/hadoop dfs -cat /foodir/myfile.txt>>>
*---------+---------+

FS shell is targeted for applications that need a scripting language to
interact with the stored data.

DFSAdmin

The DFSAdmin command set is used for administering an HDFS cluster.
These are commands that are used only by an HDFS administrator. Here
are some sample action/command pairs:

*---------+---------+
|| Action | Command
*---------+---------+
| Put the cluster in Safemode | <<<bin/hadoop dfsadmin -safemode enter>>>
*---------+---------+
| Generate a list of DataNodes | <<<bin/hadoop dfsadmin -report>>>
*---------+---------+
| Recommission or decommission DataNode(s) | <<<bin/hadoop dfsadmin -refreshNodes>>>
*---------+---------+

Browser Interface

A typical HDFS install configures a web server to expose the HDFS
namespace through a configurable TCP port. This allows a user to
navigate the HDFS namespace and view the contents of its files using a
web browser.

Space Reclamation

File Deletes and Undeletes

When a file is deleted by a user or an application, it is not
immediately removed from HDFS. Instead, HDFS first renames it to a file
in the <<</trash>>> directory. The file can be restored quickly as long as it
remains in <<</trash>>>. A file remains in <<</trash>>> for a configurable amount
of time. After the expiry of its life in <<</trash>>>, the NameNode deletes
the file from the HDFS namespace. The deletion of a file causes the
blocks associated with the file to be freed. Note that there could be
an appreciable time delay between the time a file is deleted by a user
and the time of the corresponding increase in free space in HDFS.

A user can Undelete a file after deleting it as long as it remains in
the <<</trash>>> directory. If a user wants to undelete a file that he/she
has deleted, he/she can navigate the <<</trash>>> directory and retrieve the
file. The <<</trash>>> directory contains only the latest copy of the file
that was deleted. The <<</trash>>> directory is just like any other directory
with one special feature: HDFS applies specified policies to
automatically delete files from this directory. The current default
policy is to delete files from <<</trash>>> that are more than 6 hours old.
In the future, this policy will be configurable through a well defined
interface.
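
For applications that want the same safety net as the shell, the hedged sketch
below moves a file into the trash from Java. The <<<Trash>>> helper class is
an assumption about the client-side trash mechanism; calling
<<<FileSystem.delete()>>> directly bypasses the trash entirely.

+---+
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.Trash;

public class MoveToTrash {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Renames the file into the trash instead of deleting it outright,
    // so it can still be retrieved until the trash policy expires it.
    Trash trash = new Trash(conf);
    boolean moved = trash.moveToTrash(new Path("/foodir/old.txt"));
    System.out.println("Moved to trash: " + moved);
  }
}
+---+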

Decrease Replication Factor

When the replication factor of a file is reduced, the NameNode selects
excess replicas that can be deleted. The next Heartbeat transfers this
information to the DataNode. The DataNode then removes the
corresponding blocks and the corresponding free space appears in the
cluster. Once again, there might be a time delay between the completion
of the setReplication API call and the appearance of free space in the
cluster.
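
A minimal sketch of issuing that call through the FileSystem API follows; the
path and the new factor are example values.

+---+
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DecreaseReplication {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());

    // Returns once the NameNode has recorded the new factor; the excess
    // replicas are removed asynchronously, as described above.
    boolean accepted = fs.setReplication(new Path("/foodir/data.bin"), (short) 2);
    System.out.println("Replication change accepted: " + accepted);
    fs.close();
  }
}
+---+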

References

Hadoop {{{http://hadoop.apache.org/docs/current/api/}JavaDoc API}}.

HDFS source code: {{http://hadoop.apache.org/version_control.html}}