~~ Licensed under the Apache License, Version 2.0 (the "License");
~~ you may not use this file except in compliance with the License.
~~ You may obtain a copy of the License at
~~
~~   http://www.apache.org/licenses/LICENSE-2.0
~~
~~ Unless required by applicable law or agreed to in writing, software
~~ distributed under the License is distributed on an "AS IS" BASIS,
~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~~ See the License for the specific language governing permissions and
~~ limitations under the License. See accompanying LICENSE file.

  ---
  HDFS Architecture
  ---
  Dhruba Borthakur
  ---
  ${maven.build.timestamp}

%{toc|section=1|fromDepth=0}

HDFS Architecture

Introduction

  The Hadoop Distributed File System (HDFS) is a distributed file system
  designed to run on commodity hardware. It has many similarities with
  existing distributed file systems. However, the differences from other
  distributed file systems are significant. HDFS is highly fault-tolerant
  and is designed to be deployed on low-cost hardware. HDFS provides high
  throughput access to application data and is suitable for applications
  that have large data sets. HDFS relaxes a few POSIX requirements to
  enable streaming access to file system data. HDFS was originally built
  as infrastructure for the Apache Nutch web search engine project. HDFS
  is part of the Apache Hadoop Core project. The project URL is
  {{http://hadoop.apache.org/}}.

Assumptions and Goals

Hardware Failure

  Hardware failure is the norm rather than the exception. An HDFS
  instance may consist of hundreds or thousands of server machines, each
  storing part of the file system's data. The fact that there are a huge
  number of components and that each component has a non-trivial
  probability of failure means that some component of HDFS is always
  non-functional. Therefore, detection of faults and quick, automatic
  recovery from them is a core architectural goal of HDFS.

Streaming Data Access

  Applications that run on HDFS need streaming access to their data sets.
  They are not general purpose applications that typically run on general
  purpose file systems. HDFS is designed more for batch processing than
  for interactive use by users. The emphasis is on high throughput of
  data access rather than low latency of data access. POSIX imposes many
  hard requirements that are not needed for applications that are
  targeted for HDFS. POSIX semantics in a few key areas have been traded
  to increase data throughput rates.

Large Data Sets

  Applications that run on HDFS have large data sets. A typical file in
  HDFS is gigabytes to terabytes in size. Thus, HDFS is tuned to support
  large files. It should provide high aggregate data bandwidth and scale
  to hundreds of nodes in a single cluster. It should support tens of
  millions of files in a single instance.

Simple Coherency Model

  HDFS applications need a write-once-read-many access model for files. A
  file, once created, written, and closed, need not be changed. This
  assumption simplifies data coherency issues and enables high throughput
  data access. A Map/Reduce application or a web crawler application fits
  perfectly with this model. There is a plan to support appending-writes
  to files in the future.

"Moving Computation is Cheaper than Moving Data"

  A computation requested by an application is much more efficient if it
  is executed near the data it operates on. This is especially true when
  the size of the data set is huge. This minimizes network congestion and
  increases the overall throughput of the system. The assumption is that
  it is often better to migrate the computation closer to where the data
  is located rather than moving the data to where the application is
  running. HDFS provides interfaces for applications to move themselves
  closer to where the data is located.

Portability Across Heterogeneous Hardware and Software Platforms

  HDFS has been designed to be easily portable from one platform to
  another. This facilitates widespread adoption of HDFS as a platform of
  choice for a large set of applications.

NameNode and DataNodes

  HDFS has a master/slave architecture. An HDFS cluster consists of a
  single NameNode, a master server that manages the file system namespace
  and regulates access to files by clients. In addition, there are a
  number of DataNodes, usually one per node in the cluster, which manage
  storage attached to the nodes that they run on. HDFS exposes a file
  system namespace and allows user data to be stored in files.
  Internally, a file is split into one or more blocks and these blocks
  are stored in a set of DataNodes. The NameNode executes file system
  namespace operations like opening, closing, and renaming files and
  directories. It also determines the mapping of blocks to DataNodes. The
  DataNodes are responsible for serving read and write requests from the
  file system's clients. The DataNodes also perform block creation,
  deletion, and replication upon instruction from the NameNode.

[images/hdfsarchitecture.png] HDFS Architecture

  The NameNode and DataNode are pieces of software designed to run on
  commodity machines. These machines typically run a GNU/Linux operating
  system (OS). HDFS is built using the Java language; any machine that
  supports Java can run the NameNode or the DataNode software. Usage of
  the highly portable Java language means that HDFS can be deployed on a
  wide range of machines. A typical deployment has a dedicated machine
  that runs only the NameNode software. Each of the other machines in the
  cluster runs one instance of the DataNode software. The architecture
  does not preclude running multiple DataNodes on the same machine, but in
  a real deployment that is rarely the case.

  The existence of a single NameNode in a cluster greatly simplifies the
  architecture of the system. The NameNode is the arbitrator and
  repository for all HDFS metadata. The system is designed in such a way
  that user data never flows through the NameNode.

The File System Namespace

  HDFS supports a traditional hierarchical file organization. A user or
  an application can create directories and store files inside these
  directories. The file system namespace hierarchy is similar to most
  other existing file systems; one can create and remove files, move a
  file from one directory to another, or rename a file. HDFS does not yet
  implement user quotas or access permissions. HDFS does not support hard
  links or soft links. However, the HDFS architecture does not preclude
  implementing these features.
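
  For illustration, these namespace operations map directly onto the
  FileSystem Java API mentioned later in this document. The sketch below is
  illustrative only; the paths are hypothetical, and it assumes the client
  configuration already points at an HDFS cluster.

+---+
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class NamespaceOps {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Assumes the default file system in the client configuration
    // refers to an HDFS NameNode.
    FileSystem fs = FileSystem.get(conf);

    fs.mkdirs(new Path("/foodir"));                        // create a directory
    fs.rename(new Path("/foodir"), new Path("/bardir"));   // move/rename it
    fs.delete(new Path("/bardir"), true);                  // remove it recursively

    fs.close();
  }
}
+---+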

  The NameNode maintains the file system namespace. Any change to the
  file system namespace or its properties is recorded by the NameNode. An
  application can specify the number of replicas of a file that should be
  maintained by HDFS. The number of copies of a file is called the
  replication factor of that file. This information is stored by the
  NameNode.

Data Replication

  HDFS is designed to reliably store very large files across machines in
  a large cluster. It stores each file as a sequence of blocks; all
  blocks in a file except the last block are the same size. The blocks of
  a file are replicated for fault tolerance. The block size and
  replication factor are configurable per file. An application can
  specify the number of replicas of a file. The replication factor can be
  specified at file creation time and can be changed later. Files in HDFS
  are write-once and have strictly one writer at any time.
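
  As an illustration of these per-file settings, the sketch below creates a
  file with an explicit replication factor and block size through the
  FileSystem Java API. The path and the numeric values are hypothetical
  examples, not recommendations.

+---+
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CreateWithReplication {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());

    short replication = 3;                // replicas to keep for this file
    long blockSize = 64L * 1024 * 1024;   // 64 MB blocks for this file
    int bufferSize = 4096;

    // Block size and replication factor are chosen per file at creation time.
    FSDataOutputStream out = fs.create(
        new Path("/foodir/data.bin"), true, bufferSize, replication, blockSize);
    out.write("example payload".getBytes("UTF-8"));
    out.close();

    fs.close();
  }
}
+---+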

  The NameNode makes all decisions regarding replication of blocks. It
  periodically receives a Heartbeat and a Blockreport from each of the
  DataNodes in the cluster. Receipt of a Heartbeat implies that the
  DataNode is functioning properly. A Blockreport contains a list of all
  blocks on a DataNode.

[images/hdfsdatanodes.png] HDFS DataNodes

Replica Placement: The First Baby Steps

  The placement of replicas is critical to HDFS reliability and
  performance. Optimizing replica placement distinguishes HDFS from most
  other distributed file systems. This is a feature that needs lots of
  tuning and experience. The purpose of a rack-aware replica placement
  policy is to improve data reliability, availability, and network
  bandwidth utilization. The current implementation for the replica
  placement policy is a first effort in this direction. The short-term
  goals of implementing this policy are to validate it on production
  systems, learn more about its behavior, and build a foundation to test
  and research more sophisticated policies.

  Large HDFS instances run on a cluster of computers that commonly
  spreads across many racks. Communication between two nodes in different
  racks has to go through switches. In most cases, network bandwidth
  between machines in the same rack is greater than network bandwidth
  between machines in different racks.

  The NameNode determines the rack id each DataNode belongs to via the
  process outlined in {{{../hadoop-common/ClusterSetup.html#Hadoop+Rack+Awareness}Hadoop Rack Awareness}}. A simple but non-optimal policy
  is to place replicas on unique racks. This prevents losing data when an
  entire rack fails and allows use of bandwidth from multiple racks when
  reading data. This policy evenly distributes replicas in the cluster,
  which makes it easy to balance load on component failure. However, this
  policy increases the cost of writes because a write needs to transfer
  blocks to multiple racks.

  For the common case, when the replication factor is three, HDFS's
  placement policy is to put one replica on one node in the local rack,
  another on a different node in the local rack, and the last on a
  different node in a different rack. This policy cuts the inter-rack
  write traffic, which generally improves write performance. The chance of
  rack failure is far less than that of node failure; this policy does
  not impact data reliability and availability guarantees. However, it
  does reduce the aggregate network bandwidth used when reading data,
  since a block is placed in only two unique racks rather than three.
  With this policy, the replicas of a file do not evenly distribute
  across the racks. One third of replicas are on one node, two thirds of
  replicas are on one rack, and the other third is evenly distributed
  across the remaining racks. This policy improves write performance
  without compromising data reliability or read performance.

  The current, default replica placement policy described here is a work
  in progress.

Replica Selection

  To minimize global bandwidth consumption and read latency, HDFS tries
  to satisfy a read request from a replica that is closest to the reader.
  If there exists a replica on the same rack as the reader node, then
  that replica is preferred to satisfy the read request. If an HDFS
  cluster spans multiple data centers, then a replica that is resident in
  the local data center is preferred over any remote replica.

Safemode

  On startup, the NameNode enters a special state called Safemode.
  Replication of data blocks does not occur when the NameNode is in the
  Safemode state. The NameNode receives Heartbeat and Blockreport
  messages from the DataNodes. A Blockreport contains the list of data
  blocks that a DataNode is hosting. Each block has a specified minimum
  number of replicas. A block is considered safely replicated when the
  minimum number of replicas of that data block has checked in with the
  NameNode. After a configurable percentage of safely replicated data
  blocks checks in with the NameNode (plus an additional 30 seconds), the
  NameNode exits the Safemode state. It then determines the list of data
  blocks (if any) that still have fewer than the specified number of
  replicas. The NameNode then replicates these blocks to other DataNodes.

The Persistence of File System Metadata

  The HDFS namespace is stored by the NameNode. The NameNode uses a
  transaction log called the EditLog to persistently record every change
  that occurs to file system metadata. For example, creating a new file
  in HDFS causes the NameNode to insert a record into the EditLog
  indicating this. Similarly, changing the replication factor of a file
  causes a new record to be inserted into the EditLog. The NameNode uses
  a file in its local host OS file system to store the EditLog. The
  entire file system namespace, including the mapping of blocks to files
  and file system properties, is stored in a file called the FsImage. The
  FsImage is stored as a file in the NameNode's local file system too.

  The NameNode keeps an image of the entire file system namespace and
  file Blockmap in memory. This key metadata item is designed to be
  compact, such that a NameNode with 4 GB of RAM is plenty to support a
  huge number of files and directories. When the NameNode starts up, it
  reads the FsImage and EditLog from disk, applies all the transactions
  from the EditLog to the in-memory representation of the FsImage, and
  flushes out this new version into a new FsImage on disk. It can then
  truncate the old EditLog because its transactions have been applied to
  the persistent FsImage. This process is called a checkpoint. In the
  current implementation, a checkpoint only occurs when the NameNode
  starts up. Work is in progress to support periodic checkpointing in the
  near future.

  The DataNode stores HDFS data in files in its local file system. The
  DataNode has no knowledge about HDFS files. It stores each block of
  HDFS data in a separate file in its local file system. The DataNode
  does not create all files in the same directory. Instead, it uses a
  heuristic to determine the optimal number of files per directory and
  creates subdirectories appropriately. It is not optimal to create all
  local files in the same directory because the local file system might
  not be able to efficiently support a huge number of files in a single
  directory. When a DataNode starts up, it scans through its local file
  system, generates a list of all HDFS data blocks that correspond to
  each of these local files, and sends this report to the NameNode: this
  is the Blockreport.

The Communication Protocols

  All HDFS communication protocols are layered on top of the TCP/IP
  protocol. A client establishes a connection to a configurable TCP port
  on the NameNode machine. It talks the ClientProtocol with the NameNode.
  The DataNodes talk to the NameNode using the DataNode Protocol. A
  Remote Procedure Call (RPC) abstraction wraps both the Client Protocol
  and the DataNode Protocol. By design, the NameNode never initiates any
  RPCs. Instead, it only responds to RPC requests issued by DataNodes or
  clients.
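
  As a client-side illustration, the sketch below opens a connection to the
  NameNode through the FileSystem API, which wraps the ClientProtocol RPC
  described above. The host name and port shown are hypothetical; the actual
  values come from the cluster configuration.

+---+
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListRoot {
  public static void main(String[] args) throws Exception {
    // The URI names the NameNode host and its configured RPC port (hypothetical values).
    FileSystem fs = FileSystem.get(
        URI.create("hdfs://namenode.example.com:8020/"), new Configuration());

    // Each call below results in an RPC issued by the client; the NameNode only responds.
    for (FileStatus status : fs.listStatus(new Path("/"))) {
      System.out.println(status.getPath());
    }
    fs.close();
  }
}
+---+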

Robustness

  The primary objective of HDFS is to store data reliably even in the
  presence of failures. The three common types of failures are NameNode
  failures, DataNode failures, and network partitions.

Data Disk Failure, Heartbeats and Re-Replication

  Each DataNode sends a Heartbeat message to the NameNode periodically. A
  network partition can cause a subset of DataNodes to lose connectivity
  with the NameNode. The NameNode detects this condition by the absence
  of a Heartbeat message. The NameNode marks DataNodes without recent
  Heartbeats as dead and does not forward any new IO requests to them.
  Any data that was registered to a dead DataNode is not available to
  HDFS any more. DataNode death may cause the replication factor of some
  blocks to fall below their specified value. The NameNode constantly
  tracks which blocks need to be replicated and initiates replication
  whenever necessary. The necessity for re-replication may arise due to
  many reasons: a DataNode may become unavailable, a replica may become
  corrupted, a hard disk on a DataNode may fail, or the replication
  factor of a file may be increased.

Cluster Rebalancing

  The HDFS architecture is compatible with data rebalancing schemes. A
  scheme might automatically move data from one DataNode to another if
  the free space on a DataNode falls below a certain threshold. In the
  event of a sudden high demand for a particular file, a scheme might
  dynamically create additional replicas and rebalance other data in the
  cluster. These types of data rebalancing schemes are not yet
  implemented.

Data Integrity

  It is possible that a block of data fetched from a DataNode arrives
  corrupted. This corruption can occur because of faults in a storage
  device, network faults, or buggy software. The HDFS client software
  implements checksum checking on the contents of HDFS files. When a
  client creates an HDFS file, it computes a checksum of each block of
  the file and stores these checksums in a separate hidden file in the
  same HDFS namespace. When a client retrieves file contents, it verifies
  that the data it received from each DataNode matches the checksum
  stored in the associated checksum file. If not, then the client can opt
  to retrieve that block from another DataNode that has a replica of that
  block.
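
  The sketch below illustrates the general idea of chunk-level checksum
  verification in plain Java. It is a conceptual example only and does not
  reproduce the exact checksum format or chunk size that the HDFS client
  uses internally.

+---+
import java.util.zip.CRC32;

public class ChecksumCheck {
  // Compute a CRC32 checksum over one chunk of data.
  static long checksum(byte[] chunk, int off, int len) {
    CRC32 crc = new CRC32();
    crc.update(chunk, off, len);
    return crc.getValue();
  }

  // Verify a received chunk against the checksum stored when it was written.
  // If the values differ, the reader can fall back to another replica.
  static boolean verify(byte[] chunk, int off, int len, long storedChecksum) {
    return checksum(chunk, off, len) == storedChecksum;
  }
}
+---+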

Metadata Disk Failure

  The FsImage and the EditLog are central data structures of HDFS. A
  corruption of these files can cause the HDFS instance to be
  non-functional. For this reason, the NameNode can be configured to
  support maintaining multiple copies of the FsImage and EditLog. Any
  update to either the FsImage or EditLog causes each of the FsImages and
  EditLogs to get updated synchronously. This synchronous updating of
  multiple copies of the FsImage and EditLog may degrade the rate of
  namespace transactions per second that a NameNode can support. However,
  this degradation is acceptable because even though HDFS applications
  are very data intensive in nature, they are not metadata intensive.
  When a NameNode restarts, it selects the latest consistent FsImage and
  EditLog to use.

  The NameNode machine is a single point of failure for an HDFS cluster.
  If the NameNode machine fails, manual intervention is necessary.
  Currently, automatic restart and failover of the NameNode software to
  another machine is not supported.

Snapshots

  Snapshots support storing a copy of data at a particular instant of
  time. One usage of the snapshot feature may be to roll back a corrupted
  HDFS instance to a previously known good point in time. HDFS does not
  currently support snapshots but will in a future release.

Data Organization

Data Blocks

  HDFS is designed to support very large files. Applications that are
  compatible with HDFS are those that deal with large data sets. These
  applications write their data only once but they read it one or more
  times and require these reads to be satisfied at streaming speeds. HDFS
  supports write-once-read-many semantics on files. A typical block size
  used by HDFS is 64 MB. Thus, an HDFS file is chopped up into 64 MB
  chunks, and if possible, each chunk will reside on a different
  DataNode.
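
  A default block size and replication factor can also be set in the client
  or cluster configuration rather than per file. The sketch below is
  illustrative only; the property keys shown are the commonly used ones, and
  which alias applies (for example <<<dfs.block.size>>> in older releases
  versus <<<dfs.blocksize>>> in newer ones) depends on the Hadoop release.

+---+
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class BlockSizeDefaults {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Default block size for newly created files: 64 MB.
    // Property name may vary by release; treat this key as an example.
    conf.setLong("dfs.blocksize", 64L * 1024 * 1024);

    // Default replication factor for newly created files.
    conf.setInt("dfs.replication", 3);

    FileSystem fs = FileSystem.get(conf);
    // ... files created through this FileSystem inherit these defaults
    // unless overridden at creation time.
    fs.close();
  }
}
+---+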

Staging

  A client request to create a file does not reach the NameNode
  immediately. In fact, initially the HDFS client caches the file data
  into a temporary local file. Application writes are transparently
  redirected to this temporary local file. When the local file
  accumulates more than one HDFS block's worth of data, the client
  contacts the NameNode. The NameNode inserts the file name into the file
  system hierarchy and allocates a data block for it. The NameNode
  responds to the client request with the identity of the DataNode and
  the destination data block. Then the client flushes the block of data
  from the local temporary file to the specified DataNode. When a file is
  closed, the remaining un-flushed data in the temporary local file is
  transferred to the DataNode. The client then tells the NameNode that
  the file is closed. At this point, the NameNode commits the file
  creation operation into a persistent store. If the NameNode dies before
  the file is closed, the file is lost.

  The above approach has been adopted after careful consideration of
  target applications that run on HDFS. These applications need streaming
  writes to files. If a client writes to a remote file directly without
  any client-side buffering, the network speed and the congestion in the
  network impact throughput considerably. This approach is not without
  precedent. Earlier distributed file systems, e.g. AFS, have used
  client-side caching to improve performance. A POSIX requirement has
  been relaxed to achieve higher performance of data uploads.

Replication Pipelining

  When a client is writing data to an HDFS file, its data is first
  written to a local file as explained in the previous section. Suppose
  the HDFS file has a replication factor of three. When the local file
  accumulates a full block of user data, the client retrieves a list of
  DataNodes from the NameNode. This list contains the DataNodes that will
  host a replica of that block. The client then flushes the data block to
  the first DataNode. The first DataNode starts receiving the data in
  small portions (4 KB), writes each portion to its local repository, and
  transfers that portion to the second DataNode in the list. The second
  DataNode, in turn, starts receiving each portion of the data block,
  writes that portion to its repository, and then flushes that portion to
  the third DataNode. Finally, the third DataNode writes the data to its
  local repository. Thus, a DataNode can be receiving data from the
  previous one in the pipeline and at the same time forwarding data to
  the next one in the pipeline. In this way, the data is pipelined from
  one DataNode to the next.
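
  From the client's point of view, the staging and pipelining described
  above are transparent: the application simply creates a file, writes to
  the output stream, and closes it. A minimal sketch follows; the path is
  hypothetical.

+---+
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WriteFile {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());

    // The client library buffers the data and drives the DataNode pipeline;
    // the application only sees an ordinary output stream.
    FSDataOutputStream out = fs.create(new Path("/foodir/output.txt"));
    for (int i = 0; i < 1000; i++) {
      out.write(("record " + i + "\n").getBytes("UTF-8"));
    }
    out.close();   // remaining buffered data is flushed and the file is committed

    fs.close();
  }
}
+---+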

Accessibility

  HDFS can be accessed from applications in many different ways.
  Natively, HDFS provides a
  {{{http://hadoop.apache.org/docs/current/api/}FileSystem Java API}}
  for applications to use. A C language wrapper for this Java API is also
  available. In addition, an HTTP browser can also be used to browse the
  files of an HDFS instance. Work is in progress to expose HDFS through
  the WebDAV protocol.
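
  For example, a short program using the FileSystem Java API to read a file
  back out of HDFS might look like the following sketch (the path is
  hypothetical).

+---+
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class ReadFile {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());

    // Open the file and copy its contents to standard output.
    FSDataInputStream in = fs.open(new Path("/foodir/myfile.txt"));
    try {
      IOUtils.copyBytes(in, System.out, 4096, false);
    } finally {
      in.close();
      fs.close();
    }
  }
}
+---+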

FS Shell

  HDFS allows user data to be organized in the form of files and
  directories. It provides a command-line interface called FS shell that
  lets a user interact with the data in HDFS. The syntax of this command
  set is similar to other shells (e.g. bash, csh) that users are already
  familiar with. Here are some sample action/command pairs:

*---------+---------+
|| Action | Command
*---------+---------+
| Create a directory named <<</foodir>>> | <<<bin/hadoop dfs -mkdir /foodir>>>
*---------+---------+
| Remove a directory named <<</foodir>>> | <<<bin/hadoop dfs -rmr /foodir>>>
*---------+---------+
| View the contents of a file named <<</foodir/myfile.txt>>> | <<<bin/hadoop dfs -cat /foodir/myfile.txt>>>
*---------+---------+

  FS shell is targeted for applications that need a scripting language to
  interact with the stored data.

DFSAdmin

  The DFSAdmin command set is used for administering an HDFS cluster.
  These are commands that are used only by an HDFS administrator. Here
  are some sample action/command pairs:

*---------+---------+
|| Action | Command
*---------+---------+
| Put the cluster in Safemode | <<<bin/hadoop dfsadmin -safemode enter>>>
*---------+---------+
| Generate a list of DataNodes | <<<bin/hadoop dfsadmin -report>>>
*---------+---------+
| Recommission or decommission DataNode(s) | <<<bin/hadoop dfsadmin -refreshNodes>>>
*---------+---------+

Browser Interface

  A typical HDFS install configures a web server to expose the HDFS
  namespace through a configurable TCP port. This allows a user to
  navigate the HDFS namespace and view the contents of its files using a
  web browser.

Space Reclamation

File Deletes and Undeletes

  When a file is deleted by a user or an application, it is not
  immediately removed from HDFS. Instead, HDFS first renames it to a file
  in the <<</trash>>> directory. The file can be restored quickly as long
  as it remains in <<</trash>>>. A file remains in <<</trash>>> for a
  configurable amount of time. After the expiry of its life in
  <<</trash>>>, the NameNode deletes the file from the HDFS namespace. The
  deletion of a file causes the blocks associated with the file to be
  freed. Note that there could be an appreciable time delay between the
  time a file is deleted by a user and the time of the corresponding
  increase in free space in HDFS.

  A user can undelete a file after deleting it as long as it remains in
  the <<</trash>>> directory. If a user wants to undelete a file that
  he/she has deleted, he/she can navigate the <<</trash>>> directory and
  retrieve the file. The <<</trash>>> directory contains only the latest
  copy of the file that was deleted. The <<</trash>>> directory is just
  like any other directory with one special feature: HDFS applies
  specified policies to automatically delete files from this directory.
  The current default policy is to delete files from <<</trash>>> that are
  more than 6 hours old. In the future, this policy will be configurable
  through a well-defined interface.

Decrease Replication Factor

  When the replication factor of a file is reduced, the NameNode selects
  excess replicas that can be deleted. The next Heartbeat transfers this
  information to the DataNode. The DataNode then removes the
  corresponding blocks and the corresponding free space appears in the
  cluster. Once again, there might be a time delay between the completion
  of the setReplication API call and the appearance of free space in the
  cluster.
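
  The setReplication call mentioned above is part of the FileSystem Java
  API. A minimal sketch is shown below; the path and target value are
  hypothetical.

+---+
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DecreaseReplication {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());

    // Lower the replication factor of an existing file to 2.
    // The NameNode schedules removal of the excess replicas asynchronously,
    // so the freed space may not appear in the cluster immediately.
    boolean accepted = fs.setReplication(new Path("/foodir/data.bin"), (short) 2);
    System.out.println("replication change accepted: " + accepted);

    fs.close();
  }
}
+---+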

References

  Hadoop {{{http://hadoop.apache.org/docs/current/api/}JavaDoc API}}.

  HDFS source code: {{http://hadoop.apache.org/version_control.html}}