HDFS-11995. HDFS Architecture documentation incorrectly describes writing to a local temporary file. Contributed by Nandakumar.
commit d954a64730
parent 73fb75017e
@@ -201,38 +201,13 @@ A typical block size used by HDFS is 128 MB.
Thus, an HDFS file is chopped up into 128 MB chunks, and if possible,
each chunk will reside on a different DataNode.
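In other words, the block count of a file is just a ceiling division. The sketch below is illustrative arithmetic only, not Hadoop code; `BLOCK_SIZE` and `blockCount` are invented names, and the constant simply mirrors the 128 MB default of `dfs.blocksize`.

```java
// Illustrative sketch (not Hadoop code): how a file maps onto blocks.
public class BlockMath {
    // Mirrors the 128 MB default of dfs.blocksize.
    static final long BLOCK_SIZE = 128L * 1024 * 1024;

    // A 1 GB file spans ceil(1 GB / 128 MB) = 8 blocks; the final
    // block of a file may be shorter than BLOCK_SIZE.
    static long blockCount(long fileBytes) {
        return (fileBytes + BLOCK_SIZE - 1) / BLOCK_SIZE;
    }

    public static void main(String[] args) {
        System.out.println(blockCount(1024L * 1024 * 1024)); // prints 8
    }
}
```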
-### Staging
-
-A client request to create a file does not reach the NameNode immediately.
-In fact, initially the HDFS client caches the file data into a temporary local file.
-Application writes are transparently redirected to this temporary local file.
-When the local file accumulates data worth over one HDFS block size, the client contacts the NameNode.
-The NameNode inserts the file name into the file system hierarchy and allocates a data block for it.
-The NameNode responds to the client request with the identity of the DataNode and the destination data block.
-Then the client flushes the block of data from the local temporary file to the specified DataNode.
-When a file is closed, the remaining un-flushed data in the temporary local file is transferred to the DataNode.
-The client then tells the NameNode that the file is closed. At this point,
-the NameNode commits the file creation operation into a persistent store.
-If the NameNode dies before the file is closed, the file is lost.
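Whatever the client library does internally (the very detail this commit corrects), the application-facing write path is unchanged. Below is a minimal sketch against the Hadoop `FileSystem` Java API, assuming an HDFS client configured via core-site.xml/hdfs-site.xml; the path `/tmp/example.txt` is made up.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class StagingSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // picks up core-site.xml / hdfs-site.xml
        try (FileSystem fs = FileSystem.get(conf);
             FSDataOutputStream out = fs.create(new Path("/tmp/example.txt"))) {
            // Application writes land in client-side buffers; the client
            // library decides when the data actually reaches DataNodes.
            out.write("hello, hdfs".getBytes("UTF-8"));
            out.hflush(); // force buffered bytes out to the DataNodes
        } // close() lets the NameNode commit the file creation
    }
}
```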
-
-The above approach has been adopted after careful consideration of target applications that run on HDFS.
-These applications need streaming writes to files.
-If a client writes to a remote file directly without any client side buffering,
-the network speed and the congestion in the network impacts throughput considerably.
-This approach is not without precedent.
-Earlier distributed file systems, e.g. AFS, have used client side caching to improve performance.
-A POSIX requirement has been relaxed to achieve higher performance of data uploads.

### Replication Pipelining

-When a client is writing data to an HDFS file,
-its data is first written to a local file as explained in the previous section.
-Suppose the HDFS file has a replication factor of three.
-When the local file accumulates a full block of user data,
-the client retrieves a list of DataNodes from the NameNode.
+When a client is writing data to an HDFS file with a replication factor of three,
+the NameNode retrieves a list of DataNodes using a replication target choosing algorithm.
This list contains the DataNodes that will host a replica of that block.
-The client then flushes the data block to the first DataNode.
-The first DataNode starts receiving the data in small portions,
+The client then writes to the first DataNode.
+The first DataNode starts receiving the data in portions,
writes each portion to its local repository and transfers that portion to the second DataNode in the list.
The second DataNode, in turn starts receiving each portion of the data block,
writes that portion to its repository and then flushes that portion to the third DataNode.
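The chained hand-off is easy to model. The toy sketch below only illustrates the forwarding pattern described above; `PipelineNode` and its methods are invented names, not DataNode internals.

```java
import java.util.List;

// Toy model of the write pipeline: each node persists a portion,
// then forwards the same portion to the next replica in the chain.
class PipelineNode {
    private final String name;
    PipelineNode(String name) { this.name = name; }

    void receive(byte[] portion, List<PipelineNode> downstream) {
        System.out.println(name + " stored " + portion.length + " bytes"); // local repository
        if (!downstream.isEmpty()) {
            downstream.get(0).receive(portion, downstream.subList(1, downstream.size()));
        }
    }

    public static void main(String[] args) {
        List<PipelineNode> targets = List.of(
                new PipelineNode("dn1"), new PipelineNode("dn2"), new PipelineNode("dn3"));
        // The client only ever talks to the first DataNode in the list.
        targets.get(0).receive(new byte[64 * 1024], targets.subList(1, targets.size()));
    }
}
```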
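As for the replication factor of three: it is a per-file setting (`dfs.replication`, default 3), and under the default rack-aware placement policy the chosen targets span at least two racks. A small sketch of the relevant client calls, again with a hypothetical path:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("dfs.replication", "3"); // 3 is already the default; shown for clarity
        try (FileSystem fs = FileSystem.get(conf)) {
            // The factor can also be changed per file after the fact;
            // the NameNode then schedules re-replication as needed.
            fs.setReplication(new Path("/tmp/example.txt"), (short) 3);
        }
    }
}
```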