diff --git a/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt b/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
index cbf5172feb..dd3323dda3 100644
--- a/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
+++ b/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
@@ -602,6 +602,8 @@ Release 2.5.0 - UNRELEASED
     HDFS-6680. BlockPlacementPolicyDefault does not choose favored nodes
     correctly. (szetszwo)
 
+    HDFS-6712. Document HDFS Multihoming Settings. (Arpit Agarwal)
+
   OPTIMIZATIONS
 
     HDFS-6214. Webhdfs has poor throughput for files >2GB (daryn)
diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsMultihoming.apt.vm b/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsMultihoming.apt.vm
new file mode 100644
index 0000000000..2be45671e2
--- /dev/null
+++ b/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/HdfsMultihoming.apt.vm
@@ -0,0 +1,145 @@
+~~ Licensed under the Apache License, Version 2.0 (the "License");
+~~ you may not use this file except in compliance with the License.
+~~ You may obtain a copy of the License at
+~~
+~~   http://www.apache.org/licenses/LICENSE-2.0
+~~
+~~ Unless required by applicable law or agreed to in writing, software
+~~ distributed under the License is distributed on an "AS IS" BASIS,
+~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+~~ See the License for the specific language governing permissions and
+~~ limitations under the License. See accompanying LICENSE file.
+
+  ---
+  Hadoop Distributed File System-${project.version} - Support for Multi-Homed Networks
+  ---
+  ---
+  ${maven.build.timestamp}
+
+HDFS Support for Multihomed Networks
+
+  This document is targeted to cluster administrators deploying <<<HDFS>>> in
+  multihomed networks. Similar support for <<<YARN>>>/<<<MapReduce>>> is
+  work in progress and will be documented when available.
+
+%{toc|section=1|fromDepth=0}
+
+* Multihoming Background
+
+  In multihomed networks the cluster nodes are connected to more than one
+  network interface. There could be multiple reasons for doing so.
+
+  [[1]] <<Security>>: Security requirements may dictate that intra-cluster
+  traffic be confined to a different network than the network used to
+  transfer data in and out of the cluster.
+
+  [[2]] <<Performance>>: Intra-cluster traffic may use one or more high bandwidth
+  interconnects like Fibre Channel, InfiniBand or 10GbE.
+
+  [[3]] <<Failover/Redundancy>>: The nodes may have multiple network adapters
+  connected to a single network to handle network adapter failure.
+
+
+  Note that NIC Bonding (also known as NIC Teaming or Link
+  Aggregation) is a related but separate topic. The following settings
+  are usually not applicable to a NIC bonding configuration which handles
+  multiplexing and failover transparently while presenting a single 'logical
+  network' to applications.
+
+* Fixing Hadoop Issues In Multihomed Environments
+
+** Ensuring HDFS Daemons Bind All Interfaces
+
+  By default <<<HDFS>>> endpoints are specified as either hostnames or IP addresses.
+  In either case <<<HDFS>>> daemons will bind to a single IP address making
+  the daemons unreachable from other networks.
+
+  The solution is to have separate settings for server endpoints to force binding
+  the wildcard IP address <<<INADDR_ANY>>> i.e. <<<0.0.0.0>>>. Do NOT supply a port
+  number with any of these settings.
+
+----
+<property>
+  <name>dfs.namenode.rpc-bind-host</name>
+  <value>0.0.0.0</value>
+  <description>
+    The actual address the RPC server will bind to. If this optional address is
+    set, it overrides only the hostname portion of dfs.namenode.rpc-address.
+    It can also be specified per name node or name service for HA/Federation.
+    This is useful for making the name node listen on all interfaces by
+    setting it to 0.0.0.0.
+  </description>
+</property>
+
+<property>
+  <name>dfs.namenode.servicerpc-bind-host</name>
+  <value>0.0.0.0</value>
+  <description>
+    The actual address the service RPC server will bind to. If this optional address is
+    set, it overrides only the hostname portion of dfs.namenode.servicerpc-address.
+    It can also be specified per name node or name service for HA/Federation.
+    This is useful for making the name node listen on all interfaces by
+    setting it to 0.0.0.0.
+  </description>
+</property>
+
+<property>
+  <name>dfs.namenode.http-bind-host</name>
+  <value>0.0.0.0</value>
+  <description>
+    The actual address the HTTP server will bind to. If this optional address
+    is set, it overrides only the hostname portion of dfs.namenode.http-address.
+    It can also be specified per name node or name service for HA/Federation.
+    This is useful for making the name node HTTP server listen on all
+    interfaces by setting it to 0.0.0.0.
+  </description>
+</property>
+
+<property>
+  <name>dfs.namenode.https-bind-host</name>
+  <value>0.0.0.0</value>
+  <description>
+    The actual address the HTTPS server will bind to. If this optional address
+    is set, it overrides only the hostname portion of dfs.namenode.https-address.
+    It can also be specified per name node or name service for HA/Federation.
+    This is useful for making the name node HTTPS server listen on all
+    interfaces by setting it to 0.0.0.0.
+  </description>
+</property>
+----
+
+** Clients use Hostnames when connecting to DataNodes
+
+  By default <<<HDFS>>> clients connect to DataNodes using the IP address
+  provided by the NameNode. Depending on the network configuration this
+  IP address may be unreachable by the clients. The fix is letting clients perform
+  their own DNS resolution of the DataNode hostname. The following setting
+  enables this behavior.
+
+----
+<property>
+  <name>dfs.client.use.datanode.hostname</name>
+  <value>true</value>
+  <description>Whether clients should use datanode hostnames when
+    connecting to datanodes.
+  </description>
+</property>
+----
+
+** DataNodes use Hostnames when connecting to other DataNodes
+
+  Rarely, the NameNode-resolved IP address for a DataNode may be unreachable
+  from other DataNodes. The fix is to force DataNodes to perform their own
+  DNS resolution for inter-DataNode connections. The following setting enables
+  this behavior.
+
+----
+<property>
+  <name>dfs.datanode.use.datanode.hostname</name>
+  <value>true</value>
+  <description>Whether datanodes should use datanode hostnames when
+    connecting to other datanodes for data transfer.
+  </description>
+</property>
+----
+
diff --git a/hadoop-project/src/site/site.xml b/hadoop-project/src/site/site.xml
index 4bfbf6359b..ec9329216d 100644
--- a/hadoop-project/src/site/site.xml
+++ b/hadoop-project/src/site/site.xml
@@ -89,6 +89,7 @@
+