diff --git a/hadoop-mapreduce-project/CHANGES.txt b/hadoop-mapreduce-project/CHANGES.txt
index c302e8f63e..37ccdda2c0 100644
--- a/hadoop-mapreduce-project/CHANGES.txt
+++ b/hadoop-mapreduce-project/CHANGES.txt
@@ -230,6 +230,9 @@ Release 2.0.3-alpha - 2013-02-06
     MAPREDUCE-4971. Minor extensibility enhancements to Counters &
     FileOutputFormat. (Arun C Murthy via sseth)
 
+    MAPREDUCE-4977. Documentation for pluggable shuffle and pluggable sort.
+    (tucu)
+
   OPTIMIZATIONS
 
     MAPREDUCE-4893. Fixed MR ApplicationMaster to do optimal assignment of
diff --git a/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/apt/PluggableShuffleAndPluggableSort.apt.vm b/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/apt/PluggableShuffleAndPluggableSort.apt.vm
new file mode 100644
index 0000000000..8dd2f2ecef
--- /dev/null
+++ b/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/apt/PluggableShuffleAndPluggableSort.apt.vm
@@ -0,0 +1,96 @@
+~~ Licensed under the Apache License, Version 2.0 (the "License");
+~~ you may not use this file except in compliance with the License.
+~~ You may obtain a copy of the License at
+~~
+~~   http://www.apache.org/licenses/LICENSE-2.0
+~~
+~~ Unless required by applicable law or agreed to in writing, software
+~~ distributed under the License is distributed on an "AS IS" BASIS,
+~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+~~ See the License for the specific language governing permissions and
+~~ limitations under the License. See accompanying LICENSE file.
+
+  ---
+  Hadoop Map Reduce Next Generation-${project.version} - Pluggable Shuffle and Pluggable Sort
+  ---
+  ---
+  ${maven.build.timestamp}
+
+Hadoop MapReduce Next Generation - Pluggable Shuffle and Pluggable Sort
+
+  \[ {{{./index.html}Go Back}} \]
+
+* Introduction
+
+  The pluggable shuffle and pluggable sort capabilities allow replacing the
+  built-in shuffle and sort logic with alternate implementations. Example use
+  cases are: using an application protocol other than HTTP, such as RDMA,
+  for shuffling data from the Map nodes to the Reducer nodes; or
+  replacing the sort logic with custom algorithms that enable hash aggregation
+  and Limit-N queries.
+
+  <> The pluggable shuffle and pluggable sort capabilities are
+  experimental and unstable. This means the provided APIs may change and break
+  compatibility in future versions of Hadoop.
+
+* Implementing a Custom Shuffle and a Custom Sort
+
+  A custom shuffle implementation requires a
+  <<>>
+  implementation class running in the NodeManagers and a
+  <<>> implementation class
+  running in the Reducer tasks.
+
+  The default implementations provided by Hadoop can be used as references:
+
+    * <<>>
+
+    * <<>>
+
+  A custom sort implementation requires a <<>>
+  implementation class running in the Mapper tasks and (optionally, depending
+  on the sort implementation) a <<>>
+  implementation class running in the Reducer tasks.
+
+  The default implementations provided by Hadoop can be used as references:
+
+    * <<>>
+
+    * <<>>
+
+* Configuration
+
+  Except for the auxiliary service running in the NodeManagers serving the
+  shuffle (by default the <<>>), all the pluggable components
+  run in the job tasks. This means they can be configured on a per-job basis.
+  The auxiliary service serving the shuffle must be configured in the
+  NodeManager configuration.
+
+** Job Configuration Properties (on a per-job basis):
+
+*--------------------------------------+---------------------+-----------------+
+| <> | <> | <> |
+*--------------------------------------+---------------------+-----------------+
+| <<>> | <<>> | The <<>> implementation to use |
+*--------------------------------------+---------------------+-----------------+
+| <<>> | <<>> | The <<>> implementation to use |
+*--------------------------------------+---------------------+-----------------+
+
+  These properties can also be set in the <<>> to change the default values for all jobs.
+
+** NodeManager Configuration properties, <<>> in all nodes:
+
+*--------------------------------------+---------------------+-----------------+
+| <> | <> | <> |
+*--------------------------------------+---------------------+-----------------+
+| <<>> | <<<...,mapreduce.shuffle>>> | The auxiliary service name |
+*--------------------------------------+---------------------+-----------------+
+| <<>> | <<>> | The auxiliary service class to use |
+*--------------------------------------+---------------------+-----------------+
+
+  <> If setting an auxiliary service in addition to the default
+  <<>> service, then a new service key should be added to the
+  <<>> property, for example <<>>.
+  Then the property defining the corresponding class must be
+  <<>>.
+
\ No newline at end of file
diff --git a/hadoop-project/src/site/site.xml b/hadoop-project/src/site/site.xml
index 2cfe2e8c45..a199ce3502 100644
--- a/hadoop-project/src/site/site.xml
+++ b/hadoop-project/src/site/site.xml
@@ -65,6 +65,7 @@
+