From 7f96128e43f5fe9c71e649926b415e7a890587ad Mon Sep 17 00:00:00 2001 From: Arun Murthy Date: Sun, 16 Jun 2013 19:09:48 +0000 Subject: [PATCH] MAPREDUCE-5184. Document compatibility for MapReduce applications in hadoop-2 vis-a-vis hadoop-1. Contributed by Zhijie Shen. git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/trunk@1493570 13f79535-47bb-0310-9956-ffa450edef68 --- hadoop-mapreduce-project/CHANGES.txt | 3 + ...educe_Compatibility_Hadoop1_Hadoop2.apt.vm | 107 ++++++++++++++++++ hadoop-project/src/site/site.xml | 1 + .../src/site/apt/index.apt.vm | 3 +- 4 files changed, 113 insertions(+), 1 deletion(-) create mode 100644 hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/apt/MapReduce_Compatibility_Hadoop1_Hadoop2.apt.vm diff --git a/hadoop-mapreduce-project/CHANGES.txt b/hadoop-mapreduce-project/CHANGES.txt index 8b6581ccd0..5de6fa51c1 100644 --- a/hadoop-mapreduce-project/CHANGES.txt +++ b/hadoop-mapreduce-project/CHANGES.txt @@ -288,6 +288,9 @@ Release 2.1.0-beta - UNRELEASED MAPREDUCE-5192. Allow for alternate resolutions of TaskCompletionEvents. (cdouglas via acmurthy) + MAPREDUCE-5184. Document compatibility for MapReduce applications in + hadoop-2 vis-a-vis hadoop-1. (Zhijie Shen via acmurthy) + OPTIMIZATIONS MAPREDUCE-4974. 
Optimising the LineRecordReader initialize() method diff --git a/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/apt/MapReduce_Compatibility_Hadoop1_Hadoop2.apt.vm b/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/apt/MapReduce_Compatibility_Hadoop1_Hadoop2.apt.vm new file mode 100644 index 0000000000..aaa5f176e3 --- /dev/null +++ b/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/apt/MapReduce_Compatibility_Hadoop1_Hadoop2.apt.vm @@ -0,0 +1,107 @@ +~~ Licensed under the Apache License, Version 2.0 (the "License"); +~~ you may not use this file except in compliance with the License. +~~ You may obtain a copy of the License at +~~ +~~ http://www.apache.org/licenses/LICENSE-2.0 +~~ +~~ Unless required by applicable law or agreed to in writing, software +~~ distributed under the License is distributed on an "AS IS" BASIS, +~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +~~ See the License for the specific language governing permissions and +~~ limitations under the License. See accompanying LICENSE file. + + --- + Hadoop Map Reduce Next Generation-${project.version} - Backward Compatibility + --- + --- + ${maven.build.timestamp} + +Apache Hadoop MapReduce - Migrating from Apache Hadoop 1.x to Apache Hadoop 2.x + + \[ {{{../../hadoop-yarn/hadoop-yarn-site/index.html}Go Back}} \] + +* {Introduction} + + This document provides information for users to migrate their Apache Hadoop + MapReduce applications from Apache Hadoop 1.x to Apache Hadoop 2.x. + + In Apache Hadoop 2.x we have spun off resource management capabilities + into Apache Hadoop YARN, a general purpose, distributed application management + framework while Apache Hadoop MapReduce (aka MRv2) remains as a pure + distributed computation framework. + + In general, the previous MapReduce runtime (aka MRv1) has been reused and + no major surgery has been conducted on it. 
Therefore, MRv2 is able to ensure + satisfactory compatibility with MRv1 applications. However, due to some + improvements and code refactorings, a few APIs have been rendered + backward-incompatible. + + The remainder of this page will discuss the scope and the level of backward + compatibility that we support in Apache Hadoop MapReduce 2.x (MRv2). + +* {Binary Compatibility} + + First, we ensure binary compatibility for the applications that use old + <> APIs. This means that applications which were built against MRv1 + <> APIs can run directly on YARN without recompilation, merely by + pointing them to an Apache Hadoop 2.x cluster via configuration. + +* {Source Compatibility} + + We cannot ensure complete binary compatibility with the applications that use + <> APIs, as these APIs have evolved a lot since MRv1. However, we + ensure source compatibility for <> APIs that break binary + compatibility. In other words, users should recompile their applications that + use <> APIs against MRv2 jars. One notable binary incompatibility + break is Counter and CounterGroup. + +* {Not Supported} + + MRAdmin has been removed in MRv2 because <<>> commands + no longer exist. They have been replaced by the commands in <<>>. We + neither support binary compatibility nor source compatibility for the + applications that use this class directly. + +* {Tradeoffs between MRv1 Users and Early MRv2 Adopters} + + Unfortunately, maintaining binary compatibility for MRv1 applications may lead + to binary incompatibility issues for early MRv2 adopters, in particular Hadoop + 0.23 users. For <> APIs, we have chosen to be compatible with MRv1 + applications, which have a larger user base. For <> APIs, if they + don't significantly break Hadoop 0.23 applications, we still change them to be + compatible with MRv1 applications. Below is the list of MapReduce APIs which + are incompatible with Hadoop 0.23.
+ +*-----------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------+ +| <> | <> | +*-----------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------+ +| <<>> | Return type changes from <<>> to <<>> | +*-----------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------+ +| <<>> | Return type changes from <<>> to <<>> | +*-----------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------+ +| <<>> | Return type changes from <<>> to <<>> | +*-----------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------+ +| <<>> | Data type changes from <<>> to <<>> | +*-----------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------+ +| <<>> | Return type changes from <<>> to <<>> | +*-----------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------+ +| <<>> | Return type changes from <<>> to <<>> | +*-----------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------+ +| <<>> | Return type changes from <<>> to <<>> | 
+*-----------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------+ +| <<>> | Return type changes from <<>> to <<>> | +*-----------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------+ +| <<>> | Return type changes from <<>> to <<>> | +*-----------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------+ + +* {Malicious} + + For the users who are going to try <<>> on YARN, + please note that <<>> will still use + <<>>, which is installed together with + other MRv2 jars. By default Hadoop framework jars appear before the users' + jars in the classpath, such that the classes from the 2.x.x jar will still be + picked. Users should either remove <<>> + from the classpath or set <<>> and + <<>> to run their target + examples jar. 
diff --git a/hadoop-project/src/site/site.xml b/hadoop-project/src/site/site.xml index 63ba5f4749..ea20a4a4af 100644 --- a/hadoop-project/src/site/site.xml +++ b/hadoop-project/src/site/site.xml @@ -79,6 +79,7 @@ + diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/index.apt.vm b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/index.apt.vm index 4fe7cb9245..1988c0bd7a 100644 --- a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/index.apt.vm +++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/index.apt.vm @@ -53,5 +53,6 @@ MapReduce NextGen aka YARN aka MRv2 * {{{../../hadoop-project-dist/hadoop-common/CLIMiniCluster.html}CLI MiniCluster}} - * {{{../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/EncryptedShuffle.html}Encrypted Shuffle}} + * {{{../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduce_Compatibility_Hadoop1_Hadoop2.html}Backward Compatibility between Apache Hadoop 1.x and 2.x for MapReduce}} + * {{{../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/EncryptedShuffle.html}Encrypted Shuffle}}