~~ Licensed under the Apache License, Version 2.0 (the "License");
~~ you may not use this file except in compliance with the License.
~~ You may obtain a copy of the License at
~~
~~ http://www.apache.org/licenses/LICENSE-2.0
~~
~~ Unless required by applicable law or agreed to in writing, software
~~ distributed under the License is distributed on an "AS IS" BASIS,
~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~~ See the License for the specific language governing permissions and
~~ limitations under the License. See accompanying LICENSE file.

---
YARN
---
---
${maven.build.timestamp}

Apache Hadoop NextGen MapReduce (YARN)

MapReduce has undergone a complete overhaul in hadoop-0.23, and we now have
what we call MapReduce 2.0 (MRv2), or YARN.

The fundamental idea of MRv2 is to split the two major functions of the
JobTracker, resource management and job scheduling/monitoring, into separate
daemons: a global ResourceManager (<RM>) and a per-application
ApplicationMaster (<AM>). An application is either a single job in the
classical sense of Map-Reduce jobs or a DAG of jobs.

The ResourceManager and the per-node slave, the NodeManager (<NM>), form the
data-computation framework. The ResourceManager is the ultimate authority that
arbitrates resources among all the applications in the system.

The per-application ApplicationMaster is, in effect, a framework-specific
library. It is tasked with negotiating resources from the ResourceManager and
working with the NodeManager(s) to execute and monitor the tasks.

[./yarn_architecture.gif] MapReduce NextGen Architecture

The ResourceManager has two main components: the Scheduler and the
ApplicationsManager.

The Scheduler is responsible for allocating resources to the various running
applications, subject to familiar constraints of capacities, queues, etc. The
Scheduler is a pure scheduler in the sense that it performs no monitoring or
tracking of status for the application. It also offers no guarantees about
restarting tasks that fail, whether because of application failure or hardware
failure. The Scheduler performs its scheduling function based on the resource
requirements of the applications; it does so using the abstract notion of a
resource <Container>, which incorporates elements such as memory, cpu, disk,
network, etc. In the first version, only <<<memory>>> is supported.

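The idea of a pure, memory-only scheduler can be sketched in a few lines of
Java. This is a toy model under stated assumptions, not the YARN API: the class
MemoryOnlySchedulerSketch, its nested Container type, and the megabyte figures
are all hypothetical names and values chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these classes are hypothetical and are not the YARN API.
final class MemoryOnlySchedulerSketch {

    /** A Container as an abstract resource grant; only memory is modelled. */
    static final class Container {
        final String applicationId;
        final int memoryMb;

        Container(String applicationId, int memoryMb) {
            this.applicationId = applicationId;
            this.memoryMb = memoryMb;
        }
    }

    private int freeMemoryMb;
    private final List<Container> granted = new ArrayList<Container>();

    MemoryOnlySchedulerSketch(int clusterMemoryMb) {
        this.freeMemoryMb = clusterMemoryMb;
    }

    /**
     * Grants a container when enough memory is free, otherwise returns null.
     * Pure scheduling: nothing here monitors the task or restarts it on failure.
     */
    Container allocate(String applicationId, int memoryMb) {
        if (memoryMb <= 0 || memoryMb > freeMemoryMb) {
            return null;
        }
        freeMemoryMb -= memoryMb;
        Container container = new Container(applicationId, memoryMb);
        granted.add(container);
        return container;
    }

    /** Returns a granted container's memory to the free pool. */
    void release(Container container) {
        if (granted.remove(container)) {
            freeMemoryMb += container.memoryMb;
        }
    }

    int freeMemoryMb() {
        return freeMemoryMb;
    }

    public static void main(String[] args) {
        MemoryOnlySchedulerSketch scheduler = new MemoryOnlySchedulerSketch(4096);
        Container first = scheduler.allocate("app-1", 3072);
        Container second = scheduler.allocate("app-2", 2048); // does not fit
        System.out.println("first granted: " + (first != null));
        System.out.println("second granted: " + (second != null));
        System.out.println("free MB: " + scheduler.freeMemoryMb());
    }
}
```

The sketch deliberately has no notion of task status: whether a granted
container succeeds or fails is left to whoever asked for it.
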
The Scheduler has a pluggable policy plug-in, which is responsible for
partitioning the cluster resources among the various queues, applications, etc.
The current Map-Reduce schedulers, such as the CapacityScheduler and the
FairScheduler, are examples of the plug-in.

The CapacityScheduler supports <<<hierarchical queues>>> to allow for more
predictable sharing of cluster resources.

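One way to picture why a queue hierarchy makes sharing predictable is to treat
each queue's capacity as a fraction of its parent's, so a leaf queue's share of
the cluster is the product of the fractions along its path. The Java sketch
below assumes that reading; the QueueNode class, the queue names, and the
percentages are hypothetical and are not taken from the CapacityScheduler
itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical queue tree; names and percentages are made up for illustration.
final class QueueHierarchySketch {

    static final class QueueNode {
        final String name;
        final double shareOfParent; // fraction of the parent's capacity, 0.0 to 1.0
        final QueueNode parent;
        final List<QueueNode> children = new ArrayList<QueueNode>();

        QueueNode(String name, double shareOfParent, QueueNode parent) {
            this.name = name;
            this.shareOfParent = shareOfParent;
            this.parent = parent;
            if (parent != null) {
                parent.children.add(this);
            }
        }

        /** Fraction of the whole cluster: product of the shares up to the root. */
        double absoluteShare() {
            return parent == null
                ? shareOfParent
                : parent.absoluteShare() * shareOfParent;
        }

        /** Dotted path from the root down to this queue. */
        String path() {
            return parent == null ? name : parent.path() + "." + name;
        }
    }

    public static void main(String[] args) {
        QueueNode root = new QueueNode("root", 1.0, null);
        QueueNode prod = new QueueNode("prod", 0.6, root);
        QueueNode dev = new QueueNode("dev", 0.4, root);
        QueueNode etl = new QueueNode("etl", 0.5, prod);
        for (QueueNode q : new QueueNode[] {root, prod, dev, etl}) {
            System.out.printf("%s -> %.0f%% of cluster%n",
                q.path(), q.absoluteShare() * 100);
        }
    }
}
```

In this sketch a child queue can never be configured with more than its parent
holds, which is the sense in which the hierarchy bounds each group's share.
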
The ApplicationsManager is responsible for accepting job submissions,
negotiating the first container for executing the application-specific
ApplicationMaster, and providing the service for restarting the
ApplicationMaster container on failure.

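The restart service described above can be illustrated with a small retry loop.
This is only a sketch: the AmLauncher interface, the runWithRestarts method,
and the maxAttempts limit are assumptions introduced for the example (the text
does not say how many restarts are attempted), and none of them are YARN
classes.

```java
// Hypothetical sketch; the interface, method, and retry limit are assumptions.
final class ApplicationMasterRestartSketch {

    /** Stand-in for launching the ApplicationMaster in its first container. */
    interface AmLauncher {
        /** Returns true if the ApplicationMaster completed, false if its container failed. */
        boolean launchAndWait(String applicationId, int attempt);
    }

    /**
     * Launches the ApplicationMaster and relaunches it after a failure, up to
     * maxAttempts (an assumed limit). Returns the attempt number that
     * succeeded, or -1 if every attempt failed.
     */
    static int runWithRestarts(String applicationId, AmLauncher launcher, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (launcher.launchAndWait(applicationId, attempt)) {
                return attempt;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Simulated launcher: the first attempt fails, the second succeeds.
        AmLauncher flaky = new AmLauncher() {
            @Override
            public boolean launchAndWait(String applicationId, int attempt) {
                return attempt >= 2;
            }
        };
        System.out.println("succeeded on attempt " + runWithRestarts("app-1", flaky, 3));
    }
}
```
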
The NodeManager is the per-machine framework agent that is responsible for
containers, monitoring their resource usage (cpu, memory, disk, network) and
reporting it to the ResourceManager/Scheduler.

The per-application ApplicationMaster has the responsibility of negotiating
appropriate resource containers from the Scheduler, tracking their status, and
monitoring progress.

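The three duties listed above (negotiating containers, tracking status,
monitoring progress) can be sketched as a small state tracker. Everything in
this example is hypothetical: ApplicationMasterSketch, the ContainerSource
interface standing in for the Scheduler, the TaskState values, and the task
names are illustrative and do not correspond to the YARN API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; none of these names come from the YARN API.
final class ApplicationMasterSketch {

    enum TaskState { WAITING, RUNNING, DONE }

    /** Stand-in for the Scheduler: says whether a container of this size is granted. */
    interface ContainerSource {
        boolean request(int memoryMb);
    }

    private final Map<String, TaskState> tasks = new LinkedHashMap<String, TaskState>();

    void addTask(String taskId) {
        tasks.put(taskId, TaskState.WAITING);
    }

    /** Requests one container per waiting task; returns how many tasks started. */
    int negotiate(ContainerSource source, int memoryMbPerTask) {
        int started = 0;
        for (Map.Entry<String, TaskState> entry : tasks.entrySet()) {
            if (entry.getValue() == TaskState.WAITING && source.request(memoryMbPerTask)) {
                entry.setValue(TaskState.RUNNING);
                started++;
            }
        }
        return started;
    }

    /** Records that a running task finished; other states are left unchanged. */
    void markDone(String taskId) {
        if (tasks.get(taskId) == TaskState.RUNNING) {
            tasks.put(taskId, TaskState.DONE);
        }
    }

    /** Fraction of tasks completed, from 0.0 to 1.0. */
    double progress() {
        if (tasks.isEmpty()) {
            return 0.0;
        }
        int done = 0;
        for (TaskState state : tasks.values()) {
            if (state == TaskState.DONE) {
                done++;
            }
        }
        return (double) done / tasks.size();
    }

    public static void main(String[] args) {
        ApplicationMasterSketch am = new ApplicationMasterSketch();
        am.addTask("map-0");
        am.addTask("map-1");
        am.addTask("reduce-0");
        // Simulated Scheduler that can grant only two containers.
        ContainerSource twoContainers = new ContainerSource() {
            private int remaining = 2;
            @Override
            public boolean request(int memoryMb) {
                if (remaining == 0) {
                    return false;
                }
                remaining--;
                return true;
            }
        };
        System.out.println("started: " + am.negotiate(twoContainers, 1024));
        am.markDone("map-0");
        System.out.println("progress: " + am.progress());
    }
}
```
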
MRv2 maintains <<API compatibility>> with the previous stable release
(hadoop-0.20.205). This means that all Map-Reduce jobs should still run
unchanged on top of MRv2 with just a recompile.