<!---
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->

Apache Hadoop ${project.version}
================================

Apache Hadoop ${project.version} incorporates a number of significant
enhancements over the previous major release line (hadoop-2.x).

This is an alpha release to facilitate testing and the collection of
feedback from downstream application developers and users. There are
no guarantees regarding API stability or quality.

Overview
========

Users are encouraged to read the full set of release notes.
This page provides an overview of the major changes.

Minimum required Java version increased from Java 7 to Java 8
------------------

All Hadoop JARs are now compiled targeting a runtime version of Java 8.
Users still using Java 7 or below must upgrade to Java 8.

Support for erasure coding in HDFS
------------------

Erasure coding is a method for durably storing data with significant space
savings compared to replication. Standard encodings like Reed-Solomon (10,4)
have a 1.4x space overhead, compared to the 3x overhead of standard HDFS
replication.
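
The overhead figures above follow directly from the encoding parameters. A
quick back-of-the-envelope check (an illustration only, not Hadoop code):

```python
def storage_overhead(data_units, parity_units):
    """Raw bytes stored per byte of user data."""
    return (data_units + parity_units) / data_units

# Reed-Solomon (10,4): 14 blocks are stored for every 10 data blocks
print(storage_overhead(10, 4))  # 1.4
# 3x replication: the original block plus 2 extra copies
print(storage_overhead(1, 2))   # 3.0
```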
Since erasure coding imposes additional overhead during reconstruction
and performs mostly remote reads, it has traditionally been used for
storing colder, less frequently accessed data. Users should consider
the network and CPU overheads of erasure coding when deploying this
feature.

More details are available in the
[HDFS Erasure Coding](./hadoop-project-dist/hadoop-hdfs/HDFSErasureCoding.html)
documentation.

YARN Timeline Service v.2
-------------------

We are introducing an early preview (alpha 2) of a major revision of YARN
Timeline Service: v.2. YARN Timeline Service v.2 addresses two major
challenges: improving scalability and reliability of Timeline Service, and
enhancing usability by introducing flows and aggregation.

YARN Timeline Service v.2 alpha 2 is provided so that users and developers
can test it and provide feedback and suggestions for making it a ready
replacement for Timeline Service v.1.x. It should be used only in a test
capacity.

More details are available in the
[YARN Timeline Service v.2](./hadoop-yarn/hadoop-yarn-site/TimelineServiceV2.html)
documentation.

Shell script rewrite
-------------------

The Hadoop shell scripts have been rewritten to fix many long-standing
bugs and include some new features. While an eye has been kept towards
compatibility, some changes may break existing installations.

Incompatible changes are documented in the release notes, with related
discussion on [HADOOP-9902](https://issues.apache.org/jira/browse/HADOOP-9902).

More details are available in the
[Unix Shell Guide](./hadoop-project-dist/hadoop-common/UnixShellGuide.html)
documentation. Power users will also be pleased by the
[Unix Shell API](./hadoop-project-dist/hadoop-common/UnixShellAPI.html)
documentation, which describes much of the new functionality, particularly
related to extensibility.

Shaded client jars
------------------

The `hadoop-client` Maven artifact available in 2.x releases pulls
Hadoop's transitive dependencies onto a Hadoop application's classpath.
This can be problematic if the versions of these transitive dependencies
conflict with the versions used by the application.

[HADOOP-11804](https://issues.apache.org/jira/browse/HADOOP-11804) adds
new `hadoop-client-api` and `hadoop-client-runtime` artifacts that
shade Hadoop's dependencies into a single jar. This avoids leaking
Hadoop's dependencies onto the application's classpath.

Support for Opportunistic Containers and Distributed Scheduling.
--------------------

A notion of `ExecutionType` has been introduced, whereby applications can
now request containers with an execution type of `Opportunistic`.
Containers of this type can be dispatched for execution at an NM even if
there are no resources available at the moment of scheduling. In such a
case, these containers will be queued at the NM, waiting for resources to
become available for them to start. Opportunistic containers have lower
priority than the default `Guaranteed` containers and are therefore
preempted, if needed, to make room for Guaranteed containers. This should
improve cluster utilization.
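
The queueing and preemption behavior can be sketched as a toy model. This is
an illustration of the idea only; the `NodeManager` class and `dispatch`
method below are invented for this sketch and do not mirror YARN's actual
classes or APIs:

```python
from collections import deque
from dataclasses import dataclass

GUARANTEED, OPPORTUNISTIC = "GUARANTEED", "OPPORTUNISTIC"

@dataclass
class Container:
    id: str
    execution_type: str

class NodeManager:
    """Toy model of NM-side container queuing (illustration only)."""
    def __init__(self, slots):
        self.slots = slots      # simplified resource model: fixed slot count
        self.running = []
        self.queue = deque()

    def dispatch(self, c):
        if len(self.running) < self.slots:
            self.running.append(c)
        elif c.execution_type == GUARANTEED:
            # Preempt a running opportunistic container to make room, if any.
            victim = next((r for r in self.running
                           if r.execution_type == OPPORTUNISTIC), None)
            if victim is not None:
                self.running.remove(victim)
                self.queue.appendleft(victim)
                self.running.append(c)
        else:
            # Opportunistic: queue at the NM until resources free up.
            self.queue.append(c)

nm = NodeManager(slots=1)
nm.dispatch(Container("opp-1", OPPORTUNISTIC))  # starts immediately
nm.dispatch(Container("g-1", GUARANTEED))       # preempts opp-1
```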
Opportunistic containers are by default allocated by the central RM, but
support has also been added to allow opportunistic containers to be
allocated by a distributed scheduler which is implemented as an
AMRMProtocol interceptor.

Please see [documentation](./hadoop-yarn/hadoop-yarn-site/OpportunisticContainers.html)
for more details.

MapReduce task-level native optimization
--------------------

MapReduce has added support for a native implementation of the map output
collector. For shuffle-intensive jobs, this can lead to a performance
improvement of 30% or more.

See the release notes for
[MAPREDUCE-2841](https://issues.apache.org/jira/browse/MAPREDUCE-2841)
for more detail.

Support for more than 2 NameNodes.
--------------------

The initial implementation of HDFS NameNode high-availability provided
for a single active NameNode and a single Standby NameNode. By replicating
edits to a quorum of three JournalNodes, this architecture is able to
tolerate the failure of any one node in the system.

However, some deployments require higher degrees of fault-tolerance.
This is enabled by this new feature, which allows users to run multiple
standby NameNodes. For instance, by configuring three NameNodes and
five JournalNodes, the cluster is able to tolerate the failure of two
nodes rather than just one.
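
The fault-tolerance figures are simple majority-quorum arithmetic. A sketch
(illustration only, not Hadoop code):

```python
def tolerated_journalnode_failures(journal_nodes):
    # Edits are committed once a strict majority of JournalNodes
    # acknowledge them, so the quorum survives while a majority is alive.
    return (journal_nodes - 1) // 2

def tolerated_namenode_failures(namenodes):
    # The cluster stays available as long as one NameNode can be active.
    return namenodes - 1

print(tolerated_journalnode_failures(3))  # 1
print(tolerated_journalnode_failures(5))  # 2
print(tolerated_namenode_failures(3))     # 2
```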
The [HDFS high-availability documentation](./hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html)
has been updated with instructions on how to configure more than two
NameNodes.

Default ports of multiple services have been changed.
------------------------

Previously, the default ports of multiple Hadoop services were in the
Linux ephemeral port range (32768-61000). This meant that at startup,
services would sometimes fail to bind to the port due to a conflict
with another application.
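
A quick way to see why such defaults were fragile (illustration only): any
fixed port inside the ephemeral range can be grabbed first by an outgoing
connection. For example, the NameNode web UI default moved from 50070 to
9870:

```python
# Typical Linux ephemeral port range, as cited above
EPHEMERAL_PORTS = range(32768, 61001)

def conflicts_with_ephemeral(port):
    return port in EPHEMERAL_PORTS

print(conflicts_with_ephemeral(50070))  # True  (old default, inside the range)
print(conflicts_with_ephemeral(9870))   # False (new default, outside the range)
```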
These conflicting ports have been moved out of the ephemeral range,
affecting the NameNode, Secondary NameNode, DataNode, and KMS. Our
documentation has been updated appropriately, but see the release
notes for [HDFS-9427](https://issues.apache.org/jira/browse/HDFS-9427) and
[HADOOP-12811](https://issues.apache.org/jira/browse/HADOOP-12811)
for a list of port changes.

Support for Microsoft Azure Data Lake and Aliyun Object Storage System filesystem connectors
---------------------

Hadoop now supports integration with Microsoft Azure Data Lake and
Aliyun Object Storage System as alternative Hadoop-compatible filesystems.

Intra-datanode balancer
-------------------

A single DataNode manages multiple disks. During normal write operation,
disks will be filled up evenly. However, adding or replacing disks can
lead to significant skew within a DataNode. This situation is not handled
by the existing HDFS balancer, which concerns itself with inter-, not intra-,
DN skew.

This situation is handled by the new intra-DataNode balancing
functionality, which is invoked via the `hdfs diskbalancer` CLI.
See the disk balancer section in the
[HDFS Commands Guide](./hadoop-project-dist/hadoop-hdfs/HDFSCommands.html)
for more information.

Reworked daemon and task heap management
---------------------

A series of changes have been made to heap management for Hadoop daemons
as well as MapReduce tasks.

[HADOOP-10950](https://issues.apache.org/jira/browse/HADOOP-10950) introduces
new methods for configuring daemon heap sizes.
Notably, auto-tuning is now possible based on the memory size of the host,
and the `HADOOP_HEAPSIZE` variable has been deprecated.
See the full release notes of HADOOP-10950 for more detail.
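
To illustrate the general shape of memory-based auto-tuning, here is a
purely hypothetical sizing rule. This is NOT the heuristic Hadoop's shell
scripts implement; the function, parameters, and constants below are
invented for illustration:

```python
def auto_heap_mb(host_memory_mb, fraction=0.25, floor_mb=1024, cap_mb=8192):
    # Hypothetical rule: take a fraction of host memory, clamped to a
    # sane floor and ceiling. Hadoop's actual logic differs.
    return max(floor_mb, min(cap_mb, int(host_memory_mb * fraction)))

print(auto_heap_mb(4096))   # 1024 (floor applies on a small host)
print(auto_heap_mb(65536))  # 8192 (cap applies on a large host)
```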
[MAPREDUCE-5785](https://issues.apache.org/jira/browse/MAPREDUCE-5785)
simplifies the configuration of map and reduce task
heap sizes, so the desired heap size no longer needs to be specified
in both the task configuration and as a Java option.
Existing configs that already specify both are not affected by this change.
See the full release notes of MAPREDUCE-5785 for more details.

Getting Started
===============

The Hadoop documentation includes the information you need to get started using
Hadoop. Begin with the
[Single Node Setup](./hadoop-project-dist/hadoop-common/SingleCluster.html)
which shows you how to set up a single-node Hadoop installation.
Then move on to the
[Cluster Setup](./hadoop-project-dist/hadoop-common/ClusterSetup.html)
to learn how to set up a multi-node Hadoop installation.