HADOOP-19039. Hadoop 3.4.0 Highlight big features and improvements. (#6462) Contributed by Shilun Fan.
Reviewed-by: He Xiaoqiao <hexiaoqiao@apache.org> Signed-off-by: Shilun Fan <slfan1989@apache.org>
Apache Hadoop ${project.version}
================================

Apache Hadoop ${project.version} is an update to the Hadoop 3.4.x release branch.

Overview of Changes
===================

Users are encouraged to read the full set of release notes.
This page provides an overview of the major changes.

S3A: Upgrade AWS SDK to V2
----------------------------------------

[HADOOP-18073](https://issues.apache.org/jira/browse/HADOOP-18073) S3A: Upgrade AWS SDK to V2

This release upgrades Hadoop's AWS connector, S3A, from AWS SDK for Java V1 to AWS SDK for Java V2.
This is a significant change which offers a number of new features, including the ability to work with Amazon S3 Express One Zone storage - the new high-performance, single-AZ storage class.

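Existing S3A configuration keys carry over to the V2-based connector; as a rough, illustrative sketch (bucket region and values below are placeholders, not settings mandated by this release):

```xml
<!-- Illustrative core-site.xml fragment; values are placeholders. -->
<configuration>
  <property>
    <!-- Region of the target bucket; with SDK V2 it is safest to set the
         region explicitly rather than rely on endpoint parsing. -->
    <name>fs.s3a.endpoint.region</name>
    <value>us-east-1</value>
  </property>
  <property>
    <!-- Standard S3A credential provider chain; unchanged by the V2 migration. -->
    <name>fs.s3a.aws.credentials.provider</name>
    <value>org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider</value>
  </property>
</configuration>
```
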
HDFS DataNode Split one FsDatasetImpl lock to volume grain locks
----------------------------------------

[HDFS-15382](https://issues.apache.org/jira/browse/HDFS-15382) Split one FsDatasetImpl lock to volume grain locks.

Throughput is one of the core performance metrics for a DataNode instance.
However, despite various improvements, it has not always reached the best performance, especially in Federation deployments, because of the global coarse-grain lock.
This series of issues (including [HDFS-16534](https://issues.apache.org/jira/browse/HDFS-16534), [HDFS-16511](https://issues.apache.org/jira/browse/HDFS-16511), [HDFS-15382](https://issues.apache.org/jira/browse/HDFS-15382) and [HDFS-16429](https://issues.apache.org/jira/browse/HDFS-16429))
splits the global coarse-grain lock into fine-grain locks - a two-level lock for block pool and volume -
to improve throughput and avoid lock contention between block pools and volumes.

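The two-level idea can be sketched outside of HDFS as follows. This is an illustrative model only, not the actual `FsDatasetImpl` code; the class and method names are invented for the example:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Illustrative sketch of two-level (block pool, volume) locking:
 * writers on different volumes no longer serialize on one global lock.
 */
class TwoLevelLocks {
    private final Map<String, ReentrantReadWriteLock> poolLocks = new ConcurrentHashMap<>();
    private final Map<String, ReentrantReadWriteLock> volumeLocks = new ConcurrentHashMap<>();

    ReentrantReadWriteLock poolLock(String bpid) {
        return poolLocks.computeIfAbsent(bpid, k -> new ReentrantReadWriteLock());
    }

    ReentrantReadWriteLock volumeLock(String bpid, String volume) {
        // One lock per (block pool, volume) pair.
        return volumeLocks.computeIfAbsent(bpid + "/" + volume, k -> new ReentrantReadWriteLock());
    }

    /** Hold the pool lock shared, the volume lock exclusive, then run the action. */
    void withVolumeWriteLock(String bpid, String volume, Runnable action) {
        ReentrantReadWriteLock pool = poolLock(bpid);
        pool.readLock().lock();          // shared at the block-pool level
        try {
            ReentrantReadWriteLock vol = volumeLock(bpid, volume);
            vol.writeLock().lock();      // exclusive only for this volume
            try {
                action.run();
            } finally {
                vol.writeLock().unlock();
            }
        } finally {
            pool.readLock().unlock();
        }
    }
}
```

With this shape, operations on `vol-1` and `vol-2` of the same block pool proceed concurrently, while a pool-wide operation can still take the pool write lock to exclude them all.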
YARN Federation improvements
----------------------------------------

[YARN-5597](https://issues.apache.org/jira/browse/YARN-5597) YARN Federation improvements.

We have enhanced the YARN Federation functionality for improved usability. The enhanced features are as follows:

1. YARN Router now boasts a full implementation of all interfaces, including ApplicationClientProtocol, ResourceManagerAdministrationProtocol, and RMWebServiceProtocol.
2. YARN Router supports application cleanup and automatic offline mechanisms for subClusters.
3. Code improvements were undertaken for the Router and AMRMProxy, along with enhancements to previously pending functionalities.
4. Audit logs and metrics for the Router received upgrades.
5. A boost in cluster security features was achieved with the inclusion of Kerberos support.
6. The web UI pages of the Router have been enhanced.
7. A set of commands has been added to the Router side for operating on SubClusters and Policies.

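The list above can be read against how a subcluster opts into federation; as a hedged, minimal sketch (the cluster id value is a placeholder, and property names are taken from the YARN Federation documentation rather than from this release note):

```xml
<!-- Illustrative yarn-site.xml fragment on each subcluster ResourceManager. -->
<configuration>
  <property>
    <name>yarn.federation.enabled</name>
    <value>true</value>
  </property>
  <property>
    <!-- Unique id of this subcluster within the federation. -->
    <name>yarn.resourcemanager.cluster-id</name>
    <value>subcluster-1</value>
  </property>
</configuration>
```
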
HDFS RBF: Code Enhancements, New Features, and Bug Fixes
----------------------------------------

The HDFS RBF functionality has undergone significant enhancements, encompassing over 200 commits for feature
improvements, new functionalities, and bug fixes.
Important features and improvements are as follows:

**Feature**

[HDFS-15294](https://issues.apache.org/jira/browse/HDFS-15294) HDFS Federation balance tool introduces one tool to balance data across different namespaces.

**Improvement**

[HDFS-17128](https://issues.apache.org/jira/browse/HDFS-17128) RBF: SQLDelegationTokenSecretManager should use version of tokens updated by other routers.

The SQLDelegationTokenSecretManager enhances performance by maintaining processed tokens in memory. However, there is
a potential issue of router cache inconsistency due to token loading and renewal. This issue has been addressed by the
resolution of HDFS-17128.

[HDFS-17148](https://issues.apache.org/jira/browse/HDFS-17148) RBF: SQLDelegationTokenSecretManager must cleanup expired tokens in SQL.

SQLDelegationTokenSecretManager, while fetching and temporarily storing tokens from SQL in a memory cache with a short TTL,
faces an issue where expired tokens are not efficiently cleaned up, leading to a buildup of expired tokens in the SQL database.
This issue has been addressed by the resolution of HDFS-17148.

**Others**

Other changes to HDFS RBF include WebUI, command line, and other improvements. Please refer to the release document.

HDFS EC: Code Enhancements and Bug Fixes
----------------------------------------

HDFS EC has made code improvements and fixed some bugs.

Important improvements and bugs are as follows:

**Improvement**

[HDFS-16613](https://issues.apache.org/jira/browse/HDFS-16613) EC: Improve performance of decommissioning dn with many ec blocks.

In an HDFS cluster with many EC blocks, decommissioning a DataNode is very slow. Unlike replicated blocks, which can be copied
from any DataNode holding the same replica, an EC block has to be replicated from the decommissioning DataNode itself.
The configurations `dfs.namenode.replication.max-streams` and `dfs.namenode.replication.max-streams-hard-limit` limit
the replication speed, but increasing them puts the whole cluster's network at risk. A new
configuration was therefore added to limit the decommissioning DataNode separately from the cluster-wide max-streams limit.

[HDFS-16663](https://issues.apache.org/jira/browse/HDFS-16663) EC: Allow block reconstruction pending timeout refreshable to increase decommission performance.

In [HDFS-16613](https://issues.apache.org/jira/browse/HDFS-16613), increasing the value of `dfs.namenode.replication.max-streams-hard-limit` maximizes the IO
performance of a decommissioning DataNode with many EC blocks. Besides this, we also need to decrease the value of
`dfs.namenode.reconstruction.pending.timeout-sec` (default 5 minutes) to shorten the interval for checking
pendingReconstructions; otherwise the decommissioning node sits idle waiting for copy tasks for most of those 5 minutes.
During decommissioning we may need to reconfigure these two parameters several times. Since [HDFS-14560](https://issues.apache.org/jira/browse/HDFS-14560),
`dfs.namenode.replication.max-streams-hard-limit` can already be reconfigured dynamically without a NameNode restart, and
`dfs.namenode.reconstruction.pending.timeout-sec` can now be reconfigured dynamically as well.

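With both keys reconfigurable, an operator can retune a decommission in flight. A sketch of the workflow, after editing the values in `hdfs-site.xml` (the NameNode host and port below are placeholders):

```shell
# Ask the NameNode to pick up changed reconfigurable properties
hdfs dfsadmin -reconfig namenode nn-host:8020 start

# Poll until the reconfiguration task reports completion
hdfs dfsadmin -reconfig namenode nn-host:8020 status
```
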
**Bug**

[HDFS-16456](https://issues.apache.org/jira/browse/HDFS-16456) EC: Decommission a rack with only one dn will fail when the rack number is equal with replication.

In the scenario below, decommissioning fails with the `TOO_MANY_NODES_ON_RACK` reason:

- An EC policy, such as RS-6-3-1024k, is enabled.
- The number of racks in the cluster is equal to or less than the replication number (9).
- A rack has only one DataNode, and that DataNode is decommissioned.

This issue has been addressed by the resolution of HDFS-16456.

[HDFS-17094](https://issues.apache.org/jira/browse/HDFS-17094) EC: Fix bug in block recovery when there are stale datanodes.

During block recovery, the `RecoveryTaskStriped` in the datanode expects a one-to-one correspondence between
`rBlock.getLocations()` and `rBlock.getBlockIndices()`. However, if there are stale locations during a NameNode heartbeat,
this correspondence may be disrupted. Specifically, although there are no stale locations in `recoveryLocations`, the block indices
array remains complete. This discrepancy causes `BlockRecoveryWorker.RecoveryTaskStriped#recover` to generate an incorrect
internal block ID, leading to a failure in the recovery process, as the corresponding datanode cannot locate the replica.
This issue has been addressed by the resolution of HDFS-17094.

[HDFS-17284](https://issues.apache.org/jira/browse/HDFS-17284) EC: Fix int overflow in calculating numEcReplicatedTasks and numReplicationTasks during block recovery.

Due to an integer overflow in the calculation of numReplicationTasks or numEcReplicatedTasks, the NameNode's configuration
parameter `dfs.namenode.replication.max-streams-hard-limit` failed to take effect. This led to an excessive number of tasks
being sent to the DataNodes, consequently occupying too much of their memory.

This issue has been addressed by the resolution of HDFS-17284.

**Others**

For other improvements and fixes for HDFS EC, please refer to the release document.

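The overflow class of bug behind HDFS-17284 is easy to reproduce in isolation. The sketch below is a hypothetical illustration, not the actual NameNode code: an intermediate `int` multiplication wraps around before the result is clamped against a limit, so the clamp never takes effect.

```java
/**
 * Hypothetical demonstration of an int-overflow bug of the kind fixed in
 * HDFS-17284; HARD_LIMIT stands in for max-streams-hard-limit.
 */
class OverflowDemo {
    static final int HARD_LIMIT = 100;

    // Buggy: liveNodes * perNodeFactor overflows int and may go negative,
    // so Math.min() returns the garbage value instead of the limit.
    static int buggyTaskCount(int liveNodes, int perNodeFactor) {
        return Math.min(liveNodes * perNodeFactor, HARD_LIMIT);
    }

    // Fixed: widen to long before multiplying, clamp, then narrow safely.
    static int fixedTaskCount(int liveNodes, int perNodeFactor) {
        long product = (long) liveNodes * perNodeFactor;
        return (int) Math.min(product, HARD_LIMIT);
    }
}
```

For inputs whose product exceeds `Integer.MAX_VALUE`, the buggy version returns a negative count while the fixed version correctly returns the hard limit.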
Transitive CVE fixes
--------------------

A lot of dependencies have been upgraded to address recent CVEs.
Many of the CVEs were not actually exploitable through the Hadoop APIs,
so much of this work is just due diligence.
However, applications which have all the libraries on their classpath may
be vulnerable, and the upgrades should also reduce the number of false
positives security scanners report.

We have not been able to upgrade every single dependency to the latest
1. Physical cluster: *configure Hadoop security*, usually bonded to the
   enterprise Kerberos/Active Directory systems.
   Good.
2. Cloud: transient or persistent single or multiple user/tenant cluster
   with private VLAN *and security*.
   Good.
   Consider [Apache Knox](https://knox.apache.org/) for managing remote
   access to the cluster.
3. Cloud: transient single user/tenant cluster with private VLAN
   *and no security at all*.
   Requires careful network configuration as this is the sole
   means of securing the cluster.