HADOOP-19039. Hadoop 3.4.0 Highlight big features and improvements. (#6462) Contributed by Shilun Fan.
Reviewed-by: He Xiaoqiao <hexiaoqiao@apache.org> Signed-off-by: Shilun Fan <slfan1989@apache.org>
Apache Hadoop ${project.version}
================================

Apache Hadoop ${project.version} is an update to the Hadoop 3.4.x release branch.

Overview of Changes
===================

Users are encouraged to read the full set of release notes.
This page provides an overview of the major changes.

S3A: Upgrade AWS SDK to V2
----------------------------------------

[HADOOP-18073](https://issues.apache.org/jira/browse/HADOOP-18073) S3A: Upgrade AWS SDK to V2

This release upgrades Hadoop's AWS connector, S3A, from AWS SDK for Java V1 to AWS SDK for Java V2.
This is a significant change which offers a number of new features, including the ability to work with Amazon S3 Express One Zone storage - the new high-performance, single-AZ storage class.

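Existing S3A configuration keys carry over to the V2-based connector; as a rough, illustrative sketch (bucket region and values below are placeholders, not settings mandated by this release):

```xml
<!-- Illustrative core-site.xml fragment; values are placeholders. -->
<configuration>
  <property>
    <!-- Region of the target bucket; with SDK V2 it is safest to set the
         region explicitly rather than rely on endpoint parsing. -->
    <name>fs.s3a.endpoint.region</name>
    <value>us-east-1</value>
  </property>
  <property>
    <!-- Standard S3A credential provider chain; unchanged by the V2 migration. -->
    <name>fs.s3a.aws.credentials.provider</name>
    <value>org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider</value>
  </property>
</configuration>
```
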
HDFS DataNode Split one FsDatasetImpl lock to volume grain locks
----------------------------------------

[HDFS-15382](https://issues.apache.org/jira/browse/HDFS-15382) Split one FsDatasetImpl lock to volume grain locks.

Throughput is one of the core performance metrics for a DataNode instance.
However, despite various improvements, it has not always reached the best performance, especially in Federation deployments, because of the global coarse-grain lock.
This series of issues (including [HDFS-16534](https://issues.apache.org/jira/browse/HDFS-16534), [HDFS-16511](https://issues.apache.org/jira/browse/HDFS-16511), [HDFS-15382](https://issues.apache.org/jira/browse/HDFS-15382) and [HDFS-16429](https://issues.apache.org/jira/browse/HDFS-16429))
splits the global coarse-grain lock into fine-grain locks - a two-level lock for block pool and volume -
to improve throughput and avoid lock contention between block pools and volumes.

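The two-level idea can be sketched outside of HDFS as follows. This is an illustrative model only, not the actual `FsDatasetImpl` code; the class and method names are invented for the example:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Illustrative sketch of two-level (block pool, volume) locking:
 * writers on different volumes no longer serialize on one global lock.
 */
class TwoLevelLocks {
    private final Map<String, ReentrantReadWriteLock> poolLocks = new ConcurrentHashMap<>();
    private final Map<String, ReentrantReadWriteLock> volumeLocks = new ConcurrentHashMap<>();

    ReentrantReadWriteLock poolLock(String bpid) {
        return poolLocks.computeIfAbsent(bpid, k -> new ReentrantReadWriteLock());
    }

    ReentrantReadWriteLock volumeLock(String bpid, String volume) {
        // One lock per (block pool, volume) pair.
        return volumeLocks.computeIfAbsent(bpid + "/" + volume, k -> new ReentrantReadWriteLock());
    }

    /** Hold the pool lock shared, the volume lock exclusive, then run the action. */
    void withVolumeWriteLock(String bpid, String volume, Runnable action) {
        ReentrantReadWriteLock pool = poolLock(bpid);
        pool.readLock().lock();          // shared at the block-pool level
        try {
            ReentrantReadWriteLock vol = volumeLock(bpid, volume);
            vol.writeLock().lock();      // exclusive only for this volume
            try {
                action.run();
            } finally {
                vol.writeLock().unlock();
            }
        } finally {
            pool.readLock().unlock();
        }
    }
}
```

With this shape, operations on `vol-1` and `vol-2` of the same block pool proceed concurrently, while a pool-wide operation can still take the pool write lock to exclude them all.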
YARN Federation improvements
----------------------------------------

[YARN-5597](https://issues.apache.org/jira/browse/YARN-5597) YARN Federation improvements.

We have enhanced the YARN Federation functionality for improved usability. The enhanced features are as follows:

1. YARN Router now boasts a full implementation of all interfaces, including ApplicationClientProtocol, ResourceManagerAdministrationProtocol, and RMWebServiceProtocol.
2. YARN Router supports application cleanup and automatic offline mechanisms for subClusters.
3. Code improvements were undertaken for the Router and AMRMProxy, along with enhancements to previously pending functionalities.
4. Audit logs and metrics for the Router received upgrades.
5. A boost in cluster security features was achieved with the inclusion of Kerberos support.
6. The web UI pages of the Router have been enhanced.
7. A set of commands has been added to the Router side for operating on SubClusters and Policies.

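The list above can be read against how a subcluster opts into federation; as a hedged, minimal sketch (the cluster id value is a placeholder, and property names are taken from the YARN Federation documentation rather than from this release note):

```xml
<!-- Illustrative yarn-site.xml fragment on each subcluster ResourceManager. -->
<configuration>
  <property>
    <name>yarn.federation.enabled</name>
    <value>true</value>
  </property>
  <property>
    <!-- Unique id of this subcluster within the federation. -->
    <name>yarn.resourcemanager.cluster-id</name>
    <value>subcluster-1</value>
  </property>
</configuration>
```
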
HDFS RBF: Code Enhancements, New Features, and Bug Fixes
----------------------------------------

The HDFS RBF functionality has undergone significant enhancements, encompassing over 200 commits for feature
improvements, new functionalities, and bug fixes.
Important features and improvements are as follows:

**Feature**

[HDFS-15294](https://issues.apache.org/jira/browse/HDFS-15294) HDFS Federation balance tool introduces one tool to balance data across different namespaces.

**Improvement**

[HDFS-17128](https://issues.apache.org/jira/browse/HDFS-17128) RBF: SQLDelegationTokenSecretManager should use version of tokens updated by other routers.

The SQLDelegationTokenSecretManager enhances performance by maintaining processed tokens in memory. However, there is
a potential issue of router cache inconsistency due to token loading and renewal. This issue has been addressed by the
resolution of HDFS-17128.

[HDFS-17148](https://issues.apache.org/jira/browse/HDFS-17148) RBF: SQLDelegationTokenSecretManager must cleanup expired tokens in SQL.

SQLDelegationTokenSecretManager, while fetching and temporarily storing tokens from SQL in a memory cache with a short TTL,
faces an issue where expired tokens are not efficiently cleaned up, leading to a buildup of expired tokens in the SQL database.
This issue has been addressed by the resolution of HDFS-17148.

**Others**

Other changes to HDFS RBF include WebUI, command line, and other improvements. Please refer to the release document.

HDFS EC: Code Enhancements and Bug Fixes
----------------------------------------

HDFS EC has made code improvements and fixed some bugs.

Important improvements and bugs are as follows:

**Improvement**

[HDFS-16613](https://issues.apache.org/jira/browse/HDFS-16613) EC: Improve performance of decommissioning dn with many ec blocks.

In an HDFS cluster with many EC blocks, decommissioning a DataNode is very slow. Unlike replicated blocks, which can be copied
from any DataNode holding the same replica, an EC block has to be replicated from the decommissioning DataNode itself.
The configurations `dfs.namenode.replication.max-streams` and `dfs.namenode.replication.max-streams-hard-limit` limit
the replication speed, but increasing them puts the whole cluster's network at risk. A new
configuration was therefore added to limit the decommissioning DataNode separately from the cluster-wide max-streams limit.

[HDFS-16663](https://issues.apache.org/jira/browse/HDFS-16663) EC: Allow block reconstruction pending timeout refreshable to increase decommission performance.

In [HDFS-16613](https://issues.apache.org/jira/browse/HDFS-16613), increasing the value of `dfs.namenode.replication.max-streams-hard-limit` maximizes the IO
performance of a decommissioning DataNode with many EC blocks. Besides this, we also need to decrease the value of
`dfs.namenode.reconstruction.pending.timeout-sec` (default 5 minutes) to shorten the interval for checking
pendingReconstructions; otherwise the decommissioning node sits idle waiting for copy tasks for most of those 5 minutes.
During decommissioning we may need to reconfigure these two parameters several times. Since [HDFS-14560](https://issues.apache.org/jira/browse/HDFS-14560),
`dfs.namenode.replication.max-streams-hard-limit` can already be reconfigured dynamically without a NameNode restart, and
`dfs.namenode.reconstruction.pending.timeout-sec` can now be reconfigured dynamically as well.

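With both keys reconfigurable, an operator can retune a decommission in flight. A sketch of the workflow, after editing the values in `hdfs-site.xml` (the NameNode host and port below are placeholders):

```shell
# Ask the NameNode to pick up changed reconfigurable properties
hdfs dfsadmin -reconfig namenode nn-host:8020 start

# Poll until the reconfiguration task reports completion
hdfs dfsadmin -reconfig namenode nn-host:8020 status
```
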
**Bug**

[HDFS-16456](https://issues.apache.org/jira/browse/HDFS-16456) EC: Decommission a rack with only one dn will fail when the rack number is equal with replication.

In the scenario below, decommissioning fails with the `TOO_MANY_NODES_ON_RACK` reason:

- An EC policy, such as RS-6-3-1024k, is enabled.
- The number of racks in the cluster is equal to or less than the replication number (9).
- A rack has only one DataNode, and that DataNode is decommissioned.

This issue has been addressed by the resolution of HDFS-16456.

[HDFS-17094](https://issues.apache.org/jira/browse/HDFS-17094) EC: Fix bug in block recovery when there are stale datanodes.

During block recovery, the `RecoveryTaskStriped` in the datanode expects a one-to-one correspondence between
`rBlock.getLocations()` and `rBlock.getBlockIndices()`. However, if there are stale locations during a NameNode heartbeat,
this correspondence may be disrupted. Specifically, although there are no stale locations in `recoveryLocations`, the block indices
array remains complete. This discrepancy causes `BlockRecoveryWorker.RecoveryTaskStriped#recover` to generate an incorrect
internal block ID, leading to a failure in the recovery process, as the corresponding datanode cannot locate the replica.
This issue has been addressed by the resolution of HDFS-17094.

[HDFS-17284](https://issues.apache.org/jira/browse/HDFS-17284) EC: Fix int overflow in calculating numEcReplicatedTasks and numReplicationTasks during block recovery.

Due to an integer overflow in the calculation of numReplicationTasks or numEcReplicatedTasks, the NameNode's configuration
parameter `dfs.namenode.replication.max-streams-hard-limit` failed to take effect. This led to an excessive number of tasks
being sent to the DataNodes, consequently occupying too much of their memory.

This issue has been addressed by the resolution of HDFS-17284.

**Others**

For other improvements and fixes for HDFS EC, please refer to the release document.

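The overflow class of bug behind HDFS-17284 is easy to reproduce in isolation. The sketch below is a hypothetical illustration, not the actual NameNode code: an intermediate `int` multiplication wraps around before the result is clamped against a limit, so the clamp never takes effect.

```java
/**
 * Hypothetical demonstration of an int-overflow bug of the kind fixed in
 * HDFS-17284; HARD_LIMIT stands in for max-streams-hard-limit.
 */
class OverflowDemo {
    static final int HARD_LIMIT = 100;

    // Buggy: liveNodes * perNodeFactor overflows int and may go negative,
    // so Math.min() returns the garbage value instead of the limit.
    static int buggyTaskCount(int liveNodes, int perNodeFactor) {
        return Math.min(liveNodes * perNodeFactor, HARD_LIMIT);
    }

    // Fixed: widen to long before multiplying, clamp, then narrow safely.
    static int fixedTaskCount(int liveNodes, int perNodeFactor) {
        long product = (long) liveNodes * perNodeFactor;
        return (int) Math.min(product, HARD_LIMIT);
    }
}
```

For inputs whose product exceeds `Integer.MAX_VALUE`, the buggy version returns a negative count while the fixed version correctly returns the hard limit.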
Transitive CVE fixes
--------------------

A lot of dependencies have been upgraded to address recent CVEs.
Many of the CVEs were not actually exploitable through the Hadoop APIs,
so much of this work is just due diligence.
However, applications which have all the libraries on their classpath may
be vulnerable, and the upgrades should also reduce the number of false
positives security scanners report.

We have not been able to upgrade every single dependency to the latest
1. Physical cluster: *configure Hadoop security*, usually bonded to the
   enterprise Kerberos/Active Directory systems.
   Good.
2. Cloud: transient or persistent single or multiple user/tenant cluster
   with private VLAN *and security*.
   Good.
   Consider [Apache Knox](https://knox.apache.org/) for managing remote
   access to the cluster.
3. Cloud: transient single user/tenant cluster with private VLAN
   *and no security at all*.
   Requires careful network configuration as this is the sole
   means of securing the cluster.