HADOOP-18470. Update index md with section on ABFS prefetching
This commit is contained in:
parent
223046cb64
commit
cda1d45a61
@ -23,11 +23,29 @@ Overview of Changes
|
||||
Users are encouraged to read the full set of release notes.
|
||||
This page provides an overview of the major changes.
|
||||
|
||||
Azure ABFS: Critical Stream Prefetch Fix
|
||||
---------------------------------------------
|
||||
|
||||
The abfs has a critical bug fix
|
||||
[HADOOP-18546](https://issues.apache.org/jira/browse/HADOOP-18546).
|
||||
*ABFS. Disable purging list of in-progress reads in abfs stream close().*
|
||||
|
||||
All users of the abfs connector in hadoop releases 3.3.2+ MUST either upgrade
|
||||
or disable prefetching by setting `fs.azure.readaheadqueue.depth` to `0`
|
||||
|
||||
Consult the parent JIRA [HADOOP-18521](https://issues.apache.org/jira/browse/HADOOP-18521)
|
||||
*ABFS ReadBufferManager buffer sharing across concurrent HTTP requests*
|
||||
for root cause analysis, details on what is affected, and mitigations.
|
||||
|
||||
|
||||
Vectored IO API
|
||||
---------------
|
||||
|
||||
[HADOOP-18103](https://issues.apache.org/jira/browse/HADOOP-18103).
|
||||
*High performance vectored read API in Hadoop*
|
||||
|
||||
The `PositionedReadable` interface has now added an operation for
|
||||
Vectored (also known as Scatter/Gather IO):
|
||||
Vectored IO (also known as Scatter/Gather IO):
|
||||
|
||||
```java
|
||||
void readVectored(List<? extends FileRange> ranges, IntFunction<ByteBuffer> allocate)
|
||||
@ -38,25 +56,25 @@ possibly in parallel, with results potentially coming in out-of-order.
|
||||
|
||||
1. The default implementation uses a series of `readFully()` calls, so delivers
|
||||
equivalent performance.
|
||||
2. The local filesystem uses java native IO calls for higher performance reads than `readFully()`
|
||||
2. The local filesystem uses java native IO calls for higher performance reads than `readFully()`.
|
||||
3. The S3A filesystem issues parallel HTTP GET requests in different threads.
|
||||
|
||||
Benchmarking of (modified) ORC and Parquet clients through `file://` and `s3a://`
|
||||
show tangible improvements in query times.
|
||||
Benchmarking of enhanced Apache ORC and Apache Parquet clients through `file://` and `s3a://`
|
||||
show significant improvements in query performance.
|
||||
|
||||
Further Reading: [FsDataInputStream](./hadoop-project-dist/hadoop-common/filesystem/fsdatainputstream.html).
|
||||
|
||||
Manifest Committer for Azure ABFS and google GCS performance
|
||||
------------------------------------------------------------
|
||||
Mapreduce: Manifest Committer for Azure ABFS and google GCS
|
||||
----------------------------------------------------------
|
||||
|
||||
A new "intermediate manifest committer" uses a manifest file
|
||||
The new _Intermediate Manifest Committer_ uses a manifest file
|
||||
to commit the work of successful task attempts, rather than
|
||||
renaming directories.
|
||||
Job commit is matter of reading all the manifests, creating the
|
||||
destination directories (parallelized) and renaming the files,
|
||||
again in parallel.
|
||||
|
||||
This is fast and correct on Azure Storage and Google GCS,
|
||||
This is both fast and correct on Azure Storage and Google GCS,
|
||||
and should be used there instead of the classic v1/v2 file
|
||||
output committers.
|
||||
|
||||
@ -69,24 +87,6 @@ More details are available in the
|
||||
[manifest committer](./hadoop-mapreduce-client/hadoop-mapreduce-client-core/manifest_committer.html).
|
||||
documentation.
|
||||
|
||||
Transitive CVE fixes
|
||||
--------------------
|
||||
|
||||
A lot of dependencies have been upgraded to address recent CVEs.
|
||||
Many of the CVEs were not actually exploitable through the Hadoop
|
||||
so much of this work is just due diligence.
|
||||
However applications which have all the library is on a class path may
|
||||
be vulnerable, and the ugprades should also reduce the number of false
|
||||
positives security scanners report.
|
||||
|
||||
We have not been able to upgrade every single dependency to the latest
|
||||
version there is. Some of those changes are just going to be incompatible.
|
||||
If you have concerns about the state of a specific library, consult the apache JIRA
|
||||
issue tracker to see what discussions have taken place about the library in question.
|
||||
|
||||
As an open source project, contributions in this area are always welcome,
|
||||
especially in testing the active branches, testing applications downstream of
|
||||
those branches and of whether updated dependencies trigger regressions.
|
||||
|
||||
HDFS: Router Based Federation
|
||||
-----------------------------
|
||||
@ -96,7 +96,6 @@ A lot of effort has been invested into stabilizing/improving the HDFS Router Bas
|
||||
1. HDFS-13522, HDFS-16767 & Related Jiras: Allow Observer Reads in HDFS Router Based Federation.
|
||||
2. HDFS-13248: RBF supports Client Locality
|
||||
|
||||
|
||||
HDFS: Dynamic Datanode Reconfiguration
|
||||
--------------------------------------
|
||||
|
||||
@ -109,6 +108,29 @@ cluster-wide Datanode Restarts.
|
||||
See [DataNode.java](https://github.com/apache/hadoop/blob/branch-3.3.5/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java#L346-L361)
|
||||
for the list of dynamically reconfigurable attributes.
|
||||
|
||||
|
||||
Transitive CVE fixes
|
||||
--------------------
|
||||
|
||||
A lot of dependencies have been upgraded to address recent CVEs.
|
||||
Many of the CVEs were not actually exploitable through the Hadoop
|
||||
so much of this work is just due diligence.
|
||||
However applications which have all the library is on a class path may
|
||||
be vulnerable, and the ugprades should also reduce the number of false
|
||||
positives security scanners report.
|
||||
|
||||
We have not been able to upgrade every single dependency to the latest
|
||||
version there is. Some of those changes are just going to be incompatible.
|
||||
If you have concerns about the state of a specific library, consult the pache JIRA
|
||||
issue tracker to see whether a JIRA has been filed, discussions have taken place about
|
||||
the library in question, and whether or not there is already a fix in the pipeline.
|
||||
*Please don't file new JIRAs about dependency-X.Y.Z having a CVE without
|
||||
searching for any existing issue first*
|
||||
|
||||
As an open source project, contributions in this area are always welcome,
|
||||
especially in testing the active branches, testing applications downstream of
|
||||
those branches and of whether updated dependencies trigger regressions.
|
||||
|
||||
Getting Started
|
||||
===============
|
||||
|
||||
@ -119,3 +141,4 @@ which shows you how to set up a single-node Hadoop installation.
|
||||
Then move on to the
|
||||
[Cluster Setup](./hadoop-project-dist/hadoop-common/ClusterSetup.html)
|
||||
to learn how to set up a multi-node Hadoop installation.
|
||||
|
||||
|
Loading…
Reference in New Issue
Block a user