Security hardening
+ Adds new interceptAndValidateMessageContains() method in LambdaTestUtils to verify a list of strings
can all be found in the toString() value of a raised exception
Contributed by PJ Fanning
ABFS has a client-side throttling mechanism which works on the metrics collected
from past requests
When requests are fail due to server-side throttling it updates its
metrics and recalculates any client side backoff.
The choice of which requests should be used to compute client side
backoff interval is based on the http status code:
- Status code in 2xx range: Successful Operations should contribute.
- Status code in 3xx range: Redirection Operations should not contribute.
- Status code in 4xx range: User Errors should not contribute.
- Status code is 503: Throttling Error should contribute only if they
are due to client limits breach as follows:
* 503, Ingress Over Account Limit: Should Contribute
* 503, Egress Over Account Limit: Should Contribute
* 503, TPS Over Account Limit: Should Contribute
* 503, Other Server Throttling: Should not Contribute.
- Status code in 5xx range other than 503: Should not Contribute.
- IOException and UnknownHostExceptions: Should not Contribute.
Contributed by Anuj Modi
This PR enables one to create the Hadoop
release tarball on Windows, complete with
the native binaries (including winutils.exe).
This PR contains the following changes -
* Prevents splitting during array element
expansion - this is needed since we need
to pass the arguments correctly to maven.
* Install Python 3.11.8 and pip to the
Windows docker image for building
Hadoop.
* pom file changes to get maven to invoke
the releasedocmaker script through
bash.exe on Windows.
This addresses two CVEs triggered by malformed archives
Important: Denial of Service CVE-2024-25710
Moderate: Denial of Service CVE-2024-26308
Contributed by PJ Fanning
Clarifies behaviour of VectorIO methods with contract tests as well as
specification.
* Add precondition range checks to all implementations
* Identify and fix bug where direct buffer reads was broken
(HADOOP-19101; this surfaced in ABFS contract tests)
* Logging in VectoredReadUtils.
* TestVectoredReadUtils verifies validation logic.
* FileRangeImpl toString() improvements
* CombinedFileRange tracks bytes in range which are wanted;
toString() output logs this.
HDFS
* Add test TestHDFSContractVectoredRead
ABFS
* Add test ITestAbfsFileSystemContractVectoredRead
S3A
* checks for vector IO being stopped in all iterative
vector operations, including draining
* maps read() returning -1 to failure
* passes in file length to validation
* Error reporting to only completeExceptionally() those ranges
which had not yet read data in.
* Improved logging.
readVectored()
* made synchronized. This is only for the invocation;
the actual async retrieves are unsynchronized.
* closes input stream on invocation
* switches to random IO, so avoids keeping any long-lived connection around.
+ AbstractSTestS3AHugeFiles enhancements.
+ ADDENDUM: test fix in ITestS3AContractVectoredRead
Contains: HADOOP-19101. Vectored Read into off-heap buffer broken in fallback
implementation
Contributed by Steve Loughran
Change-Id: Ia4ed71864c595f175c275aad83a2ff5741693432
Clarifies behaviour of VectorIO methods with contract tests as well as specification.
* Add precondition range checks to all implementations
* Identify and fix bug where direct buffer reads was broken
(HADOOP-19101; this surfaced in ABFS contract tests)
* Logging in VectoredReadUtils.
* TestVectoredReadUtils verifies validation logic.
* FileRangeImpl toString() improvements
* CombinedFileRange tracks bytes in range which are wanted;
toString() output logs this.
HDFS
* Add test TestHDFSContractVectoredRead
ABFS
* Add test ITestAbfsFileSystemContractVectoredRead
S3A
* checks for vector IO being stopped in all iterative
vector operations, including draining
* maps read() returning -1 to failure
* passes in file length to validation
* Error reporting to only completeExceptionally() those ranges
which had not yet read data in.
* Improved logging.
readVectored()
* made synchronized. This is only for the invocation;
the actual async retrieves are unsynchronized.
* closes input stream on invocation
* switches to random IO, so avoids keeping any long-lived connection around.
+ AbstractSTestS3AHugeFiles enhancements.
Contains: HADOOP-19101. Vectored Read into off-heap buffer broken in fallback implementation
Contributed by Steve Loughran