Security hardening
+ Adds new interceptAndValidateMessageContains() method in LambdaTestUtils to verify a list of strings
can all be found in the toString() value of a raised exception
Contributed by PJ Fanning
This PR enables one to create the Hadoop
release tarball on Windows, complete with
the native binaries (including winutils.exe).
This PR contains the following changes -
* Prevents splitting during array element
expansion - this is needed since we need
to pass the arguments correctly to maven.
* Install Python 3.11.8 and pip to the
Windows docker image for building
Hadoop.
* pom file changes to get maven to invoke
the releasedocmaker script through
bash.exe on Windows.
Clarifies behaviour of VectorIO methods with contract tests as well as
specification.
* Add precondition range checks to all implementations
* Identify and fix bug where direct buffer reads was broken
(HADOOP-19101; this surfaced in ABFS contract tests)
* Logging in VectoredReadUtils.
* TestVectoredReadUtils verifies validation logic.
* FileRangeImpl toString() improvements
* CombinedFileRange tracks bytes in range which are wanted;
toString() output logs this.
HDFS
* Add test TestHDFSContractVectoredRead
ABFS
* Add test ITestAbfsFileSystemContractVectoredRead
S3A
* checks for vector IO being stopped in all iterative
vector operations, including draining
* maps read() returning -1 to failure
* passes in file length to validation
* Error reporting to only completeExceptionally() those ranges
which had not yet read data in.
* Improved logging.
readVectored()
* made synchronized. This is only for the invocation;
the actual async retrieves are unsynchronized.
* closes input stream on invocation
* switches to random IO, so avoids keeping any long-lived connection around.
+ AbstractSTestS3AHugeFiles enhancements.
+ ADDENDUM: test fix in ITestS3AContractVectoredRead
Contains: HADOOP-19101. Vectored Read into off-heap buffer broken in fallback
implementation
Contributed by Steve Loughran
Change-Id: Ia4ed71864c595f175c275aad83a2ff5741693432
Clarifies behaviour of VectorIO methods with contract tests as well as specification.
* Add precondition range checks to all implementations
* Identify and fix bug where direct buffer reads was broken
(HADOOP-19101; this surfaced in ABFS contract tests)
* Logging in VectoredReadUtils.
* TestVectoredReadUtils verifies validation logic.
* FileRangeImpl toString() improvements
* CombinedFileRange tracks bytes in range which are wanted;
toString() output logs this.
HDFS
* Add test TestHDFSContractVectoredRead
ABFS
* Add test ITestAbfsFileSystemContractVectoredRead
S3A
* checks for vector IO being stopped in all iterative
vector operations, including draining
* maps read() returning -1 to failure
* passes in file length to validation
* Error reporting to only completeExceptionally() those ranges
which had not yet read data in.
* Improved logging.
readVectored()
* made synchronized. This is only for the invocation;
the actual async retrieves are unsynchronized.
* closes input stream on invocation
* switches to random IO, so avoids keeping any long-lived connection around.
+ AbstractSTestS3AHugeFiles enhancements.
Contains: HADOOP-19101. Vectored Read into off-heap buffer broken in fallback implementation
Contributed by Steve Loughran
This is a followup to #6406:
HADOOP-18980. S3A credential provider remapping: make extensible
It adds extra validation of key-value pairs in a configuration
option, with tests.
Contributed by Viraj Jasani
Spotbugs is mistaken here as it doesn't observer the read/write locks used
to manage exclusive access to the maps.
* cache the value between checks
* tag as @VisibleForTesting
Contributed by Steve Loughran
Co-authored-by: Wei-Chiu Chuang <weichiu@apache.org>
Includes HADOOP-18354. Upgrade reload4j to 1.22.2 due to XXE vulnerability (#4607).
Log4j 1.2.17 has been replaced by reloadj 1.22.2
SLF4J is at 1.7.36
This is a followup to PR:
HADOOP-19045. S3A: Validate CreateSession Timeout Propagation (#6470)
Remove all declarations of fs.s3a.connection.request.timeout
in
- hadoop-common/src/main/resources/core-default.xml
- hadoop-aws/src/test/resources/core-site.xml
New test in TestAwsClientConfig to verify that the value
defined in fs.s3a.Constants class is used.
This is brittle to someone overriding it in their test setups,
but as this test is intended to verify that the option is not
explicitly set, there's no workaround.
Contributed by Steve Loughran
* HADOOP-19061 - Capture exception from rpcRequestSender.start() in IPC.Connection.run() and proper cleaning is followed if an exception is thrown.
---------
Co-authored-by: Xing Lin <xinglin@linkedin.com>
HADOOP-19015. Increase fs.s3a.connection.maximum to 500 to minimize the risk of Timeout waiting for connection from the pool
Contributed By: Mukund Thakur
Move the org.apache.hadoop.{oncrpc, portmap} packages from the hadoop-nfs module
to the hadoop-common module.
This allows for use of the protocol beyond just NFS -including within HDFS itself.
Contributed by Xing Lin