- restore old outcome: no-op
- test this
- update spec
This is a critical fix for vector IO and MUST be cherrypicked to all branches with
that feature
Contributed by Steve Loughran
It is now possible to provide a job ID in the maven "job.id" property
hadoop-aws test runs to isolate paths under a the test bucket
under which all tests will be executed.
This will allow independent builds *in different source trees*
to test against the same bucket in parallel, and is designed for
CI testing.
Example:
mvn verify -Dparallel-tests -Droot.tests.enabled=false -Djob.id=1
mvn verify -Droot.tests.enabled=false -Djob.id=2
- Root tests must be be disabled to stop them cleaning up
the test paths of other test runs.
- Do still regularly run the root tests just to force cleanup
of the output of any interrupted test suites.
Contributed by Steve Loughran
Add support for Azure Active Directory (Azure AD) workload identities which integrate with the Kubernetes's native capabilities to federate with any external identity provider.
Contributed By: Anuj Modi
Customer-provided-keys (CPK) configs are not allowed with non-hierarchal-namespace (non-HNS) accounts for ABFS. This patch aims to prevent ABFS initialization for non-HNS accounts if CPK configs are provided.
Contributed by: Pranav Saxena
* parameterize the test run rather than do it from within the test suite.
* log what the committer factory is up to (and improve its logging)
* close all filesystems, then create the test filesystem with cache enabled.
The cache is critical, we want the fs from cache to be used when querying
filesystem properties, rather than one created from the committer jobconf,
which will have the same options as the task committer, so not actually
validate the override logic.
Contributed by Steve Loughran
Addresses
* CVE-2024-29857 - Importing an EC certificate with specially crafted F2m parameters can cause high CPU usage during parameter evaluation.
* CVE-2024-30171 - Possible timing based leakage in RSA based handshakes due to exception processing eliminated.
* CVE-2024-30172 - Crafted signature and public key can be used to trigger an infinite loop in the Ed25519 verification code.
* CVE-2024-301XX - When endpoint identification is enabled and an SSL socket is not created with an explicit hostname (as happens with HttpsURLConnection), hostname verification could be performed against a DNS-resolved IP address.
Contributed by PJ Fanning
WrappedIO.bulkDelete_PageSize() => bulkDelete_pageSize()
Makes it consistent with the HADOOP-19131 naming scheme.
The name needs to be fixed before invoking it through reflection,
as once that is attempted the binding won't work at run time,
though compilation will be happy.
Contributed by Steve Loughran
Adds support for metric collection at the filesystem instance level.
Metrics are pushed to the store upon the closure of a filesystem instance, encompassing all operations
that utilized that specific instance.
Collected Metrics:
- Number of successful requests without any retries.
- Count of requests that succeeded after a specified number of retries (x retries).
- Request count subjected to throttling.
- Number of requests that failed despite exhausting all retry attempts. etc.
Implementation Details:
Incorporated logic in the AbfsClient to facilitate metric pushing through an additional request.
This occurs in scenarios where no requests are sent to the backend for a defined idle period.
By implementing these enhancements, we ensure comprehensive monitoring and analysis of filesystem interactions, enabling a deeper understanding of success rates, retry scenarios, throttling instances, and exhaustive failure scenarios. Additionally, the AbfsClient logic ensures that metrics are proactively pushed even during idle periods, maintaining a continuous and accurate representation of filesystem performance.
Contributed by Anmol Asrani
Applications can create a BulkDelete instance from a
BulkDeleteSource; the BulkDelete interface provides
the pageSize(): the maximum number of entries which can be
deleted, and a bulkDelete(Collection paths)
method which can take a collection up to pageSize() long.
This is optimized for object stores with bulk delete APIs;
the S3A connector will offer the page size of
fs.s3a.bulk.delete.page.size unless bulk delete has
been disabled.
Even with a page size of 1, the S3A implementation is
more efficient than delete(path)
as there are no safety checks for the path being a directory
or probes for the need to recreate directories.
The interface BulkDeleteSource is implemented by
all FileSystem implementations, with a page size
of 1 and mapped to delete(pathToDelete, false).
This means that callers do not need to have special
case handling for object stores versus classic filesystems.
To aid use through reflection APIs, the class
org.apache.hadoop.io.wrappedio.WrappedIO
has been created with "reflection friendly" methods.
Contributed by Mukund Thakur and Steve Loughran
* Use Yetus 0.14.1 from downloads.apache.org in yetus-wrapper
* Use Maven 3.8.8 from downloads.apache.org in Win 10 Dockerfile
* Point users to downloads.apache.org for JVSC
* Use Solr 8.11.2 from downloads.apache.org in YARN Dockerfile
Contributed by Christopher Tubbs