27340 Commits

Author SHA1 Message Date
Steve Loughran
4c55adbb6b
HADOOP-19205. S3A: initialization/close slower than with v1 SDK (#6892)
Adds new ClientManager interface/implementation which provides on-demand
creation of synchronous and asynchronous s3 clients, s3 transfer manager,
and in close() terminates these.

S3A FS is modified to
* Create a ClientManagerImpl instance and pass down to its S3Store.
* Use the same ClientManager interface against S3Store to demand-create
  the services.
* Only create the async client as part of the transfer manager creation,
  which will take place during the first rename() operation.
* Statistics on client creation count and duration are recorded.
* Statistics on the time to initialize and shutdown the S3A FS are collected
  in IOStatistics for reporting.

Adds to hadoop common class
  LazyAtomicReference<T> implements CallableRaisingIOE<T>, Supplier<T>
and subclass
  LazyAutoCloseableReference<T extends AutoCloseable>
    extends LazyAtomicReference<T> implements AutoCloseable

These evaluate the Supplier<T>/CallableRaisingIOE<T> they were
constructed with on the first (successful) read of the value.
Any exception raised during this operation will be rethrown, and on future
evaluations the same operation retried.

These classes implement the Supplier and CallableRaisingIOE
interfaces so can actually be used to implement lazy function evaluation
as Haskell and some other functional languages do.

LazyAutoCloseableReference is AutoCloseable; its close() method will
close the inner reference if it is set.

This class is used in ClientManagerImpl for the lazy S3 Client creation
and closure.
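
The evaluate-on-first-read and retry-after-failure behaviour described above
can be sketched as follows. This is a minimal stand-in written for
illustration, not the Hadoop class: the name LazyRefSketch, its isSet()
method and the use of a plain Supplier are all assumptions.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Hypothetical stand-in for a lazy reference; not the Hadoop implementation. */
class LazyRefSketch<T> implements Supplier<T> {
  private final Supplier<? extends T> factory;
  private final AtomicReference<T> ref = new AtomicReference<>();

  LazyRefSketch(Supplier<? extends T> factory) {
    this.factory = factory;
  }

  /** Evaluate the factory on the first successful read, then reuse the value. */
  @Override
  public synchronized T get() {
    T value = ref.get();
    if (value == null) {
      // If the factory throws, the exception propagates and the reference
      // stays unset, so the next call evaluates the factory again.
      value = factory.get();
      ref.set(value);
    }
    return value;
  }

  /** True once a value has been created. */
  boolean isSet() {
    return ref.get() != null;
  }

  public static void main(String[] args) {
    AtomicInteger calls = new AtomicInteger();
    LazyRefSketch<String> lazy = new LazyRefSketch<>(() -> {
      if (calls.incrementAndGet() == 1) {
        throw new IllegalStateException("first attempt fails");
      }
      return "client-" + calls.get();
    });
    try {
      lazy.get();
    } catch (IllegalStateException e) {
      System.out.println("failed: " + e.getMessage());
    }
    System.out.println(lazy.get()); // second evaluation succeeds
    System.out.println(lazy.get()); // cached; the factory is not called again
    System.out.println("factory calls: " + calls.get());
  }
}
```

In this sketch a factory which returns null is treated as "not yet set" and
is re-evaluated on the next read; whether the real classes behave that way is
not stated in the commit message.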

Contributed by Steve Loughran.
2024-07-05 16:38:37 +01:00
huhaiyang
ae76e9475c
HDFS-17564. EC: Fix the issue of inaccurate metrics when decommission mark busy DN. (#6911). Contributed by Haiyang Hu.
Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>
2024-07-05 20:45:01 +08:00
hfutatzhanghb
a57105462b
HADOOP-19215. Fix unit tests testSlowConnection and testBadSetup failed in TestRPC. (#6912). Contributed by farmmamba.
Reviewed-by: huhaiyang <huhaiyang926@126.com>
Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
2024-07-05 12:11:39 +05:30
Steve Loughran
c33d868606
HADOOP-19210. S3A: Speed up some slow unit tests (#6907)
Speed up slow tests
* TestS3AAWSCredentialsProvider: decrease thread pool shutdown time
* TestS3AInputStreamRetry: reduce retry limit and intervals

Contributed by Steve Loughran
2024-07-02 11:34:45 +01:00
K0K0V0K
134dcf166f
YARN-11703. Validate accessibility of Node Manager working directories (#6903)
2024-06-27 16:21:28 +02:00
Yu Zhang
b4ddb2d3bb
HDFS-13603: do not propagate ExecutionException and add maxRetries limit to NameNode edek cache warmup (#6774)
2024-06-24 09:34:52 -07:00
HarshitGupta11
d3b98cb1b2
HADOOP-19194: Add test to find unshaded dependencies in the aws sdk (#6865)
The new test TestAWSV2SDK scans the aws sdk bundle.jar and prints out all classes
which are unshaded, so at risk of creating classpath problems

It does not fail the test if this holds, because the current SDKs
do ship with unshaded classes; the test would always fail.

The SDK upgrade process should include inspecting the output
of this test to see if it has got worse (do a before/after check).

Once the AWS SDK does shade everything, we can have this
test fail on any regression

Contributed by Harshit Gupta
2024-06-24 10:41:11 +01:00
Steve Loughran
8ac9c1839a
HADOOP-19203. WrappedIO BulkDelete API to raise IOEs as UncheckedIOExceptions (#6885)
* WrappedIO methods raise UncheckedIOExceptions
* New class org.apache.hadoop.util.functional.FunctionalIO
 with wrap/unwrap and the ability to generate a
 java.util.function.Supplier around a CallableRaisingIOE.
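
The wrap/unwrap idea can be sketched in plain Java as below. The interface
IOCallable and the helper names toUncheckedSupplier() and unwrapping() are
hypothetical and chosen for this example; they are not claimed to match the
FunctionalIO signatures.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.function.Supplier;

/** Illustrative sketch of converting between checked and unchecked IO failures. */
class UncheckedIOSketch {

  /** Hypothetical analogue of a callable which may raise an IOException. */
  @FunctionalInterface
  interface IOCallable<T> {
    T apply() throws IOException;
  }

  /** Wrap: build a Supplier which rethrows any IOException as UncheckedIOException. */
  static <T> Supplier<T> toUncheckedSupplier(IOCallable<T> call) {
    return () -> {
      try {
        return call.apply();
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
    };
  }

  /** Unwrap: evaluate a Supplier, converting UncheckedIOException back to its cause. */
  static <T> T unwrapping(Supplier<T> supplier) throws IOException {
    try {
      return supplier.get();
    } catch (UncheckedIOException e) {
      throw e.getCause();
    }
  }

  public static void main(String[] args) {
    Supplier<String> failing = toUncheckedSupplier(() -> {
      throw new IOException("store unreachable");
    });
    try {
      failing.get();
    } catch (UncheckedIOException e) {
      System.out.println("unchecked: " + e.getCause().getMessage());
    }
    try {
      unwrapping(failing);
    } catch (IOException e) {
      System.out.println("checked again: " + e.getMessage());
    }
  }
}
```

Code which can only pass around java.util.function types, such as callers
binding through reflection, can then carry the operation as a Supplier and
still recover the original IOException at the boundary.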

Contributed by Steve Loughran
2024-06-19 18:47:29 +01:00
Hexiaoqiao
6545b7eeef
HDFS-17098. DatanodeManager does not handle null storage type properly. (#6840). Contributed by ConfX.
Signed-off-by: Shilun Fan <slfan1989@apache.org>
2024-06-19 20:58:57 +08:00
Steve Loughran
56c8aa5f1c
HADOOP-19204. VectorIO regression: empty ranges are now rejected (#6887)
- restore old outcome: no-op
- test this
- update spec

This is a critical fix for vector IO and MUST be cherrypicked to all branches with
that feature

Contributed by Steve Loughran
2024-06-19 12:05:24 +01:00
Tsz-Wo Nicholas Sze
1e6411c9ec
HDFS-17528. FsImageValidation: set txid when saving a new image (#6828)
2024-06-19 11:38:17 +08:00
slfan1989
9710a8d52f
YARN-11701. [Federation] Enhance Federation Cache Clean Conditions. (#6889) Contributed by Shilun Fan.
Reviewed-by: Ayush Saxena <ayushsaxena@apache.org>
2024-06-19 08:34:19 +08:00
Fateh Singh
90024d8cb1
HDFS-17439. Support -nonSuperUser for NNThroughputBenchmark: useful for testing auth frameworks such as Ranger (#6677)
2024-06-18 13:52:24 +01:00
Heagan A
2fbbfe3cc9
HDFS-17546. Implementing HostsFileReader timeout (#6873)
2024-06-14 20:47:21 -07:00
Steve Loughran
2d5fa9e016
HADOOP-18508. S3A: Support parallel integration test runs on same bucket (#5081)
It is now possible to provide a job ID in the maven "job.id" property of
hadoop-aws test runs, to isolate the paths under the test bucket
under which all tests will be executed.

This will allow independent builds *in different source trees*
to test against the same bucket in parallel, and is designed for
CI testing.

Example:

mvn verify -Dparallel-tests -Droot.tests.enabled=false -Djob.id=1
mvn verify -Droot.tests.enabled=false -Djob.id=2

- Root tests must be disabled to stop them cleaning up
  the test paths of other test runs.
- Do still regularly run the root tests just to force cleanup
  of the output of any interrupted test suites.

Contributed by Steve Loughran
2024-06-14 19:34:52 +01:00
Viraj Jasani
240fddcf17
HADOOP-18931. FileSystem.getFileSystemClass() to log the jar the .class came from (#6197)
Set the log level of logger org.apache.hadoop.fs.FileSystem to DEBUG to see this.

Contributed by Viraj Jasani
2024-06-14 19:14:54 +01:00
Cheng Pan
2bde5ccb81
HADOOP-19192. Log level is WARN when fail to load native hadoop libs (#6863)
Updates the documentation to be consistent with the logging.

Contributed by Cheng Pan
2024-06-14 19:05:27 +01:00
Tengting Xu
a1f5dc5865
Minor, fix cpu arch compare to use correct Dockerfile (#6852)
2024-06-13 00:37:28 +05:30
hfutatzhanghb
4b1b16a846
HDFS-17551. Fix unit test failure caused by HDFS-17464. (#6883). Contributed by farmmamba.
2024-06-12 22:21:15 +05:30
Mukund Thakur
06dd3bfee8
HADOOP-19196. Allow base path to be deleted as well using Bulk Delete. (#6872)
Contributed by: Mukund Thakur
2024-06-11 14:06:53 -05:00
Anuj Modi
005030f7a0
HADOOP-18610: [ABFS] OAuth2 Token Provider support for Azure Workload Identity (#6787)
Add support for Azure Active Directory (Azure AD) workload identities, which integrate with Kubernetes' native capabilities to federate with any external identity provider.

Contributed By: Anuj Modi
2024-06-11 13:06:39 -05:00
PJ Fanning
bb30545583
HADOOP-19163. Use hadoop-shaded-protobuf_3_25 (#6858)
Contributed by PJ Fanning
2024-06-11 17:10:00 +01:00
Felix Nguyen
776c0a3ab9
HDFS-17539. Make TestFileChecksum fields static (#6853)
2024-06-11 15:26:21 +08:00
Pranav Saxena
2e1deee87a
HADOOP-19137. [ABFS] Prevent ABFS initialization for non-hierarchal-namespace account if Customer-provided-key configs given. (#6752)
Customer-provided-keys (CPK) configs are not allowed with non-hierarchal-namespace (non-HNS) accounts for ABFS. This patch aims to prevent ABFS initialization for non-HNS accounts if CPK configs are provided.

Contributed by: Pranav Saxena
2024-06-10 15:03:41 -05:00
slfan1989
10df59e421
Revert "HADOOP-19071. Update maven-surefire-plugin from 3.0.0 to 3.2.5. (#6664)" (#6875)
This reverts commit 88ad7db80d.
Signed-off-by: Shilun Fan <slfan1989@apache.org>
2024-06-08 14:51:28 +08:00
Steve Loughran
01d257d5aa
HADOOP-19189. ITestS3ACommitterFactory failing (#6857)
* parameterize the test run rather than do it from within the test suite.
* log what the committer factory is up to (and improve its logging)
* close all filesystems, then create the test filesystem with cache enabled.

The cache is critical: we want the fs from cache to be used when querying
filesystem properties, rather than one created from the committer jobconf,
which will have the same options as the task committer, and so would not
actually validate the override logic.

Contributed by Steve Loughran
2024-06-07 17:34:01 +01:00
Anuj Modi
bbb17e76a7
HADOOP-19178: [WASB Deprecation] Updating Documentation on Upcoming Plans for Hadoop-Azure (#6862)
Contributed by Anuj Modi
2024-06-07 14:28:24 +01:00
PJ Fanning
2ee0bf9534
HADOOP-19154. Upgrade bouncycastle to 1.78.1 due to CVEs (#6755)
Addresses

* CVE-2024-29857 - Importing an EC certificate with specially crafted F2m parameters can cause high CPU usage during parameter evaluation.
* CVE-2024-30171 - Possible timing based leakage in RSA based handshakes due to exception processing eliminated.
* CVE-2024-30172 - Crafted signature and public key can be used to trigger an infinite loop in the Ed25519 verification code.
* CVE-2024-301XX - When endpoint identification is enabled and an SSL socket is not created with an explicit hostname (as happens with HttpsURLConnection), hostname verification could be performed against a DNS-resolved IP address. 

Contributed by PJ Fanning
2024-06-05 15:31:23 +01:00
Cheng Pan
d8d3d538e4
HADOOP-19193. Create orphan commit for website deployment (#6864)
This stops gh-pages deployments from increasing the size of the git repository on every run.

Contributed by Cheng Pan
2024-06-05 15:25:48 +01:00
Mukund Thakur
f92a8ab8ae
HADOOP-19190. Skip ITestS3AEncryptionWithDefaultS3Settings.testEncryptionFileAttributes when bucket not encrypted with sse-kms (#6859)
Follow up of HADOOP-19190
2024-06-03 12:00:31 -05:00
Yu Zhang
f1e2ceb823
HDFS-13603: Do not propagate ExecutionException while initializing EDEK queues for keys. (#6860)
2024-06-03 09:10:06 -07:00
Yang Jiandan
167d4c8447
YARN-11699. Diagnostics lacks userlimit info when user capacity has reached its maximum limit (#6849) Contributed by Jiandan Yang.
Signed-off-by: Shilun Fan <slfan1989@apache.org>
2024-06-01 06:18:28 +08:00
slfan1989
9f6c997662
YARN-11471. [Federation] FederationStateStoreFacade Cache Support Caffeine. (#6795) Contributed by Shilun Fan.
Reviewed-by: Inigo Goiri <inigoiri@apache.org>
Signed-off-by: Shilun Fan <slfan1989@apache.org>
2024-06-01 06:15:20 +08:00
Anuj Modi
d8b485a512
HADOOP-18516: [ABFS][Authentication] Support Fixed SAS Token for ABFS Authentication (#6552)
Contributed by Anuj Modi
2024-05-30 20:46:19 +01:00
Steve Loughran
d00b3acd5e
HADOOP-18679. Followup: change method name case (#6854)
WrappedIO.bulkDelete_PageSize() => bulkDelete_pageSize()

Makes it consistent with the HADOOP-19131 naming scheme.
The name needs to be fixed before invoking it through reflection,
as once that is attempted the binding won't work at run time,
though compilation will be happy.

Contributed by Steve Loughran
2024-05-30 19:34:30 +01:00
Mukund Thakur
d107931fc7
HADOOP-19188. Fix TestHarFileSystem and TestFilterFileSystem failing after bulk delete API got added. (#6848)
Follow up to: HADOOP-18679 Add API for bulk/paged delete of files and objects

Contributed by Mukund Thakur
2024-05-29 17:27:09 +01:00
K0K0V0K
ccb8ff4360
YARN-11687. CGroupV2 resource calculator (#6835)
Co-authored-by: Benjamin Teke <brumi1024@users.noreply.github.com>
2024-05-29 17:20:23 +02:00
刘斌
6c08e8e2aa
HADOOP-19156. ZooKeeper based state stores use different ZK address configs. (#6767). Contributed by liu bin.
Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>
2024-05-29 20:44:36 +08:00
Mukund Thakur
f4fde40524
HADOOP-19184. S3A Fix TestStagingCommitter.testJobCommitFailure (#6843)
Follow up on HADOOP-18679

Contributed by: Mukund Thakur
2024-05-28 11:27:33 -05:00
Felix Nguyen
74d30a5dce
HDFS-17532. RBF: Allow router state store cache update to overwrite and delete in parallel (#6839)
2024-05-28 11:17:08 +08:00
Murali Krishna
1baf0e889f
HADOOP-18962. Upgrade kafka to 3.4.0 (#6247)
Upgrade Kafka Client due to CVEs

* CVE-2023-25194
* CVE-2021-38153
* CVE-2018-17196

Contributed by Murali Krishna
2024-05-24 17:40:37 +01:00
Felix Nguyen
f5c5d35eb0
HDFS-17529. RBF: Improve router state store cache entry deletion (#6833)
2024-05-24 09:41:08 +08:00
Anmol Asrani
d168d3ffee
HADOOP-18325: ABFS: Add correlated metric support for ABFS operations (#6314)
Adds support for metric collection at the filesystem instance level.
Metrics are pushed to the store upon the closure of a filesystem instance, encompassing all operations
that utilized that specific instance.

Collected Metrics:

- Number of successful requests without any retries.
- Count of requests that succeeded after a specified number of retries (x retries).
- Request count subjected to throttling.
- Number of requests that failed despite exhausting all retry attempts, etc.

Implementation Details:

Incorporated logic in the AbfsClient to facilitate metric pushing through an additional request.
This occurs in scenarios where no requests are sent to the backend for a defined idle period.
By implementing these enhancements, we ensure comprehensive monitoring and analysis of filesystem interactions, enabling a deeper understanding of success rates, retry scenarios, throttling instances, and exhaustive failure scenarios. Additionally, the AbfsClient logic ensures that metrics are proactively pushed even during idle periods, maintaining a continuous and accurate representation of filesystem performance.

Contributed by Anmol Asrani
2024-05-23 15:10:10 +01:00
Benjamin Teke
d876505b67
YARN-11681. Update the cgroup documentation with v2 support (#6834)
Co-authored-by: Benjamin Teke <bteke@cloudera.com>
Co-authored-by: K0K0V0K <109747532+K0K0V0K@users.noreply.github.com>
2024-05-21 17:41:32 +02:00
hfutatzhanghb
fb156e8f05
HDFS-17464. Improve some logs output in class FsDatasetImpl (#6724)
2024-05-21 09:46:21 +08:00
slfan1989
be28467374
Revert "Bump org.apache.derby:derby in /hadoop-project (#6816)" (#6841)
This reverts commit b5a90d9500.
2024-05-21 08:46:14 +08:00
Sebb
f11a8cfa6e
HADOOP-13147. Constructors must not call overrideable methods in PureJavaCrc32C (#6408). Contributed by Sebb.
2024-05-21 00:08:08 +05:30
Mukund Thakur
47be1ab3b6
HADOOP-18679. Add API for bulk/paged delete of files (#6726)
Applications can create a BulkDelete instance from a
BulkDeleteSource; the BulkDelete interface provides
the pageSize(): the maximum number of entries which can be
deleted, and a bulkDelete(Collection paths)
method which can take a collection up to pageSize() long.

This is optimized for object stores with bulk delete APIs;
the S3A connector will offer the page size of
fs.s3a.bulk.delete.page.size unless bulk delete has
been disabled.

Even with a page size of 1, the S3A implementation is
more efficient than delete(path)
as there are no safety checks for the path being a directory
or probes for the need to recreate directories.

The interface BulkDeleteSource is implemented by
all FileSystem implementations, with a page size
of 1 and mapped to delete(pathToDelete, false).
This means that callers do not need to have special
case handling for object stores versus classic filesystems.

To aid use through reflection APIs, the class
org.apache.hadoop.io.wrappedio.WrappedIO
has been created with "reflection friendly" methods.
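
The page-size contract described above implies that a caller with a long list
of paths has to split it into pages no larger than pageSize(). A minimal
sketch of that loop follows. PagedDeleter is a hypothetical, simplified
stand-in for the BulkDelete interface: it uses String paths and returns the
paths which could not be deleted, which is an assumption made for this
example and not the real return type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative paging loop over a bulk delete style API. */
class BulkDeletePagingSketch {

  /** Hypothetical simplification of a bulk delete operation. */
  interface PagedDeleter {
    /** Maximum number of paths accepted by one bulkDelete() call. */
    int pageSize();

    /** Delete one page of paths; returns the paths which failed. */
    List<String> bulkDelete(List<String> page);
  }

  /** Delete all paths in pages of at most pageSize() entries. */
  static List<String> deleteAll(PagedDeleter deleter, List<String> paths) {
    int pageSize = deleter.pageSize();
    if (pageSize < 1) {
      throw new IllegalArgumentException("page size must be >= 1: " + pageSize);
    }
    List<String> failures = new ArrayList<>();
    for (int start = 0; start < paths.size(); start += pageSize) {
      int end = Math.min(start + pageSize, paths.size());
      failures.addAll(deleter.bulkDelete(paths.subList(start, end)));
    }
    return failures;
  }

  public static void main(String[] args) {
    List<Integer> pagesSeen = new ArrayList<>();
    PagedDeleter store = new PagedDeleter() {
      @Override
      public int pageSize() {
        return 2;
      }

      @Override
      public List<String> bulkDelete(List<String> page) {
        pagesSeen.add(page.size());
        List<String> failed = new ArrayList<>();
        for (String path : page) {
          if (path.endsWith(".locked")) {
            failed.add(path); // simulate a path the store refuses to delete
          }
        }
        return failed;
      }
    };
    List<String> failures =
        deleteAll(store, Arrays.asList("a", "b", "c.locked", "d", "e"));
    System.out.println("page sizes: " + pagesSeen); // [2, 2, 1]
    System.out.println("failures: " + failures);    // [c.locked]
  }
}
```

With a page size of 1, as offered by the default FileSystem mapping, the same
loop degrades to one call per path, so the caller needs no special case.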

Contributed by Mukund Thakur and Steve Loughran
2024-05-20 17:05:25 +01:00
Kaiyao Ke
41eacf4914
MAPREDUCE-7475. Fix non-idempotent unit tests (#6785)
Contributed by Kaiyao Ke
2024-05-17 14:51:47 +01:00
LiuGuH
8f92cda35c
HDFS-17509. RBF: Fix ClientProtocol.concat will throw NPE if tgr is an empty file. (#6784)
2024-05-17 10:37:50 +08:00