Commit Graph

27415 Commits

Author SHA1 Message Date
slfan1989
652908519e
YARN-11588. [Federation] [Addendum] Fix uncleaned threads in yarn router thread pool executor. (#6222) 2023-10-26 13:39:06 -07:00
Wei-Chiu Chuang
821ed83873
HDFS-15273. CacheReplicationMonitor hold lock for long time and lead to NN out of service. Contributed by Xiaoqiao He. 2023-10-26 10:35:10 -07:00
slfan1989
d18410221b
YARN-11593. [Federation] Improve command line help information. (#6199) Contributed by Shilun Fan.
Reviewed-by: Inigo Goiri <inigoiri@apache.org>
Signed-off-by: Shilun Fan <slfan1989@apache.org>
2023-10-26 08:22:18 +08:00
Steve Loughran
8bd1f65efc
HADOOP-18948. S3A. Add option fs.s3a.directory.operations.purge.uploads to purge on rename/delete (#6218)
S3A directory delete and rename will optionally abort all pending multipart uploads
in their under their to-be-deleted paths when.

fs.s3a.directory.operations.purge.upload is true

It is off by default.

The filesystems hasPathCapability("fs.s3a.directory.operations.purge.upload")
probe will return true when this feature is enabled.

Multipart uploads may accrue from interrupted data writes, uncommitted 
staging/magic committer jobs and other operations/applications. On AWS S3
lifecycle rules are the recommended way to clean these; this change improves
support for stores which lack these rules.

Contributed by Steve Loughran
2023-10-25 17:39:16 +01:00
PJ Fanning
bbf905dc99
HADOOP-18933. upgrade to netty 4.1.100 due to CVE (#6173)
Mitigates Netty security advisory GHSA-xpw8-rcwv-8f8p
"HTTP/2 Rapid Reset Attack - DDoS vector in the HTTP/2 protocol due RST frames"

Contributed by PJ Fanning
2023-10-25 14:06:13 +01:00
huhaiyang
f85ac5b60d
HADOOP-18920. RPC Metrics : Optimize logic for log slow RPCs (#6146) 2023-10-25 13:56:39 +08:00
gp1314
a170d58501
HDFS-17231. HA: Safemode should exit when resources are from low to available. (#6207). Contributed by Gu Peng.
Reviewed-by: Xing Lin <xinglin@linkedin.com>
Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>
2023-10-25 11:43:12 +08:00
Stephen O'Donnell
882f08b4bc
HDFS-17237. Remove IPCLoggerChannelMetrics when the logger is closed (#6217) 2023-10-24 21:39:03 +01:00
Steve Loughran
8b974bcc1f
HADOOP-18889. Third party storage followup. (#6186)
Followup to HADOOP-18889 third party store support;

Fix some minor review comments which came in after the merge.
2023-10-24 18:17:52 +01:00
PJ Fanning
0042544bf2
HADOOP-18949. upgrade maven dependency plugin due to CVE-2021-26291. (#6219)
Addresses CVE-2021-26291. "Origin Validation Error in Apache Maven"

Contributed by PJ Fanning.
2023-10-24 12:28:40 +01:00
slfan1989
9c7e5b66fa
YARN-11576. Improve FederationInterceptorREST AuditLog. (#6117) Contributed by Shilun Fan.
Reviewed-by: Inigo Goiri <inigoiri@apache.org>
Signed-off-by: Shilun Fan <slfan1989@apache.org>
2023-10-24 09:36:06 +08:00
slfan1989
80a22a736e
YARN-11500. [Addendum] Fix typos in hadoop-yarn-server-common#federation. (#6212) Contributed by Shilun Fan.
Reviewed-by: Inigo Goiri <inigoiri@apache.org>
Signed-off-by: Shilun Fan <slfan1989@apache.org>
2023-10-24 09:28:05 +08:00
huhaiyang
9d48af8d70
HADOOP-18868. Optimize the configuration and use of callqueue overflow trigger failover (#5998) 2023-10-23 14:06:02 -07:00
Zita Dombi
4c04818d3d
HADOOP-18919. Zookeeper SSL/TLS support in HDFS ZKFC (#6194) 2023-10-23 11:03:15 -07:00
huhaiyang
5eeab5e1b9
HDFS-17235. Fix javadoc errors in BlockManager (#6214). Contributed by Haiyang Hu. 2023-10-23 20:12:39 +05:30
jianghuazhu
6e13e4addc
HDFS-17228. Improve documentation related to BlockManager. (#6195). Contributed by JiangHua Zhu.
Reviewed-by: Inigo Goiri <inigoiri@apache.org>
Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
2023-10-23 20:05:33 +05:30
Ayush Saxena
fbd653be9b
Revert "HDFS-17228. Improve documentation related to BlockManager. (#6195). Contributed by JiangHua Zhu."
This reverts commit 81ba2e8484.
2023-10-23 19:35:12 +05:30
Steve Loughran
3e0fcda7a5
HADOOP-18945. S3A. IAMInstanceCredentialsProvider failing. (#6202)
This restores asynchronous retrieval/refresh of any AWS credentials provided by the
EC2 instance/container in which the process is running.

Contributed by Steve Loughran
2023-10-23 14:24:30 +01:00
slfan1989
d7d772d684
YARN-11595. Fix hadoop-yarn-client#java.lang.NoClassDefFoundError (#6210) Contributed by Shilun Fan.
Reviewed-by: Ayush Saxena <ayushsaxena@apache.org>
Signed-off-by: Shilun Fan <slfan1989@apache.org>
2023-10-22 22:22:14 +08:00
Masatake Iwasaki
24fe1ef4dd HADOOP-18942 addendum. update LICENSE-binary. 2023-10-22 22:22:56 +09:00
Viraj Jasani
acaf8ef3ca
HADOOP-18918. ITestS3GuardTool fails if SSE/DSSE encryption is used (#6165)
HADOOP-18918. ITestS3GuardTool fails if SSE/DSSE encryption is used.

Contributed by Viraj Jasani.
2023-10-20 10:47:44 +01:00
Steve Loughran
215cb15beb
HADOOP-18946. TestErrorTranslation failure (#6205)
Fixes TestErrorTranslation.testMultiObjectExceptionFilledIn() failure
which came in with HADOOP-18939.

Contributed by Steve Loughran
2023-10-20 10:13:05 +01:00
PeterWright
9a411fcf9d
HADOOP-18941. Modify HBase version in BUILDING.txt (#6206) 2023-10-20 16:20:17 +09:00
Masatake Iwasaki
8bf72346a5
HADOOP-18942. Upgrade ZooKeeper to 3.7.2. (#6200)
Signed-off-by: Masatake Iwasaki <iwasakims@apache.org>
2023-10-19 18:47:45 +09:00
GuoPhilipse
615a2a42cf
HDFS-17220. Fix same available space policy in AvailableSpaceVolumeChoosingPolicy (#6174)
Reviewed-by: He Xiaoqiao <hexiaoqiao@apache.org>
Reviewed-by: zhangshuyan <zqingchai@gmail.com>
Signed-off-by: Tao Li <tomscut@apache.org>
2023-10-18 13:01:32 +08:00
Masatake Iwasaki
13843f4a88
HADOOP-18867. Upgrade ZooKeeper to 3.6.4. (#5988) 2023-10-18 10:31:41 +09:00
jianghuazhu
81ba2e8484
HDFS-17228. Improve documentation related to BlockManager. (#6195). Contributed by JiangHua Zhu.
Reviewed-by: Inigo Goiri <inigoiri@apache.org>
Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
2023-10-18 05:05:33 +05:30
Steve Loughran
e0563fed50
HADOOP-18908. Improve S3A region handling. (#6187)
S3A region logic improved for better inference and
to be compatible with previous releases

1. If you are using an AWS S3 AccessPoint, its region is determined
   from the ARN itself.
2. If fs.s3a.endpoint.region is set and non-empty, it is used.
3. If fs.s3a.endpoint is an s3.*.amazonaws.com url, 
   the region is determined by by parsing the URL 
   Note: vpce endpoints are not handled by this.
4. If fs.s3a.endpoint.region==null, and none could be determined
   from the endpoint, use us-east-2 as default.
5. If fs.s3a.endpoint.region=="" then it is handed off to
   The default AWS SDK resolution process.

Consult the AWS SDK documentation for the details on its resolution
process, knowing that it is complicated and may use environment variables,
entries in ~/.aws/config, IAM instance information within
EC2 deployments and possibly even JSON resources on the classpath.
Put differently: it is somewhat brittle across deployments.

Contributed by Ahmar Suhail
2023-10-17 15:37:36 +01:00
Steve Loughran
e5eb404bb3
HADOOP-18939. NPE in AWS v2 SDK RetryOnErrorCodeCondition.shouldRetry() (#6193)
MultiObjectDeleteException to fill in the error details

See also: https://github.com/aws/aws-sdk-java-v2/issues/4600

Contributed by Steve Loughran
2023-10-17 15:17:16 +01:00
Steve Loughran
42e695d510
HADOOP-18932. S3A. upgrade AWS v2 SDK to 2.20.160 and v1 to 1.12.565 (#6178)
v1 => 1.12.565
v2 => 2.20.160
Only the v2 one is distributed; v1 is needed in deployments only to support v1 credential providers

Contributed by Steve Loughran
2023-10-17 12:59:50 +01:00
Szilard Nemeth
2736f88561 YARN.11590. RM process stuck after calling confStore.format() when ZK SSL/TLS is enabled, as netty thread waits indefinitely. Contributed by Ferenc Erdelyi 2023-10-16 15:17:58 -04:00
GuoPhilipse
c8abca3004
HDFS-17210. Optimize AvailableSpaceBlockPlacementPolicy. (#6113). Contributed by GuoPhilipse.
Reviewed-by:  He Xiaoqiao <hexiaoqiao@apache.org>
Signed-off-by: Shuyan Zhang <zhangshuyan@apache.org>
2023-10-16 16:34:40 +08:00
slfan1989
00f8cdcb0f
YARN-11571. [GPG] Add Information About YARN GPG in Federation.md (#6158) Contributed by Shilun Fan.
Reviewed-by: Inigo Goiri <inigoiri@apache.org>
Signed-off-by: Shilun Fan <slfan1989@apache.org>
2023-10-14 10:00:28 +08:00
jianghuazhu
8963b25ab3
HADOOP-18926.Add some comments related to NodeFencer. (#6162) 2023-10-13 15:34:44 -07:00
PJ Fanning
3d7b58d8a5
HADOOP-18916. Exclude all module-info classes from uber jars (#6131)
Removes java9 and java11 from all modules pulled into the hadoop-client
and hadoop-client-minicluster modules.

Contributed by PJ Fanning
2023-10-13 20:01:44 +01:00
Steve Loughran
9bc159f4ac
HADOOP-18487. Make protobuf 2.5 an optional runtime dependency. (#4996)
Protobuf 2.5 JAR is no longer needed at runtime. 

The option common.protobuf.scope defines whether the protobuf 2.5.0
dependency is marked as provided or not.

* New package org.apache.hadoop.ipc.internal for internal only protobuf classes
  ...with a ShadedProtobufHelper in there which has shaded protobuf refs
  only, so guaranteed not to need protobuf-2.5 on the CP
* All uses of org.apache.hadoop.ipc.ProtobufHelper have
  been replaced by uses of org.apache.hadoop.ipc.internal.ShadedProtobufHelper
* The scope of protobuf-2.5 is set by the option common.protobuf2.scope
  In this patch is it is still "compile"
* There is explicit reference to it in modules where it may be needed.
*  The maven scope of the dependency can be set with the common.protobuf2.scope
   option. It can be set to "provided" in a build:
       -Dcommon.protobuf2.scope=provided
* Add new ipc(callable) method to catch and convert shaded protobuf
  exceptions raised during invocation of the supplied lambda expression
* This is adopted in the code where the migration is not traumatically
  over-complex. RouterAdminProtocolTranslatorPB is left alone for this
  reason.

Contributed by Steve Loughran
2023-10-13 13:48:38 +01:00
Steve Loughran
81edbebdd8
HADOOP-18889. S3A v2 SDK third party support (#6141)
Tune AWS v2 SDK changes based on testing with third party stores
including GCS. 

Contains HADOOP-18889. S3A v2 SDK error translations and troubleshooting docs

* Changes needed to work with multiple third party stores
* New third_party_stores document on how to bind to and test
  third party stores, including google gcs (which works!)
* Troubleshooting docs mostly updated for v2 SDK

Exception translation/resilience

* New AWSUnsupportedFeatureException for unsupported/unavailable errors
* Handle 501 method unimplemented as one of these
* Error codes > 500 mapped to the AWSStatus500Exception if no explicit
  handler.
* Precondition errors handled a bit better
* GCS throttle exception also recognized.
* GCS raises 404 on a delete of a file which doesn't exist: swallow it.
* Error translation uses reflection to create IOE of the right type.
  All IOEs at the bottom of an AWS stack chain are regenerated.
  then a new exception of that specific type is created, with the top level ex
  its cause. This is done to retain the whole stack chain.
* Reduce the number of retries within the AWS SDK
* And those of s3a code.
* S3ARetryPolicy explicitly declare SocketException as connectivity failure
  but subclasses BindException
* SocketTimeoutException also considered connectivity  
* Log at debug whenever retry policies looked up
* Reorder exceptions to alphabetical order, with commentary
* Review use of the Invoke.retry() method 

 The reduction in retries is because its clear when you try to create a bucket
 which doesn't resolve that the time for even an UnknownHostException to
 eventually fail over 90s, which then hit the s3a retry code.
 - Reducing the SDK retries means these escalate to our code better.
 - Cutting back on our own retries makes it a bit more responsive for most real
 deployments.
 - maybeTranslateNetworkException() and s3a retry policy means that
   unknown host exception is recognised and fails fast.

Contributed by Steve Loughran
2023-10-12 17:47:44 +01:00
huhaiyang
0ed484ac62
HDFS-17208. Add the metrics PendingAsyncDiskOperations in datanode (#6109). Contributed by Haiyang Hu.
Reviewed-by: Tao Li <tomscut@apache.org>
Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>
2023-10-12 23:27:15 +08:00
Kevin Risden
5c22934d90
HADOOP-18922. Race condition in ZKDelegationTokenSecretManager creating znode (#6150). Contributed by Kevin Risden.
Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>
2023-10-12 23:21:26 +08:00
jchanggg
bd28ba385a
YARN-11588. [Federation] Fix uncleaned threads in yarn router thread pool executor (#6159) Contributed by Jeffrey Chang.
Reviewed-by: Inigo Goiri <inigoiri@apache.org>
Signed-off-by: Shilun Fan <slfan1989@apache.org>
2023-10-12 19:13:44 +08:00
PJ Fanning
732d4e72a6
HADOOP-18929. Exclude commons-compress module-info.class (#6170)
Contributed By: PJ Fanning
2023-10-11 12:50:37 -05:00
Steve Vaughan
73eccd6d7c
HDFS-16740. Mini cluster test flakiness (#4835) 2023-10-10 13:51:46 -07:00
huhaiyang
85af6c3a28
HDFS-17217. Add lifeline RPC start up log when NameNode#startCommonServices (#6154). Contributed by Haiyang Hu.
Reviewed-by:  Shilun Fan <slfan1989@apache.org>
Reviewed-by: Tao Li <tomscut@apache.org>
Signed-off-by: Shuyan Zhang <zhangshuyan@apache.org>
2023-10-10 10:20:07 +08:00
slfan1989
b00d605832
YARN-9048. Add znode hierarchy in Federation ZK State Store. (#6016) 2023-10-09 14:06:41 -07:00
Anuj Modi
594e9f29f5
HADOOP-18869: [ABFS] Fix behavior of a File System APIs on root path (#6003)
Contributed by Anuj Modi
2023-10-09 20:05:23 +01:00
Steve Loughran
882378c3e9
Revert "HADOOP-18869: [ABFS] Fix behavior of a File System APIs on root path (#6003)"
This reverts commit 6c6df40d35.

...so as to give the correct credit
2023-10-09 20:05:07 +01:00
Anuj Modi
6c6df40d35
HADOOP-18869: [ABFS] Fix behavior of a File System APIs on root path (#6003)
Contributed by  Anmol Asrani
2023-10-09 20:01:56 +01:00
Anmol Asrani
9c621fcea7
HADOOP-18861. ABFS: Fix failing tests for CPK (#5979)
Contributed by Anmol Asrani
2023-10-09 17:40:15 +01:00
Anmol Asrani
666af58700
HADOOP-18876. ABFS: Change default for fs.azure.data.blocks.buffer to bytebuffer (#6009)
The default value for fs.azure.data.blocks.buffer is changed from "disk" to "bytebuffer"

This will speed up writing to azure storage, at the risk of running out of memory
-especially if there are many threads writing to abfs at the same time and the
upload bandwidth is limited.

If jobs do run out of memory writing to abfs, change the option back to "disk"

Contributed by Anmol Asrani
2023-10-09 16:51:12 +01:00
hfutatzhanghb
ea3cb12ec8
HDFS-17171. CONGESTION_RATIO should be configurable (#5996)
Reviewed-by: Ayush Saxena <ayushsaxena@apache.org>
Signed-off-by: Tao Li <tomscut@apache.org>
2023-10-08 10:36:09 +08:00