Commit Graph

26979 Commits

Author SHA1 Message Date
Steve Loughran
81edbebdd8
HADOOP-18889. S3A v2 SDK third party support (#6141)
Tune AWS v2 SDK changes based on testing with third party stores
including GCS. 

Contains HADOOP-18889. S3A v2 SDK error translations and troubleshooting docs

* Changes needed to work with multiple third party stores
* New third_party_stores document on how to bind to and test
  third party stores, including google gcs (which works!)
* Troubleshooting docs mostly updated for v2 SDK

Exception translation/resilience

* New AWSUnsupportedFeatureException for unsupported/unavailable errors
* Handle 501 method unimplemented as one of these
* Error codes > 500 mapped to the AWSStatus500Exception if no explicit
  handler.
* Precondition errors handled a bit better
* GCS throttle exception also recognized.
* GCS raises 404 on a delete of a file which doesn't exist: swallow it.
* Error translation uses reflection to create IOE of the right type.
  All IOEs at the bottom of an AWS stack chain are regenerated:
  a new exception of that specific type is created, with the top level
  exception as its cause. This is done to retain the whole stack chain.
* Reduce the number of retries within the AWS SDK
* And those of s3a code.
* S3ARetryPolicy explicitly declares SocketException as a connectivity failure
  but subclasses BindException
* SocketTimeoutException also considered connectivity  
* Log at debug whenever retry policies looked up
* Reorder exceptions to alphabetical order, with commentary
* Review use of the Invoke.retry() method 

 The reduction in retries is because it's clear, when you try to create a bucket
 which doesn't resolve, that even an UnknownHostException takes over 90s to
 eventually fail, which then hits the s3a retry code.
 - Reducing the SDK retries means these escalate to our code better.
 - Cutting back on our own retries makes it a bit more responsive for most real
 deployments.
 - maybeTranslateNetworkException() and s3a retry policy means that
   unknown host exception is recognised and fails fast.

Contributed by Steve Loughran
2023-10-12 17:47:44 +01:00
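
The reflection-based error translation described in HADOOP-18889 above can be illustrated with a minimal sketch. It is not the S3A implementation: the class and method names (`InnerIoeSketch`, `innermostIOE`, `translate`) are hypothetical, and it assumes the innermost IOException type has a public constructor taking a single String. When that assumption fails, the sketch falls back to a plain IOException.

```java
import java.io.IOException;
import java.lang.reflect.Constructor;
import java.net.UnknownHostException;

// Hypothetical sketch; names are illustrative, not taken from the S3A code.
class InnerIoeSketch {

  /** Walk the cause chain and return the deepest IOException, or null if none. */
  static IOException innermostIOE(Throwable t) {
    IOException found = null;
    int depth = 0;
    // The depth limit guards against cyclic cause chains.
    for (Throwable c = t; c != null && depth < 20; c = c.getCause(), depth++) {
      if (c instanceof IOException) {
        found = (IOException) c;
      }
    }
    return found;
  }

  /**
   * Create a new exception of the same type as the innermost IOException,
   * with the top-level exception as its cause, so the whole chain is retained.
   */
  static IOException translate(String message, Throwable outer) {
    IOException inner = innermostIOE(outer);
    if (inner == null) {
      return new IOException(message, outer);
    }
    try {
      // Assumption: the exception class has a public (String) constructor.
      Constructor<? extends IOException> ctor =
          inner.getClass().getConstructor(String.class);
      IOException wrapped = ctor.newInstance(message + ": " + inner.getMessage());
      wrapped.initCause(outer);
      return wrapped;
    } catch (ReflectiveOperationException | RuntimeException e) {
      // No usable constructor, or the cause was already set: fall back.
      return new IOException(message, outer);
    }
  }

  public static void main(String[] args) {
    Throwable sdkFailure = new RuntimeException("sdk failure",
        new UnknownHostException("bucket.example.invalid"));
    IOException translated = translate("createBucket", sdkFailure);
    System.out.println(translated.getClass().getName());
  }
}
```

Because the translated exception keeps the type of the innermost IOException, a retry policy keyed on exception class can still recognise it (for example, to fail fast on an unknown host) while the full SDK stack chain remains reachable through `getCause()`.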
huhaiyang
0ed484ac62
HDFS-17208. Add the metrics PendingAsyncDiskOperations in datanode (#6109). Contributed by Haiyang Hu.
Reviewed-by: Tao Li <tomscut@apache.org>
Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>
2023-10-12 23:27:15 +08:00
Kevin Risden
5c22934d90
HADOOP-18922. Race condition in ZKDelegationTokenSecretManager creating znode (#6150). Contributed by Kevin Risden.
Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>
2023-10-12 23:21:26 +08:00
jchanggg
bd28ba385a
YARN-11588. [Federation] Fix uncleaned threads in yarn router thread pool executor (#6159) Contributed by Jeffrey Chang.
Reviewed-by: Inigo Goiri <inigoiri@apache.org>
Signed-off-by: Shilun Fan <slfan1989@apache.org>
2023-10-12 19:13:44 +08:00
PJ Fanning
732d4e72a6
HADOOP-18929. Exclude commons-compress module-info.class (#6170)
Contributed By: PJ Fanning
2023-10-11 12:50:37 -05:00
Steve Vaughan
73eccd6d7c
HDFS-16740. Mini cluster test flakiness (#4835)
2023-10-10 13:51:46 -07:00
huhaiyang
85af6c3a28
HDFS-17217. Add lifeline RPC start up log when NameNode#startCommonServices (#6154). Contributed by Haiyang Hu.
Reviewed-by: Shilun Fan <slfan1989@apache.org>
Reviewed-by: Tao Li <tomscut@apache.org>
Signed-off-by: Shuyan Zhang <zhangshuyan@apache.org>
2023-10-10 10:20:07 +08:00
slfan1989
b00d605832
YARN-9048. Add znode hierarchy in Federation ZK State Store. (#6016)
2023-10-09 14:06:41 -07:00
Anuj Modi
594e9f29f5
HADOOP-18869: [ABFS] Fix behavior of a File System APIs on root path (#6003)
Contributed by Anuj Modi
2023-10-09 20:05:23 +01:00
Steve Loughran
882378c3e9
Revert "HADOOP-18869: [ABFS] Fix behavior of a File System APIs on root path (#6003)"
This reverts commit 6c6df40d35.

...so as to give the correct credit
2023-10-09 20:05:07 +01:00
Anuj Modi
6c6df40d35
HADOOP-18869: [ABFS] Fix behavior of a File System APIs on root path (#6003)
Contributed by Anmol Asrani
2023-10-09 20:01:56 +01:00
Anmol Asrani
9c621fcea7
HADOOP-18861. ABFS: Fix failing tests for CPK (#5979)
Contributed by Anmol Asrani
2023-10-09 17:40:15 +01:00
Anmol Asrani
666af58700
HADOOP-18876. ABFS: Change default for fs.azure.data.blocks.buffer to bytebuffer (#6009)
The default value for fs.azure.data.blocks.buffer is changed from "disk" to "bytebuffer"

This will speed up writing to azure storage, at the risk of running out of memory
-especially if there are many threads writing to abfs at the same time and the
upload bandwidth is limited.

If jobs do run out of memory writing to abfs, change the option back to "disk"

Contributed by Anmol Asrani
2023-10-09 16:51:12 +01:00
hfutatzhanghb
ea3cb12ec8
HDFS-17171. CONGESTION_RATIO should be configurable (#5996)
Reviewed-by: Ayush Saxena <ayushsaxena@apache.org>
Signed-off-by: Tao Li <tomscut@apache.org>
2023-10-08 10:36:09 +08:00
slfan1989
42b32fbbdc
YARN-11583. Improve Node Link for YARN Federation Web Page. (#6145) Contributed by Shilun Fan.
Reviewed-by: Inigo Goiri <inigoiri@apache.org>
Signed-off-by: Shilun Fan <slfan1989@apache.org>
2023-10-08 08:20:11 +08:00
Colm O hEigeartaigh
ee1ebbe5f9
HADOOP-18923. Switch to SPDX identifier for license name (#6149). Contributed by Colm O hEigeartaigh.
Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
2023-10-07 22:50:38 +05:30
huangzhaobo
daa78adc88
HDFS-17200. Add some datanode related metrics to Metrics.md. (#6099). Contributed by huangzhaobo99
Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
2023-10-06 12:40:44 +05:30
huhaiyang
4c408a557f
HDFS-17205. HdfsServerConstants.MIN_BLOCKS_FOR_WRITE should be configurable (#6112). Contributed by Haiyang Hu
Reviewed-by: He Xiaoqiao <hexiaoqiao@apache.org>
Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
2023-10-06 12:39:23 +05:30
PJ Fanning
57100bba1b
HADOOP-18917. Addendum: Upgrade to commons-io 2.14.0 (#6152). Contributed by PJ Fanning
Co-authored-by: Ayush Saxena <ayushsaxena@apache.org>
2023-10-06 09:40:32 +05:30
PJ Fanning
2bf5a9ed11
HADOOP-18917. Upgrade to commons-io 2.14.0 (#6133). Contributed by PJ Fanning
Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
2023-10-06 01:58:21 +05:30
slfan1989
f3a27f2b22
YARN-11579. Fix 'Physical Mem Used' and 'Physical VCores Used' are not displaying data. (#6123) Contributed by Shilun Fan.
Reviewed-by: Inigo Goiri <inigoiri@apache.org>
Signed-off-by: Shilun Fan <slfan1989@apache.org>
2023-10-05 14:20:40 +08:00
Anmol Asrani
ababe3d9b0
HADOOP-18875. ABFS: Add sendMs and recvMs information for each AbfsHttpOperation by default. (#6008)
Contributed By: Anmol Asrani
2023-10-04 13:55:03 -05:00
huhaiyang
5edd21bc85
HDFS-17194. Enhance the log message for striped block recovery (#6094)
2023-10-04 11:22:59 -07:00
xiaojunxiang
0cfffb3012
HDFS-17214. RBF: The Quota class' andByStorageType method res has an incorrect initial value. (#6135)
Co-authored-by: xiaojunxiang <xiaojunxiang@kingsoft.com>
2023-10-03 08:26:03 -07:00
slfan1989
fe3984aa01
YARN-11580. YARN Router Web supports displaying information for Non-Federation. (#6127) Contributed by Shilun Fan.
Reviewed-by: Inigo Goiri <inigoiri@apache.org>
Signed-off-by: Shilun Fan <slfan1989@apache.org>
2023-10-03 18:21:52 +08:00
Tamas Domok
a04a9e107b
YARN-11578. Cache fs supports chmod in LogAggregationFileController. (#6120)
2023-10-02 15:20:47 +02:00
Wang Yu
b87180568b
HDFS-17209. Correct comments to align with the code (#6110). Contributed by Yu Wang.
Reviewed-by: Inigo Goiri <inigoiri@apache.org>
Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
2023-10-01 17:30:59 +05:30
zhengchenyu
b8815fe68b
MAPREDUCE-7453. Revert HADOOP-18649. (#6102). Contributed by zhengchenyu.
In container-log4j.properties, log4j.appender.{APPENDER}.MaxFileSize is set to ${yarn.app.container.log.filesize}, but yarn.app.container.log.filesize is 0 by default, so log output goes missing: the log is always rolling and only shows the latest entries.
2023-10-01 17:25:32 +05:30
slfan1989
5f47f091a2
YARN-11537. [Addendum][Federation] Router CLI Supports List SubClusterPolicyConfiguration Of Queues. (#6121) Contributed by Shilun Fan.
Reviewed-by: Inigo Goiri <inigoiri@apache.org>
Signed-off-by: Shilun Fan <slfan1989@apache.org>
2023-09-29 16:59:01 +08:00
ConfX
8931393302
HDFS-17133: TestFsDatasetImpl missing null check when cleaning up (#6079). Contributed by ConfX.
Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
2023-09-29 12:06:24 +05:30
xiaojunxiang
390cd294f8
HDFS-17211. Fix comments in the RemoteParam class. (#6124). Contributed by hellosrc.
Reviewed-by: Xing Lin <linxingnku@gmail.com>
Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
2023-09-29 11:55:59 +05:30
slfan1989
1d2afc5cf6
YARN-8862. [BackPort] [GPG] Add Yarn Registry cleanup in ApplicationCleaner. (#6083) Contributed by Shilun Fan.
Reviewed-by: Inigo Goiri <inigoiri@apache.org>
Signed-off-by: Shilun Fan <slfan1989@apache.org>
2023-09-29 07:15:53 +08:00
PJ Fanning
35c42e4039
HADOOP-18912. upgrade snappy-java to 1.1.10.4 (#6115). Contributed by PJ Fanning.
Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
2023-09-28 11:22:31 +05:30
Szilard Nemeth
2d871fab78
MAPREDUCE-7456. Extend add-opens flag to container launch commands on JDK17 nodes. Contributed by Peter Szucs
2023-09-27 22:33:45 -04:00
Masatake Iwasaki
0c153fe465
YARN-11558. Fix dependency convergence error on hbase2 profile. (#6017)
2023-09-28 10:17:29 +09:00
Szilard Nemeth
d9cb76ac98
YARN-11468. Zookeeper SSL/TLS support. Contributed by Ferenc Erdelyi
2023-09-27 18:21:45 -04:00
Tamas Domok
f232eec490
YARN-11522. Update the documentation with the YARN-11000 changes. (#5870)
2023-09-27 16:43:28 +02:00
zhangshuyan
26a5f38250
HDFS-17204. EC: Reduce unnecessary log when processing excess redundancy. (#6107). Contributed by Shuyan Zhang.
Reviewed-by: Haiyang Hu <haiyang.hu@shopee.com>
Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>
2023-09-26 18:49:47 +08:00
slfan1989
3de66f5c40
YARN-11547. [Federation] Router Supports Remove individual application records from FederationStateStore. (#6055)
2023-09-25 13:52:57 -07:00
Benjamin Teke
f51162d70b
YARN-11514. Extend SchedulerResponse with capacityVector (#5989)
Co-authored-by: Benjamin Teke <bteke@cloudera.com>
2023-09-25 16:24:03 +02:00
slfan1989
bf9975a1b3
YARN-9586. Need more doc for yarn.federation.policy-manager-params when LoadBasedRouterPolicy is used. (#6085) Contributed by Shilun Fan.
Reviewed-by: Inigo Goiri <inigoiri@apache.org>
Signed-off-by: Shilun Fan <slfan1989@apache.org>
2023-09-25 13:23:02 +08:00
zhangshuyan
ecee022e49
HDFS-17197. Show file replication when listing corrupt files. (#6095). Contributed by Shuyan Zhang.
Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>
2023-09-25 13:01:25 +08:00
Szilard Nemeth
13c5825c00
YARN-11573. Add config option to make container allocation prefer nodes without reserved containers (#6098)
2023-09-22 20:00:50 +02:00
K0K0V0K
0780710f25
YARN-11567 - Aggregate container launch debug artifacts on error (#6053)
2023-09-22 15:09:17 +02:00
huhaiyang
cc66683b1a
HDFS-17184. Improve BlockReceiver to throws DiskOutOfSpaceException when initialize. (#6044). Contributed by Haiyang Hu.
Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>
2023-09-21 21:45:30 +08:00
Viraj Jasani
27cb551821
HADOOP-18829. S3A prefetch LRU cache eviction metrics (#5893)
Contributed by: Viraj Jasani
2023-09-21 14:31:44 +05:30
slfan1989
42b8e6faa7
YARN-11570. Add YARN_GLOBALPOLICYGENERATOR_HEAPSIZE to yarn-env for GPG. (#6086)
2023-09-20 17:11:59 -07:00
Jian Zhang
d273c13ab5
HDFS-17198. RBF: fix bug of getRepresentativeQuorum when records have same dateModified (#6096)
2023-09-20 10:04:29 -07:00
Syed Shameerur Rahman
5512c9f924
HADOOP-18797. Support Concurrent Writes With S3A Magic Committer (#6006)
Jobs which commit their work to S3 through the
magic committer now use a unique magic directory
containing the job ID:
 __magic_job-${jobid}

This allows for multiple jobs to write
to the same destination simultaneously.

Contributed by Syed Shameerur Rahman
2023-09-20 11:26:42 +01:00
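
The per-job naming from HADOOP-18797 above can be illustrated with a small sketch. Only the `__magic_job-${jobid}` pattern comes from the commit message. The class name, the helper `magicDir`, the bucket URL, the job IDs, and the placement of the directory directly under the destination are all assumptions made for illustration.

```java
// Hypothetical sketch of per-job magic directory naming; not the S3A code.
class MagicPathSketch {

  /** Build a job-specific magic directory under the destination (assumed layout). */
  static String magicDir(String destination, String jobId) {
    // Drop a trailing slash so the result never contains "//".
    String base = destination.endsWith("/")
        ? destination.substring(0, destination.length() - 1)
        : destination;
    return base + "/__magic_job-" + jobId;
  }

  public static void main(String[] args) {
    String dest = "s3a://example-bucket/output"; // illustrative destination
    // Two jobs writing to the same destination get distinct directories.
    System.out.println(magicDir(dest, "job_0001"));
    System.out.println(magicDir(dest, "job_0002"));
  }
}
```

Since each job's pending data lives under its own directory, one job can write to and clean up its directory without touching another job's in-progress output at the same destination.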
Pranav Saxena
f24b73e5f3
HADOOP-18873. ABFS: AbfsOutputStream doesn't close DataBlocks object. (#6010)
AbfsOutputStream to close the dataBlock object created for the upload.

Contributed By: Pranav Saxena
2023-09-20 14:24:36 +05:30