27116 Commits

Author SHA1 Message Date
Ferenc Erdelyi
8243da8cb0
YARN-11639. CME and NPE in PriorityUtilizationQueueOrderingPolicy (#6455) 2024-01-22 15:41:48 +01:00
hfutatzhanghb
54f7a6b127
HDFS-17293. First packet data + checksum size will be set to 516 bytes when writing to a new block. (#6368). Contributed by farmmamba.
Reviewed-by: He Xiaoqiao <hexiaoqiao@apache.org>
Signed-off-by: Shuyan Zhang <zhangshuyan@apache.org>
2024-01-22 11:50:51 +08:00
Pranav Saxena
7dc166ddc7
HADOOP-18883. [ABFS]: Expect-100 JDK bug resolution: prevent multiple server calls (#6022)
Address JDK bug JDK-8314978 related to handling of HTTP 100
responses. 

https://bugs.openjdk.org/browse/JDK-8314978

In AbfsHttpOperation, after sendRequest() we call the processResponse()
method from AbfsRestOperation.
Even if conn.getOutputStream() fails due to an Expect-100 error,
we consume the exception and let the code go ahead.
This may call getHeaderField() / getHeaderFields() / getHeaderFieldLong() after
getOutputStream() has failed. These invocations all lead to server calls.

This commit aims to prevent this.
If connection.getOutputStream() fails due to an Expect-100 error,
the ABFS client does not invoke getHeaderField(), getHeaderFields(),
getHeaderFieldLong() or getInputStream().

getResponseCode() is safe because, on failure, it sets the
responseCode variable in the HttpURLConnection object.
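
The guard described above can be sketched roughly as follows. This is not the ABFS code: the `Connection` interface is a hypothetical stand-in for the part of `HttpURLConnection` involved, and the `expect100Failed` flag and the `GuardedOperation` class are names invented for this illustration. Only the method names `getOutputStream()`, `getHeaderField()` and `getResponseCode()` come from the commit message.

```java
import java.io.IOException;

/** Hypothetical stand-in for the subset of HttpURLConnection used in this sketch. */
interface Connection {
    void getOutputStream() throws IOException;   // may fail with an Expect-100 error
    String getHeaderField(String name);          // would trigger a server call
    int getResponseCode() throws IOException;    // described as safe after the failure
}

/** Sketch: record the Expect-100 failure and skip header reads afterwards. */
class GuardedOperation {
    private final Connection conn;
    private boolean expect100Failed;   // hypothetical flag, not taken from the source

    GuardedOperation(Connection conn) {
        this.conn = conn;
    }

    void sendRequest() {
        try {
            conn.getOutputStream();
        } catch (IOException e) {
            // The exception is consumed, but the failure is remembered.
            expect100Failed = true;
        }
    }

    /** Returns the header value, or null without touching the connection after a failure. */
    String header(String name) {
        return expect100Failed ? null : conn.getHeaderField(name);
    }

    /** The response code is still read after a failure. */
    int status() throws IOException {
        return conn.getResponseCode();
    }
}
```

In this sketch the caller sees `null` for any header once the output stream has failed, so no further server call is triggered by header access.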

Contributed by Pranav Saxena
2024-01-21 19:14:54 +00:00
Steve Loughran
d274f778c1
HADOOP-19046. S3A: update AWS V2 SDK to 2.23.5; v1 to 1.12.599 (#6467)
This update ensures that the timeout set in fs.s3a.connection.request.timeout is passed down
to calls to CreateSession made in the AWS SDK to get S3 Express session tokens.

Contributed by Steve Loughran
2024-01-21 19:00:34 +00:00
PJ Fanning
76691dfa14
HADOOP-18894: upgrade sshd-core due to CVEs (#6060) Contributed by PJ Fanning.
Reviewed-by: He Xiaoqiao <hexiaoqiao@apache.org>
Reviewed-by: Steve Loughran <stevel@cloudera.com>
Signed-off-by: Shilun Fan <slfan1989@apache.org>
2024-01-21 08:13:25 +08:00
LiuGuH
2a1ee8dfcd
HDFS-17311. RBF: ConnectionManager creatorQueue should offer a pool that is not already in creatorQueue. (#6392) Contributed by liuguanghua.
Reviewed-by: Inigo Goiri <inigoiri@apache.org>
Reviewed-by: Shilun Fan <slfan1989@apache.org>
Signed-off-by: Shilun Fan <slfan1989@apache.org>
2024-01-20 07:55:23 +08:00
slfan1989
15e1789baf
Revert "HDFS-16016. BPServiceActor to provide new thread to handle IBR (#2998)" (#6457) Contributed by Shilun Fan.
This reverts commit c1bf3cb0.

Reviewed-by: Takanobu Asanuma <tasanuma@apache.org>
Reviewed-by: He Xiaoqiao <hexiaoqiao@apache.org>
Reviewed-by: Ayush Saxena <ayushsaxena@apache.org>
Reviewed-by: Viraj Jasani <vjasani@apache.org>
Signed-off-by: Shilun Fan <slfan1989@apache.org>
2024-01-20 07:51:55 +08:00
Susheel Gupta
d0df0689b4
YARN-11607: TestTimelineAuthFilterForV2 fails intermittently (#6459) Contributed by Susheel Gupta.
Signed-off-by: Shilun Fan <slfan1989@apache.org>
2024-01-20 07:42:08 +08:00
Jian Zhang
1036544480
HDFS-17302. RBF: ProportionRouterRpcFairnessPolicyController-Sharing and isolation. (#6380) 2024-01-19 14:02:21 -08:00
slfan1989
8444f69511
Preparing for 3.5.0 development (#6411)
Co-authored-by: slfan1989 <slfan1989@apache.org>
2024-01-19 15:05:22 +08:00
Xing Lin
27ecc23ae7
HDFS-17332 DFSInputStream: avoid logging stacktrace until when we really need to fail a read request with a MissingBlockException (#6446)
Print a warn log message for read retries and only print the full stack trace for a read request failure.

Contributed by: Xing Lin
2024-01-18 18:03:28 -08:00
Lei313
cc4c4be1b7
HDFS-17331:Fix Blocks are always -1 and DataNode version are always UNKNOWN in federationhealth.html (#6429). Contributed by lei w.
Signed-off-by: Shuyan Zhang <zhangshuyan@apache.org>
2024-01-18 21:10:54 +08:00
slfan1989
4c3d4e6a57
HADOOP-19038. Improve create-release RUN script. (#6448) Contributed by Shilun Fan.
Reviewed-by: Steve Loughran <stevel@cloudera.com>
Reviewed-by: He Xiaoqiao <hexiaoqiao@apache.org>
Signed-off-by: Shilun Fan <slfan1989@apache.org>
2024-01-18 19:12:12 +08:00
PJ Fanning
04e447cfa7
YARN-11647. use StandardCharsets.UTF_8 (#6447) Contributed by PJ Fanning.
Signed-off-by: Shilun Fan <slfan1989@apache.org>
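
As a rough illustration of the kind of change the title names (the actual YARN diff is not shown here), the sketch below encodes and decodes text with the `StandardCharsets.UTF_8` constant. The `Utf8Sketch` class and its methods are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

class Utf8Sketch {
    /** Encodes with the charset constant; no checked UnsupportedEncodingException, unlike getBytes("UTF-8"). */
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    /** Decodes bytes that were written as UTF-8. */
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```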
2024-01-18 13:53:18 +08:00
hfutatzhanghb
ba6ada73ac
HDFS-17337. RPC RESPONSE time seems not exactly accurate when using FSEditLogAsync. (#6439). Contributed by farmmamba.
Reviewed-by: Tao Li <tomscut@apache.org>
Signed-off-by: Shuyan Zhang <zhangshuyan@apache.org>
2024-01-18 11:10:05 +08:00
Steve Loughran
eeb657e85f
HADOOP-19033. S3A: disable checksums when fs.s3a.checksum.validation = false (#6441)
Add new option fs.s3a.checksum.validation, default false, which
is used when creating s3 clients to enable/disable checksum
validation.

When false, GET response processing is measurably faster.

Contributed by Steve Loughran.
2024-01-17 18:34:14 +00:00
Hexiaoqiao
9634bd31e6
HADOOP-19031. Enhance access control for RunJar. (#6427). Contributed by He Xiaoqiao.
Signed-off-by: Shuyan Zhang <zhangshuyan@apache.org>
Signed-off-by: Shilun Fan <slfan1989@apache.org>
Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
2024-01-17 15:00:06 +08:00
Mukund Thakur
7b1570e2f1
HADOOP-19015. Increase fs.s3a.connection.maximum to 500 to minimize risk of Timeout waiting for connection from pool. (#6372)

Contributed By: Mukund Thakur
2024-01-16 17:06:28 -06:00
Steve Loughran
d378853790
HADOOP-18975 S3A: Add option fs.s3a.endpoint.fips to use AWS FIPS endpoints (#6277)
Adds a new option `fs.s3a.endpoint.fips` to switch the SDK client to use
FIPS endpoints, as an alternative to explicitly declaring them.


* The option is available as a path capability for probes.
* SDK v2 itself doesn't know that some regions don't have FIPS endpoints
* SDK only fails with endpoint + fips flag as a retried exception; with this
  change the S3A client should fail fast.
* Adds a new "connecting.md" doc; moves existing docs there and restructures.
* New Tests in ITestS3AEndpointRegion

bucket-info command support:

* added to list of path capabilities
* added -fips flag and test for explicit probe
* also now prints bucket region
* and removed some of the obsolete s3guard options
* updated docs

Contributed by Steve Loughran
2024-01-16 14:16:12 +00:00
Steve Loughran
36198b5edf
HADOOP-19027. S3A: S3AInputStream doesn't recover from HTTP/channel exceptions (#6425)
Differentiate "EOF out of range/end of GET" from
"EOF channel problems" through
two different subclasses of EOFException, so that input streams always
retry on HTTP channel errors; out of range GET requests are not retried.
Currently an EOFException is always treated as a fail-fast call in read().

This allows for all existing external code catching EOFException to handle
both, but S3AInputStream to cleanly differentiate range errors (map to -1)
from channel errors (retry)

- HttpChannelEOFException is subclass of EOFException, so all code
  which catches EOFException is still happy.
  retry policy: connectivityFailure
- RangeNotSatisfiableEOFException is the subclass of EOFException
  raised on 416 GET range errors.
  retry policy: fail
- Method ErrorTranslation.maybeExtractChannelException() to create this
  from shaded/unshaded NoHttpResponseException, using string match to
  avoid classpath problems.
- And do this for SdkClientExceptions with OpenSSL error code WFOPENSSL0035.
  We believe this is the OpenSSL equivalent.
- ErrorTranslation.maybeExtractIOException() to perform this translation as
  appropriate.

S3AInputStream.reopen() code retries on EOF, except on
RangeNotSatisfiableEOFException, which is converted to a -1 response
to the caller, as is done historically.

S3AInputStream knows to handle these with
- read(): HttpChannelEOFException: stream aborting close then retry
- lazySeek(): Map RangeNotSatisfiableEOFException to -1, but do not map
  any other EOFException class raised.

This means that
* out of range reads map to -1
* channel problems in reopen are retried
* channel problems in read() abort the failed http connection so it
  isn't recycled
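
The policy summarised in the list above can be sketched as follows. The two exception classes here are simplified re-declarations carrying the names from the commit message, not the S3A classes themselves; `ByteSource`, `EofHandlingSketch` and `readWithPolicy` are hypothetical names, and the retry limit is an assumption made for this example.

```java
import java.io.EOFException;
import java.io.IOException;

/** Channel-level EOF: the commit message says these are retried. */
class HttpChannelEOFException extends EOFException {
    HttpChannelEOFException(String message) {
        super(message);
    }
}

/** EOF raised on a 416 range error: the commit message says these are not retried. */
class RangeNotSatisfiableEOFException extends EOFException {
    RangeNotSatisfiableEOFException(String message) {
        super(message);
    }
}

/** Hypothetical source of bytes, used only in this sketch. */
interface ByteSource {
    int read() throws IOException;
}

class EofHandlingSketch {
    /** Reads one byte; maps range errors to -1 and retries channel errors up to maxRetries times. */
    static int readWithPolicy(ByteSource source, int maxRetries) throws IOException {
        int attempts = 0;
        while (true) {
            try {
                return source.read();
            } catch (RangeNotSatisfiableEOFException e) {
                return -1;                 // out of range: report end of stream
            } catch (HttpChannelEOFException e) {
                if (attempts++ >= maxRetries) {
                    throw e;               // retries exhausted
                }
                // A real stream would abort and reopen the connection before retrying.
            }
        }
    }
}
```

Because both classes extend `EOFException`, code that catches `EOFException` keeps working, while the stream itself can tell the two cases apart.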

Tests for this using/abusing mocking.

Testing through actually raising 416 exceptions and verifying that
readFully(), char read() and vector reads are all good.

There is no attempt to recover within a readFully(); there's
a boolean constant switch to turn this on, but if anyone enables
it, a test will spin forever, as the inner PositionedReadable.read(position, buffer, len)
downgrades all EOF exceptions to -1.
A new method would need to be added which controls whether to downgrade/rethrow
exceptions.

What does that mean? Possibly reduced resilience to non-retried failures
on the inner stream, even though more channel exceptions are retried on.

Contributed by Steve Loughran
2024-01-16 14:14:03 +00:00
slfan1989
6652922333
HADOOP-19040. mvn site commands fails due to MetricsSystem And MetricsSystemImpl changes. (#6450) Contributed by Shilun Fan.
Reviewed-by: Steve Loughran <stevel@cloudera.com>
Signed-off-by: Shilun Fan <slfan1989@apache.org>
2024-01-16 22:11:16 +08:00
slfan1989
827e33601e
YARN-11638. [GPG] GPG Support CLI. (#6396) Contributed by Shilun Fan.
Reviewed-by: Inigo Goiri <inigoiri@apache.org>
Signed-off-by: Shilun Fan <slfan1989@apache.org>
2024-01-16 21:49:51 +08:00
Benjamin Teke
f6fea5da2a
MAPREDUCE-7468. [Addendum] Fix TestMapReduceChildJVM unit tests. (#6451) 2024-01-15 14:24:56 +01:00
slfan1989
6ebce65ae8
YARN-11634. [Addendum] Speed-up TestTimelineClient. (#6419)
Co-authored-by: slfan1989 <slfan1989@apache.org>
2024-01-15 08:44:17 +01:00
slfan1989
0f8b74b03f
HADOOP-19034. Fix Download Maven Url Not Found. (#6438). Contributed by Shilun Fan.
Reviewed-by: Steve Loughran <stevel@apache.org>
Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>
2024-01-14 18:30:40 +08:00
hfutatzhanghb
a30681077b
HDFS-17291. DataNode metric bytesWritten is not totally accurate in some situations. (#6360). Contributed by farmmamba.
Reviewed-by: huangzhaobo <huangzhaobo99@126.com>
Signed-off-by: Shuyan Zhang <zhangshuyan@apache.org>
2024-01-13 20:45:00 +08:00
hfutatzhanghb
ead7b7f565
HDFS-17289. Considering the size of non-lastBlocks equals to complete block size can cause append failure. (#6357). Contributed by farmmamba.
Reviewed-by: Haiyang Hu <haiyang.hu@shopee.com>
Reviewed-by: huangzhaobo <huangzhaobo99@126.com>
Signed-off-by: Shuyan Zhang <zhangshuyan@apache.org>
2024-01-13 20:34:02 +08:00
Steve Loughran
2f1e1558b6
HADOOP-19004. S3A: Support Authentication through HttpSigner API (#6324)
Move to the new auth flow based signers for aws.
* Implement a new Signer Initialization Chain
* Add a new instantiation method
* Add a new test
* Fix Reflection Code for SignerInitialization

Contributed by Harshit Gupta
2024-01-11 17:13:31 +00:00
Xing Lin
453e264eb4
HADOOP-18981. Move oncrpc and portmap packages to hadoop-common (#6280)
Move the org.apache.hadoop.{oncrpc, portmap} packages from the hadoop-nfs module
to the hadoop-common module.

This allows for use of the protocol beyond just NFS, including within HDFS itself.

Contributed by Xing Lin
2024-01-11 14:06:15 +00:00
Benjamin Teke
ef636c4278
MAPREDUCE-7468: Change add-opens flag's default value from true to false (#6436)
Co-authored-by: Benjamin Teke <bteke@cloudera.com>
2024-01-11 14:51:59 +01:00
hfutatzhanghb
6a053765ee
HDFS-17312. packetsReceived metric should ignore heartbeat packet. (#6394)
Signed-off-by: Takanobu Asanuma <tasanuma@apache.org>
2024-01-11 22:08:37 +09:00
Tamas Domok
55b9f87698
YARN-11646. Do not ignore zero memory capacity config in QueueCapacityConfigParser. (#6433) 2024-01-11 13:47:00 +01:00
slfan1989
bc159b5a87
YARN-10125. [Federation] Kill application from client does not kill Unmanaged AM's and containers launched by Unmanaged AM. (#6363) Contributed by Shilun Fan.
Reviewed-by: Inigo Goiri <inigoiri@apache.org>
Signed-off-by: Shilun Fan <slfan1989@apache.org>
2024-01-11 20:01:59 +08:00
xuzifu666
99a59ae9e6
HDFS-17317. Improve the resource release for metaOut in DebugAdmin (#6402). Contributed by xy.
Reviewed-by: Shilun Fan <slfan1989@apache.org>
Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
2024-01-07 00:59:31 +05:30
slfan1989
64beecb7cb
YARN-11631. [GPG] Add GPGWebServices. (#6354) Contributed by Shilun Fan.
Reviewed-by: Inigo Goiri <inigoiri@apache.org>
Signed-off-by: Shilun Fan <slfan1989@apache.org>
2024-01-06 17:50:20 +08:00
slfan1989
60033fd581
YARN-11642. Fix Flaky Test TestTimelineAuthFilterForV2#testPutTimelineEntities. (#6417) Contributed by Shilun Fan.
Reviewed-by: Ayush Saxena <ayushsaxena@apache.org>
Signed-off-by: Shilun Fan <slfan1989@apache.org>
2024-01-06 16:26:01 +08:00
huangzhaobo
08713665c0
HDFS-17315. Optimize the namenode format code logic. (#6400)
Signed-off-by: Takanobu Asanuma <tasanuma@apache.org>
2024-01-06 01:47:17 +09:00
LiuGuH
5f9932acc4
HDFS-17325. Fix the documentation of fs expunge command in FileSystemShell.md. (#6413) Contributed by liuguanghua.
Reviewed-by: Ayush Saxena <ayushsaxena@apache.org>
Signed-off-by: Shilun Fan <slfan1989@apache.org>
2024-01-05 18:42:55 +08:00
LiuGuH
2369f0cddb
HDFS-17309. RBF: Fix Router Safemode check condition error (#6390) Contributed by liuguanghua.
Reviewed-by: Inigo Goiri <inigoiri@apache.org>
Reviewed-by: Simbarashe Dzinamarira <sdzinamarira@linkedin.com>
Signed-off-by: Shilun Fan <slfan1989@apache.org>
2024-01-05 18:36:28 +08:00
Lei Yang
661c784662
HDFS-17290: Adds disconnected client rpc backoff metrics (#6359) 2024-01-04 20:24:10 -08:00
LiuGuH
7d3b6a36b8
HDFS-17306. RBF: Router should not return nameservices that does not enable observer nodes in RpcResponseHeaderProto (#6385) 2024-01-04 14:43:11 -08:00
hfutatzhanghb
8c26d4e9e0
HDFS-17322. Renames RetryCache#MAX_CAPACITY to be MIN_CAPACITY to fit usage. 2024-01-04 14:31:53 -08:00
hfutatzhanghb
d5468d84ba
HDFS-17283. Change the name of variable SECOND in HdfsClientConfigKeys. (#6339). Contributed by farmmamba.
Reviewed-by: Xing Lin <xinglin@linkedin.com>
Signed-off-by: Shuyan Zhang <zhangshuyan@apache.org>
2024-01-04 19:53:47 +08:00
huhaiyang
7a7db7f0dc
HDFS-17310. DiskBalancer: Enhance the log message for submitPlan (#6391) Contributed by Haiyang Hu.
Reviewed-by: Ashutosh Gupta <ashugpt@amazon.com>
Reviewed-by: Takanobu Asanuma <tasanuma@apache.org>
Signed-off-by: Shilun Fan <slfan1989@apache.org>
2024-01-04 00:07:51 +08:00
LiuGuH
335587df9e
Add synchronized on lockLeakCheck() because threadCountMap is not thread safe. (#6029)
Co-authored-by: lgh <liuguanghua@kanzhun.com>
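
A minimal sketch of the idea in the title, assuming the map is a plain `HashMap` shared between threads. The real `lockLeakCheck()` body is not shown here; `LockLeakTracker`, `record` and `count` are hypothetical names, and only `threadCountMap` is taken from the commit title.

```java
import java.util.HashMap;
import java.util.Map;

class LockLeakTracker {
    // HashMap is not thread safe, so every access goes through a synchronized method.
    private final Map<String, Integer> threadCountMap = new HashMap<>();

    /** Increments the counter for a thread name while holding the object's monitor. */
    synchronized void record(String threadName) {
        threadCountMap.merge(threadName, 1, Integer::sum);
    }

    /** Reads the counter under the same monitor, so the latest update is visible. */
    synchronized int count(String threadName) {
        return threadCountMap.getOrDefault(threadName, 0);
    }
}
```

Marking the methods `synchronized` serialises all access through one monitor, which is the simplest fix when contention on the map is low.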
2024-01-03 21:12:38 +08:00
Anuj Modi
e3c135b0b3
HADOOP-18971. [ABFS] Read and cache file footer with fs.azure.footer.read.request.size (#6270)
The option fs.azure.footer.read.request.size sets the size of the footer to
read and cache; the default value of 524288 has been measured to
be good for most workloads running on parquet, ORC and similar file formats.
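
As a worked illustration of what a footer read size implies, the sketch below computes where a footer read would start for a given file length. It assumes the footer is simply the last `footerSize` bytes of the file; how ABFS actually selects the range is not stated in the commit message. `FooterRange` and `footerStart` are hypothetical names, and only the default of 524288 comes from the text above.

```java
class FooterRange {
    /** Default value of fs.azure.footer.read.request.size quoted in the commit message. */
    static final long DEFAULT_FOOTER_READ_SIZE = 524288L;

    /** Offset at which a read of the last footerSize bytes starts; 0 for files shorter than that. */
    static long footerStart(long fileLength, long footerSize) {
        return Math.max(0L, fileLength - footerSize);
    }
}
```

Under this assumption, a file shorter than the footer size would be covered entirely by a single read starting at offset 0.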

Contributed by Anuj Modi
2024-01-03 12:49:52 +00:00
slfan1989
556fbcf025
YARN-11632. [Doc] Add allow-partial-result description to Yarn Federation documentation. (#6340) Contributed by Shilun Fan.
Reviewed-by: Inigo Goiri <inigoiri@apache.org>
Signed-off-by: Shilun Fan <slfan1989@apache.org>
2024-01-03 07:17:37 +08:00
Pranav Saxena
0b43026cab
HADOOP-17912. ABFS: Support for Encryption Context (#6221)
Contributed by Pranav Saxena and others.
2024-01-01 19:09:44 +00:00
Murali Krishna
9edcf42c78
HADOOP-18540. Upgrade Bouncy Castle to 1.70 (#5166)
This addresses
- [sonatype-2021-4916] CWE-327: Use of a Broken or Risky Cryptographic Algorithm
- [sonatype-2019-0673] CWE-400: Uncontrolled Resource Consumption ('Resource Exhaustion')

Contributed by Murali Krishna
2024-01-01 19:04:06 +00:00
Ayush Saxena
9a4d10763c
HADOOP-19020. Update the year to 2024. (#6397). Contributed by Ayush Saxena.
Reviewed-by: Ashutosh Gupta <ashugpt@amazon.com>
Reviewed-by: Shilun Fan <slfan1989@apache.org>
2024-01-01 12:51:54 +05:30