5743 Commits

Author SHA1 Message Date
Viraj Jasani
afb863acf4
HADOOP-18740. S3A prefetch cache blocks should be accessed by RW locks (#5675)
Contributed by Viraj Jasani
2023-06-08 16:34:41 +01:00
Doroszlai, Attila
60b37bbdf7
HADOOP-18433. Fix main thread name for . (#4838) (#5692)
(cherry picked from commit f68f1a45783d7eb3f982fadef800b58ab8b763f3)

Co-authored-by: zhengchenyu <zhengchenyu16@gmail.com>
2023-06-05 07:51:53 +02:00
Steve Loughran
c9f2c45209
HADOOP-18755. openFile builder new optLong() methods break hbase-filesystem (#5704)
This is a followup to
HADOOP-18724. Open file fails with NumberFormatException for S3AFileSystem

Contributed by Steve Loughran
2023-06-01 14:32:08 +01:00
Patrick GRANDJEAN
9029bba5dc
HADOOP-18652. Path.suffix raises NullPointerException (#5653). Contributed by Patrick Grandjean.
Reviewed-by: Wei-Chiu Chuang <weichiu@apache.org>
Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
2023-05-19 05:17:40 +05:30
Steve Loughran
ab594ec77e
HADOOP-18724. Open file fails with NumberFormatException for S3AFileSystem (#5611)
This:

1. Adds optLong, optDouble, mustLong and mustDouble
   methods to the FSBuilder interface to let callers explicitly
   passin long and double arguments.
2. The opt() and must() builder calls which take float/double values
   now only set long values instead, so as to avoid problems
   related to overloaded methods resulting in a ".0" being appended
   to a long value.
3. All of the relevant opt/must calls in the hadoop codebase move to
   the new methods
4. And the s3a code is resilient to parse errors in is numeric options
   -it will downgrade to the default.

This is nominally incompatible, but the floating-point builder methods
were never used: nothing currently expects floating point numbers.

For anyone who wants to safely set numeric builder options across all compatible
releases, convert the number to a string and then use the opt(String, String)
and must(String, String) methods.

Contributed by Steve Loughran
2023-05-16 13:41:17 +01:00
Viraj Jasani
949d5ca20b
HADOOP-18688. S3A audit header to include count of items in delete ops (#5621)
The auditor-generated http referrer URL now includes the count of keys
to delete in the "ks" query parameter

Contributed by Viraj Jasani
2023-05-16 10:41:52 +01:00
rohit-kb
771c89a83a
HADOOP-18687. Remove json-smart dependency. (#5549 + #5524)
Contains 

* HADOOP-18687. hadoop-auth: remove unnecessary dependency on json-smart (#5524)
 Contributed by Michiel de Jong
* HADOOP-18687. Remove json-smart dependency. (#5549).
  Contributed by PJ Fanning.
2023-05-09 17:34:36 +01:00
Wei-Chiu Chuang
99312bdfdb
HADOOP-18671. Add recoverLease(), setSafeMode(), isFileClosed() as interfaces to hadoop-common (#5553) (#5619)
* HADOOP-18671. Add recoverLease(), setSafeMode(), isFileClosed() as interfaces to hadoop-common (#5553)

The HDFS lease APIs have been replicated as interfaces in hadoop-common so other filesystems can
also implement them.  Applications which use the leasing APIs should migrate to the new
interface where possible.

Contributed by Stephen Wu

(cherry picked from commit 0e46388474f52cd38578d946c26ba762be05ef1f)

 Conflicts:
	hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
	hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/router/TestRouterRpc.java
	hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSUpgrade.java
	hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestEncryptionZones.java
	hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestViewDistributedFileSystem.java
	hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSImageWithAcl.java
	hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSImageWithSnapshot.java
	hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeRetryCacheMetrics.java
	hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/snapshot/TestFSImageWithOrderedSnapshotDeletion.java
	hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/snapshot/TestOrderedSnapshotDeletion.java
	hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/snapshot/TestSnapshotDeletion.java
	hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/TestOfflineImageViewer.java
	hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/TestOfflineImageViewerForErasureCodingPolicy.java

Co-authored-by: Tak Lon (Stephen) Wu <taklwu@apache.org>
2023-05-09 06:20:56 +08:00
Viraj Jasani
0ad7d7c677
HADOOP-18697. S3A prefetch: failure of ITestS3APrefetchingInputStream#testRandomReadLargeFile (#5580)
Contributed by Viraj Jasani
2023-05-02 15:45:37 +01:00
Ayush Saxena
a226016c52
HADOOP-18662. ListFiles with recursive fails with FNF. (#5477). Contributed by Ayush Saxena.
Reviewed-by: Steve Loughran <stevel@apache.org>
2023-05-02 20:12:22 +05:30
Pralabh Kumar
6b6bd82bf0
HADOOP-18715. Add debug log for getting details of tokenKindMap (#5608). Contributed by Pralabh Kumar.
Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
2023-04-29 17:30:00 +05:30
Viraj Jasani
05edfee1f3
HADOOP-18399. S3A Prefetch - SingleFilePerBlockCache to use LocalDirAllocator (#5054)
Contributed by Viraj Jasani
2023-04-28 12:03:30 +01:00
Ankit Saurabh
312b776833
HADOOP-18351. Reduce excess logging of errors during S3A prefetching reads (#5274)
Contributed by Ankit Saurabh
2023-04-28 12:03:30 +01:00
Alessandro Passaro
0f1a3f23a5
HADOOP-18378. Implement lazy seek in S3A prefetching. (#4955)
Make S3APrefetchingInputStream.seek() completely lazy. Calls to seek() will not affect the current buffer nor interfere with prefetching, until read() is called.

This change allows various usage patterns to benefit from prefetching, e.g. when calling readFully(position, buffer) in a loop for contiguous positions the intermediate internal calls to seek() will be noops and prefetching will have the same performance as in a sequential read.

Contributed by Alessandro Passaro.
2023-04-28 12:03:30 +01:00
Viraj Jasani
f07be3bec2
HADOOP-18455. S3A prefetching executor should be closed (#4879)
follow-on patch to HADOOP-18186. 

Contributed by: Viraj Jasani
2023-04-28 12:03:30 +01:00
Steve Loughran
4ce763a322
HADOOP-18028. High performance S3A input stream (#4752)
This is the the preview release of the HADOOP-18028 S3A performance input stream.
It is still stabilizing, but ready to test.

Contains

HADOOP-18028. High performance S3A input stream (#4109)
	Contributed by Bhalchandra Pandit.

HADOOP-18180. Replace use of twitter util-core with java futures (#4115)
	Contributed by PJ Fanning.

HADOOP-18177. Document prefetching architecture. (#4205)
	Contributed by Ahmar Suhail

HADOOP-18175. fix test failures with prefetching s3a input stream (#4212)
 Contributed by Monthon Klongklaew

HADOOP-18231.  S3A prefetching: fix failing tests & drain stream async.  (#4386)

	* adds in new test for prefetching input stream
	* creates streamStats before opening stream
	* updates numBlocks calculation method
	* fixes ITestS3AOpenCost.testOpenFileLongerLength
	* drains stream async
	* fixes failing unit test

	Contributed by Ahmar Suhail

HADOOP-18254. Disable S3A prefetching by default. (#4469)
	Contributed by Ahmar Suhail

HADOOP-18190. Collect IOStatistics during S3A prefetching (#4458)

	This adds iOStatisticsConnection to the S3PrefetchingInputStream class, with
	new statistic names in StreamStatistics.

	This stream is not (yet) IOStatisticsContext aware.

	Contributed by Ahmar Suhail

HADOOP-18379 rebase feature/HADOOP-18028-s3a-prefetch to trunk
HADOOP-18187. Convert s3a prefetching to use JavaDoc for fields and enums.
HADOOP-18318. Update class names to be clear they belong to S3A prefetching
	Contributed by Steve Loughran
2023-04-28 12:03:29 +01:00
Sebastian Baunsgaard
919c3f615b
HADOOP-18660. Filesystem Spelling Mistake (#5475).
Contributed by Sebastian Baunsgaard.

Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
2023-04-25 19:59:54 +01:00
Steve Loughran
21cf507db3
HADOOP-17450. Add Public IOStatistics API -missed backport (#5590)
This cherrypicks SemaphoredDelegatingExecutor HADOOP-17450 changes
from trunk somehow they didn't get into the main IOStatistics backport
to branch-3.3
2023-04-25 15:02:56 +01:00
Doroszlai, Attila
13d3cfd311
HADOOP-18714. Wrong StringUtils.join() called in AbstractContractRootDirectoryTest (#5588)
(cherry picked from commit 5b23224970b48d41adf96b3f5a520411792fe696)
2023-04-24 15:49:20 +02:00
Nikita Eshkeev
7a32e7cc38
HADOOP-18597. Simplify single node instructions for creating directories for Map Reduce. (#5305)
Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
2023-04-24 01:14:09 +05:30
Christos Bisias
57ff8bdb67 HADOOP-18691. Add a CallerContext getter on the Schedulable interface (#5540) 2023-04-20 10:13:33 -07:00
Steve Loughran
a505940a2f
HADOOP-18470. Hadoop 3.3.5 release wrap-up (#5558)
Post-release updates of the branches

* Add jdiff xml files from 3.3.5 release.
* Declare 3.3.5 as the latest stable release.
* Copy release notes.
2023-04-18 10:12:41 +01:00
Viraj Jasani
20d3b9cc46
HADOOP-18620 Avoid using grizzly-http-* APIs (#5356) (#5374) 2023-03-30 07:13:10 +08:00
Steve Loughran
0dd4e500b0
HADOOP-18661. Fix bin/hadoop usage script terminology. (#5473)
Followup to HADOOP-13209: s/slaves/r/workers in
the usage message you get when you type "bin/hadoop"

Contributed by Steve Loughran
2023-03-13 12:24:10 +00:00
rdingankar
94b3c6dd90 HDFS-16917 Add transfer rate quantile metrics for DataNode reads (#5397)
Co-authored-by: Ravindra Dingankar <rdingankar@linkedin.com>
2023-02-27 15:49:26 -08:00
Simbarashe Dzinamarira
5fe19a0f01 HDFS-16901: RBF: Propagates real user's username via the caller context, when a proxy user is being used. (#5346) 2023-02-24 13:32:23 -08:00
hchaverr
eab7215354
HADOOP-18535. Implement token storage solution based on MySQL
Fixes #1240

Signed-off-by: Owen O'Malley <oomalley@linkedin.com>
2023-02-22 14:02:13 -08:00
Steve Loughran
ee71318d72
HADOOP-18636 LocalDirAllocator cannot recover from directory tree deletion (#5412)
Even though DiskChecker.mkdirsWithExistsCheck() will create the directory tree,
it is only called *after* the enumeration of directories with available
space has completed.

Directories which don't exist are reported as having 0 space, therefore
the mkdirs code is never reached.

Adding a simple mkdirs() -without bothering to check the outcome-
ensures that if a dir has been deleted then it will be reconstructed
if possible. If it can't it will still have 0 bytes of space
reported and so be excluded from the allocation.

Contributed by Steve Loughran
2023-02-22 11:50:17 +00:00
Arnout Engelen
477f17be97
HADOOP-18627. Add stronger wording in 'secure mode' introduction (#5406)
Make it more clear that when deploying Hadoop 'secure mode' is generally not optional.

Contributed by Arnout Engelen
2023-02-17 16:31:21 +00:00
Bryan Beaudreault
aa6c51364a HADOOP-18215. Enhance WritableName to be able to return aliases for classes that use serializers (#4215) 2023-02-16 11:38:20 -08:00
Viraj Jasani
8c9c68c19e
HADOOP-18628. IPC Server Connection should log host name before returning VersionMismatch error (#5385)
Contributed by Viraj Jasani
2023-02-15 18:23:44 +00:00
Steve Loughran
cd2401d2cc
HADOOP-18470. More in the 3.3.5 index.html about security (#5383)
Expands on the comments in cluster config to tell people
they shouldn't be running a cluster without a private VLAN
in cloud, that Knox is good here, and unsecured clusters
without a VLAN are just computation-as-a-service to crypto miners

Contributed by Steve Loughran
2023-02-14 17:25:20 +00:00
Owen O'Malley
9e7a9fd46d HDFS-18324. Fix race condition in closing IPC connections. (#5371) 2023-02-10 13:56:52 -08:00
huhaiyang
de08baded6
HADOOP-18625. Fix method name of RPC.Builder#setnumReaders (#5301)
Changes method name of RPC.Builder#setnumReaders to setNumReaders()

The original method is still there, just marked deprecated.
It is the one which should be used when working with older branches.

Contributed by Haiyang Hu
2023-02-09 13:29:47 +00:00
gardenia
752f6d8213
HADOOP-18621. Resource leak in CryptoOutputStream.close() (#5347)
When closing we need to wrap the flush() in a try .. finally, otherwise
when flush throws it will stop completion of the remainder of the
close activities and in particular the close of the underlying wrapped
stream object resulting in a resource leak.

Contributed by Colm Dougan
2023-02-07 12:04:00 +00:00
Steve Vaughan
221221d6fb
HADOOP-18612. Avoid mixing canonical and non-canonical when performing comparisons (#5339)
Contributed by Steve Vaughan Jr
2023-02-06 18:30:45 +00:00
Steve Vaughan
7b6a69faaa
HADOOP-18279. Cancel fileMonitoringTimer even if trustManager isn't defined (#4789)
Co-authored-by: Steve Vaughan Jr <s_vaughan@apple.com>
2023-02-01 13:33:34 -08:00
Viraj Jasani
f3fa4af5dc
HADOOP-18592. Sasl connection failure should log remote address. (#5294)
Contributed by Viraj Jasani <vjasani@apache.org>

Signed-off-by: Chris Nauroth <cnauroth@apache.org>
Signed-off-by: Steve Loughran <stevel@apache.org>
Signed-off-by: Mingliang Liu <liuml07@apache.org>
2023-02-01 10:16:42 -08:00
Wei-Chiu Chuang
4836f1ec37
HADOOP-18584. [NFS GW] Fix regression after netty4 migration. (#5252)
Reviewed-by: Tsz-Wo Nicholas Sze <szetszwo@apache.org>
(cherry picked from commit 9d47108b50fb0cd79ca48e82077e57572d8873e6)
2023-02-01 05:33:01 -08:00
Ayush Saxena
73f3196db5
HADOOP-18604. Add compile platform in the hadoop version output. (#5327). Contributed by Ayush Saxena.
Signed-off-by: Chris Nauroth <cnauroth@apache.org>
2023-01-28 14:20:27 +05:30
PJ Fanning
ada06aa22e
HADOOP-18575: followup: try to avoid repeatedly hitting exceptions when transformer factories do not support attributes (#5253)
Part of HADOOP-18469 and the hardening of XML/XSL parsers.
Followup to the main HADOOP-18575 patch, to improve performance when
working with xml/xsl engines which don't support the relevant attributes.

Include this change when backporting.

Contributed by PJ Fanning.
2023-01-16 15:48:15 +00:00
huangxiaoping
f5e9901e6d HADOOP-18591. Fix a typo in Trash (#5291)
Signed-off-by: Tao Li <tomscut@apache.org>
Signed-off-by: Chris Nauroth <cnauroth@apache.org>
(cherry picked from commit a90e424d9ff30a0510e7a29adc01ebdc7754a20e)
2023-01-12 21:22:25 +00:00
Chengbing Liu
af96e0f5b3 HDFS-16872. Fix log throttling by declaring LogThrottlingHelper as static members (#5246)
Co-authored-by: Chengbing Liu <liuchengbing@qiyi.com>
Signed-off-by: Erik Krogen <xkrogen@apache.org>

(cherry picked from commit 4cf304de4520bac5be265501fdb056114b4154f5)
2023-01-10 10:04:05 -08:00
PJ Fanning
29b6df563b
HADOOP-18575. Make XML transformer factory more lenient (#5224)
Due diligence followup to
HADOOP-18469. Add secure XML parser factories to XMLUtils (#4940)

Contributed by P J Fanning
2022-12-18 12:26:11 +00:00
Chengbing Liu
bfc916e7b0 HADOOP-18567. LogThrottlingHelper: properly trigger dependent recorders in cases of infrequent logging (#5215)
Signed-off-by: Erik Krogen <xkrogen@apache.org>
Co-authored-by: Chengbing Liu <liuchengbing@qiyi.com>

(cherry picked from commit ca3526da9283500643479e784a779fb7898b6627)
2022-12-16 09:16:51 -08:00
Steve Loughran
65892a7759
HADOOP-18573. Improve error reporting on non-standard kerberos names (#5221)
The kerberos RPC does not declare any restriction on
characters used in kerberos names, though
implementations MAY be more restrictive.

If the kerberos controller supports use non-conventional
principal names *and the kerberos admin chooses to use them*
this can confuse some of the parsing.

The obvious solution is for the enterprise admins to "not do that"
as a lot of things break, bits of hadoop included.

Harden the hadoop code slightly so at least we fail more gracefully,
so people can then get in touch with their sysadmin and tell them
to stop it.
2022-12-15 11:44:12 +00:00
Mehakmeet Singh
1009d2560f
HADOOP-18574. Changing log level of IOStatistics increment to make the DEBUG logs less noisy (#5223)
Contributed by: Mehakmeet Singh
2022-12-15 11:43:49 +00:00
Steve Loughran
ba55f370a9
HADOOP-18526. Leak of S3AInstrumentation instances via hadoop Metrics references (#5144)
This has triggered an OOM in a process which was churning through s3a fs
instances; the increased memory footprint of IOStatistics amplified what
must have been a long-standing issue with FS instances being created
and not closed()

*  Makes sure instrumentation is closed when the FS is closed.
*  Uses a weak reference from metrics to instrumentation, so even
   if the FS wasn't closed (see HADOOP-18478), this back reference
   would not cause the S3AInstrumentation reference to be retained.
*  If S3AFileSystem is configured to log at TRACE it will log the
   calling stack of initialize(), so help identify where the
   instance is being created. This should help track down
   the cause of instance leakage.

Contributed by Steve Loughran.
2022-12-14 18:23:04 +00:00
Steve Loughran
654082773c
HADOOP-18183. s3a audit logs to publish range start/end of GET requests. (#5110)
The start and end of the range is set in a new audit param "rg",
e.g "?rg=100-200"

Contributed by Ankit Saurabh
2022-12-14 16:51:46 +00:00
Doroszlai, Attila
6202348502
HADOOP-18569. NFS Gateway may release buffer too early (#5207) (#5211)
(cherry picked from commit df4812df65d01889ba93bce1415e01461500208d)
2022-12-14 15:56:16 +01:00