26146 Commits

Author SHA1 Message Date
Samrat
7eefdf8642
YARN-11195. Adding document to enable numa (#4501)
Contributed by Samrat Deb.
2022-06-28 17:46:43 +05:30
Ashutosh Gupta
a177232ebc
YARN-9822. TimelineCollectorWebService#putEntities blocked when ATSV2 HBase is down (#4492)
Co-authored-by: Ashutosh Gupta <ashugpt@amazon.com>
2022-06-28 09:32:07 +05:30
swamirishi
43112bd472
HADOOP-18306: Warnings should not be shown on cli console when linux user not present on client (#4474). Contributed by swamirishi.
2022-06-27 17:20:58 -07:00
Mehakmeet Singh
823f5ee0d4
HADOOP-18242. ABFS Rename Failure when tracking metadata is in an incomplete state (#4331)
ABFS rename fails intermittently when the Storage-blob tracking
metadata is in an incomplete state. This surfaces as the error code
404 and an error message of "RenameDestinationParentPathNotFound"

To mitigate this issue, when a request fails with this response,
the ABFS client issues a HEAD call on the source file
and then retries the rename operation.

ABFS filesystem statistics track when this occurs with new counters
  rename_recovery
  metadata_incomplete_rename_failures
  rename_path_attempts

This is a very rare occurrence and appears to be triggered under certain
heavy load conditions, just as with HADOOP-18163.

Contributed by Mehakmeet Singh.
2022-06-27 19:06:59 +01:00
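The recovery path described in HADOOP-18242 can be sketched roughly as follows. This is a minimal illustration, not the ABFS client code: `Store`, `StoreException`, and the class itself are hypothetical names, and the points at which the three counters are incremented are an assumption based only on their names in the commit message.

```java
final class RenameRecoverySketch {

    /** Hypothetical stand-in for the remote store calls involved. */
    interface Store {
        void rename(String source, String destination);
        void head(String path);
    }

    /** Hypothetical failure type carrying the HTTP status and service error code. */
    static final class StoreException extends RuntimeException {
        final int status;
        final String errorCode;

        StoreException(int status, String errorCode) {
            super(status + " " + errorCode);
            this.status = status;
            this.errorCode = errorCode;
        }
    }

    // Counters named after the statistics in the commit message; when each is
    // incremented here is an assumption.
    int renamePathAttempts;
    int metadataIncompleteRenameFailures;
    int renameRecovery;

    void rename(Store store, String source, String destination) {
        renamePathAttempts++;
        try {
            store.rename(source, destination);
        } catch (StoreException e) {
            boolean metadataIncomplete = e.status == 404
                    && "RenameDestinationParentPathNotFound".equals(e.errorCode);
            if (!metadataIncomplete) {
                throw e; // any other failure is not handled by this recovery path
            }
            metadataIncompleteRenameFailures++;
            store.head(source);                // HEAD call on the source file
            renamePathAttempts++;
            store.rename(source, destination); // one retry; a second failure propagates
            renameRecovery++;
        }
    }

    public static void main(String[] args) {
        // Fake store whose first rename fails with the incomplete-metadata error.
        Store flaky = new Store() {
            private boolean failed;

            @Override public void rename(String source, String destination) {
                if (!failed) {
                    failed = true;
                    throw new StoreException(404, "RenameDestinationParentPathNotFound");
                }
            }

            @Override public void head(String path) { }
        };
        RenameRecoverySketch sketch = new RenameRecoverySketch();
        sketch.rename(flaky, "/src/file", "/dst/file");
        System.out.println("attempts=" + sketch.renamePathAttempts
                + " recovered=" + sketch.renameRecovery);
    }
}
```

The sketch retries exactly once, so a persistent failure still surfaces to the caller instead of looping.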
Colm O hEigeartaigh
25f8bdcd21
HADOOP-18308 - Update to Apache LDAP API 2.0.x (#4477)
Update the dependencies of the LDAP libraries used for testing:

ldap-api.version = 2.0.0
apacheds.version = 2.0.0.AM26

Contributed by Colm O hEigeartaigh.
2022-06-27 11:15:18 +01:00
Ashutosh Gupta
b7edc6c60c
HDFS-16633. Fixing when Reserved Space For Replicas is not released on some cases (#4452)
* HDFS-16633. Reserved Space For Replicas is not released on some cases

Co-authored-by: Ashutosh Gupta <ashugpt@amazon.com>
2022-06-24 18:35:00 +05:30
Ashutosh Gupta
734b6f19ad
YARN-9874. Remove unnecessary LevelDb write call in LeveldbConfigurationStore#confirmMutation (#4487)
Co-authored-by: Ashutosh Gupta <ashugpt@amazon.com>
2022-06-23 21:59:27 +05:30
Ashutosh Gupta
4abb2ba58c
YARN-10320. Replace FSDataInputStream#read with readFully in Log Aggregation (#4486)
* YARN-10320. Replace FSDataInputStream#read with readFully in Log Aggregation

Co-authored-by: Ashutosh Gupta <ashugpt@amazon.com>
2022-06-23 21:58:32 +05:30
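The difference that motivates a change like YARN-10320 can be seen with plain `java.io` streams. The sketch below is illustrative only: `java.io.DataInputStream` stands in for `FSDataInputStream`, and `shortReads` is a hypothetical helper that simulates a stream returning fewer bytes per call than requested.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class ReadFullySketch {

    /** Hypothetical test stream that returns at most maxChunk bytes per read call. */
    static InputStream shortReads(byte[] data, int maxChunk) {
        return new ByteArrayInputStream(data) {
            @Override
            public synchronized int read(byte[] b, int off, int len) {
                return super.read(b, off, Math.min(len, maxChunk));
            }
        };
    }

    /** A single read() call: may fill only part of the buffer. Returns the bytes read. */
    static int singleRead(InputStream in, byte[] buf) {
        try {
            return in.read(buf, 0, buf.length);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** readFully(): keeps reading until the buffer is full, or fails with EOFException. */
    static void readFully(InputStream in, byte[] buf) {
        try {
            new DataInputStream(in).readFully(buf);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[10];
        byte[] viaRead = new byte[10];
        byte[] viaReadFully = new byte[10];
        System.out.println("read() returned "
                + singleRead(shortReads(data, 4), viaRead) + " bytes");
        readFully(shortReads(data, 4), viaReadFully);
        System.out.println("readFully() filled all " + viaReadFully.length + " bytes");
    }
}
```

Code that ignores the return value of `read` silently processes a partially filled buffer; `readFully` either fills the buffer or throws.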
slfan1989
0af4bb3b42
YARN-11192. TestRouterWebServicesREST failing after YARN-9827. (#4484). Contributed by fanshilun.
Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
2022-06-23 13:21:36 +05:30
Ashutosh Gupta
dd819f7904
HADOOP-18271. Remove unused Imports in Hadoop Common project (#4392)
Co-authored-by: Ashutosh Gupta <ashugpt@amazon.com>
2022-06-23 12:30:28 +05:30
Igor Dvorzhak
77d1b194c7
HADOOP-18300. Upgrade Gson dependency to version 2.9.0 (#4454)
Reviewed-by: Ayush Saxena <ayushsaxena@apache.org>
Signed-off-by: Chris Nauroth <cnauroth@apache.org>
2022-06-22 16:37:22 -07:00
Steve Loughran
e1842b2a74
HADOOP-18103. Add a high-performance vectored read API. (#4476)
This feature adds methods for ranged vectored read operations
in PositionedReadable.

All streams which implement that interface support the new API.

The default implementation reads each range in the vector
sequentially.

However, specific implementations may provide higher performance
versions. This is done in two places:

* Local FileSystem/Checksum FileSystem
* The S3A client.

The S3A client first coalesces adjacent and "nearby" ranges
together, then fetches each range in separate HTTP GET requests,
executed in parallel. As such it delivers significant speedups
to applications reading separate blocks of data from the same
file, columnar data format libraries in particular.

This is the merge commit of the feature branch; the work is in

HADOOP-11867. Add a high-performance vectored read API.
HADOOP-18104. S3A: Add configs to configure minSeekForVectorReads and maxReadSizeForVectorReads.
HADOOP-18107. Adding scale test for vectored reads for large file
HADOOP-18105. Implement buffer pooling with weak references.
HADOOP-18106. Handle memory fragmentation in S3A Vectored IO.

Contributed By: Owen O'Malley and Mukund Thakur
2022-06-22 18:19:23 +01:00
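The coalescing of adjacent and "nearby" ranges mentioned in HADOOP-18103 can be sketched as below. This is not the S3A implementation: `Range` is a hypothetical type, and the merge rule (merge when the gap is smaller than `minSeek` and the merged span is no larger than `maxMerged`) is an assumption modelled on the min-seek and max-merged-size settings named in HADOOP-18104.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class RangeCoalescingSketch {

    /** A byte range [offset, offset + length). Hypothetical type for illustration. */
    static final class Range {
        final long offset;
        final long length;

        Range(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }

        long end() {
            return offset + length;
        }

        @Override public String toString() {
            return "[" + offset + ", " + end() + ")";
        }
    }

    /**
     * Sorts the ranges by offset, then merges each range into its predecessor
     * when the gap between them is below minSeek and the merged span would not
     * exceed maxMerged bytes. The input list is left unmodified.
     */
    static List<Range> coalesce(List<Range> ranges, long minSeek, long maxMerged) {
        List<Range> sorted = new ArrayList<>(ranges);
        sorted.sort(Comparator.comparingLong((Range r) -> r.offset));
        List<Range> merged = new ArrayList<>();
        Range current = null;
        for (Range next : sorted) {
            if (current != null) {
                long gap = next.offset - current.end();
                long mergedEnd = Math.max(current.end(), next.end());
                if (gap < minSeek && mergedEnd - current.offset <= maxMerged) {
                    current = new Range(current.offset, mergedEnd - current.offset);
                    continue;
                }
                merged.add(current);
            }
            current = next;
        }
        if (current != null) {
            merged.add(current);
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Range> requested = new ArrayList<>();
        requested.add(new Range(0, 100));
        requested.add(new Range(150, 50));
        requested.add(new Range(10_000, 100));
        // Each merged range would then be fetched with its own request.
        System.out.println(coalesce(requested, 100, 1_000));
    }
}
```

Each range returned by `coalesce` corresponds to one fetch; the caller would then slice the fetched bytes back into the originally requested ranges.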
Mukund Thakur
4d1f6f9b99
HADOOP-18106: Handle memory fragmentation in S3A Vectored IO. (#4445)
part of HADOOP-18103.
Handles memory fragmentation in the S3A vectored IO implementation by
allocating smaller buffers, sized to the ranges the user requested,
filling them directly from the remote S3 stream, and skipping undesired
data in between ranges.
This patch also aborts active vectored reads when the stream is
closed or unbuffer() is called.

Contributed By: Mukund Thakur
2022-06-22 17:29:32 +01:00
Mukund Thakur
0d49bd2004
HADOOP-18105 Implement buffer pooling with weak references (#4263)
part of HADOOP-18103.
Required for the vectored IO feature. None of the current buffer pool
implementations is complete. ElasticByteBufferPool doesn't use
weak references and could lead to memory leak errors, and
DirectBufferPool doesn't support caller preferences for direct
or heap buffers and has only a fixed-length buffer implementation.

Contributed By: Mukund Thakur
2022-06-22 17:29:32 +01:00
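The idea behind HADOOP-18105, a pool that holds returned buffers only through weak references and honours the caller's choice of direct or heap memory, can be sketched as follows. This is a simplified illustration and not the class added by the patch: `WeakBufferPoolSketch` and its method names are hypothetical, and it keeps at most one pooled buffer per capacity.

```java
import java.lang.ref.WeakReference;
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.TreeMap;

final class WeakBufferPoolSketch {

    // One tree per buffer kind, keyed by capacity so ceilingEntry() finds the
    // smallest pooled buffer that is large enough. Simplification: a second
    // buffer of the same capacity replaces the first.
    private final TreeMap<Integer, WeakReference<ByteBuffer>> heap = new TreeMap<>();
    private final TreeMap<Integer, WeakReference<ByteBuffer>> direct = new TreeMap<>();

    private TreeMap<Integer, WeakReference<ByteBuffer>> tree(boolean isDirect) {
        return isDirect ? direct : heap;
    }

    /** Returns a pooled buffer of at least the given length, or allocates a new one. */
    synchronized ByteBuffer getBuffer(boolean isDirect, int length) {
        TreeMap<Integer, WeakReference<ByteBuffer>> pool = tree(isDirect);
        Map.Entry<Integer, WeakReference<ByteBuffer>> entry = pool.ceilingEntry(length);
        while (entry != null) {
            pool.remove(entry.getKey());
            ByteBuffer pooled = entry.getValue().get();
            if (pooled != null) {
                pooled.clear();
                return pooled;
            }
            // The buffer was garbage collected; try the next larger entry.
            entry = pool.ceilingEntry(length);
        }
        return isDirect ? ByteBuffer.allocateDirect(length) : ByteBuffer.allocate(length);
    }

    /** Returns a buffer to the pool; it stays eligible for garbage collection. */
    synchronized void putBuffer(ByteBuffer buffer) {
        tree(buffer.isDirect()).put(buffer.capacity(), new WeakReference<>(buffer));
    }

    public static void main(String[] args) {
        WeakBufferPoolSketch pool = new WeakBufferPoolSketch();
        ByteBuffer first = pool.getBuffer(false, 64);
        pool.putBuffer(first);
        ByteBuffer second = pool.getBuffer(false, 32);
        System.out.println("reused pooled buffer: " + (first == second));
    }
}
```

Because the pool never holds a strong reference, an idle buffer can be reclaimed under memory pressure, which is the property the commit message says ElasticByteBufferPool lacks.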
Mukund Thakur
1408dd89a7
HADOOP-18107 Adding scale test for vectored reads for large file (#4273)
part of HADOOP-18103.

Contributed By: Mukund Thakur
2022-06-22 17:29:32 +01:00
Mukund Thakur
5db0f34e29
HADOOP-18104: S3A: Add configs to configure minSeekForVectorReads and maxReadSizeForVectorReads (#3964)
Part of HADOOP-18103.
Introduces fs.s3a.vectored.read.min.seek.size and fs.s3a.vectored.read.max.merged.size
to configure the minimum seek and maximum read sizes during a vectored IO operation in the S3A connector.
These properties define how the ranges will be merged. To completely
disable merging, set fs.s3a.vectored.read.max.merged.size to 0.

Contributed By: Mukund Thakur
2022-06-22 17:29:32 +01:00
Mukund Thakur
2daf0a814f
HADOOP-11867. Add a high-performance vectored read API. (#3904)
part of HADOOP-18103.
Add support for a multiple-range vectored read API in PositionedReadable.
The default iterates through the ranges to read each synchronously,
but the intent is that FSDataInputStream subclasses can make more
efficient readers, especially in object store implementations.

Also added an implementation in S3A where smaller ranges are merged and
sliced byte buffers are returned to the readers. All the merged ranges are
fetched from S3 asynchronously.

Contributed By: Owen O'Malley and Mukund Thakur
2022-06-22 17:29:32 +01:00
9uapaw
e6ecc4f3e4
YARN-11188. Only files belonging to the first file controller are removed even if multiple log aggregation file controllers are configured. Contributed by Szilard Nemeth.
2022-06-22 14:40:20 +02:00
Steve Loughran
c9ddbd210c
MAPREDUCE-7391. TestLocalDistributedCacheManager failing after HADOOP-16202 (#4472)
Fixing a mockito-based test which broke when HADOOP-16202
changed the methods being invoked.

Contributed by Steve Loughran
2022-06-22 12:52:41 +01:00
Samrat
e8fd914c58
HDFS-16616. remove use of org.apache.hadoop.util.Sets (#4400)
Co-Authored by: Samrat Deb
2022-06-22 10:17:36 +05:30
Ashutosh Gupta
cbdabe9ec8
YARN-9971. YARN Native Service HttpProbe logs THIS_HOST in error messages (#4436)
* YARN-9971. YARN Native Service HttpProbe logs THIS_HOST in error messages

Co-authored-by: Ashutosh Gupta <ashugpt@amazon.com>
2022-06-22 09:38:19 +05:30
Christian Bartolomäus
ef36457b53
MAPREDUCE-7389. Fix typo in description of property (#4440). Contributed by Christian Bartolomaus.
2022-06-21 19:24:11 +05:30
Viraj Jasani
7a1d811197
HDFS-16637. TestHDFSCLI#testAll consistently failing (#4466). Contributed by Viraj Jasani.
Reviewed-by: Tao Li <tomscut@apache.org>
Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
2022-06-21 13:42:43 +05:30
Szilard Nemeth
3a66348fda
YARN-11185. Pending app metrics are increased doubly when a queue reaches its max-parallel-apps limit. Contributed by Andras Gyori
2022-06-20 15:03:58 +02:00
9uapaw
5d08ffa769
YARN-11182. Refactor TestAggregatedLogDeletionService: 2nd phase. Contributed by Szilard Nemeth.
2022-06-20 14:12:51 +02:00
Ashutosh Gupta
36c4be819f
MAPREDUCE-7369. Fixed MapReduce tasks timing out when more time is spent in MultipleOutputs#close (#4247)
Contributed by Ravuri Sushma sree.

Co-authored-by: Ashutosh Gupta <ashugpt@amazon.com>
2022-06-20 17:01:01 +09:00
slfan1989
10fc865d3c
MAPREDUCE-7387. Fix TestJHSSecurity#testDelegationToken AssertionError due to HDFS-16563 (#4428). Contributed by fanshilun.
Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
2022-06-20 12:14:04 +05:30
Samrat
477b67a335
HADOOP-18266. Using HashSet/ TreeSet Constructor for hadoop-common (#4365)
* HADOOP-18266. Using HashSet/ TreeSet Constructor for hadoop-common

Co-authored-by: Deb <dbsamrat@3c22fba1b03f.ant.amazon.com>
2022-06-20 12:11:04 +05:30
Ashutosh Gupta
efc2761d32
HDFS-16635. Fixed javadoc error in Java 11 (#4451)
Co-authored-by: Ashutosh Gupta <ashugpt@amazon.com>
2022-06-20 11:14:31 +05:30
Ashutosh Gupta
a77d52284f
HADOOP-18255. Fix fsdatainputstreambuilder.md reference to hadoop branch-3.3 (#4378)
Co-authored-by: Ashutosh Gupta <ashugpt@amazon.com>
2022-06-20 10:54:21 +05:30
Ashutosh Gupta
4f425b641c
YARN-9827. Fix Http Response code in GenericExceptionHandler (#4393)
Co-authored-by: Ashutosh Gupta <ashugpt@amazon.com>

Reviewed by Akira Ajisaka.
2022-06-20 10:44:47 +05:30
KevinWikant
cfceaebde6
HDFS-16064. Determine when to invalidate corrupt replicas based on number of usable replicas (#4410)
Co-authored-by: Kevin Wikant <wikak@amazon.com>
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
2022-06-20 11:20:24 +09:00
Viraj Jasani
cb0421095b
HDFS-16634. Dynamically adjust slow peer report size on JMX metrics (#4448)
Signed-off-by: Tao Li <tomscut@apache.org>
2022-06-20 09:21:00 +08:00
Viraj Jasani
e38e13be03
HADOOP-18288. Total requests and total requests per sec served by RPC servers (#4431)
Reviewed-by: Steve Loughran <stevel@apache.org>
Signed-off-by: Tao Li <tomscut@apache.org>
2022-06-18 12:17:20 +08:00
slfan1989
62e4476102
YARN-10122. Support signalToContainer API for Federation. (#4421)
2022-06-17 16:38:36 -07:00
zhengchenyu
80446dcd08
YARN-11172. Fix TestClientRMTokens#testDelegationToken introduced by HDFS-16563. (#4408)
Regression caused by HDFS-16563; the hdfs exception text was changed, but because it was
a YARN test doing the check, Yetus didn't notice.

Contributed by zhengchenyu
2022-06-17 19:49:36 +01:00
Steve Loughran
e199da3fae
HADOOP-17833. Improve Magic Committer performance (#3289)
Speed up the magic committer, with the key changes being:

* Writes under __magic always retain directory markers.
* File creation under __magic skips all overwrite checks,
  including the LIST call intended to stop files being
  created over dirs.
* mkdirs under __magic probes the path for existence
  but does not look any further.

Extra parallelism in task and job commit directory scanning.
Use of createFile and openFile with parameters which allow for
HEAD checks to be skipped.

The committer can write the summary _SUCCESS file to the path
`fs.s3a.committer.summary.report.directory`, which can be in a
different file system/bucket if desired, using the job id as
the filename. 

Also: HADOOP-15460. S3A FS to add `fs.s3a.create.performance`

Application code can set the createFile() option
fs.s3a.create.performance to true to disable the same
safety checks when writing under magic directories.
Use with care.

The createFile option prefix `fs.s3a.create.header.`
can be used to add custom headers to S3 objects when
created.


Contributed by Steve Loughran.
2022-06-17 19:11:35 +01:00
Benjamin Teke
020201cb65
Queue filter in CS UI v1 does not work as expected. Contributed by Chengbing Liu.
2022-06-17 19:28:32 +02:00
xuzq
4893f00395
HDFS-16600. Fix deadlock of fine-grain lock for FsDatasetImpl of DataNode. (#4367). Contributed by ZanderXu.
Reviewed-by: Mingxiang Li <liaiphag0@gmail.com>
Reviewed-by: Ayush Saxena <ayushsaxena@apache.org>
Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>
2022-06-17 22:05:33 +08:00
slfan1989
7bfff63774
HADOOP-18289. Remove WhiteBox in hadoop-kms module. (#4433)
Co-authored-by: slfan1989 <louj1988@@>
2022-06-17 09:13:16 +08:00
caozhiqiang
9e3fc40ecb
HDFS-16613. EC: Improve performance of decommissioning dn with many ec blocks (#4398)
2022-06-17 02:11:25 +08:00
jianghuazhu
6cbeae2e52
HDFS-16581. Print node status when executing printTopology. (#4321)
Reviewed-by: Viraj Jasani <vjasani@apache.org>
Signed-off-by: Tao Li <tomscut@apache.org>
2022-06-16 19:18:58 +08:00
André Fonseca
1b25851ad9
HADOOP-18159. Bump cos_api-bundle to 5.6.69 to update public-suffix-list.txt (#4444)
Bump cos_api-bundle to 5.6.69

All copies of httpclient, including shaded ones in libraries used
by the s3a, gs and cos cloud connectors, turn out to load their
TLD list from the same resource mozilla/public-suffix-list.txt 

Updating the hadoop-cos dependency ensures that its version
of public-suffix-list.txt is up to date, and so the s3a connector
is able to talk to s3 resources if the cos-api-bundle JAR is where
the resource is loaded from.

Contributed by André Fonseca
2022-06-15 20:03:26 +01:00
章锡平
f8c7e67fdc
HDFS-16628 RBF: Correct target directory when move to trash for kerberos login user. (#4424). Contributed by Xiping Zhang.
2022-06-15 21:16:24 +08:00
Gautham B A
dc5460d525
YARN-11078. Set env vars in a cross platform compatible way (#4432)
* Maven runs the ember build script.
  The environment variable TMPDIR was
  set as per bash syntax.
* This failed on Windows since the
  Windows command prompt doesn't
  support bash syntax.
* We're now detecting the OS and
  setting a Maven property
  "emberBuildScript" in a cross
  platform compatible way.
2022-06-15 15:29:55 +05:30
Gautham B A
5a40224b53
HDFS-16469. Locate protoc-gen-hrpc across platforms (#4434)
* We use the TARGET_FILE CMake
  generator expression to get
  the location of the
  protoc-gen-hrpc CMake target.
2022-06-15 15:28:10 +05:30
9uapaw
75bc6cfced
YARN-11176. Refactor TestAggregatedLogDeletionService. Contributed by Szilard Nemeth.
2022-06-14 16:14:44 +02:00
xuzq
d0715b1024
HDFS-16598. Fix DataNode FsDatasetImpl lock issue without GS checks. (#4366). Contributed by ZanderXu.
Reviewed-by: Mingxiang Li <liaiphag0@gmail.com>
Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>
2022-06-14 21:48:03 +08:00
Steve Vaughan
bebf03a66e
HDFS-16625. Check assumption about PMDK availability (#4414)
Signed-off-by: Ashutosh Gupta <ashutosh.gupta@st.niituniversity.in>
Signed-off-by: stack <stack@apache.org>
2022-06-13 23:42:18 -07:00
9uapaw
c9a174a260
YARN-11175. Refactor LogAggregationFileControllerFactory
2022-06-13 13:58:13 +02:00