Commit Graph

504 Commits

Author SHA1 Message Date
zhtttylz
daafc8a0b8
HDFS-17367. Add PercentUsed for Different StorageTypes in JMX (#6735) Contributed by Hualong Zhang.
Signed-off-by: Shilun Fan <slfan1989@apache.org>
2024-04-27 20:36:11 +08:00
Steve Loughran
87fb977777
HADOOP-19098. Vector IO: Specify and validate ranges consistently. #6604
Clarifies behaviour of VectorIO methods with contract tests as well as
specification.

* Add precondition range checks to all implementations
* Identify and fix bug where direct buffer reads was broken
  (HADOOP-19101; this surfaced in ABFS contract tests)
* Logging in VectoredReadUtils.
* TestVectoredReadUtils verifies validation logic.
* FileRangeImpl toString() improvements
* CombinedFileRange tracks bytes in range which are wanted;
   toString() output logs this.

HDFS
* Add test TestHDFSContractVectoredRead

ABFS
* Add test ITestAbfsFileSystemContractVectoredRead

S3A
* checks for vector IO being stopped in all iterative
  vector operations, including draining
* maps read() returning -1 to failure
* passes in file length to validation
* Error reporting to only completeExceptionally() those ranges
  which had not yet read data in.
* Improved logging.

readVectored()
* made synchronized. This is only for the invocation;
  the actual async retrieves are unsynchronized.
* closes input stream on invocation
* switches to random IO, so avoids keeping any long-lived connection around.

+ AbstractSTestS3AHugeFiles enhancements.
+ ADDENDUM: test fix in ITestS3AContractVectoredRead

Contains: HADOOP-19101. Vectored Read into off-heap buffer broken in fallback
implementation

Contributed by Steve Loughran

Change-Id: Ia4ed71864c595f175c275aad83a2ff5741693432
2024-04-03 13:17:52 +01:00
Steve Loughran
b4f9d8e6fa
Revert "HADOOP-19098. Vector IO: Specify and validate ranges consistently."
This reverts commit ba7faf90c8.
2024-04-03 13:15:05 +01:00
Steve Loughran
ba7faf90c8
HADOOP-19098. Vector IO: Specify and validate ranges consistently.
Clarifies behaviour of VectorIO methods with contract tests as well as specification.

* Add precondition range checks to all implementations
* Identify and fix bug where direct buffer reads was broken
  (HADOOP-19101; this surfaced in ABFS contract tests)
* Logging in VectoredReadUtils.
* TestVectoredReadUtils verifies validation logic.
* FileRangeImpl toString() improvements
* CombinedFileRange tracks bytes in range which are wanted;
   toString() output logs this.

HDFS
* Add test TestHDFSContractVectoredRead

ABFS
* Add test ITestAbfsFileSystemContractVectoredRead

S3A
* checks for vector IO being stopped in all iterative
  vector operations, including draining
* maps read() returning -1 to failure
* passes in file length to validation
* Error reporting to only completeExceptionally() those ranges
  which had not yet read data in.
* Improved logging.  

readVectored()
* made synchronized. This is only for the invocation;
  the actual async retrieves are unsynchronized.
* closes input stream on invocation
* switches to random IO, so avoids keeping any long-lived connection around.

+ AbstractSTestS3AHugeFiles enhancements.

Contains: HADOOP-19101. Vectored Read into off-heap buffer broken in fallback implementation

Contributed by Steve Loughran
2024-04-02 20:16:38 +01:00
slfan1989
ff3f2255d2
HADOOP-19112. Hadoop 3.4.0 release wrap-up. (#6640) Contributed by Shilun Fan.
Reviewed-by: He Xiaoqiao <hexiaoqiao@apache.org>
Signed-off-by: Shilun Fan <slfan1989@apache.org>
2024-03-19 20:08:03 +08:00
hfutatzhanghb
7012986fc3
HDFS-17345. Add a metrics to record block report generating cost time. (#6475). Contributed by farmmamba.
Reviewed-by:  Shuyan Zhang <zhangshuyan@apache.org>
Signed-off-by:  Shuyan Zhang <zhangshuyan@apache.org>
2024-03-06 16:59:00 +08:00
DieterDP
be13e94843
HADOOP-18987. Various fixes to FileSystem API docs (#6292)
Contributed by Dieter De Paepe
2024-02-02 11:49:31 +00:00
LiuGuH
5f9932acc4
HDFS-17325. Fix the documentation of fs expunge command in FileSystemShell.md. (#6413) Contributed by liuguanghua.
Reviewed-by: Ayush Saxena <ayushsaxena@apache.org>
Signed-off-by: Shilun Fan <slfan1989@apache.org>
2024-01-05 18:42:55 +08:00
Lei Yang
661c784662
HDFS-17290: Adds disconnected client rpc backoff metrics (#6359) 2024-01-04 20:24:10 -08:00
huangzhaobo
e26139beaa
HDFS-17301. Add read and write dataXceiver threads count metrics to datanode. (#6377)
Reviewed-by: hfutatzhanghb <hfutzhanghb@163.com>
Signed-off-by: Takanobu Asanuma <tasanuma@apache.org>
2023-12-29 12:43:46 +09:00
hfutatzhanghb
e91daae318
HDFS-17152. Fix the documentation of count command in FileSystemShell.md. (#5939). Contributed by farmmamba.
Reviewed-by: Shilun Fan <slfan1989@apache.org>
Signed-off-by:  Shuyan Zhang <zhangshuyan@apache.org>
2023-12-11 16:53:37 +08:00
caozhiqiang
37d6cada14
HDFS-17272. NNThroughputBenchmark should support specifying the base directory for multi-client test (#6319). Contributed by caozhiqiang.
Reviewed-by: Tao Li <tomscut@apache.org>
Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
2023-12-10 13:43:04 +05:30
zhangshuyan
809ae58e71
HADOOP-18982. Fix doc about loading native libraries. (#6281). Contributed by Shuyan Zhang.
Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>
2023-12-06 21:24:14 +08:00
Tom
f58945d7d1
HDFS-16791. Add getEnclosingRoot() API to filesystem interface and implementations (#6198)
The enclosing root path is a common ancestor that should be used for temp and staging dirs
as well as within encryption zones and other restricted directories.

Contributed by Tom McCormick
2023-11-08 14:25:21 +00:00
Steve Loughran
7ec636deec
HADOOP-18930. Make fs.s3a.create.performance a bucket-wide setting. (#6168)
If fs.s3a.create.performance is set on a bucket
- All file overwrite checks are skipped, even if the caller says
  otherwise.
- All directory existence checks are skipped.
- Marker deletion is *always* skipped.

This eliminates a HEAD and a LIST for every creation.

* New path capability "fs.s3a.create.performance.enabled" true
  if the option is enabled.
* Parameterize ITestS3AContractCreate to expect the different
  outcomes
* Parameterize ITestCreateFileCost similarly, with
  changed cost assertions there.
* create(/) raises an IOE. existing bug only noticed here.

Contributed by Steve Loughran
2023-10-27 12:23:55 +01:00
huangzhaobo
daa78adc88
HDFS-17200. Add some datanode related metrics to Metrics.md. (#6099). Contributed by huangzhaobo99
Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
2023-10-06 12:40:44 +05:30
huhaiyang
2831c7ce26
HADOOP-18880. Add some rpc related metrics to Metrics.md (#6015) Contributed by Yanghai Hu.
Reviewed-by: Inigo Goiri <inigoiri@apache.org>
Signed-off-by: Shilun Fan <slfan1989@apache.org>
2023-09-05 17:34:05 +08:00
Wei-Chiu Chuang
e239d40ab1 Post release update
* Add jdiff xml files from 3.3.6 release.
* Declare 3.3.6 as the latest stable release.
* Copy release notes.

(cherry picked from commit 7db9895000)
(cherry picked from commit cc121e2124aa01458dc296a060edc5e21a295268)
2023-06-26 16:08:24 +00:00
Xing Lin
427366b73b
HDFS-17042 Add rpcCallSuccesses and OverallRpcProcessingTime to RpcMetrics for Namenode (#5730) 2023-06-15 13:59:58 -07:00
Steve Loughran
e76c09ac3b
HADOOP-18724. Open file fails with NumberFormatException for S3AFileSystem (#5611)
This:

1. Adds optLong, optDouble, mustLong and mustDouble
   methods to the FSBuilder interface to let callers explicitly
   passin long and double arguments.
2. The opt() and must() builder calls which take float/double values
   now only set long values instead, so as to avoid problems
   related to overloaded methods resulting in a ".0" being appended
   to a long value.
3. All of the relevant opt/must calls in the hadoop codebase move to
   the new methods
4. And the s3a code is resilient to parse errors in is numeric options
   -it will downgrade to the default.

This is nominally incompatible, but the floating-point builder methods
were never used: nothing currently expects floating point numbers.

For anyone who wants to safely set numeric builder options across all compatible
releases, convert the number to a string and then use the opt(String, String)
and must(String, String) methods.

Contributed by Steve Loughran
2023-05-11 17:57:25 +01:00
Tak Lon (Stephen) Wu
0e46388474
HADOOP-18671. Add recoverLease(), setSafeMode(), isFileClosed() as interfaces to hadoop-common (#5553)
The HDFS lease APIs have been replicated as interfaces in hadoop-common so other filesystems can
also implement them.  Applications which use the leasing APIs should migrate to the new
interface where possible.

Contributed by Stephen Wu
2023-05-03 11:05:55 +01:00
Sebastian Baunsgaard
6aac6cb212
HADOOP-18660. Filesystem Spelling Mistake (#5475). Contributed by Sebastian Baunsgaard.
Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
2023-04-25 21:44:04 +05:30
Nikita Eshkeev
d07356e60e
HADOOP-18597. Simplify single node instructions for creating directories for Map Reduce. (#5305)
Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
2023-04-20 16:12:44 +05:30
Steve Loughran
405ed1dde6
HADOOP-18470. Hadoop 3.3.5 release wrap-up (#5558)
Post-release updates of the branches

* Add jdiff xml files from 3.3.5 release.
* Declare 3.3.5 as the latest stable release.
* Copy release notes.
2023-04-18 10:12:07 +01:00
Viraj Jasani
b4bcbb9515
HDFS-16959. RBF: State store cache loading metrics (#5497) 2023-03-29 10:43:13 -07:00
rdingankar
0ca5686034
HDFS-16917 Add transfer rate quantile metrics for DataNode reads (#5397)
Co-authored-by: Ravindra Dingankar <rdingankar@linkedin.com>
2023-02-27 18:26:32 +00:00
Arnout Engelen
02fd87a4d8
HADOOP-18627. Add stronger wording in 'secure mode' introduction (#5406)
Make it more clear that when deploying Hadoop 'secure mode' is generally not optional.

Contributed by Arnout Engelen
2023-02-17 16:30:41 +00:00
Steve Loughran
d56977e909
HADOOP-18470. More in the 3.3.5 index.html about security (#5383)
Expands on the comments in cluster config to tell people
they shouldn't be running a cluster without a private VLAN
in cloud, that Knox is good here, and unsecured clusters
without a VLAN are just computation-as-a-service to crypto miners

Contributed by Steve Loughran
2023-02-14 17:22:59 +00:00
Nikita Eshkeev
4de31123ce
Fix "the the" and friends typos (#5267)
Signed-off-by: Nikita Eshkeev <neshkeev@yandex.ru>
2023-01-17 03:33:59 +08:00
Steve Loughran
84b33b897c
HADOOP-18470. index.md update for 3.3.5 release 2022-12-05 16:13:24 +00:00
GuoPhilipse
069bd973d8
HADOOP-18532. Update command usage in FileSystemShell.md (#5141)
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
2022-11-21 15:55:46 +09:00
Ashutosh Gupta
a48e8c9beb
MAPREDUCE-5608. Replace and deprecate mapred.tasktracker.indexcache.mb (#5014)
Co-authored-by: Ashutosh Gupta <ashugpt@amazon.com>
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
2022-11-14 11:07:40 +09:00
Steve Loughran
38b2ed2151
HADOOP-18442. Remove openstack support (#4855)
Contributed by Steve Loughran
2022-10-06 11:49:38 +01:00
Ayush Saxena
cc41ad63f9
HADOOP-18388. Allow dynamic groupSearchFilter in LdapGroupsMapping. (#4798)
* HADOOP-18388. Allow dynamic groupSearchFilter in LdapGroupsMapping.
2022-09-06 18:38:51 -04:00
Mukund Thakur
231e095802
HADOOP-18407. Improve readVectored() api spec (#4760)
part of HADOOP-18103.

Contributed By: Mukund Thakur
2022-08-22 23:19:29 +05:30
Steve Loughran
62dbefd8f2
HADOOP-18305. Release Hadoop 3.3.4: upstream changelog and jdiff files
Add the r3.3.4 changelog, release notes and jdiff xml files.
2022-08-05 14:06:22 +01:00
Masatake Iwasaki
3cce41a1f6 Make upstream aware of 3.2.4 release.
(cherry picked from commit e1637a57dfd41385dbce5de90620c48a45abb263)
2022-07-22 02:27:19 +00:00
Mukund Thakur
4d1f6f9b99 HADOOP-18106: Handle memory fragmentation in S3A Vectored IO. (#4445)
part of HADOOP-18103.
Handling memory fragmentation in S3A vectored IO implementation by
allocating smaller user range requested size buffers and directly
filling them from the remote S3 stream and skipping undesired
data in between ranges.
This patch also adds aborting active vectored reads when stream is
closed or unbuffer() is called.

Contributed By: Mukund Thakur
2022-06-22 17:29:32 +01:00
Mukund Thakur
5db0f34e29 HADOOP-18104: S3A: Add configs to configure minSeekForVectorReads and maxReadSizeForVectorReads (#3964)
Part of HADOOP-18103.
Introducing fs.s3a.vectored.read.min.seek.size and fs.s3a.vectored.read.max.merged.size
to configure min seek and max read during a vectored IO operation in S3A connector.
These properties actually define how the ranges will be merged. To completely
disable merging set fs.s3a.max.readsize.vectored.read to 0.

Contributed By: Mukund Thakur
2022-06-22 17:29:32 +01:00
Mukund Thakur
2daf0a814f HADOOP-11867. Add a high-performance vectored read API. (#3904)
part of HADOOP-18103.
Add support for multiple ranged vectored read api in PositionedReadable.
The default iterates through the ranges to read each synchronously,
but the intent is that FSDataInputStream subclasses can make more
efficient readers especially in object stores implementation.

Also added implementation in S3A where smaller ranges are merged and
sliced byte buffers are returned to the readers. All the merged ranged are
fetched from S3 asynchronously.

Contributed By: Owen O'Malley and Mukund Thakur
2022-06-22 17:29:32 +01:00
Ashutosh Gupta
a77d52284f
HADOOP-18255. Fix fsdatainputstreambuilder.md reference to hadoop branch-3.3 (#4378)
Co-authored-by: Ashutosh Gupta <ashugpt@amazon.com>
2022-06-20 10:54:21 +05:30
Viraj Jasani
e38e13be03
HADOOP-18288. Total requests and total requests per sec served by RPC servers (#4431)
Reviewed-by: Steve Loughran <stevel@apache.org>
Signed-off-by: Tao Li <tomscut@apache.org>
2022-06-18 12:17:20 +08:00
Steve Loughran
e199da3fae
HADOOP-17833. Improve Magic Committer performance (#3289)
Speed up the magic committer with key changes being

* Writes under __magic always retain directory markers

* File creation under __magic skips all overwrite checks,
  including the LIST call intended to stop files being
	created over dirs.
* mkdirs under __magic probes the path for existence
  but does not look any further.  	

Extra parallelism in task and job commit directory scanning
Use of createFile and openFile with parameters which all for
HEAD checks to be skipped.

The committer can write the summary _SUCCESS file to the path
`fs.s3a.committer.summary.report.directory`, which can be in a
different file system/bucket if desired, using the job id as
the filename. 

Also: HADOOP-15460. S3A FS to add `fs.s3a.create.performance`

Application code can set the createFile() option
fs.s3a.create.performance to true to disable the same
safety checks when writing under magic directories.
Use with care.

The createFile option prefix `fs.s3a.create.header.`
can be used to add custom headers to S3 objects when
created.


Contributed by Steve Loughran.
2022-06-17 19:11:35 +01:00
Masatake Iwasaki
3228142f53 Make upstream aware of 2.10.2 release.
(cherry picked from commit 3dcb6367edd44c04a5ae0d195d851abac80d10a0)

Conflicts:
	hadoop-project-dist/pom.xml
2022-06-01 00:53:01 +09:00
Steve Loughran
6f261ed4a2
HADOOP-18198. Release 3.3.3: release notes and jdiff files.
* Add the changelog and release notes
* add all jdiff XML files
* update the project pom with the new stable version

Change-Id: Iaea846c3e451bbd446b45de146845a48953d580d
2022-05-17 19:00:54 +01:00
Ashutosh Gupta
40a8b9a6a5
HADOOP-17479. Fix the examples of hadoop config prefix (#4197)
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
2022-05-08 08:09:24 +09:00
Steve Loughran
1b4dba99b5
HADOOP-16202. Enhanced openFile(): hadoop-common changes. (#2584/1)
This defines standard option and values for the
openFile() builder API for opening a file:

fs.option.openfile.read.policy
 A list of the desired read policy, in preferred order.
 standard values are
 adaptive, default, random, sequential, vector, whole-file

fs.option.openfile.length
 How long the file is.

fs.option.openfile.split.start
 start of a task's split

fs.option.openfile.split.end
 end of a task's split

These can be used by filesystem connectors to optimize their
reading of the source file, including but not limited to
* skipping existence/length probes when opening a file
* choosing a policy for prefetching/caching data

The hadoop shell commands which read files all declare "whole-file"
and "sequential", as appropriate.

Contributed by Steve Loughran.

Change-Id: Ia290f79ea7973ce8713d4f90f1315b24d7a23da1
2022-04-24 17:33:04 +01:00
Ashutosh Gupta
f84b88dd6b
HADOOP-17564. Fix typo in UnixShellGuide.html (#4195)
contributed by Ashutosh Gupta
2022-04-22 17:59:41 +01:00
Renukaprasad C
4ff8a5dc73
HDFS-16526. Addendum Add metrics for slow DataNode (#4191) 2022-04-20 18:57:43 +05:30
Renukaprasad C
f14f305051
HDFS-16526. Add metrics for slow DataNode (#4162) 2022-04-15 21:37:05 +05:30