hadoop

Author	SHA1	Message	Date
Ayush Saxena	28538d628e	HADOOP-19164. Hadoop CLI MiniCluster is broken (#7050 ). Contributed by Ayush Saxena. Reviewed-by: Vinayakumar B <vinayakumarb@apache.org>	2024-09-21 21:26:51 +05:30
Steve Loughran	55a576906d	HADOOP-19131. Assist reflection IO with WrappedOperations class (#6686 ) 1. The class WrappedIO has been extended with more filesystem operations - openFile() - PathCapabilities - StreamCapabilities - ByteBufferPositionedReadable All these static methods raise UncheckedIOExceptions rather than checked ones. 2. The adjacent class org.apache.hadoop.io.wrappedio.WrappedStatistics provides similar access to IOStatistics/IOStatisticsContext classes and operations. Allows callers to: * Get a serializable IOStatisticsSnapshot from an IOStatisticsSource or IOStatistics instance * Save an IOStatisticsSnapshot to file * Convert an IOStatisticsSnapshot to JSON * Given an object which may be an IOStatisticsSource, return an object whose toString() value is a dynamically generated, human readable summary. This is for logging. * Separate getters to the different sections of IOStatistics. * Mean values are returned as a Map.Pair<Long, Long> of (samples, sum) from which means may be calculated. There are examples of the dynamic bindings to these classes in: org.apache.hadoop.io.wrappedio.impl.DynamicWrappedIO org.apache.hadoop.io.wrappedio.impl.DynamicWrappedStatistics These use DynMethods and other classes in the package org.apache.hadoop.util.dynamic which are based on the Apache Parquet equivalents. This makes re-implementing these in that library and others which their own fork of the classes (example: Apache Iceberg) 3. The openFile() option "fs.option.openfile.read.policy" has added specific file format policies for the core filetypes * avro * columnar * csv * hbase * json * orc * parquet S3A chooses the appropriate sequential/random policy as a A policy `parquet, columnar, vector, random, adaptive` will use the parquet policy for any filesystem aware of it, falling back to the first entry in the list which the specific version of the filesystem recognizes 4. New Path capability fs.capability.virtual.block.locations Indicates that locations are generated client side and don't refer to real hosts. Contributed by Steve Loughran	2024-08-14 14:43:00 +01:00
Viraj Jasani	321a6cc55e	HADOOP-19072. S3A: expand optimisations on stores with "fs.s3a.performance.flags" for mkdir (#6543 ) If the flag list in fs.s3a.performance.flags includes "mkdir" then the safety check of a walk up the tree to look for a parent directory, -done to verify a directory isn't being created under a file- are skipped. This saves the cost of multiple list operations. Contributed by Viraj Jasani	2024-08-08 17:48:51 +01:00
gavin.wang	783a852029	HDFS-17555. Fix NumberFormatException of NNThroughputBenchmark when configured dfs.blocksize. (#6894 ). Contributed by wangzhongwei Reviewed-by: He Xiaoqiao <hexiaoqiao@apache.org> Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>	2024-07-09 13:52:15 +05:30
Steve Loughran	56c8aa5f1c	HADOOP-19204. VectorIO regression: empty ranges are now rejected (#6887 ) - restore old outcome: no-op - test this - update spec This is a critical fix for vector IO and MUST be cherrypicked to all branches with that feature Contributed by Steve Loughran	2024-06-19 12:05:24 +01:00
Fateh Singh	90024d8cb1	HDFS-17439. Support -nonSuperUser for NNThroughputBenchmark: useful for testing auth frameworks such as Ranger (#6677 )	2024-06-18 13:52:24 +01:00
Cheng Pan	2bde5ccb81	HADOOP-19192. Log level is WARN when fail to load native hadoop libs (#6863 ) Updates the documentation to be consistent with the logging. Contributed by Cheng Pan	2024-06-14 19:05:27 +01:00
Mukund Thakur	06dd3bfee8	HADOOP-19196. Allow base path to be deleted as well using Bulk Delete. (#6872 ) Contributed by: Mukund Thakur	2024-06-11 14:06:53 -05:00
Mukund Thakur	47be1ab3b6	HADOOP-18679. Add API for bulk/paged delete of files (#6726 ) Applications can create a BulkDelete instance from a BulkDeleteSource; the BulkDelete interface provides the pageSize(): the maximum number of entries which can be deleted, and a bulkDelete(Collection paths) method which can take a collection up to pageSize() long. This is optimized for object stores with bulk delete APIs; the S3A connector will offer the page size of fs.s3a.bulk.delete.page.size unless bulk delete has been disabled. Even with a page size of 1, the S3A implementation is more efficient than delete(path) as there are no safety checks for the path being a directory or probes for the need to recreate directories. The interface BulkDeleteSource is implemented by all FileSystem implementations, with a page size of 1 and mapped to delete(pathToDelete, false). This means that callers do not need to have special case handling for object stores versus classic filesystems. To aid use through reflection APIs, the class org.apache.hadoop.io.wrappedio.WrappedIO has been created with "reflection friendly" methods. Contributed by Mukund Thakur and Steve Loughran	2024-05-20 17:05:25 +01:00
Felix Nguyen	fb0519253d	HDFS-17488. DN can fail IBRs with NPE when a volume is removed (#6759 )	2024-05-11 15:37:43 +08:00
zhtttylz	daafc8a0b8	HDFS-17367. Add PercentUsed for Different StorageTypes in JMX (#6735 ) Contributed by Hualong Zhang. Signed-off-by: Shilun Fan <slfan1989@apache.org>	2024-04-27 20:36:11 +08:00
Steve Loughran	87fb977777	HADOOP-19098. Vector IO: Specify and validate ranges consistently. #6604 Clarifies behaviour of VectorIO methods with contract tests as well as specification. * Add precondition range checks to all implementations * Identify and fix bug where direct buffer reads was broken (HADOOP-19101; this surfaced in ABFS contract tests) * Logging in VectoredReadUtils. * TestVectoredReadUtils verifies validation logic. * FileRangeImpl toString() improvements * CombinedFileRange tracks bytes in range which are wanted; toString() output logs this. HDFS * Add test TestHDFSContractVectoredRead ABFS * Add test ITestAbfsFileSystemContractVectoredRead S3A * checks for vector IO being stopped in all iterative vector operations, including draining * maps read() returning -1 to failure * passes in file length to validation * Error reporting to only completeExceptionally() those ranges which had not yet read data in. * Improved logging. readVectored() * made synchronized. This is only for the invocation; the actual async retrieves are unsynchronized. * closes input stream on invocation * switches to random IO, so avoids keeping any long-lived connection around. + AbstractSTestS3AHugeFiles enhancements. + ADDENDUM: test fix in ITestS3AContractVectoredRead Contains: HADOOP-19101. Vectored Read into off-heap buffer broken in fallback implementation Contributed by Steve Loughran Change-Id: Ia4ed71864c595f175c275aad83a2ff5741693432	2024-04-03 13:17:52 +01:00
Steve Loughran	b4f9d8e6fa	Revert "HADOOP-19098. Vector IO: Specify and validate ranges consistently." This reverts commit `ba7faf90c8`.	2024-04-03 13:15:05 +01:00
Steve Loughran	ba7faf90c8	HADOOP-19098. Vector IO: Specify and validate ranges consistently. Clarifies behaviour of VectorIO methods with contract tests as well as specification. * Add precondition range checks to all implementations * Identify and fix bug where direct buffer reads was broken (HADOOP-19101; this surfaced in ABFS contract tests) * Logging in VectoredReadUtils. * TestVectoredReadUtils verifies validation logic. * FileRangeImpl toString() improvements * CombinedFileRange tracks bytes in range which are wanted; toString() output logs this. HDFS * Add test TestHDFSContractVectoredRead ABFS * Add test ITestAbfsFileSystemContractVectoredRead S3A * checks for vector IO being stopped in all iterative vector operations, including draining * maps read() returning -1 to failure * passes in file length to validation * Error reporting to only completeExceptionally() those ranges which had not yet read data in. * Improved logging. readVectored() * made synchronized. This is only for the invocation; the actual async retrieves are unsynchronized. * closes input stream on invocation * switches to random IO, so avoids keeping any long-lived connection around. + AbstractSTestS3AHugeFiles enhancements. Contains: HADOOP-19101. Vectored Read into off-heap buffer broken in fallback implementation Contributed by Steve Loughran	2024-04-02 20:16:38 +01:00
slfan1989	ff3f2255d2	HADOOP-19112. Hadoop 3.4.0 release wrap-up. (#6640 ) Contributed by Shilun Fan. Reviewed-by: He Xiaoqiao <hexiaoqiao@apache.org> Signed-off-by: Shilun Fan <slfan1989@apache.org>	2024-03-19 20:08:03 +08:00
hfutatzhanghb	7012986fc3	HDFS-17345. Add a metrics to record block report generating cost time. (#6475 ). Contributed by farmmamba. Reviewed-by: Shuyan Zhang <zhangshuyan@apache.org> Signed-off-by: Shuyan Zhang <zhangshuyan@apache.org>	2024-03-06 16:59:00 +08:00
DieterDP	be13e94843	HADOOP-18987. Various fixes to FileSystem API docs (#6292 ) Contributed by Dieter De Paepe	2024-02-02 11:49:31 +00:00
LiuGuH	5f9932acc4	HDFS-17325. Fix the documentation of fs expunge command in FileSystemShell.md. (#6413 ) Contributed by liuguanghua. Reviewed-by: Ayush Saxena <ayushsaxena@apache.org> Signed-off-by: Shilun Fan <slfan1989@apache.org>	2024-01-05 18:42:55 +08:00
Lei Yang	661c784662	HDFS-17290: Adds disconnected client rpc backoff metrics (#6359 )	2024-01-04 20:24:10 -08:00
huangzhaobo	e26139beaa	HDFS-17301. Add read and write dataXceiver threads count metrics to datanode. (#6377 ) Reviewed-by: hfutatzhanghb <hfutzhanghb@163.com> Signed-off-by: Takanobu Asanuma <tasanuma@apache.org>	2023-12-29 12:43:46 +09:00
hfutatzhanghb	e91daae318	HDFS-17152. Fix the documentation of count command in FileSystemShell.md. (#5939 ). Contributed by farmmamba. Reviewed-by: Shilun Fan <slfan1989@apache.org> Signed-off-by: Shuyan Zhang <zhangshuyan@apache.org>	2023-12-11 16:53:37 +08:00
caozhiqiang	37d6cada14	HDFS-17272. NNThroughputBenchmark should support specifying the base directory for multi-client test (#6319 ). Contributed by caozhiqiang. Reviewed-by: Tao Li <tomscut@apache.org> Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>	2023-12-10 13:43:04 +05:30
zhangshuyan	809ae58e71	HADOOP-18982. Fix doc about loading native libraries. (#6281 ). Contributed by Shuyan Zhang. Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>	2023-12-06 21:24:14 +08:00
Tom	f58945d7d1	HDFS-16791. Add getEnclosingRoot() API to filesystem interface and implementations (#6198 ) The enclosing root path is a common ancestor that should be used for temp and staging dirs as well as within encryption zones and other restricted directories. Contributed by Tom McCormick	2023-11-08 14:25:21 +00:00
Steve Loughran	7ec636deec	HADOOP-18930. Make fs.s3a.create.performance a bucket-wide setting. (#6168 ) If fs.s3a.create.performance is set on a bucket - All file overwrite checks are skipped, even if the caller says otherwise. - All directory existence checks are skipped. - Marker deletion is always skipped. This eliminates a HEAD and a LIST for every creation. * New path capability "fs.s3a.create.performance.enabled" true if the option is enabled. * Parameterize ITestS3AContractCreate to expect the different outcomes * Parameterize ITestCreateFileCost similarly, with changed cost assertions there. * create(/) raises an IOE. existing bug only noticed here. Contributed by Steve Loughran	2023-10-27 12:23:55 +01:00
huangzhaobo	daa78adc88	HDFS-17200. Add some datanode related metrics to Metrics.md. (#6099 ). Contributed by huangzhaobo99 Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>	2023-10-06 12:40:44 +05:30
huhaiyang	2831c7ce26	HADOOP-18880. Add some rpc related metrics to Metrics.md (#6015 ) Contributed by Yanghai Hu. Reviewed-by: Inigo Goiri <inigoiri@apache.org> Signed-off-by: Shilun Fan <slfan1989@apache.org>	2023-09-05 17:34:05 +08:00
Wei-Chiu Chuang	e239d40ab1	Post release update * Add jdiff xml files from 3.3.6 release. * Declare 3.3.6 as the latest stable release. * Copy release notes. (cherry picked from commit `7db9895000`) (cherry picked from commit cc121e2124aa01458dc296a060edc5e21a295268)	2023-06-26 16:08:24 +00:00
Xing Lin	427366b73b	HDFS-17042 Add rpcCallSuccesses and OverallRpcProcessingTime to RpcMetrics for Namenode (#5730 )	2023-06-15 13:59:58 -07:00
Steve Loughran	e76c09ac3b	HADOOP-18724. Open file fails with NumberFormatException for S3AFileSystem (#5611 ) This: 1. Adds optLong, optDouble, mustLong and mustDouble methods to the FSBuilder interface to let callers explicitly passin long and double arguments. 2. The opt() and must() builder calls which take float/double values now only set long values instead, so as to avoid problems related to overloaded methods resulting in a ".0" being appended to a long value. 3. All of the relevant opt/must calls in the hadoop codebase move to the new methods 4. And the s3a code is resilient to parse errors in is numeric options -it will downgrade to the default. This is nominally incompatible, but the floating-point builder methods were never used: nothing currently expects floating point numbers. For anyone who wants to safely set numeric builder options across all compatible releases, convert the number to a string and then use the opt(String, String) and must(String, String) methods. Contributed by Steve Loughran	2023-05-11 17:57:25 +01:00
Tak Lon (Stephen) Wu	0e46388474	HADOOP-18671. Add recoverLease(), setSafeMode(), isFileClosed() as interfaces to hadoop-common (#5553 ) The HDFS lease APIs have been replicated as interfaces in hadoop-common so other filesystems can also implement them. Applications which use the leasing APIs should migrate to the new interface where possible. Contributed by Stephen Wu	2023-05-03 11:05:55 +01:00
Sebastian Baunsgaard	6aac6cb212	HADOOP-18660. Filesystem Spelling Mistake (#5475 ). Contributed by Sebastian Baunsgaard. Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>	2023-04-25 21:44:04 +05:30
Nikita Eshkeev	d07356e60e	HADOOP-18597. Simplify single node instructions for creating directories for Map Reduce. (#5305 ) Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>	2023-04-20 16:12:44 +05:30
Steve Loughran	405ed1dde6	HADOOP-18470. Hadoop 3.3.5 release wrap-up (#5558 ) Post-release updates of the branches * Add jdiff xml files from 3.3.5 release. * Declare 3.3.5 as the latest stable release. * Copy release notes.	2023-04-18 10:12:07 +01:00
Viraj Jasani	b4bcbb9515	HDFS-16959. RBF: State store cache loading metrics (#5497 )	2023-03-29 10:43:13 -07:00
rdingankar	0ca5686034	HDFS-16917 Add transfer rate quantile metrics for DataNode reads (#5397 ) Co-authored-by: Ravindra Dingankar <rdingankar@linkedin.com>	2023-02-27 18:26:32 +00:00
Arnout Engelen	02fd87a4d8	HADOOP-18627. Add stronger wording in 'secure mode' introduction (#5406 ) Make it more clear that when deploying Hadoop 'secure mode' is generally not optional. Contributed by Arnout Engelen	2023-02-17 16:30:41 +00:00
Steve Loughran	d56977e909	HADOOP-18470. More in the 3.3.5 index.html about security (#5383 ) Expands on the comments in cluster config to tell people they shouldn't be running a cluster without a private VLAN in cloud, that Knox is good here, and unsecured clusters without a VLAN are just computation-as-a-service to crypto miners Contributed by Steve Loughran	2023-02-14 17:22:59 +00:00
Nikita Eshkeev	4de31123ce	Fix "the the" and friends typos (#5267 ) Signed-off-by: Nikita Eshkeev <neshkeev@yandex.ru>	2023-01-17 03:33:59 +08:00
Steve Loughran	84b33b897c	HADOOP-18470. index.md update for 3.3.5 release	2022-12-05 16:13:24 +00:00
GuoPhilipse	069bd973d8	HADOOP-18532. Update command usage in FileSystemShell.md (#5141 ) Signed-off-by: Akira Ajisaka <aajisaka@apache.org>	2022-11-21 15:55:46 +09:00
Ashutosh Gupta	a48e8c9beb	MAPREDUCE-5608. Replace and deprecate mapred.tasktracker.indexcache.mb (#5014 ) Co-authored-by: Ashutosh Gupta <ashugpt@amazon.com> Signed-off-by: Akira Ajisaka <aajisaka@apache.org>	2022-11-14 11:07:40 +09:00
Steve Loughran	38b2ed2151	HADOOP-18442. Remove openstack support (#4855 ) Contributed by Steve Loughran	2022-10-06 11:49:38 +01:00
Ayush Saxena	cc41ad63f9	HADOOP-18388. Allow dynamic groupSearchFilter in LdapGroupsMapping. (#4798 ) * HADOOP-18388. Allow dynamic groupSearchFilter in LdapGroupsMapping.	2022-09-06 18:38:51 -04:00
Mukund Thakur	231e095802	HADOOP-18407. Improve readVectored() api spec (#4760 ) part of HADOOP-18103. Contributed By: Mukund Thakur	2022-08-22 23:19:29 +05:30
Steve Loughran	62dbefd8f2	HADOOP-18305. Release Hadoop 3.3.4: upstream changelog and jdiff files Add the r3.3.4 changelog, release notes and jdiff xml files.	2022-08-05 14:06:22 +01:00
Masatake Iwasaki	3cce41a1f6	Make upstream aware of 3.2.4 release. (cherry picked from commit e1637a57dfd41385dbce5de90620c48a45abb263)	2022-07-22 02:27:19 +00:00
Mukund Thakur	4d1f6f9b99	HADOOP-18106: Handle memory fragmentation in S3A Vectored IO. (#4445 ) part of HADOOP-18103. Handling memory fragmentation in S3A vectored IO implementation by allocating smaller user range requested size buffers and directly filling them from the remote S3 stream and skipping undesired data in between ranges. This patch also adds aborting active vectored reads when stream is closed or unbuffer() is called. Contributed By: Mukund Thakur	2022-06-22 17:29:32 +01:00
Mukund Thakur	5db0f34e29	HADOOP-18104: S3A: Add configs to configure minSeekForVectorReads and maxReadSizeForVectorReads (#3964 ) Part of HADOOP-18103. Introducing fs.s3a.vectored.read.min.seek.size and fs.s3a.vectored.read.max.merged.size to configure min seek and max read during a vectored IO operation in S3A connector. These properties actually define how the ranges will be merged. To completely disable merging set fs.s3a.max.readsize.vectored.read to 0. Contributed By: Mukund Thakur	2022-06-22 17:29:32 +01:00
Mukund Thakur	2daf0a814f	HADOOP-11867. Add a high-performance vectored read API. (#3904 ) part of HADOOP-18103. Add support for multiple ranged vectored read api in PositionedReadable. The default iterates through the ranges to read each synchronously, but the intent is that FSDataInputStream subclasses can make more efficient readers especially in object stores implementation. Also added implementation in S3A where smaller ranges are merged and sliced byte buffers are returned to the readers. All the merged ranged are fetched from S3 asynchronously. Contributed By: Owen O'Malley and Mukund Thakur	2022-06-22 17:29:32 +01:00

514 Commits