hadoop

Author	SHA1	Message	Date
Cheng Pan	2bde5ccb81	HADOOP-19192. Log level is WARN when fail to load native hadoop libs (#6863 ) Updates the documentation to be consistent with the logging. Contributed by Cheng Pan	2024-06-14 19:05:27 +01:00
Mukund Thakur	06dd3bfee8	HADOOP-19196. Allow base path to be deleted as well using Bulk Delete. (#6872 ) Contributed by: Mukund Thakur	2024-06-11 14:06:53 -05:00
PJ Fanning	bb30545583	HADOOP-19163. Use hadoop-shaded-protobuf_3_25 (#6858 ) Contributed by PJ Fanning	2024-06-11 17:10:00 +01:00
Yu Zhang	f1e2ceb823	HDFS-13603: Do not propagate ExecutionException while initializing EDEK queues for keys. (#6860 )	2024-06-03 09:10:06 -07:00
Steve Loughran	d00b3acd5e	HADOOP-18679. Followup: change method name case (#6854 ) WrappedIO.bulkDelete_PageSize() => bulkDelete_pageSize() Makes it consistent with the HADOOP-19131 naming scheme. The name needs to be fixed before invoking it through reflection, as once that is attempted the binding won't work at run time, though compilation will be happy. Contributed by Steve Loughran	2024-05-30 19:34:30 +01:00
Mukund Thakur	d107931fc7	HADOOP-19188. Fix TestHarFileSystem and TestFilterFileSystem failing after bulk delete API got added. (#6848 ) Follow up to: HADOOP-18679 Add API for bulk/paged delete of files and objects Contributed by Mukund Thakur	2024-05-29 17:27:09 +01:00
刘斌	6c08e8e2aa	HADOOP-19156. ZooKeeper based state stores use different ZK address configs. (#6767 ). Contributed by liu bin. Signed-off-by: Ayush Saxena <ayushsaxena@apache.org> Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>	2024-05-29 20:44:36 +08:00
Sebb	f11a8cfa6e	HADOOP-13147. Constructors must not call overrideable methods in PureJavaCrc32C (#6408 ). Contributed by Sebb.	2024-05-21 00:08:08 +05:30
Mukund Thakur	47be1ab3b6	HADOOP-18679. Add API for bulk/paged delete of files (#6726 ) Applications can create a BulkDelete instance from a BulkDeleteSource; the BulkDelete interface provides the pageSize(): the maximum number of entries which can be deleted, and a bulkDelete(Collection paths) method which can take a collection up to pageSize() long. This is optimized for object stores with bulk delete APIs; the S3A connector will offer the page size of fs.s3a.bulk.delete.page.size unless bulk delete has been disabled. Even with a page size of 1, the S3A implementation is more efficient than delete(path) as there are no safety checks for the path being a directory or probes for the need to recreate directories. The interface BulkDeleteSource is implemented by all FileSystem implementations, with a page size of 1 and mapped to delete(pathToDelete, false). This means that callers do not need to have special case handling for object stores versus classic filesystems. To aid use through reflection APIs, the class org.apache.hadoop.io.wrappedio.WrappedIO has been created with "reflection friendly" methods. Contributed by Mukund Thakur and Steve Loughran	2024-05-20 17:05:25 +01:00
skyskyhu	3c00093cb5	HADOOP-19167 Bug Fix: Change of Codec configuration does not work (#6807 )	2024-05-17 10:27:39 +08:00
Vikas Kumar	f8dce6c501	HADOOP-18851. Performance improvement for DelegationTokenSecretManager (#6803 )	2024-05-16 12:30:52 +08:00
Christopher Tubbs	2e77b7b02c	[HADOOP-18786] Use CDN instead of ASF archive (#5789 ) * Use Yetus 0.14.1 from downloads.apache.org in yetus-wrapper * Use Maven 3.8.8 from downloads.apache.org in Win 10 Dockerfile * Point users to downloads.apache.org for JVSC * Use Solr 8.11.2 from downloads.apache.org in YARN Dockerfile Contributed by Christopher Tubbs	2024-05-14 20:09:52 +01:00
zhihui wang	39dee8ea19	HADOOP-18958. Improve UserGroupInformation debug log. (#6255 ) Contributed by zhihui wang	2024-05-14 20:03:49 +01:00
Tsz-Wo Nicholas Sze	bda7045070	HADOOP-19152. Do not hard code security providers. (#6739 )	2024-05-14 11:19:57 -07:00
zhengchenyu	4cb4d5dd08	HADOOP-19170. Fixes compilation issues on non-Linux systems (#6822 ) Reviewed-by: Steve Loughran <stevel@apache.org> Reviewed-by: Wei-Chiu Chuang <weichiu@apache.org>	2024-05-13 20:04:01 -07:00
Felix Nguyen	fb0519253d	HDFS-17488. DN can fail IBRs with NPE when a volume is removed (#6759 )	2024-05-11 15:37:43 +08:00
Sammi Chen	43e8ca428e	Revert "HADOOP-18851: Performance improvement for DelegationTokenSecretManager. (#6001 ). Contributed by Vikas Kumar." This reverts commit `e283375cdf`.	2024-05-07 13:29:32 +08:00
Doroszlai, Attila	2645898450	HADOOP-19160. hadoop-auth should not depend on kerb-simplekdc (#6788 )	2024-05-03 12:57:26 +02:00
Tsz-Wo Nicholas Sze	78987a71a6	HADOOP-19151. Support configurable SASL mechanism. (#6740 )	2024-04-29 10:02:23 -07:00
zhtttylz	daafc8a0b8	HDFS-17367. Add PercentUsed for Different StorageTypes in JMX (#6735 ) Contributed by Hualong Zhang. Signed-off-by: Shilun Fan <slfan1989@apache.org>	2024-04-27 20:36:11 +08:00
Pranav Saxena	6404692c09	HADOOP-19102. [ABFS] FooterReadBufferSize should not be greater than readBufferSize (#6617 ) Contributed by Pranav Saxena	2024-04-22 18:36:12 +01:00
zj619	922c44a339	HADOOP-19130. FTPFileSystem rename with full qualified path broken (#6678 ). Contributed by shawn Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>	2024-04-17 23:12:38 +05:30
PJ Fanning	d194ad0242	HADOOP-19079. HttpExceptionUtils to verify that loaded class is really an exception before instantiation (#6557 ) Security hardening + Adds new interceptAndValidateMessageContains() method in LambdaTestUtils to verify a list of strings can all be found in the toString() value of a raised exception Contributed by PJ Fanning	2024-04-11 19:38:15 +01:00
Gautham B A	f7bb4f1595	HADOOP-18135. Produce Windows binaries of Hadoop (#6673 ) This PR enables one to create the Hadoop release tarball on Windows, complete with the native binaries (including winutils.exe). This PR contains the following changes - * Prevents splitting during array element expansion - this is needed since we need to pass the arguments correctly to maven. * Install Python 3.11.8 and pip to the Windows docker image for building Hadoop. * pom file changes to get maven to invoke the releasedocmaker script through bash.exe on Windows.	2024-04-09 22:15:05 +05:30
Steve Loughran	87fb977777	HADOOP-19098. Vector IO: Specify and validate ranges consistently. #6604 Clarifies behaviour of VectorIO methods with contract tests as well as specification. * Add precondition range checks to all implementations * Identify and fix bug where direct buffer reads were broken (HADOOP-19101; this surfaced in ABFS contract tests) * Logging in VectoredReadUtils. * TestVectoredReadUtils verifies validation logic. * FileRangeImpl toString() improvements * CombinedFileRange tracks bytes in range which are wanted; toString() output logs this. HDFS * Add test TestHDFSContractVectoredRead ABFS * Add test ITestAbfsFileSystemContractVectoredRead S3A * checks for vector IO being stopped in all iterative vector operations, including draining * maps read() returning -1 to failure * passes in file length to validation * Error reporting to only completeExceptionally() those ranges which had not yet read data in. * Improved logging. readVectored() * made synchronized. This is only for the invocation; the actual async retrieves are unsynchronized. * closes input stream on invocation * switches to random IO, so avoids keeping any long-lived connection around. + AbstractSTestS3AHugeFiles enhancements. + ADDENDUM: test fix in ITestS3AContractVectoredRead Contains: HADOOP-19101. Vectored Read into off-heap buffer broken in fallback implementation Contributed by Steve Loughran Change-Id: Ia4ed71864c595f175c275aad83a2ff5741693432	2024-04-03 13:17:52 +01:00
Steve Loughran	b4f9d8e6fa	Revert "HADOOP-19098. Vector IO: Specify and validate ranges consistently." This reverts commit `ba7faf90c8`.	2024-04-03 13:15:05 +01:00
Steve Loughran	ba7faf90c8	HADOOP-19098. Vector IO: Specify and validate ranges consistently. Clarifies behaviour of VectorIO methods with contract tests as well as specification. * Add precondition range checks to all implementations * Identify and fix bug where direct buffer reads were broken (HADOOP-19101; this surfaced in ABFS contract tests) * Logging in VectoredReadUtils. * TestVectoredReadUtils verifies validation logic. * FileRangeImpl toString() improvements * CombinedFileRange tracks bytes in range which are wanted; toString() output logs this. HDFS * Add test TestHDFSContractVectoredRead ABFS * Add test ITestAbfsFileSystemContractVectoredRead S3A * checks for vector IO being stopped in all iterative vector operations, including draining * maps read() returning -1 to failure * passes in file length to validation * Error reporting to only completeExceptionally() those ranges which had not yet read data in. * Improved logging. readVectored() * made synchronized. This is only for the invocation; the actual async retrieves are unsynchronized. * closes input stream on invocation * switches to random IO, so avoids keeping any long-lived connection around. + AbstractSTestS3AHugeFiles enhancements. Contains: HADOOP-19101. Vectored Read into off-heap buffer broken in fallback implementation Contributed by Steve Loughran	2024-04-02 20:16:38 +01:00
PJ Fanning	f7d1ec2d9e	HADOOP-19077. Remove use of javax.ws.rs.core.HttpHeaders (#6554 ). Contributed by PJ Fanning Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>	2024-04-01 12:43:39 +05:30
PJ Fanning	06db6289cb	HADOOP-19024. Use bouncycastle jdk18 1.77 (#6410 ). Contributed	2024-03-30 19:58:12 +05:30
PJ Fanning	97c5a6efba	HADOOP-19041. Use StandardCharsets in more places (#6449 )	2024-03-28 23:17:18 -04:00
Viraj Jasani	9fe371aa15	HADOOP-18980. Invalid inputs for getTrimmedStringCollectionSplitByEquals (ADDENDUM) (#6546 ) This is a followup to #6406: HADOOP-18980. S3A credential provider remapping: make extensible It adds extra validation of key-value pairs in a configuration option, with tests. Contributed by Viraj Jasani	2024-03-26 11:18:03 +00:00
Gautham B A	44a249e32a	Exclude files from Apache RAT (#6671 )	2024-03-25 09:13:57 -07:00
Alex	5c7e40f910	HADOOP-19111. Removing redundant debug message about client info (#6666 ). Contributed by Zhongkun Wu. Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>	2024-03-25 11:44:33 +08:00
yu liang	55dca911cc	HADOOP-19052.Hadoop use Shell command to get the count of the hard link which takes a lot of time (#6587 ) Contributed by liangyu. Signed-off-by: Shilun Fan <slfan1989@apache.org>	2024-03-24 10:14:27 +08:00
Peter Szucs	a957cd5049	YARN-5305. Allow log aggregation to discard expired delegation tokens (#6625 )	2024-03-20 15:33:10 +01:00
Steve Loughran	705fb8323b	HADOOP-19119. Spotbugs: possible NPE in org.apache.hadoop.crypto.key.kms.ValueQueue.getSize() (#6642 ) Spotbugs is mistaken here as it doesn't observe the read/write locks used to manage exclusive access to the maps. * cache the value between checks * tag as @VisibleForTesting Contributed by Steve Loughran	2024-03-19 17:18:07 +00:00
slfan1989	ff3f2255d2	HADOOP-19112. Hadoop 3.4.0 release wrap-up. (#6640 ) Contributed by Shilun Fan. Reviewed-by: He Xiaoqiao <hexiaoqiao@apache.org> Signed-off-by: Shilun Fan <slfan1989@apache.org>	2024-03-19 20:08:03 +08:00
Vinayakumar B	0f51d2a4ec	HADOOP-14451. Deadlock in NativeIO (#6632 )	2024-03-18 10:53:21 +05:30
PJ Fanning	fc166d3aec	HADOOP-19090. Use protobuf-java 3.23.4. (#6593 ). Contributed by PJ Fanning.	2024-03-07 15:09:01 +05:30
hfutatzhanghb	7012986fc3	HDFS-17345. Add a metrics to record block report generating cost time. (#6475 ). Contributed by farmmamba. Reviewed-by: Shuyan Zhang <zhangshuyan@apache.org> Signed-off-by: Shuyan Zhang <zhangshuyan@apache.org>	2024-03-06 16:59:00 +08:00
Steve Loughran	095229fefb	HADOOP-19097. S3A: Set fs.s3a.connection.establish.timeout to 30s (#6601 ) This is consistent with the value in the hadoop-aws source code Contributed by Steve Loughran	2024-03-05 10:10:27 +00:00
Jian Zhang	a6aa2925fb	HDFS-17333. DFSClient supports lazy resolution from hostname to IP. (#6430 ) Signed-off-by: Takanobu Asanuma <tasanuma@apache.org>	2024-03-02 21:35:24 +09:00
Steve Loughran	095dfcca30	HADOOP-18088. Replace log4j 1.x with reload4j. (#4052 ) Co-authored-by: Wei-Chiu Chuang <weichiu@apache.org> Includes HADOOP-18354. Upgrade reload4j to 1.22.2 due to XXE vulnerability (#4607). Log4j 1.2.17 has been replaced by reload4j 1.22.2; SLF4J is at 1.7.36	2024-02-13 16:33:51 +00:00
slfan1989	8011b21c52	HADOOP-19069. Use hadoop-thirdparty 1.2.0. (#6533 ) Contributed by Shilun Fan Reviewed-by: He Xiaoqiao <hexiaoqiao@apache.org> Signed-off-by: Shilun Fan <slfan1989@apache.org>	2024-02-08 19:18:04 +08:00
Steve Loughran	3f98cb6741	HADOOP-19045. CreateSession Timeout - followup (#6532 ) This is a followup to PR: HADOOP-19045. S3A: Validate CreateSession Timeout Propagation (#6470) Remove all declarations of fs.s3a.connection.request.timeout in - hadoop-common/src/main/resources/core-default.xml - hadoop-aws/src/test/resources/core-site.xml New test in TestAwsClientConfig to verify that the value defined in fs.s3a.Constants class is used. This is brittle to someone overriding it in their test setups, but as this test is intended to verify that the option is not explicitly set, there's no workaround. Contributed by Steve Loughran	2024-02-07 12:07:54 +00:00
Jia Fan	4f0f5a546c	HADOOP-19049. Fix StatisticsDataReferenceCleaner classloader leak (#6488 ) Contributed by Jia Fan	2024-02-03 14:48:52 +00:00
Xing Lin	d74e5160cd	HADOOP-19061 Capture exception from rpcRequestSender.start() in IPC.Connection.run() (#6519 ) * HADOOP-19061 - Capture exception from rpcRequestSender.start() in IPC.Connection.run() and proper cleaning is followed if an exception is thrown. --------- Co-authored-by: Xing Lin <xinglin@linkedin.com>	2024-02-02 16:22:16 -08:00
Viraj Jasani	7504b8505f	HADOOP-18980. S3A credential provider remapping: make extensible (#6406 ) Contributed by Viraj Jasani	2024-02-02 17:02:48 +00:00
DieterDP	be13e94843	HADOOP-18987. Various fixes to FileSystem API docs (#6292 ) Contributed by Dieter De Paepe	2024-02-02 11:49:31 +00:00
Tsz-Wo Nicholas Sze	da34ecdb83	HADOOP-19035. CrcUtil/CrcComposer should not throw IOException for non-IO. (#6443 )	2024-01-25 10:35:32 -08:00
PJ Fanning	76691dfa14	HADOOP-18894: upgrade sshd-core due to CVEs (#6060 ) Contributed by PJ Fanning. Reviewed-by: He Xiaoqiao <hexiaoqiao@apache.org> Reviewed-by: Steve Loughran <stevel@cloudera.com> Signed-off-by: Shilun Fan <slfan1989@apache.org>	2024-01-21 08:13:25 +08:00
slfan1989	8444f69511	Preparing for 3.5.0 development (#6411 ) Co-authored-by: slfan1989 <slfan1989@apache.org>	2024-01-19 15:05:22 +08:00
hfutatzhanghb	ba6ada73ac	HDFS-17337. RPC RESPONSE time seems not exactly accurate when using FSEditLogAsync. (#6439 ). Contributed by farmmamba. Reviewed-by: Tao Li <tomscut@apache.org> Signed-off-by: Shuyan Zhang <zhangshuyan@apache.org>	2024-01-18 11:10:05 +08:00
Hexiaoqiao	9634bd31e6	HADOOP-19031. Enhance access control for RunJar. (#6427 ). Contributed by He Xiaoqiao. Signed-off-by: Shuyan Zhang <zhangshuyan@apache.org> Signed-off-by: Shilun Fan <slfan1989@apache.org> Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>	2024-01-17 15:00:06 +08:00
Mukund Thakur	7b1570e2f1	HADOOP-19015. Increase fs.s3a.connection.maximum to 500 to minimize risk of Timeout waiting for connection from pool. (#6372 ) HADOOP-19015. Increase fs.s3a.connection.maximum to 500 to minimize the risk of Timeout waiting for connection from the pool Contributed By: Mukund Thakur	2024-01-16 17:06:28 -06:00
slfan1989	6652922333	HADOOP-19040. mvn site commands fails due to MetricsSystem And MetricsSystemImpl changes. (#6450 ) Contributed by Shilun Fan. Reviewed-by: Steve Loughran <stevel@cloudera.com> Signed-off-by: Shilun Fan <slfan1989@apache.org>	2024-01-16 22:11:16 +08:00
Xing Lin	453e264eb4	HADOOP-18981. Move oncrpc and portmap packages to hadoop-common (#6280 ) Move the org.apache.hadoop.{oncrpc, portmap} packages from the hadoop-nfs module to the hadoop-common module. This allows for use of the protocol beyond just NFS -including within HDFS itself. Contributed by Xing Lin	2024-01-11 14:06:15 +00:00
LiuGuH	5f9932acc4	HDFS-17325. Fix the documentation of fs expunge command in FileSystemShell.md. (#6413 ) Contributed by liuguanghua. Reviewed-by: Ayush Saxena <ayushsaxena@apache.org> Signed-off-by: Shilun Fan <slfan1989@apache.org>	2024-01-05 18:42:55 +08:00
Lei Yang	661c784662	HDFS-17290: Adds disconnected client rpc backoff metrics (#6359 )	2024-01-04 20:24:10 -08:00
hfutatzhanghb	8c26d4e9e0	HDFS-17322. Renames RetryCache#MAX_CAPACITY to be MIN_CAPACITY to fit usage.	2024-01-04 14:31:53 -08:00
huangzhaobo	e26139beaa	HDFS-17301. Add read and write dataXceiver threads count metrics to datanode. (#6377 ) Reviewed-by: hfutatzhanghb <hfutzhanghb@163.com> Signed-off-by: Takanobu Asanuma <tasanuma@apache.org>	2023-12-29 12:43:46 +09:00
Mukund Thakur	01bde4afff	Revert "HADOOP-19015. Increase fs.s3a.connection.maximum to 500 to minimize risk of Timeout waiting for connection from pool" Pushed it by mistake. So sorry. This reverts commit `e28f83a1eb`.	2023-12-19 14:12:21 -06:00
Mukund Thakur	e28f83a1eb	HADOOP-19015. Increase fs.s3a.connection.maximum to 500 to minimize risk of Timeout waiting for connection from pool	2023-12-19 14:04:07 -06:00
Anika Kelhanka	62cc673d00	[HADOOP-19010] - NullPointerException in Hadoop Credential Check CLI (#6351 )	2023-12-16 12:23:52 +05:30
hfutatzhanghb	e91daae318	HDFS-17152. Fix the documentation of count command in FileSystemShell.md. (#5939 ). Contributed by farmmamba. Reviewed-by: Shilun Fan <slfan1989@apache.org> Signed-off-by: Shuyan Zhang <zhangshuyan@apache.org>	2023-12-11 16:53:37 +08:00
caozhiqiang	37d6cada14	HDFS-17272. NNThroughputBenchmark should support specifying the base directory for multi-client test (#6319 ). Contributed by caozhiqiang. Reviewed-by: Tao Li <tomscut@apache.org> Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>	2023-12-10 13:43:04 +05:30
zhangshuyan	809ae58e71	HADOOP-18982. Fix doc about loading native libraries. (#6281 ). Contributed by Shuyan Zhang. Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>	2023-12-06 21:24:14 +08:00
Steve Loughran	e221231e81	HADOOP-18996. S3A to provide full support for S3 Express One Zone (#6308 ) This adds broad support for Amazon S3 Express One Zone to the S3A connector, particularly resilience of other parts of the codebase to LIST operations returning paths under which only in-progress uploads are taking place. hadoop-common and hadoop-mapreduce treewalking routines all cope with this; distcp is left alone. There are still some outstanding followup issues, and we expect more to surface with extended use. Contains HADOOP-18955. AWS SDK v2: add path capability probe "fs.s3a.capability.aws.v2" * lets us probe for AWS SDK version * bucket-info reports it Contains HADOOP-18961 S3A: add s3guard command "bucket" hadoop s3guard bucket -create -region us-west-2 -zone usw2-az2 \ s3a://stevel--usw2-az2--x-s3/ * requires -zone if bucket is zonal * rejects it if not * rejects zonal bucket suffixes if endpoint is not aws (safety feature) * imperfect, but a functional starting point. New path capability "fs.s3a.capability.zonal.storage" * Used in tests to determine whether pending uploads manifest paths * cli tests can probe for this * bucket-info reports it * some tests disable/change assertions as appropriate ---- Shell commands fail on S3Express buckets if pending uploads. New path capability in hadoop-common "fs.capability.directory.listing.inconsistent" 1. S3AFS returns true on a S3 Express bucket 2. FileUtil.maybeIgnoreMissingDirectory(fs, path, fnfe) decides whether to swallow the exception or not. 3. This is used in: Shell, FileInputFormat, LocatedFileStatusFetcher Fixes with tests * fs -ls -R * fs -du * fs -df * fs -find * S3AFS.getContentSummary() (maybe...should discuss) * mapred LocatedFileStatusFetcher * Globber, HADOOP-15478 already fixed that when dealing with S3 inconsistencies * FileInputFormat S3Express CreateSession request is permitted outside audit spans.
S3 Bulk Delete calls request the store to return the list of deleted objects if RequestFactoryImpl is set to trace. log4j.logger.org.apache.hadoop.fs.s3a.impl.RequestFactoryImpl=TRACE Test Changes * ITestS3AMiscOperations removes all tests which require unencrypted buckets. AWS S3 defaults to SSE-S3 everywhere. * ITestBucketTool to test new tool without actually creating new buckets. * S3ATestUtils add methods to skip test suites/cases if store is/is not S3Express * Cutting down on "is this a S3Express bucket" logic to trailing --x-s3 string and not worrying about AZ naming logic. commented out relevant tests. * ITestTreewalkProblems validated against standard and S3Express stores Outstanding * Distcp: tests show it fails. Proposed: release notes. --- x-amz-checksum header not found when signing S3Express messages This modifies the custom signer in ITestCustomSigner to be a subclass of AwsS3V4Signer with a goal of preventing signing problems with S3 Express stores. ---- RemoteFileChanged renaming multipart file Maps 412 status code to RemoteFileChangedException Modifies huge file tests -Adds a check on etag match for stat vs list -ITestS3AHugeFilesByteBufferBlocks renames parent dirs, rather than files, to replicate distcp better. ---- S3Express custom Signing cannot handle bulk delete Copy custom signer into production JAR, so enable downstream testing Extend ITestCustomSigner to cover more filesystem operations - PUT - POST - COPY - LIST - Bulk delete through delete() and rename() - list + abort multipart uploads Suite is parameterized on bulk delete enabled/disabled. To use the new signer for a full test run: <property> <name>fs.s3a.custom.signers</name> <value>CustomSdkSigner:org.apache.hadoop.fs.s3a.auth.CustomSdkSigner</value> </property> <property> <name>fs.s3a.s3.signing-algorithm</name> <value>CustomSdkSigner</value> </property>	2023-12-01 14:16:33 +00:00
Steve Loughran	5cda162a80	HADOOP-18915. Tune/extend S3A http connection and thread pool settings (#6180 ) Increases existing pool sizes, as with server scale and vector IO, larger pools are needed fs.s3a.connection.maximum 200 fs.s3a.threads.max 96 Adds new configuration options for v2 sdk internal timeouts, both with default of 60s: fs.s3a.connection.acquisition.timeout fs.s3a.connection.idle.time All the pool/timeout options are covered in performance.md Moves all timeout/duration options in the s3a FS to taking temporal units (h, m, s, ms,...); retaining the previous default unit (normally millisecond) Adds a minimum duration for most of these, in order to recover from deployments where a timeout has been set on the assumption the unit was seconds, not millis. Uses java.time.Duration throughout the codebase; retaining the older numeric constants in org.apache.hadoop.fs.s3a.Constants for backwards compatibility; these are now deprecated. Adds new class AWSApiCallTimeoutException to be raised on sdk-related methods and also gateway timeouts. This is a subclass of org.apache.hadoop.net.ConnectTimeoutException to support existing retry logic. + reverted default value of fs.s3a.create.performance to false; inadvertently set to true during testing. Contributed by Steve Loughran.	2023-11-29 15:12:44 +00:00
Viraj Jasani	f1e4376626	HADOOP-18959. Use builder for prefetch CachingBlockManager. (#6240 ) Contributed by Viraj Jasani	2023-11-23 11:07:44 +00:00
PJ Fanning	f609460bda	HADOOP-18957. Use StandardCharsets.UTF_8 (#6231 ). Contributed by PJ Fanning. Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>	2023-11-20 23:44:48 +05:30
Istvan Fajth	7a55442297	HADOOP-18956. Zookeeper SSL/TLS support in ZKDelegationTokenSecretManager and ZKSignerSecretProvider (#6263 )	2023-11-17 01:51:43 -08:00
K0K0V0K	a32097a921	HADOOP-18954. Filter NaN values from JMX json interface. (#6229 ). Reviewed-by: Ferenc Erdelyi Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>	2023-11-09 17:14:14 +08:00
Tom	f58945d7d1	HDFS-16791. Add getEnclosingRoot() API to filesystem interface and implementations (#6198 ) The enclosing root path is a common ancestor that should be used for temp and staging dirs as well as within encryption zones and other restricted directories. Contributed by Tom McCormick	2023-11-08 14:25:21 +00:00
Viraj Jasani	cf3a4b3bb7	HADOOP-18850. S3A: Enable dual-layer server-side encryption with AWS KMS keys (#6140 ) Contributed by Viraj Jasani	2023-11-01 13:30:35 +00:00
ConfX	7c6af6a5f6	HADOOP-18905. Negative timeout in ZKFailovercontroller due to overflow. (#6092 ). Contributed by ConfX. Reviewed-by: Inigo Goiri <inigoiri@apache.org> Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>	2023-10-29 13:30:28 +05:30
Steve Loughran	7ec636deec	HADOOP-18930. Make fs.s3a.create.performance a bucket-wide setting. (#6168 ) If fs.s3a.create.performance is set on a bucket - All file overwrite checks are skipped, even if the caller says otherwise. - All directory existence checks are skipped. - Marker deletion is always skipped. This eliminates a HEAD and a LIST for every creation. * New path capability "fs.s3a.create.performance.enabled" true if the option is enabled. * Parameterize ITestS3AContractCreate to expect the different outcomes * Parameterize ITestCreateFileCost similarly, with changed cost assertions there. * create(/) raises an IOE. existing bug only noticed here. Contributed by Steve Loughran	2023-10-27 12:23:55 +01:00
Steve Loughran	8bd1f65efc	HADOOP-18948. S3A. Add option fs.s3a.directory.operations.purge.uploads to purge on rename/delete (#6218 ) S3A directory delete and rename will optionally abort all pending multipart uploads under their to-be-deleted paths when fs.s3a.directory.operations.purge.uploads is true. It is off by default. The filesystem's hasPathCapability("fs.s3a.directory.operations.purge.uploads") probe will return true when this feature is enabled. Multipart uploads may accrue from interrupted data writes, uncommitted staging/magic committer jobs and other operations/applications. On AWS S3 lifecycle rules are the recommended way to clean these; this change improves support for stores which lack these rules. Contributed by Steve Loughran	2023-10-25 17:39:16 +01:00
huhaiyang	f85ac5b60d	HADOOP-18920. RPC Metrics : Optimize logic for log slow RPCs (#6146 )	2023-10-25 13:56:39 +08:00
huhaiyang	9d48af8d70	HADOOP-18868. Optimize the configuration and use of callqueue overflow trigger failover (#5998 )	2023-10-23 14:06:02 -07:00
Zita Dombi	4c04818d3d	HADOOP-18919. Zookeeper SSL/TLS support in HDFS ZKFC (#6194 )	2023-10-23 11:03:15 -07:00
Steve Loughran	e0563fed50	HADOOP-18908. Improve S3A region handling. (#6187 ) S3A region logic improved for better inference and to be compatible with previous releases 1. If you are using an AWS S3 AccessPoint, its region is determined from the ARN itself. 2. If fs.s3a.endpoint.region is set and non-empty, it is used. 3. If fs.s3a.endpoint is an s3.*.amazonaws.com url, the region is determined by parsing the URL. Note: vpce endpoints are not handled by this. 4. If fs.s3a.endpoint.region==null, and none could be determined from the endpoint, use us-east-2 as default. 5. If fs.s3a.endpoint.region=="" then it is handed off to the default AWS SDK resolution process. Consult the AWS SDK documentation for the details on its resolution process, knowing that it is complicated and may use environment variables, entries in ~/.aws/config, IAM instance information within EC2 deployments and possibly even JSON resources on the classpath. Put differently: it is somewhat brittle across deployments. Contributed by Ahmar Suhail	2023-10-17 15:37:36 +01:00
jianghuazhu	8963b25ab3	HADOOP-18926.Add some comments related to NodeFencer. (#6162 )	2023-10-13 15:34:44 -07:00
Steve Loughran	9bc159f4ac	HADOOP-18487. Make protobuf 2.5 an optional runtime dependency. (#4996 ) Protobuf 2.5 JAR is no longer needed at runtime. The option common.protobuf2.scope defines whether the protobuf 2.5.0 dependency is marked as provided or not. * New package org.apache.hadoop.ipc.internal for internal only protobuf classes ...with a ShadedProtobufHelper in there which has shaded protobuf refs only, so guaranteed not to need protobuf-2.5 on the CP * All uses of org.apache.hadoop.ipc.ProtobufHelper have been replaced by uses of org.apache.hadoop.ipc.internal.ShadedProtobufHelper * The scope of protobuf-2.5 is set by the option common.protobuf2.scope In this patch it is still "compile" * There is explicit reference to it in modules where it may be needed. * The maven scope of the dependency can be set with the common.protobuf2.scope option. It can be set to "provided" in a build: -Dcommon.protobuf2.scope=provided * Add new ipc(callable) method to catch and convert shaded protobuf exceptions raised during invocation of the supplied lambda expression * This is adopted in the code where the migration is not traumatically over-complex. RouterAdminProtocolTranslatorPB is left alone for this reason. Contributed by Steve Loughran	2023-10-13 13:48:38 +01:00
Steve Loughran	81edbebdd8	HADOOP-18889. S3A v2 SDK third party support (#6141 ) Tune AWS v2 SDK changes based on testing with third party stores including GCS. Contains HADOOP-18889. S3A v2 SDK error translations and troubleshooting docs * Changes needed to work with multiple third party stores * New third_party_stores document on how to bind to and test third party stores, including google gcs (which works!) * Troubleshooting docs mostly updated for v2 SDK Exception translation/resilience * New AWSUnsupportedFeatureException for unsupported/unavailable errors * Handle 501 method unimplemented as one of these * Error codes > 500 mapped to the AWSStatus500Exception if no explicit handler. * Precondition errors handled a bit better * GCS throttle exception also recognized. * GCS raises 404 on a delete of a file which doesn't exist: swallow it. * Error translation uses reflection to create IOE of the right type. All IOEs at the bottom of an AWS stack chain are regenerated. then a new exception of that specific type is created, with the top level ex its cause. This is done to retain the whole stack chain. * Reduce the number of retries within the AWS SDK * And those of s3a code. * S3ARetryPolicy explicitly declare SocketException as connectivity failure but subclasses BindException * SocketTimeoutException also considered connectivity * Log at debug whenever retry policies looked up * Reorder exceptions to alphabetical order, with commentary * Review use of the Invoke.retry() method The reduction in retries is because it's clear, when you try to create a bucket which doesn't resolve, that even an UnknownHostException takes over 90s to eventually fail, after which it hits the s3a retry code. - Reducing the SDK retries means these escalate to our code better. - Cutting back on our own retries makes it a bit more responsive for most real deployments. - maybeTranslateNetworkException() and s3a retry policy means that unknown host exception is recognised and fails fast.
Contributed by Steve Loughran	2023-10-12 17:47:44 +01:00
Kevin Risden	5c22934d90	HADOOP-18922. Race condition in ZKDelegationTokenSecretManager creating znode (#6150 ). Contributed by Kevin Risden. Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>	2023-10-12 23:21:26 +08:00
huangzhaobo	daa78adc88	HDFS-17200. Add some datanode related metrics to Metrics.md. (#6099 ). Contributed by huangzhaobo99 Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>	2023-10-06 12:40:44 +05:30
Viraj Jasani	27cb551821	HADOOP-18829. S3A prefetch LRU cache eviction metrics (#5893 ) Contributed by: Viraj Jasani	2023-09-21 14:31:44 +05:30
Pranav Saxena	f24b73e5f3	HADOOP-18873. ABFS: AbfsOutputStream doesnt close DataBlocks object. (#6010 ) AbfsOutputStream to close the dataBlock object created for the upload. Contributed By: Pranav Saxena	2023-09-20 14:24:36 +05:30
PJ Fanning	c16484ffb2	HADOOP-18890. Remove use of okhttp in runtime code (#6057 ) Contributed by PJ Fanning	2023-09-19 12:38:36 +01:00
Hexiaoqiao	23c22b2823	HADOOP-18906. Increase default batch size of ZKDTSM token seqnum to reduce overflow speed of znode dataVersion. (#6097 )	2023-09-18 10:50:53 -07:00
章锡平	60f3a2b101	HDFS-17138 RBF: We changed the hadoop.security.auth_to_local configur… (#5921 )	2023-09-18 09:40:22 -07:00
Vikas Kumar	e283375cdf	HADOOP-18851: Performance improvement for DelegationTokenSecretManager. (#6001 ). Contributed by Vikas Kumar. Signed-off-by: Wei-Chiu Chuang <weichiu@apache.org> Signed-off-by: He Xiaoqiao <hexiaoqiao@apache.org>	2023-09-15 12:32:47 +08:00
ConfX	23360b3f6b	HADOOP-18824. ZKDelegationTokenSecretManager causes ArithmeticException due to improper numRetries value checking (#6052 )	2023-09-14 15:53:31 -07:00
PJ Fanning	56b928b86f	YARN-11498. Add exclusion for jettison everywhere jersey-json is loaded (#5786 ) All uses of jersey-json in the yarn and other hadoop modules now exclude the obsolete org.codehaus.jettison/jettison and so avoid all security issues which can come from the library. Contributed by PJ Fanning	2023-09-13 18:10:24 +01:00
Steve Loughran	81d90fd65b	HADOOP-18073. S3A: Upgrade AWS SDK to V2 (#5995 ) This patch migrates the S3A connector to use the V2 AWS SDK. This is a significant change at the source code level. Any applications using the internal extension/override points in the filesystem connector are likely to break. This includes but is not limited to: - Code invoking methods on the S3AFileSystem class which used classes from the V1 SDK. - The ability to define the factory for the `AmazonS3` client, and to retrieve it from the S3AFileSystem. There is a new factory API and a special interface S3AInternals to access a limited set of internal classes and operations. - Delegation token and auditing extensions. - Classes trying to integrate with the AWS SDK. All standard V1 credential providers listed in the option fs.s3a.aws.credentials.provider will be automatically remapped to their V2 equivalent. Other V1 Credential Providers are supported, but only if the V1 SDK is added back to the classpath. The SDK Signing plugin has changed; all v1 signers are incompatible. There is no support for the S3 "v2" signing algorithm. Finally, the aws-sdk-bundle JAR has been replaced by the shaded V2 equivalent, "bundle.jar", which is now exported by the hadoop-aws module. Consult the document aws_sdk_upgrade for the full details. Contributed by Ahmar Suhail + some bits by Steve Loughran	2023-09-11 14:30:25 +01:00
Szilard Nemeth	9342ecf6cc	HADOOP-18870. CURATOR-599 change broke functionality introduced in HADOOP-18139 and HADOOP-18709. Contributed by Ferenc Erdelyi	2023-09-06 21:32:36 -04:00
huhaiyang	2831c7ce26	HADOOP-18880. Add some rpc related metrics to Metrics.md (#6015 ) Contributed by Yanghai Hu. Reviewed-by: Inigo Goiri <inigoiri@apache.org> Signed-off-by: Shilun Fan <slfan1989@apache.org>	2023-09-05 17:34:05 +08:00
Steve Loughran	28c533a582	Revert "HADOOP-18860. Upgrade mockito version to 4.11.0 (#5977 )" This reverts commit `1046f9cf98`.	2023-08-31 14:54:53 +01:00
Anmol Asrani	1046f9cf98	HADOOP-18860. Upgrade mockito version to 4.11.0 (#5977 ) As well as the POM update, this patch moves to the (renamed) verify methods. Backporting mockito test changes may now require cherrypicking this patch, otherwise use the old method names. Contributed by Anmol Asrani	2023-08-29 12:12:27 +01:00
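
The HADOOP-18679 entry above describes a bulk delete contract: a page size, plus a delete call that accepts a collection no longer than that page size. The sketch below shows how a caller might split a longer list of paths into pages. It is only an illustration under stated assumptions: `PagedDeleter`, `RecordingDeleter` and `deleteInPages` are hypothetical names, paths are plain strings, and the "return the failures" shape is assumed rather than taken from the Hadoop API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

class BulkDeleteSketch {

    /** Hypothetical stand-in for the interface the commit describes; not the Hadoop type. */
    interface PagedDeleter {
        /** Maximum number of entries accepted by one bulkDelete() call. */
        int pageSize();

        /** Deletes the given paths; returns those that could not be deleted (assumed shape). */
        List<String> bulkDelete(Collection<String> paths);
    }

    /** Test double: records the size of every page it receives and reports no failures. */
    static class RecordingDeleter implements PagedDeleter {
        final int pageSize;
        final List<Integer> pagesSeen = new ArrayList<>();

        RecordingDeleter(int pageSize) {
            this.pageSize = pageSize;
        }

        @Override
        public int pageSize() {
            return pageSize;
        }

        @Override
        public List<String> bulkDelete(Collection<String> paths) {
            if (paths.size() > pageSize) {
                throw new IllegalArgumentException("page too large: " + paths.size());
            }
            pagesSeen.add(paths.size());
            return new ArrayList<>();
        }
    }

    /** Splits paths into pages no longer than pageSize() and deletes each page in turn. */
    static List<String> deleteInPages(PagedDeleter deleter, List<String> paths) {
        int page = deleter.pageSize();
        if (page < 1) {
            throw new IllegalArgumentException("page size must be >= 1");
        }
        List<String> failures = new ArrayList<>();
        for (int start = 0; start < paths.size(); start += page) {
            int end = Math.min(start + page, paths.size());
            failures.addAll(deleter.bulkDelete(paths.subList(start, end)));
        }
        return failures;
    }

    public static void main(String[] args) {
        RecordingDeleter deleter = new RecordingDeleter(2);
        deleteInPages(deleter, Arrays.asList("a", "b", "c", "d", "e"));
        System.out.println(deleter.pagesSeen); // prints [2, 2, 1]
    }
}
```

With a page size of 1 the same loop degenerates to one call per path, which matches the fallback behaviour the commit message describes for classic filesystems.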


6135 Commits
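
The HADOOP-18908 entry lists a five-step order for choosing the S3A region. A minimal sketch of that decision order follows, assuming the steps are evaluated exactly in the order listed. `RegionResolutionSketch` and `resolve` are hypothetical names, the endpoint regular expression is a simplification, and an empty `Optional` stands in for "hand off to the AWS SDK"; none of this is the connector's actual code.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegionResolutionSketch {

    /** Fallback named in step 4 of the commit message. */
    static final String DEFAULT_REGION = "us-east-2";

    /** Simplified match for s3.<region>.amazonaws.com; vpce endpoints are not handled. */
    private static final Pattern S3_ENDPOINT = Pattern.compile(
        "^(?:https?://)?s3[.]([a-z0-9-]+)[.]amazonaws[.]com(?:[:/].*)?$");

    /**
     * @param accessPointArn   access point ARN, or null if none is in use
     * @param configuredRegion value of fs.s3a.endpoint.region: null if unset, "" if set empty
     * @param endpoint         value of fs.s3a.endpoint, or null if unset
     * @return the chosen region, or empty to mean "let the AWS SDK resolve it"
     */
    static Optional<String> resolve(String accessPointArn,
                                    String configuredRegion,
                                    String endpoint) {
        // Step 1: an access point ARN carries its region as the fourth colon-separated field.
        if (accessPointArn != null) {
            String[] parts = accessPointArn.split(":");
            if (parts.length > 3 && !parts[3].isEmpty()) {
                return Optional.of(parts[3]);
            }
        }
        // Step 2: an explicitly configured, non-empty region is used.
        if (configuredRegion != null && !configuredRegion.isEmpty()) {
            return Optional.of(configuredRegion);
        }
        // Step 3: try to parse the region out of an s3.*.amazonaws.com endpoint.
        if (endpoint != null) {
            Matcher m = S3_ENDPOINT.matcher(endpoint);
            if (m.matches()) {
                return Optional.of(m.group(1));
            }
        }
        // Step 4: region unset and nothing inferred from the endpoint: use the default.
        if (configuredRegion == null) {
            return Optional.of(DEFAULT_REGION);
        }
        // Step 5: region set to the empty string: defer to the SDK's own resolution.
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(resolve(null, null, "s3.eu-central-1.amazonaws.com"));
        System.out.println(resolve(null, "", null));
    }
}
```

Returning `Optional.empty()` for the SDK hand-off keeps the "no decision made here" case distinct from every concrete region string, so a caller cannot mistake it for a region name.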