hadoop

Author	SHA1	Message	Date
Daniel Carl Jones	c30b2f0b8c	HADOOP-18304. Improve user-facing S3A committers documentation (#4478 ) Contributed by: Daniel Carl Jones	2022-10-19 13:08:27 +01:00
Steve Loughran	7a18ceb269	HADOOP-18476. Abfs and S3A FileContext bindings to close wrapped filesystems in finalizer (#4966 ) This is to try and close the underlying filesystems when the FileContext APIs are used. Without this, threads may be leaked Contributed by Steve Loughran	2022-10-18 15:28:55 +01:00
Hexiaoqiao	84c7fd909b	HADOOP-18497. Upgrade commons-text version to 1.10.0 to fix CVE-2022-42889. (#5037 ). Contributed by PJ Fanning.	2022-10-18 15:05:08 +01:00
slfan1989	2e3f91bdf5	HADOOP-18360. Update commons-csv from 1.0 to 1.9.0. (#4928 ). Contributed by fanshilun. Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>	2022-10-17 10:23:13 +05:30
PJ Fanning	96d4b9e6a7	HADOOP-18493: upgrade jackson-databind to 2.12.7.1 (#5011 ). Contributed by PJ Fanning. Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>	2022-10-17 10:04:21 +05:30
Steve Loughran	cd856b7195	HADOOP-17563. Upgrade BouncyCastle to 1.68 (#3980 ) (#5015 ) Addresses CVE-2020-15522 and CVE-2020-26939. This can break builds with older maven shade plugins or other code using asm.jar which is not aware of recent java bytecodes and/or multi-release JARs. fix: use a later version of asm.jar Contributed by PJ Fanning	2022-10-15 15:09:05 +01:00
ahmarsuhail	08760fc4c1	HADOOP-18481. AWS v2 SDK upgrade log to not warn about standard AWS Credential Providers. (#4973 ) The AWS SDKV2 upgrade log no longer warns about instantiation of the v1 SDK credential providers which are commonly used in s3a configurations: * com.amazonaws.auth.EnvironmentVariableCredentialsProvider * com.amazonaws.auth.EC2ContainerCredentialsProviderWrapper * com.amazonaws.auth.InstanceProfileCredentialsProvider When the hadoop-aws module moves to the v2 SDK, references to these credential providers will be rewritten to their v2 equivalents. Follow-on to HADOOP-18382. "Upgrade AWS SDK to V2 - Prerequisites" Contributed by Ahmar Suhail	2022-10-14 11:46:14 +01:00
ahmarsuhail	47c1c8eddc	HADOOP-18382. AWS SDK v2 upgrade prerequisites (#4698 ) This patch prepares the hadoop-aws module for a future migration to using the v2 AWS SDK (HADOOP-18073) That upgrade will be incompatible; this patch prepares for it: -marks some credential providers and other classes and methods as @deprecated. -updates site documentation -reduces the visibility of the s3 client; other than for testing, it is kept private to the S3AFileSystem class. -logs some warnings when deprecated APIs are used. The warning messages are printed only once per JVM's life. To disable them, set the log level of org.apache.hadoop.fs.s3a.SDKV2Upgrade to ERROR Contributed by Ahmar Suhail	2022-10-14 11:45:43 +01:00
monthonk	52eca61a3e	HADOOP-18292. Fix s3 select tests when running against unsupported storage class (#4489 ) Follow-on from HADOOP-12020. Contributed by Monthon Klongklaew	2022-10-13 13:37:35 +01:00
belugabehr	6253bf72b6	HADOOP-17779: Lock File System Creator Semaphore Uninterruptibly (#3158 ) Contributed by David Mollitor.	2022-10-11 13:07:42 +01:00
Xing Lin	760144f135	HDFS-16628. RBF: Correct target directory when move to trash for kerberos login user. (#4974 ) Signed-off-by: Akira Ajisaka <aajisaka@apache.org>	2022-10-11 16:14:39 +09:00
Ashutosh Gupta	6847ec0647	HADOOP-11245. Update NFS gateway to use Netty4 (#2832 ) (#4997 ) Reviewed-by: Tsz-Wo Nicholas Sze <szetszwo@apache.org> Co-authored-by: Wei-Chiu Chuang <weichiu@apache.org>	2022-10-11 05:27:43 +08:00
Mukund Thakur	77cb778a44	HADOOP-18460. checkIfVectoredIOStopped before populating the buffers (#4986 ) Contributed by Mukund Thakur	2022-10-10 11:18:22 +01:00
Steve Loughran	80525615e5	HADOOP-18480. Upgrade aws sdk to 1.12.316 (#4972 ) Contributed by Steve Loughran	2022-10-10 10:29:41 +01:00
Steve Loughran	e360e7620c	HADOOP-18468: Upgrade jettison to 1.5.1 to fix CVE-2022-40149 (#4937 ) Contributed by PJ Fanning	2022-10-10 10:05:39 +01:00
Xing Lin	7d7f7a9e9b	HDFS-16024. RBF: Rename data to the Trash should be based on src location (#4962 ) (cherry picked from commit `e18d806212`) Reviewed-by: Dinesh Chitlangia <dineshc@apache.org> Signed-off-by: Akira Ajisaka <aajisaka@apache.org>	2022-10-10 00:33:48 +09:00
Steve Loughran	61e1603750	HADOOP-18401. No ARM binaries in branch-3.3.x releases. (#4953 ) Fix the branch-3.3 docker image and create-release scripts to work on arm 64 and macbook m1 Contributed by Ayush Saxena and Steve Loughran	2022-10-07 15:58:51 +01:00
Steve Loughran	c70b8709cc	HADOOP-18442. Remove openstack support (#4855 ) The swift:// connector for openstack support has been removed. The hadoop-openstack jar remains, only now it is empty of code. This is to ensure that projects which declare the JAR a dependency will still have successful builds. Contributed by Steve Loughran	2022-10-07 12:03:08 +01:00
Steve Loughran	80781306dd	HADOOP-18469. Add secure XML parser factories to XMLUtils (#4940 ) Add to XMLUtils a set of methods to create secure XML Parsers/transformers, locking down DTD, schema, XXE exposure. Use these wherever XML parsers are created. Contributed by PJ Fanning	2022-10-07 10:47:55 +01:00
Ashutosh Gupta	725cd90712	MAPREDUCE-7370. Parallelize MultipleOutputs#close call (#4248 ). Contributed by Ashutosh Gupta. Reviewed-by: Akira Ajisaka <aajisaka@apache.org> Signed-off-by: Chris Nauroth <cnauroth@apache.org> (cherry picked from commit `062c50db6b`)	2022-10-06 23:14:38 +00:00
Ashutosh Gupta	1c3bf42ad0	YARN-11303. Upgrade jquery ui to 1.13.2 to mitigate CVE-2022-31160 (#4895 ) Contributed by Ashutosh Gupta	2022-10-05 12:09:11 +01:00
Mukund Thakur	0d772b353f	HADOOP-18463. Add an integration test to process data asynchronously during vectored read. (#4921 ) part of HADOOP-18103. Contributed by: Mukund Thakur	2022-09-28 15:38:41 -05:00
Mukund Thakur	bbe841e601	HADOOP-18347. S3A Vectored IO to use bounded thread pool. (#4918 ) part of HADOOP-18103. Also introducing a config fs.s3a.vectored.active.ranged.reads to configure the maximum number of range reads a single input stream can have active (downloading, or queued) to the central FileSystem instance's pool of queued operations. This stops a single stream overloading the shared thread pool. Contributed by: Mukund Thakur Conflicts: hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java	2022-09-28 15:34:31 -05:00
Mehakmeet Singh	e5a566c91f	HADOOP-18416. fix ITestS3AIOStatisticsContext test failure (#4931 ) Follow on to HADOOP-17461. Contributed by: Mehakmeet Singh	2022-09-28 14:17:56 +05:30
Ashutosh Gupta	dea018ef23	HDFS-16766. XML External Entity (XXE) attacks can occur while processing XML received from an untrusted source (#4886 ) Co-authored-by: Ashutosh Gupta <ashugpt@amazon.com> Signed-off-by: Akira Ajisaka <aajisaka@apache.org> (cherry picked from commit `d9f435f6ac`)	2022-09-27 15:44:58 +09:00
Ashutosh Gupta	51605f9dcc	HADOOP-18443. Upgrade snakeyaml to 1.32 (#4873 ) Co-authored-by: Ashutosh Gupta <ashugpt@amazon.com> Signed-off-by: Akira Ajisaka <aajisaka@apache.org>	2022-09-25 23:50:46 +09:00
Xing Lin	f1c1ad52c5	HADOOP-18444 Add Support for localized trash for ViewFileSystem in Trash.moveToAppropriateTrash (#4869 ) * HADOOP-18444 Add Support for localized trash for ViewFileSystem in Trash.moveToAppropriateTrash Signed-off-by: Xing Lin <xinglin@linkedin.com>	2022-09-23 11:06:23 -07:00
Steve Loughran	af0a6d7987	HADOOP-18456. NullPointerException in ObjectListingIterator. (#4909 ) This problem surfaced in impala integration tests IMPALA-11592. TestLocalCatalogRetries.test_fetch_metadata_retry fails in S3 build after the change HADOOP-17461. Add thread-level IOStatistics Context The actual GC race condition came with HADOOP-18091. S3A auditing leaks memory through ThreadLocal references The fix for this is, if our hypothesis is correct, in WeakReferenceMap.create() where a strong reference to the new value is kept in a local variable and referred to later so that the JVM will not GC it. Along with the fix, extra assertions ensure that if the problem is not fixed, applications will fail faster/more meaningfully. Contributed by Steve Loughran.	2022-09-23 09:57:49 +01:00
Kidd5368	ceec19e61a	HDFS-16776 Erasure Coding: The length of targets should be checked when DN gets a reconstruction task (#4901 ) (cherry picked from commit `9a29075f91`)	2022-09-23 12:29:39 +09:00
PJ Fanning	d66dea300e	HADOOP-18341: upgrade commons-configuration2 to 2.8.0 and commons-text to 1.9 (#4916 )	2022-09-22 10:44:27 +09:00
Ashutosh Gupta	683fa264ee	HADOOP-16769. LocalDirAllocator to provide diagnostics when file creation fails (#4896 ) The patch provides detailed diagnostics of file creation failure in LocalDirAllocator. Contributed by: Ashutosh Gupta	2022-09-21 11:54:47 +05:30
Ashutosh Gupta	3af155ceeb	HADOOP-18400. Fix file split duplicating records from a succeeding split when reading BZip2 text files (#4732 ) Co-authored-by: Ashutosh Gupta <ashugpt@amazon.com> Signed-off-by: Akira Ajisaka <aajisaka@apache.org> (cherry picked from commit `30c36ef25a`)	2022-09-19 13:45:47 +09:00
Steve Vaughan	357c83db94	HDFS-16686. GetJournalEditServlet fails to authorize valid Kerberos request (#4724 ) (#4794 )	2022-09-13 10:50:23 -07:00
Ashutosh Gupta	2532eca013	YARN-11241. Add uncleaning option for local app log file with log-aggregation enabled (#4703 ) Co-authored-by: Ashutosh Gupta <ashugpt@amazon.com> Signed-off-by: Akira Ajisaka <aajisaka@apache.org> (cherry picked from commit `65a027b112`)	2022-09-12 23:33:10 +09:00
Mukund Thakur	c9d6605a59	HADOOP-18439. Fix VectoredIO for LocalFileSystem when checksum is enabled. (#4862 ) part of HADOOP-18103. While merging the ranges in CheckSumFs, they are rounded up based on the value of checksum bytes size which leads to some ranges crossing the EOF thus they need to be fixed else it will cause EOFException during actual reads. Contributed By: Mukund Thakur	2022-09-09 11:17:32 -05:00
Sumangala Patki	2e4c5ca88f	HADOOP-17873. ABFS: Fix transient failures in ITestAbfsStreamStatistics and ITestAbfsRestOperationException (#3699 ) Successor for the reverted PR #3341, using the hadoop @VisibleForTesting attribute Contributed by Sumangala Patki	2022-09-06 11:34:55 +01:00
sreeb-msft	5f3bc4340e	HADOOP-18408. ABFS: ITestAbfsManifestCommitProtocol fails on nonHNS configuration (#4758 ) ITestAbfsManifestCommitProtocol to set requireRenameResilience to false for nonHNS configuration Contributed by Sree Bhattacharyya	2022-09-02 12:34:43 +01:00
monthonk	9dffa65021	HADOOP-18339. S3A storage class option only picked up when buffering writes to disk. (#4669 ) Follow-up to HADOOP-12020 Support configuration of different S3 storage classes; S3 storage class is now set when buffering to heap/bytebuffers, and when creating directory markers Contributed by Monthon Klongklaew	2022-09-01 18:15:48 +01:00
Steve Vaughan	3a6c8ff8bb	HDFS-16755. TestQJMWithFaults.testUnresolvableHostName() can fail due to unexpected host resolution (#4833 ) Use ".invalid" domain from IETF RFC 2606 to ensure that the host doesn't resolve. Contributed by Steve Vaughan Jr	2022-09-01 14:01:26 +01:00
Mukund Thakur	6cc5c92a89	HADOOP-18391. Improvements in VectoredReadUtils#readVectored() for direct buffers (#4787 ) part of HADOOP-18103. Contributed By: Mukund Thakur	2022-08-31 11:15:15 -05:00
Mukund Thakur	0a11ce2546	HADOOP-18407. Improve readVectored() api spec (#4760 ) part of HADOOP-18103. Contributed By: Mukund Thakur	2022-08-31 11:15:10 -05:00
Steve Loughran	f6c557d3b3	HADOOP-18410. S3AInputStream.unbuffer() does not release http connections (#4766 ) HADOOP-16202 "Enhance openFile()" added asynchronous draining of the remaining bytes of an S3 HTTP input stream for those operations (unbuffer, seek) where it could avoid blocking the active thread. This patch fixes the asynchronous stream draining to work and so return the stream back to the http pool. Without this, whenever unbuffer() or seek() was called on a stream and an asynchronous drain triggered, the connection was not returned; eventually the pool would be empty and subsequent S3 requests would fail with the message "Timeout waiting for connection from pool". The root cause was that even though the fields passed in to drain() were converted to references through the methods, in the lambda expression passed in to submit, they were direct references: `operation = client.submit(() -> drain(uri, streamStatistics, false, reason, remaining, object, wrappedStream)); /* here */` Those fields were only read during the async execution, at which point they would have been set to null (or even a subsequent read). A new SDKStreamDrainer class performs the draining; this is a Callable and can be submitted directly to the executor pool. The class is used in both the classic and prefetching s3a input streams. Also, calling unbuffer() switches the S3AInputStream from adaptive to random IO mode; that is, it is considered a cue that future IO will not be sequential, whole-file reads. Contributed by Steve Loughran.	2022-08-31 16:52:12 +01:00
Masatake Iwasaki	2a1701151c	HADOOP-18375. Fix failure of shelltest for hadoop_add_ldlibpath. (#4652 ) (cherry picked from commit `22835be63d`)	2022-08-30 10:44:11 +00:00
Steve Vaughan	833fc64558	HDFS-16684. Exclude the current JournalNode (#4786 ) The JournalNodeSyncer will include the local instance in syncing when using a bind host (e.g. 0.0.0.0). There is a mechanism that is supposed to exclude the local instance, but it doesn't recognize the meta-address as a local address. Running with bind addresses set to 0.0.0.0, the JournalNodeSyncer will log attempts to sync with itself as part of the normal syncing rotation. For an HA configuration running 3 JournalNodes, the "other" list used by the JournalNodeSyncer will include 3 proxies. Exclude bound local addresses, including the use of a wildcard address in the bound host configurations, while still allowing multiple instances on the same host. Allow sync attempts with unresolved addresses, so that sync attempts can drive resolution as servers become available. Backport. Signed-off-by: stack <stack@apache.org>	2022-08-28 11:15:04 -07:00
zhengchenyu	3edddaf9fc	HDFS-16732. [SBN READ] Avoid get location from observer when the block report is delayed (#4756 ) Signed-off-by: Erik Krogen <xkrogen@apache.org> (cherry picked from commit `231a4468cd`)	2022-08-25 10:41:04 -07:00
Simba Dzinamarira	0326b7e935	HADOOP-18406: Adds alignment context to call path for creating RPC proxy with multiple connections per user. Fixes #4748 Signed-off-by: Owen O'Malley <oomalley@linkedin.com>	2022-08-24 16:48:55 -07:00
xuzq	5b2d6684e6	HADOOP-13144. Enhancing IPC client throughput via multiple connections per user (#4542 )	2022-08-24 16:48:35 -07:00
Ayush Saxena	9890a4aea4	Revert "HADOOP-18417. Upgrade to M7 of surefire plugin (#4795 )" This reverts commit `1ff121041c`.	2022-08-25 03:53:34 +05:30
Steve Loughran	1168abc704	MAPREDUCE-7403. manifest-committer dynamic partitioning support. (#4728 ) Declares its compatibility with Spark's dynamic output partitioning by having the stream capability "mapreduce.job.committer.dynamic.partitioning". Requires a Spark release with SPARK-40034, which does the probing before deciding whether to accept or reject instantiation with dynamic partition overwrite set. This feature can be declared as supported by any other PathOutputCommitter implementations whose algorithm and destination filesystem are compatible. None of the S3A committers are compatible. The classic FileOutputCommitter is, but it does not declare itself as such out of our fear of changing that code. The Spark-side code will automatically infer compatibility if the created committer is of that class or a subclass. Contributed by Steve Loughran.	2022-08-24 11:19:05 +01:00
Steve Vaughan	98dd2b534f	HADOOP-18417. Upgrade to M7 of surefire plugin (#4795 ) This addresses an issue where the plugin's default classpath for executing tests fails to include org.junit.platform.launcher.core.LauncherFactory. Contributed by: Steve Vaughan Jr	2022-08-24 11:07:34 +01:00

25213 Commits