Commit Graph

25080 Commits

Author SHA1 Message Date
Anmol Asrani
98aac2a15c
HADOOP-18457. ABFS: Support account level throttling (#5034)
This allows  abfs request throttling to be shared across all
abfs connections talking to containers belonging to the same abfs storage
account -as that is the level at which IO throttling is applied.

The option is enabled/disabled in the configuration option
"fs.azure.account.throttling.enabled";
The default is "true"

Contributed by Anmol Asrani
2022-11-30 14:24:28 +00:00
Simbarashe Dzinamarira
19468c1232 HDFS-16847: RBF: Prevents StateStoreFileSystemImpl from committing tmp file after encountering an IOException. (#5145) 2022-11-29 10:19:48 -08:00
HarshitGupta11
77dfd7ed24
HADOOP-18530. ChecksumFileSystem::readVectored might return byte buffers not positioned at 0 (#5168)
Contributed by Harshit Gupta
2022-11-29 14:53:18 +00:00
sreeb-msft
3f256fa209 HADOOP-18498. ABFS: Remove unwanted ? prefix from SAS Tokens (#5136)
This commit parses SAS Tokens and removes the unwanted prefix of '?' from them, if present.

At present, SAS Tokens are provided to the driver through customer implementations of the SASTokenProvider interface. The SAS token providers should not assume that the token will be the first query parameter in the URIs that communicate with the backend. However, it was observed that certain public interfaces provided by Storage to generate SAS can include the '?' as the first character of the SAS Token, which would ideally be the case when it is the first query parameter. Thus, tokens that contain this prefix will lead to an error in the driver due to a clash of query parameters.

To avoid failures for use of such SAS tokens, after receiving the SAS Token from the provider, the code checks for whether any ? prefix is present or not. If yes, it is removed before further usage of the token. This way, users would not have to manually remove the prefix before passing it on as a configuration.

Contributed by Sree Bhattacharya
2022-11-28 11:49:02 +00:00
Owen O'Malley
6aacb3557f HDFS-16844: RBF: Adds resilancy when StateStore gets exceptions. (#5138)
Allows the StateStore to stay up when there are errors reading the data.
2022-11-18 09:30:04 -08:00
Owen O'Malley
bc4d7b4774 HADOOP-18324. Interrupting RPC Client calls can lead to thread exhaustion. (#4527)
* Exactly 1 sending thread per an RPC connection.
* If the calling thread is interrupted before the socket write, it will be skipped instead of sending it anyways.
* If the calling thread is interrupted during the socket write, the write will finish.
* RPC requests will be written to the socket in the order received.
* Sending thread is only started by the receiving thread.
* The sending thread periodically checks the shouldCloseConnection flag.
2022-11-18 08:51:57 -08:00
Ashutosh Gupta
2cc1896d44
HADOOP-18484. Upgrade hsqldb to v2.7.1 to mitigate CVE-2022-41853 (#5114) 2022-11-16 11:42:43 +01:00
Mehakmeet Singh
5ca626e3e3
HADOOP-18528. Disable abfs prefetching by default (#5134)
Disables block prefetching on ABFS InputStreams, by setting
fs.azure.enable.readahead to false in core-default.xml and
the matching java constant.

This prevents
HADOOP-18521. ABFS ReadBufferManager buffer sharing across concurrent HTTP requests.

Once a fix for that is committed, this change can be reverted.

Contributed by Mehakmeet Singh.
2022-11-15 14:32:23 +00:00
Steve Loughran
cd517eddea
HADOOP-18517. ABFS: Add fs.azure.enable.readahead option to disable readahead (#5103)
* HADOOP-18517. ABFS: Add fs.azure.enable.readahead option to disable readahead

Adds new config option to turn off readahead
* also allows it to be passed in through openFile(),
* extends ITestAbfsReadWriteAndSeek to use the option, including one
  replicated test...that shows that turning it off is slower.

Important: this does not address the critical data corruption issue
HADOOP-18521. ABFS ReadBufferManager buffer sharing across concurrent HTTP requests

What is does do is provide a way to completely bypass the ReadBufferManager.
To mitigate the problem, either fs.azure.enable.readahead needs to be set to false,
or set "fs.azure.readaheadqueue.depth" to 0 -this still goes near the (broken)
ReadBufferManager code, but does't trigger the bug.

For safe reading of files through the ABFS connector, readahead MUST be disabled
or the followup fix to HADOOP-18521 applied

Contributed by Steve Loughran
2022-11-08 13:42:17 +00:00
PJ Fanning
c96f02b0e8
HADOOP-18512: upgrade woodstox-core to 5.4.0 for security fix (#5087). Contributed by PJ Fanning.
Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
2022-11-02 00:14:25 +05:30
Steve Loughran
9bf9560afe
HADOOP-18507. VectorIO FileRange type to support a "reference" field (#5076)
Contributed by Steve Loughran
2022-11-01 09:53:11 +00:00
sabertiger
88375fc354
HADOOP-18233. Possible race condition with TemporaryAWSCredentialsProvider (#5024)
This fixes a race condition with the TemporaryAWSCredentialProvider
one which has existed for a long time but which only surfaced
(usually in Spark) when the bucket existence probe was disabled
by setting fs.s3a.bucket.probe to 0 -a performance speedup
which was made the default in HADOOP-17454.

Contributed by Jimmy Wong.
2022-10-31 18:16:14 +00:00
Mehakmeet Singh
b5db5fa508
HADOOP-18499. S3A to support HTTPS web proxies (#5084)
The option "fs.s3a.proxy.ssl.enabled" controls
whether the s3a connects to a proxy over HTTP (default) or HTTPS.
Set to "true" to use HTTPS.

Contributed by Mehakmeet Singh
2022-10-27 20:18:46 +05:30
PJ Fanning
8a275b990a MAPREDUCE-7411: use secure XML parsers in mapreduce modules (#4980)
Lockdown of parsers in hadoop-mapreduce.

Follow-on to HADOOP-18469. Add secure XML parser factories to XMLUtils

Contributed by P J Fanning
2022-10-26 11:04:56 +01:00
Steve Loughran
65723c58bb
YARN-11330. use secure XML parsers (#4981)
Move construction of XML parsers in YARN
modules to using the locked-down parser factory
of HADOOP-18469.

One exception: GpuDeviceInformationParser still supports DTD resolution;
all other features are disabled.

Contributed by P J Fanning
2022-10-26 10:42:08 +01:00
Steve Loughran
a110b6e592
HADOOP-15983. Use jersey-json that is built to use jackson2 (#3988)
Moves from com.sun.jersey 1.19 to the artifact
com.github.pjfanning:jersey-json:1.20

This allows jackson 1 to be removed from the classpath.

Contains

* HADOOP-16908. Prune Jackson 1 from the codebase and restrict
   its usage for future
* HADOOP-18219. Fix shaded client test failure

These are needed for the HADOOP-15983 changes to build.

Contributed by PJ Fanning.
2022-10-21 11:34:09 +01:00
Steve Loughran
6dbdfda091
HDFS-16795. Use secure XML parsers (#4979)
Move construction of XML parsers in HDFS
modules to using the locked-down parser factory
of HADOOP-18469.

Contributed by P J Fanning
2022-10-20 17:55:26 +01:00
Daniel Carl Jones
1798677842
HADOOP-18465. Fix S3A SSE test skip when encryption is disabled (#4925)
Contributed by Daniel Carl Jones
2022-10-19 16:04:36 +01:00
Daniel Carl Jones
8e11fd6a81
HADOOP-18304. Improve user-facing S3A committers documentation (#4478)
Contributed by: Daniel Carl Jones
2022-10-19 13:10:18 +01:00
Steve Loughran
84404257c1
HADOOP-18476. Abfs and S3A FileContext bindings to close wrapped filesystems in finalizer (#4966)
This is to try and close the underlying filesystems when the FileContext APIs are used.
Without this, threads may be leaked

Contributed by Steve Loughran
2022-10-18 15:31:45 +01:00
Hexiaoqiao
193e042dd0
HADOOP-18497. Upgrade commons-text version to 1.10.0 to fix CVE-2022-42889. (#5037).
Contributed by PJ Fanning.
2022-10-18 15:31:30 +01:00
Steve Loughran
a05b01e71f
HADOOP-17563. Upgrade BouncyCastle to 1.68 (#3980) (#5015)
Addresses CVE-2020-15522 and CVE-2020-26939.

This can break builds with older maven shade plugins or
other code using asm.jar which is not aware of recent java bytecodes
and/or multi-release JARs. fix: use a later version of asm.jar

Contributed by PJ Fanning
2022-10-17 18:26:09 +01:00
slfan1989
09a4f5a4e7
HADOOP-18360. Update commons-csv from 1.0 to 1.9.0. (#4928). Contributed by fanshilun.
Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
2022-10-17 10:31:21 +05:30
PJ Fanning
12feeae983
HADOOP-18493: upgrade jackson-databind to 2.12.7.1 (3.3 branch) (#5018). Contributed by PJ Fanning.
Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
2022-10-17 10:04:58 +05:30
ahmarsuhail
860c79192a
HADOOP-18481. AWS v2 SDK upgrade log to not about standard AWS Credential Providers. (#4973)
The AWS SDKV2 upgrade log no longer warns about instantiation
of the v1 SDK credential providers which are commonly used in
s3a configurations:

* com.amazonaws.auth.EnvironmentVariableCredentialsProvider
* com.amazonaws.auth.EC2ContainerCredentialsProviderWrapper
* com.amazonaws.auth.InstanceProfileCredentialsProvider

When the hadoop-aws module moves to the v2 SDK, references to these
credential providers will be rewritten to their v2 equivalents.

Follow-on to HADOOP-18382. "Upgrade AWS SDK to V2 - Prerequisites"

Contributed by Ahmar Suhail
2022-10-14 11:47:50 +01:00
ahmarsuhail
c2bbff9ea1
HADOOP-18382. AWS SDK v2 upgrade prerequisites (#4698)
This patch prepares the hadoop-aws module for a future
migration to using the v2 AWS SDK (HADOOP-18073)

That upgrade will be incompatible; this patch prepares
for it:
-marks some credential providers and other
 classes and methods as @deprecated.
-updates site documentation
-reduces the visibility of the s3 client;
 other than for testing, it is kept private to
 the S3AFileSystem class.
-logs some warnings when deprecated APIs are used.

The warning messages are printed only once
per JVM's life. To disable them, set the
log level of org.apache.hadoop.fs.s3a.SDKV2Upgrade
to ERROR

Contributed by Ahmar Suhail
2022-10-14 11:47:34 +01:00
monthonk
7b607b8b12
HADOOP-18292. Fix s3 select tests when running against unsupported storage class (#4489)
Follow-on from HADOOP-12020.

Contributed by Monthon Klongklaew
2022-10-13 13:40:48 +01:00
Navink
802494e3a4 HDFS-13369. Fix for FSCK Report broken with RequestHedgingProxyProvider (#4917)
Contributed-by: navinko <nakumr@cloudera.com>
(cherry picked from commit 4891bf5049)
2022-10-12 17:16:58 +08:00
belugabehr
9b2839215b
HADOOP-17779: Lock File System Creator Semaphore Uninterruptibly (#3158)
Contributed by David Mollitor.
2022-10-11 13:25:40 +01:00
Ashutosh Gupta
0d067f69da
HADOOP-11245. Update NFS gateway to use Netty4 (#2832) (#4997)
Reviewed-by: Tsz-Wo Nicholas Sze <szetszwo@apache.org>

Co-authored-by: Wei-Chiu Chuang <weichiu@apache.org>
(cherry picked from commit 6847ec0647)
2022-10-10 14:46:31 -07:00
Mukund Thakur
2aa77a75f9
HADOOP-18460. checkIfVectoredIOStopped before populating the buffers (#4986)
Contributed by Mukund Thakur
2022-10-10 13:33:00 +01:00
Steve Loughran
cdbfc7abd3
HADOOP-18480. Upgrade aws sdk to 1.12.316 (#4972)
Contributed by Steve Loughran
2022-10-10 13:32:45 +01:00
Steve Loughran
cc5344e80f
HADOOP-18468: Upgrade jettison to 1.5.1 to fix CVE-2022-40149 (#4937)
Contributed by PJ Fanning
2022-10-10 10:07:57 +01:00
Steve Loughran
2a4d421c5c
HADOOP-18401. No ARM binaries in branch-3.3.x releases. (#4953)
Fix the branch-3.3 docker image and create-release scripts to work on arm 64 and macbook m1

Contributed by Ayush Saxena and Steve Loughran
2022-10-07 16:00:33 +01:00
Steve Loughran
8e8bc037aa
HADOOP-18442. Remove openstack support (#4855)
The swift:// connector for openstack support has been removed.
The hadoop-openstack jar remains, only now it is empty of code.
This is to ensure that projects which declare the JAR a dependency
will still have successful builds.

Contributed by Steve Loughran
2022-10-07 12:03:55 +01:00
Steve Loughran
8fb3a2bcbd
HADOOP-18469. Add secure XML parser factories to XMLUtils (#4940)
Add to XMLUtils a set of methods to create secure XML Parsers/transformers,
locking down DTD, schema, XXE exposure.

Use these wherever XML parsers are created.

Contributed by PJ Fanning
2022-10-07 11:03:31 +01:00
Ashutosh Gupta
7978c0a021
YARN-11303. Upgrade jquery ui to 1.13.2 to mitigate CVE-2022-31160 (#4895)
Contributed by Ashutosh Gupta
2022-10-05 12:10:24 +01:00
Mehakmeet Singh
bae34c4236 HADOOP-18416. fix ITestS3AIOStatisticsContext test failure (#4931)
Follow on to HADOOP-17461.

Contributed by: Mehakmeet Singh
2022-09-28 16:12:00 -05:00
Mukund Thakur
052d7bd227 HADOOP-18463. Add an integration test to process data asynchronously during vectored read. (#4921)
part of HADOOP-18103.

Contributed by: Mukund Thakur
2022-09-28 16:11:26 -05:00
Mukund Thakur
1bc63365a5 HADOOP-18347. S3A Vectored IO to use bounded thread pool. (#4918)
part of HADOOP-18103.

Also introducing a config fs.s3a.vectored.active.ranged.reads
to configure the maximum number of number of range reads a
single input stream can have active (downloading, or queued)
to the central FileSystem instance's pool of queued operations.
This stops a single stream overloading the shared thread pool.

Contributed by: Mukund Thakur
 Conflicts:
	hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java
	hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
2022-09-28 16:11:20 -05:00
Mukund Thakur
01761acbaa HADOOP-18470. Release hadoop 3.3.5 2022-09-27 11:30:18 -05:00
Ashutosh Gupta
dea018ef23
HDFS-16766. XML External Entity (XXE) attacks can occur while processing XML received from an untrusted source (#4886)
Co-authored-by: Ashutosh Gupta <ashugpt@amazon.com>
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
(cherry picked from commit d9f435f6ac)
2022-09-27 15:44:58 +09:00
Ashutosh Gupta
51605f9dcc
HADOOP-18443. Upgrade snakeyaml to 1.32 (#4873)
Co-authored-by: Ashutosh Gupta <ashugpt@amazon.com>
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
2022-09-25 23:50:46 +09:00
Xing Lin
f1c1ad52c5 HADOOP-18444 Add Support for localized trash for ViewFileSystem in Trash.moveToAppropriateTrash (#4869)
* HADOOP-18444 Add Support for localized trash for ViewFileSystem in Trash.moveToAppropriateTrash

Signed-off-by: Xing Lin <xinglin@linkedin.com>
2022-09-23 11:06:23 -07:00
Steve Loughran
af0a6d7987
HADOOP-18456. NullPointerException in ObjectListingIterator. (#4909)
This problem surfaced in impala integration tests
   IMPALA-11592. TestLocalCatalogRetries.test_fetch_metadata_retry fails in S3 build
after the change
  HADOOP-17461. Add thread-level IOStatistics Context
The actual GC race condition came with
 HADOOP-18091. S3A auditing leaks memory through ThreadLocal references

The fix for this is, if our hypothesis is correct, in WeakReferenceMap.create()
where a strong reference to the new value is kept in a local variable
*and referred to later* so that the JVM will not GC it.

Along with the fix, extra assertions ensure that if the problem is not fixed,
applications will fail faster/more meaningfully.

Contributed by Steve Loughran.
2022-09-23 09:57:49 +01:00
Kidd5368
ceec19e61a HDFS-16776 Erasure Coding: The length of targets should be checked when DN gets a reconstruction task (#4901)
(cherry picked from commit 9a29075f91)
2022-09-23 12:29:39 +09:00
PJ Fanning
d66dea300e
HADOOP-18341: upgrade commons-configuration2 to 2.8.0 and commons-text to 1.9 (#4916) 2022-09-22 10:44:27 +09:00
Ashutosh Gupta
683fa264ee
HADOOP-16769. LocalDirAllocator to provide diagnostics when file creation fails (#4896)
The patch provides detailed diagnostics of file creation failure in LocalDirAllocator.

Contributed by: Ashutosh Gupta
2022-09-21 11:54:47 +05:30
Ashutosh Gupta
3af155ceeb HADOOP-18400. Fix file split duplicating records from a succeeding split when reading BZip2 text files (#4732)
Co-authored-by: Ashutosh Gupta <ashugpt@amazon.com>
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
(cherry picked from commit 30c36ef25a)
2022-09-19 13:45:47 +09:00
Steve Vaughan
357c83db94
HDFS-16686. GetJournalEditServlet fails to authorize valid Kerberos request (#4724) (#4794) 2022-09-13 10:50:23 -07:00