* HADOOP-18321.Fix when to read an additional record from a BZip2 text file split
Co-authored-by: Ashutosh Gupta <ashugpt@amazon.com> and Reviewed by Akira Ajisaka.
(cherry picked from commit a432925f74)
Fixes CVE-2018-7489 in shaded jackson.
+Add more commands in testing.md
to the CLI tests needed when qualifying
a release
Contributed by Steve Loughran
This adds a thread-level collector of IOStatistics, IOStatisticsContext,
which can be:
* Retrieved for a thread and cached for access from other
threads.
* reset() to record new statistics.
* Queried for live statistics through the
IOStatisticsSource.getIOStatistics() method.
* Queries for a statistics aggregator for use in instrumented
classes.
* Asked to create a serializable copy in snapshot()
The goal is to make it possible for applications with multiple
threads performing different work items simultaneously
to be able to collect statistics on the individual threads,
and so generate aggregate reports on the total work performed
for a specific job, query or similar unit of work.
Some changes in IOStatistics-gathering classes are needed for
this feature
* Caching the active context's aggregator in the object's
constructor
* Updating it in close()
Slightly more work is needed in multithreaded code,
such as the S3A committers, which collect statistics across
all threads used in task and job commit operations.
Currently the IOStatisticsContext-aware classes are:
* The S3A input stream, output stream and list iterators.
* RawLocalFileSystem's input and output streams.
* The S3A committers.
* The TaskPool class in hadoop-common, which propagates
the active context into scheduled worker threads.
Collection of statistics in the IOStatisticsContext
is disabled process-wide by default until the feature
is considered stable.
To enable the collection, set the option
fs.thread.level.iostatistics.enabled
to "true" in core-site.xml;
Contributed by Mehakmeet Singh and Steve Loughran
This downgrades jackson from the version switched to in
HADOOP-18033 (2.13.0), to Jackson 2.12.7.
This removes the dependency on javax.ws.rs-api,
so avoiding runtime problems with applications using
jersey-core v1 and/or jsr311-api.
The 2.12.7 release still contains the fix for CVE-2020-36518.
Contributed by PJ Fanning
Partial/Incomplete groups list can be returned in LDAP groups lookup.
Backported in #4550; minor tuning of parameters needed.
Contributed by larry mccay
Reduce the ExitUtil synchronized block scopes so System.exit
and Runtime.halt calls aren't within their boundaries,
so ExitUtil wrappers do not block each other.
Enlarged catches to all Throwables (not just Exceptions).
Contributed by Remi Catherinot
HADOOP-18313: AliyunOSSBlockOutputStream should not mark the temporary file for deletion. Contributed by wujinhu.
(cherry picked from commit 3ec4b932c1)
ABFS rename fails intermittently when the Storage-blob tracking
metadata is in an incomplete state. This surfaces as the error code
404 and an error message of "RenameDestinationParentPathNotFound"
To mitigate this issue, when a request fails with this response.
the ABFS client issues a HEAD call on the source file
and then retries the rename operation again
ABFS filesystem statistics track when this occurs with new counters
rename_recovery
metadata_incomplete_rename_failures
rename_path_attempts
This is very rare occurrence and appears to be triggered under certain
heavy load conditions, just as with HADOOP-18163.
Contributed by Mehakmeet Singh.
part of HADOOP-18103.
Handling memory fragmentation in S3A vectored IO implementation by
allocating smaller user range requested size buffers and directly
filling them from the remote S3 stream and skipping undesired
data in between ranges.
This patch also adds aborting active vectored reads when stream is
closed or unbuffer() is called.
Contributed By: Mukund Thakur
Conflicts:
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/RawLocalFileSystem.java
part of HADOOP-18103.
Required for vectored IO feature. None of current buffer pool
implementation is complete. ElasticByteBufferPool doesn't use
weak references and could lead to memory leak errors and
DirectBufferPool doesn't support caller preferences of direct
and heap buffers and has only fixed length buffer implementation.
Contributed By: Mukund Thakur
Part of HADOOP-18103.
Introducing fs.s3a.vectored.read.min.seek.size and fs.s3a.vectored.read.max.merged.size
to configure min seek and max read during a vectored IO operation in S3A connector.
These properties actually define how the ranges will be merged. To completely
disable merging set fs.s3a.max.readsize.vectored.read to 0.
Contributed By: Mukund Thakur
part of HADOOP-18103.
Add support for multiple ranged vectored read api in PositionedReadable.
The default iterates through the ranges to read each synchronously,
but the intent is that FSDataInputStream subclasses can make more
efficient readers especially in object stores implementation.
Also added implementation in S3A where smaller ranges are merged and
sliced byte buffers are returned to the readers. All the merged ranged are
fetched from S3 asynchronously.
Contributed By: Owen O'Malley and Mukund Thakur
Conflicts:
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/RawLocalFileSystem.java
pom.xml
Updating the hadoop version of branch-3.3 to 3.3.9-SNAPSHOT
pending agreement on what number its future release should take.
Using 3.3.9-SNAPSHOT puts space in for other incremental releases,
while avoiding creating JIRA release ordering and autocompletion
confusion the way adding a 3.3.10 or higher version would do.
Contributed by Steve Loughran
Bump cos_api-bundle to 5.6.69
All copies of httpclient, including shaded ones in libraries used
by the s3a, gs and cos cloud connectors, turn out to load their
TLD list from the same resource mozilla/public-suffix-list.txt
Updating the hadoop-cos dependency ensures that its version
of public-suffix-list.txt is up to date -and so the s3a connector
able to talk to s3 resources if the cos-api-bundle JAR is where
the resource is loaded from.
Contributed by André Fonseca
Speed up the magic committer with key changes being
* Writes under __magic always retain directory markers
* File creation under __magic skips all overwrite checks,
including the LIST call intended to stop files being
created over dirs.
* mkdirs under __magic probes the path for existence
but does not look any further.
Extra parallelism in task and job commit directory scanning
Use of createFile and openFile with parameters which all for
HEAD checks to be skipped.
The committer can write the summary _SUCCESS file to the path
`fs.s3a.committer.summary.report.directory`, which can be in a
different file system/bucket if desired, using the job id as
the filename.
Also: HADOOP-15460. S3A FS to add `fs.s3a.create.performance`
Application code can set the createFile() option
fs.s3a.create.performance to true to disable the same
safety checks when writing under magic directories.
Use with care.
The createFile option prefix `fs.s3a.create.header.`
can be used to add custom headers to S3 objects when
created.
Contributed by Steve Loughran.
Regression caused by HDFS-16563; the hdfs exception text was changed, but because it was
a YARN test doing the check, Yetus didn't notice.
Contributed by zhengchenyu
This is a followup to HADOOP-18275 and its upgrade of os-maven-plugin.version
When that change is merged in, this MUST follow it.
Contributed by Steve Loughran
Change-Id: I61d087041561eeb8c9c42b5b7d8f0bb63f296b15
Adds a new option fs.s3a.create.storage.class which can
be used to set the storage class for files created in AWS S3.
Consult the documentation for details and instructions on how
disable the relevant tests when testing against third-party
stores.
Contributed by Monthon Klongklaew
Change-Id: I8cdebadf294a89fde08d98729ad96f251d58411c