This has triggered an OOM in a process which was churning through s3a fs
instances; the increased memory footprint of IOStatistics amplified what
must have been a long-standing issue with FS instances being created
and not closed()
* Makes sure instrumentation is closed when the FS is closed.
* Uses a weak reference from metrics to instrumentation, so even
if the FS wasn't closed (see HADOOP-18478), this back reference
would not cause the S3AInstrumentation reference to be retained.
* If S3AFileSystem is configured to log at TRACE it will log the
calling stack of initialize(), so help identify where the
instance is being created. This should help track down
the cause of instance leakage.
Contributed by Steve Loughran.
This addresses HADOOP-18521, "ABFS ReadBufferManager buffer sharing
across concurrent HTTP requests" by not trying to cancel
in progress reads.
It supercedes HADOOP-18528, which disables the prefetching.
If that patch is applied *after* this one, prefetching
will be disabled.
As well as changing the default value in the code,
core-default.xml is updated to set
fs.azure.enable.readahead = true
As a result, if Configuration.get("fs.azure.enable.readahead")
returns a non-null value, then it can be inferred that
it was set in or core-default.xml (the fix is present)
or in core-site.xml (someone asked for it).
Note: this commit contains the followup commit:
That is needed to avoid race conditions in the test.
Contributed by Pranav Saxena.
* Exactly 1 sending thread per an RPC connection.
* If the calling thread is interrupted before the socket write, it will be skipped instead of sending it anyways.
* If the calling thread is interrupted during the socket write, the write will finish.
* RPC requests will be written to the socket in the order received.
* Sending thread is only started by the receiving thread.
* The sending thread periodically checks the shouldCloseConnection flag.
Disables block prefetching on ABFS InputStreams, by setting
fs.azure.enable.readahead to false in core-default.xml and
the matching java constant.
This prevents
HADOOP-18521. ABFS ReadBufferManager buffer sharing across concurrent HTTP requests.
Once a fix for that is committed, this change can be reverted.
Contributed by Mehakmeet Singh.
* HDFS-15383. RBF: Add support for router delegation token without watch (#2047)
Improving router's performance for delegation tokens related operations. It achieves the goal by removing watchers from router on tokens since based on our experience. The huge number of watches inside Zookeeper is degrading Zookeeper's performance pretty hard. The current limit is about 1.2-1.5 million.
* HADOOP-17835. Use CuratorCache implementation instead of PathChildrenCache / TreeCache (#3266)
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
Co-authored-by: lfengnan <lfengnan@uber.com>
Co-authored-by: Viraj Jasani <vjasani@apache.org>
Co-authored-by: Melissa You <myou@myou-mn1.linkedin.biz>
Move construction of XML parsers in YARN
modules to using the locked-down parser factory
of HADOOP-18469.
One exception: GpuDeviceInformationParser still supports DTD resolution;
all other features are disabled.
Contributed by P J Fanning
Moves from com.sun.jersey 1.19 to the artifact
com.github.pjfanning:jersey-json:1.20
This allows jackson 1 to be removed from the classpath.
Contains
* HADOOP-16908. Prune Jackson 1 from the codebase and restrict
its usage for future
* HADOOP-18219. Fix shaded client test failure
These are needed for the HADOOP-15983 changes to build.
Contributed by PJ Fanning.
The swift:// connector for openstack support has been removed.
The hadoop-openstack jar remains, only now it is empty of code.
This is to ensure that projects which declare the JAR a dependency
will still have successful builds.
Contributed by Steve Loughran
Add to XMLUtils a set of methods to create secure XML Parsers/transformers,
locking down DTD, schema, XXE exposure.
Use these wherever XML parsers are created.
Contributed by PJ Fanning
part of HADOOP-18103.
Also introducing a config fs.s3a.vectored.active.ranged.reads
to configure the maximum number of number of range reads a
single input stream can have active (downloading, or queued)
to the central FileSystem instance's pool of queued operations.
This stops a single stream overloading the shared thread pool.
Contributed by: Mukund Thakur
Conflicts:
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
This problem surfaced in impala integration tests
IMPALA-11592. TestLocalCatalogRetries.test_fetch_metadata_retry fails in S3 build
after the change
HADOOP-17461. Add thread-level IOStatistics Context
The actual GC race condition came with
HADOOP-18091. S3A auditing leaks memory through ThreadLocal references
The fix for this is, if our hypothesis is correct, in WeakReferenceMap.create()
where a strong reference to the new value is kept in a local variable
*and referred to later* so that the JVM will not GC it.
Along with the fix, extra assertions ensure that if the problem is not fixed,
applications will fail faster/more meaningfully.
Contributed by Steve Loughran.
part of HADOOP-18103.
While merging the ranges in CheckSumFs, they are rounded up based on the
value of checksum bytes size which leads to some ranges crossing the EOF
thus they need to be fixed else it will cause EOFException during actual reads.
Contributed By: Mukund Thakur
Use the existing DomainNameResolver to leverage the pluggable resolution framework. This provides a means to perform a reverse lookup if needed.
Update default implementation of DNSDomainNameResolver to protect against returning the IP address as a string from a cached value.
Co-authored-by: Steve Vaughan Jr <s_vaughan@apple.com>
Back port to branch-3.3, to avoid reconnecting to the old address after detecting that the address has been updated.
* Use a stable hashCode to allow safe IP addr changes
* Add test that updated address is used
Once the address has been updated, it will be used in future calls. Test verifies that a second request succeeds and that it uses the existing updated address instead of having to re-resolve.
Co-authored-by: Steve Vaughan Jr <s_vaughan@apple.com>
The name of the option to enable/disable thread level statistics is
"fs.iostatistics.thread.level.enabled";
There is also an enabled() probe in IOStatisticsContext which can
be used to see if the thread level statistics is active.
Contributed by Viraj Jasani
* HADOOP-18321.Fix when to read an additional record from a BZip2 text file split
Co-authored-by: Ashutosh Gupta <ashugpt@amazon.com> and Reviewed by Akira Ajisaka.
(cherry picked from commit a432925f74)
This adds a thread-level collector of IOStatistics, IOStatisticsContext,
which can be:
* Retrieved for a thread and cached for access from other
threads.
* reset() to record new statistics.
* Queried for live statistics through the
IOStatisticsSource.getIOStatistics() method.
* Queries for a statistics aggregator for use in instrumented
classes.
* Asked to create a serializable copy in snapshot()
The goal is to make it possible for applications with multiple
threads performing different work items simultaneously
to be able to collect statistics on the individual threads,
and so generate aggregate reports on the total work performed
for a specific job, query or similar unit of work.
Some changes in IOStatistics-gathering classes are needed for
this feature
* Caching the active context's aggregator in the object's
constructor
* Updating it in close()
Slightly more work is needed in multithreaded code,
such as the S3A committers, which collect statistics across
all threads used in task and job commit operations.
Currently the IOStatisticsContext-aware classes are:
* The S3A input stream, output stream and list iterators.
* RawLocalFileSystem's input and output streams.
* The S3A committers.
* The TaskPool class in hadoop-common, which propagates
the active context into scheduled worker threads.
Collection of statistics in the IOStatisticsContext
is disabled process-wide by default until the feature
is considered stable.
To enable the collection, set the option
fs.thread.level.iostatistics.enabled
to "true" in core-site.xml;
Contributed by Mehakmeet Singh and Steve Loughran
Partial/Incomplete groups list can be returned in LDAP groups lookup.
Backported in #4550; minor tuning of parameters needed.
Contributed by larry mccay
Reduce the ExitUtil synchronized block scopes so System.exit
and Runtime.halt calls aren't within their boundaries,
so ExitUtil wrappers do not block each other.
Enlarged catches to all Throwables (not just Exceptions).
Contributed by Remi Catherinot
part of HADOOP-18103.
Handling memory fragmentation in S3A vectored IO implementation by
allocating smaller user range requested size buffers and directly
filling them from the remote S3 stream and skipping undesired
data in between ranges.
This patch also adds aborting active vectored reads when stream is
closed or unbuffer() is called.
Contributed By: Mukund Thakur
Conflicts:
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/RawLocalFileSystem.java
part of HADOOP-18103.
Required for vectored IO feature. None of current buffer pool
implementation is complete. ElasticByteBufferPool doesn't use
weak references and could lead to memory leak errors and
DirectBufferPool doesn't support caller preferences of direct
and heap buffers and has only fixed length buffer implementation.
Contributed By: Mukund Thakur
Part of HADOOP-18103.
Introducing fs.s3a.vectored.read.min.seek.size and fs.s3a.vectored.read.max.merged.size
to configure min seek and max read during a vectored IO operation in S3A connector.
These properties actually define how the ranges will be merged. To completely
disable merging set fs.s3a.max.readsize.vectored.read to 0.
Contributed By: Mukund Thakur
part of HADOOP-18103.
Add support for multiple ranged vectored read api in PositionedReadable.
The default iterates through the ranges to read each synchronously,
but the intent is that FSDataInputStream subclasses can make more
efficient readers especially in object stores implementation.
Also added implementation in S3A where smaller ranges are merged and
sliced byte buffers are returned to the readers. All the merged ranged are
fetched from S3 asynchronously.
Contributed By: Owen O'Malley and Mukund Thakur
Conflicts:
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/RawLocalFileSystem.java
pom.xml
Updating the hadoop version of branch-3.3 to 3.3.9-SNAPSHOT
pending agreement on what number its future release should take.
Using 3.3.9-SNAPSHOT puts space in for other incremental releases,
while avoiding creating JIRA release ordering and autocompletion
confusion the way adding a 3.3.10 or higher version would do.
Contributed by Steve Loughran
Speed up the magic committer with key changes being
* Writes under __magic always retain directory markers
* File creation under __magic skips all overwrite checks,
including the LIST call intended to stop files being
created over dirs.
* mkdirs under __magic probes the path for existence
but does not look any further.
Extra parallelism in task and job commit directory scanning
Use of createFile and openFile with parameters which all for
HEAD checks to be skipped.
The committer can write the summary _SUCCESS file to the path
`fs.s3a.committer.summary.report.directory`, which can be in a
different file system/bucket if desired, using the job id as
the filename.
Also: HADOOP-15460. S3A FS to add `fs.s3a.create.performance`
Application code can set the createFile() option
fs.s3a.create.performance to true to disable the same
safety checks when writing under magic directories.
Use with care.
The createFile option prefix `fs.s3a.create.header.`
can be used to add custom headers to S3 objects when
created.
Contributed by Steve Loughran.
Regression caused by HDFS-16563; the hdfs exception text was changed, but because it was
a YARN test doing the check, Yetus didn't notice.
Contributed by zhengchenyu
* Add the changelog and release notes
* add all jdiff XML files
* update the project pom with the new stable version
Change-Id: Iaea846c3e451bbd446b45de146845a48953d580d
This defines standard option and values for the
openFile() builder API for opening a file:
fs.option.openfile.read.policy
A list of the desired read policy, in preferred order.
standard values are
adaptive, default, random, sequential, vector, whole-file
fs.option.openfile.length
How long the file is.
fs.option.openfile.split.start
start of a task's split
fs.option.openfile.split.end
end of a task's split
These can be used by filesystem connectors to optimize their
reading of the source file, including but not limited to
* skipping existence/length probes when opening a file
* choosing a policy for prefetching/caching data
The hadoop shell commands which read files all declare "whole-file"
and "sequential", as appropriate.
Contributed by Steve Loughran.
Change-Id: Ia290f79ea7973ce8713d4f90f1315b24d7a23da1
To get the new behavior, define fs.viewfs.trash.force-inside-mount-point to be true.
If the trash root for path p is in the same mount point as path p,
and one of:
* The mount point isn't at the top of the target fs.
* The resolved path of path is root (eg it is the fallback FS).
* The trash root isn't in user's target fs home directory.
get the corresponding viewFS path for the trash root and return it.
Otherwise, use <mnt>/.Trash/<user>.
Signed-off-by: Owen O'Malley <oomalley@linkedin.com>
(cherry picked from commit 8b8158f02d)
Co-authored-by: Xing Lin <xinglin@linkedin.com>
Optimize the scan for s3 by performing a deep tree listing,
inferring directory counts from the paths returned.
Contributed by Ahmar Suhail.
Change-Id: I26ffa8c6f65fd11c68a88d6e2243b0eac6ffd024
RBF proxy. There is a new configuration knob dfs.namenode.ip-proxy-users that configures
the list of users than can set their client ip address using the client context.
Fixes#4081
* New statistic names in StoreStatisticNames
(for joint use with s3a committers)
* Improvements to IOStatistics implementation classes
* RateLimiting wrapper to guava RateLimiter
* S3A committer Tasks moved over as TaskPool and
added support for RemoteIterator
* JsonSerialization.load() to fail fast if source does not exist
+ tests.
This commit is a prerequisite for the main MAPREDUCE-7341 Manifest Committer
patch.
Contributed by Steve Loughran
Change-Id: Ia92e2ab5083ac3d8d3d713a4d9cb3e9e0278f654