Caused by HADOOP-16830 and HADOOP-17271.
Fixes tests which fail intermittently based on configs and
in the case of the HugeFile tests, bulk runs with existing
FS instances meant statistic probes sometimes ended up probing those
of a previous FS.
Contributed by Steve Loughran.
Change-Id: I65ba3f44444e59d298df25ac5c8dc5a8781dfb7d
Caused by HADOOP-16380 and HADOOP-17271.
Fixes tests which fail intermittently based on configs and
in the case of the HugeFile tests, bulk runs with existing
FS instances meant statistic probes sometimes ended up probing those
of a previous FS.
Contributed by Steve Loughran.
This is the API and implementation classes of HADOOP-16830,
which allows callers to query IO object instances
(filesystems, streams, remote iterators, ...) and other classes
for statistics on their I/O Usage: operation count and min/max/mean
durations.
New Packages
org.apache.hadoop.fs.statistics.
Public API, including:
IOStatisticsSource
IOStatistics
IOStatisticsSnapshot (seralizable to java objects and json)
+helper classes for logging and integration
BufferedIOStatisticsInputStream
implements IOStatisticsSource and StreamCapabilities
BufferedIOStatisticsOutputStream
implements IOStatisticsSource, Syncable and StreamCapabilities
org.apache.hadoop.fs.statistics.impl
Implementation classes for internal use.
org.apache.hadoop.util.functional
functional programming support for RemoteIterators and
other operations which raise IOEs; all wrapper classes
implement and propagate IOStatisticsSource
Contributed by Steve Loughran.
This adds a semaphore to throttle the number of FileSystem instances which
can be created simultaneously, set in "fs.creation.parallel.count".
This is designed to reduce the impact of many threads in an application calling
FileSystem.get() on a filesystem which takes time to instantiate -for example
to an object where HTTPS connections are set up during initialization.
Many threads trying to do this may create spurious delays by conflicting
for access to synchronized blocks, when simply limiting the parallelism
diminishes the conflict, so speeds up all threads trying to access
the store.
The default value, 64, is larger than is likely to deliver any speedup -but
it does mean that there should be no adverse effects from the change.
If a service appears to be blocking on all threads initializing connections to
abfs, s3a or store, try a smaller (possibly significantly smaller) value.
Contributed by Steve Loughran.
See also [SPARK-33402]: Jobs launched in same second have duplicate MapReduce JobIDs
Contributed by Steve Loughran.
Change-Id: Iae65333cddc84692997aae5d902ad8765b45772a
This fixes the S3Guard/Directory Marker Retention integration so that when
fs.s3a.directory.marker.retention=keep, failures during multipart delete
are handled correctly, as are incremental deletes during
directory tree operations.
In both cases, when a directory marker with children is deleted from
S3, the directory entry in S3Guard is not deleted, because it is still
critical to representing the structure of the store.
Contributed by Steve Loughran.
Change-Id: I4ca133a23ea582cd42ec35dbf2dc85b286297d2f
This switches the SnappyCodec to use the java-snappy codec, rather than the native one.
To use the codec, snappy-java.jar (from org.xerial.snappy) needs to be on the classpath.
This comesin as an avro dependency, so it is already on the hadoop-common classpath,
as well as in hadoop-common/lib.
The version used is now managed in the hadoop-project POM; initially 1.1.7.7
Contributed by DB Tsai and Liang-Chi Hsieh
When a filesystem is closed, the FileSystem log will, at debug level,
log the method calling close/closeAll.
At trace level: the full calling stack.
Contributed by Karen Coppage.
This changes directory tree deletion so that only files are incrementally deleted
from S3Guard after the objects are deleted; the directories are left alone
until metadataStore.deleteSubtree(path) is invoke.
This avoids directory tombstones being added above files/child directories,
which stop the treewalk and delete phase from working.
Also:
* Callback to delete objects splits files and dirs so that
any problems deleting the dirs doesn't trigger s3guard updates
* New statistic to measure #of objects deleted, alongside request count.
* Callback listFilesAndEmptyDirectories renamed listFilesAndDirectoryMarkers
to clarify behavior.
* Test enhancements to replicate the failure and verify the fix
Contributed by Steve Loughran
Contributed by Steve Loughran.
* Fixes AbstractContractSeekTest test to use readFully
* Doesn't do this to AbstractContractUnbufferTest test as it changes the test too much.
Instead just notes in the error that this may be transient
The issue is that read(buffer) doesn't guarantee that the buffer is filled, only that it will
read up to a point, and that may be just be the amount of data left in the TCP packet.
readFully corrects for this, but using it in the unbuffer test runs the risk that what
is tested for in terms of unbuffering doesn't actually get validated.
This adds an option to disable "empty directory" marker deletion,
so avoid throttling and other scale problems.
This feature is *not* backwards compatible.
Consult the documentation and use with care.
Contributed by Steve Loughran.
Change-Id: I69a61e7584dc36e485d5e39ff25b1e3e559a1958
* Passing C/C++ standard flags -std is
not cross-compiler friendly as not all
compilers support all values.
* Thus, we need to make use of the
appropriate flags provided by CMake in
order to specify the C/C++ standards.
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>