This is the API and implementation classes of HADOOP-16830,
which allows callers to query IO object instances
(filesystems, streams, remote iterators, ...) and other classes
for statistics on their I/O Usage: operation count and min/max/mean
durations.
New Packages
org.apache.hadoop.fs.statistics.
Public API, including:
IOStatisticsSource
IOStatistics
IOStatisticsSnapshot (seralizable to java objects and json)
+helper classes for logging and integration
BufferedIOStatisticsInputStream
implements IOStatisticsSource and StreamCapabilities
BufferedIOStatisticsOutputStream
implements IOStatisticsSource, Syncable and StreamCapabilities
org.apache.hadoop.fs.statistics.impl
Implementation classes for internal use.
org.apache.hadoop.util.functional
functional programming support for RemoteIterators and
other operations which raise IOEs; all wrapper classes
implement and propagate IOStatisticsSource
Contributed by Steve Loughran.
This adds a semaphore to throttle the number of FileSystem instances which
can be created simultaneously, set in "fs.creation.parallel.count".
This is designed to reduce the impact of many threads in an application calling
FileSystem.get() on a filesystem which takes time to instantiate -for example
to an object where HTTPS connections are set up during initialization.
Many threads trying to do this may create spurious delays by conflicting
for access to synchronized blocks, when simply limiting the parallelism
diminishes the conflict, so speeds up all threads trying to access
the store.
The default value, 64, is larger than is likely to deliver any speedup -but
it does mean that there should be no adverse effects from the change.
If a service appears to be blocking on all threads initializing connections to
abfs, s3a or store, try a smaller (possibly significantly smaller) value.
Contributed by Steve Loughran.
See also [SPARK-33402]: Jobs launched in same second have duplicate MapReduce JobIDs
Contributed by Steve Loughran.
Change-Id: Iae65333cddc84692997aae5d902ad8765b45772a
This fixes the S3Guard/Directory Marker Retention integration so that when
fs.s3a.directory.marker.retention=keep, failures during multipart delete
are handled correctly, as are incremental deletes during
directory tree operations.
In both cases, when a directory marker with children is deleted from
S3, the directory entry in S3Guard is not deleted, because it is still
critical to representing the structure of the store.
Contributed by Steve Loughran.
Change-Id: I4ca133a23ea582cd42ec35dbf2dc85b286297d2f
This switches the SnappyCodec to use the java-snappy codec, rather than the native one.
To use the codec, snappy-java.jar (from org.xerial.snappy) needs to be on the classpath.
This comesin as an avro dependency, so it is already on the hadoop-common classpath,
as well as in hadoop-common/lib.
The version used is now managed in the hadoop-project POM; initially 1.1.7.7
Contributed by DB Tsai and Liang-Chi Hsieh
When a filesystem is closed, the FileSystem log will, at debug level,
log the method calling close/closeAll.
At trace level: the full calling stack.
Contributed by Karen Coppage.