Adds an Abortable.abort() interface for streams to enable output streams to be terminated; this
is implemented by the S3A connector's output stream. It allows for commit protocols
to be implemented which commit/abort work by writing to the final destination and
using the abort() call to cancel any write which is not intended to be committed.
Consult the specification document for information about the interface and its use.
Contributed by Jungtaek Lim and Steve Loughran.
Change-Id: I7fcc25e9dd8c10ce6c29f383529f3a2642a201ae
This defines what output streams and especially those which implement
Syncable are meant to do, and documents where implementations (HDFS; S3)
don't. With tests.
The file:// FileSystem now supports Syncable if an application calls
FileSystem.setWriteChecksum(false) before creating a file -checksumming
and Syncable.hsync() are incompatible.
Contributed by Steve Loughran.
Change-Id: I892d768de6268f4dd6f175b3fe3b7e5bcaa91194
The ABFS connector now implements listStatusIterator() with
asynchronous prefetching of the next page(s) of results.
For listing large directories this can provide tangible speedups.
If for any reason this needs to be disabled, set
fs.azure.enable.abfslistiterator to false.
Contributed by Bilahari T H.
Change-Id: Ic9a52b80df1d0ffed4c81beae92c136e2a12698c
When 403 is returned from an ABFS HTTP call, an AccessDeniedException is raised.
The exception text is unchanged, for any application string matching on the getMessage() contents.
Contributed by Steve Loughran.
Change-Id: I519d50ccd657968fd8ee72d132518099de901e15
* core-default.xml updated so that fs.s3a.committer.magic.enabled = true
* CommitConstants updated to match
* All tests which previously enabled the magic committer now rely on
default settings. This helps make sure it is enabled.
* Docs cover the switch, mention its enabled and explain why you may
want to disable it.
Note: this doesn't switch to using the committer -it just enables the path
rewriting magic which it depends on.
Contributed by Steve Loughran.
This needs SPARK-33739 in the matching spark branch in order to work
Contributed by Steve Loughran.
Change-Id: I4fe75b057159e35aacc072da3cb7343467c0c3f1
Caused by HADOOP-16830 and HADOOP-17271.
Fixes tests which fail intermittently based on configs and
in the case of the HugeFile tests, bulk runs with existing
FS instances meant statistic probes sometimes ended up probing those
of a previous FS.
Contributed by Steve Loughran.
Change-Id: I65ba3f44444e59d298df25ac5c8dc5a8781dfb7d
S3A connector to support the IOStatistics API of HADOOP-16830,
This is a major rework of the S3A Statistics collection to
* Embrace the IOStatistics APIs
* Move from direct references of S3AInstrumention statistics
collectors to interface/implementation classes in new packages.
* Ubiquitous support of IOStatistics, including:
S3AFileSystem, input and output streams, RemoteIterator instances
provided in list calls.
* Adoption of new statistic names from hadoop-common
Regarding statistic collection, as well as all existing
statistics, the connector now records min/max/mean durations
of HTTP GET and HEAD requests, and those of LIST operations.
Contributed by Steve Loughran.
Change-Id: I182d34b6ac39e017a8b4a221dad8e930882b39cf
Part of the HADOOP-16830 IOStatistics API feature.
If the source FileSystem's listing RemoteIterators
implement IOStatisticsSource, these are collected and served through
the IOStatisticsSource API. If they are not: getIOStatistics() returns
null.
Only the listing statistics are collected; FileSystem.globStatus() doesn't
provide any, so IO use there is not included in the aggregate results.
Contributed by Steve Loughran.
Change-Id: Iff1485297c2c7e181b54eaf1d2c4f80faeee7cfa