Contributed by Steve Loughran.
This moves the new API of HDFS-13616 into a interface which is implemented by
HDFS RPC filesystem client (not WebHDFS or any other connector)
This new interface, BatchListingOperations, is in hadoop-common,
so applications do not need to be compiled with HDFS on the classpath.
They must cast the FS into the interface.
instanceof can probe the client for having the new interface -the patch
also adds a new path capability to probe for this.
The FileSystem implementation is cut; tests updated as appropriate.
All new interfaces/classes/constants are marked as @unstable.
Change-Id: I5623c51f2c75804f58f915dd7e60cb2cffdac681
Contributed by Steve Loughran.
Not all stores do complete validation here; in particular the S3A
Connector does not: checking up the entire directory tree to see if a path matches
is a file significantly slows things down.
This check does take place in S3A mkdirs(), which walks backwards up the list of
parent paths until it finds a directory (success) or a file (failure).
In practice production applications invariably create destination directories
before writing 1+ file into them -restricting check purely to the mkdirs()
call deliver significant speed up while implicitly including the checks.
Change-Id: I2c9df748e92b5655232e7d888d896f1868806eb0
Followup to HADOOP-16885: Encryption zone file copy failure leaks a temp file
Moving the delete() call broke a mocking test, which slipped through the review process.
Contributed by Steve Loughran.
Change-Id: Ia13faf0f4fffb1c99ddd616d823e4f4d0b7b0cbb
AreContributed by Mukund Thakur.
This addresses an issue which surfaced with KMS encryption: the wrong
KMS key could be picked up in the S3 COPY operation, so
renamed files, while encrypted, would end up with the
bucket default key.
As well as adding tests in the new suite
ITestS3AEncryptionWithDefaultS3Settings,
AbstractSTestS3AHugeFiles has a new test method to
verify that the encryption settings also work
for large files copied via multipart operations.
Contributed by Sergei Poganshev.
Catches Exception instead of IOException in closeStream()
and so handle exceptions such as SdkClientException by
aborting the wrapped stream. This will increase resilience
to failures, as any which occuring during stream closure
will be caught. Furthermore, because the
underlying HTTP connection is aborted, rather than closed,
it will not be recycled to cause problems on subsequent
operations.
Contributed by Xiaoyu Yao.
Contains HDFS-14892. Close the output stream if createWrappedOutputStream() fails
Copying file through the FsShell command into an HDFS encryption zone where
the caller lacks permissions is leaks a temp ._COPYING file
and potentially a wrapped stream unclosed.
This is a convergence of a fix for S3 meeting an issue in HDFS.
S3: a HEAD against a file can cache a 404,
-you must not do any existence checks, including deleteOnExit(),
until the file is written.
Hence: HADOOP-16490, only register files for deletion the create worked
and the upload is not direct.
HDFS-14892. HDFS doesn't close wrapped streams when IOEs are raised on
create() failures. Which means that an entry is retained on the NN.
-you need to register a file with deleteOnExit() even if the file wasn't
created.
This patch:
* Moves the deleteOnExit to ensure the created file get deleted cleanly.
* Fixes HDFS to close the wrapped stream on failures.