Stops the abfs connector warning if openFile().withFileStatus()
is invoked with a FileStatus that is not an abfs VersionedFileStatus.
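For context, a hedged sketch of the call pattern this affects (the
path is hypothetical): withFileStatus() lets a caller reuse a status
from an earlier probe or listing, and with this change an ordinary
FileStatus no longer triggers the warning.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class OpenWithStatus {
      public static void main(String[] args) throws Exception {
        // hypothetical abfs path
        Path path = new Path(
            "abfs://container@account.dfs.core.windows.net/data/file.txt");
        FileSystem fs = path.getFileSystem(new Configuration());
        FileStatus status = fs.getFileStatus(path); // a plain FileStatus
        // Previously this logged a warning when the status was not an
        // abfs VersionedFileStatus.
        try (FSDataInputStream in = fs.openFile(path)
            .withFileStatus(status)
            .build()
            .get()) {
          System.out.println("first byte: " + in.read());
        }
      }
    }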
Contributed by Steve Loughran.
Change-Id: I85076b365eb30aaef2ed35139fa8714efd4d048e
S3A input stream support for the new fs.option.openfile settings.
As well as supporting the read policy option and values,
if the file length is declared in fs.option.openfile.length
then no HEAD request will be issued when opening a file.
This can cut a few tens of milliseconds off the operation.
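For illustration, a hedged sketch (path and length value are
hypothetical) of passing a known length so the S3A connector can skip
its HEAD probe:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class OpenWithKnownLength {
      public static void main(String[] args) throws Exception {
        Path path = new Path("s3a://bucket/logs/events.csv"); // hypothetical
        FileSystem fs = path.getFileSystem(new Configuration());
        long knownLength = 8192L; // e.g. recorded from an earlier listing
        // Declaring the length means no HEAD request on open.
        try (FSDataInputStream in = fs.openFile(path)
            .opt("fs.option.openfile.length", Long.toString(knownLength))
            .build()
            .get()) {
          byte[] buf = new byte[1024];
          System.out.println("first read returned " + in.read(buf) + " bytes");
        }
      }
    }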
The patch adds a new openfile parameter/FS configuration option
fs.s3a.input.async.drain.threshold (default: 16000).
It declares the number of bytes remaining in the HTTP input stream
above which any operation to read and discard the rest of the stream,
"draining", is executed asynchronously.
This asynchronous draining offers some performance benefit on seek-heavy
file IO.
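A minimal sketch of tuning the option through a Configuration (the
64 KB value is arbitrary, chosen only for illustration):

    import org.apache.hadoop.conf.Configuration;

    public class DrainThresholdExample {
      public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Streams with more than 64 KB left unread are drained in a
        // background thread; smaller remainders are drained inline.
        // 65536 is an arbitrary illustrative value.
        conf.setLong("fs.s3a.input.async.drain.threshold", 65536);
        // Pass conf to FileSystem.get()/Path.getFileSystem() as usual.
      }
    }

As the text above notes, the same key can also be supplied as an
openfile parameter on a per-open basis.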
Contributed by Steve Loughran.
Change-Id: I9b0626bbe635e9fd97ac0f463f5e7167e0111e39
These changes ensure that sequential files are opened with the
right read policy, and split start/end is passed in.
As well as offering opportunities for filesystem clients to
choose fetch/cache/seek policies, the settings ensure that
text files on an S3 bucket whose default policy is "random"
will still be processed efficiently.
This commit depends on the associated hadoop-common patch,
which must be committed first.
Contributed by Steve Loughran.
Change-Id: Ic6713fd752441cf42ebe8739d05c2293a5db9f94
This defines standard options and values for the
openFile() builder API for opening a file:

fs.option.openfile.read.policy
  A list of the desired read policies, in preferred order.
  Standard values are:
  adaptive, default, random, sequential, vector, whole-file

fs.option.openfile.length
  The length of the file.

fs.option.openfile.split.start
  The start of a task's split.

fs.option.openfile.split.end
  The end of a task's split.
These can be used by filesystem connectors to optimize their
reading of the source file, including but not limited to
* skipping existence/length probes when opening a file
* choosing a policy for prefetching/caching data
The hadoop shell commands which read files all declare "whole-file"
and "sequential", as appropriate.
Contributed by Steve Loughran.
Change-Id: Ia290f79ea7973ce8713d4f90f1315b24d7a23da1
* HADOOP-18172: Change scope of InodeTree and its member methods to make them accessible from outside the package.
Co-authored-by: Xing Lin <xinglin@linkedin.com>
Since April 2022/CVE-2022-24765, git refuses to work in directories
whose owner != the current user, unless explicitly told to trust them.
This patches the create-release script to trust the /build/source
dir mounted from the hosting OS, whose userid is inevitably different
from that of the account in the container running git.
Contributed by: Steve Loughran, Ayush Saxena and the new git error messages
* The source files for hdfs_tail use getopt for parsing the
  command line arguments.
* getopt is available only on Linux and thus isn't cross-platform.
* We need to replace getopt with boost::program_options to make
  these tools cross-platform.
* The source files for hdfs_stat use getopt for parsing the
  command line arguments.
* getopt is available only on Linux and thus isn't cross-platform.
* We need to replace getopt with boost::program_options to make
  this tool cross-platform.
* The source files for hdfs_setrep use getopt for parsing the
  command line arguments.
* getopt is available only on Linux and thus isn't cross-platform.
* We need to replace getopt with boost::program_options to make
  this tool cross-platform.