Make S3APrefetchingInputStream.seek() completely lazy. Calls to seek() will not affect the current buffer nor interfere with prefetching, until read() is called.
This change allows various usage patterns to benefit from prefetching, e.g. when calling readFully(position, buffer) in a loop for contiguous positions, the intermediate internal calls to seek() will be no-ops and prefetching will perform as well as in a sequential read.
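A minimal sketch of the lazy seek idea, assuming an in-memory byte array in place of the remote object. The class LazySeekStream and its fields are hypothetical and are not the S3A implementation: seek() only records the target, and read() applies it if it differs from where the buffer already is.

```java
/** Hypothetical stream: seek() only records a target; read() applies it. */
class LazySeekStream {
  private final byte[] data;     // stands in for the remote object
  private long bufferPos = 0;    // where the current buffer really is
  private long nextReadPos = 0;  // position requested by the last seek()
  int repositionCount = 0;       // how often the buffer actually moved

  LazySeekStream(byte[] data) {
    this.data = data;
  }

  /** Lazy: remember the target and touch nothing else. */
  void seek(long pos) {
    if (pos < 0 || pos > data.length) {
      throw new IllegalArgumentException("Cannot seek to " + pos);
    }
    nextReadPos = pos;
  }

  /** Returns the next byte, or -1 at end of data. */
  int read() {
    if (nextReadPos != bufferPos) {
      // Only now does an earlier seek() cost anything.
      bufferPos = nextReadPos;
      repositionCount++;
    }
    if (bufferPos >= data.length) {
      return -1;
    }
    int b = data[(int) bufferPos] & 0xff;
    bufferPos++;
    nextReadPos = bufferPos;
    return b;
  }
}
```

In this sketch a seek() to the position the stream is already at never increments repositionCount, which mirrors the contiguous readFully() case.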
Contributed by Alessandro Passaro.
part of HADOOP-18103.
Also introducing a config fs.s3a.vectored.active.ranged.reads
to configure the maximum number of range reads a
single input stream can have active (downloading, or queued)
to the central FileSystem instance's pool of queued operations.
This stops a single stream overloading the shared thread pool.
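One way such a per-stream limit could be enforced is with a semaphore in front of the shared pool. This is a sketch under that assumption only; BoundedRangeReader and its methods are hypothetical names, not the S3A code.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Hypothetical per-stream limiter in front of a shared thread pool. */
class BoundedRangeReader {
  private final Executor sharedPool;
  private final Semaphore permits;

  BoundedRangeReader(Executor sharedPool, int maxActiveReads) {
    this.sharedPool = sharedPool;
    this.permits = new Semaphore(maxActiveReads);
  }

  /** Waits until this stream is below its limit, then queues the read. */
  <T> CompletableFuture<T> submit(Supplier<T> rangeRead) {
    permits.acquireUninterruptibly();
    try {
      return CompletableFuture
          .supplyAsync(rangeRead, sharedPool)
          // free the slot once the read has finished or failed
          .whenComplete((result, error) -> permits.release());
    } catch (RuntimeException e) {
      permits.release();  // submission was rejected, so nothing is queued
      throw e;
    }
  }

  int availablePermits() {
    return permits.availablePermits();
  }
}
```

Each stream would own one limiter while all streams share the same pool, so a stream with many ranges waits on its own permits instead of filling the shared queue.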
Contributed by: Mukund Thakur
This problem surfaced in impala integration tests
IMPALA-11592. TestLocalCatalogRetries.test_fetch_metadata_retry fails in S3 build
after the change
HADOOP-17461. Add thread-level IOStatistics Context
The actual GC race condition came with
HADOOP-18091. S3A auditing leaks memory through ThreadLocal references
The fix for this is, if our hypothesis is correct, in WeakReferenceMap.create()
where a strong reference to the new value is kept in a local variable
*and referred to later* so that the JVM will not GC it.
Along with the fix, extra assertions ensure that if the problem is not fixed,
applications will fail faster/more meaningfully.
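The pattern described above can be sketched as follows. WeakValueMap is a hypothetical, much simplified stand-in for WeakReferenceMap; only the shape of create() is the point here.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical, simplified map whose values are held by weak references. */
class WeakValueMap<K, V> {
  private final Map<K, WeakReference<V>> map = new ConcurrentHashMap<>();
  private final Function<K, V> factory;

  WeakValueMap(Function<K, V> factory) {
    this.factory = factory;
  }

  /** Create and store a value, returning the strongly held instance. */
  V create(K key) {
    V strongRef = factory.apply(key);  // strong reference in a local variable
    if (strongRef == null) {
      // fail fast and meaningfully rather than returning null later
      throw new IllegalStateException("factory returned null for " + key);
    }
    map.put(key, new WeakReference<>(strongRef));
    // Return the local rather than re-reading the weak reference: the local
    // is used after the put, so the value stays reachable throughout.
    return strongRef;
  }

  /** Return the live value, regenerating it if it was collected. */
  V get(K key) {
    WeakReference<V> ref = map.get(key);
    V value = (ref == null) ? null : ref.get();
    return (value != null) ? value : create(key);
  }
}
```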
Contributed by Steve Loughran.
part of HADOOP-18103.
While merging the ranges in CheckSumFs, they are rounded up based on the
value of checksum bytes size, which leads to some ranges crossing the EOF.
These ranges need to be fixed, otherwise they cause an EOFException during actual reads.
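A sketch of the arithmetic involved, assuming ranges are aligned to multiples of a bytes-per-checksum chunk size. The class, method and parameter names are hypothetical and the real CheckSumFs code may differ.

```java
/** Hypothetical helper: align a range to checksum chunks, then clamp to EOF. */
class ChecksumRangeAlign {
  /** Returns {offset, length} of the aligned range, never past fileLength. */
  static long[] alignAndClamp(long offset, int length,
      int bytesPerChecksum, long fileLength) {
    // round the start down to a chunk boundary
    long start = (offset / bytesPerChecksum) * bytesPerChecksum;
    // round the end up to a chunk boundary
    long end = ((offset + length + bytesPerChecksum - 1) / bytesPerChecksum)
        * bytesPerChecksum;
    // the rounded end may cross the EOF, so clamp it
    end = Math.min(end, fileLength);
    return new long[] {start, end - start};
  }
}
```

For a 1000 byte file with 512 byte chunks, a request for 200 bytes at offset 700 rounds to [512, 1024), which is then clamped to [512, 1000).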
Contributed By: Mukund Thakur
* This PR adds an option
use.platformToolsetVersion that
makes the build systems use
this platform toolset version.
* This also makes sure that
win-vs-upgrade.cmd does not get
executed when the
use.platformToolsetVersion
option is specified.
Avoid reconnecting to the old address after detecting that the address has been updated.
* Fix Checkstyle line length violation
* Keep ConnectionId as Immutable for map key
The ConnectionId is used as a key in the connections map, and updating the remoteId caused problems with the cleanup of connections when the removeMethod was used.
Instead of updating the address within the remoteId, use the removeMethod to cleanup references to the current identifier and then replace it with a new identifier using the updated address.
* Use final to protect immutable ConnectionId
Mark non-test fields as private and final, and add a missing accessor.
* Use a stable hashCode to allow safe IP addr changes
* Add test that updated address is used
Once the address has been updated, it should be used in future calls. Check to ensure that a second request succeeds and that it uses the existing updated address instead of having to re-resolve.
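The remove-and-replace approach to the immutable map key can be sketched as below. ConnId is a hypothetical stand-in for ConnectionId with only two fields, and its hashCode covering both fields is an assumption of the sketch, not a statement about the real class.

```java
import java.util.Map;
import java.util.Objects;

class ImmutableKeyDemo {
  /** Hypothetical connection identifier: every field is final. */
  static final class ConnId {
    private final String host;
    private final String address;

    ConnId(String host, String address) {
      this.host = host;
      this.address = address;
    }

    String getAddress() {
      return address;
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof ConnId)) {
        return false;
      }
      ConnId other = (ConnId) o;
      return host.equals(other.host) && address.equals(other.address);
    }

    @Override
    public int hashCode() {
      return Objects.hash(host, address);
    }
  }

  /** Swap the entry for oldId to a new key carrying the updated address. */
  static <V> ConnId updateAddress(Map<ConnId, V> connections,
      ConnId oldId, String newAddress) {
    // oldId was never mutated, so its hash still locates the entry
    V connection = connections.remove(oldId);
    ConnId newId = new ConnId(oldId.host, newAddress);
    if (connection != null) {
      connections.put(newId, connection);
    }
    return newId;
  }
}
```

Had the address been updated inside the existing key, the entry would sit in a bucket chosen by the old hash and a later remove() could miss it.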
Signed-off-by: Nick Dimiduk <ndimiduk@apache.org>
Signed-off-by: sokui
Signed-off-by: XanderZu
Signed-off-by: stack <stack@apache.org>
This is the preview release of the HADOOP-18028 S3A performance input stream.
It is still stabilizing, but ready to test.
Contains
HADOOP-18028. High performance S3A input stream (#4109)
Contributed by Bhalchandra Pandit.
HADOOP-18180. Replace use of twitter util-core with java futures (#4115)
Contributed by PJ Fanning.
HADOOP-18177. Document prefetching architecture. (#4205)
Contributed by Ahmar Suhail
HADOOP-18175. fix test failures with prefetching s3a input stream (#4212)
Contributed by Monthon Klongklaew
HADOOP-18231. S3A prefetching: fix failing tests & drain stream async. (#4386)
* adds in new test for prefetching input stream
* creates streamStats before opening stream
* updates numBlocks calculation method
* fixes ITestS3AOpenCost.testOpenFileLongerLength
* drains stream async
* fixes failing unit test
Contributed by Ahmar Suhail
HADOOP-18254. Disable S3A prefetching by default. (#4469)
Contributed by Ahmar Suhail
HADOOP-18190. Collect IOStatistics during S3A prefetching (#4458)
This adds iOStatisticsConnection to the S3PrefetchingInputStream class, with
new statistic names in StreamStatistics.
This stream is not (yet) IOStatisticsContext aware.
Contributed by Ahmar Suhail
HADOOP-18379 rebase feature/HADOOP-18028-s3a-prefetch to trunk
HADOOP-18187. Convert s3a prefetching to use JavaDoc for fields and enums.
HADOOP-18318. Update class names to be clear they belong to S3A prefetching
Contributed by Steve Loughran
The name of the option to enable/disable thread level statistics is
"fs.iostatistics.thread.level.enabled".
There is also an enabled() probe in IOStatisticsContext which can
be used to see if thread level statistics are active.
Contributed by Viraj Jasani
This adds a thread-level collector of IOStatistics, IOStatisticsContext,
which can be:
* Retrieved for a thread and cached for access from other
threads.
* reset() to record new statistics.
* Queried for live statistics through the
IOStatisticsSource.getIOStatistics() method.
* Queried for a statistics aggregator for use in instrumented
classes.
* Asked to create a serializable copy in snapshot()
The goal is to make it possible for applications with multiple
threads performing different work items simultaneously
to be able to collect statistics on the individual threads,
and so generate aggregate reports on the total work performed
for a specific job, query or similar unit of work.
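To illustrate the idea of a per-thread collector with reset() and snapshot(), here is a sketch built on ThreadLocal. ThreadStatsContext and its methods are hypothetical names; the real IOStatisticsContext API is richer and may be implemented differently.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical per-thread statistics collector. */
class ThreadStatsContext {
  private static final ThreadLocal<ThreadStatsContext> CURRENT =
      ThreadLocal.withInitial(ThreadStatsContext::new);

  private final Map<String, AtomicLong> counters = new ConcurrentHashMap<>();

  /** The context of the calling thread; it may be handed to other threads. */
  static ThreadStatsContext current() {
    return CURRENT.get();
  }

  void increment(String name, long delta) {
    counters.computeIfAbsent(name, k -> new AtomicLong()).addAndGet(delta);
  }

  /** An independent copy, safe to keep after reset(). */
  Map<String, Long> snapshot() {
    Map<String, Long> copy = new HashMap<>();
    counters.forEach((k, v) -> copy.put(k, v.get()));
    return copy;
  }

  /** Start recording a new set of statistics. */
  void reset() {
    counters.clear();
  }
}
```

A worker would call reset() before a unit of work and snapshot() afterwards, and the snapshots from all workers could then be summed into a report for the job.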
Some changes in IOStatistics-gathering classes are needed for
this feature
* Caching the active context's aggregator in the object's
constructor
* Updating it in close()
Slightly more work is needed in multithreaded code,
such as the S3A committers, which collect statistics across
all threads used in task and job commit operations.
Currently the IOStatisticsContext-aware classes are:
* The S3A input stream, output stream and list iterators.
* RawLocalFileSystem's input and output streams.
* The S3A committers.
* The TaskPool class in hadoop-common, which propagates
the active context into scheduled worker threads.
Collection of statistics in the IOStatisticsContext
is disabled process-wide by default until the feature
is considered stable.
To enable the collection, set the option
fs.iostatistics.thread.level.enabled
to "true" in core-site.xml.
Contributed by Mehakmeet Singh and Steve Loughran
Reduce the ExitUtil synchronized block scopes so System.exit
and Runtime.halt calls aren't within their boundaries,
so ExitUtil wrappers do not block each other.
Enlarged catches to all Throwables (not just Exceptions).
Contributed by Remi Catherinot
* HADOOP-18321. Fix when to read an additional record from a BZip2 text file split
Co-authored-by: Ashutosh Gupta <ashugpt@amazon.com> and Reviewed by Akira Ajisaka.
Update the dependencies of the LDAP libraries used for testing:
ldap-api.version = 2.0.0
apacheds.version = 2.0.0.AM26
Contributed by Colm O hEigeartaigh.
part of HADOOP-18103.
Handles memory fragmentation in the S3A vectored IO implementation by
allocating smaller buffers, each the size of a range the user requested,
filling them directly from the remote S3 stream and skipping the
undesired data in between ranges.
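A sketch of that approach, assuming a single stream covering several sorted, non-overlapping ranges. RangeFiller and its helpers are hypothetical; the S3A code is not reproduced here.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: one stream in, one exactly sized buffer per range out. */
class RangeFiller {
  /**
   * @param stream stream positioned at streamStart
   * @param ranges {offset, length} pairs, sorted by offset, non-overlapping
   */
  static List<ByteBuffer> fill(InputStream stream, long streamStart,
      long[][] ranges) {
    List<ByteBuffer> result = new ArrayList<>();
    long pos = streamStart;
    try {
      for (long[] range : ranges) {
        skipFully(stream, range[0] - pos);     // discard the gap data
        byte[] buffer = new byte[(int) range[1]];
        readFully(stream, buffer);             // fill the small buffer directly
        result.add(ByteBuffer.wrap(buffer));
        pos = range[0] + range[1];
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return result;
  }

  private static void skipFully(InputStream in, long n) throws IOException {
    while (n > 0) {
      long skipped = in.skip(n);
      if (skipped <= 0) {
        // skip() may return 0 without being at EOF, so probe with read()
        if (in.read() < 0) {
          throw new EOFException("EOF while skipping");
        }
        skipped = 1;
      }
      n -= skipped;
    }
  }

  private static void readFully(InputStream in, byte[] buf) throws IOException {
    int off = 0;
    while (off < buf.length) {
      int n = in.read(buf, off, buf.length - off);
      if (n < 0) {
        throw new EOFException("EOF while reading range");
      }
      off += n;
    }
  }
}
```

No buffer the size of the whole merged range is allocated; only the bytes each caller asked for are retained.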
This patch also aborts active vectored reads when the stream is
closed or unbuffer() is called.
Contributed By: Mukund Thakur
part of HADOOP-18103.
Required for the vectored IO feature. None of the current buffer pool
implementations is complete. ElasticByteBufferPool doesn't use
weak references and could lead to memory leaks, and
DirectBufferPool doesn't support caller preferences for direct
or heap buffers and only implements fixed length buffers.
Contributed By: Mukund Thakur
Part of HADOOP-18103.
Introducing fs.s3a.vectored.read.min.seek.size and fs.s3a.vectored.read.max.merged.size
to configure the min seek and max read during a vectored IO operation in the S3A connector.
These properties define how the ranges will be merged. To completely
disable merging, set fs.s3a.vectored.read.max.merged.size to 0.
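How two such limits can drive merging is sketched below. The rule used here (merge when the gap is below the minimum seek and the merged size stays within the maximum) is an assumption for illustration, and RangeMerger is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical merge of sorted {offset, length} ranges. */
class RangeMerger {
  static List<long[]> merge(List<long[]> sortedRanges,
      long minSeek, long maxMergedSize) {
    List<long[]> merged = new ArrayList<>();
    long[] current = null;
    for (long[] range : sortedRanges) {
      long end = range[0] + range[1];
      if (current != null) {
        long currentEnd = current[0] + current[1];
        long newEnd = Math.max(currentEnd, end);
        // cheaper to read through a small gap than to seek over it
        boolean closeEnough = range[0] - currentEnd < minSeek;
        // never build a merged read larger than the configured maximum
        boolean fits = newEnd - current[0] <= maxMergedSize;
        if (closeEnough && fits) {
          current[1] = newEnd - current[0];
          continue;
        }
      }
      current = new long[] {range[0], range[1]};  // copy, inputs stay untouched
      merged.add(current);
    }
    return merged;
  }
}
```

With a maximum merged size of 0 the size check can never pass for a non-empty range, so every range is returned unmerged.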
Contributed By: Mukund Thakur
part of HADOOP-18103.
Add support for a multiple-range vectored read API in PositionedReadable.
The default implementation iterates through the ranges to read each synchronously,
but the intent is that FSDataInputStream subclasses can provide more
efficient readers, especially in object store implementations.
Also added an implementation in S3A where smaller ranges are merged and
sliced byte buffers are returned to the readers. All the merged ranges are
fetched from S3 asynchronously.
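Returning a sliced view of a merged buffer can be sketched with the standard ByteBuffer API. BufferSlicer is a hypothetical helper, and it assumes the merged buffer holds the bytes starting at mergedOffset from index 0.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper: hand each reader a view of its own range. */
class BufferSlicer {
  static ByteBuffer sliceTo(ByteBuffer merged, long mergedOffset,
      long rangeOffset, int rangeLength) {
    int start = (int) (rangeOffset - mergedOffset);
    // duplicate() shares the bytes but has its own position and limit,
    // so the merged buffer itself is left untouched
    ByteBuffer view = merged.duplicate();
    view.position(start);
    view.limit(start + rangeLength);
    return view.slice();
  }
}
```

The slice shares memory with the merged buffer, so no bytes are copied when the data is handed to the reader.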
Contributed By: Owen O'Malley and Mukund Thakur
Regression caused by HDFS-16563; the hdfs exception text was changed, but because it was
a YARN test doing the check, Yetus didn't notice.
Contributed by zhengchenyu
Speed up the magic committer with key changes being
* Writes under __magic always retain directory markers
* File creation under __magic skips all overwrite checks,
including the LIST call intended to stop files being
created over dirs.
* mkdirs under __magic probes the path for existence
but does not look any further.
Extra parallelism in task and job commit directory scanning
Use of createFile and openFile with parameters which allow
HEAD checks to be skipped.
The committer can write the summary _SUCCESS file to the path
`fs.s3a.committer.summary.report.directory`, which can be in a
different file system/bucket if desired, using the job id as
the filename.
Also: HADOOP-15460. S3A FS to add `fs.s3a.create.performance`
Application code can set the createFile() option
fs.s3a.create.performance to true to disable the same
safety checks when writing under magic directories.
Use with care.
The createFile option prefix `fs.s3a.create.header.`
can be used to add custom headers to S3 objects when
created.
Contributed by Steve Loughran.
* Add the changelog and release notes
* add all jdiff XML files
* update the project pom with the new stable version
Change-Id: Iaea846c3e451bbd446b45de146845a48953d580d
This defines standard option and values for the
openFile() builder API for opening a file:
fs.option.openfile.read.policy
A list of the desired read policies, in preferred order.
Standard values are
adaptive, default, random, sequential, vector, whole-file
fs.option.openfile.length
How long the file is.
fs.option.openfile.split.start
start of a task's split
fs.option.openfile.split.end
end of a task's split
These can be used by filesystem connectors to optimize their
reading of the source file, including but not limited to
* skipping existence/length probes when opening a file
* choosing a policy for prefetching/caching data
The hadoop shell commands which read files all declare "whole-file"
and "sequential", as appropriate.
Contributed by Steve Loughran.
Change-Id: Ia290f79ea7973ce8713d4f90f1315b24d7a23da1
* HADOOP-18172: Change scope of InodeTree and its member methods to make them accessible from outside package.
Co-authored-by: Xing Lin <xinglin@linkedin.com>
Optimize the scan for s3 by performing a deep tree listing,
inferring directory counts from the paths returned.
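Inferring directories from a flat listing could look like the sketch below, which assumes "/"-separated object keys returned by one recursive listing. DeepListingCounter is a hypothetical name and this is not the S3A implementation.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: count directories implied by a flat key listing. */
class DeepListingCounter {
  static int countDirectories(String basePrefix, List<String> keys) {
    Set<String> dirs = new HashSet<>();
    for (String key : keys) {
      if (!key.startsWith(basePrefix)) {
        continue;
      }
      String relative = key.substring(basePrefix.length());
      // every "/" in the relative path ends the name of a parent directory
      int slash = relative.indexOf('/');
      while (slash >= 0) {
        if (slash > 0) {
          dirs.add(relative.substring(0, slash));
        }
        slash = relative.indexOf('/', slash + 1);
      }
    }
    return dirs.size();
  }
}
```

A key ending in "/" (a directory marker) contributes its directory the same way a file beneath it would.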
Contributed by Ahmar Suhail.
Change-Id: I26ffa8c6f65fd11c68a88d6e2243b0eac6ffd024
RBF proxy. There is a new configuration knob dfs.namenode.ip-proxy-users that configures
the list of users that can set their client ip address using the client context.
Fixes #4081
* New statistic names in StoreStatisticNames
(for joint use with s3a committers)
* Improvements to IOStatistics implementation classes
* RateLimiting wrapper to guava RateLimiter
* S3A committer Tasks moved over as TaskPool and
added support for RemoteIterator
* JsonSerialization.load() to fail fast if source does not exist
+ tests.
This commit is a prerequisite for the main MAPREDUCE-7341 Manifest Committer
patch.
Contributed by Steve Loughran
Change-Id: Ia92e2ab5083ac3d8d3d713a4d9cb3e9e0278f654
To get the new behavior, define fs.viewfs.trash.force-inside-mount-point to be true.
If the trash root for path p is in the same mount point as path p,
and one of:
* The mount point isn't at the top of the target fs.
* The resolved path of p is root (e.g. it is the fallback FS).
* The trash root isn't in user's target fs home directory.
get the corresponding viewFS path for the trash root and return it.
Otherwise, use <mnt>/.Trash/<user>.
Signed-off-by: Owen O'Malley <oomalley@linkedin.com>
Multi-object delete of more than 1000 keys is not supported by S3 and
fails with a MalformedXML error, so requests are now paged to
reduce the number of keys in a single request. The page size can be configured
using "fs.s3a.bulk.delete.page.size"
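The paging itself amounts to partitioning the key list, as in this sketch. DeletePager is a hypothetical helper and issuing the actual delete requests is left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: split keys into pages of at most pageSize entries. */
class DeletePager {
  static <T> List<List<T>> pages(List<T> keys, int pageSize) {
    if (pageSize <= 0) {
      throw new IllegalArgumentException("page size must be positive");
    }
    List<List<T>> result = new ArrayList<>();
    for (int i = 0; i < keys.size(); i += pageSize) {
      int end = Math.min(i + pageSize, keys.size());
      // copy the sublist so each page is independent of the source list
      result.add(new ArrayList<>(keys.subList(i, end)));
    }
    return result;
  }
}
```

With 2500 keys and a page size of 1000 this yields pages of 1000, 1000 and 500 keys, each small enough for a single request.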
Contributed By: Mukund Thakur
Adds a new map type WeakReferenceMap, which stores weak
references to values, and a WeakReferenceThreadMap subclass
to more closely resemble a thread local type, as it is a
map of threadId to value.
Construct it with a factory method and optional callback
for notification on loss and regeneration.
WeakReferenceThreadMap<WrappingAuditSpan> activeSpan =
new WeakReferenceThreadMap<>(
(k) -> getUnbondedSpan(),
this::noteSpanReferenceLost);
This is used in ActiveAuditManagerS3A for span tracking.
Relates to
* HADOOP-17511. Add an Audit plugin point for S3A
* HADOOP-18094. Disable S3A auditing by default.
Contributed by Steve Loughran.
Completely removes S3Guard support from the S3A codebase.
If the connector is configured to use any metastore other than
the null and local stores (i.e. DynamoDB is selected) the s3a client
will raise an exception and refuse to initialize.
This is to ensure that there is no mix of S3Guard enabled and disabled
deployments with the same configuration but different hadoop releases
-it must be turned off completely.
The "hadoop s3guard" command has been retained -but the supported
subcommands have been reduced to those which are not purely S3Guard
related: "bucket-info" and "uploads".
This is a major change in terms of the number of files
changed; before cherry picking subsequent s3a patches into
older releases, this patch will probably need backporting
first.
Goodbye S3Guard, your work is done. Time to die.
Contributed by Steve Loughran.
This switches the default behavior of S3A output streams
to warning that Syncable.hsync() or hflush() have been
called; it's not considered an error unless the defaults
are overridden.
This avoids breaking applications which call the APIs,
at the risk of people trying to use S3 as a safe store
of streamed data (HBase WALs, audit logs etc).
Contributed by Steve Loughran.
Add support for S3 Access Points. This provides extra security as it
ensures applications are not working with buckets belonging to third parties.
To bind a bucket to an access point, set the access point (ap) ARN,
which must be done for each specific bucket, using the pattern
fs.s3a.bucket.$BUCKET.accesspoint.arn = ARN
* Use the global/bucket option `fs.s3a.accesspoint.required` to
mandate that buckets must declare their access point.
* This is not compatible with S3Guard.
Consult the documentation for further details.
Contributed by Bogdan Stolojan
Addresses the problem of processes running out of memory when
there are many ABFS output streams queuing data to upload,
especially when the network upload bandwidth is less than the rate
data is generated.
ABFS Output streams now buffer their blocks of data to
"disk", "bytebuffer" or "array", as set in
"fs.azure.data.blocks.buffer"
When buffering via disk, the location for temporary storage
is set in "fs.azure.buffer.dir"
For safe scaling: use "disk" (default); for performance, when
confident that upload bandwidth will never be a bottleneck,
experiment with the memory options.
The number of blocks a single stream can have queued for uploading
is set in "fs.azure.block.upload.active.blocks".
The default value is 20.
Contributed by Mehakmeet Singh.
* HDFS-16129. Fixing the signature secret file misusage in HttpFS.
The signature secret file was not used in HttpFs.
- if the configuration did not contain the deprecated
httpfs.authentication.signature.secret.file option then it
used the random secret provider
- if both options (httpfs. and hadoop.http.) were set then
the HttpFSAuthenticationFilter could not read the file
because the file path was not substituted properly
!NOTE! behavioral change: the deprecated httpfs. configuration
values are overwritten with the hadoop.http. values.
The commit also contains a follow up change to the YARN-10814,
empty secret files will result in a random secret provider.
Co-authored-by: Tamas Domok <tdomok@cloudera.com>
This adds a new class org.apache.hadoop.util.Preconditions which is
* @Private/@Unstable
* Intended to allow us to move off Google Guava
* Is designed to be trivially backportable
(i.e contains no references to guava classes internally)
Please use this instead of the guava equivalents, where possible.
Contributed by: Ahmed Hussein
Change-Id: Ic392451bcfe7d446184b7c995734bcca8c07286e
This migrates the fs.s3a-server-side encryption configuration options
to a name which covers client-side encryption too.
fs.s3a.server-side-encryption-algorithm becomes fs.s3a.encryption.algorithm
fs.s3a.server-side-encryption.key becomes fs.s3a.encryption.key
The existing keys remain valid, simply deprecated and remapped
to the new values. If you want server-side encryption options
to be picked up regardless of hadoop versions, use
the old keys.
(the old key also works for CSE, though as no version of Hadoop
with CSE support has shipped without this remapping, it's less
relevant)
Contributed by: Mehakmeet Singh
* Router to support resolving monitored namenodes with DNS
* Style
* fix style and test failure
* Add test for NNHAServiceTarget const
* Resolve comments
* Fix test
* Comments and style
* Create a simple function to extract port
* Use LambdaTestUtils.intercept
* fix javadoc
* Trigger Build
* CredentialProviderFactory to detect and report on recursion.
* S3AFS to remove incompatible providers.
* Integration Test for this.
Contributed by Steve Loughran.
Currently, GzipCodec only supports BuiltInGzipDecompressor if native zlib is not loaded. So, without the Hadoop native codec installed, saving a SequenceFile using GzipCodec throws an exception like "SequenceFile doesn't work with GzipCodec without native-hadoop code!"
As with the other codecs which we migrated to prepared packages (lz4, snappy), it would be better to support GzipCodec generally, without the Hadoop native codec installed. Similar to BuiltInGzipDecompressor, we can use the Java Deflater to support BuiltInGzipCompressor.
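To show that gzip output needs nothing beyond the JDK, here is a sketch which writes a gzip member by hand with java.util.zip.Deflater: a fixed header, raw deflate data, then the CRC32 and length trailer. PureJavaGzip is a hypothetical class and not the BuiltInGzipCompressor code.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.CRC32;
import java.util.zip.Deflater;

/** Hypothetical one-shot gzip writer using only the JDK. */
class PureJavaGzip {
  static byte[] compress(byte[] input) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    // 10 byte header: magic 1f 8b, method 8 (deflate), no flags,
    // no mtime, no extra flags, OS 255 (unknown)
    out.write(new byte[] {0x1f, (byte) 0x8b, 8, 0, 0, 0, 0, 0, 0, (byte) 0xff},
        0, 10);

    // nowrap=true emits raw deflate data without the zlib wrapper
    Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
    deflater.setInput(input);
    deflater.finish();
    byte[] buffer = new byte[4096];
    while (!deflater.finished()) {
      int n = deflater.deflate(buffer);
      out.write(buffer, 0, n);
    }
    deflater.end();

    // trailer: CRC32 and length of the uncompressed data, little-endian
    CRC32 crc = new CRC32();
    crc.update(input);
    writeIntLittleEndian(out, (int) crc.getValue());
    writeIntLittleEndian(out, input.length);
    return out.toByteArray();
  }

  private static void writeIntLittleEndian(ByteArrayOutputStream out, int v) {
    out.write(v & 0xff);
    out.write((v >>> 8) & 0xff);
    out.write((v >>> 16) & 0xff);
    out.write((v >>> 24) & 0xff);
  }
}
```

The result can be read back with java.util.zip.GZIPInputStream, which makes a convenient round-trip check.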
Fixes the regression caused by HADOOP-17511 by moving where the
option fs.s3a.acl.default is read -doing it before the RequestFactory
is created.
Adds
* A unit test in TestRequestFactory to verify the ACLs are set
on all file write operations.
* A new ITestS3ACannedACLs test which verifies that ACLs really
do get all the way through.
* S3A Assumed Role delegation tokens to include the IAM permission
s3:PutObjectAcl in the generated role.
Contributed by Steve Loughran
This patch cuts down the size of directory trees used for
distcp contract tests against object stores, so making
them much faster against distant/slow stores.
On abfs, the test only runs with -Dscale (as was the case for s3a already),
and has the larger scale test timeout.
After every test case, the FileSystem IOStatistics are logged,
to provide information about what IO is taking place and
what its performance is.
There are some test cases which upload files of 1+ MiB; you can
increase the size of the upload in the option
"scale.test.distcp.file.size.kb"
Set it to zero and the large file tests are skipped.
Contributed by Steve Loughran.
This work
* Defines the behavior of FileSystem.copyFromLocal in filesystem.md
* Adds a high performance implementation of copyFromLocalOperation
for S3
* Adds a contract test for the operation: AbstractContractCopyFromLocalTest
* Implements the contract tests for Local and S3A FileSystems
Contributed by: Bogdan Stolojan
The rest endpoint would be unusable with an empty secret file
(throwing IllegalArgumentExceptions).
Any IO error would have resulted in the same fallback path.
Co-authored-by: Tamas Domok <tdomok@cloudera.com>
This (big!) patch adds support for client side encryption in AWS S3,
with keys managed by AWS-KMS.
Read the documentation in encryption.md very, very carefully before
use and consider it unstable.
S3-CSE is enabled in the existing configuration option
"fs.s3a.server-side-encryption-algorithm":
fs.s3a.server-side-encryption-algorithm=CSE-KMS
fs.s3a.server-side-encryption.key=<KMS_KEY_ID>
You cannot enable CSE and SSE in the same client, although
you can still enable a default SSE option in the S3 console.
* Filesystem list/get status operations subtract 16 bytes from the length
of all files >= 16 bytes long to compensate for the padding which CSE
adds.
* The SDK always warns about the specific algorithm chosen being
deprecated. It is critical to use this algorithm for ranged
GET requests to work (i.e. random IO). Ignore the warning.
* Unencrypted files CANNOT BE READ.
The entire bucket SHOULD be encrypted with S3-CSE.
* Uploading files may be a bit slower as blocks are now
written sequentially.
* The Multipart Upload API is disabled when S3-CSE is active.
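The length adjustment in the list above is simple arithmetic, sketched here. CseLength and its constant name are hypothetical.

```java
/** Hypothetical helper for the length reported for a CSE-encrypted object. */
class CseLength {
  static final int CSE_PADDING_LENGTH = 16;

  /** Subtract the padding from files at least as long as the padding. */
  static long unpaddedLength(long storedLength) {
    return storedLength >= CSE_PADDING_LENGTH
        ? storedLength - CSE_PADDING_LENGTH
        : storedLength;
  }
}
```

A stored object of 100 bytes is reported as 84 bytes, while one of 15 bytes is reported unchanged.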
Contributed by Mehakmeet Singh