HADOOP-15332. Fix typos in hadoop-aws markdown docs. Contributed by Gabor Bota.
parent 2caba999bb
commit 7ce6b41509

@@ -28,7 +28,7 @@ The standard commit algorithms (the `FileOutputCommitter` and its v1 and v2 algo
rely on directory rename being an `O(1)` atomic operation: callers output their
work to temporary directories in the destination filesystem, then
rename these directories to the final destination as way of committing work.
This is the perfect solution for commiting work against any filesystem with
This is the perfect solution for committing work against any filesystem with
consistent listing operations and where the `FileSystem.rename()` command
is an atomic `O(1)` operation.

@@ -60,7 +60,7 @@ delayed completion of multi-part PUT operations
That is: tasks write all data as multipart uploads, *but delay the final
commit action until until the final, single job commit action.* Only that
data committed in the job commit action will be made visible; work from speculative
and failed tasks will not be instiantiated. As there is no rename, there is no
and failed tasks will not be instantiated. As there is no rename, there is no
delay while data is copied from a temporary directory to the final directory.
The duration of the commit will be the time needed to determine which commit operations
to construct, and to execute them.

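For orientation while reading these excerpts: which committer provides this delayed-completion behaviour is selected through the `fs.s3a.committer.name` option documented in the committer docs. A minimal, illustrative snippet only; the choice of the magic committer here is just an example:

```xml
<property>
  <name>fs.s3a.committer.name</name>
  <!-- Example only: other documented values include "directory", "partitioned" and the classic "file" committer. -->
  <value>magic</value>
</property>
```
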
@@ -109,7 +109,7 @@ This is traditionally implemented via a `FileSystem.rename()` call.

It is useful to differentiate between a *task-side commit*: an operation performed
in the task process after its work, and a *driver-side task commit*, in which
the Job driver perfoms the commit operation. Any task-side commit work will
the Job driver performs the commit operation. Any task-side commit work will
be performed across the cluster, and may take place off the critical part for
job execution. However, unless the commit protocol requires all tasks to await
a signal from the job driver, task-side commits cannot instantiate their output

@@ -241,7 +241,7 @@ def commitTask(fs, jobAttemptPath, taskAttemptPath, dest):
fs.rename(taskAttemptPath, taskCommittedPath)
```

On a genuine fileystem this is an `O(1)` directory rename.
On a genuine filesystem this is an `O(1)` directory rename.

On an object store with a mimiced rename, it is `O(data)` for the copy,
along with overhead for listing and deleting all files (For S3, that's

@@ -257,13 +257,13 @@ def abortTask(fs, jobAttemptPath, taskAttemptPath, dest):
fs.delete(taskAttemptPath, recursive=True)
```

On a genuine fileystem this is an `O(1)` operation. On an object store,
On a genuine filesystem this is an `O(1)` operation. On an object store,
proportional to the time to list and delete files, usually in batches.


### Job Commit

Merge all files/directories in all task commited paths into final destination path.
Merge all files/directories in all task committed paths into final destination path.
Optionally; create 0-byte `_SUCCESS` file in destination path.

```python

@@ -420,9 +420,9 @@ by renaming the files.
A a key difference is that the v1 algorithm commits a source directory to
via a directory rename, which is traditionally an `O(1)` operation.

In constrast, the v2 algorithm lists all direct children of a source directory
In contrast, the v2 algorithm lists all direct children of a source directory
and recursively calls `mergePath()` on them, ultimately renaming the individual
files. As such, the number of renames it performa equals the number of source
files. As such, the number of renames it performs equals the number of source
*files*, rather than the number of source *directories*; the number of directory
listings being `O(depth(src))` , where `depth(path)` is a function returning the
depth of directories under the given path.

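For readers cross-checking the two variants: which algorithm the classic committer uses is selected with the standard MapReduce option shown below. This is only an illustrative snippet; value `1` selects the directory-rename (v1) behaviour and `2` the file-by-file (v2) behaviour discussed above.

```xml
<property>
  <name>mapreduce.fileoutputcommitter.algorithm.version</name>
  <!-- 1 = v1 (directory rename in job commit); 2 = v2 (files renamed during task commit). -->
  <value>1</value>
</property>
```
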
@@ -431,7 +431,7 @@ On a normal filesystem, the v2 merge algorithm is potentially more expensive
than the v1 algorithm. However, as the merging only takes place in task commit,
it is potentially less of a bottleneck in the entire execution process.

On an objcct store, it is suboptimal not just from its expectation that `rename()`
On an object store, it is suboptimal not just from its expectation that `rename()`
is an `O(1)` operation, but from its expectation that a recursive tree walk is
an efficient way to enumerate and act on a tree of data. If the algorithm was
switched to using `FileSystem.listFiles(path, recursive)` for a single call to

@@ -548,7 +548,7 @@ the final destination FS, while `file://` can retain the default

### Task Setup

`Task.initialize()`: read in the configuration, instantate the `JobContextImpl`
`Task.initialize()`: read in the configuration, instantiate the `JobContextImpl`
and `TaskAttemptContextImpl` instances bonded to the current job & task.

### Task Ccommit

@@ -610,7 +610,7 @@ deleting the previous attempt's data is straightforward. However, for S3 committ
using Multipart Upload as the means of uploading uncommitted data, it is critical
to ensure that pending uploads are always aborted. This can be done by

* Making sure that all task-side failure branvches in `Task.done()` call `committer.abortTask()`.
* Making sure that all task-side failure branches in `Task.done()` call `committer.abortTask()`.
* Having job commit & abort cleaning up all pending multipart writes to the same directory
tree. That is: require that no other jobs are writing to the same tree, and so
list all pending operations and cancel them.

@@ -653,7 +653,7 @@ rather than relying on fields initiated from the context passed to the construct

#### AM: Job setup: `OutputCommitter.setupJob()`

This is initated in `org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.StartTransition`.
This is initiated in `org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.StartTransition`.
It is queued for asynchronous execution in `org.apache.hadoop.mapreduce.v2.app.MRAppMaster.startJobs()`,
which is invoked when the service is started. Thus: the job is set up when the
AM is started.

@@ -686,7 +686,7 @@ the job is considered not to have attempted to commit itself yet.


The presence of `COMMIT_SUCCESS` or `COMMIT_FAIL` are taken as evidence
that the previous job completed successfully or unsucessfully; the AM
that the previous job completed successfully or unsuccessfully; the AM
then completes with a success/failure error code, without attempting to rerun
the job.

@@ -871,16 +871,16 @@ base directory. As well as translating the write operation, it also supports
a `getFileStatus()` call on the original path, returning details on the file
at the final destination. This allows for committing applications to verify
the creation/existence/size of the written files (in contrast to the magic
committer covdered below).
committer covered below).

The FS targets Openstack Swift, though other object stores are supportable through
different backends.

This solution is innovative in that it appears to deliver the same semantics
(and hence failure modes) as the Spark Direct OutputCommitter, but which
does not need any changs in either Spark *or* the Hadoop committers. In contrast,
does not need any change in either Spark *or* the Hadoop committers. In contrast,
the committers proposed here combines changing the Hadoop MR committers for
ease of pluggability, and offers a new committer exclusivley for S3, one
ease of pluggability, and offers a new committer exclusively for S3, one
strongly dependent upon and tightly integrated with the S3A Filesystem.

The simplicity of the Stocator committer is something to appreciate.

@@ -922,7 +922,7 @@ The completion operation is apparently `O(1)`; presumably the PUT requests
have already uploaded the data to the server(s) which will eventually be
serving up the data for the final path. All that is needed to complete
the upload is to construct an object by linking together the files in
the server's local filesystem and udate an entry the index table of the
the server's local filesystem and update an entry the index table of the
object store.

In the S3A client, all PUT calls in the sequence and the final commit are

@@ -941,11 +941,11 @@ number of appealing features

The final point is not to be underestimated, es not even
a need for a consistency layer.
* Overall a simpler design.pecially given the need to
* Overall a simpler design. Especially given the need to
be resilient to the various failure modes which may arise.


The commiter writes task outputs to a temporary directory on the local FS.
The committer writes task outputs to a temporary directory on the local FS.
Task outputs are directed to the local FS by `getTaskAttemptPath` and `getWorkPath`.
On task commit, the committer enumerates files in the task attempt directory (ignoring hidden files).
Each file is uploaded to S3 using the [multi-part upload API](http://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview.html),

@@ -966,7 +966,7 @@ is a local `file://` reference.
within a consistent, cluster-wide filesystem. For Netflix, that is HDFS.
1. The Standard `FileOutputCommitter` (algorithm 1) is used to manage the commit/abort of these
files. That is: it copies only those lists of files to commit from successful tasks
into a (transient) job commmit directory.
into a (transient) job commit directory.
1. The S3 job committer reads the pending file list for every task committed
in HDFS, and completes those put requests.

@@ -1028,7 +1028,7 @@ complete at or near the same time, there may be a peak of bandwidth load
slowing down the upload.

Time to commit will be the same, and, given the Netflix committer has already
implemented the paralellization logic here, a time of `O(files/threads)`.
implemented the parallelization logic here, a time of `O(files/threads)`.

### Resilience

@@ -1105,7 +1105,7 @@ This is done by
an abort of all successfully read files.
1. List and abort all pending multipart uploads.

Because of action #2, action #1 is superflous. It is retained so as to leave
Because of action #2, action #1 is superfluous. It is retained so as to leave
open the option of making action #2 a configurable option -which would be
required to handle the use case of >1 partitioned commit running simultaneously/

@@ -1115,7 +1115,7 @@ Because the local data is managed with the v1 commit algorithm, the
second attempt of the job will recover all the outstanding commit data
of the first attempt; those tasks will not be rerun.

This also ensures that on a job abort, the invidual tasks' .pendingset
This also ensures that on a job abort, the individual tasks' .pendingset
files can be read and used to initiate the abort of those uploads.
That is: a recovered job can clean up the pending writes of the previous job

@@ -1129,7 +1129,7 @@ must be configured to automatically delete the pending request.
Those uploads already executed by a failed job commit will persist; those
yet to execute will remain outstanding.

The committer currently declares itself as non-recoverble, but that
The committer currently declares itself as non-recoverable, but that
may not actually hold, as the recovery process could be one of:

1. Enumerate all job commits from the .pendingset files (*:= Commits*).

@@ -1203,7 +1203,7 @@ that of the final job destination. When the job is committed, the pending
writes are instantiated.

With the addition of the Netflix Staging committer, the actual committer
code now shares common formats for the persistent metadadata and shared routines
code now shares common formats for the persistent metadata and shared routines
for parallel committing of work, including all the error handling based on
the Netflix experience.

@@ -1333,7 +1333,7 @@ during job and task committer initialization.

The job/task commit protocol is expected to handle this with the task
only committing work when the job driver tells it to. A network partition
should trigger the task committer's cancellation of the work (this is a protcol
should trigger the task committer's cancellation of the work (this is a protocol
above the committers).

#### Job Driver failure

@@ -1349,7 +1349,7 @@ when the job driver cleans up it will cancel pending writes under the directory.

#### Multiple jobs targeting the same destination directory

This leaves things in an inderminate state.
This leaves things in an indeterminate state.


#### Failure during task commit

@@ -1388,7 +1388,7 @@ Two options present themselves
and test that code as appropriate.

Fixing the calling code does seem to be the best strategy, as it allows the
failure to be explictly handled in the commit protocol, rather than hidden
failure to be explicitly handled in the commit protocol, rather than hidden
in the committer.::OpenFile

#### Preemption

@@ -1418,7 +1418,7 @@ with many millions of objects —rather than list all keys searching for those
with `/__magic/**/*.pending` in their name, work backwards from the active uploads to
the directories with the data.

We may also want to consider having a cleanup operationn in the S3 CLI to
We may also want to consider having a cleanup operation in the S3 CLI to
do the full tree scan and purge of pending items; give some statistics on
what was found. This will keep costs down and help us identify problems
related to cleanup.

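A related housekeeping control already exists on the S3A client side: outstanding multipart uploads older than a threshold can be purged when a filesystem instance is created. The snippet below is only an illustrative sketch; the age shown is an example value, and enabling the purge is unsafe if other jobs may still have uploads in flight against the same bucket.

```xml
<property>
  <name>fs.s3a.multipart.purge</name>
  <!-- Example: purge old incomplete uploads when an S3A filesystem instance is created. -->
  <value>true</value>
</property>
<property>
  <name>fs.s3a.multipart.purge.age</name>
  <!-- Example age, in seconds, after which an incomplete upload is treated as abandoned. -->
  <value>86400</value>
</property>
```
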
@@ -1538,7 +1538,7 @@ The S3A Committer version, would

In order to support the ubiquitous `FileOutputFormat` and subclasses,
S3A Committers will need somehow be accepted as a valid committer by the class,
a class which explicity expects the output committer to be `FileOutputCommitter`
a class which explicitly expects the output committer to be `FileOutputCommitter`

```java
public Path getDefaultWorkFile(TaskAttemptContext context,

@@ -1555,10 +1555,10 @@ Here are some options which have been considered, explored and discarded

1. Adding more of a factory mechanism to create `FileOutputCommitter` instances;
subclass this for S3A output and return it. The complexity of `FileOutputCommitter`
and of supporting more dynamic consturction makes this dangerous from an implementation
and of supporting more dynamic construction makes this dangerous from an implementation
and maintenance perspective.

1. Add a new commit algorithmm "3", which actually reads in the configured
1. Add a new commit algorithm "3", which actually reads in the configured
classname of a committer which it then instantiates and then relays the commit
operations, passing in context information. Ths new committer interface would
add methods for methods and attributes. This is viable, but does still change

@@ -1695,7 +1695,7 @@ marker implied the classic `FileOutputCommitter` had been used; if it could be r
then it provides some details on the commit operation which are then used
in assertions in the test suite.

It has since been extended to collet metrics and other values, and has proven
It has since been extended to collect metrics and other values, and has proven
equally useful in Spark integration testing.

## Integrating the Committers with Apache Spark

@@ -1727,8 +1727,8 @@ tree.

Alternatively, the fact that Spark tasks provide data to the job committer on their
completion means that a list of pending PUT commands could be built up, with the commit
operations being excuted by an S3A-specific implementation of the `FileCommitProtocol`.
As noted earlier, this may permit the reqirement for a consistent list operation
operations being executed by an S3A-specific implementation of the `FileCommitProtocol`.
As noted earlier, this may permit the requirement for a consistent list operation
to be bypassed. It would still be important to list what was being written, as
it is needed to aid aborting work in failed tasks, but the list of files
created by successful tasks could be passed directly from the task to committer,

@@ -1833,7 +1833,7 @@ quotas in local FS, keeping temp dirs on different mounted FS from root.
The intermediate `.pendingset` files are saved in HDFS under the directory in
`fs.s3a.committer.staging.tmp.path`; defaulting to `/tmp`. This data can
disclose the workflow (it contains the destination paths & amount of data
generated), and if deleted, breaks the job. If malicous code were to edit
generated), and if deleted, breaks the job. If malicious code were to edit
the file, by, for example, reordering the ordered etag list, the generated
data would be committed out of order, creating invalid files. As this is
the (usually transient) cluster FS, any user in the cluster has the potential

@@ -1848,7 +1848,7 @@ The directory defined by `fs.s3a.buffer.dir` is used to buffer blocks
before upload, unless the job is configured to buffer the blocks in memory.
This is as before: no incremental risk. As blocks are deleted from the filesystem
after upload, the amount of storage needed is determined by the data generation
bandwidth and the data upload bandwdith.
bandwidth and the data upload bandwidth.

No use is made of the cluster filesystem; there are no risks there.

@@ -1946,6 +1946,6 @@ which will made absolute relative to the current user. In filesystems in
which access under user's home directories are restricted, this final, absolute
path, will not be visible to untrusted accounts.

* Maybe: define the for valid characters in a text strings, and a regext for
* Maybe: define the for valid characters in a text strings, and a regex for
validating, e,g, `[a-zA-Z0-9 \.\,\(\) \-\+]+` and then validate any free text
JSON fields on load and save.

@@ -226,14 +226,14 @@ it is committed through the standard "v1" commit algorithm.
When the Job is committed, the Job Manager reads the lists of pending writes from its
HDFS Job destination directory and completes those uploads.

Cancelling a task is straighforward: the local directory is deleted with
Cancelling a task is straightforward: the local directory is deleted with
its staged data. Cancelling a job is achieved by reading in the lists of
pending writes from the HDFS job attempt directory, and aborting those
uploads. For extra safety, all outstanding multipart writes to the destination directory
are aborted.

The staging committer comes in two slightly different forms, with slightly
diffrent conflict resolution policies:
different conflict resolution policies:


* **Directory**: the entire directory tree of data is written or overwritten,

@@ -278,7 +278,7 @@ any with the same name. Reliable use requires unique names for generated files,
which the committers generate
by default.

The difference between the two staging ommitters are as follows:
The difference between the two staging committers are as follows:

The Directory Committer uses the entire directory tree for conflict resolution.
If any file exists at the destination it will fail in job setup; if the resolution

@@ -301,7 +301,7 @@ It's intended for use in Apache Spark Dataset operations, rather
than Hadoop's original MapReduce engine, and only in jobs
where adding new data to an existing dataset is the desired goal.

Preequisites for successful work
Prerequisites for successful work

1. The output is written into partitions via `PARTITIONED BY` or `partitionedBy()`
instructions.

@@ -401,7 +401,7 @@ Generated files are initially written to a local directory underneath one of the
directories listed in `fs.s3a.buffer.dir`.


The staging commmitter needs a path in the cluster filesystem
The staging committer needs a path in the cluster filesystem
(e.g. HDFS). This must be declared in `fs.s3a.committer.staging.tmp.path`.

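Purely as an illustration of the declaration mentioned above (the path shown is an example, not a statement of the default):

```xml
<property>
  <name>fs.s3a.committer.staging.tmp.path</name>
  <!-- Example cluster-filesystem directory for the staging committer's intermediate data. -->
  <value>/tmp/staging</value>
</property>
```
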
Temporary files are saved in HDFS (or other cluster filesystem) under the path

@@ -460,7 +460,7 @@ What the partitioned committer does is, where the tooling permits, allows caller
to add data to an existing partitioned layout*.

More specifically, it does this by having a conflict resolution options which
only act on invididual partitions, rather than across the entire output tree.
only act on individual partitions, rather than across the entire output tree.

| `fs.s3a.committer.staging.conflict-mode` | Meaning |
| -----------------------------------------|---------|

@@ -508,7 +508,7 @@ documentation to see if it is consistent, hence compatible "out of the box".
<property>
<name>fs.s3a.committer.magic.enabled</name>
<description>
Enable support in the filesystem for the S3 "Magic" committter.
Enable support in the filesystem for the S3 "Magic" committer.
</description>
<value>true</value>
</property>

@@ -706,7 +706,7 @@ This message should not appear through the committer itself —it will
fail with the error message in the previous section, but may arise
if other applications are attempting to create files under the path `/__magic/`.

Make sure the filesytem meets the requirements of the magic committer
Make sure the filesystem meets the requirements of the magic committer
(a consistent S3A filesystem through S3Guard or the S3 service itself),
and set the `fs.s3a.committer.magic.enabled` flag to indicate that magic file
writes are supported.

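As a quick reference for the two settings involved, a minimal sketch (values are illustrative; the filesystem flag only enables support, while `fs.s3a.committer.name` selects the committer actually used):

```xml
<property>
  <name>fs.s3a.committer.magic.enabled</name>
  <!-- Allow "magic" paths such as __magic/ to be intercepted by the S3A filesystem. -->
  <value>true</value>
</property>
<property>
  <name>fs.s3a.committer.name</name>
  <!-- Select the magic committer; "directory" and "partitioned" are the staging alternatives. -->
  <value>magic</value>
</property>
```
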
@@ -741,7 +741,7 @@ at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.
While that will not make the problem go away, it will at least make
the failure happen at the start of a job.

(Setting this option will not interfer with the Staging Committers' use of HDFS,
(Setting this option will not interfere with the Staging Committers' use of HDFS,
as it explicitly sets the algorithm to "2" for that part of its work).

The other way to check which committer to use is to examine the `_SUCCESS` file.

@@ -23,7 +23,7 @@
The S3A filesystem client supports Amazon S3's Server Side Encryption
for at-rest data encryption.
You should to read up on the [AWS documentation](https://docs.aws.amazon.com/AmazonS3/latest/dev/serv-side-encryption.html)
for S3 Server Side Encryption for up to date information on the encryption mechansims.
for S3 Server Side Encryption for up to date information on the encryption mechanisms.


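For readers skimming these fragments: the encryption mechanism the client requests is driven by a single option. A minimal, illustrative snippet (`AES256` here stands for SSE-S3; SSE-KMS and SSE-C are the other documented choices):

```xml
<property>
  <name>fs.s3a.server-side-encryption-algorithm</name>
  <!-- Example: request SSE-S3 ("AES256") encryption for data written by this client. -->
  <value>AES256</value>
</property>
```
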
@@ -135,7 +135,7 @@ it blank to use the default configured for that region.
the right to use it, uses it to encrypt the object-specific key.


When downloading SSE-KMS encrypte data, the sequence is as follows
When downloading SSE-KMS encrypted data, the sequence is as follows

1. The S3A client issues an HTTP GET request to read the data.
1. S3 sees that the data was encrypted with SSE-KMS, and looks up the specific key in the KMS service

@@ -413,8 +413,8 @@ a KMS key hosted in the AWS-KMS service in the same region.

```

Again the approprate bucket policy can be used to guarantee that all callers
will use SSE-KMS; they can even mandata the name of the key used to encrypt
Again the appropriate bucket policy can be used to guarantee that all callers
will use SSE-KMS; they can even mandate the name of the key used to encrypt
the data, so guaranteeing that access to thee data can be read by everyone
granted access to that key, and nobody without access to it.

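For completeness, the client-side counterpart of such a bucket policy is usually just these two options; a hedged sketch only, with a placeholder key ARN you would replace with the key your policy mandates:

```xml
<property>
  <name>fs.s3a.server-side-encryption-algorithm</name>
  <value>SSE-KMS</value>
</property>
<property>
  <name>fs.s3a.server-side-encryption.key</name>
  <!-- Placeholder ARN: supply the ID/ARN of the KMS key mandated by the bucket policy. -->
  <value>arn:aws:kms:us-west-2:123456789012:key/example-key-id</value>
</property>
```
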
@@ -638,7 +638,7 @@ over that of the `hadoop.security` list (i.e. they are prepended to the common l
</property>
```

This was added to suppport binding different credential providers on a per
This was added to support binding different credential providers on a per
bucket basis, without adding alternative secrets in the credential list.
However, some applications (e.g Hive) prevent the list of credential providers
from being dynamically updated by users. As per-bucket secrets are now supported,

@@ -938,7 +938,7 @@ The S3A client makes a best-effort attempt at recovering from network failures;
this section covers the details of what it does.

The S3A divides exceptions returned by the AWS SDK into different categories,
and chooses a differnt retry policy based on their type and whether or
and chooses a different retry policy based on their type and whether or
not the failing operation is idempotent.

@@ -969,7 +969,7 @@ These failures will be retried with a fixed sleep interval set in
`fs.s3a.retry.interval`, up to the limit set in `fs.s3a.retry.limit`.

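For concreteness, the two options named above sit in the normal S3A configuration; the values below are only illustrative, not a statement of the shipped defaults:

```xml
<property>
  <name>fs.s3a.retry.interval</name>
  <!-- Example sleep between retry attempts. -->
  <value>500ms</value>
</property>
<property>
  <name>fs.s3a.retry.limit</name>
  <!-- Example cap on the number of retry attempts. -->
  <value>7</value>
</property>
```
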
### Only retrible on idempotent operations
### Only retriable on idempotent operations

Some network failures are considered to be retriable if they occur on
idempotent operations; there's no way to know if they happened

@@ -997,11 +997,11 @@ it's a no-op if reprocessed. As indeed, is `Filesystem.delete()`.
1. Any filesystem supporting an atomic `FileSystem.create(path, overwrite=false)`
operation to reject file creation if the path exists MUST NOT consider
delete to be idempotent, because a `create(path, false)` operation will
only succeed if the first `delete()` call has already succeded.
only succeed if the first `delete()` call has already succeeded.
1. And a second, retried `delete()` call could delete the new data.

Because S3 is eventially consistent *and* doesn't support an
atomic create-no-overwrite operation, the choice is more ambigious.
Because S3 is eventually consistent *and* doesn't support an
atomic create-no-overwrite operation, the choice is more ambiguous.

Currently S3A considers delete to be
idempotent because it is convenient for many workflows, including the

@@ -1045,11 +1045,11 @@ Notes
1. There is also throttling taking place inside the AWS SDK; this is managed
by the value `fs.s3a.attempts.maximum`.
1. Throttling events are tracked in the S3A filesystem metrics and statistics.
1. Amazon KMS may thottle a customer based on the total rate of uses of
1. Amazon KMS may throttle a customer based on the total rate of uses of
KMS *across all user accounts and applications*.

Throttling of S3 requests is all too common; it is caused by too many clients
trying to access the same shard of S3 Storage. This generatlly
trying to access the same shard of S3 Storage. This generally
happen if there are too many reads, those being the most common in Hadoop
applications. This problem is exacerbated by Hive's partitioning
strategy used when storing data, such as partitioning by year and then month.

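To make the SDK-level knob mentioned in the notes above concrete, it is set like any other S3A option; the value shown is illustrative rather than a recommended default:

```xml
<property>
  <name>fs.s3a.attempts.maximum</name>
  <!-- Example ceiling on the retries performed inside the AWS SDK itself. -->
  <value>20</value>
</property>
```
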
@@ -1087,7 +1087,7 @@ of data asked for in every GET request, as well as how much data is
skipped in the existing stream before aborting it and creating a new stream.
1. If the DynamoDB tables used by S3Guard are being throttled, increase
the capacity through `hadoop s3guard set-capacity` (and pay more, obviously).
1. KMS: "consult AWS about increating your capacity".
1. KMS: "consult AWS about increasing your capacity".

@@ -1173,14 +1173,14 @@ fs.s3a.bucket.nightly.server-side-encryption-algorithm
```

When accessing the bucket `s3a://nightly/`, the per-bucket configuration
options for that backet will be used, here the access keys and token,
options for that bucket will be used, here the access keys and token,
and including the encryption algorithm and key.

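As a reminder of the pattern being exercised here, any `fs.s3a.` option can be scoped to a single bucket by inserting `bucket.NAME` into the key; a small, purely illustrative sketch for the `nightly` bucket mentioned above:

```xml
<property>
  <name>fs.s3a.bucket.nightly.server-side-encryption-algorithm</name>
  <!-- Example: this algorithm applies only when talking to s3a://nightly/. -->
  <value>SSE-KMS</value>
</property>
```
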
### <a name="per_bucket_endpoints"></a>Using Per-Bucket Configuration to access data round the world

S3 Buckets are hosted in different "regions", the default being "US-East".
The S3A client talks to this region by default, issing HTTP requests
The S3A client talks to this region by default, issuing HTTP requests
to the server `s3.amazonaws.com`.

S3A can work with buckets from any region. Each region has its own

@@ -1331,12 +1331,12 @@ The "fast" output stream
to the available disk space.
1. Generates output statistics as metrics on the filesystem, including
statistics of active and pending block uploads.
1. Has the time to `close()` set by the amount of remaning data to upload, rather
1. Has the time to `close()` set by the amount of remaining data to upload, rather
than the total size of the file.

Because it starts uploading while data is still being written, it offers
significant benefits when very large amounts of data are generated.
The in memory buffering mechanims may also offer speedup when running adjacent to
The in memory buffering mechanisms may also offer speedup when running adjacent to
S3 endpoints, as disks are not used for intermediate data storage.

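The buffering mode being contrasted here is selected by a single option; a minimal sketch only, with the on-disk choice shown and the in-memory alternatives noted in the comment:

```xml
<property>
  <name>fs.s3a.fast.upload.buffer</name>
  <!-- "disk" buffers blocks under fs.s3a.buffer.dir; "array" and "bytebuffer" buffer them in memory. -->
  <value>disk</value>
</property>
```
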
@@ -1400,7 +1400,7 @@ upload operation counts, so identifying when there is a backlog of work/
a mismatch between data generation rates and network bandwidth. Per-stream
statistics can also be logged by calling `toString()` on the current stream.

* Files being written are still invisible untl the write
* Files being written are still invisible until the write
completes in the `close()` call, which will block until the upload is completed.

@@ -1526,7 +1526,7 @@ compete with other filesystem operations.

We recommend a low value of `fs.s3a.fast.upload.active.blocks`; enough
to start background upload without overloading other parts of the system,
then experiment to see if higher values deliver more throughtput —especially
then experiment to see if higher values deliver more throughput —especially
from VMs running on EC2.

```xml

@@ -1569,10 +1569,10 @@ from VMs running on EC2.
There are two mechanisms for cleaning up after leftover multipart
uploads:
- Hadoop s3guard CLI commands for listing and deleting uploads by their
age. Doumented in the [S3Guard](./s3guard.html) section.
age. Documented in the [S3Guard](./s3guard.html) section.
- The configuration parameter `fs.s3a.multipart.purge`, covered below.

If an large stream writeoperation is interrupted, there may be
If a large stream write operation is interrupted, there may be
intermediate partitions uploaded to S3 —data which will be billed for.

These charges can be reduced by enabling `fs.s3a.multipart.purge`,

@@ -506,7 +506,7 @@ Input seek policy: fs.s3a.experimental.input.fadvise=normal

Note that other clients may have a S3Guard table set up to store metadata
on this bucket; the checks are all done from the perspective of the configuration
setttings of the current client.
settings of the current client.

```bash
hadoop s3guard bucket-info -guarded -auth s3a://landsat-pds

@@ -798,6 +798,6 @@ The IO load of clients of the (shared) DynamoDB table was exceeded.
Currently S3Guard doesn't do any throttling and retries here; the way to address
this is to increase capacity via the AWS console or the `set-capacity` command.

## Other Topis
## Other Topics

For details on how to test S3Guard, see [Testing S3Guard](./testing.html#s3guard)

@@ -28,7 +28,7 @@ be ignored.

## <a name="policy"></a> Policy for submitting patches which affect the `hadoop-aws` module.

The Apache Jenkins infrastucture does not run any S3 integration tests,
The Apache Jenkins infrastructure does not run any S3 integration tests,
due to the need to keep credentials secure.

### The submitter of any patch is required to run all the integration tests and declare which S3 region/implementation they used.

@@ -319,10 +319,10 @@ mvn verify -Dparallel-tests -Dscale -DtestsThreadCount=8

The most bandwidth intensive tests (those which upload data) always run
sequentially; those which are slow due to HTTPS setup costs or server-side
actionsare included in the set of parallelized tests.
actions are included in the set of parallelized tests.


### <a name="tuning_scale"></a> Tuning scale optins from Maven
### <a name="tuning_scale"></a> Tuning scale options from Maven


Some of the tests can be tuned from the maven build or from the

@@ -344,7 +344,7 @@ then the configuration value is used. The `unset` option is used to

Only a few properties can be set this way; more will be added.

| Property | Meaninging |
| Property | Meaning |
|-----------|-------------|
| `fs.s3a.scale.test.timeout`| Timeout in seconds for scale tests |
| `fs.s3a.scale.test.huge.filesize`| Size for huge file uploads |

@@ -493,7 +493,7 @@ cases must be disabled:
<value>false</value>
</property>
```
These tests reqest a temporary set of credentials from the STS service endpoint.
These tests request a temporary set of credentials from the STS service endpoint.
An alternate endpoint may be defined in `test.fs.s3a.sts.endpoint`.

```xml

@@ -641,7 +641,7 @@ to support the declaration of a specific large test file on alternate filesystem

### Works Over Long-haul Links

As well as making file size and operation counts scaleable, this includes
As well as making file size and operation counts scalable, this includes
making test timeouts adequate. The Scale tests make this configurable; it's
hard coded to ten minutes in `AbstractS3ATestBase()`; subclasses can
change this by overriding `getTestTimeoutMillis()`.

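Besides the subclass override, the scale-test timeout listed in the property table earlier can also be set from test configuration; the value here is only an example for a slow, long-haul link:

```xml
<property>
  <name>fs.s3a.scale.test.timeout</name>
  <!-- Example timeout, in seconds, for scale tests run over a slow connection. -->
  <value>3600</value>
</property>
```
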
@@ -677,7 +677,7 @@ Tests can overrun `createConfiguration()` to add new options to the configuratio
file for the S3A Filesystem instance used in their tests.

However, filesystem caching may mean that a test suite may get a cached
instance created with an differennnt configuration. For tests which don't need
instance created with an different configuration. For tests which don't need
specific configurations caching is good: it reduces test setup time.

For those tests which do need unique options (encryption, magic files),

@@ -888,7 +888,7 @@ s3a://bucket/a/b/c/DELAY_LISTING_ME
```

In real-life S3 inconsistency, however, we expect that all the above paths
(including `a` and `b`) will be subject to delayed visiblity.
(including `a` and `b`) will be subject to delayed visibility.

### Using the `InconsistentAmazonS3CClient` in downstream integration tests

@@ -952,7 +952,7 @@ When the `s3guard` profile is enabled, following profiles can be specified:
DynamoDB web service; launch the server and creating the table.
You won't be charged bills for using DynamoDB in test. As it runs in-JVM,
the table isn't shared across other tests running in parallel.
* `non-auth`: treat the S3Guard metadata as authorative.
* `non-auth`: treat the S3Guard metadata as authoritative.

```bash
mvn -T 1C verify -Dparallel-tests -DtestsThreadCount=6 -Ds3guard -Ddynamo -Dauth

@@ -984,7 +984,7 @@ throttling, and compare performance for different implementations. These
are included in the scale tests executed when `-Dscale` is passed to
the maven command line.

The two S3Guard scale testse are `ITestDynamoDBMetadataStoreScale` and
The two S3Guard scale tests are `ITestDynamoDBMetadataStoreScale` and
`ITestLocalMetadataStoreScale`. To run the DynamoDB test, you will need to
define your table name and region in your test configuration. For example,
the following settings allow us to run `ITestDynamoDBMetadataStoreScale` with

@@ -967,7 +967,7 @@ Again, this is due to the fact that the data is cached locally until the
`close()` operation. The S3A filesystem cannot be used as a store of data
if it is required that the data is persisted durably after every
`Syncable.hflush()` or `Syncable.hsync()` call.
This includes resilient logging, HBase-style journalling
This includes resilient logging, HBase-style journaling
and the like. The standard strategy here is to save to HDFS and then copy to S3.

## <a name="encryption"></a> S3 Server Side Encryption