HADOOP-17117 Fix typos in hadoop-aws documentation (#2127)
(cherry picked from commit 5b1ed2113b)

parent 10c9df1d0a
commit f9619b0b97
@@ -242,7 +242,7 @@ def commitTask(fs, jobAttemptPath, taskAttemptPath, dest):
 
 On a genuine filesystem this is an `O(1)` directory rename.
 
-On an object store with a mimiced rename, it is `O(data)` for the copy,
+On an object store with a mimicked rename, it is `O(data)` for the copy,
 along with overhead for listing and deleting all files (For S3, that's
 `(1 + files/500)` lists, and the same number of delete calls.
 
@@ -476,7 +476,7 @@ def needsTaskCommit(fs, jobAttemptPath, taskAttemptPath, dest):
 
 def commitTask(fs, jobAttemptPath, taskAttemptPath, dest):
   if fs.exists(taskAttemptPath) :
-    mergePathsV2(fs. taskAttemptPath, dest)
+    mergePathsV2(fs, taskAttemptPath, dest)
 ```
 
 ### v2 Task Abort
@@ -903,7 +903,7 @@ not be a problem.
 IBM's [Stocator](https://github.com/SparkTC/stocator) can transform indirect
 writes of V1/V2 committers into direct writes to the destination directory.
 
-Hpw does it do this? It's a special Hadoop `FileSystem` implementation which
+How does it do this? It's a special Hadoop `FileSystem` implementation which
 recognizes writes to `_temporary` paths and translate them to writes to the
 base directory. As well as translating the write operation, it also supports
 a `getFileStatus()` call on the original path, returning details on the file
@@ -969,7 +969,7 @@ It is that fact, that a different process may perform different parts
 of the upload, which make this algorithm viable.
 
 
-## The Netfix "Staging" committer
+## The Netflix "Staging" committer
 
 Ryan Blue, of Netflix, has submitted an alternate committer, one which has a
 number of appealing features
@@ -1081,7 +1081,7 @@ output reaches the job commit.
 Similarly, if a task is aborted, temporary output on the local FS is removed.
 
 If a task dies while the committer is running, it is possible for data to be
-eft on the local FS or as unfinished parts in S3.
+left on the local FS or as unfinished parts in S3.
 Unfinished upload parts in S3 are not visible to table readers and are cleaned
 up following the rules in the target bucket's life-cycle policy.
 
@@ -159,7 +159,7 @@ the number of files, during which time partial updates may be visible. If
 the operations are interrupted, the filesystem is left in an intermediate state.
 
 
-### Warning #2: Directories are mimiced
+### Warning #2: Directories are mimicked
 
 The S3A clients mimics directories by:
 
@@ -184,7 +184,7 @@ Parts of Hadoop relying on this can have unexpected behaviour. E.g. the
 performance recursive listings whenever possible.
 * It is possible to create files under files if the caller tries hard.
 * The time to rename a directory is proportional to the number of files
-underneath it (directory or indirectly) and the size of the files. (The copyis
+underneath it (directory or indirectly) and the size of the files. (The copy is
 executed inside the S3 storage, so the time is independent of the bandwidth
 from client to S3).
 * Directory renames are not atomic: they can fail partway through, and callers
@@ -320,7 +320,7 @@ export AWS_SECRET_ACCESS_KEY=my.secret.key
 
 If the environment variable `AWS_SESSION_TOKEN` is set, session authentication
 using "Temporary Security Credentials" is enabled; the Key ID and secret key
-must be set to the credentials for that specific sesssion.
+must be set to the credentials for that specific session.
 
 ```bash
 export AWS_SESSION_TOKEN=SECRET-SESSION-TOKEN
@@ -534,7 +534,7 @@ This means that the default S3A authentication chain can be defined as
 to directly authenticate with S3 and DynamoDB services.
 When S3A Delegation tokens are enabled, depending upon the delegation
 token binding it may be used
-to communicate wih the STS endpoint to request session/role
+to communicate with the STS endpoint to request session/role
 credentials.
 
 These are loaded and queried in sequence for a valid set of credentials.
@@ -630,13 +630,13 @@ The S3A configuration options with sensitive data
 and `fs.s3a.server-side-encryption.key`) can
 have their data saved to a binary file stored, with the values being read in
 when the S3A filesystem URL is used for data access. The reference to this
-credential provider then declareed in the hadoop configuration.
+credential provider then declared in the Hadoop configuration.
 
 For additional reading on the Hadoop Credential Provider API see:
 [Credential Provider API](../../../hadoop-project-dist/hadoop-common/CredentialProviderAPI.html).
 
 
-The following configuration options can be storeed in Hadoop Credential Provider
+The following configuration options can be stored in Hadoop Credential Provider
 stores.
 
 ```
@@ -725,7 +725,7 @@ of credentials.
 
 ### Using secrets from credential providers
 
-Once the provider is set in the Hadoop configuration, hadoop commands
+Once the provider is set in the Hadoop configuration, Hadoop commands
 work exactly as if the secrets were in an XML file.
 
 ```bash
@@ -761,7 +761,7 @@ used to change the endpoint, encryption and authentication mechanisms of buckets
 S3Guard options, various minor options.
 
 Here are the S3A properties for use in production. The S3Guard options are
-documented in the [S3Guard documenents](./s3guard.html); some testing-related
+documented in the [S3Guard documents](./s3guard.html); some testing-related
 options are covered in [Testing](./testing.md).
 
 ```xml