Fix "the the" and friends typos (#5267)
Signed-off-by: Nikita Eshkeev <neshkeev@yandex.ru>
parent d81d98388c
commit 4de31123ce
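Most of the fixes in this commit remove an accidentally doubled word ("the the", "to to", "of of", "be be"). Candidates like these can be found mechanically with a regular expression that matches a word, some whitespace, and then the same word again. The Python sketch below is a hypothetical helper written for illustration. The commit does not say how its typos were found, and the function name `find_doubled_words` is an assumption. The pattern matches case-insensitively and across line breaks. It also flags legitimate repeats such as "that that", so every hit still needs a human check.

```python
import re
import sys

# A whole word, then whitespace (spaces, tabs or a line break), then the same
# word again. IGNORECASE also applies to the \1 backreference, so "The the" matches.
DOUBLED_WORD = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)


def find_doubled_words(text):
    """Return (line_number, word) pairs, one per doubled word found in text.

    line_number is 1-based and refers to the line holding the first copy.
    word is the first copy exactly as written in the text.
    """
    hits = []
    for match in DOUBLED_WORD.finditer(text):
        # Count the newlines before the match start to get its line number.
        line_number = text.count("\n", 0, match.start()) + 1
        hits.append((line_number, match.group(1)))
    return hits


if __name__ == "__main__":
    # Hypothetical command-line use: pass one or more Markdown files as arguments.
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as handle:
            for line_number, word in find_doubled_words(handle.read()):
                print(f"{path}:{line_number}: doubled word '{word}'")
```

Saved as a script and run with Markdown files as arguments, it prints one `path:line: doubled word '...'` line per candidate. A regex scan was chosen over a spell checker because doubled words are correctly spelled, so a spell checker would not flag them.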
@@ -24,7 +24,7 @@ This filter must be configured in front of all the web application resources tha
 
 The Hadoop Auth and dependent JAR files must be in the web application classpath (commonly the `WEB-INF/lib` directory).
 
-Hadoop Auth uses SLF4J-API for logging. Auth Maven POM dependencies define the SLF4J API dependency but it does not define the dependency on a concrete logging implementation, this must be addded explicitly to the web application. For example, if the web applicationan uses Log4j, the SLF4J-LOG4J12 and LOG4J jar files must be part part of the web application classpath as well as the Log4j configuration file.
+Hadoop Auth uses SLF4J-API for logging. Auth Maven POM dependencies define the SLF4J API dependency but it does not define the dependency on a concrete logging implementation, this must be addded explicitly to the web application. For example, if the web applicationan uses Log4j, the SLF4J-LOG4J12 and LOG4J jar files must be part of the web application classpath as well as the Log4j configuration file.
 
 ### Common Configuration parameters
 
@@ -975,7 +975,7 @@ this will be in the bucket; the `rm` operation will then take time proportional
 to the size of the data. Furthermore, the deleted files will continue to incur
 storage costs.
 
-To avoid this, use the the `-skipTrash` option.
+To avoid this, use the `-skipTrash` option.
 
 ```bash
 hadoop fs -rm -skipTrash s3a://bucket/dataset
@@ -220,7 +220,7 @@ Each metrics record contains tags such as ProcessName, SessionId, and Hostname a
 | `WarmUpEDEKTimeNumOps` | Total number of warming up EDEK |
 | `WarmUpEDEKTimeAvgTime` | Average time of warming up EDEK in milliseconds |
 | `WarmUpEDEKTime`*num*`s(50/75/90/95/99)thPercentileLatency` | The 50/75/90/95/99th percentile of time spent in warming up EDEK in milliseconds (*num* seconds granularity). Percentile measurement is off by default, by watching no intervals. The intervals are specified by `dfs.metrics.percentiles.intervals`. |
-| `ResourceCheckTime`*num*`s(50/75/90/95/99)thPercentileLatency` | The 50/75/90/95/99th percentile of of NameNode resource check latency in milliseconds (*num* seconds granularity). Percentile measurement is off by default, by watching no intervals. The intervals are specified by `dfs.metrics.percentiles.intervals`. |
+| `ResourceCheckTime`*num*`s(50/75/90/95/99)thPercentileLatency` | The 50/75/90/95/99th percentile of NameNode resource check latency in milliseconds (*num* seconds granularity). Percentile measurement is off by default, by watching no intervals. The intervals are specified by `dfs.metrics.percentiles.intervals`. |
 | `EditLogTailTimeNumOps` | Total number of times the standby NameNode tailed the edit log |
 | `EditLogTailTimeAvgTime` | Average time (in milliseconds) spent by standby NameNode in tailing edit log |
 | `EditLogTailTime`*num*`s(50/75/90/95/99)thPercentileLatency` | The 50/75/90/95/99th percentile of time spent in tailing edit logs by standby NameNode in milliseconds (*num* seconds granularity). Percentile measurement is off by default, by watching no intervals. The intervals are specified by `dfs.metrics.percentiles.intervals`. |
@@ -595,7 +595,7 @@ hadoop kdiag \
 --keytab zk.service.keytab --principal zookeeper/devix.example.org@REALM
 ```
 
-This attempts to to perform all diagnostics without failing early, load in
+This attempts to perform all diagnostics without failing early, load in
 the HDFS and YARN XML resources, require a minimum key length of 1024 bytes,
 and log in as the principal `zookeeper/devix.example.org@REALM`, whose key must be in
 the keytab `zk.service.keytab`
@@ -501,7 +501,7 @@ Where
 def blocks(FS, p, s, s + l) = a list of the blocks containing data(FS, path)[s:s+l]
 
 
-Note that that as `length(FS, f) ` is defined as `0` if `isDir(FS, f)`, the result
+Note that as `length(FS, f) ` is defined as `0` if `isDir(FS, f)`, the result
 of `getFileBlockLocations()` on a directory is `[]`
 
 
@@ -707,7 +707,7 @@ This is a significant difference between the behavior of object stores
 and that of filesystems, as it allows >1 client to create a file with `overwrite=false`,
 and potentially confuse file/directory logic. In particular, using `create()` to acquire
 an exclusive lock on a file (whoever creates the file without an error is considered
-the holder of the lock) may not not a safe algorithm to use when working with object stores.
+the holder of the lock) may not be a safe algorithm to use when working with object stores.
 
 * Object stores may create an empty file as a marker when a file is created.
 However, object stores with `overwrite=true` semantics may not implement this atomically,
@@ -167,7 +167,7 @@ rather than just any FS-specific subclass implemented by the implementation
 custom subclasses.
 
 This is critical to ensure safe use of the feature: directory listing/
-status serialization/deserialization can result result in the `withFileStatus()`
+status serialization/deserialization can result in the `withFileStatus()`
 argument not being the custom subclass returned by the Filesystem instance's
 own `getFileStatus()`, `listFiles()`, `listLocatedStatus()` calls, etc.
 
@@ -686,4 +686,4 @@ public T load(FileSystem fs,
 *Note:* : in Hadoop 3.3.2 and earlier, the `withFileStatus(status)` call
 required a non-null parameter; this has since been relaxed.
 For maximum compatibility across versions, only invoke the method
-when the file status is known to be non-null.
+when the file status is known to be non-null.
@@ -228,7 +228,7 @@ Accordingly: *Use if and only if you are confident that the conditions are met.*
 
 ### `fs.s3a.create.header` User-supplied header support
 
-Options with the prefix `fs.s3a.create.header.` will be added to to the
+Options with the prefix `fs.s3a.create.header.` will be added to the
 S3 object metadata as "user defined metadata".
 This metadata is visible to all applications. It can also be retrieved through the
 FileSystem/FileContext `listXAttrs()` and `getXAttrs()` API calls with the prefix `header.`
@@ -236,4 +236,4 @@ FileSystem/FileContext `listXAttrs()` and `getXAttrs()` API calls with the prefi
 When an object is renamed, the metadata is propagated the copy created.
 
 It is possible to probe an S3A Filesystem instance for this capability through
-the `hasPathCapability(path, "fs.s3a.create.header")` check.
+the `hasPathCapability(path, "fs.s3a.create.header")` check.
@@ -980,7 +980,7 @@ throw `UnsupportedOperationException`.
 ### `StreamCapabilities`
 
 Implementors of filesystem clients SHOULD implement the `StreamCapabilities`
-interface and its `hasCapabilities()` method to to declare whether or not
+interface and its `hasCapabilities()` method to declare whether or not
 an output streams offer the visibility and durability guarantees of `Syncable`.
 
 Implementors of `StreamCapabilities.hasCapabilities()` MUST NOT declare that
@@ -1013,4 +1013,4 @@ all data to the datanodes.
 
 1. `close()` SHALL return once the guarantees of `hflush()` are met: the data is
 visible to others.
-1. For durability guarantees, `hsync()` MUST be called first.
+1. For durability guarantees, `hsync()` MUST be called first.
@@ -143,7 +143,7 @@ too must have this context defined.
 
 ### Identifying the system accounts `hadoop.registry.system.acls`
 
-These are the the accounts which are given full access to the base of the
+These are the accounts which are given full access to the base of the
 registry. The Resource Manager needs this option to create the root paths.
 
 Client applications writing to the registry access to the nodes it creates.
@@ -29,7 +29,7 @@ a secure registry:
 1. Allow the RM to create per-user regions of the registration space
 1. Allow applications belonging to a user to write registry entries
 into their part of the space. These may be short-lived or long-lived
-YARN applications, or they may be be static applications.
+YARN applications, or they may be static applications.
 1. Prevent other users from writing into another user's part of the registry.
 1. Allow system services to register to a `/services` section of the registry.
 1. Provide read access to clients of a registry.
@@ -124,7 +124,7 @@ Please make sure you write code that is portable.
 * Don't write code that could force a non-aligned word access.
 * This causes performance issues on most architectures and isn't supported at all on some.
 * Generally the compiler will prevent this unless you are doing clever things with pointers e.g. abusing placement new or reinterpreting a pointer into a pointer to a wider type.
-* If a type needs to be a a specific width make sure to specify it.
+* If a type needs to be a specific width make sure to specify it.
 * `int32_t my_32_bit_wide_int`
 * Avoid using compiler dependent pragmas or attributes.
 * If there is a justified and unavoidable reason for using these you must document why. See examples below.
@@ -30,7 +30,7 @@ Hadoop has an option parsing framework that employs parsing generic options as w
 |:---- |:---- |
 | SHELL\_OPTIONS | The common set of shell options. These are documented on the [Commands Manual](../../hadoop-project-dist/hadoop-common/CommandsManual.html#Shell_Options) page. |
 | GENERIC\_OPTIONS | The common set of options supported by multiple commands. See the Hadoop [Commands Manual](../../hadoop-project-dist/hadoop-common/CommandsManual.html#Generic_Options) for more information. |
-| COMMAND COMMAND\_OPTIONS | Various commands with their options are described in the following sections. The commands have been grouped into [User Commands](#User_Commands) and [Administration Commands](#Administration_Commands). |
+| COMMAND\_OPTIONS | Various commands with their options are described in the following sections. The commands have been grouped into [User Commands](#User_Commands) and [Administration Commands](#Administration_Commands). |
 
 User Commands
 -------------
@@ -24,7 +24,7 @@ The NFS Gateway supports NFSv3 and allows HDFS to be mounted as part of the clie
 
 * Users can browse the HDFS file system through their local file system
 on NFSv3 client compatible operating systems.
-* Users can download files from the the HDFS file system on to their
+* Users can download files from the HDFS file system on to their
 local file system.
 * Users can upload files from their local file system directly to the
 HDFS file system.
@@ -92,7 +92,7 @@ The rest of the NFS gateway configurations are optional for both secure and non-
 the super-user can do anything in that permissions checks never fail for the super-user.
 If the following property is configured, the superuser on NFS client can access any file
 on HDFS. By default, the super user is not configured in the gateway.
-Note that, even the the superuser is configured, "nfs.exports.allowed.hosts" still takes effect.
+Note that, even the superuser is configured, "nfs.exports.allowed.hosts" still takes effect.
 For example, the superuser will not have write access to HDFS files through the gateway if
 the NFS client host is not allowed to have write access in "nfs.exports.allowed.hosts".
 
@@ -154,7 +154,7 @@ It's strongly recommended for the users to update a few configuration properties
 the super-user can do anything in that permissions checks never fail for the super-user.
 If the following property is configured, the superuser on NFS client can access any file
 on HDFS. By default, the super user is not configured in the gateway.
-Note that, even the the superuser is configured, "nfs.exports.allowed.hosts" still takes effect.
+Note that, even the superuser is configured, "nfs.exports.allowed.hosts" still takes effect.
 For example, the superuser will not have write access to HDFS files through the gateway if
 the NFS client host is not allowed to have write access in "nfs.exports.allowed.hosts".
 
@@ -227,7 +227,7 @@ For command usage, see [namenode](./HDFSCommands.html#namenode).
 Balancer
 --------
 
-HDFS data might not always be be placed uniformly across the DataNode. One common reason is addition of new DataNodes to an existing cluster. While placing new blocks (data for a file is stored as a series of blocks), NameNode considers various parameters before choosing the DataNodes to receive these blocks. Some of the considerations are:
+HDFS data might not always be placed uniformly across the DataNode. One common reason is addition of new DataNodes to an existing cluster. While placing new blocks (data for a file is stored as a series of blocks), NameNode considers various parameters before choosing the DataNodes to receive these blocks. Some of the considerations are:
 
 * Policy to keep one of the replicas of a block on the same node as
 the node that is writing the block.
@@ -84,11 +84,11 @@ If users want some of their existing cluster (`hdfs://cluster`) data to mount wi
 
 Let's consider the following operations to understand where these operations will be delegated based on mount links.
 
-*Op1:* Create a file with the the path `hdfs://cluster/user/fileA`, then physically this file will be created at `hdfs://cluster/user/fileA`. This delegation happened based on the first configuration parameter in above configurations. Here `/user` mapped to `hdfs://cluster/user/`.
+*Op1:* Create a file with the path `hdfs://cluster/user/fileA`, then physically this file will be created at `hdfs://cluster/user/fileA`. This delegation happened based on the first configuration parameter in above configurations. Here `/user` mapped to `hdfs://cluster/user/`.
 
-*Op2:* Create a file the the path `hdfs://cluster/data/datafile`, then this file will be created at `o3fs://bucket1.volume1.omhost/data/datafile`. This delegation happened based on second configurations parameter in above configurations. Here `/data` was mapped with `o3fs://bucket1.volume1.omhost/data/`.
+*Op2:* Create a file the path `hdfs://cluster/data/datafile`, then this file will be created at `o3fs://bucket1.volume1.omhost/data/datafile`. This delegation happened based on second configurations parameter in above configurations. Here `/data` was mapped with `o3fs://bucket1.volume1.omhost/data/`.
 
-*Op3:* Create a file with the the path `hdfs://cluster/backup/data.zip`, then physically this file will be created at `s3a://bucket1/backup/data.zip`. This delegation happened based on the third configuration parameter in above configurations. Here `/backup` was mapped to `s3a://bucket1/backup/`.
+*Op3:* Create a file with the path `hdfs://cluster/backup/data.zip`, then physically this file will be created at `s3a://bucket1/backup/data.zip`. This delegation happened based on the third configuration parameter in above configurations. Here `/backup` was mapped to `s3a://bucket1/backup/`.
 
 
 **Example 2:**
@@ -114,11 +114,11 @@ If users want some of their existing cluster (`s3a://bucketA/`) data to mount wi
 ```
 Let's consider the following operations to understand to where these operations will be delegated based on mount links.
 
-*Op1:* Create a file with the the path `s3a://bucketA/user/fileA`, then this file will be created physically at `hdfs://cluster/user/fileA`. This delegation happened based on the first configuration parameter in above configurations. Here `/user` mapped to `hdfs://cluster/user`.
+*Op1:* Create a file with the path `s3a://bucketA/user/fileA`, then this file will be created physically at `hdfs://cluster/user/fileA`. This delegation happened based on the first configuration parameter in above configurations. Here `/user` mapped to `hdfs://cluster/user`.
 
-*Op2:* Create a file the the path `s3a://bucketA/data/datafile`, then this file will be created at `o3fs://bucket1.volume1.omhost/data/datafile`. This delegation happened based on second configurations parameter in above configurations. Here `/data` was mapped with `o3fs://bucket1.volume1.omhost/data/`.
+*Op2:* Create a file the path `s3a://bucketA/data/datafile`, then this file will be created at `o3fs://bucket1.volume1.omhost/data/datafile`. This delegation happened based on second configurations parameter in above configurations. Here `/data` was mapped with `o3fs://bucket1.volume1.omhost/data/`.
 
-*Op3:* Create a file with the the path `s3a://bucketA/salesDB/dbfile`, then physically this file will be created at `s3a://bucketA/salesDB/dbfile`. This delegation happened based on the third configuration parameter in above configurations. Here `/salesDB` was mapped to `s3a://bucket1/salesDB`.
+*Op3:* Create a file with the path `s3a://bucketA/salesDB/dbfile`, then physically this file will be created at `s3a://bucketA/salesDB/dbfile`. This delegation happened based on the third configuration parameter in above configurations. Here `/salesDB` was mapped to `s3a://bucket1/salesDB`.
 
 Note: In above examples we used create operation only, but the same mechanism applies to any other file system APIs here.
 
@@ -41,7 +41,7 @@ We cannot ensure complete binary compatibility with the applications that use **
 Not Supported
 -------------
 
-MRAdmin has been removed in MRv2 because because `mradmin` commands no longer exist. They have been replaced by the commands in `rmadmin`. We neither support binary compatibility nor source compatibility for the applications that use this class directly.
+MRAdmin has been removed in MRv2 because `mradmin` commands no longer exist. They have been replaced by the commands in `rmadmin`. We neither support binary compatibility nor source compatibility for the applications that use this class directly.
 
 Tradeoffs between MRv1 Users and Early MRv2 Adopters
 ----------------------------------------------------
@@ -30,7 +30,7 @@ Hadoop has an option parsing framework that employs parsing generic options as w
 |:---- |:---- |
 | SHELL\_OPTIONS | The common set of shell options. These are documented on the [Hadoop Commands Reference](../../hadoop-project-dist/hadoop-common/CommandsManual.html#Shell_Options) page. |
 | GENERIC\_OPTIONS | The common set of options supported by multiple commands. See the [Hadoop Commands Reference](../../hadoop-project-dist/hadoop-common/CommandsManual.html#Generic_Options) for more information. |
-| COMMAND COMMAND\_OPTIONS | Various commands with their options are described in the following sections. The commands have been grouped into [User Commands](#User_Commands) and [Administration Commands](#Administration_Commands). |
+| COMMAND\_OPTIONS | Various commands with their options are described in the following sections. The commands have been grouped into [User Commands](#User_Commands) and [Administration Commands](#Administration_Commands). |
 
 User Commands
 -------------
@@ -113,7 +113,7 @@ No renaming takes place —the files are left in their original location.
 The directory treewalk is single-threaded, then it is `O(directories)`,
 with each directory listing using one or more paged LIST calls.
 
-This is simple, and for most tasks, the scan is off the critical path of of the job.
+This is simple, and for most tasks, the scan is off the critical path of the job.
 
 Statistics analysis may justify moving to parallel scans in future.
 
@@ -332,4 +332,4 @@ Any store/FS which supports auditing is able to collect this data
 and include in their logs.
 
 To ease backporting, all audit integration is in the single class
-`org.apache.hadoop.mapreduce.lib.output.committer.manifest.impl.AuditingIntegration`.
+`org.apache.hadoop.mapreduce.lib.output.committer.manifest.impl.AuditingIntegration`.
@@ -213,7 +213,7 @@ the same correctness guarantees as the v1 algorithm.
 attempt working directory to their final destination path, holding back on
 the final manifestation POST.
 1. A JSON file containing all information needed to complete the upload of all
-files in the task attempt is written to the Job Attempt directory of of the
+files in the task attempt is written to the Job Attempt directory of the
 wrapped committer working with HDFS.
 1. Job commit: load in all the manifest files in the HDFS job attempt directory,
 then issued the POST request to complete the uploads. These are parallelised.
@@ -76,7 +76,7 @@ The tool works by performing the following procedure:
 - its aggregation status has successfully completed
 - has at least ``-minNumberLogFiles`` log files
 - the sum of its log files size is less than ``-maxTotalLogsSize`` megabytes
-2. If there are are more than ``-maxEligibleApps`` applications found, the
+2. If there are more than ``-maxEligibleApps`` applications found, the
 newest applications are dropped. They can be processed next time.
 3. A shell script is generated based on the eligible applications
 4. The Distributed Shell program is run with the aformentioned script. It
@@ -411,7 +411,7 @@ log4j.logger.org.apache.hadoop.fs.s3a.audit.impl.LoggingAuditor=DEBUG
 This adds one log line per request -and does provide some insight into
 communications between the S3A client and AWS S3.
 
-For low-level debugging of the Auditing system, such as when when spans are
+For low-level debugging of the Auditing system, such as when spans are
 entered and exited, set the log to `TRACE`:
 
 ```
@@ -63,7 +63,7 @@ entries, the duration of the lock is low.
 
 In contrast to a "real" filesystem, Amazon's S3A object store, similar to
 most others, does not support `rename()` at all. A hash operation on the filename
-determines the location of of the data —there is no separate metadata to change.
+determines the location of the data —there is no separate metadata to change.
 To mimic renaming, the Hadoop S3A client has to copy the data to a new object
 with the destination filename, then delete the original entry. This copy
 can be executed server-side, but as it does not complete until the in-cluster
@@ -79,7 +79,7 @@ The solution to this problem is closely coupled to the S3 protocol itself:
 delayed completion of multi-part PUT operations
 
 That is: tasks write all data as multipart uploads, *but delay the final
-commit action until until the final, single job commit action.* Only that
+commit action until the final, single job commit action.* Only that
 data committed in the job commit action will be made visible; work from speculative
 and failed tasks will not be instantiated. As there is no rename, there is no
 delay while data is copied from a temporary directory to the final directory.
@@ -307,7 +307,7 @@ def isCommitJobRepeatable() :
 Accordingly, it is a failure point in the protocol. With a low number of files
 and fast rename/list algorithms, the window of vulnerability is low. At
 scale, the vulnerability increases. It could actually be reduced through
-parallel execution of the renaming of of committed tasks.
+parallel execution of the renaming of committed tasks.
 
 
 ### Job Abort
@@ -89,7 +89,7 @@ of:
 * A set of AWS session credentials
 (`fs.s3a.access.key`, `fs.s3a.secret.key`, `fs.s3a.session.token`).
 
-These credentials are obtained from the AWS Secure Token Service (STS) when the the token is issued.
+These credentials are obtained from the AWS Secure Token Service (STS) when the token is issued.
 * A set of AWS session credentials binding the user to a specific AWS IAM Role,
 further restricted to only access the S3 bucket.
 Again, these credentials are requested when the token is issued.
@@ -218,7 +218,7 @@ This can have adverse effects on those large directories, again.
 
 In the Presto [S3 connector](https://prestodb.io/docs/current/connector/hive.html#amazon-s3-configuration),
 `mkdirs()` is a no-op.
-Whenever it lists any path which isn't an object or a prefix of one more more objects, it returns an
+Whenever it lists any path which isn't an object or a prefix of one more objects, it returns an
 empty listing. That is:; by default, every path is an empty directory.
 
 Provided no code probes for a directory existing and fails if it is there, this
@@ -524,7 +524,7 @@ Ignoring 3 markers in authoritative paths
 ```
 
 All of this S3A bucket _other_ than the authoritative path `/tables` will be safe for
-incompatible Hadoop releases to to use.
+incompatible Hadoop releases to use.
 
 
 ### <a name="marker-tool-clean"></a>`markers clean`
@@ -505,7 +505,7 @@ providers listed after it will be ignored.
 
 ### <a name="auth_simple"></a> Simple name/secret credentials with `SimpleAWSCredentialsProvider`*
 
-This is is the standard credential provider, which supports the secret
+This is the standard credential provider, which supports the secret
 key in `fs.s3a.access.key` and token in `fs.s3a.secret.key`
 values.
 
@@ -1392,7 +1392,7 @@ an S3 implementation that doesn't return eTags.
 
 When `true` (default) and 'Get Object' doesn't return eTag or
 version ID (depending on configured 'source'), a `NoVersionAttributeException`
-will be thrown. When `false` and and eTag or version ID is not returned,
+will be thrown. When `false` and eTag or version ID is not returned,
 the stream can be read, but without any version checking.
 
 
@@ -1868,7 +1868,7 @@ in byte arrays in the JVM's heap prior to upload.
 This *may* be faster than buffering to disk.
 
 The amount of data which can be buffered is limited by the available
-size of the JVM heap heap. The slower the write bandwidth to S3, the greater
+size of the JVM heap. The slower the write bandwidth to S3, the greater
 the risk of heap overflows. This risk can be mitigated by
 [tuning the upload settings](#upload_thread_tuning).
 
@@ -122,7 +122,7 @@ Optimised for random IO, specifically the Hadoop `PositionedReadable`
 operations —though `seek(offset); read(byte_buffer)` also benefits.
 
 Rather than ask for the whole file, the range of the HTTP request is
-set to that that of the length of data desired in the `read` operation
+set to the length of data desired in the `read` operation
 (Rounded up to the readahead value set in `setReadahead()` if necessary).
 
 By reducing the cost of closing existing HTTP requests, this is
@@ -172,7 +172,7 @@ sequential to `random`.
 This policy essentially recognizes the initial read pattern of columnar
 storage formats (e.g. Apache ORC and Apache Parquet), which seek to the end
 of a file, read in index data and then seek backwards to selectively read
-columns. The first seeks may be be expensive compared to the random policy,
+columns. The first seeks may be expensive compared to the random policy,
 however the overall process is much less expensive than either sequentially
 reading through a file with the `random` policy, or reading columnar data
 with the `sequential` policy.
@@ -384,7 +384,7 @@ data loss.
 Amazon S3 uses a set of front-end servers to provide access to the underlying data.
 The choice of which front-end server to use is handled via load-balancing DNS
 service: when the IP address of an S3 bucket is looked up, the choice of which
-IP address to return to the client is made based on the the current load
+IP address to return to the client is made based on the current load
 of the front-end servers.
 
 Over time, the load across the front-end changes, so those servers considered
@@ -694,4 +694,4 @@ connectors for other buckets, would end up blocking too.
 
 Consider experimenting with this when running applications
 where many threads may try to simultaneously interact
-with the same slow-to-initialize object stores.
+with the same slow-to-initialize object stores.
@@ -738,7 +738,7 @@ at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecor
 ```
 
 The underlying problem is that the gzip decompressor is automatically enabled
-when the the source file ends with the ".gz" extension. Because S3 Select
+when the source file ends with the ".gz" extension. Because S3 Select
 returns decompressed data, the codec fails.
 
 The workaround here is to declare that the job should add the "Passthrough Codec"
@@ -29,7 +29,7 @@ the S3A connector**
 - - -
 
 
-## <a name="migrating"></a> How to migrate to to the S3A client
+## <a name="migrating"></a> How to migrate to the S3A client
 
 1. Keep the `hadoop-aws` JAR on your classpath.
 
@@ -324,7 +324,7 @@ There's two main causes
 
 If you see this and you are trying to use the S3A connector with Spark, then the cause can
 be that the isolated classloader used to load Hive classes is interfering with the S3A
-connector's dynamic loading of `com.amazonaws` classes. To fix this, declare that that
+connector's dynamic loading of `com.amazonaws` classes. To fix this, declare that
 the classes in the aws SDK are loaded from the same classloader which instantiated
 the S3A FileSystem instance:
 
@@ -435,7 +435,7 @@ The service is expected to return a response in JSON format:
 
 ### Delegation token support in WASB
 
-Delegation token support support can be enabled in WASB using the following configuration:
+Delegation token support can be enabled in WASB using the following configuration:
 
 ```xml
 <property>
@@ -507,7 +507,7 @@ The cache is maintained at a filesystem object level.
 </property>
 ```
 
-The maximum number of entries that that cache can hold can be customized using the following setting:
+The maximum number of entries that the cache can hold can be customized using the following setting:
 ```
 <property>
 <name>fs.azure.authorization.caching.maxentries</name>
@@ -618,7 +618,7 @@ Below are the pre-requiste steps to follow:
 
 ```
 XInclude is supported, so for extra security secrets may be
-kept out of the source tree then referenced through an an XInclude element:
+kept out of the source tree then referenced through an XInclude element:
 
 <include xmlns="http://www.w3.org/2001/XInclude"
 href="/users/self/.secrets/auth-keys.xml" />
@ -150,7 +150,7 @@ yarn.scheduler.capacity.root.marketing.accessible-node-labels.GPU.capacity=50
yarn.scheduler.capacity.root.engineering.default-node-label-expression=GPU
```

You can see root.engineering/marketing/sales.capacity=33, so each of them can has guaranteed resource equals to 1/3 of resource **without partition**. So each of them can use 1/3 resource of h1..h4, which is 24 * 4 * (1/3) = (32G mem, 32 v-cores).
You can see root.engineering/marketing/sales.capacity=33, so each of them has guaranteed resource equals to 1/3 of resource **without partition**. So each of them can use 1/3 resource of h1..h4, which is 24 * 4 * (1/3) = (32G mem, 32 v-cores).

And only engineering/marketing queue has permission to access GPU partition (see root.`<queue-name>`.accessible-node-labels).

@ -89,13 +89,13 @@ The Timeline Domain offers a namespace for Timeline server allowing
users to host multiple entities, isolating them from other users and applications.
Timeline server Security is defined at this level.

A "Domain" primarily stores owner info, read and& write ACL information,
A "Domain" primarily stores owner info, read and write ACL information,
created and modified time stamp information. Each Domain is identified by an ID which
must be unique across all users in the YARN cluster.

#### Timeline Entity

A Timeline Entity contains the the meta information of a conceptual entity
A Timeline Entity contains the meta information of a conceptual entity
and its related events.

The entity can be an application, an application attempt, a container or
@ -199,7 +199,7 @@ to `kerberos`, after which the following configuration options are available:
| `yarn.timeline-service.http-authentication.type` | Defines authentication used for the timeline server HTTP endpoint. Supported values are: `simple` / `kerberos` / #AUTHENTICATION_HANDLER_CLASSNAME#. Defaults to `simple`. |
| `yarn.timeline-service.http-authentication.simple.anonymous.allowed` | Indicates if anonymous requests are allowed by the timeline server when using 'simple' authentication. Defaults to `true`. |
| `yarn.timeline-service.principal` | The Kerberos principal for the timeline server. |
| `yarn.timeline-service.keytab` | The Kerberos keytab for the timeline server. Defaults on Unix to to `/etc/krb5.keytab`. |
| `yarn.timeline-service.keytab` | The Kerberos keytab for the timeline server. Defaults on Unix to `/etc/krb5.keytab`. |
| `yarn.timeline-service.delegation.key.update-interval` | Defaults to `86400000` (1 day). |
| `yarn.timeline-service.delegation.token.renew-interval` | Defaults to `86400000` (1 day). |
| `yarn.timeline-service.delegation.token.max-lifetime` | Defaults to `604800000` (7 days). |

@ -865,7 +865,7 @@ none of the apps match the predicates, an empty list will be returned.
1. `conffilters` - If specified, matched applications must have exact matches to the given config name and must be either equal or not equal
to the given config value. Both the config name and value must be strings. conffilters are represented in the same form as infofilters.
1. `metricfilters` - If specified, matched applications must have exact matches to the given metric and satisfy the specified relation with the
metric value. Metric id must be a string and and metric value must be an integral value. metricfilters are represented as an expression of the form :<br/>
metric value. Metric id must be a string and metric value must be an integral value. metricfilters are represented as an expression of the form :<br/>
"(<metricid> <compareop> <metricvalue>) <op> (<metricid> <compareop> <metricvalue>)".<br/>
Here op can be either of AND or OR. And compareop can be either of "eq", "ne", "ene", "gt", "ge", "lt" and "le".<br/>
"eq" means equals, "ne" means not equals and existence of metric is not required for a match, "ene" means not equals but existence of metric is
@ -998,7 +998,7 @@ match the predicates, an empty list will be returned.
1. `conffilters` - If specified, matched applications must have exact matches to the given config name and must be either equal or not equal
to the given config value. Both the config name and value must be strings. conffilters are represented in the same form as infofilters.
1. `metricfilters` - If specified, matched applications must have exact matches to the given metric and satisfy the specified relation with the
metric value. Metric id must be a string and and metric value must be an integral value. metricfilters are represented as an expression of the form :<br/>
metric value. Metric id must be a string and metric value must be an integral value. metricfilters are represented as an expression of the form :<br/>
"(<metricid> <compareop> <metricvalue>) <op> (<metricid> <compareop> <metricvalue>)".<br/>
Here op can be either of AND or OR. And compareop can be either of "eq", "ne", "ene", "gt", "ge", "lt" and "le".<br/>
"eq" means equals, "ne" means not equals and existence of metric is not required for a match, "ene" means not equals but existence of metric is
@ -1205,7 +1205,7 @@ If none of the entities match the predicates, an empty list will be returned.
1. `conffilters` - If specified, matched entities must have exact matches to the given config name and must be either equal or not equal
to the given config value. Both the config name and value must be strings. conffilters are represented in the same form as infofilters.
1. `metricfilters` - If specified, matched entities must have exact matches to the given metric and satisfy the specified relation with the
metric value. Metric id must be a string and and metric value must be an integral value. metricfilters are represented as an expression of the form :<br/>
metric value. Metric id must be a string and metric value must be an integral value. metricfilters are represented as an expression of the form :<br/>
"(<metricid> <compareop> <metricvalue>) <op> (<metricid> <compareop> <metricvalue>)"<br/>
Here op can be either of AND or OR. And compareop can be either of "eq", "ne", "ene", "gt", "ge", "lt" and "le".<br/>
"eq" means equals, "ne" means not equals and existence of metric is not required for a match, "ene" means not equals but existence of metric is
@ -1341,7 +1341,7 @@ If none of the entities match the predicates, an empty list will be returned.
1. `conffilters` - If specified, matched entities must have exact matches to the given config name and must be either equal or not equal
to the given config value. Both the config name and value must be strings. conffilters are represented in the same form as infofilters.
1. `metricfilters` - If specified, matched entities must have exact matches to the given metric and satisfy the specified relation with the
metric value. Metric id must be a string and and metric value must be an integral value. metricfilters are represented as an expression of the form :<br/>
metric value. Metric id must be a string and metric value must be an integral value. metricfilters are represented as an expression of the form :<br/>
"(<metricid> <compareop> <metricvalue>) <op> (<metricid> <compareop> <metricvalue>)"<br/>
Here op can be either of AND or OR. And compareop can be either of "eq", "ne", "ene", "gt", "ge", "lt" and "le".<br/>
"eq" means equals, "ne" means not equals and existence of metric is not required for a match, "ene" means not equals but existence of metric is

@ -30,7 +30,7 @@ YARN has an option parsing framework that employs parsing generic options as wel
|:---- |:---- |
| SHELL\_OPTIONS | The common set of shell options. These are documented on the [Commands Manual](../../hadoop-project-dist/hadoop-common/CommandsManual.html#Shell_Options) page. |
| GENERIC\_OPTIONS | The common set of options supported by multiple commands. See the Hadoop [Commands Manual](../../hadoop-project-dist/hadoop-common/CommandsManual.html#Generic_Options) for more information. |
| COMMAND COMMAND\_OPTIONS | Various commands with their options are described in the following sections. The commands have been grouped into [User Commands](#User_Commands) and [Administration Commands](#Administration_Commands). |
| COMMAND\_OPTIONS | Various commands with their options are described in the following sections. The commands have been grouped into [User Commands](#User_Commands) and [Administration Commands](#Administration_Commands). |

User Commands
-------------

@ -535,7 +535,7 @@ PUT URL - http://localhost:8088/app/v1/services/hello-world

#### PUT Request JSON

Note, irrespective of what the current lifetime value is, this update request will set the lifetime of the service to be 3600 seconds (1 hour) from the time the request is submitted. Hence, if a a service has remaining lifetime of 5 mins (say) and would like to extend it to an hour OR if an application has remaining lifetime of 5 hours (say) and would like to reduce it down to an hour, then for both scenarios you need to submit the same request below.
Note, irrespective of what the current lifetime value is, this update request will set the lifetime of the service to be 3600 seconds (1 hour) from the time the request is submitted. Hence, if a service has remaining lifetime of 5 mins (say) and would like to extend it to an hour OR if an application has remaining lifetime of 5 hours (say) and would like to reduce it down to an hour, then for both scenarios you need to submit the same request below.

```json
{