Fix "the the" and friends typos (#5267)

Signed-off-by: Nikita Eshkeev <neshkeev@yandex.ru>
Nikita Eshkeev 2023-01-16 22:33:59 +03:00 committed by GitHub
parent d81d98388c
commit 4de31123ce
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
36 changed files with 62 additions and 62 deletions
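Each change visible in this commit removes a word that was accidentally typed twice, such as "the the", "of of", "be be" or "COMMAND COMMAND". The class below is a minimal sketch of how such doublings could be flagged with a regular expression. It is illustrative only: `DoubledWords` and its `find` method are hypothetical names and are not part of this commit or of Hadoop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of Hadoop or this commit): reports words
 * that appear twice in a row, such as "the the" or "of of".
 */
public class DoubledWords {

    // \b(\w+) captures a whole word, \s+ allows spaces or a line break
    // between the copies, and \1\b requires the same word again.
    // CASE_INSENSITIVE also catches "The the" at the start of a sentence.
    private static final Pattern DOUBLED =
        Pattern.compile("\\b(\\w+)\\s+\\1\\b", Pattern.CASE_INSENSITIVE);

    /** Returns the first copy of every doubled word found in the text. */
    public static List<String> find(String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = DOUBLED.matcher(text);
        while (m.find()) {
            hits.add(m.group(1));
        }
        return hits;
    }

    public static void main(String[] args) {
        String sample = "To avoid this, use the the `-skipTrash` option.\n"
            + "This attempts to to perform all diagnostics.";
        // Prints "doubled word: the" and then "doubled word: to".
        for (String word : find(sample)) {
            System.out.println("doubled word: " + word);
        }
    }
}
```

A hit is only a candidate: legitimate repetitions such as "had had" or "that that" match as well, so each one still needs a human look before it is changed.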

@@ -24,7 +24,7 @@ This filter must be configured in front of all the web application resources tha
 The Hadoop Auth and dependent JAR files must be in the web application classpath (commonly the `WEB-INF/lib` directory).
-Hadoop Auth uses SLF4J-API for logging. Auth Maven POM dependencies define the SLF4J API dependency but it does not define the dependency on a concrete logging implementation, this must be addded explicitly to the web application. For example, if the web applicationan uses Log4j, the SLF4J-LOG4J12 and LOG4J jar files must be part part of the web application classpath as well as the Log4j configuration file.
+Hadoop Auth uses SLF4J-API for logging. Auth Maven POM dependencies define the SLF4J API dependency but it does not define the dependency on a concrete logging implementation, this must be addded explicitly to the web application. For example, if the web applicationan uses Log4j, the SLF4J-LOG4J12 and LOG4J jar files must be part of the web application classpath as well as the Log4j configuration file.
 ### Common Configuration parameters

@@ -975,7 +975,7 @@ this will be in the bucket; the `rm` operation will then take time proportional
 to the size of the data. Furthermore, the deleted files will continue to incur
 storage costs.
-To avoid this, use the the `-skipTrash` option.
+To avoid this, use the `-skipTrash` option.
 ```bash
 hadoop fs -rm -skipTrash s3a://bucket/dataset

@@ -220,7 +220,7 @@ Each metrics record contains tags such as ProcessName, SessionId, and Hostname a
 | `WarmUpEDEKTimeNumOps` | Total number of warming up EDEK |
 | `WarmUpEDEKTimeAvgTime` | Average time of warming up EDEK in milliseconds |
 | `WarmUpEDEKTime`*num*`s(50/75/90/95/99)thPercentileLatency` | The 50/75/90/95/99th percentile of time spent in warming up EDEK in milliseconds (*num* seconds granularity). Percentile measurement is off by default, by watching no intervals. The intervals are specified by `dfs.metrics.percentiles.intervals`. |
-| `ResourceCheckTime`*num*`s(50/75/90/95/99)thPercentileLatency` | The 50/75/90/95/99th percentile of of NameNode resource check latency in milliseconds (*num* seconds granularity). Percentile measurement is off by default, by watching no intervals. The intervals are specified by `dfs.metrics.percentiles.intervals`. |
+| `ResourceCheckTime`*num*`s(50/75/90/95/99)thPercentileLatency` | The 50/75/90/95/99th percentile of NameNode resource check latency in milliseconds (*num* seconds granularity). Percentile measurement is off by default, by watching no intervals. The intervals are specified by `dfs.metrics.percentiles.intervals`. |
 | `EditLogTailTimeNumOps` | Total number of times the standby NameNode tailed the edit log |
 | `EditLogTailTimeAvgTime` | Average time (in milliseconds) spent by standby NameNode in tailing edit log |
 | `EditLogTailTime`*num*`s(50/75/90/95/99)thPercentileLatency` | The 50/75/90/95/99th percentile of time spent in tailing edit logs by standby NameNode in milliseconds (*num* seconds granularity). Percentile measurement is off by default, by watching no intervals. The intervals are specified by `dfs.metrics.percentiles.intervals`. |

@@ -595,7 +595,7 @@ hadoop kdiag \
 --keytab zk.service.keytab --principal zookeeper/devix.example.org@REALM
 ```
-This attempts to to perform all diagnostics without failing early, load in
+This attempts to perform all diagnostics without failing early, load in
 the HDFS and YARN XML resources, require a minimum key length of 1024 bytes,
 and log in as the principal `zookeeper/devix.example.org@REALM`, whose key must be in
 the keytab `zk.service.keytab`

@@ -501,7 +501,7 @@ Where
 def blocks(FS, p, s, s + l) = a list of the blocks containing data(FS, path)[s:s+l]
-Note that that as `length(FS, f) ` is defined as `0` if `isDir(FS, f)`, the result
+Note that as `length(FS, f) ` is defined as `0` if `isDir(FS, f)`, the result
 of `getFileBlockLocations()` on a directory is `[]`
@@ -707,7 +707,7 @@ This is a significant difference between the behavior of object stores
 and that of filesystems, as it allows &gt;1 client to create a file with `overwrite=false`,
 and potentially confuse file/directory logic. In particular, using `create()` to acquire
 an exclusive lock on a file (whoever creates the file without an error is considered
-the holder of the lock) may not not a safe algorithm to use when working with object stores.
+the holder of the lock) may not be a safe algorithm to use when working with object stores.
 * Object stores may create an empty file as a marker when a file is created.
 However, object stores with `overwrite=true` semantics may not implement this atomically,

@@ -167,7 +167,7 @@ rather than just any FS-specific subclass implemented by the implementation
 custom subclasses.
 This is critical to ensure safe use of the feature: directory listing/
-status serialization/deserialization can result result in the `withFileStatus()`
+status serialization/deserialization can result in the `withFileStatus()`
 argument not being the custom subclass returned by the Filesystem instance's
 own `getFileStatus()`, `listFiles()`, `listLocatedStatus()` calls, etc.
@@ -686,4 +686,4 @@ public T load(FileSystem fs,
 *Note:* : in Hadoop 3.3.2 and earlier, the `withFileStatus(status)` call
 required a non-null parameter; this has since been relaxed.
 For maximum compatibility across versions, only invoke the method
 when the file status is known to be non-null.

@@ -228,7 +228,7 @@ Accordingly: *Use if and only if you are confident that the conditions are met.*
 ### `fs.s3a.create.header` User-supplied header support
-Options with the prefix `fs.s3a.create.header.` will be added to to the
+Options with the prefix `fs.s3a.create.header.` will be added to the
 S3 object metadata as "user defined metadata".
 This metadata is visible to all applications. It can also be retrieved through the
 FileSystem/FileContext `listXAttrs()` and `getXAttrs()` API calls with the prefix `header.`
@@ -236,4 +236,4 @@ FileSystem/FileContext `listXAttrs()` and `getXAttrs()` API calls with the prefi
 When an object is renamed, the metadata is propagated the copy created.
 It is possible to probe an S3A Filesystem instance for this capability through
 the `hasPathCapability(path, "fs.s3a.create.header")` check.

@@ -980,7 +980,7 @@ throw `UnsupportedOperationException`.
 ### `StreamCapabilities`
 Implementors of filesystem clients SHOULD implement the `StreamCapabilities`
-interface and its `hasCapabilities()` method to to declare whether or not
+interface and its `hasCapabilities()` method to declare whether or not
 an output streams offer the visibility and durability guarantees of `Syncable`.
 Implementors of `StreamCapabilities.hasCapabilities()` MUST NOT declare that
@@ -1013,4 +1013,4 @@ all data to the datanodes.
 1. `close()` SHALL return once the guarantees of `hflush()` are met: the data is
 visible to others.
 1. For durability guarantees, `hsync()` MUST be called first.

@@ -143,7 +143,7 @@ too must have this context defined.
 ### Identifying the system accounts `hadoop.registry.system.acls`
-These are the the accounts which are given full access to the base of the
+These are the accounts which are given full access to the base of the
 registry. The Resource Manager needs this option to create the root paths.
 Client applications writing to the registry access to the nodes it creates.

@@ -29,7 +29,7 @@ a secure registry:
 1. Allow the RM to create per-user regions of the registration space
 1. Allow applications belonging to a user to write registry entries
 into their part of the space. These may be short-lived or long-lived
-YARN applications, or they may be be static applications.
+YARN applications, or they may be static applications.
 1. Prevent other users from writing into another user's part of the registry.
 1. Allow system services to register to a `/services` section of the registry.
 1. Provide read access to clients of a registry.

@@ -124,7 +124,7 @@ Please make sure you write code that is portable.
 * Don't write code that could force a non-aligned word access.
 * This causes performance issues on most architectures and isn't supported at all on some.
 * Generally the compiler will prevent this unless you are doing clever things with pointers e.g. abusing placement new or reinterpreting a pointer into a pointer to a wider type.
-* If a type needs to be a a specific width make sure to specify it.
+* If a type needs to be a specific width make sure to specify it.
 * `int32_t my_32_bit_wide_int`
 * Avoid using compiler dependent pragmas or attributes.
 * If there is a justified and unavoidable reason for using these you must document why. See examples below.

@@ -30,7 +30,7 @@ Hadoop has an option parsing framework that employs parsing generic options as w
 |:---- |:---- |
 | SHELL\_OPTIONS | The common set of shell options. These are documented on the [Commands Manual](../../hadoop-project-dist/hadoop-common/CommandsManual.html#Shell_Options) page. |
 | GENERIC\_OPTIONS | The common set of options supported by multiple commands. See the Hadoop [Commands Manual](../../hadoop-project-dist/hadoop-common/CommandsManual.html#Generic_Options) for more information. |
-| COMMAND COMMAND\_OPTIONS | Various commands with their options are described in the following sections. The commands have been grouped into [User Commands](#User_Commands) and [Administration Commands](#Administration_Commands). |
+| COMMAND\_OPTIONS | Various commands with their options are described in the following sections. The commands have been grouped into [User Commands](#User_Commands) and [Administration Commands](#Administration_Commands). |
 User Commands
 -------------

@@ -24,7 +24,7 @@ The NFS Gateway supports NFSv3 and allows HDFS to be mounted as part of the clie
 * Users can browse the HDFS file system through their local file system
 on NFSv3 client compatible operating systems.
-* Users can download files from the the HDFS file system on to their
+* Users can download files from the HDFS file system on to their
 local file system.
 * Users can upload files from their local file system directly to the
 HDFS file system.
@@ -92,7 +92,7 @@ The rest of the NFS gateway configurations are optional for both secure and non-
 the super-user can do anything in that permissions checks never fail for the super-user.
 If the following property is configured, the superuser on NFS client can access any file
 on HDFS. By default, the super user is not configured in the gateway.
-Note that, even the the superuser is configured, "nfs.exports.allowed.hosts" still takes effect.
+Note that, even the superuser is configured, "nfs.exports.allowed.hosts" still takes effect.
 For example, the superuser will not have write access to HDFS files through the gateway if
 the NFS client host is not allowed to have write access in "nfs.exports.allowed.hosts".
@@ -154,7 +154,7 @@ It's strongly recommended for the users to update a few configuration properties
 the super-user can do anything in that permissions checks never fail for the super-user.
 If the following property is configured, the superuser on NFS client can access any file
 on HDFS. By default, the super user is not configured in the gateway.
-Note that, even the the superuser is configured, "nfs.exports.allowed.hosts" still takes effect.
+Note that, even the superuser is configured, "nfs.exports.allowed.hosts" still takes effect.
 For example, the superuser will not have write access to HDFS files through the gateway if
 the NFS client host is not allowed to have write access in "nfs.exports.allowed.hosts".

@@ -227,7 +227,7 @@ For command usage, see [namenode](./HDFSCommands.html#namenode).
 Balancer
 --------
-HDFS data might not always be be placed uniformly across the DataNode. One common reason is addition of new DataNodes to an existing cluster. While placing new blocks (data for a file is stored as a series of blocks), NameNode considers various parameters before choosing the DataNodes to receive these blocks. Some of the considerations are:
+HDFS data might not always be placed uniformly across the DataNode. One common reason is addition of new DataNodes to an existing cluster. While placing new blocks (data for a file is stored as a series of blocks), NameNode considers various parameters before choosing the DataNodes to receive these blocks. Some of the considerations are:
 * Policy to keep one of the replicas of a block on the same node as
 the node that is writing the block.

@@ -84,11 +84,11 @@ If users want some of their existing cluster (`hdfs://cluster`) data to mount wi
 Let's consider the following operations to understand where these operations will be delegated based on mount links.
-*Op1:* Create a file with the the path `hdfs://cluster/user/fileA`, then physically this file will be created at `hdfs://cluster/user/fileA`. This delegation happened based on the first configuration parameter in above configurations. Here `/user` mapped to `hdfs://cluster/user/`.
-*Op2:* Create a file the the path `hdfs://cluster/data/datafile`, then this file will be created at `o3fs://bucket1.volume1.omhost/data/datafile`. This delegation happened based on second configurations parameter in above configurations. Here `/data` was mapped with `o3fs://bucket1.volume1.omhost/data/`.
-*Op3:* Create a file with the the path `hdfs://cluster/backup/data.zip`, then physically this file will be created at `s3a://bucket1/backup/data.zip`. This delegation happened based on the third configuration parameter in above configurations. Here `/backup` was mapped to `s3a://bucket1/backup/`.
+*Op1:* Create a file with the path `hdfs://cluster/user/fileA`, then physically this file will be created at `hdfs://cluster/user/fileA`. This delegation happened based on the first configuration parameter in above configurations. Here `/user` mapped to `hdfs://cluster/user/`.
+*Op2:* Create a file the path `hdfs://cluster/data/datafile`, then this file will be created at `o3fs://bucket1.volume1.omhost/data/datafile`. This delegation happened based on second configurations parameter in above configurations. Here `/data` was mapped with `o3fs://bucket1.volume1.omhost/data/`.
+*Op3:* Create a file with the path `hdfs://cluster/backup/data.zip`, then physically this file will be created at `s3a://bucket1/backup/data.zip`. This delegation happened based on the third configuration parameter in above configurations. Here `/backup` was mapped to `s3a://bucket1/backup/`.
 **Example 2:**
@@ -114,11 +114,11 @@ If users want some of their existing cluster (`s3a://bucketA/`) data to mount wi
 ```
 Let's consider the following operations to understand to where these operations will be delegated based on mount links.
-*Op1:* Create a file with the the path `s3a://bucketA/user/fileA`, then this file will be created physically at `hdfs://cluster/user/fileA`. This delegation happened based on the first configuration parameter in above configurations. Here `/user` mapped to `hdfs://cluster/user`.
-*Op2:* Create a file the the path `s3a://bucketA/data/datafile`, then this file will be created at `o3fs://bucket1.volume1.omhost/data/datafile`. This delegation happened based on second configurations parameter in above configurations. Here `/data` was mapped with `o3fs://bucket1.volume1.omhost/data/`.
-*Op3:* Create a file with the the path `s3a://bucketA/salesDB/dbfile`, then physically this file will be created at `s3a://bucketA/salesDB/dbfile`. This delegation happened based on the third configuration parameter in above configurations. Here `/salesDB` was mapped to `s3a://bucket1/salesDB`.
+*Op1:* Create a file with the path `s3a://bucketA/user/fileA`, then this file will be created physically at `hdfs://cluster/user/fileA`. This delegation happened based on the first configuration parameter in above configurations. Here `/user` mapped to `hdfs://cluster/user`.
+*Op2:* Create a file the path `s3a://bucketA/data/datafile`, then this file will be created at `o3fs://bucket1.volume1.omhost/data/datafile`. This delegation happened based on second configurations parameter in above configurations. Here `/data` was mapped with `o3fs://bucket1.volume1.omhost/data/`.
+*Op3:* Create a file with the path `s3a://bucketA/salesDB/dbfile`, then physically this file will be created at `s3a://bucketA/salesDB/dbfile`. This delegation happened based on the third configuration parameter in above configurations. Here `/salesDB` was mapped to `s3a://bucket1/salesDB`.
 Note: In above examples we used create operation only, but the same mechanism applies to any other file system APIs here.

@@ -41,7 +41,7 @@ We cannot ensure complete binary compatibility with the applications that use **
 Not Supported
 -------------
-MRAdmin has been removed in MRv2 because because `mradmin` commands no longer exist. They have been replaced by the commands in `rmadmin`. We neither support binary compatibility nor source compatibility for the applications that use this class directly.
+MRAdmin has been removed in MRv2 because `mradmin` commands no longer exist. They have been replaced by the commands in `rmadmin`. We neither support binary compatibility nor source compatibility for the applications that use this class directly.
 Tradeoffs between MRv1 Users and Early MRv2 Adopters
 ----------------------------------------------------

@@ -30,7 +30,7 @@ Hadoop has an option parsing framework that employs parsing generic options as w
 |:---- |:---- |
 | SHELL\_OPTIONS | The common set of shell options. These are documented on the [Hadoop Commands Reference](../../hadoop-project-dist/hadoop-common/CommandsManual.html#Shell_Options) page. |
 | GENERIC\_OPTIONS | The common set of options supported by multiple commands. See the [Hadoop Commands Reference](../../hadoop-project-dist/hadoop-common/CommandsManual.html#Generic_Options) for more information. |
-| COMMAND COMMAND\_OPTIONS | Various commands with their options are described in the following sections. The commands have been grouped into [User Commands](#User_Commands) and [Administration Commands](#Administration_Commands). |
+| COMMAND\_OPTIONS | Various commands with their options are described in the following sections. The commands have been grouped into [User Commands](#User_Commands) and [Administration Commands](#Administration_Commands). |
 User Commands
 -------------

@@ -113,7 +113,7 @@ No renaming takes place —the files are left in their original location.
 The directory treewalk is single-threaded, then it is `O(directories)`,
 with each directory listing using one or more paged LIST calls.
-This is simple, and for most tasks, the scan is off the critical path of of the job.
+This is simple, and for most tasks, the scan is off the critical path of the job.
 Statistics analysis may justify moving to parallel scans in future.
@@ -332,4 +332,4 @@ Any store/FS which supports auditing is able to collect this data
 and include in their logs.
 To ease backporting, all audit integration is in the single class
 `org.apache.hadoop.mapreduce.lib.output.committer.manifest.impl.AuditingIntegration`.

@@ -213,7 +213,7 @@ the same correctness guarantees as the v1 algorithm.
 attempt working directory to their final destination path, holding back on
 the final manifestation POST.
 1. A JSON file containing all information needed to complete the upload of all
-files in the task attempt is written to the Job Attempt directory of of the
+files in the task attempt is written to the Job Attempt directory of the
 wrapped committer working with HDFS.
 1. Job commit: load in all the manifest files in the HDFS job attempt directory,
 then issued the POST request to complete the uploads. These are parallelised.

@@ -76,7 +76,7 @@ The tool works by performing the following procedure:
 - its aggregation status has successfully completed
 - has at least ``-minNumberLogFiles`` log files
 - the sum of its log files size is less than ``-maxTotalLogsSize`` megabytes
-2. If there are are more than ``-maxEligibleApps`` applications found, the
+2. If there are more than ``-maxEligibleApps`` applications found, the
 newest applications are dropped. They can be processed next time.
 3. A shell script is generated based on the eligible applications
 4. The Distributed Shell program is run with the aformentioned script. It

@@ -411,7 +411,7 @@ log4j.logger.org.apache.hadoop.fs.s3a.audit.impl.LoggingAuditor=DEBUG
 This adds one log line per request -and does provide some insight into
 communications between the S3A client and AWS S3.
-For low-level debugging of the Auditing system, such as when when spans are
+For low-level debugging of the Auditing system, such as when spans are
 entered and exited, set the log to `TRACE`:
 ```

@@ -63,7 +63,7 @@ entries, the duration of the lock is low.
 In contrast to a "real" filesystem, Amazon's S3A object store, similar to
 most others, does not support `rename()` at all. A hash operation on the filename
-determines the location of of the data —there is no separate metadata to change.
+determines the location of the data —there is no separate metadata to change.
 To mimic renaming, the Hadoop S3A client has to copy the data to a new object
 with the destination filename, then delete the original entry. This copy
 can be executed server-side, but as it does not complete until the in-cluster
@@ -79,7 +79,7 @@ The solution to this problem is closely coupled to the S3 protocol itself:
 delayed completion of multi-part PUT operations
 That is: tasks write all data as multipart uploads, *but delay the final
-commit action until until the final, single job commit action.* Only that
+commit action until the final, single job commit action.* Only that
 data committed in the job commit action will be made visible; work from speculative
 and failed tasks will not be instantiated. As there is no rename, there is no
 delay while data is copied from a temporary directory to the final directory.
@@ -307,7 +307,7 @@ def isCommitJobRepeatable() :
 Accordingly, it is a failure point in the protocol. With a low number of files
 and fast rename/list algorithms, the window of vulnerability is low. At
 scale, the vulnerability increases. It could actually be reduced through
-parallel execution of the renaming of of committed tasks.
+parallel execution of the renaming of committed tasks.
 ### Job Abort

@ -89,7 +89,7 @@ of:
* A set of AWS session credentials * A set of AWS session credentials
(`fs.s3a.access.key`, `fs.s3a.secret.key`, `fs.s3a.session.token`). (`fs.s3a.access.key`, `fs.s3a.secret.key`, `fs.s3a.session.token`).
These credentials are obtained from the AWS Secure Token Service (STS) when the the token is issued. These credentials are obtained from the AWS Secure Token Service (STS) when the token is issued.
* A set of AWS session credentials binding the user to a specific AWS IAM Role, * A set of AWS session credentials binding the user to a specific AWS IAM Role,
further restricted to only access the S3 bucket. further restricted to only access the S3 bucket.
Again, these credentials are requested when the token is issued. Again, these credentials are requested when the token is issued.

@ -218,7 +218,7 @@ This can have adverse effects on those large directories, again.
In the Presto [S3 connector](https://prestodb.io/docs/current/connector/hive.html#amazon-s3-configuration), In the Presto [S3 connector](https://prestodb.io/docs/current/connector/hive.html#amazon-s3-configuration),
`mkdirs()` is a no-op. `mkdirs()` is a no-op.
Whenever it lists any path which isn't an object or a prefix of one more more objects, it returns an Whenever it lists any path which isn't an object or a prefix of one more objects, it returns an
empty listing. That is:; by default, every path is an empty directory. empty listing. That is:; by default, every path is an empty directory.
Provided no code probes for a directory existing and fails if it is there, this Provided no code probes for a directory existing and fails if it is there, this
@ -524,7 +524,7 @@ Ignoring 3 markers in authoritative paths
``` ```
All of this S3A bucket _other_ than the authoritative path `/tables` will be safe for All of this S3A bucket _other_ than the authoritative path `/tables` will be safe for
incompatible Hadoop releases to to use. incompatible Hadoop releases to use.
### <a name="marker-tool-clean"></a>`markers clean` ### <a name="marker-tool-clean"></a>`markers clean`

@ -505,7 +505,7 @@ providers listed after it will be ignored.
### <a name="auth_simple"></a> Simple name/secret credentials with `SimpleAWSCredentialsProvider`* ### <a name="auth_simple"></a> Simple name/secret credentials with `SimpleAWSCredentialsProvider`*
This is is the standard credential provider, which supports the secret This is the standard credential provider, which supports the secret
key in `fs.s3a.access.key` and token in `fs.s3a.secret.key` key in `fs.s3a.access.key` and token in `fs.s3a.secret.key`
values. values.
@ -1392,7 +1392,7 @@ an S3 implementation that doesn't return eTags.
When `true` (default) and 'Get Object' doesn't return eTag or When `true` (default) and 'Get Object' doesn't return eTag or
version ID (depending on configured 'source'), a `NoVersionAttributeException` version ID (depending on configured 'source'), a `NoVersionAttributeException`
will be thrown. When `false` and and eTag or version ID is not returned, will be thrown. When `false` and eTag or version ID is not returned,
the stream can be read, but without any version checking. the stream can be read, but without any version checking.
@ -1868,7 +1868,7 @@ in byte arrays in the JVM's heap prior to upload.
This *may* be faster than buffering to disk. This *may* be faster than buffering to disk.
The amount of data which can be buffered is limited by the available The amount of data which can be buffered is limited by the available
size of the JVM heap heap. The slower the write bandwidth to S3, the greater size of the JVM heap. The slower the write bandwidth to S3, the greater
the risk of heap overflows. This risk can be mitigated by the risk of heap overflows. This risk can be mitigated by
[tuning the upload settings](#upload_thread_tuning). [tuning the upload settings](#upload_thread_tuning).

@ -122,7 +122,7 @@ Optimised for random IO, specifically the Hadoop `PositionedReadable`
operations —though `seek(offset); read(byte_buffer)` also benefits. operations —though `seek(offset); read(byte_buffer)` also benefits.
Rather than ask for the whole file, the range of the HTTP request is Rather than ask for the whole file, the range of the HTTP request is
set to that that of the length of data desired in the `read` operation set to the length of data desired in the `read` operation
(Rounded up to the readahead value set in `setReadahead()` if necessary). (Rounded up to the readahead value set in `setReadahead()` if necessary).
By reducing the cost of closing existing HTTP requests, this is By reducing the cost of closing existing HTTP requests, this is
@ -172,7 +172,7 @@ sequential to `random`.
This policy essentially recognizes the initial read pattern of columnar This policy essentially recognizes the initial read pattern of columnar
storage formats (e.g. Apache ORC and Apache Parquet), which seek to the end storage formats (e.g. Apache ORC and Apache Parquet), which seek to the end
of a file, read in index data and then seek backwards to selectively read of a file, read in index data and then seek backwards to selectively read
columns. The first seeks may be be expensive compared to the random policy, columns. The first seeks may be expensive compared to the random policy,
however the overall process is much less expensive than either sequentially however the overall process is much less expensive than either sequentially
reading through a file with the `random` policy, or reading columnar data reading through a file with the `random` policy, or reading columnar data
with the `sequential` policy. with the `sequential` policy.
@ -384,7 +384,7 @@ data loss.
Amazon S3 uses a set of front-end servers to provide access to the underlying data. Amazon S3 uses a set of front-end servers to provide access to the underlying data.
The choice of which front-end server to use is handled via load-balancing DNS The choice of which front-end server to use is handled via load-balancing DNS
service: when the IP address of an S3 bucket is looked up, the choice of which service: when the IP address of an S3 bucket is looked up, the choice of which
IP address to return to the client is made based on the the current load IP address to return to the client is made based on the current load
of the front-end servers. of the front-end servers.
Over time, the load across the front-end changes, so those servers considered Over time, the load across the front-end changes, so those servers considered
@ -694,4 +694,4 @@ connectors for other buckets, would end up blocking too.
Consider experimenting with this when running applications Consider experimenting with this when running applications
where many threads may try to simultaneously interact where many threads may try to simultaneously interact
with the same slow-to-initialize object stores. with the same slow-to-initialize object stores.

@ -738,7 +738,7 @@ at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecor
``` ```
The underlying problem is that the gzip decompressor is automatically enabled The underlying problem is that the gzip decompressor is automatically enabled
when the the source file ends with the ".gz" extension. Because S3 Select when the source file ends with the ".gz" extension. Because S3 Select
returns decompressed data, the codec fails. returns decompressed data, the codec fails.
The workaround here is to declare that the job should add the "Passthrough Codec" The workaround here is to declare that the job should add the "Passthrough Codec"

@ -29,7 +29,7 @@ the S3A connector**
- - - - - -
## <a name="migrating"></a> How to migrate to to the S3A client ## <a name="migrating"></a> How to migrate to the S3A client
1. Keep the `hadoop-aws` JAR on your classpath. 1. Keep the `hadoop-aws` JAR on your classpath.

@ -324,7 +324,7 @@ There's two main causes
If you see this and you are trying to use the S3A connector with Spark, then the cause can If you see this and you are trying to use the S3A connector with Spark, then the cause can
be that the isolated classloader used to load Hive classes is interfering with the S3A be that the isolated classloader used to load Hive classes is interfering with the S3A
connector's dynamic loading of `com.amazonaws` classes. To fix this, declare that that connector's dynamic loading of `com.amazonaws` classes. To fix this, declare that
the classes in the aws SDK are loaded from the same classloader which instantiated the classes in the aws SDK are loaded from the same classloader which instantiated
the S3A FileSystem instance: the S3A FileSystem instance:

@ -435,7 +435,7 @@ The service is expected to return a response in JSON format:
### Delegation token support in WASB ### Delegation token support in WASB
Delegation token support support can be enabled in WASB using the following configuration: Delegation token support can be enabled in WASB using the following configuration:
```xml ```xml
<property> <property>
@ -507,7 +507,7 @@ The cache is maintained at a filesystem object level.
</property> </property>
``` ```
The maximum number of entries that that cache can hold can be customized using the following setting: The maximum number of entries that the cache can hold can be customized using the following setting:
``` ```
<property> <property>
<name>fs.azure.authorization.caching.maxentries</name> <name>fs.azure.authorization.caching.maxentries</name>

@ -618,7 +618,7 @@ Below are the pre-requiste steps to follow:
``` ```
XInclude is supported, so for extra security secrets may be XInclude is supported, so for extra security secrets may be
kept out of the source tree then referenced through an an XInclude element: kept out of the source tree then referenced through an XInclude element:
<include xmlns="http://www.w3.org/2001/XInclude" <include xmlns="http://www.w3.org/2001/XInclude"
href="/users/self/.secrets/auth-keys.xml" /> href="/users/self/.secrets/auth-keys.xml" />

@ -150,7 +150,7 @@ yarn.scheduler.capacity.root.marketing.accessible-node-labels.GPU.capacity=50
yarn.scheduler.capacity.root.engineering.default-node-label-expression=GPU yarn.scheduler.capacity.root.engineering.default-node-label-expression=GPU
``` ```
You can see root.engineering/marketing/sales.capacity=33, so each of them can has guaranteed resource equals to 1/3 of resource **without partition**. So each of them can use 1/3 resource of h1..h4, which is 24 * 4 * (1/3) = (32G mem, 32 v-cores). You can see root.engineering/marketing/sales.capacity=33, so each of them has guaranteed resource equals to 1/3 of resource **without partition**. So each of them can use 1/3 resource of h1..h4, which is 24 * 4 * (1/3) = (32G mem, 32 v-cores).
And only engineering/marketing queue has permission to access GPU partition (see root.`<queue-name>`.accessible-node-labels). And only engineering/marketing queue has permission to access GPU partition (see root.`<queue-name>`.accessible-node-labels).

@ -89,13 +89,13 @@ The Timeline Domain offers a namespace for Timeline server allowing
users to host multiple entities, isolating them from other users and applications. users to host multiple entities, isolating them from other users and applications.
Timeline server Security is defined at this level. Timeline server Security is defined at this level.
A "Domain" primarily stores owner info, read and& write ACL information, A "Domain" primarily stores owner info, read and write ACL information,
created and modified time stamp information. Each Domain is identified by an ID which created and modified time stamp information. Each Domain is identified by an ID which
must be unique across all users in the YARN cluster. must be unique across all users in the YARN cluster.
#### Timeline Entity #### Timeline Entity
A Timeline Entity contains the the meta information of a conceptual entity A Timeline Entity contains the meta information of a conceptual entity
and its related events. and its related events.
The entity can be an application, an application attempt, a container or The entity can be an application, an application attempt, a container or
@ -199,7 +199,7 @@ to `kerberos`, after which the following configuration options are available:
| `yarn.timeline-service.http-authentication.type` | Defines authentication used for the timeline server HTTP endpoint. Supported values are: `simple` / `kerberos` / #AUTHENTICATION_HANDLER_CLASSNAME#. Defaults to `simple`. | | `yarn.timeline-service.http-authentication.type` | Defines authentication used for the timeline server HTTP endpoint. Supported values are: `simple` / `kerberos` / #AUTHENTICATION_HANDLER_CLASSNAME#. Defaults to `simple`. |
| `yarn.timeline-service.http-authentication.simple.anonymous.allowed` | Indicates if anonymous requests are allowed by the timeline server when using 'simple' authentication. Defaults to `true`. | | `yarn.timeline-service.http-authentication.simple.anonymous.allowed` | Indicates if anonymous requests are allowed by the timeline server when using 'simple' authentication. Defaults to `true`. |
| `yarn.timeline-service.principal` | The Kerberos principal for the timeline server. | | `yarn.timeline-service.principal` | The Kerberos principal for the timeline server. |
| `yarn.timeline-service.keytab` | The Kerberos keytab for the timeline server. Defaults on Unix to to `/etc/krb5.keytab`. | | `yarn.timeline-service.keytab` | The Kerberos keytab for the timeline server. Defaults on Unix to `/etc/krb5.keytab`. |
| `yarn.timeline-service.delegation.key.update-interval` | Defaults to `86400000` (1 day). | | `yarn.timeline-service.delegation.key.update-interval` | Defaults to `86400000` (1 day). |
| `yarn.timeline-service.delegation.token.renew-interval` | Defaults to `86400000` (1 day). | | `yarn.timeline-service.delegation.token.renew-interval` | Defaults to `86400000` (1 day). |
| `yarn.timeline-service.delegation.token.max-lifetime` | Defaults to `604800000` (7 days). | | `yarn.timeline-service.delegation.token.max-lifetime` | Defaults to `604800000` (7 days). |

@ -865,7 +865,7 @@ none of the apps match the predicates, an empty list will be returned.
1. `conffilters` - If specified, matched applications must have exact matches to the given config name and must be either equal or not equal 1. `conffilters` - If specified, matched applications must have exact matches to the given config name and must be either equal or not equal
to the given config value. Both the config name and value must be strings. conffilters are represented in the same form as infofilters. to the given config value. Both the config name and value must be strings. conffilters are represented in the same form as infofilters.
1. `metricfilters` - If specified, matched applications must have exact matches to the given metric and satisfy the specified relation with the 1. `metricfilters` - If specified, matched applications must have exact matches to the given metric and satisfy the specified relation with the
metric value. Metric id must be a string and and metric value must be an integral value. metricfilters are represented as an expression of the form :<br/> metric value. Metric id must be a string and metric value must be an integral value. metricfilters are represented as an expression of the form :<br/>
"(&lt;metricid&gt; &lt;compareop&gt; &lt;metricvalue&gt;) &lt;op&gt; (&lt;metricid&gt; &lt;compareop&gt; &lt;metricvalue&gt;)".<br/> "(&lt;metricid&gt; &lt;compareop&gt; &lt;metricvalue&gt;) &lt;op&gt; (&lt;metricid&gt; &lt;compareop&gt; &lt;metricvalue&gt;)".<br/>
Here op can be either of AND or OR. And compareop can be either of "eq", "ne", "ene", "gt", "ge", "lt" and "le".<br/> Here op can be either of AND or OR. And compareop can be either of "eq", "ne", "ene", "gt", "ge", "lt" and "le".<br/>
"eq" means equals, "ne" means not equals and existence of metric is not required for a match, "ene" means not equals but existence of metric is "eq" means equals, "ne" means not equals and existence of metric is not required for a match, "ene" means not equals but existence of metric is
@ -998,7 +998,7 @@ match the predicates, an empty list will be returned.
1. `conffilters` - If specified, matched applications must have exact matches to the given config name and must be either equal or not equal 1. `conffilters` - If specified, matched applications must have exact matches to the given config name and must be either equal or not equal
to the given config value. Both the config name and value must be strings. conffilters are represented in the same form as infofilters. to the given config value. Both the config name and value must be strings. conffilters are represented in the same form as infofilters.
1. `metricfilters` - If specified, matched applications must have exact matches to the given metric and satisfy the specified relation with the 1. `metricfilters` - If specified, matched applications must have exact matches to the given metric and satisfy the specified relation with the
metric value. Metric id must be a string and and metric value must be an integral value. metricfilters are represented as an expression of the form :<br/> metric value. Metric id must be a string and metric value must be an integral value. metricfilters are represented as an expression of the form :<br/>
"(&lt;metricid&gt; &lt;compareop&gt; &lt;metricvalue&gt;) &lt;op&gt; (&lt;metricid&gt; &lt;compareop&gt; &lt;metricvalue&gt;)".<br/> "(&lt;metricid&gt; &lt;compareop&gt; &lt;metricvalue&gt;) &lt;op&gt; (&lt;metricid&gt; &lt;compareop&gt; &lt;metricvalue&gt;)".<br/>
Here op can be either of AND or OR. And compareop can be either of "eq", "ne", "ene", "gt", "ge", "lt" and "le".<br/> Here op can be either of AND or OR. And compareop can be either of "eq", "ne", "ene", "gt", "ge", "lt" and "le".<br/>
"eq" means equals, "ne" means not equals and existence of metric is not required for a match, "ene" means not equals but existence of metric is "eq" means equals, "ne" means not equals and existence of metric is not required for a match, "ene" means not equals but existence of metric is
@ -1205,7 +1205,7 @@ If none of the entities match the predicates, an empty list will be returned.
1. `conffilters` - If specified, matched entities must have exact matches to the given config name and must be either equal or not equal 1. `conffilters` - If specified, matched entities must have exact matches to the given config name and must be either equal or not equal
to the given config value. Both the config name and value must be strings. conffilters are represented in the same form as infofilters. to the given config value. Both the config name and value must be strings. conffilters are represented in the same form as infofilters.
1. `metricfilters` - If specified, matched entities must have exact matches to the given metric and satisfy the specified relation with the 1. `metricfilters` - If specified, matched entities must have exact matches to the given metric and satisfy the specified relation with the
metric value. Metric id must be a string and and metric value must be an integral value. metricfilters are represented as an expression of the form :<br/> metric value. Metric id must be a string and metric value must be an integral value. metricfilters are represented as an expression of the form :<br/>
"(&lt;metricid&gt; &lt;compareop&gt; &lt;metricvalue&gt;) &lt;op&gt; (&lt;metricid&gt; &lt;compareop&gt; &lt;metricvalue&gt;)"<br/> "(&lt;metricid&gt; &lt;compareop&gt; &lt;metricvalue&gt;) &lt;op&gt; (&lt;metricid&gt; &lt;compareop&gt; &lt;metricvalue&gt;)"<br/>
Here op can be either of AND or OR. And compareop can be either of "eq", "ne", "ene", "gt", "ge", "lt" and "le".<br/> Here op can be either of AND or OR. And compareop can be either of "eq", "ne", "ene", "gt", "ge", "lt" and "le".<br/>
"eq" means equals, "ne" means not equals and existence of metric is not required for a match, "ene" means not equals but existence of metric is "eq" means equals, "ne" means not equals and existence of metric is not required for a match, "ene" means not equals but existence of metric is
@ -1341,7 +1341,7 @@ If none of the entities match the predicates, an empty list will be returned.
1. `conffilters` - If specified, matched entities must have exact matches to the given config name and must be either equal or not equal 1. `conffilters` - If specified, matched entities must have exact matches to the given config name and must be either equal or not equal
to the given config value. Both the config name and value must be strings. conffilters are represented in the same form as infofilters. to the given config value. Both the config name and value must be strings. conffilters are represented in the same form as infofilters.
1. `metricfilters` - If specified, matched entities must have exact matches to the given metric and satisfy the specified relation with the 1. `metricfilters` - If specified, matched entities must have exact matches to the given metric and satisfy the specified relation with the
metric value. Metric id must be a string and and metric value must be an integral value. metricfilters are represented as an expression of the form :<br/> metric value. Metric id must be a string and metric value must be an integral value. metricfilters are represented as an expression of the form :<br/>
"(&lt;metricid&gt; &lt;compareop&gt; &lt;metricvalue&gt;) &lt;op&gt; (&lt;metricid&gt; &lt;compareop&gt; &lt;metricvalue&gt;)"<br/> "(&lt;metricid&gt; &lt;compareop&gt; &lt;metricvalue&gt;) &lt;op&gt; (&lt;metricid&gt; &lt;compareop&gt; &lt;metricvalue&gt;)"<br/>
Here op can be either of AND or OR. And compareop can be either of "eq", "ne", "ene", "gt", "ge", "lt" and "le".<br/> Here op can be either of AND or OR. And compareop can be either of "eq", "ne", "ene", "gt", "ge", "lt" and "le".<br/>
"eq" means equals, "ne" means not equals and existence of metric is not required for a match, "ene" means not equals but existence of metric is "eq" means equals, "ne" means not equals and existence of metric is not required for a match, "ene" means not equals but existence of metric is

@ -30,7 +30,7 @@ YARN has an option parsing framework that employs parsing generic options as wel
|:---- |:---- | |:---- |:---- |
| SHELL\_OPTIONS | The common set of shell options. These are documented on the [Commands Manual](../../hadoop-project-dist/hadoop-common/CommandsManual.html#Shell_Options) page. | | SHELL\_OPTIONS | The common set of shell options. These are documented on the [Commands Manual](../../hadoop-project-dist/hadoop-common/CommandsManual.html#Shell_Options) page. |
| GENERIC\_OPTIONS | The common set of options supported by multiple commands. See the Hadoop [Commands Manual](../../hadoop-project-dist/hadoop-common/CommandsManual.html#Generic_Options) for more information. | | GENERIC\_OPTIONS | The common set of options supported by multiple commands. See the Hadoop [Commands Manual](../../hadoop-project-dist/hadoop-common/CommandsManual.html#Generic_Options) for more information. |
| COMMAND COMMAND\_OPTIONS | Various commands with their options are described in the following sections. The commands have been grouped into [User Commands](#User_Commands) and [Administration Commands](#Administration_Commands). | | COMMAND\_OPTIONS | Various commands with their options are described in the following sections. The commands have been grouped into [User Commands](#User_Commands) and [Administration Commands](#Administration_Commands). |
User Commands User Commands
------------- -------------

@ -535,7 +535,7 @@ PUT URL - http://localhost:8088/app/v1/services/hello-world
#### PUT Request JSON #### PUT Request JSON
Note, irrespective of what the current lifetime value is, this update request will set the lifetime of the service to be 3600 seconds (1 hour) from the time the request is submitted. Hence, if a a service has remaining lifetime of 5 mins (say) and would like to extend it to an hour OR if an application has remaining lifetime of 5 hours (say) and would like to reduce it down to an hour, then for both scenarios you need to submit the same request below. Note, irrespective of what the current lifetime value is, this update request will set the lifetime of the service to be 3600 seconds (1 hour) from the time the request is submitted. Hence, if a service has remaining lifetime of 5 mins (say) and would like to extend it to an hour OR if an application has remaining lifetime of 5 hours (say) and would like to reduce it down to an hour, then for both scenarios you need to submit the same request below.
```json ```json
{ {