YARN-11444. Improve YARN md documentation format. (#6711) Contributed by Shilun Fan.

Reviewed-by: Ayush Saxena <ayushsaxena@apache.org>
Signed-off-by: Shilun Fan <slfan1989@apache.org>
slfan1989 2024-04-07 20:50:46 +08:00 committed by GitHub
parent 73e6931ed0
commit 8c378d1ea1
14 changed files with 22 additions and 22 deletions

View File

@ -633,7 +633,7 @@ The following configuration parameters can be configured in yarn-site.xml to con
| `yarn.resourcemanager.reservation-system.planfollower.time-step` | *Optional* parameter: the frequency in milliseconds of the `PlanFollower` timer. Long value expected. The default value is *1000*. | | `yarn.resourcemanager.reservation-system.planfollower.time-step` | *Optional* parameter: the frequency in milliseconds of the `PlanFollower` timer. Long value expected. The default value is *1000*. |
The `ReservationSystem` is integrated with the `CapacityScheduler` queue hierachy and can be configured for any **LeafQueue** currently. The `CapacityScheduler` supports the following parameters to tune the `ReservationSystem`: The `ReservationSystem` is integrated with the `CapacityScheduler` queue hierarchy and can be configured for any **LeafQueue** currently. The `CapacityScheduler` supports the following parameters to tune the `ReservationSystem`:
| Property | Description | | Property | Description |
|:---- |:---- | |:---- |:---- |
@ -879,7 +879,7 @@ Changing queue/scheduler properties and adding/removing queues can be done in tw
Remove the queue configurations from the file and run refresh as described above Remove the queue configurations from the file and run refresh as described above
### Enabling periodic configuration refresh ### Enabling periodic configuration refresh
Enabling queue configuration periodic refresh allows reloading and applying the configuration by editing the *conf/capacity-scheduler.xml* without the necessicity of calling yarn rmadmin -refreshQueues. Enabling queue configuration periodic refresh allows reloading and applying the configuration by editing the *conf/capacity-scheduler.xml* without the necessity of calling yarn rmadmin -refreshQueues.
| Property | Description | | Property | Description |
|:---- |:---- | |:---- |:---- |

View File

@ -173,5 +173,5 @@ class and want to give it a try in your Hadoop cluster.
Firstly, put the jar file under a directory in Hadooop classpath. Firstly, put the jar file under a directory in Hadooop classpath.
(recommend $HADOOP_COMMOND_HOME/share/hadoop/yarn). Secondly, (recommend $HADOOP_COMMAND_HOME/share/hadoop/yarn). Secondly,
follow the configurations described in [Pluggable Device Framework](./PluggableDeviceFramework.html) and restart YARN. follow the configurations described in [Pluggable Device Framework](./PluggableDeviceFramework.html) and restart YARN.

View File

@ -216,7 +216,7 @@ The following properties should be set in yarn-site.xml:
Optional. This configuration setting determines the capabilities Optional. This configuration setting determines the capabilities
assigned to docker containers when they are launched. While these may not assigned to docker containers when they are launched. While these may not
be case-sensitive from a docker perspective, it is best to keep these be case-sensitive from a docker perspective, it is best to keep these
uppercase. To run without any capabilites, set this value to uppercase. To run without any capabilities, set this value to
"none" or "NONE" "none" or "NONE"
</description> </description>
</property> </property>
@ -568,7 +568,7 @@ There are several challenges with this bind mount approach that need to be
considered. considered.
1. Any users and groups defined in the image will be overwritten by the host's users and groups 1. Any users and groups defined in the image will be overwritten by the host's users and groups
2. No users and groups can be added once the container is started, as /etc/passwd and /etc/group are immutible in the container. Do not mount these read-write as it can render the host inoperable. 2. No users and groups can be added once the container is started, as /etc/passwd and /etc/group are immutable in the container. Do not mount these read-write as it can render the host inoperable.
This approach is not recommended beyond testing given the inflexibility to This approach is not recommended beyond testing given the inflexibility to
modify running containers. modify running containers.
@ -715,7 +715,7 @@ Fine grained access control can also be defined using `docker.privileged-contain
docker.trusted.registries=library docker.trusted.registries=library
``` ```
In development environment, local images can be tagged with a repository name prefix to enable trust. The recommendation of choosing a repository name is using a local hostname and port number to prevent accidentially pulling docker images from Docker Hub or use reserved Docker Hub keyword: "local". Docker run will look for docker images on Docker Hub, if the image does not exist locally. Using a local hostname and port in image name can prevent accidental pulling of canonical images from docker hub. Example of tagging image with localhost:5000 as trusted registry: In development environment, local images can be tagged with a repository name prefix to enable trust. The recommendation of choosing a repository name is using a local hostname and port number to prevent accidentally pulling docker images from Docker Hub or use reserved Docker Hub keyword: "local". Docker run will look for docker images on Docker Hub, if the image does not exist locally. Using a local hostname and port in image name can prevent accidental pulling of canonical images from docker hub. Example of tagging image with localhost:5000 as trusted registry:
``` ```
docker tag centos:latest localhost:5000/centos:latest docker tag centos:latest localhost:5000/centos:latest

View File

@ -41,7 +41,7 @@ Graceful Decommission of YARN Nodes is the mechanism to decommission NMs while m
To do a normal decommissioning: To do a normal decommissioning:
1. Start a YARN cluster (with NodeManageres and ResourceManager) 1. Start a YARN cluster (with NodeManagers and ResourceManager)
2. Start a yarn job (for example with `yarn jar...` ) 2. Start a yarn job (for example with `yarn jar...` )
3. Add `yarn.resourcemanager.nodes.exclude-path` property to your `yarn-site.xml` (Note: you don't need to restart the ResourceManager) 3. Add `yarn.resourcemanager.nodes.exclude-path` property to your `yarn-site.xml` (Note: you don't need to restart the ResourceManager)
4. Create a text file (the location is defined in the previous step) with one line which contains the name of a selected NodeManager 4. Create a text file (the location is defined in the previous step) with one line which contains the name of a selected NodeManager
@ -112,7 +112,7 @@ host3
Note: In the future more file formats are planned with timeout support. Follow the [YARN-5536](https://issues.apache.org/jira/browse/YARN-5536) if you are interested. Note: In the future more file formats are planned with timeout support. Follow the [YARN-5536](https://issues.apache.org/jira/browse/YARN-5536) if you are interested.
Important to mention, that the timeout is not persited. In case of a RM restart/failover the node will be immediatelly decommission. (Follow the [YARN-5464](https://issues.apache.org/jira/browse/YARN-5464) for changes in this behavior). Important to mention, that the timeout is not persisted. In case of a RM restart/failover the node will be immediately decommission. (Follow the [YARN-5464](https://issues.apache.org/jira/browse/YARN-5464) for changes in this behavior).
### Client or server side timeout ### Client or server side timeout

View File

@ -106,7 +106,7 @@ Step 4. Configure a valid RPC address for the NodeManager.
Step 5. Auxiliary services. Step 5. Auxiliary services.
* NodeManagers in a YARN cluster can be configured to run auxiliary services. For a completely functional NM restart, YARN relies on any auxiliary service configured to also support recovery. This usually includes (1) avoiding usage of ephemeral ports so that previously running clients (in this case, usually containers) are not disrupted after restart and (2) having the auxiliary service itself support recoverability by reloading any previous state when NodeManager restarts and reinitializes the auxiliary service. * NodeManagers in a YARN cluster can be configured to run auxiliary services. For a completely functional NM restart, YARN relies on any auxiliary service configured to also support recovery. This usually includes (1) avoiding usage of ephemeral ports so that previously running clients (in this case, usually containers) are not disrupted after restart and (2) having the auxiliary service itself support recoverability by reloading any previous state when NodeManager restarts and reinitialized the auxiliary service.
* A simple example for the above is the auxiliary service 'ShuffleHandler' for MapReduce (MR). ShuffleHandler respects the above two requirements already, so users/admins don't have to do anything for it to support NM restart: (1) The configuration property **mapreduce.shuffle.port** controls which port the ShuffleHandler on a NodeManager host binds to, and it defaults to a non-ephemeral port. (2) The ShuffleHandler service also already supports recovery of previous state after NM restarts. * A simple example for the above is the auxiliary service 'ShuffleHandler' for MapReduce (MR). ShuffleHandler respects the above two requirements already, so users/admins don't have to do anything for it to support NM restart: (1) The configuration property **mapreduce.shuffle.port** controls which port the ShuffleHandler on a NodeManager host binds to, and it defaults to a non-ephemeral port. (2) The ShuffleHandler service also already supports recovery of previous state after NM restarts.

View File

@ -52,7 +52,7 @@ There is no reason to set them both. If the system runs with swap disabled, both
Virtual memory measurement and swapping Virtual memory measurement and swapping
-------------------------------------------- --------------------------------------------
There is a difference between the virtual memory reported by the container monitor and the virtual memory limit specified in the elastic memory control feature. The container monitor uses `ProcfsBasedProcessTree` by default for measurements that returns values from the `proc` file system. The virtual memory returned is the size of the address space of all the processes in each container. This includes anonymous pages, pages swapped out to disk, mapped files and reserved pages among others. Reserved pages are not backed by either physical or swapped memory. They can be a large part of the virtual memory usage. The reservabe address space was limited on 32 bit processors but it is very large on 64-bit ones making this metric less useful. Some Java Virtual Machines reserve large amounts of pages but they do not actually use it. This will result in gigabytes of virtual memory usage shown. However, this does not mean that anything is wrong with the container. There is a difference between the virtual memory reported by the container monitor and the virtual memory limit specified in the elastic memory control feature. The container monitor uses `ProcfsBasedProcessTree` by default for measurements that returns values from the `proc` file system. The virtual memory returned is the size of the address space of all the processes in each container. This includes anonymous pages, pages swapped out to disk, mapped files and reserved pages among others. Reserved pages are not backed by either physical or swapped memory. They can be a large part of the virtual memory usage. The reservable address space was limited on 32 bit processors but it is very large on 64-bit ones making this metric less useful. Some Java Virtual Machines reserve large amounts of pages but they do not actually use it. This will result in gigabytes of virtual memory usage shown. However, this does not mean that anything is wrong with the container.
Because of this you can now use `CGroupsResourceCalculator`. This shows only the sum of the physical memory usage and swapped pages as virtual memory usage excluding the reserved address space. This reflects much better what the application and the container allocated. Because of this you can now use `CGroupsResourceCalculator`. This shows only the sum of the physical memory usage and swapped pages as virtual memory usage excluding the reserved address space. This reflects much better what the application and the container allocated.

View File

@ -29,7 +29,7 @@ Some of the pain points for current device plugin development and integration
are listed below: are listed below:
* At least 6 classes to be implemented (If you wanna support * At least 6 classes to be implemented (If you want to support
Docker, you'll implement one more “DockerCommandPlugin”). Docker, you'll implement one more “DockerCommandPlugin”).
* When implementing the “ResourceHandler” interface, * When implementing the “ResourceHandler” interface,
the developer must understand the YARN NM internal concepts like container the developer must understand the YARN NM internal concepts like container

View File

@ -41,7 +41,7 @@ With reference to the figure above, a typical reservation proceeds as follows:
* **Step 2** The ReservationSystem leverages a ReservationAgent (GREE in the figure) to find a plausible allocation for the reservation in the Plan, a data structure tracking all reservation currently accepted and the available resources in the system. * **Step 2** The ReservationSystem leverages a ReservationAgent (GREE in the figure) to find a plausible allocation for the reservation in the Plan, a data structure tracking all reservation currently accepted and the available resources in the system.
* **Step 3** The SharingPolicy provides a way to enforce invariants on the reservation being accepted, potentially rejecting reservations. For example, the CapacityOvertimePolicy allows enforcement of both instantaneous max-capacity a user can request across all of his/her reservations and a limit on the integral of resources over a period of time, e.g., the user can reserve up to 50% of the cluster capacity instantanesouly, but in any 24h period of time he/she cannot exceed 10% average. * **Step 3** The SharingPolicy provides a way to enforce invariants on the reservation being accepted, potentially rejecting reservations. For example, the CapacityOvertimePolicy allows enforcement of both instantaneous max-capacity a user can request across all of his/her reservations and a limit on the integral of resources over a period of time, e.g., the user can reserve up to 50% of the cluster capacity instantaneously, but in any 24h period of time he/she cannot exceed 10% average.
* **Step 4** Upon a successful validation the ReservationSystem returns to the user a ReservationId (think of it as an airline ticket). * **Step 4** Upon a successful validation the ReservationSystem returns to the user a ReservationId (think of it as an airline ticket).
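
The two limits described in Step 3 can be illustrated with a small check. This is a minimal sketch, not the CapacityOvertimePolicy implementation: the function name, the hourly sampling, and the sliding-window treatment are all assumptions made for illustration. Only the 50% instantaneous cap and the 10% average over 24h come from the text above.

```python
def within_limits(hourly_share, max_instant=0.5, max_average=0.1, window_hours=24):
    """Return True if a user's reserved share respects both limits.

    hourly_share: fraction of cluster capacity reserved in each hour
    (hypothetical representation, one sample per hour).
    """
    # Instantaneous limit: no single sample may exceed the cap.
    if any(share > max_instant for share in hourly_share):
        return False

    # Integral limit: the sum over any window of `window_hours` samples
    # may not exceed max_average * window_hours.
    budget = max_average * window_hours
    running = 0.0
    for i, share in enumerate(hourly_share):
        running += share
        if i >= window_hours:
            running -= hourly_share[i - window_hours]
        if running > budget + 1e-9:  # small tolerance for float rounding
            return False
    return True


# Four hours at 50% followed by idle time stays within a 10% daily average.
print(within_limits([0.5] * 4 + [0.0] * 20))
# Five hours at 50% exceeds the 24h budget.
print(within_limits([0.5] * 5))
```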

View File

@ -130,7 +130,7 @@ resource may also have optional minimum and maximum properties. The properties
must be named `yarn.resource-types.<resource>.minimum-allocation` and must be named `yarn.resource-types.<resource>.minimum-allocation` and
`yarn.resource-types.<resource>.maximum-allocation`. `yarn.resource-types.<resource>.maximum-allocation`.
The `yarn.resource-types` property and any unit, mimimum, or maximum properties The `yarn.resource-types` property and any unit, minimum, or maximum properties
may be defined in either the usual `yarn-site.xml` file or in a file named may be defined in either the usual `yarn-site.xml` file or in a file named
`resource-types.xml`. For example, the following could appear in either file: `resource-types.xml`. For example, the following could appear in either file:

View File

@ -651,7 +651,7 @@ There are several challenges with this bind mount approach that need to be
considered. considered.
1. Any users and groups defined in the image will be overwritten by the host's users and groups 1. Any users and groups defined in the image will be overwritten by the host's users and groups
2. No users and groups can be added once the container is started, as /etc/passwd and /etc/group are immutible in the container. Do not mount these read-write as it can render the host inoperable. 2. No users and groups can be added once the container is started, as /etc/passwd and /etc/group are immutable in the container. Do not mount these read-write as it can render the host inoperable.
This approach is not recommended beyond testing given the inflexibility to This approach is not recommended beyond testing given the inflexibility to
modify running containers. modify running containers.

View File

@ -859,7 +859,7 @@ Below is the elements of a single event object. Note that `value` of
| Item | Data Type | Description| | Item | Data Type | Description|
|:---- |:---- |:---- | |:---- |:---- |:---- |
| `eventtype` | string | The event type | | `eventtype` | string | The event type |
| `eventinfo` | map | The information of the event, which is orgainzied in a map of `key` : `value` | | `eventinfo` | map | The information of the event, which is organized in a map of `key` : `value` |
| `timestamp` | long | The timestamp of the event | | `timestamp` | long | The timestamp of the event |
### Response Examples: ### Response Examples:
@ -1317,7 +1317,7 @@ None
| `queue` | string | The queue to which the application submitted | | `queue` | string | The queue to which the application submitted |
| `appState` | string | The application state according to the ResourceManager - valid values are members of the YarnApplicationState enum: `FINISHED`, `FAILED`, `KILLED` | | `appState` | string | The application state according to the ResourceManager - valid values are members of the YarnApplicationState enum: `FINISHED`, `FAILED`, `KILLED` |
| `finalStatus` | string | The final status of the application if finished - reported by the application itself - valid values are: `UNDEFINED`, `SUCCEEDED`, `FAILED`, `KILLED` | | `finalStatus` | string | The final status of the application if finished - reported by the application itself - valid values are: `UNDEFINED`, `SUCCEEDED`, `FAILED`, `KILLED` |
| `progress` | float | The reported progress of the application as a percent. Long-lived YARN services may not provide a meaninful value here —or use it as a metric of actual vs desired container counts | | `progress` | float | The reported progress of the application as a percent. Long-lived YARN services may not provide a meaningful value here —or use it as a metric of actual vs desired container counts |
| `trackingUrl` | string | The web URL of the application (via the RM Proxy) | | `trackingUrl` | string | The web URL of the application (via the RM Proxy) |
| `originalTrackingUrl` | string | The actual web URL of the application | | `originalTrackingUrl` | string | The actual web URL of the application |
| `diagnosticsInfo` | string | Detailed diagnostics information on a completed application| | `diagnosticsInfo` | string | Detailed diagnostics information on a completed application|
@ -2019,7 +2019,7 @@ querying some entities, such as Domains; here the API deliberately
downgrades permission-denied outcomes as empty and not-founds responses. downgrades permission-denied outcomes as empty and not-founds responses.
This hides details of other domains from an unauthorized caller. This hides details of other domains from an unauthorized caller.
1. If the content of timeline entity PUT operations is invalid, 1. If the content of timeline entity PUT operations is invalid,
this failure *will not* result in an HTTP error code being retured. this failure *will not* result in an HTTP error code being returned.
A status code of 200 will be returned —however, there will be an error code A status code of 200 will be returned —however, there will be an error code
in the list of failed entities for each entity which could not be added. in the list of failed entities for each entity which could not be added.
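
Because a PUT can return 200 and still reject entities, a client has to inspect the response body rather than the status code alone. The following is a minimal sketch under assumptions: the field names `errors`, `entity`, `entitytype`, and `errorcode` are assumed here and should be checked against the response format of the release in use, and the sample body is invented for illustration.

```python
import json


def failed_entities(body):
    """Return (entity, entitytype, errorcode) tuples from a PUT response body.

    An empty list means no entity was reported as failed. The field names
    are assumptions, not confirmed by the text above.
    """
    payload = json.loads(body) if body.strip() else {}
    return [
        (err.get("entity"), err.get("entitytype"), err.get("errorcode"))
        for err in payload.get("errors", [])
    ]


# Hypothetical response body for a request in which one entity was rejected.
sample = '{"errors": [{"entity": "entity id 0", "entitytype": "entity type 0", "errorcode": 1}]}'
for entity, entity_type, code in failed_entities(sample):
    print(f"rejected {entity_type}/{entity} with error code {code}")
```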

View File

@ -80,7 +80,7 @@ By default, YARN will automatically detect and config GPUs when above config is
device number of GPUs is using `nvidia-smi -q` and search `Minor Number` device number of GPUs is using `nvidia-smi -q` and search `Minor Number`
output. output.
When minor numbers are specified manually, admin needs to include indice of GPUs When minor numbers are specified manually, admin needs to include indices of GPUs
as well, format is `index:minor_number[,index:minor_number...]`. An example as well, format is `index:minor_number[,index:minor_number...]`. An example
of manual specification is `0:0,1:1,2:2,3:4"`to allow YARN NodeManager to of manual specification is `0:0,1:1,2:2,3:4"`to allow YARN NodeManager to
manage GPU devices with indices `0/1/2/3` and minor number `0/1/2/4`. manage GPU devices with indices `0/1/2/3` and minor number `0/1/2/4`.
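
The `index:minor_number[,index:minor_number...]` format can be checked before it goes into the configuration. This is an illustrative sketch only: the helper name is hypothetical and the code is not part of YARN. It simply splits the value the way the format description above reads.

```python
def parse_gpu_devices(spec):
    """Parse 'index:minor_number[,index:minor_number...]' into (index, minor) pairs."""
    devices = []
    for entry in spec.split(","):
        index_text, sep, minor_text = entry.strip().partition(":")
        # Both halves must be plain decimal numbers separated by a colon.
        if not sep or not index_text.isdecimal() or not minor_text.isdecimal():
            raise ValueError(f"expected index:minor_number, got {entry!r}")
        devices.append((int(index_text), int(minor_text)))
    return devices


# The manual specification used as the example above.
for index, minor in parse_gpu_devices("0:0,1:1,2:2,3:4"):
    print(f"GPU index {index} -> minor number {minor}")
```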

View File

@ -48,7 +48,7 @@ Currently only GET is supported. It retrieves information about the resource spe
### Security ### Security
The web service REST API's go through the same security as the web UI. If your cluster adminstrators have filters enabled you must authenticate via the mechanism they specified. The web service REST API's go through the same security as the web UI. If your cluster administrators have filters enabled you must authenticate via the mechanism they specified.
### Headers Supported ### Headers Supported
@ -70,7 +70,7 @@ This release supports gzip compression if you specify gzip in the Accept-Encodin
This release of the web service REST APIs supports responses in JSON and XML formats. JSON is the default. To set the response format, you can specify the format in the Accept header of the HTTP request. This release of the web service REST APIs supports responses in JSON and XML formats. JSON is the default. To set the response format, you can specify the format in the Accept header of the HTTP request.
As specified in HTTP Response Codes, the response body can contain the data that represents the resource or an error message. In the case of success, the response body is in the selected format, either JSON or XML. In the case of error, the resonse body is in either JSON or XML based on the format requested. The Content-Type header of the response contains the format requested. If the application requests an unsupported format, the response status code is 500. Note that the order of the fields within response body is not specified and might change. Also, additional fields might be added to a response body. Therefore, your applications should use parsing routines that can extract data from a response body in any order. As specified in HTTP Response Codes, the response body can contain the data that represents the resource or an error message. In the case of success, the response body is in the selected format, either JSON or XML. In the case of error, the response body is in either JSON or XML based on the format requested. The Content-Type header of the response contains the format requested. If the application requests an unsupported format, the response status code is 500. Note that the order of the fields within response body is not specified and might change. Also, additional fields might be added to a response body. Therefore, your applications should use parsing routines that can extract data from a response body in any order.
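
Since field order is unspecified and fields may be added, a client is safer looking values up by name. A minimal sketch of that idea, assuming a JSON response: the helper name and the extra `newField` entry are hypothetical, while the `app`, `id`, and `user` names follow the sample response body in this document.

```python
import json


def extract_app_summary(body):
    """Pull the fields the client needs by name, ignoring order and unknown fields."""
    app = json.loads(body)["app"]
    return {"id": app["id"], "user": app.get("user")}


original = '{"app": {"id": "application_1324057493980_0001", "user": "user1"}}'
# Same data with the fields reordered and a hypothetical extra field added.
reordered = '{"app": {"user": "user1", "newField": 1, "id": "application_1324057493980_0001"}}'

print(extract_app_summary(original) == extract_app_summary(reordered))
```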
### Response Errors ### Response Errors
@ -101,7 +101,7 @@ Response Body:
```json ```json
{ {
app": "app":
{ {
"id":"application_1324057493980_0001", "id":"application_1324057493980_0001",
"user":"user1", "user":"user1",

View File

@ -576,7 +576,7 @@ system property in AM).
`[ ]` Web browser interaction verified in secure cluster. `[ ]` Web browser interaction verified in secure cluster.
`[ ]` REST client interation (GET operations) tested. `[ ]` REST client integration (GET operations) tested.
`[ ]` Application continues to run after Kerberos Token expiry. `[ ]` Application continues to run after Kerberos Token expiry.