YARN-11524. Improve the Policy Description in Federation.md. (#5797)

slfan1989 2023-07-12 00:54:06 +08:00 committed by GitHub
parent c13d92996d
commit 33b1677e9e


@@ -157,7 +157,7 @@ Configuration
To configure `YARN` to use `Federation`, set the following properties in the **conf/yarn-site.xml**:
###EVERYWHERE:
### EVERYWHERE:
These are common configurations that should appear in the **conf/yarn-site.xml** at each machine in the federation.
@@ -167,7 +167,7 @@ These are common configurations that should appear in the **conf/yarn-site.xml**
|`yarn.federation.enabled` | `true` | Whether federation is enabled or not |
|`yarn.resourcemanager.cluster-id` | `<unique-subcluster-id>` | The unique subcluster identifier for this RM (same as the one used for HA). |
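
For example, a minimal **conf/yarn-site.xml** snippet enabling federation on one sub-cluster (the sub-cluster id `SC-1` below is only an illustrative value) could look like this:

```xml
<property>
  <name>yarn.federation.enabled</name>
  <value>true</value>
</property>
<property>
  <!-- Same id used for RM HA; "SC-1" is an example, not a required value. -->
  <name>yarn.resourcemanager.cluster-id</name>
  <value>SC-1</value>
</property>
```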
####State-Store:
#### State-Store:
Currently, we support ZooKeeper and SQL based implementations of the state-store.
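
For instance, a sketch of selecting the ZooKeeper-based state-store; this assumes the ZooKeeper implementation class shipped with YARN Federation and the shared Hadoop ZooKeeper quorum setting, and the quorum address below is only an example:

```xml
<property>
  <name>yarn.federation.state-store.class</name>
  <value>org.apache.hadoop.yarn.server.federation.store.impl.ZookeeperFederationStateStore</value>
</property>
<property>
  <!-- Example quorum; the ZooKeeper-based store relies on the Hadoop ZK settings. -->
  <name>hadoop.zk.address</name>
  <value>zk1:2181,zk2:2181,zk3:2181</value>
</property>
```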
@@ -192,7 +192,7 @@ SQL: one must setup the following parameters:
We provide scripts for **MySQL** and **Microsoft SQL Server**.
> MySQL
- MySQL
For MySQL, one must download the latest jar version 5.x from [MVN Repository](https://mvnrepository.com/artifact/mysql/mysql-connector-java) and add it to the CLASSPATH.
Then the DB schema is created by executing the following SQL scripts in the database:
@@ -211,7 +211,7 @@ In the same directory we provide scripts to drop the Stored Procedures, the Tabl
1. MySQL 5.7
2. MySQL 8.0
> Microsoft SQL Server
- Microsoft SQL Server
For SQL-Server, the process is similar, but the JDBC driver is already included.
SQL-Server scripts are located in **sbin/FederationStateStore/SQLServer/**.
@@ -221,10 +221,10 @@ SQL-Server scripts are located in **sbin/FederationStateStore/SQLServer/**.
1. SQL Server 2008 R2 Enterprise
2. SQL Server 2012 Enterprise
3. SQL Server 2016 Enterprise
4. SQL Server 2017 Enterprise
5. SQL Server 2019 Enterprise
####Optional:
#### Optional:
| Property | Example | Description |
|:---- |:---- |:---- |
@@ -235,7 +235,88 @@ SQL-Server scripts are located in **sbin/FederationStateStore/SQLServer/**.
|`yarn.federation.subcluster-resolver.class` | `org.apache.hadoop.yarn.server.federation.resolver.DefaultSubClusterResolverImpl` | The class used to resolve which subcluster a node belongs to, and which subcluster(s) a rack belongs to. |
|`yarn.federation.machine-list` | `<path of machine-list file>` | Path of machine-list file used by `SubClusterResolver`. Each line of the file is a node with sub-cluster and rack information. Below is the example: <br/> <br/> node1, subcluster1, rack1 <br/> node2, subcluster2, rack1 <br/> node3, subcluster3, rack2 <br/> node4, subcluster3, rack2 |
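
Putting the resolver settings together, a sketch of the configuration (the machine-list path below is only an example) could be:

```xml
<property>
  <name>yarn.federation.subcluster-resolver.class</name>
  <value>org.apache.hadoop.yarn.server.federation.resolver.DefaultSubClusterResolverImpl</value>
</property>
<property>
  <!-- Example path; each line of the file is "node, sub-cluster, rack" as shown above. -->
  <name>yarn.federation.machine-list</name>
  <value>/etc/hadoop/conf/machine-list</value>
</property>
```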
###ON RMs:
**How to configure the policy-manager?**
- Router Policy

  Router Policy defines the logic for routing an application submission and determines the HomeSubCluster for the application.

  - HashBasedRouterPolicy
    - This policy selects a sub-cluster based on the hash of the job's queue name. It is particularly useful when dealing with a large number of queues in a system, providing a default behavior. Furthermore, it ensures that all jobs belonging to the same queue are consistently mapped to the same sub-cluster, which can improve locality and performance.
  - LoadBasedRouterPolicy
    - This is a simplified load-balancing policy implementation. The policy uses binary weights (0/1 values) to enable or disable each sub-cluster. It selects the sub-cluster with the least load to forward the application traffic, ensuring optimal distribution.
  - LocalityRouterPolicy
    - This policy selects the sub-cluster based on the node specified by the client for running its application. It follows these conditions:
      - It succeeds if
        - There are three AMContainerResourceRequests, in the order NODE, RACK, ANY.
      - It falls back to WeightedRandomRouterPolicy if
        - The AMContainerResourceRequests are null or empty;
        - There is only one AMContainerResourceRequest and it has ANY as ResourceName;
        - The node is in blacklisted SubClusters.
      - It fails if
        - The node does not exist and RelaxLocality is False;
        - There is an invalid number (not 0, 1 or 3) of resource requests.
  - RejectRouterPolicy
    - This policy simply rejects all incoming requests.
  - UniformRandomRouterPolicy
    - This simple policy picks uniformly at random among the currently active sub-clusters. It is easy to use and good for testing.
  - WeightedRandomRouterPolicy
    - This policy implements a weighted random sample among currently active sub-clusters.
- AMRM Policy

  AMRM Policy defines the logic used by the AMRMProxy to split the resource request list received from the AM among the RMs.

  - BroadcastAMRMProxyPolicy
    - This policy simply broadcasts each ResourceRequest to all the available sub-clusters.
  - HomeAMRMProxyPolicy
    - This policy simply sends the ResourceRequest to the home sub-cluster.
  - LocalityMulticastAMRMProxyPolicy
    - Host-localized ResourceRequests are always forwarded to the RM that owns the corresponding node, based on the feedback of a SubClusterResolver. If the SubClusterResolver cannot resolve this node, we default to forwarding the ResourceRequest to the home sub-cluster.
    - Rack-localized ResourceRequests are forwarded to the RMs that own the corresponding rack. Note that in some deployments each rack could be striped across multiple RMs; this policy respects that. If the SubClusterResolver cannot resolve this rack, we default to forwarding the ResourceRequest to the home sub-cluster.
    - ANY requests corresponding to node/rack-local requests are forwarded only to the set of RMs that own the corresponding localized requests. The number of containers listed in each ANY is proportional to the number of localized container requests (associated with this ANY via the same allocateRequestId).
  - RejectAMRMProxyPolicy
    - This policy simply rejects all requests. Useful to prevent apps from accessing any sub-cluster.
- Policy Manager

  The PolicyManager provides a combination of a Router Policy and an AMRM Policy.

  We can set the policy-manager like this:
```xml
<!--
We provide 6 PolicyManagers. They have a common prefix: org.apache.hadoop.yarn.server.federation.policies.manager
1. HashBroadcastPolicyManager
2. HomePolicyManager
3. PriorityBroadcastPolicyManager
4. RejectAllPolicyManager
5. UniformBroadcastPolicyManager
6. WeightedLocalityPolicyManager
-->
<property>
<name>yarn.federation.policy-manager</name>
<value>org.apache.hadoop.yarn.server.federation.policies.manager.HashBroadcastPolicyManager</value>
</property>
```
- HashBroadcastPolicyManager
  - Policy that routes applications via hashing of their queue name and broadcasts resource requests. This picks a HashBasedRouterPolicy for the router and a BroadcastAMRMProxyPolicy for the amrmproxy, as they are designed to work together.
- HomePolicyManager
  - Policy manager which uses the UniformRandomRouterPolicy for the Router and the HomeAMRMProxyPolicy as the AMRMProxy policy to find the RM.
- PriorityBroadcastPolicyManager
  - Policy that allows the operator to configure "weights" for routing. This picks a PriorityRouterPolicy for the router and a BroadcastAMRMProxyPolicy for the amrmproxy, as they are designed to work together.
- RejectAllPolicyManager
  - This policy rejects all requests for both router and amrmproxy routing. This picks a RejectRouterPolicy for the router and a RejectAMRMProxyPolicy for the amrmproxy, as they are designed to work together.
- UniformBroadcastPolicyManager
  - It combines the basic policies UniformRandomRouterPolicy and BroadcastAMRMProxyPolicy, which are designed to work together and "spread" the load among sub-clusters uniformly. This simple policy might impose a heavy load on the RMs and return more containers than a job requested, as all requests are (replicated and) broadcast.
- WeightedLocalityPolicyManager
  - Policy that allows the operator to configure "weights" for routing. This picks a LocalityRouterPolicy for the router and a LocalityMulticastAMRMProxyPolicy for the amrmproxy, as they are designed to work together; see the example after this list.
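
For example, to use weight-based, locality-aware routing instead of the hash-based manager shown above, the same property can point at the WeightedLocalityPolicyManager. This is a minimal sketch: the per-sub-cluster weights themselves are not set here, but are supplied separately through the policy configuration kept in the FederationStateStore.

```xml
<property>
  <name>yarn.federation.policy-manager</name>
  <value>org.apache.hadoop.yarn.server.federation.policies.manager.WeightedLocalityPolicyManager</value>
</property>
```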
### ON RMs:
These are extra configurations that should appear in the **conf/yarn-site.xml** at each ResourceManager.