HADOOP-16097. Provide proper documentation for FairCallQueue. Contributed by Erik Krogen.

This commit is contained in:
Yiqun Lin 2019-02-13 11:16:04 +08:00
parent 06d7890bdd
commit 7b11b404a3
3 changed files with 151 additions and 0 deletions

View File

@ -0,0 +1,150 @@
<!---
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
Fair Call Queue Guide
=================================
<!-- MACRO{toc|fromDepth=0|toDepth=3} -->
Purpose
-------
This document describes how to configure and manage the Fair Call Queue for Hadoop.
Prerequisites
-------------
Make sure Hadoop is installed, configured and setup correctly. For more information see:
* [Single Node Setup](./SingleCluster.html) for first-time users.
* [Cluster Setup](./ClusterSetup.html) for large, distributed clusters.
Overview
--------
Hadoop server components, in particular the HDFS NameNode, experience very heavy RPC load from clients. By default,
all client requests are routed through a first-in, first-out queue and serviced in the order they arrive. This means
that a single user submitting a very large number of requests can easily overwhelm the service, causing degraded service
for all other users. The Fair Call Queue, and related components, aim to mitigate this impact.
Design Details
--------------
There are a few components in the IPC stack which have a complex interplay, each with their own tuning parameters.
The image below presents a schematic overview of their interactions, which will be explained below.
![FairCallQueue Overview](./images/faircallqueue-overview.png)
In the following explanation, **bolded** words refer to named entities or configurables.
When a client makes a request to an IPC server, this request first lands in a **listen queue**. **Reader** threads
remove requests from this queue and pass them to a configurable **RpcScheduler** to be assigned a priority and placed
into a **call queue**; this is where FairCallQueue sits as a pluggable implementation (the other existing
implementation being a FIFO queue). **Handler** threads accept requests out of the call queue, process them, and
respond to the client.
The implementation of RpcScheduler used with FairCallQueue by default is **DecayRpcScheduler**, which maintains a
count of requests received for each user. This count _decays_ over time; every **sweep period** (5s by default),
the number of requests per user is multiplied by a **decay factor** (0.5 by default). This maintains a weighted/rolling
average of request count per user. Every time that a sweep is performed, the call counts for all known users are
ranked from highest to lowest. Each user is assigned a **priority** (0-3 by default, with 0 being highest priority)
based on the proportion of calls originating from that user. The default **priority thresholds** are (0.125, 0.25, 0.5),
meaning that users whose calls make up more than 50% of the total (there can be at most one such user) are placed into
the lowest priority, users whose calls make up between 25% and 50% of the total are in the 2nd lowest, users whose calls
make up between 12.5% and 25% are in the 2nd highest priority, and all other users are placed in the highest priority.
At the end of the sweep, each known user has a cached priority which will be used until the next sweep; new users which
appear between sweeps will have their priority calculated on-the-fly.
Within FairCallQueue, there are multiple **priority queues**, each of which is designated a **weight**. When a request
arrives at the call queue, the request is placed into one of these priority queues based on the current priority
assigned to the call (by the RpcScheduler). When a handler thread attempts to fetch an item from the call queue, which
queue it pulls from is decided via an **RpcMultiplexer**; currently this is hard-coded to be a
**WeightedRoundRobinMultiplexer**. The WRRM serves requests from queues based on their weights; the default weights
for the default 4 priority levels are (8, 4, 2, 1). Thus, the WRRM would serve 8 requests from the highest priority
queue, 4 from the second highest, 2 from the third highest, 1 from the lowest, then serve 8 more from the highest
priority queue, and so on.
In addition to the priority-weighting mechanisms discussed above, there is also a configurable **backoff** mechanism,
in which the server will throw an exception to the client rather than handling it; the client is expected to wait some
time (i.e., via exponential backoff) before trying again. Typically, backoff is triggered when a request is attempted
to be placed in a priority queue (of FCQ) when that queue is full. This helps to push back further on impactful
clients, reducing load, and can have substantial benefit. There is also a feature, **backoff by response time**, which
will cause requests in lower priority levels to back off if requests in higher priority levels are being serviced
too slowly. For example, if the response time threshold for priority 1 is set to be 10 seconds, but the average
response time in that queue is 12 seconds, an incoming request at priority levels 2 or lower would receive a backoff
exception, while requests at priority levels 0 and 1 would proceed as normal. The intent is to force heavier clients to
back off when overall system load is heavy enough to cause high priority clients to be impacted.
The discussion above refers to the **user** of a request when discussing how to group together requests for throttling.
This is configurable via the **identity provider**, which defaults to the **UserIdentityProvider**. The user identity
provider simply uses the username of the client submitting the request. However, a custom identity provider can be used
to performing throttling based on other groupings, or using an external identity provider.
Configuration
-------------
This section describes how to configure the fair call queue.
### Configuration Prefixes
All call queue-related configurations are relevant to only a single IPC server. This allows for a single configuration
file to be used to configure different components, or even different IPC servers within a component, to have uniquely
configured call queues. Each configuration is prefixed with `ipc.<port_number>`, where `<port_number>` is the port
used by the IPC server to be configured. For example, `ipc.8020.callqueue.impl` will adjust the call queue
implementation for the IPC server running at port 8020. For the remainder of this section, this prefix will be
omitted.
### Full List of Configurations
| Configuration Key | Applicable Component | Description | Default |
|:---- |:---- |:---- |:--- |
| backoff.enable | General | Whether or not to enable client backoff when a queue is full. | false |
| callqueue.impl | General | The fully qualified name of a class to use as the implementation of a call queue. Use `org.apache.hadoop.ipc.FairCallQueue` for the Fair Call Queue. | `java.util.concurrent.LinkedBlockingQueue` (FIFO queue) |
| scheduler.impl | General | The fully qualified name of a class to use as the implementation of the scheduler. Use `org.apache.hadoop.ipc.DecayRpcScheduler` in conjunction with the Fair Call Queue. | `org.apache.hadoop.ipc.DefaultRpcScheduler` (no-op scheduler) <br/> If using FairCallQueue, defaults to `org.apache.hadoop.ipc.DecayRpcScheduler` |
| scheduler.priority.levels | RpcScheduler, CallQueue | How many priority levels to use within the scheduler and call queue. | 4 |
| faircallqueue.multiplexer.weights | WeightedRoundRobinMultiplexer | How much weight to give to each priority queue. This should be a comma-separated list of length equal to the number of priority levels. | Weights descend by a factor of 2 (e.g., for 4 levels: `8,4,2,1`) |
| identity-provider.impl | DecayRpcScheduler | The identity provider mapping user requests to their identity. | org.apache.hadoop.ipc.UserIdentityProvider |
| decay-scheduler.period-ms | DecayRpcScheduler | How frequently the decay factor should be applied to the operation counts of users. Higher values have less overhead, but respond less quickly to changes in client behavior. | 5000 |
| decay-scheduler.decay-factor | DecayRpcScheduler | When decaying the operation counts of users, the multiplicative decay factor to apply. Higher values will weight older operations more strongly, essentially giving the scheduler a longer memory, and penalizing heavy clients for a longer period of time. | 0.5 |
| decay-scheduler.thresholds | DecayRpcScheduler | The client load threshold, as an integer percentage, for each priority queue. Clients producing less load, as a percent of total operations, than specified at position _i_ will be given priority _i_. This should be a comma-separated list of length equal to the number of priority levels minus 1 (the last is implicitly 100). | Thresholds ascend by a factor of 2 (e.g., for 4 levels: `13,25,50`) |
| decay-scheduler.backoff.responsetime.enable | DecayRpcScheduler | Whether or not to enable the backoff by response time feature. | false |
| decay-scheduler.backoff.responsetime.thresholds | DecayRpcScheduler | The response time thresholds, as time durations, for each priority queue. If the average response time for a queue is above this threshold, backoff will occur in lower priority queues. This should be a comma-separated list of length equal to the number of priority levels. | Threshold increases by 10s per level (e.g., for 4 levels: `10s,20s,30s,40s`) |
| decay-scheduler.metrics.top.user.count | DecayRpcScheduler | The number of top (i.e., heaviest) users to emit metric information about. | 10 |
### Example Configuration
This is an example of configuration an IPC server at port 8020 to use `FairCallQueue` with the `DecayRpcScheduler`
and only 2 priority levels. The heaviest 10% of users are penalized heavily, given only 1% of the total requests
processed.
<property>
<name>ipc.8020.callqueue.impl</name>
<value>org.apache.hadoop.ipc.FairCallQueue</value>
</property>
<property>
<name>ipc.8020.scheduler.impl</name>
<value>org.apache.hadoop.ipc.DecayRpcScheduler</value>
</property>
<property>
<name>ipc.8020.scheduler.priority.levels</name>
<value>2</value>
</property>
<property>
<name>ipc.8020.faircallqueue.multiplexer.weights</name>
<value>99,1</value>
</property>
<property>
<name>ipc.8020.decay-scheduler.thresholds</name>
<value>90</value>
</property>

Binary file not shown.

After

Width:  |  Height:  |  Size: 46 KiB

View File

@ -62,6 +62,7 @@
<menu name="Common" inherit="top"> <menu name="Common" inherit="top">
<item name="CLI Mini Cluster" href="hadoop-project-dist/hadoop-common/CLIMiniCluster.html"/> <item name="CLI Mini Cluster" href="hadoop-project-dist/hadoop-common/CLIMiniCluster.html"/>
<item name="Fair Call Queue" href="hadoop-project-dist/hadoop-common/FairCallQueue.html"/>
<item name="Native Libraries" href="hadoop-project-dist/hadoop-common/NativeLibraries.html"/> <item name="Native Libraries" href="hadoop-project-dist/hadoop-common/NativeLibraries.html"/>
<item name="Proxy User" href="hadoop-project-dist/hadoop-common/Superusers.html"/> <item name="Proxy User" href="hadoop-project-dist/hadoop-common/Superusers.html"/>
<item name="Rack Awareness" href="hadoop-project-dist/hadoop-common/RackAwareness.html"/> <item name="Rack Awareness" href="hadoop-project-dist/hadoop-common/RackAwareness.html"/>