From 9636fe4114eed9035cdc80108a026c657cd196d9 Mon Sep 17 00:00:00 2001
From: Sunil G
Date: Fri, 22 Feb 2019 20:00:13 +0530
Subject: [PATCH] YARN-8891. Documentation of the pluggable device framework.
 Contributed by Zhankun Tang.

---
 .../markdown/DevelopYourOwnDevicePlugin.md    | 177 ++++++++++++++++++
 .../site/markdown/PluggableDeviceFramework.md | 151 +++++++++++++++
 2 files changed, 328 insertions(+)
 create mode 100644 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/DevelopYourOwnDevicePlugin.md
 create mode 100644 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/PluggableDeviceFramework.md

diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/DevelopYourOwnDevicePlugin.md b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/DevelopYourOwnDevicePlugin.md
new file mode 100644
index 0000000000..0331f72615
--- /dev/null
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/DevelopYourOwnDevicePlugin.md
@@ -0,0 +1,177 @@

# Develop Your Own Plugin

A device plugin is loaded into the framework when the NodeManager starts.
Your plugin class only needs to consider two interfaces provided by the
framework: `DevicePlugin` is mandatory to implement, while
`DevicePluginScheduler` is optional.

## DevicePlugin Interface

```
/**
 * A mandatory interface for a vendor plugin to implement.
 * */
public interface DevicePlugin {
  /**
   * Called first when the device plugin framework wants to register.
   * @return DeviceRegisterRequest {@link DeviceRegisterRequest}
   * @throws Exception
   * */
  DeviceRegisterRequest getRegisterRequestInfo()
      throws Exception;

  /**
   * Called when updating node resources.
   * @return a set of {@link Device}, {@link java.util.TreeSet} recommended
   * @throws Exception
   * */
  Set<Device> getDevices() throws Exception;

  /**
   * Asks how these devices should be prepared/used
   * before/when the container launches. A plugin can do some tasks on its
   * own or define them in the DeviceRuntimeSpec to let the framework do them.
   * For instance, define a {@code VolumeSpec} to let the
   * framework create a volume before running the container.
   *
   * @param allocatedDevices A set of allocated {@link Device}.
   * @param yarnRuntime Indicates which runtime YARN will use.
   *        Could be {@code RUNTIME_DEFAULT} or {@code RUNTIME_DOCKER}
   *        in {@link DeviceRuntimeSpec} constants. The default means YARN's
   *        non-Docker container runtime is used. The docker means YARN's
   *        Docker container runtime is used.
   * @return a {@link DeviceRuntimeSpec} description about environment,
   *         {@link VolumeSpec}, {@link MountVolumeSpec}, etc.
   * @throws Exception
   * */
  DeviceRuntimeSpec onDevicesAllocated(Set<Device> allocatedDevices,
      YarnRuntimeType yarnRuntime) throws Exception;

  /**
   * Called after devices are released.
   * @param releasedDevices A set of released devices
   * @throws Exception
   * */
  void onDevicesReleased(Set<Device> releasedDevices)
      throws Exception;
}

```
The above code shows the `DevicePlugin` interface you need to implement.
Let’s go through the methods that your plugin should implement.


* getRegisterRequestInfo(): DeviceRegisterRequest
* getDevices(): Set<Device>
* onDevicesAllocated(Set<Device>, YarnRuntimeType yarnRuntime): DeviceRuntimeSpec
* onDevicesReleased(Set<Device>): void


The getRegisterRequestInfo interface is used for the plugin to advertise a
new resource type name, which is then registered with the ResourceManager.
The `DeviceRegisterRequest` returned by the method consists of a plugin
version and a resource type name like “nvidia.com/gpu”.
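For illustration, a `getRegisterRequestInfo` implementation inside your plugin
class might look like the sketch below. The resource name
`myvendor.com/accelerator` is a made-up example, and the builder-style
construction of `DeviceRegisterRequest` is an assumption that should be
checked against the `deviceplugin` classes of the Hadoop version you build
against.
```
// A sketch only. "myvendor.com/accelerator" is a hypothetical resource name
// and the DeviceRegisterRequest builder usage should be verified against
// your Hadoop version.
@Override
public DeviceRegisterRequest getRegisterRequestInfo() throws Exception {
  return DeviceRegisterRequest.Builder.newInstance()
      .setResourceName("myvendor.com/accelerator")
      .build();
}
```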
The getDevices interface is used to get the latest vendor device list on this
NM node. The resource count pre-defined in `node-resources.xml` will be
overridden. It’s recommended that the vendor plugin manages the allowed
devices reported to YARN in its own configuration; YARN only provides a
blacklist configuration, `devices.denied-numbers`, in `container-executor.cfg`.
In this method, you may invoke a shell command or call a remote RESTful/RPC
service to get the devices, at your convenience.


Please note that the `Device` object can describe a fake device. If the major
device number, minor device number and device path are left unset, the
framework won't do isolation for it. This makes it feasible for a user to
define a fake device without real hardware.

The onDevicesAllocated interface is invoked to tell the framework how to use
these devices. The NM invokes this interface to let the plugin do some
preparation work, like creating a volume before container launch, and to give
hints on how to expose the devices to the container when launching it. The
`DeviceRuntimeSpec` is the structure of these hints. For instance,
`DeviceRuntimeSpec` can describe container launch requirements like
environment variables, device and volume mounts, the Docker runtime type, etc.


The onDevicesReleased interface is used for the plugin to do some cleanup work
after the container finishes.

## Optional DevicePluginScheduler Interface

```
/**
 * An optional interface to implement if custom device scheduling is needed.
 * If this is not implemented, the device framework will do the scheduling.
 * */
public interface DevicePluginScheduler {
  /**
   * Called when allocating devices. The framework will do all device book
   * keeping and failure recovery. So this hook could be stateless and only do
   * scheduling based on the available devices passed in. It could be
   * invoked multiple times by the framework. The hints in the environment
   * variables passed in could potentially be used to make better scheduling
   * decisions. For instance, GPU scheduling might support different kinds of
   * policy and the container can set it through environment variables.
   * @param availableDevices Devices allowed to be chosen from.
   * @param count Number of devices to be allocated.
   * @param env Environment variables of the container.
   * @return A set of {@link Device} allocated
   * */
  Set<Device> allocateDevices(Set<Device> availableDevices, int count,
      Map<String, String> env);
}
```
The above code shows the `DevicePluginScheduler` interface that you might
need if you want to equip the plugin with a more efficient scheduler.
The `allocateDevices` method is invoked by YARN each time it asks the
plugin to recommend devices for one container.
This interface is optional because YARN provides a very basic scheduler.
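If you do implement it, a naive `allocateDevices` inside your plugin class
could look like the sketch below: it simply picks the first `count` devices in
a deterministic order (a `java.util.TreeSet` gives a stable order because the
framework recommends comparable `Device` objects). This is an illustrative
sketch, not a recommended scheduling strategy; the environment-variable based
policy hinted at in the Javadoc is only mentioned in the comments.
```
@Override
public Set<Device> allocateDevices(Set<Device> availableDevices, int count,
    Map<String, String> env) {
  // Iterate in a deterministic order and take the first "count" devices.
  // A real scheduler could read hints from "env" (for instance, a policy
  // name set by the container) to make topology- or locality-aware choices.
  Set<Device> allocated = new TreeSet<>();
  for (Device device : new TreeSet<>(availableDevices)) {
    if (allocated.size() == count) {
      break;
    }
    allocated.add(device);
  }
  return allocated;
}
```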
You can refer to the `NvidiaGPUPluginForRuntimeV2` plugin for an example of a
plugin-customized scheduler. Its scheduler targets Nvidia GPU topology-aware
scheduling and can yield a considerable performance boost for the container.

## Dependency in Plugin Project

When developing the plugin, you need to add the dependency below to your
project's `pom.xml`. For instance:
```
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-yarn-server-nodemanager</artifactId>
  <version>3.3.0</version>
  <scope>provided</scope>
</dependency>
```

After this, you can implement the above interfaces based on the classes
provided in `org.apache.hadoop.yarn.server.nodemanager.api.deviceplugin`.
Please note that the plugin project is coupled with the Hadoop YARN NM
version.
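To tie the pieces together, here is a minimal, hypothetical skeleton of a
plugin class. The package, class and resource names and the device numbers
are made up, and the `Device.Builder`/`DeviceRegisterRequest.Builder` calls
are assumptions based on the `deviceplugin` package that should be verified
against the Hadoop version you build against.
```
package com.myvendor.yarn;  // hypothetical package name

import java.util.Set;
import java.util.TreeSet;

import org.apache.hadoop.yarn.server.nodemanager.api.deviceplugin.Device;
import org.apache.hadoop.yarn.server.nodemanager.api.deviceplugin.DevicePlugin;
import org.apache.hadoop.yarn.server.nodemanager.api.deviceplugin.DeviceRegisterRequest;
import org.apache.hadoop.yarn.server.nodemanager.api.deviceplugin.DeviceRuntimeSpec;
import org.apache.hadoop.yarn.server.nodemanager.api.deviceplugin.YarnRuntimeType;

/**
 * A minimal sketch of a device plugin, for illustration only.
 */
public class MyAcceleratorPlugin implements DevicePlugin {

  @Override
  public DeviceRegisterRequest getRegisterRequestInfo() throws Exception {
    // Advertise the resource type name handled by this plugin.
    return DeviceRegisterRequest.Builder.newInstance()
        .setResourceName("myvendor.com/accelerator")
        .build();
  }

  @Override
  public Set<Device> getDevices() throws Exception {
    // Discover devices on this node. A real plugin might shell out to a
    // vendor CLI or query a local daemon; here one fake device with made-up
    // device numbers is reported.
    Set<Device> devices = new TreeSet<>();
    devices.add(Device.Builder.newInstance()
        .setId(0)
        .setDevPath("/dev/myaccelerator0")
        .setMajorNumber(243)
        .setMinorNumber(0)
        .setBusID("0000:65:00.0")
        .setHealthy(true)
        .build());
    return devices;
  }

  @Override
  public DeviceRuntimeSpec onDevicesAllocated(Set<Device> allocatedDevices,
      YarnRuntimeType yarnRuntime) throws Exception {
    // Describe how the devices should be exposed to the container
    // (environment variables, volume and device mounts, Docker runtime).
    // Returning null only keeps this sketch short; a real plugin should
    // build and return a DeviceRuntimeSpec here.
    return null;
  }

  @Override
  public void onDevicesReleased(Set<Device> releasedDevices)
      throws Exception {
    // Clean up any per-container state the plugin keeps.
  }
}
```
If the plugin also needs custom scheduling, the same class can additionally
implement `DevicePluginScheduler`, as `NvidiaGPUPluginForRuntimeV2` does.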
## Test And Use Your Own Plugin
Once you have built your project and packaged a jar containing your plugin
class, you can give it a try in your Hadoop cluster.


First, put the jar file under a directory in the Hadoop classpath
($HADOOP_COMMON_HOME/share/hadoop/yarn is recommended). Second,
follow the configurations described in [Pluggable Device Framework](./PluggableDeviceFramework.html) and restart YARN.
\ No newline at end of file
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/PluggableDeviceFramework.md b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/PluggableDeviceFramework.md
new file mode 100644
index 0000000000..d8733754ed
--- /dev/null
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/PluggableDeviceFramework.md
@@ -0,0 +1,151 @@

# YARN Pluggable Device Framework

## Introduction

At present, YARN supports GPU/FPGA devices through a native, tightly coupled
mechanism. But it is difficult for a vendor to implement such a device plugin,
because the developer needs to understand various integration points with
YARN as well as YARN internals related to the NodeManager.

### Pain Points Of Current Device Plugin

Some of the pain points of the current device plugin development and
integration are listed below:


* At least 6 classes need to be implemented (if you want to support
Docker, you will implement one more, `DockerCommandPlugin`).
* When implementing the `ResourceHandler` interface,
the developer must understand YARN NM internals like the container
launch mechanism, cgroups operations and Docker runtime operations.
* If one wants isolation, the native container-executor also needs a new
module written in C.


This places a burden on the community to maintain both YARN
core and vendor-specific code. For more details, check the YARN-8851 design document.


Based on the above reasons, and in order for YARN and vendor-specific plugins
to evolve independently, we developed a new pluggable device framework to ease
vendor device plugin development and provide a more flexible way to integrate with YARN.

## Quick Start

This pluggable device framework not only simplifies plugin development but
also reduces the number of YARN configurations needed for plugin integration.
Before we go through how to implement
your own device plugin, let's first see how to use an existing plugin.


As an example, the new framework includes a sample Nvidia GPU plugin that
supports detecting Nvidia GPUs, provides a custom scheduler and isolates
containers run with both YARN cgroups and the Nvidia Docker runtime v2.

### Prerequisites
1. The pluggable device framework depends on LinuxContainerExecutor (LCE) to
handle resource isolation and Docker-related operations, so LCE and Docker
must be enabled in YARN.
See [Using CGroups with YARN](./NodeManagerCgroups.html) and [Docker on YARN](./DockerContainers.html)

2. The sample plugin `NvidiaGPUPluginForRuntimeV2` requires Nvidia GPU drivers
and Nvidia Docker runtime v2 installed on the nodes. See the Nvidia official
documentation for this.

3. If you use the YARN capacity scheduler, the `DominantResourceCalculator`
configuration below is needed (in `capacity-scheduler.xml`):
```
<property>
  <name>yarn.scheduler.capacity.resource-calculator</name>
  <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>
```

### Enable Device Plugin Framework
Two properties are needed to enable the pluggable framework support. The
first one is in `yarn-site.xml`:
```
<property>
  <name>yarn.nodemanager.pluggable-device-framework.enabled</name>
  <value>true</value>
</property>
```
And then enable the isolation native module in `container-executor.cfg`:
```
# The configs below deal with settings for resources handled by the pluggable device plugin framework
[devices]
  module.enabled=true
# devices.denied-numbers=## Blacklisted devices not permitted to be used. The format is a comma separated list of "majorNumber:minorNumber". For instance, "195:1,195:2". Leaving it empty means all devices reported by the device plugin are allowed.
```

### Configure Sample Nvidia GPU Plugin
The pluggable device framework loads the plugin and talks to it to learn
which resource name the plugin handles. The resource name should be
pre-defined in `resource-types.xml`. Here we already know from the plugin
implementation that the resource name is `nvidia.com/gpu`.
```
<property>
  <name>yarn.resource-types</name>
  <value>nvidia.com/gpu</value>
</property>
```
After defining the resource name handled by the plugin, we can configure the
plugin name in `yarn-site.xml` now:
```
<property>
  <name>yarn.nodemanager.pluggable-device-framework.device-classes</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.com.nvidia.NvidiaGPUPluginForRuntimeV2</value>
</property>
```
Note that the property value must be the fully qualified class name of the plugin.

### Restart YARN And Run Job
After restarting YARN, you should see the `nvidia.com/gpu` resource count
displayed while accessing the YARN UI2 Overview and NodeManagers pages, or
when issuing the command:
```
yarn node -list -showDetails
```

Then you can run a job requesting several `nvidia.com/gpu` as usual (the jar
path and Docker image name below are placeholders):
```
yarn jar <path-to-distributed-shell-jar> \
  -jar <path-to-distributed-shell-jar> \
  -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker \
  -shell_env YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=<docker-image-name> \
  -shell_command nvidia-smi \
  -container_resources memory-mb=3072,vcores=1,nvidia.com/gpu=2 \
  -num_containers 2
```

### NM API To Query Resource Allocation
When a job runs with a resource like `nvidia.com/gpu`, you can query an NM
node's resource allocation through the RESTful API below. Note that the
resource name should be in URL-encoded format (in this case, `nvidia.com%2Fgpu`).
```
node:port/ws/v1/node/resources/nvidia.com%2Fgpu
```
For instance, use the command below to get the resource allocation in JSON format:
```
curl localhost:8042/ws/v1/node/resources/nvidia.com%2Fgpu | jq .
```

## Develop Your Own Plugin

Configuring an existing plugin is easy. But what about implementing your own?
That is easy too! See [Develop Device Plugin](./DevelopYourOwnDevicePlugin.html)
\ No newline at end of file