From 9636fe4114eed9035cdc80108a026c657cd196d9 Mon Sep 17 00:00:00 2001
From: Sunil G
Date: Fri, 22 Feb 2019 20:00:13 +0530
Subject: [PATCH] YARN-8891. Documentation of the pluggable device framework.
 Contributed by Zhankun Tang.

---
 .../markdown/DevelopYourOwnDevicePlugin.md    | 177 ++++++++++++++++++
 .../site/markdown/PluggableDeviceFramework.md | 151 +++++++++++++++
 2 files changed, 328 insertions(+)
 create mode 100644 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/DevelopYourOwnDevicePlugin.md
 create mode 100644 hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/PluggableDeviceFramework.md

diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/DevelopYourOwnDevicePlugin.md b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/DevelopYourOwnDevicePlugin.md
new file mode 100644
index 0000000000..0331f72615
--- /dev/null
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/DevelopYourOwnDevicePlugin.md
@@ -0,0 +1,177 @@

# Develop Your Own Plugin

A device plugin is loaded into the framework when the NodeManager starts.
Your plugin class only needs to consider two interfaces provided by the
framework: `DevicePlugin` is mandatory to implement, while
`DevicePluginScheduler` is optional.

## DevicePlugin Interface

```
/**
 * A mandatory interface for a vendor plugin to implement.
 * */
public interface DevicePlugin {
  /**
   * Called first when the device plugin framework wants to register.
   * @return DeviceRegisterRequest {@link DeviceRegisterRequest}
   * @throws Exception
   * */
  DeviceRegisterRequest getRegisterRequestInfo()
      throws Exception;

  /**
   * Called when updating node resources.
   * @return a set of {@link Device}, {@link java.util.TreeSet} recommended
   * @throws Exception
   * */
  Set<Device> getDevices() throws Exception;

  /**
   * Asks how these devices should be prepared/used
   * before/when the container launches. A plugin can do some tasks on its
   * own or define them in the DeviceRuntimeSpec to let the framework do them.
   * For instance, define a {@code VolumeSpec} to let the
   * framework create a volume before running the container.
   *
   * @param allocatedDevices A set of allocated {@link Device}.
   * @param yarnRuntime Indicates which runtime YARN will use.
   *        Could be {@code RUNTIME_DEFAULT} or {@code RUNTIME_DOCKER}
   *        in {@link DeviceRuntimeSpec} constants. The default means YARN's
   *        non-Docker container runtime is used. The docker means YARN's
   *        Docker container runtime is used.
   * @return a {@link DeviceRuntimeSpec} description about environment,
   *         {@link VolumeSpec}, {@link MountVolumeSpec}, etc.
   * @throws Exception
   * */
  DeviceRuntimeSpec onDevicesAllocated(Set<Device> allocatedDevices,
      YarnRuntimeType yarnRuntime) throws Exception;

  /**
   * Called after devices are released.
   * @param releasedDevices A set of released devices
   * @throws Exception
   * */
  void onDevicesReleased(Set<Device> releasedDevices)
      throws Exception;
}

```
The above code shows the `DevicePlugin` interface you need to implement.
Let’s go through the methods that your plugin should implement.


* getRegisterRequestInfo(): DeviceRegisterRequest
* getDevices(): Set<Device>
* onDevicesAllocated(Set<Device>, YarnRuntimeType yarnRuntime): DeviceRuntimeSpec
* onDevicesReleased(Set<Device>): void


The getRegisterRequestInfo interface is used for the plugin to advertise a
new resource type name, which is then registered with the ResourceManager.
The `DeviceRegisterRequest` returned by the method consists of a plugin
version and a resource type name like “nvidia.com/gpu”.
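For illustration, a `getRegisterRequestInfo` implementation inside your plugin
class might look like the sketch below. The resource name
`myvendor.com/accelerator` is a made-up example, and the builder-style
construction of `DeviceRegisterRequest` is an assumption that should be
checked against the `deviceplugin` classes of the Hadoop version you build
against.
```
// A sketch only. "myvendor.com/accelerator" is a hypothetical resource name
// and the DeviceRegisterRequest builder usage should be verified against
// your Hadoop version.
@Override
public DeviceRegisterRequest getRegisterRequestInfo() throws Exception {
  return DeviceRegisterRequest.Builder.newInstance()
      .setResourceName("myvendor.com/accelerator")
      .build();
}
```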
The getDevices interface is used to get the latest vendor device list on this
NM node. The resource count pre-defined in `node-resources.xml` will be
overridden. It’s recommended that the vendor plugin manages the allowed
devices reported to YARN in its own configuration; YARN only provides a
blacklist configuration, `devices.denied-numbers`, in `container-executor.cfg`.
In this method, you may invoke a shell command or call a remote RESTful/RPC
service to get the devices, at your convenience.


Please note that the `Device` object can describe a fake device. If the major
device number, minor device number and device path are left unset, the
framework won't do isolation for it. This makes it feasible for a user to
define a fake device without real hardware.

The onDevicesAllocated interface is invoked to tell the framework how to use
these devices. The NM invokes this interface to let the plugin do some
preparation work, like creating a volume before container launch, and to give
hints on how to expose the devices to the container when launching it. The
`DeviceRuntimeSpec` is the structure of these hints. For instance,
`DeviceRuntimeSpec` can describe container launch requirements like
environment variables, device and volume mounts, the Docker runtime type, etc.


The onDevicesReleased interface is used for the plugin to do some cleanup work
after the container finishes.

## Optional DevicePluginScheduler Interface

```
/**
 * An optional interface to implement if custom device scheduling is needed.
 * If this is not implemented, the device framework will do the scheduling.
 * */
public interface DevicePluginScheduler {
  /**
   * Called when allocating devices. The framework will do all device book
   * keeping and failure recovery. So this hook could be stateless and only do
   * scheduling based on the available devices passed in. It could be
   * invoked multiple times by the framework. The hints in the environment
   * variables passed in could potentially be used to make better scheduling
   * decisions. For instance, GPU scheduling might support different kinds of
   * policy and the container can set it through environment variables.
   * @param availableDevices Devices allowed to be chosen from.
   * @param count Number of devices to be allocated.
   * @param env Environment variables of the container.
   * @return A set of {@link Device} allocated
   * */
  Set<Device> allocateDevices(Set<Device> availableDevices, int count,
      Map<String, String> env);
}
```
The above code shows the `DevicePluginScheduler` interface that you might
need if you want to equip the plugin with a more efficient scheduler.
The `allocateDevices` method is invoked by YARN each time it asks the
plugin to recommend devices for one container.
This interface is optional because YARN provides a very basic scheduler.
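If you do implement it, a naive `allocateDevices` inside your plugin class
could look like the sketch below: it simply picks the first `count` devices in
a deterministic order (a `java.util.TreeSet` gives a stable order because the
framework recommends comparable `Device` objects). This is an illustrative
sketch, not a recommended scheduling strategy; the environment-variable based
policy hinted at in the Javadoc is only mentioned in the comments.
```
@Override
public Set<Device> allocateDevices(Set<Device> availableDevices, int count,
    Map<String, String> env) {
  // Iterate in a deterministic order and take the first "count" devices.
  // A real scheduler could read hints from "env" (for instance, a policy
  // name set by the container) to make topology- or locality-aware choices.
  Set<Device> allocated = new TreeSet<>();
  for (Device device : new TreeSet<>(availableDevices)) {
    if (allocated.size() == count) {
      break;
    }
    allocated.add(device);
  }
  return allocated;
}
```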
You can refer to the `NvidiaGPUPluginForRuntimeV2` plugin for an example of a
plugin-customized scheduler. Its scheduler targets Nvidia GPU topology-aware
scheduling and can yield a considerable performance boost for the container.

## Dependency in Plugin Project

When developing the plugin, you need to add the dependency below to your
project's `pom.xml`. For instance:
```
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-yarn-server-nodemanager</artifactId>
  <version>3.3.0</version>
  <scope>provided</scope>
</dependency>
```

After this, you can implement the above interfaces based on the classes
provided in `org.apache.hadoop.yarn.server.nodemanager.api.deviceplugin`.
Please note that the plugin project is coupled with the Hadoop YARN NM
version.
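To tie the pieces together, here is a minimal, hypothetical skeleton of a
plugin class. The package, class and resource names and the device numbers
are made up, and the `Device.Builder`/`DeviceRegisterRequest.Builder` calls
are assumptions based on the `deviceplugin` package that should be verified
against the Hadoop version you build against.
```
package com.myvendor.yarn;  // hypothetical package name

import java.util.Set;
import java.util.TreeSet;

import org.apache.hadoop.yarn.server.nodemanager.api.deviceplugin.Device;
import org.apache.hadoop.yarn.server.nodemanager.api.deviceplugin.DevicePlugin;
import org.apache.hadoop.yarn.server.nodemanager.api.deviceplugin.DeviceRegisterRequest;
import org.apache.hadoop.yarn.server.nodemanager.api.deviceplugin.DeviceRuntimeSpec;
import org.apache.hadoop.yarn.server.nodemanager.api.deviceplugin.YarnRuntimeType;

/**
 * A minimal sketch of a device plugin, for illustration only.
 */
public class MyAcceleratorPlugin implements DevicePlugin {

  @Override
  public DeviceRegisterRequest getRegisterRequestInfo() throws Exception {
    // Advertise the resource type name handled by this plugin.
    return DeviceRegisterRequest.Builder.newInstance()
        .setResourceName("myvendor.com/accelerator")
        .build();
  }

  @Override
  public Set<Device> getDevices() throws Exception {
    // Discover devices on this node. A real plugin might shell out to a
    // vendor CLI or query a local daemon; here one fake device with made-up
    // device numbers is reported.
    Set<Device> devices = new TreeSet<>();
    devices.add(Device.Builder.newInstance()
        .setId(0)
        .setDevPath("/dev/myaccelerator0")
        .setMajorNumber(243)
        .setMinorNumber(0)
        .setBusID("0000:65:00.0")
        .setHealthy(true)
        .build());
    return devices;
  }

  @Override
  public DeviceRuntimeSpec onDevicesAllocated(Set<Device> allocatedDevices,
      YarnRuntimeType yarnRuntime) throws Exception {
    // Describe how the devices should be exposed to the container
    // (environment variables, volume and device mounts, Docker runtime).
    // Returning null only keeps this sketch short; a real plugin should
    // build and return a DeviceRuntimeSpec here.
    return null;
  }

  @Override
  public void onDevicesReleased(Set<Device> releasedDevices)
      throws Exception {
    // Clean up any per-container state the plugin keeps.
  }
}
```
If the plugin also needs custom scheduling, the same class can additionally
implement `DevicePluginScheduler`, as `NvidiaGPUPluginForRuntimeV2` does.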
## Test And Use Your Own Plugin
Once you have built your project and packaged a jar containing your plugin
class, you can give it a try in your Hadoop cluster.


First, put the jar file under a directory in the Hadoop classpath
($HADOOP_COMMON_HOME/share/hadoop/yarn is recommended). Second,
follow the configurations described in [Pluggable Device Framework](./PluggableDeviceFramework.html) and restart YARN.
\ No newline at end of file
diff --git a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/PluggableDeviceFramework.md b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/PluggableDeviceFramework.md
new file mode 100644
index 0000000000..d8733754ed
--- /dev/null
+++ b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/PluggableDeviceFramework.md
@@ -0,0 +1,151 @@

# YARN Pluggable Device Framework

## Introduction

At present, YARN supports GPU/FPGA devices through a native, tightly coupled
mechanism. But it is difficult for a vendor to implement such a device plugin,
because the developer needs to understand various integration points with
YARN as well as YARN internals related to the NodeManager.

### Pain Points Of Current Device Plugin

Some of the pain points of the current device plugin development and
integration are listed below:


* At least 6 classes need to be implemented (if you want to support
Docker, you will implement one more, `DockerCommandPlugin`).
* When implementing the `ResourceHandler` interface,
the developer must understand YARN NM internals like the container
launch mechanism, cgroups operations and Docker runtime operations.
* If one wants isolation, the native container-executor also needs a new
module written in C.


This places a burden on the community to maintain both YARN
core and vendor-specific code. For more details, check the YARN-8851 design document.


Based on the above reasons, and in order for YARN and vendor-specific plugins
to evolve independently, we developed a new pluggable device framework to ease
vendor device plugin development and provide a more flexible way to integrate with YARN.

## Quick Start

This pluggable device framework not only simplifies plugin development but
also reduces the number of YARN configurations needed for plugin integration.
Before we go through how to implement
your own device plugin, let's first see how to use an existing plugin.


As an example, the new framework includes a sample Nvidia GPU plugin that
supports detecting Nvidia GPUs, provides a custom scheduler and isolates
containers run with both YARN cgroups and the Nvidia Docker runtime v2.

### Prerequisites
1. The pluggable device framework depends on LinuxContainerExecutor (LCE) to
handle resource isolation and Docker-related operations, so LCE and Docker
must be enabled in YARN.
See [Using CGroups with YARN](./NodeManagerCgroups.html) and [Docker on YARN](./DockerContainers.html)

2. The sample plugin `NvidiaGPUPluginForRuntimeV2` requires Nvidia GPU drivers
and Nvidia Docker runtime v2 installed on the nodes. See the Nvidia official
documentation for this.

3. If you use the YARN capacity scheduler, the `DominantResourceCalculator`
configuration below is needed (in `capacity-scheduler.xml`):
```
<property>
  <name>yarn.scheduler.capacity.resource-calculator</name>
  <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>
```

### Enable Device Plugin Framework
Two properties are needed to enable the pluggable framework support. The
first one is in `yarn-site.xml`:
```
<property>
  <name>yarn.nodemanager.pluggable-device-framework.enabled</name>
  <value>true</value>
</property>
```
And then enable the isolation native module in `container-executor.cfg`:
```
# The configs below deal with settings for resources handled by the pluggable device plugin framework
[devices]
  module.enabled=true
# devices.denied-numbers=## Blacklisted devices not permitted to be used. The format is a comma separated list of "majorNumber:minorNumber". For instance, "195:1,195:2". Leaving it empty means all devices reported by the device plugin are allowed.
```

### Configure Sample Nvidia GPU Plugin
The pluggable device framework loads the plugin and talks to it to learn
which resource name the plugin handles. The resource name should be
pre-defined in `resource-types.xml`. Here we already know from the plugin
implementation that the resource name is `nvidia.com/gpu`.
```
<property>
  <name>yarn.resource-types</name>
  <value>nvidia.com/gpu</value>
</property>
```
After defining the resource name handled by the plugin, we can configure the
plugin name in `yarn-site.xml` now:
```
<property>
  <name>yarn.nodemanager.pluggable-device-framework.device-classes</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.containermanager.resourceplugin.com.nvidia.NvidiaGPUPluginForRuntimeV2</value>
</property>
```
Note that the property value must be the fully qualified class name of the plugin.

### Restart YARN And Run Job
After restarting YARN, you should see the `nvidia.com/gpu` resource count
displayed while accessing the YARN UI2 Overview and NodeManagers pages, or
when issuing the command:
```
yarn node -list -showDetails
```

Then you can run a job requesting several `nvidia.com/gpu` as usual (the jar
path and Docker image name below are placeholders):
```
yarn jar <path-to-distributed-shell-jar> \
  -jar <path-to-distributed-shell-jar> \
  -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker \
  -shell_env YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=<docker-image-name> \
  -shell_command nvidia-smi \
  -container_resources memory-mb=3072,vcores=1,nvidia.com/gpu=2 \
  -num_containers 2
```

### NM API To Query Resource Allocation
When a job runs with a resource like `nvidia.com/gpu`, you can query an NM
node's resource allocation through the RESTful API below. Note that the
resource name should be in URL-encoded format (in this case, `nvidia.com%2Fgpu`).
```
node:port/ws/v1/node/resources/nvidia.com%2Fgpu
```
For instance, use the command below to get the resource allocation in JSON format:
```
curl localhost:8042/ws/v1/node/resources/nvidia.com%2Fgpu | jq .
```

## Develop Your Own Plugin

Configuring an existing plugin is easy. But what about implementing your own?
That is easy too! See [Develop Device Plugin](./DevelopYourOwnDevicePlugin.html)
\ No newline at end of file