diff --git a/hadoop-submarine/hadoop-submarine-core/README.md b/hadoop-submarine/hadoop-submarine-core/README.md
index cb2e2da107..cc137ea5db 100644
--- a/hadoop-submarine/hadoop-submarine-core/README.md
+++ b/hadoop-submarine/hadoop-submarine-core/README.md
@@ -37,11 +37,12 @@
 \__________________________________________________________/ (_)
 ```
 
-Submarine is a project which allows infra engineer / data scientist to run *unmodified* Tensorflow programs on YARN.
+Submarine is a project which allows infra engineers / data scientists to run
+*unmodified* Tensorflow or PyTorch programs on YARN or Kubernetes.
 
 Goals of Submarine:
 - It allows jobs easy access data/models in HDFS and other storages.
-- Can launch services to serve Tensorflow/MXNet models.
+- Can launch services to serve Tensorflow/PyTorch models.
 - Support run distributed Tensorflow jobs with simple configs.
 - Support run user-specified Docker images.
 - Support specify GPU and other resources.
@@ -51,5 +52,3 @@ Goals of Submarine:
 Please jump to [QuickStart](src/site/markdown/QuickStart.md) guide to quickly understand how to use this framework.
 
 Please jump to [Examples](src/site/markdown/Examples.md) to try other examples like running Distributed Tensorflow Training for CIFAR 10.
-
-If you're a developer, please find [Developer](src/site/markdown/DeveloperGuide.md) guide for more details.
diff --git a/hadoop-submarine/hadoop-submarine-core/src/site/markdown/DeveloperGuide.md b/hadoop-submarine/hadoop-submarine-core/src/site/markdown/DeveloperGuide.md
deleted file mode 100644
index 9ab0641235..0000000000
--- a/hadoop-submarine/hadoop-submarine-core/src/site/markdown/DeveloperGuide.md
+++ /dev/null
@@ -1,24 +0,0 @@
-
-
-# Developer Guide
-
-By default, Submarine uses YARN service framework as runtime. If you want to add your own implementation, you can add a new `RuntimeFactory` implementation and configure following option to `submarine.xml` (which should be placed under same `$HADOOP_CONF_DIR`)
-
-```
-<property>
-  <name>submarine.runtime.class</name>
-  <value>... full qualified class name for your runtime factory ...</value>
-</property>
-```
diff --git a/hadoop-submarine/hadoop-submarine-core/src/site/markdown/Examples.md b/hadoop-submarine/hadoop-submarine-core/src/site/markdown/Examples.md
index d878adde25..b66b32d403 100644
--- a/hadoop-submarine/hadoop-submarine-core/src/site/markdown/Examples.md
+++ b/hadoop-submarine/hadoop-submarine-core/src/site/markdown/Examples.md
@@ -18,6 +18,4 @@ Here're some examples about Submarine usage.
 
 [Running Distributed CIFAR 10 Tensorflow Job](RunningDistributedCifar10TFJobs.html)
 
-[Running Standalone CIFAR 10 PyTorch Job](RunningSingleNodeCifar10PTJobs.html)
-
-[Running Zeppelin Notebook on YARN](RunningZeppelinOnYARN.html)
\ No newline at end of file
+[Running Standalone CIFAR 10 PyTorch Job](RunningSingleNodeCifar10PTJobs.html)
\ No newline at end of file
diff --git a/hadoop-submarine/hadoop-submarine-core/src/site/markdown/Index.md b/hadoop-submarine/hadoop-submarine-core/src/site/markdown/Index.md
index f8556a6c10..d11fa4572a 100644
--- a/hadoop-submarine/hadoop-submarine-core/src/site/markdown/Index.md
+++ b/hadoop-submarine/hadoop-submarine-core/src/site/markdown/Index.md
@@ -12,7 +12,8 @@
   limitations under the License. See accompanying LICENSE file.
 -->
 
-Submarine is a project which allows infra engineer / data scientist to run *unmodified* Tensorflow programs on YARN.
+Submarine is a project which allows infra engineers / data scientists to run
+*unmodified* Tensorflow or PyTorch programs on YARN or Kubernetes.
 
 Goals of Submarine:
 
@@ -43,6 +44,4 @@ Click below contents if you want to understand more.
 
 - [How to write Dockerfile for Submarine PyTorch jobs](WriteDockerfilePT.html)
 
-- [Developer guide](DeveloperGuide.html)
-
 - [Installation guides](HowToInstall.html)
diff --git a/hadoop-submarine/hadoop-submarine-core/src/site/markdown/QuickStart.md b/hadoop-submarine/hadoop-submarine-core/src/site/markdown/QuickStart.md
index f693917d90..e2df213dc4 100644
--- a/hadoop-submarine/hadoop-submarine-core/src/site/markdown/QuickStart.md
+++ b/hadoop-submarine/hadoop-submarine-core/src/site/markdown/QuickStart.md
@@ -18,7 +18,7 @@
 
 Must:
 
-- Apache Hadoop 3.1.x, YARN service enabled.
+- Apache Hadoop version newer than 2.7.3
 
 Optional:
 
@@ -37,6 +37,20 @@ For more details, please refer to:
 
 - [How to write Dockerfile for Submarine PyTorch jobs](WriteDockerfilePT.html)
 
+## Submarine runtimes
+Starting with Submarine 0.2.0, two runtimes are supported: the YARN native
+service runtime and LinkedIn's TonY runtime. Each runtime supports both
+Tensorflow and PyTorch. Users don't need to change how they use Submarine
+when switching runtimes, because both runtimes implement the same interface.
+
+To use the TonY runtime, set the following value in the Submarine configuration.
+
+|Configuration Name | Value |
+|:---- |:---- |
+| `submarine.runtime.class` | org.apache.hadoop.yarn.submarine.runtimes.tony.TonyRuntimeFactory |
+
+For more details about the TonY runtime, see the [TonY runtime guide](TonYRuntimeGuide.html).
+
 ## Run jobs
 
 ### Commandline options
@@ -164,7 +178,8 @@ See below screenshot:
 
 ![alt text](./images/tensorboard-service.png "Tensorboard service")
 
-If there is no hadoop client, we can also use the java command and the uber jar, hadoop-submarine-all-*.jar, to submit the job.
+Starting with v0.2.0, if no Hadoop client is available, you can also use the
+`java` command and the uber jar, `hadoop-submarine-all-*.jar`, to submit the job.
 ```
 java -cp /path-to/hadoop-conf:/path-to/hadoop-submarine-all-*.jar \
diff --git a/hadoop-submarine/hadoop-submarine-core/src/site/markdown/RunningZeppelinOnYARN.md b/hadoop-submarine/hadoop-submarine-core/src/site/markdown/RunningZeppelinOnYARN.md
deleted file mode 100644
index e06526c2e6..0000000000
--- a/hadoop-submarine/hadoop-submarine-core/src/site/markdown/RunningZeppelinOnYARN.md
+++ /dev/null
@@ -1,37 +0,0 @@
-
-
-# Running Zeppelin Notebook On Submarine
-
-This is a simple example about how to run Zeppelin notebook by using Submarine.
-
-## Step 1: Build Docker Image
-
-Go to `src/main/docker/zeppelin-notebook-example`, build the Docker image. Or you can use the prebuilt one: `hadoopsubmarine/zeppelin-on-yarn-gpu:0.0.1`
-
-## Step 2: Launch the notebook on YARN
-
-Submit command to YARN:
-
-`yarn app -destroy zeppelin-notebook;
-yarn jar path-to/hadoop-yarn-applications-submarine-3.2.0-SNAPSHOT.jar \
-   job run --name zeppelin-notebook \
-   --docker_image hadoopsubmarine/zeppelin-on-yarn-gpu:0.0.1 \
-   --worker_resources memory=8G,vcores=2,gpu=1 \
-   --num_workers 1 \
-   -worker_launch_cmd "/usr/local/bin/run_container.sh"`
-
-Once the container got launched, you can go to `YARN services` UI page, access the `zeppelin-notebook` job, and go to the quicklink `notebook` by clicking `...`.
-
-The notebook is secured by admin/admin user name and password.
\ No newline at end of file
diff --git a/hadoop-submarine/hadoop-submarine-tony-runtime/src/site/markdown/QuickStart.md b/hadoop-submarine/hadoop-submarine-core/src/site/markdown/TonYRuntimeGuide.md
similarity index 98%
rename from hadoop-submarine/hadoop-submarine-tony-runtime/src/site/markdown/QuickStart.md
rename to hadoop-submarine/hadoop-submarine-core/src/site/markdown/TonYRuntimeGuide.md
index 864aebcea3..105a72431d 100644
--- a/hadoop-submarine/hadoop-submarine-tony-runtime/src/site/markdown/QuickStart.md
+++ b/hadoop-submarine/hadoop-submarine-core/src/site/markdown/TonYRuntimeGuide.md
@@ -247,16 +247,16 @@ CLASSPATH=$(hadoop classpath --glob): \
 /home/pi/hadoop/TonY/tony-cli/build/libs/tony-cli-0.3.2-all.jar \
 
 java org.apache.hadoop.yarn.submarine.client.cli.Cli job run --name tf-job-001 \
- --framework tensorflow \
  --num_workers 2 \
  --worker_resources memory=3G,vcores=2 \
  --num_ps 2 \
  --ps_resources memory=3G,vcores=2 \
  --worker_launch_cmd "venv.zip/venv/bin/python mnist_distributed.py" \
  --ps_launch_cmd "venv.zip/venv/bin/python mnist_distributed.py" \
- --insecure
+ --insecure \
  --conf tony.containers.resources=PATH_TO_VENV_YOU_CREATED/venv.zip#archive,PATH_TO_MNIST_EXAMPLE/mnist_distributed.py, \
-PATH_TO_TONY_CLI_JAR/tony-cli-0.3.2-all.jar
+PATH_TO_TONY_CLI_JAR/tony-cli-0.3.2-all.jar \
+--conf tony.application.framework=tensorflow
 ```
 
 You should then be able to see links and status of the jobs from command line:
@@ -284,7 +284,6 @@ CLASSPATH=$(hadoop classpath --glob): \
 /home/pi/hadoop/TonY/tony-cli/build/libs/tony-cli-0.3.2-all.jar \
 
 java org.apache.hadoop.yarn.submarine.client.cli.Cli job run --name tf-job-001 \
- --framework tensorflow \
  --docker_image hadoopsubmarine/tf-1.8.0-cpu:0.0.3 \
  --input_path hdfs://pi-aw:9000/dataset/cifar-10-data \
  --worker_resources memory=3G,vcores=2 \
@@ -297,5 +296,6 @@ java org.apache.hadoop.yarn.submarine.client.cli.Cli job run --name tf-job-001 \
  --env HADOOP_COMMON_HOME=/hadoop-3.1.0 \
  --env HADOOP_HDFS_HOME=/hadoop-3.1.0 \
  --env HADOOP_CONF_DIR=/hadoop-3.1.0/etc/hadoop \
- --conf tony.containers.resources=--conf tony.containers.resources=/home/pi/hadoop/TonY/tony-cli/build/libs/tony-cli-0.3.2-all.jar
+ --conf tony.containers.resources=PATH_TO_TONY_CLI_JAR/tony-cli-0.3.2-all.jar \
+ --conf tony.application.framework=tensorflow
 ```
diff --git a/hadoop-submarine/hadoop-submarine-tony-runtime/src/site/resources/css/site.css b/hadoop-submarine/hadoop-submarine-tony-runtime/src/site/resources/css/site.css
deleted file mode 100644
index 7315db31e5..0000000000
--- a/hadoop-submarine/hadoop-submarine-tony-runtime/src/site/resources/css/site.css
+++ /dev/null
@@ -1,29 +0,0 @@
-/*
-* Licensed to the Apache Software Foundation (ASF) under one or more
-* contributor license agreements. See the NOTICE file distributed with
-* this work for additional information regarding copyright ownership.
-* The ASF licenses this file to You under the Apache License, Version 2.0
-* (the "License"); you may not use this file except in compliance with
-* the License. You may obtain a copy of the License at
-*
-* http://www.apache.org/licenses/LICENSE-2.0
-*
-* Unless required by applicable law or agreed to in writing, software
-* distributed under the License is distributed on an "AS IS" BASIS,
-* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-* See the License for the specific language governing permissions and
-* limitations under the License.
-*/
-#banner {
-  height: 93px;
-  background: none;
-}
-
-#bannerLeft img {
-  margin-left: 30px;
-  margin-top: 10px;
-}
-
-#bannerRight img {
-  margin: 17px;
-}
diff --git a/hadoop-submarine/hadoop-submarine-tony-runtime/src/site/site.xml b/hadoop-submarine/hadoop-submarine-tony-runtime/src/site/site.xml
deleted file mode 100644
index 5feae9a879..0000000000
--- a/hadoop-submarine/hadoop-submarine-tony-runtime/src/site/site.xml
+++ /dev/null
@@ -1,28 +0,0 @@
-
-
-
-    <groupId>org.apache.maven.skins</groupId>
-    <artifactId>maven-stylus-skin</artifactId>
-    <version>${maven-stylus-skin.version}</version>
-
-
-
-
-
-
-
-
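For reference, the TonY runtime switch documented in the new QuickStart "Submarine runtimes" section would be set in `submarine.xml`, which the removed DeveloperGuide says belongs under `$HADOOP_CONF_DIR`. This is a minimal sketch that assumes the standard Hadoop `<configuration>`/`<property>` file layout; only the property name `submarine.runtime.class` and the class `org.apache.hadoop.yarn.submarine.runtimes.tony.TonyRuntimeFactory` come from this change, and the snippet has not been verified against a running cluster.

```xml
<?xml version="1.0"?>
<!-- submarine.xml: minimal sketch, assumed to live under $HADOOP_CONF_DIR -->
<configuration>
  <property>
    <!-- Selects the runtime factory Submarine uses to launch jobs -->
    <name>submarine.runtime.class</name>
    <!-- TonY runtime factory class named in the QuickStart table -->
    <value>org.apache.hadoop.yarn.submarine.runtimes.tony.TonyRuntimeFactory</value>
  </property>
</configuration>
```

Per the removed DeveloperGuide, leaving `submarine.runtime.class` unset keeps Submarine on its default YARN service runtime.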