SUBMARINE-83. Refine the documents of submarine targeting 0.2.0 release. Contributed by Zhankun Tang.

This commit is contained in:
Zhankun Tang 2019-05-23 10:02:00 +08:00
parent 5565f2c532
commit 03aa70fe19
9 changed files with 28 additions and 135 deletions

View File

@@ -37,11 +37,12 @@
 \__________________________________________________________/ (_)
 ```
-Submarine is a project which allows infra engineer / data scientist to run *unmodified* Tensorflow programs on YARN.
+Submarine is a project which allows infra engineer / data scientist to run
+*unmodified* Tensorflow or PyTorch programs on YARN or Kubernetes.
 Goals of Submarine:
 - It allows jobs easy access data/models in HDFS and other storages.
-- Can launch services to serve Tensorflow/MXNet models.
+- Can launch services to serve Tensorflow/PyTorch models.
 - Support run distributed Tensorflow jobs with simple configs.
 - Support run user-specified Docker images.
 - Support specify GPU and other resources.
@@ -51,5 +52,3 @@ Goals of Submarine:
 Please jump to [QuickStart](src/site/markdown/QuickStart.md) guide to quickly understand how to use this framework.
 Please jump to [Examples](src/site/markdown/Examples.md) to try other examples like running Distributed Tensorflow Training for CIFAR 10.
-If you're a developer, please find [Developer](src/site/markdown/DeveloperGuide.md) guide for more details.

View File

@@ -1,24 +0,0 @@
-<!---
-  Licensed under the Apache License, Version 2.0 (the "License");
-  you may not use this file except in compliance with the License.
-  You may obtain a copy of the License at
-   http://www.apache.org/licenses/LICENSE-2.0
-  Unless required by applicable law or agreed to in writing, software
-  distributed under the License is distributed on an "AS IS" BASIS,
-  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-  See the License for the specific language governing permissions and
-  limitations under the License. See accompanying LICENSE file.
--->
-# Developer Guide
-By default, Submarine uses YARN service framework as runtime. If you want to add your own implementation, you can add a new `RuntimeFactory` implementation and configure following option to `submarine.xml` (which should be placed under same `$HADOOP_CONF_DIR`)
-```
-<property>
-  <name>submarine.runtime.class</name>
-  <value>... full qualified class name for your runtime factory ... </value>
-</property>
-```

View File

@@ -19,5 +19,3 @@ Here're some examples about Submarine usage.
 [Running Distributed CIFAR 10 Tensorflow Job](RunningDistributedCifar10TFJobs.html)
 [Running Standalone CIFAR 10 PyTorch Job](RunningSingleNodeCifar10PTJobs.html)
-[Running Zeppelin Notebook on YARN](RunningZeppelinOnYARN.html)

View File

@@ -12,7 +12,8 @@
 limitations under the License. See accompanying LICENSE file.
 -->
-Submarine is a project which allows infra engineer / data scientist to run *unmodified* Tensorflow programs on YARN.
+Submarine is a project which allows infra engineer / data scientist to run
+*unmodified* Tensorflow or PyTorch programs on YARN or Kubernetes.
 Goals of Submarine:
@@ -43,6 +44,4 @@ Click below contents if you want to understand more.
 - [How to write Dockerfile for Submarine PyTorch jobs](WriteDockerfilePT.html)
-- [Developer guide](DeveloperGuide.html)
 - [Installation guides](HowToInstall.html)

View File

@@ -18,7 +18,7 @@
 Must:
-- Apache Hadoop 3.1.x, YARN service enabled.
+- Apache Hadoop version newer than 2.7.3
 Optional:
@@ -37,6 +37,20 @@ For more details, please refer to:
 - [How to write Dockerfile for Submarine PyTorch jobs](WriteDockerfilePT.html)
+## Submarine runtimes
+Since Submarine 0.2.0, two runtimes are supported: the YARN native service
+runtime and LinkedIn's TonY runtime. Each runtime supports both the Tensorflow
+and PyTorch frameworks, and users don't need to worry about which one is in
+use, because both runtimes implement the same interface.
+To use the TonY runtime, set the value below in the Submarine configuration.
+| Configuration Name | Value |
+|:---- |:---- |
+| `submarine.runtime.class` | org.apache.hadoop.yarn.submarine.runtimes.tony.TonyRuntimeFactory |
+For more details of the TonY runtime, please check the [TonY runtime guide](TonYRuntimeGuide.html).
 ## Run jobs
 ### Commandline options
@@ -164,7 +178,8 @@ See below screenshot:
 ![alt text](./images/tensorboard-service.png "Tensorboard service")
-If there is no hadoop client, we can also use the java command and the uber jar, hadoop-submarine-all-*.jar, to submit the job.
+After v0.2.0, if there is no hadoop client, we can also use the java command
+and the uber jar, hadoop-submarine-all-*.jar, to submit the job.
 ```
 java -cp /path-to/hadoop-conf:/path-to/hadoop-submarine-all-*.jar \
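Putting the pieces from this commit together: the `submarine.runtime.class` property (shown in the table added above, and in the removed Developer Guide's snippet) would be set for the TonY runtime with a `submarine.xml` entry roughly like the following sketch. Per the removed Developer Guide, the file would sit under `$HADOOP_CONF_DIR`; the exact surrounding file layout is an assumption here.

```xml
<?xml version="1.0"?>
<!-- submarine.xml (sketch): placed under $HADOOP_CONF_DIR.
     Selects LinkedIn's TonY runtime instead of the default
     YARN native service runtime. -->
<configuration>
  <property>
    <name>submarine.runtime.class</name>
    <value>org.apache.hadoop.yarn.submarine.runtimes.tony.TonyRuntimeFactory</value>
  </property>
</configuration>
```

Omitting the property (or the file) would leave Submarine on its default YARN service runtime.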

View File

@@ -1,37 +0,0 @@
-<!---
-  Licensed under the Apache License, Version 2.0 (the "License");
-  you may not use this file except in compliance with the License.
-  You may obtain a copy of the License at
-   http://www.apache.org/licenses/LICENSE-2.0
-  Unless required by applicable law or agreed to in writing, software
-  distributed under the License is distributed on an "AS IS" BASIS,
-  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-  See the License for the specific language governing permissions and
-  limitations under the License. See accompanying LICENSE file.
--->
-# Running Zeppelin Notebook On Submarine
-This is a simple example about how to run Zeppelin notebook by using Submarine.
-## Step 1: Build Docker Image
-Go to `src/main/docker/zeppelin-notebook-example`, build the Docker image. Or you can use the prebuilt one: `hadoopsubmarine/zeppelin-on-yarn-gpu:0.0.1`
-## Step 2: Launch the notebook on YARN
-Submit command to YARN:
-`yarn app -destroy zeppelin-notebook;
-yarn jar path-to/hadoop-yarn-applications-submarine-3.2.0-SNAPSHOT.jar \
- job run --name zeppelin-notebook \
- --docker_image hadoopsubmarine/zeppelin-on-yarn-gpu:0.0.1 \
- --worker_resources memory=8G,vcores=2,gpu=1 \
- --num_workers 1 \
- -worker_launch_cmd "/usr/local/bin/run_container.sh"`
-Once the container got launched, you can go to `YARN services` UI page, access the `zeppelin-notebook` job, and go to the quicklink `notebook` by clicking `...`.
-The notebook is secured by admin/admin user name and password.

View File

@@ -247,16 +247,16 @@ CLASSPATH=$(hadoop classpath --glob): \
 /home/pi/hadoop/TonY/tony-cli/build/libs/tony-cli-0.3.2-all.jar \
 java org.apache.hadoop.yarn.submarine.client.cli.Cli job run --name tf-job-001 \
---framework tensorflow \
 --num_workers 2 \
 --worker_resources memory=3G,vcores=2 \
 --num_ps 2 \
 --ps_resources memory=3G,vcores=2 \
 --worker_launch_cmd "venv.zip/venv/bin/python mnist_distributed.py" \
 --ps_launch_cmd "venv.zip/venv/bin/python mnist_distributed.py" \
---insecure
+--insecure \
 --conf tony.containers.resources=PATH_TO_VENV_YOU_CREATED/venv.zip#archive,PATH_TO_MNIST_EXAMPLE/mnist_distributed.py, \
-PATH_TO_TONY_CLI_JAR/tony-cli-0.3.2-all.jar
+PATH_TO_TONY_CLI_JAR/tony-cli-0.3.2-all.jar \
+--conf tony.application.framework=pytorch
 ```
 You should then be able to see links and status of the jobs from command line:
@@ -284,7 +284,6 @@ CLASSPATH=$(hadoop classpath --glob): \
 /home/pi/hadoop/TonY/tony-cli/build/libs/tony-cli-0.3.2-all.jar \
 java org.apache.hadoop.yarn.submarine.client.cli.Cli job run --name tf-job-001 \
---framework tensorflow \
 --docker_image hadoopsubmarine/tf-1.8.0-cpu:0.0.3 \
 --input_path hdfs://pi-aw:9000/dataset/cifar-10-data \
 --worker_resources memory=3G,vcores=2 \
@@ -297,5 +296,6 @@ java org.apache.hadoop.yarn.submarine.client.cli.Cli job run --name tf-job-001 \
 --env HADOOP_COMMON_HOME=/hadoop-3.1.0 \
 --env HADOOP_HDFS_HOME=/hadoop-3.1.0 \
 --env HADOOP_CONF_DIR=/hadoop-3.1.0/etc/hadoop \
---conf tony.containers.resources=--conf tony.containers.resources=/home/pi/hadoop/TonY/tony-cli/build/libs/tony-cli-0.3.2-all.jar
+--conf tony.containers.resources=PATH_TO_TONY_CLI_JAR/tony-cli-0.3.2-all.jar \
+--conf tony.application.framework=pytorch
 ```

View File

@@ -1,29 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements. See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License. You may obtain a copy of the License at
- *
- *     http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-#banner {
-  height: 93px;
-  background: none;
-}
-#bannerLeft img {
-  margin-left: 30px;
-  margin-top: 10px;
-}
-#bannerRight img {
-  margin: 17px;
-}

View File

@@ -1,28 +0,0 @@
-<!--
-   Licensed under the Apache License, Version 2.0 (the "License");
-   you may not use this file except in compliance with the License.
-   You may obtain a copy of the License at
-     http://www.apache.org/licenses/LICENSE-2.0
-   Unless required by applicable law or agreed to in writing, software
-   distributed under the License is distributed on an "AS IS" BASIS,
-   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-   See the License for the specific language governing permissions and
-   limitations under the License. See accompanying LICENSE file.
--->
-<project name="Apache Hadoop ${project.version}">
-  <skin>
-    <groupId>org.apache.maven.skins</groupId>
-    <artifactId>maven-stylus-skin</artifactId>
-    <version>${maven-stylus-skin.version}</version>
-  </skin>
-  <body>
-    <links>
-      <item name="Apache Hadoop" href="http://hadoop.apache.org/"/>
-    </links>
-  </body>
-</project>