SUBMARINE-83. Refine the documents of submarine targeting 0.2.0 release. Contributed by Zhankun Tang.
This commit is contained in:
parent
5565f2c532
commit
03aa70fe19
@@ -37,11 +37,12 @@
 \__________________________________________________________/ (_)
 ```
 
-Submarine is a project which allows infra engineer / data scientist to run *unmodified* Tensorflow programs on YARN.
+Submarine is a project which allows infra engineer / data scientist to run
+*unmodified* Tensorflow or PyTorch programs on YARN or Kubernetes.
 
 Goals of Submarine:
 - It allows jobs easy access data/models in HDFS and other storages.
-- Can launch services to serve Tensorflow/MXNet models.
+- Can launch services to serve Tensorflow/PyTorch models.
 - Support run distributed Tensorflow jobs with simple configs.
 - Support run user-specified Docker images.
 - Support specify GPU and other resources.
@@ -51,5 +52,3 @@ Goals of Submarine:
 Please jump to [QuickStart](src/site/markdown/QuickStart.md) guide to quickly understand how to use this framework.
 
 Please jump to [Examples](src/site/markdown/Examples.md) to try other examples like running Distributed Tensorflow Training for CIFAR 10.
-
-If you're a developer, please find [Developer](src/site/markdown/DeveloperGuide.md) guide for more details.
@@ -1,24 +0,0 @@
-<!---
-  Licensed under the Apache License, Version 2.0 (the "License");
-  you may not use this file except in compliance with the License.
-  You may obtain a copy of the License at
-
-   http://www.apache.org/licenses/LICENSE-2.0
-
-  Unless required by applicable law or agreed to in writing, software
-  distributed under the License is distributed on an "AS IS" BASIS,
-  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-  See the License for the specific language governing permissions and
-  limitations under the License. See accompanying LICENSE file.
--->
-
-# Developer Guide
-
-By default, Submarine uses YARN service framework as runtime. If you want to add your own implementation, you can add a new `RuntimeFactory` implementation and configure following option to `submarine.xml` (which should be placed under same `$HADOOP_CONF_DIR`)
-
-```
-<property>
-  <name>submarine.runtime.class</name>
-  <value>... full qualified class name for your runtime factory ... </value>
-</property>
-```
@@ -19,5 +19,3 @@ Here're some examples about Submarine usage.
 [Running Distributed CIFAR 10 Tensorflow Job](RunningDistributedCifar10TFJobs.html)
 
 [Running Standalone CIFAR 10 PyTorch Job](RunningSingleNodeCifar10PTJobs.html)
-
-[Running Zeppelin Notebook on YARN](RunningZeppelinOnYARN.html)
@@ -12,7 +12,8 @@
 limitations under the License. See accompanying LICENSE file.
 -->
 
-Submarine is a project which allows infra engineer / data scientist to run *unmodified* Tensorflow programs on YARN.
+Submarine is a project which allows infra engineer / data scientist to run
+*unmodified* Tensorflow or PyTorch programs on YARN or Kubernetes.
 
 Goals of Submarine:
 
@@ -43,6 +44,4 @@ Click below contents if you want to understand more.
 
 - [How to write Dockerfile for Submarine PyTorch jobs](WriteDockerfilePT.html)
 
-- [Developer guide](DeveloperGuide.html)
-
 - [Installation guides](HowToInstall.html)
@@ -18,7 +18,7 @@
 
 Must:
 
-- Apache Hadoop 3.1.x, YARN service enabled.
+- Apache Hadoop version newer than 2.7.3
 
 Optional:
 
@@ -37,6 +37,20 @@ For more details, please refer to:
 
 - [How to write Dockerfile for Submarine PyTorch jobs](WriteDockerfilePT.html)
 
+## Submarine runtimes
+
+Since Submarine 0.2.0, two runtimes are supported: the YARN native service
+runtime and Linkedin's TonY runtime. Each runtime supports both the Tensorflow
+and PyTorch frameworks, and users don't need to change how they submit jobs,
+because the two runtimes implement the same interface.
+
+To use the TonY runtime, please set the below value in the submarine configuration.
+
+|Configuration Name | Description |
+|:---- |:---- |
+| `submarine.runtime.class` | org.apache.hadoop.yarn.submarine.runtimes.tony.TonyRuntimeFactory |
+
+For more details of the TonY runtime, please check the [TonY runtime guide](TonYRuntimeGuide.html).
 
 ## Run jobs
 
 ### Commandline options
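For illustration, a minimal `submarine.xml` switching to the TonY runtime might look like the sketch below. The property name and value come from the table above; the `<configuration>` root element is an assumption, following the usual Hadoop config-file convention, and the file is assumed to live under `$HADOOP_CONF_DIR`.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Assumed Hadoop-style config layout: switch Submarine from the
       default YARN native service runtime to Linkedin's TonY runtime. -->
  <property>
    <name>submarine.runtime.class</name>
    <value>org.apache.hadoop.yarn.submarine.runtimes.tony.TonyRuntimeFactory</value>
  </property>
</configuration>
```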
@@ -164,7 +178,8 @@ See below screenshot:
 
 ![alt text](./images/tensorboard-service.png "Tensorboard service")
 
-If there is no hadoop client, we can also use the java command and the uber jar, hadoop-submarine-all-*.jar, to submit the job.
+After v0.2.0, if there is no hadoop client, we can also use the java command
+and the uber jar, hadoop-submarine-all-*.jar, to submit the job.
 
 ```
 java -cp /path-to/hadoop-conf:/path-to/hadoop-submarine-all-*.jar \
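Putting the pieces of this hunk together, a full uber-jar submission might look like the sketch below. It is illustrative only: the paths are placeholders, the flag values are borrowed from the other examples in this document, and the command requires a live YARN cluster to actually run.

```
# Illustrative sketch (placeholder paths; needs a running YARN cluster).
# The Hadoop conf dir plus the uber jar stand in for a full hadoop client.
java -cp /path-to/hadoop-conf:/path-to/hadoop-submarine-all-*.jar \
  org.apache.hadoop.yarn.submarine.client.cli.Cli job run \
  --name tf-job-001 \
  --num_workers 2 \
  --worker_resources memory=3G,vcores=2 \
  --worker_launch_cmd "venv.zip/venv/bin/python mnist_distributed.py"
```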
@@ -1,37 +0,0 @@
-<!---
-  Licensed under the Apache License, Version 2.0 (the "License");
-  you may not use this file except in compliance with the License.
-  You may obtain a copy of the License at
-
-   http://www.apache.org/licenses/LICENSE-2.0
-
-  Unless required by applicable law or agreed to in writing, software
-  distributed under the License is distributed on an "AS IS" BASIS,
-  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-  See the License for the specific language governing permissions and
-  limitations under the License. See accompanying LICENSE file.
--->
-
-# Running Zeppelin Notebook On Submarine
-
-This is a simple example about how to run Zeppelin notebook by using Submarine.
-
-## Step 1: Build Docker Image
-
-Go to `src/main/docker/zeppelin-notebook-example`, build the Docker image. Or you can use the prebuilt one: `hadoopsubmarine/zeppelin-on-yarn-gpu:0.0.1`
-
-## Step 2: Launch the notebook on YARN
-
-Submit command to YARN:
-
-`yarn app -destroy zeppelin-notebook;
-yarn jar path-to/hadoop-yarn-applications-submarine-3.2.0-SNAPSHOT.jar \
- job run --name zeppelin-notebook \
- --docker_image hadoopsubmarine/zeppelin-on-yarn-gpu:0.0.1 \
- --worker_resources memory=8G,vcores=2,gpu=1 \
- --num_workers 1 \
- -worker_launch_cmd "/usr/local/bin/run_container.sh"`
-
-Once the container got launched, you can go to `YARN services` UI page, access the `zeppelin-notebook` job, and go to the quicklink `notebook` by clicking `...`.
-
-The notebook is secured by admin/admin user name and password.
@@ -247,16 +247,16 @@ CLASSPATH=$(hadoop classpath --glob): \
 /home/pi/hadoop/TonY/tony-cli/build/libs/tony-cli-0.3.2-all.jar \
 
 java org.apache.hadoop.yarn.submarine.client.cli.Cli job run --name tf-job-001 \
---framework tensorflow \
 --num_workers 2 \
 --worker_resources memory=3G,vcores=2 \
 --num_ps 2 \
 --ps_resources memory=3G,vcores=2 \
 --worker_launch_cmd "venv.zip/venv/bin/python mnist_distributed.py" \
 --ps_launch_cmd "venv.zip/venv/bin/python mnist_distributed.py" \
---insecure
+--insecure \
 --conf tony.containers.resources=PATH_TO_VENV_YOU_CREATED/venv.zip#archive,PATH_TO_MNIST_EXAMPLE/mnist_distributed.py, \
-PATH_TO_TONY_CLI_JAR/tony-cli-0.3.2-all.jar
+PATH_TO_TONY_CLI_JAR/tony-cli-0.3.2-all.jar \
+--conf tony.application.framework=pytorch
 
 ```
 You should then be able to see links and status of the jobs from command line:
@@ -284,7 +284,6 @@ CLASSPATH=$(hadoop classpath --glob): \
 /home/pi/hadoop/TonY/tony-cli/build/libs/tony-cli-0.3.2-all.jar \
 
 java org.apache.hadoop.yarn.submarine.client.cli.Cli job run --name tf-job-001 \
---framework tensorflow \
 --docker_image hadoopsubmarine/tf-1.8.0-cpu:0.0.3 \
 --input_path hdfs://pi-aw:9000/dataset/cifar-10-data \
 --worker_resources memory=3G,vcores=2 \
@@ -297,5 +296,6 @@ java org.apache.hadoop.yarn.submarine.client.cli.Cli job run --name tf-job-001 \
 --env HADOOP_COMMON_HOME=/hadoop-3.1.0 \
 --env HADOOP_HDFS_HOME=/hadoop-3.1.0 \
 --env HADOOP_CONF_DIR=/hadoop-3.1.0/etc/hadoop \
---conf tony.containers.resources=--conf tony.containers.resources=/home/pi/hadoop/TonY/tony-cli/build/libs/tony-cli-0.3.2-all.jar
+--conf tony.containers.resources=PATH_TO_TONY_CLI_JAR/tony-cli-0.3.2-all.jar \
+--conf tony.application.framework=pytorch
 ```
@@ -1,29 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements. See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License. You may obtain a copy of the License at
- *
- *     http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-#banner {
-  height: 93px;
-  background: none;
-}
-
-#bannerLeft img {
-  margin-left: 30px;
-  margin-top: 10px;
-}
-
-#bannerRight img {
-  margin: 17px;
-}
@@ -1,28 +0,0 @@
-<!--
-  Licensed under the Apache License, Version 2.0 (the "License");
-  you may not use this file except in compliance with the License.
-  You may obtain a copy of the License at
-
-   http://www.apache.org/licenses/LICENSE-2.0
-
-  Unless required by applicable law or agreed to in writing, software
-  distributed under the License is distributed on an "AS IS" BASIS,
-  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-  See the License for the specific language governing permissions and
-  limitations under the License. See accompanying LICENSE file.
--->
-<project name="Apache Hadoop ${project.version}">
-
-  <skin>
-    <groupId>org.apache.maven.skins</groupId>
-    <artifactId>maven-stylus-skin</artifactId>
-    <version>${maven-stylus-skin.version}</version>
-  </skin>
-
-  <body>
-    <links>
-      <item name="Apache Hadoop" href="http://hadoop.apache.org/"/>
-    </links>
-  </body>
-
-</project>