SUBMARINE-83. Refine the documents of submarine targeting 0.2.0 release. Contributed by Zhankun Tang.
parent 5565f2c532
commit 03aa70fe19
@@ -37,11 +37,12 @@
 \__________________________________________________________/ (_)
 ```
 
-Submarine is a project which allows infra engineer / data scientist to run *unmodified* Tensorflow programs on YARN.
+Submarine is a project which allows infra engineer / data scientist to run
+*unmodified* Tensorflow or PyTorch programs on YARN or Kubernetes.
 
 Goals of Submarine:
 - It allows jobs easy access data/models in HDFS and other storages.
-- Can launch services to serve Tensorflow/MXNet models.
+- Can launch services to serve Tensorflow/PyTorch models.
 - Support run distributed Tensorflow jobs with simple configs.
 - Support run user-specified Docker images.
 - Support specify GPU and other resources.
@@ -51,5 +52,3 @@ Goals of Submarine:
 Please jump to [QuickStart](src/site/markdown/QuickStart.md) guide to quickly understand how to use this framework.
 
 Please jump to [Examples](src/site/markdown/Examples.md) to try other examples like running Distributed Tensorflow Training for CIFAR 10.
-
-If you're a developer, please find [Developer](src/site/markdown/DeveloperGuide.md) guide for more details.
@@ -1,24 +0,0 @@
-<!---
-Licensed under the Apache License, Version 2.0 (the "License");
-you may not use this file except in compliance with the License.
-You may obtain a copy of the License at
-
-http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License. See accompanying LICENSE file.
--->
-
-# Developer Guide
-
-By default, Submarine uses YARN service framework as runtime. If you want to add your own implementation, you can add a new `RuntimeFactory` implementation and configure following option to `submarine.xml` (which should be placed under same `$HADOOP_CONF_DIR`)
-
-```
-<property>
-<name>submarine.runtime.class</name>
-<value>... full qualified class name for your runtime factory ... </value>
-</property>
-```
@@ -19,5 +19,3 @@ Here're some examples about Submarine usage.
 [Running Distributed CIFAR 10 Tensorflow Job](RunningDistributedCifar10TFJobs.html)
 
 [Running Standalone CIFAR 10 PyTorch Job](RunningSingleNodeCifar10PTJobs.html)
-
-[Running Zeppelin Notebook on YARN](RunningZeppelinOnYARN.html)
@@ -12,7 +12,8 @@
 limitations under the License. See accompanying LICENSE file.
 -->
 
-Submarine is a project which allows infra engineer / data scientist to run *unmodified* Tensorflow programs on YARN.
+Submarine is a project which allows infra engineer / data scientist to run
+*unmodified* Tensorflow or PyTorch programs on YARN or Kubernetes.
 
 Goals of Submarine:
 
@@ -43,6 +44,4 @@ Click below contents if you want to understand more.
 
 - [How to write Dockerfile for Submarine PyTorch jobs](WriteDockerfilePT.html)
 
-- [Developer guide](DeveloperGuide.html)
-
 - [Installation guides](HowToInstall.html)
@@ -18,7 +18,7 @@
 
 Must:
 
-- Apache Hadoop 3.1.x, YARN service enabled.
+- Apache Hadoop version newer than 2.7.3
 
 Optional:
 
@@ -37,6 +37,20 @@ For more details, please refer to:
 
 - [How to write Dockerfile for Submarine PyTorch jobs](WriteDockerfilePT.html)
 
+## Submarine runtimes
+After submarine 0.2.0, it supports two runtimes which are YARN native service
+runtime and Linkedin's TonY runtime. Each runtime can support both Tensorflow
+and Pytorch framework. And the user don't need to worry about the usage
+because the two runtime implements the same interface.
+
+To use the TonY runtime, please set below value in the submarine configuration.
+
+|Configuration Name | Description |
+|:---- |:---- |
+| `submarine.runtime.class` | org.apache.hadoop.yarn.submarine.runtimes.tony.TonyRuntimeFactory |
+
+For more details of TonY runtime, please check [TonY runtime guide](TonYRuntimeGuide.html)
+
 ## Run jobs
 
 ### Commandline options
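As a rough illustration of the hunk above, here is a sketch of how the `submarine.runtime.class` value from the table might be written in `submarine.xml`. The `<property>` layout is borrowed from the Developer Guide that this same commit removes, so its applicability to the TonY runtime is an assumption, not something the patch states.

```xml
<!-- Sketch only: assumes submarine.xml still accepts the <property> layout
     shown in the removed Developer Guide. -->
<property>
  <name>submarine.runtime.class</name>
  <value>org.apache.hadoop.yarn.submarine.runtimes.tony.TonyRuntimeFactory</value>
</property>
```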
@@ -164,7 +178,8 @@ See below screenshot:
 
 ![alt text](./images/tensorboard-service.png "Tensorboard service")
 
-If there is no hadoop client, we can also use the java command and the uber jar, hadoop-submarine-all-*.jar, to submit the job.
+After v0.2.0, if there is no hadoop client, we can also use the java command
+and the uber jar, hadoop-submarine-all-*.jar, to submit the job.
 
 ```
 java -cp /path-to/hadoop-conf:/path-to/hadoop-submarine-all-*.jar \
@@ -1,37 +0,0 @@
-<!---
-Licensed under the Apache License, Version 2.0 (the "License");
-you may not use this file except in compliance with the License.
-You may obtain a copy of the License at
-
-http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License. See accompanying LICENSE file.
--->
-
-# Running Zeppelin Notebook On Submarine
-
-This is a simple example about how to run Zeppelin notebook by using Submarine.
-
-## Step 1: Build Docker Image
-
-Go to `src/main/docker/zeppelin-notebook-example`, build the Docker image. Or you can use the prebuilt one: `hadoopsubmarine/zeppelin-on-yarn-gpu:0.0.1`
-
-## Step 2: Launch the notebook on YARN
-
-Submit command to YARN:
-
-`yarn app -destroy zeppelin-notebook;
-yarn jar path-to/hadoop-yarn-applications-submarine-3.2.0-SNAPSHOT.jar \
-job run --name zeppelin-notebook \
---docker_image hadoopsubmarine/zeppelin-on-yarn-gpu:0.0.1 \
---worker_resources memory=8G,vcores=2,gpu=1 \
---num_workers 1 \
--worker_launch_cmd "/usr/local/bin/run_container.sh"`
-
-Once the container got launched, you can go to `YARN services` UI page, access the `zeppelin-notebook` job, and go to the quicklink `notebook` by clicking `...`.
-
-The notebook is secured by admin/admin user name and password.
@@ -247,16 +247,16 @@ CLASSPATH=$(hadoop classpath --glob): \
 /home/pi/hadoop/TonY/tony-cli/build/libs/tony-cli-0.3.2-all.jar \
 
 java org.apache.hadoop.yarn.submarine.client.cli.Cli job run --name tf-job-001 \
---framework tensorflow \
 --num_workers 2 \
 --worker_resources memory=3G,vcores=2 \
 --num_ps 2 \
 --ps_resources memory=3G,vcores=2 \
 --worker_launch_cmd "venv.zip/venv/bin/python mnist_distributed.py" \
 --ps_launch_cmd "venv.zip/venv/bin/python mnist_distributed.py" \
---insecure
+--insecure \
 --conf tony.containers.resources=PATH_TO_VENV_YOU_CREATED/venv.zip#archive,PATH_TO_MNIST_EXAMPLE/mnist_distributed.py, \
-PATH_TO_TONY_CLI_JAR/tony-cli-0.3.2-all.jar
+PATH_TO_TONY_CLI_JAR/tony-cli-0.3.2-all.jar \
+--conf tony.application.framework=pytorch
 
 ```
 You should then be able to see links and status of the jobs from command line:
@@ -284,7 +284,6 @@ CLASSPATH=$(hadoop classpath --glob): \
 /home/pi/hadoop/TonY/tony-cli/build/libs/tony-cli-0.3.2-all.jar \
 
 java org.apache.hadoop.yarn.submarine.client.cli.Cli job run --name tf-job-001 \
---framework tensorflow \
 --docker_image hadoopsubmarine/tf-1.8.0-cpu:0.0.3 \
 --input_path hdfs://pi-aw:9000/dataset/cifar-10-data \
 --worker_resources memory=3G,vcores=2 \
@@ -297,5 +296,6 @@ java org.apache.hadoop.yarn.submarine.client.cli.Cli job run --name tf-job-001 \
 --env HADOOP_COMMON_HOME=/hadoop-3.1.0 \
 --env HADOOP_HDFS_HOME=/hadoop-3.1.0 \
 --env HADOOP_CONF_DIR=/hadoop-3.1.0/etc/hadoop \
---conf tony.containers.resources=--conf tony.containers.resources=/home/pi/hadoop/TonY/tony-cli/build/libs/tony-cli-0.3.2-all.jar
+--conf tony.containers.resources=PATH_TO_TONY_CLI_JAR/tony-cli-0.3.2-all.jar \
+--conf tony.application.framework=pytorch
 ```
@@ -1,29 +0,0 @@
-/*
-* Licensed to the Apache Software Foundation (ASF) under one or more
-* contributor license agreements. See the NOTICE file distributed with
-* this work for additional information regarding copyright ownership.
-* The ASF licenses this file to You under the Apache License, Version 2.0
-* (the "License"); you may not use this file except in compliance with
-* the License. You may obtain a copy of the License at
-*
-* http://www.apache.org/licenses/LICENSE-2.0
-*
-* Unless required by applicable law or agreed to in writing, software
-* distributed under the License is distributed on an "AS IS" BASIS,
-* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-* See the License for the specific language governing permissions and
-* limitations under the License.
-*/
-#banner {
-height: 93px;
-background: none;
-}
-
-#bannerLeft img {
-margin-left: 30px;
-margin-top: 10px;
-}
-
-#bannerRight img {
-margin: 17px;
-}
@@ -1,28 +0,0 @@
-<!--
-Licensed under the Apache License, Version 2.0 (the "License");
-you may not use this file except in compliance with the License.
-You may obtain a copy of the License at
-
-http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License. See accompanying LICENSE file.
--->
-<project name="Apache Hadoop ${project.version}">
-
-<skin>
-<groupId>org.apache.maven.skins</groupId>
-<artifactId>maven-stylus-skin</artifactId>
-<version>${maven-stylus-skin.version}</version>
-</skin>
-
-<body>
-<links>
-<item name="Apache Hadoop" href="http://hadoop.apache.org/"/>
-</links>
-</body>
-
-</project>