
Overview

              _                              _
             | |                            (_)
  ___  _   _ | |__   _ __ ___    __ _  _ __  _  _ __    ___
 / __|| | | || '_ \ | '_ ` _ \  / _` || '__|| || '_ \  / _ \
 \__ \| |_| || |_) || | | | | || (_| || |   | || | | ||  __/
 |___/ \__,_||_.__/ |_| |_| |_| \__,_||_|   |_||_| |_| \___|

                             ?
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~|^"~~~~~~~~~~~~~~~~~~~~~~~~~o~~~~~~~~~~~
        o                   |                  o      __o
         o                  |                 o     |X__>
       ___o                 |                __o
     (X___>--             __|__            |X__>     o
                         |     \                   __o
                         |      \                |X__>
  _______________________|_______\________________
 <                                                \____________   _
  \                                                            \ (_)
   \    O       O       O                                       >=)
    \__________________________________________________________/ (_)

Submarine is a project that allows infrastructure engineers and data scientists to run unmodified TensorFlow or PyTorch programs on YARN or Kubernetes.

Goals of Submarine:

  • Give jobs easy access to data and models in HDFS and other storage systems.
  • Launch services to serve TensorFlow/PyTorch models.
  • Run distributed TensorFlow jobs with simple configuration.
  • Run user-specified Docker images.
  • Specify GPUs and other resources.
  • Launch TensorBoard for training jobs when the user requests it.
  • Support customized DNS names for roles (like tensorboard.$user.$domain:6006).
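
As a rough illustration of the DNS naming pattern in the last goal, the sketch below expands `tensorboard.$user.$domain:6006` for a given user and domain. It is not Submarine code: the helper `role_endpoint`, the user `alice`, and the domain `example.com` are hypothetical names chosen for this example, and the only details taken from this document are the `role.user.domain:port` shape and the port 6006.

```python
# Hypothetical helper, not part of Submarine: it only illustrates how a
# pattern such as tensorboard.$user.$domain:6006 expands to a host:port string.

def role_endpoint(role: str, user: str, domain: str, port: int = 6006) -> str:
    """Build a '<role>.<user>.<domain>:<port>' string for a job role."""
    for part in (role, user, domain):
        if not part:
            raise ValueError("role, user and domain must be non-empty")
    return f"{role}.{user}.{domain}:{port}"


# Example values are assumptions made for illustration only.
print(role_endpoint("tensorboard", "alice", "example.com"))
```

With a predictable name like this, a user could reach the TensorBoard of a training job without first looking up which host the container landed on.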

See the QuickStart guide to quickly learn how to use this framework.

See Examples for more examples, such as running distributed TensorFlow training for CIFAR-10.