HADOOP-13349. HADOOP_CLASSPATH vs HADOOP_USER_CLASSPATH (aw)

2016-07-07 07:55:02 -07:00 · 2016-07-07 07:55:02 -07:00 · a0035661c1
commit a0035661c1
parent ab092c56c2
3 changed files with 30 additions and 25 deletions
--- a/hadoop-common-project/hadoop-common/src/main/conf/hadoop-env.sh
+++ b/hadoop-common-project/hadoop-common/src/main/conf/hadoop-env.sh
@@ -115,29 +115,34 @@ esac
 #
 # A note about classpaths.
 #
-# The classpath is configured such that entries are stripped prior
-# to handing to Java based either upon duplication or non-existence.
-# Wildcards and/or directories are *NOT* expanded as the
-# de-duplication is fairly simple.  So if two directories are in
-# the classpath that both contain awesome-methods-1.0.jar,
-# awesome-methods-1.0.jar will still be seen by java.  But if
-# the classpath specifically has awesome-methods-1.0.jar from the
-# same directory listed twice, the last one will be removed.
-#
+# By default, Apache Hadoop overrides Java's CLASSPATH
+# environment variable.  It is configured such
+# that it starts out blank with new entries added after passing
+# a series of checks (file/dir exists, not already listed aka
+# de-duplication).  During de-duplication, wildcards and/or
+# directories are *NOT* expanded to keep it simple. Therefore,
+# if the computed classpath has two specific mentions of
+# awesome-methods-1.0.jar, only the first one added will be seen.
+# If two directories are in the classpath that both contain
+# awesome-methods-1.0.jar, then Java will pick up both versions.

-# An additional, custom CLASSPATH.  This is really meant for
-# end users, but as an administrator, one might want to push
-# something extra in here too, such as the jar to the topology
-# method.  Just be sure to append to the existing HADOOP_USER_CLASSPATH
-# so end users have a way to add stuff.
-# export HADOOP_USER_CLASSPATH="/some/cool/path/on/your/machine"
+# An additional, custom CLASSPATH. Site-wide configs should be
+# handled via the shellprofile functionality, utilizing the
+# hadoop_add_classpath function, which offers greater control and
+# is much harder for apps/end-users to accidentally override.
+# Similarly, end users should utilize ${HOME}/.hadooprc .
+# This variable should ideally only be used as a short-cut,
+# interactive way for temporary additions on the command line.
+# export HADOOP_CLASSPATH="/some/cool/path/on/your/machine"

-# Should HADOOP_USER_CLASSPATH be first in the official CLASSPATH?
+# Should HADOOP_CLASSPATH be first in the official CLASSPATH?
 # export HADOOP_USER_CLASSPATH_FIRST="yes"

-# If HADOOP_USE_CLIENT_CLASSLOADER is set, HADOOP_CLASSPATH along with the main
-# jar are handled by a separate isolated client classloader. If it is set,
-# HADOOP_USER_CLASSPATH_FIRST is ignored. Can be defined by doing
+# If HADOOP_USE_CLIENT_CLASSLOADER is set, the classpath along
+# with the main jar are handled by a separate isolated
+# client classloader when 'hadoop jar', 'yarn jar', or 'mapred job'
+# is utilized. If it is set, HADOOP_CLASSPATH and
+# HADOOP_USER_CLASSPATH_FIRST are ignored.
 # export HADOOP_USE_CLIENT_CLASSLOADER=true

 # HADOOP_CLIENT_CLASSLOADER_SYSTEM_CLASSES overrides the default definition of
--- a/hadoop-common-project/hadoop-common/src/site/markdown/UnixShellGuide.md
+++ b/hadoop-common-project/hadoop-common/src/site/markdown/UnixShellGuide.md
@@ -32,12 +32,14 @@ HADOOP_CLIENT_OPTS="-Xmx1g -Dhadoop.socks.server=localhost:4000" hadoop fs -ls /

 will increase the memory and send this command via a SOCKS proxy server.

-### `HADOOP_USER_CLASSPATH`
+### `HADOOP_CLASSPATH`
+
+  NOTE: Site-wide settings should be configured via a shellprofile entry and permanent user-wide settings should be configured via ${HOME}/.hadooprc using the `hadoop_add_classpath` function. See below for more information.

 The Apache Hadoop scripts have the capability to inject more content into the classpath of the running command by setting this environment variable.  It should be a colon delimited list of directories, files, or wildcard locations.

 ```bash
-HADOOP_USER_CLASSPATH=${HOME}/lib/myjars/*.jar hadoop classpath
+HADOOP_CLASSPATH=${HOME}/lib/myjars/*.jar hadoop classpath
 ```

 A user can provide hints to the location of the paths via the `HADOOP_USER_CLASSPATH_FIRST` variable.  Setting this to any value will tell the system to try and push these paths near the front.
@@ -53,8 +55,6 @@ For example:
 # my custom Apache Hadoop settings!
 #

-HADOOP_USER_CLASSPATH=${HOME}/hadoopjars/*
-HADOOP_USER_CLASSPATH_FIRST=yes
 HADOOP_CLIENT_OPTS="-Xmx1g"
 ```

--- a/hadoop-yarn-project/hadoop-yarn/bin/yarn-config.sh
+++ b/hadoop-yarn-project/hadoop-yarn/bin/yarn-config.sh
@@ -56,10 +56,10 @@ function hadoop_subproject_init
  HADOOP_YARN_HOME="${HADOOP_YARN_HOME:-$HADOOP_HOME}"

  # YARN-1429 added the completely superfluous YARN_USER_CLASSPATH
-  # env var.  We're going to override HADOOP_USER_CLASSPATH to keep
+  # env var.  We're going to override HADOOP_CLASSPATH to keep
  # consistency with the rest of the duplicate/useless env vars

-  hadoop_deprecate_envvar YARN_USER_CLASSPATH HADOOP_USER_CLASSPATH
+  hadoop_deprecate_envvar YARN_USER_CLASSPATH HADOOP_CLASSPATH

  hadoop_deprecate_envvar YARN_USER_CLASSPATH_FIRST HADOOP_USER_CLASSPATH_FIRST
 }
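
The classpath note added to hadoop-env.sh above describes a classpath that starts out blank and gains entries only after an existence check and a literal de-duplication check, with no wildcard or directory expansion. The sketch below illustrates that behavior in isolation. It is not Hadoop's `hadoop_add_classpath` implementation: the function name `add_entry`, the variable `CP`, and the temporary directory layout are all hypothetical, and the handling of `/*` entries (checking that the parent directory exists) is an assumption made for the example.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the checks described in the hadoop-env.sh comment.
# This is NOT the real hadoop_add_classpath; names here are illustrative only.

CP=""   # the computed classpath starts out blank

add_entry() {
  local entry="$1"
  local target="$entry"

  # Assumption: for a wildcard entry such as /dir/*, test the directory
  # itself. The wildcard is never expanded into individual jars.
  if [[ "$entry" == *"/*" ]]; then
    target=${entry%"/*"}
  fi

  # Check 1: the file or directory must exist, otherwise skip the entry.
  [[ -e "$target" ]] || return 0

  # Check 2: skip entries already listed. This is a literal string match,
  # so the same jar reached through two different paths is not detected.
  case ":${CP}:" in
    *":${entry}:"*) return 0 ;;
  esac

  # Append, adding a colon separator only when CP is non-empty.
  CP="${CP:+${CP}:}${entry}"
}

# Demonstration with two directories that hold the same jar name.
demo_dir="$(mktemp -d)"
mkdir -p "${demo_dir}/a" "${demo_dir}/b"
touch "${demo_dir}/a/awesome-methods-1.0.jar" "${demo_dir}/b/awesome-methods-1.0.jar"

add_entry "${demo_dir}/a/awesome-methods-1.0.jar"
add_entry "${demo_dir}/a/awesome-methods-1.0.jar"   # exact duplicate: dropped
add_entry "${demo_dir}/missing.jar"                 # does not exist: dropped
add_entry "${demo_dir}/a/*"                         # kept: not a literal duplicate
add_entry "${demo_dir}/b/*"                         # kept: Java sees both jar copies

echo "${CP}"
```

Because the comparison is purely textual, the explicit jar path appears once, while both wildcard directories remain, which matches the note's point that Java will still pick up both copies of `awesome-methods-1.0.jar` when two directories contain it.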