This PR enables one to create the Hadoop release tarball on Windows, complete with the
native binaries (including winutils.exe). This PR contains the following changes:
* Prevents splitting during array element expansion - this is needed to pass the
  arguments correctly to Maven.
* Installs Python 3.11.8 and pip in the Windows Docker image used for building Hadoop.
* pom file changes to get Maven to invoke the releasedocmaker script through bash.exe
  on Windows.

Build instructions for Hadoop

----------------------------------------------------------------------------------
Requirements:

* Unix System
* JDK 1.8
* Maven 3.3 or later
* Boost 1.72 (if compiling native code)
* Protocol Buffers 3.21.12 (if compiling native code)
* CMake 3.19 or newer (if compiling native code)
* Zlib devel (if compiling native code)
* Cyrus SASL devel (if compiling native code)
* One of the compilers that support thread_local storage: GCC 9.3.0 or later, Visual Studio,
  Clang (community version), Clang (version for iOS 9 and later) (if compiling native code)
* openssl devel (if compiling native hadoop-pipes and to get the best HDFS encryption performance)
* Linux FUSE (Filesystem in Userspace) version 2.6 or above (if compiling fuse_dfs)
* Doxygen (if compiling libhdfspp and generating the documents)
* Internet connection for first build (to fetch all Maven and Hadoop dependencies)
* python (for releasedocs)
* bats (for shell code testing)
* Node.js / bower / Ember-cli (for YARN UI v2 building)

----------------------------------------------------------------------------------

The easiest way to get an environment with all the appropriate tools is by means
of the provided Docker config.
This requires a recent version of docker (1.4.1 and higher are known to work).

On Linux / Mac:
    Install Docker and run this command:

    $ ./start-build-env.sh

The prompt which is then presented is located at a mounted version of the source tree,
and all required tools for testing and building have been installed and configured.

Note that from within this Docker environment you ONLY have access to the Hadoop source
tree from where you started. So if you need to run
    dev-support/bin/test-patch /path/to/my.patch
then the patch must be placed inside the Hadoop source tree.

Known issues:
- On Mac with Boot2Docker the performance on the mounted directory is currently extremely slow.
  This is a known problem related to boot2docker on the Mac.
  See:
    https://github.com/boot2docker/boot2docker/issues/593
  This issue has been resolved as a duplicate, and they point to a new feature for utilizing NFS mounts
  as the proposed solution:
    https://github.com/boot2docker/boot2docker/issues/64
  An alternative solution to this problem is to install Linux natively inside a virtual machine
  and run your IDE and Docker etc. inside that VM.

----------------------------------------------------------------------------------
Installing required packages for clean install of Ubuntu 18.04 LTS Desktop.
(For Ubuntu 20.04, gcc/g++ and cmake bundled with Ubuntu can be used.
Refer to dev-support/docker/Dockerfile):

* Open JDK 1.8
  $ sudo apt-get update
  $ sudo apt-get -y install openjdk-8-jdk
* Maven
  $ sudo apt-get -y install maven
* Native libraries
  $ sudo apt-get -y install build-essential autoconf automake libtool cmake zlib1g-dev pkg-config libssl-dev libsasl2-dev
* GCC 9.3.0
  $ sudo apt-get -y install software-properties-common
  $ sudo add-apt-repository -y ppa:ubuntu-toolchain-r/test
  $ sudo apt-get update
  $ sudo apt-get -y install g++-9 gcc-9
  $ sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-9 60 --slave /usr/bin/g++ g++ /usr/bin/g++-9
* CMake 3.19
  $ curl -L https://cmake.org/files/v3.19/cmake-3.19.0.tar.gz > cmake-3.19.0.tar.gz
  $ tar -zxvf cmake-3.19.0.tar.gz && cd cmake-3.19.0
  $ ./bootstrap
  $ make -j$(nproc)
  $ sudo make install
* Protocol Buffers 3.21.12 (required to build native code)
  $ curl -L https://github.com/protocolbuffers/protobuf/archive/refs/tags/v3.21.12.tar.gz > protobuf-3.21.12.tar.gz
  $ tar -zxvf protobuf-3.21.12.tar.gz && cd protobuf-3.21.12
  $ ./autogen.sh
  $ ./configure
  $ make -j$(nproc)
  $ sudo make install
* Boost
  $ curl -L https://sourceforge.net/projects/boost/files/boost/1.72.0/boost_1_72_0.tar.bz2/download > boost_1_72_0.tar.bz2
  $ tar --bzip2 -xf boost_1_72_0.tar.bz2 && cd boost_1_72_0
  $ ./bootstrap.sh --prefix=/usr/
  $ ./b2 --without-python
  $ sudo ./b2 --without-python install

Optional packages:

* Snappy compression (only used for hadoop-mapreduce-client-nativetask)
  $ sudo apt-get install libsnappy-dev
* Intel ISA-L library for erasure coding
  Please refer to https://01.org/intel%C2%AE-storage-acceleration-library-open-source-version
  (OR https://github.com/01org/isa-l)
* Bzip2
  $ sudo apt-get install bzip2 libbz2-dev
* Linux FUSE
  $ sudo apt-get install fuse libfuse-dev
* ZStandard compression
  $ sudo apt-get install libzstd1-dev
* PMDK library for storage class memory (SCM) as HDFS cache backend
  Please refer to http://pmem.io/ and https://github.com/pmem/pmdk

----------------------------------------------------------------------------------

Maven main modules:

  hadoop                              (Main Hadoop project)
    - hadoop-project                  (Parent POM for all Hadoop Maven modules.
                                       All plugins & dependencies versions are defined here.)
    - hadoop-project-dist             (Parent POM for modules that generate distributions.)
    - hadoop-annotations              (Generates the Hadoop doclet used to generate the Javadocs)
    - hadoop-assemblies               (Maven assemblies used by the different modules)
    - hadoop-maven-plugins            (Maven plugins used in project)
    - hadoop-build-tools              (Build tools like checkstyle, etc.)
    - hadoop-common-project           (Hadoop Common)
    - hadoop-hdfs-project             (Hadoop HDFS)
    - hadoop-yarn-project             (Hadoop YARN)
    - hadoop-mapreduce-project        (Hadoop MapReduce)
    - hadoop-tools                    (Hadoop tools like Streaming, Distcp, etc.)
    - hadoop-dist                     (Hadoop distribution assembler)
    - hadoop-client-modules           (Hadoop client modules)
    - hadoop-minicluster              (Hadoop minicluster artifacts)
    - hadoop-cloud-storage-project    (Generates artifacts to access cloud storage like aws, azure, etc.)

----------------------------------------------------------------------------------

Where to run Maven from?

  It can be run from any module. The only catch is that if Maven is not run from
  the top level (trunk), all modules that are not part of the build must already
  be installed in the local Maven cache or available in a Maven repository.

----------------------------------------------------------------------------------

Maven build goals:

 * Clean                     : mvn clean [-Preleasedocs]
 * Compile                   : mvn compile [-Pnative]
 * Run tests                 : mvn test [-Pnative] [-Pshelltest]
 * Create JAR                : mvn package
 * Run spotbugs              : mvn compile spotbugs:spotbugs
 * Run checkstyle            : mvn compile checkstyle:checkstyle
 * Install JAR in M2 cache   : mvn install
 * Deploy JAR to Maven repo  : mvn deploy
 * Run clover                : mvn test -Pclover
 * Run Rat                   : mvn apache-rat:check
 * Build javadocs            : mvn javadoc:javadoc
 * Build distribution        : mvn package [-Pdist][-Pdocs][-Psrc][-Pnative][-Dtar][-Preleasedocs][-Pyarn-ui]
 * Change Hadoop version     : mvn versions:set -DnewVersion=NEWVERSION

Build options:

 * Use -Pnative to compile/bundle native code
 * Use -Pdocs to generate & bundle the documentation in the distribution (using -Pdist)
 * Use -Psrc to create a project source TAR.GZ
 * Use -Dtar to create a TAR with the distribution (using -Pdist)
 * Use -Preleasedocs to include the changelog and release docs (requires Internet connectivity)
 * Use -Pyarn-ui to build YARN UI v2 (requires Internet connectivity)
 * Use -DskipShade to disable client jar shading to speed up build times (in
   development environments only, not to build release artifacts)
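
 As an example of combining these options, the following (illustrative) invocation
 builds a native binary tarball while skipping tests and client jar shading, which
 is suitable for a development build only:

 $ mvn clean package -Pdist,native -DskipTests -DskipShade -Dtar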

YARN Application Timeline Service V2 build options:

 YARN Timeline Service v.2 chooses Apache HBase as the primary backing storage. The supported
 versions of Apache HBase are 1.7.1 (default) and 2.2.4.

 * HBase 1.7.1 is used by default to build Hadoop. The official releases are ready to use if you
   plan on running Timeline Service v2 with HBase 1.7.1.

 * Use -Dhbase.profile=2.0 to build Hadoop with HBase 2.2.4. Provide this option if you plan
   on running Timeline Service v2 with HBase 2.x.
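
 For instance, a build and install targeting Timeline Service v2 on HBase 2.x might
 look like this (using only the flag described above):

 $ mvn clean install -DskipTests -Dhbase.profile=2.0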


Snappy build options:

 Snappy is a compression library that can be utilized by the native code.
 It is currently an optional component, meaning that Hadoop can be built with
 or without this dependency. The Snappy library, as an optional dependency, is
 only used for hadoop-mapreduce-client-nativetask.

 * Use -Drequire.snappy to fail the build if libsnappy.so is not found.
   If this option is not specified and the snappy library is missing,
   we silently build a version of libhadoop.so that cannot make use of snappy.
   This option is recommended if you plan on making use of snappy and want
   to get more repeatable builds.
 * Use -Dsnappy.prefix to specify a nonstandard location for the libsnappy
   header files and library files. You do not need this option if you have
   installed snappy using a package manager.
 * Use -Dsnappy.lib to specify a nonstandard location for the libsnappy library
   files. Similarly to snappy.prefix, you do not need this option if you have
   installed snappy using a package manager.
 * Use -Dbundle.snappy to copy the contents of the snappy.lib directory into
   the final tar file. This option requires that -Dsnappy.lib is also given,
   and it ignores the -Dsnappy.prefix option. If -Dsnappy.lib isn't given, the
   bundling and building will fail.
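
 For instance, to require snappy and bundle a copy installed under a custom prefix
 (/opt/snappy here is an illustrative path, not a default):

 $ mvn package -Pdist,native -DskipTests -Dtar \
     -Drequire.snappy -Dsnappy.lib=/opt/snappy/lib -Dbundle.snappy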


ZStandard build options:

 ZStandard is a compression library that can be utilized by the native code.
 It is currently an optional component, meaning that Hadoop can be built with
 or without this dependency.

 * Use -Drequire.zstd to fail the build if libzstd.so is not found.
   If this option is not specified and the zstd library is missing,
   we silently build a version of libhadoop.so that cannot make use of zstd.

 * Use -Dzstd.prefix to specify a nonstandard location for the libzstd
   header files and library files. You do not need this option if you have
   installed zstandard using a package manager.

 * Use -Dzstd.lib to specify a nonstandard location for the libzstd library
   files. Similarly to zstd.prefix, you do not need this option if you have
   installed using a package manager.

 * Use -Dbundle.zstd to copy the contents of the zstd.lib directory into
   the final tar file. This option requires that -Dzstd.lib is also given,
   and it ignores the -Dzstd.prefix option. If -Dzstd.lib isn't given, the
   bundling and building will fail.
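
 Analogously to snappy (again, /opt/zstd is an illustrative path, not a default):

 $ mvn package -Pdist,native -DskipTests -Dtar \
     -Drequire.zstd -Dzstd.lib=/opt/zstd/lib -Dbundle.zstd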

OpenSSL build options:

 OpenSSL includes a crypto library that can be utilized by the native code.
 It is currently an optional component, meaning that Hadoop can be built with
 or without this dependency.

 * Use -Drequire.openssl to fail the build if libcrypto.so is not found.
   If this option is not specified and the openssl library is missing,
   we silently build a version of libhadoop.so that cannot make use of
   openssl. This option is recommended if you plan on making use of openssl
   and want to get more repeatable builds.
 * Use -Dopenssl.prefix to specify a nonstandard location for the libcrypto
   header files and library files. You do not need this option if you have
   installed openssl using a package manager.
 * Use -Dopenssl.lib to specify a nonstandard location for the libcrypto library
   files. Similarly to openssl.prefix, you do not need this option if you have
   installed openssl using a package manager.
 * Use -Dbundle.openssl to copy the contents of the openssl.lib directory into
   the final tar file. This option requires that -Dopenssl.lib is also given,
   and it ignores the -Dopenssl.prefix option. If -Dopenssl.lib isn't given, the
   bundling and building will fail.
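
 For instance, to require openssl and point the build at an OpenSSL installed
 under a nonstandard prefix (the path below is illustrative; it matches the
 Homebrew location used in the macOS example later in this document):

 $ mvn package -Pdist,native -DskipTests -Dtar \
     -Drequire.openssl -Dopenssl.prefix=/usr/local/opt/openssl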

Tests options:

 * Use -DskipTests to skip tests when running the following Maven goals:
   'package', 'install', 'deploy' or 'verify'
 * -Dtest=<TESTCLASSNAME>,<TESTCLASSNAME#METHODNAME>,...
 * -Dtest.exclude=<TESTCLASSNAME>
 * -Dtest.exclude.pattern=**/<TESTCLASSNAME1>.java,**/<TESTCLASSNAME2>.java
 * To run all native unit tests, use: mvn test -Pnative -Dtest=allNative
 * To run a specific native unit test, use: mvn test -Pnative -Dtest=<test>
   For example, to run test_bulk_crc32, you would use:
   mvn test -Pnative -Dtest=test_bulk_crc32
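
 As a Java-side example (the class and method names here are purely illustrative),
 the following runs one test class plus a single method of another, while excluding
 a third class:

 $ mvn test -Dtest=TestFoo,TestBar#testBaz -Dtest.exclude=TestSlow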

Intel ISA-L build options:

 Intel ISA-L is an erasure coding library that can be utilized by the native code.
 It is currently an optional component, meaning that Hadoop can be built with
 or without this dependency. Note the library is used via a dynamic module. Please
 reference the official site for the library details.
 https://01.org/intel%C2%AE-storage-acceleration-library-open-source-version
 (OR https://github.com/01org/isa-l)

 * Use -Drequire.isal to fail the build if libisal.so is not found.
   If this option is not specified and the isal library is missing,
   we silently build a version of libhadoop.so that cannot make use of ISA-L and
   the native raw erasure coders.
   This option is recommended if you plan on making use of native raw erasure
   coders and want to get more repeatable builds.
 * Use -Disal.prefix to specify a nonstandard location for the libisal
   library files. You do not need this option if you have installed ISA-L to the
   system library path.
 * Use -Disal.lib to specify a nonstandard location for the libisal library
   files.
 * Use -Dbundle.isal to copy the contents of the isal.lib directory into
   the final tar file. This option requires that -Disal.lib is also given,
   and it ignores the -Disal.prefix option. If -Disal.lib isn't given, the
   bundling and building will fail.
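
 For instance, assuming ISA-L was installed under an illustrative prefix of
 /opt/isa-l:

 $ mvn package -Pdist,native -DskipTests -Dtar \
     -Drequire.isal -Disal.lib=/opt/isa-l/lib -Dbundle.isal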

Special plugins: OWASP's dependency-check:

 OWASP's dependency-check plugin will scan the third party dependencies
 of this project for known CVEs (security vulnerabilities against them).
 It will produce a report in target/dependency-check-report.html. To
 invoke, run 'mvn dependency-check:aggregate'. Note that this plugin
 requires Maven 3.1.1 or greater.

PMDK library build options:

 The Persistent Memory Development Kit (PMDK), formerly known as NVML, is a growing
 collection of libraries which have been developed for various use cases, tuned,
 validated to production quality, and thoroughly documented. These libraries are built
 on the Direct Access (DAX) feature available in both Linux and Windows, which allows
 applications direct load/store access to persistent memory by memory-mapping files
 on a persistent memory aware file system.

 It is currently an optional component, meaning that Hadoop can be built without
 this dependency. Please note the library is used via a dynamic module. For more
 details please refer to the official sites:
 http://pmem.io/ and https://github.com/pmem/pmdk.

 * -Drequire.pmdk is used to force the build to use the PMDK libraries. With this
   option provided, the build will fail if the libpmem library is not found. If this
   option is not given, the build will still generate a version of Hadoop with
   libhadoop.so, and storage class memory (SCM) backed HDFS cache is still supported
   without PMDK involved. Because PMDK can bring better caching write/read performance,
   it is recommended to build the project with this option if you plan to use an SCM
   backed HDFS cache.
 * -Dpmdk.lib is used to specify a nonstandard location for the PMDK libraries if they
   are not under /usr/lib or /usr/lib64.
 * -Dbundle.pmdk is used to copy the specified libpmem libraries into the distribution
   tar package. This option requires that -Dpmdk.lib is specified. With -Dbundle.pmdk
   provided, the build will fail if -Dpmdk.lib is not specified.
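
 For instance, to require PMDK and bundle libraries from a nonstandard location
 (the path below is illustrative):

 $ mvn clean package -Pdist,native -DskipTests -Dtar \
     -Drequire.pmdk -Dpmdk.lib=/usr/local/lib64 -Dbundle.pmdk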

Controlling the redistribution of the protobuf-2.5 dependency

 The protobuf 2.5.0 library is used at compile time to compile the class
 org.apache.hadoop.ipc.ProtobufHelper; this class is known to have been used by
 external projects in the past. Protobuf 2.5 is not used directly in
 the Hadoop codebase; alongside the move to Protobuf 3.x a private successor
 class, org.apache.hadoop.ipc.internal.ShadedProtobufHelper, is now used.

 The hadoop-common module no longer exports its compile-time dependency on
 protobuf-java-2.5.
 Any application declaring a dependency on hadoop-common will no longer get
 the artifact added to their classpath.
 If it is still required, then they must explicitly declare it:

   <dependency>
     <groupId>com.google.protobuf</groupId>
     <artifactId>protobuf-java</artifactId>
     <version>2.5.0</version>
   </dependency>

 In Hadoop builds the scope of the dependency can be set with the
 option "common.protobuf2.scope".
 This can be upgraded from "provided" to "compile" on the maven command line:

   -Dcommon.protobuf2.scope=compile

 If this is done then protobuf-java-2.5.0.jar will again be exported as a
 hadoop-common dependency, and included in the share/hadoop/common/lib/
 directory of any Hadoop distribution built.
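
 For instance, a distribution build that re-exports the protobuf 2.5 artifact
 could be invoked as:

 $ mvn package -Pdist -DskipTests -Dtar -Dcommon.protobuf2.scope=compile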

 Note that protobuf-java-2.5.0.jar is still placed in
 share/hadoop/yarn/timelineservice/lib; this is needed by the hbase client
 library.

----------------------------------------------------------------------------------
Building components separately

If you are building a submodule directory, all the hadoop dependencies this
submodule has will be resolved as all other 3rd party dependencies. That is,
from the Maven cache or from a Maven repository (if not available in the cache
or the SNAPSHOT 'timed out').
An alternative is to run 'mvn install -DskipTests' from the Hadoop source top
level once, and then work from the submodule. Keep in mind that SNAPSHOTs
time out after a while; using the Maven '-nsu' option will stop Maven from trying
to update SNAPSHOTs from external repos.
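
For example (the submodule chosen here is illustrative):

  $ mvn install -DskipTests     # once, from the top level of the source tree
  $ cd hadoop-hdfs-project/hadoop-hdfs
  $ mvn test -nsu               # iterate in the submodule without SNAPSHOT updates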

----------------------------------------------------------------------------------
Importing projects to eclipse

First, install artifacts including hadoop-maven-plugins at the top of the source tree.

  $ mvn clean install -DskipTests -DskipShade

Then, import to eclipse by specifying the root directory of the project via
[File] > [Import] > [Maven] > [Existing Maven Projects].

----------------------------------------------------------------------------------
Building distributions:

Create binary distribution without native code and without Javadocs:

  $ mvn package -Pdist -DskipTests -Dtar -Dmaven.javadoc.skip=true

Create binary distribution with native code:

  $ mvn package -Pdist,native -DskipTests -Dtar

Create source distribution:

  $ mvn package -Psrc -DskipTests

Create source and binary distributions with native code:

  $ mvn package -Pdist,native,src -DskipTests -Dtar

Create a local staging version of the website (in /tmp/hadoop-site):

  $ mvn site site:stage -Preleasedocs,docs -DstagingDirectory=/tmp/hadoop-site

Note that the site needs to be built in a second pass after other artifacts.

----------------------------------------------------------------------------------
Installing Hadoop

Look for these HTML files after you build the documentation with the above commands.

  * Single Node Setup:
    hadoop-project-dist/hadoop-common/SingleCluster.html

  * Cluster Setup:
    hadoop-project-dist/hadoop-common/ClusterSetup.html

----------------------------------------------------------------------------------

Handling out of memory errors in builds

----------------------------------------------------------------------------------

If the build process fails with an out of memory error, you should be able to fix
it by increasing the memory used by Maven, which can be done via the environment
variable MAVEN_OPTS.

Here is an example setting that allocates between 256 MB and 1.5 GB of heap space
to Maven:

  export MAVEN_OPTS="-Xms256m -Xmx1536m"

----------------------------------------------------------------------------------

Building on macOS (without Docker)

----------------------------------------------------------------------------------
Installing required dependencies for clean install of macOS 10.14:

* Install Xcode Command Line Tools
  $ xcode-select --install
* Install Homebrew
  $ /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
* Install OpenJDK 8
  $ brew tap AdoptOpenJDK/openjdk
  $ brew cask install adoptopenjdk8
* Install maven and tools
  $ brew install maven autoconf automake cmake wget
* Install native libraries. Only openssl is required to compile native code;
  you may optionally install zlib, lz4, etc.
  $ brew install openssl
* Protocol Buffers 3.21.12 (required to compile native code)
  $ curl -L https://github.com/protocolbuffers/protobuf/archive/refs/tags/v3.21.12.tar.gz > protobuf-3.21.12.tar.gz
  $ tar -zxvf protobuf-3.21.12.tar.gz && cd protobuf-3.21.12
  $ ./autogen.sh
  $ ./configure
  $ make
  $ make check
  $ make install
  $ protoc --version

Note that building Hadoop 3.1.1/3.1.2/3.2.0 native code from source is broken
on macOS. For 3.1.1/3.1.2, you need to manually backport YARN-8622. For 3.2.0,
you need to backport both YARN-8622 and YARN-9487 in order to build native code.

----------------------------------------------------------------------------------
Building command example:

* Create binary distribution with native code but without documentation:
  $ mvn package -Pdist,native -DskipTests -Dmaven.javadoc.skip \
      -Dopenssl.prefix=/usr/local/opt/openssl

Note that the command above manually specifies the openssl library and include
path via -Dopenssl.prefix. This is necessary at least for Homebrewed OpenSSL.

----------------------------------------------------------------------------------

Building on CentOS 8

----------------------------------------------------------------------------------

* Install development tools such as GCC, autotools, OpenJDK and Maven.
  $ sudo dnf group install --with-optional 'Development Tools'
  $ sudo dnf install java-1.8.0-openjdk-devel maven

* Install python2 for building documentation.
  $ sudo dnf install python2

* Install Protocol Buffers v3.21.12.
  $ curl -L https://github.com/protocolbuffers/protobuf/archive/refs/tags/v3.21.12.tar.gz > protobuf-3.21.12.tar.gz
  $ tar -zxvf protobuf-3.21.12.tar.gz && cd protobuf-3.21.12
  $ ./autogen.sh
  $ ./configure --prefix=/usr/local
  $ make
  $ sudo make install
  $ cd ..

* Install libraries provided by CentOS 8.
  $ sudo dnf install libtirpc-devel zlib-devel lz4-devel bzip2-devel openssl-devel cyrus-sasl-devel libpmem-devel

* Install GCC 9.3.0
  $ sudo dnf -y install gcc-toolset-9-gcc gcc-toolset-9-gcc-c++
  $ source /opt/rh/gcc-toolset-9/enable

* Install CMake 3.19
  $ curl -L https://cmake.org/files/v3.19/cmake-3.19.0.tar.gz > cmake-3.19.0.tar.gz
  $ tar -zxvf cmake-3.19.0.tar.gz && cd cmake-3.19.0
  $ ./bootstrap
  $ make -j$(nproc)
  $ sudo make install

* Install boost.
  $ curl -L -o boost_1_72_0.tar.bz2 https://sourceforge.net/projects/boost/files/boost/1.72.0/boost_1_72_0.tar.bz2/download
  $ tar xjf boost_1_72_0.tar.bz2
  $ cd boost_1_72_0
  $ ./bootstrap.sh --prefix=/usr/local
  $ ./b2
  $ sudo ./b2 install

* Install optional dependencies (snappy-devel).
  $ sudo dnf --enablerepo=PowerTools install snappy-devel

* Install optional dependencies (libzstd-devel).
  $ sudo dnf install https://dl.fedoraproject.org/pub/epel/epel-release-latest-8.noarch.rpm
  $ sudo dnf --enablerepo=epel install libzstd-devel

* Install optional dependencies (isa-l).
  $ sudo dnf --enablerepo=PowerTools install nasm
  $ git clone https://github.com/intel/isa-l
  $ cd isa-l/
  $ ./autogen.sh
  $ ./configure
  $ make
  $ sudo make install

----------------------------------------------------------------------------------

Building on Windows 10

----------------------------------------------------------------------------------
Requirements:

* Windows 10
* JDK 1.8
* Maven 3.0 or later (maven.apache.org)
* Boost 1.72 (boost.org)
* Protocol Buffers 3.21.12 (https://github.com/protocolbuffers/protobuf/tags)
* CMake 3.19 or newer (cmake.org)
* Visual Studio 2019 (visualstudio.com)
* Windows SDK 8.1 (optional, if building CPU rate control for the container executor. Get this from
  http://msdn.microsoft.com/en-us/windows/bg162891.aspx)
* Zlib (zlib.net, if building native code bindings for zlib)
* Git (preferably, get this from https://git-scm.com/download/win since the package also contains
  Unix command-line tools that are needed during packaging).
* Python (python.org, for generation of docs using 'mvn site')
* Internet connection for first build (to fetch all Maven and Hadoop dependencies)

----------------------------------------------------------------------------------

Building guidelines:

The Hadoop repository provides a Dockerfile for building Hadoop on Windows 10, located at
dev-support/docker/Dockerfile_windows_10. It is highly recommended to use it to create the
Docker image for building Hadoop on Windows 10, since you then don't have to install anything
other than Docker, and no additional steps are required in terms of aligning the environment
with the necessary paths etc.

However, if you still prefer not to use Docker, this Dockerfile_windows_10 will
still be immensely useful as a raw guide for all the steps involved in creating the environment
needed to build Hadoop on Windows 10.

Building using Docker:
We first need to build the Docker image for building Hadoop on Windows 10. Run this command from
the root of the Hadoop repository.
> docker build -t hadoop-windows-10-builder -f .\dev-support\docker\Dockerfile_windows_10 .\dev-support\docker\

Start the container with the image that we just built.
> docker run --rm -it hadoop-windows-10-builder

You can now clone the Hadoop repo inside this container and proceed with the build.

NOTE:
While it may be tempting to mount the locally cloned (on the host filesystem) Hadoop
repository into the container (using the -v option), we have seen the build fail because
Maven was unable to locate some files. Thus, we suggest cloning the Hadoop repository into
a non-mounted folder inside the container and proceeding with the build. When the build is
completed, you may use the "docker cp" command to copy the built Hadoop tar.gz file from
the Docker container to the host filesystem. If you would still like to mount the Hadoop
codebase, a workaround is to copy the mounted Hadoop codebase into another folder (which
doesn't point to a mount) in the container's filesystem and use that for building.

However, we noticed no build issues when the Maven repository from the host filesystem was mounted
into the container. One may use this to greatly reduce the build time. Assuming that the Maven
repository is located at D:\Maven\Repository in the host filesystem, one can use the following
command to mount the same onto the default Maven repository location while launching the container.
> docker run --rm -v D:\Maven\Repository:C:\Users\ContainerAdministrator\.m2\repository -it hadoop-windows-10-builder

Building:

Keep the source code tree in a short path to avoid running into problems related
to the Windows maximum path length limitation (for example, C:\hdc).

There is one support command file located in dev-support called win-paths-eg.cmd.
It should be copied somewhere convenient and modified to fit your needs.

win-paths-eg.cmd sets up the environment for use. You will need to modify this
file. It will put all of the required components in the command path,
configure the bit-ness of the build, and set several optional components.

Several tests require that the user must have the Create Symbolic Links
privilege.

To simplify the installation of the Boost, Protocol Buffers, OpenSSL and Zlib dependencies, we can
use vcpkg (https://github.com/Microsoft/vcpkg.git). Upon cloning the vcpkg repo, check out the
commit 7ffa425e1db8b0c3edf9c50f2f3a0f25a324541d to get the required versions of the dependencies
mentioned above.
> git clone https://github.com/Microsoft/vcpkg.git
> cd vcpkg
> git checkout 7ffa425e1db8b0c3edf9c50f2f3a0f25a324541d
> .\bootstrap-vcpkg.bat
> .\vcpkg.exe install boost:x64-windows
> .\vcpkg.exe install protobuf:x64-windows
> .\vcpkg.exe install openssl:x64-windows
> .\vcpkg.exe install zlib:x64-windows

Set the following environment variables -
(Assuming that vcpkg was checked out at C:\vcpkg)
> set PROTOBUF_HOME=C:\vcpkg\installed\x64-windows
> set MAVEN_OPTS=-Xmx2048M -Xss128M

All Maven goals are the same as described above, with the exception that
native code is built by enabling the 'native-win' Maven profile. -Pnative-win
is enabled by default when building on Windows since the native components
are required (not optional) on Windows.

If native code bindings for zlib are required, then the zlib headers must be
deployed on the build machine. Set the ZLIB_HOME environment variable to the
directory containing the headers.

> set ZLIB_HOME=C:\zlib-1.2.7

At runtime, zlib1.dll must be accessible on the PATH. Hadoop has been tested
with zlib 1.2.7, built using Visual Studio 2010 out of contrib\vstudio\vc10 in
the zlib 1.2.7 source tree.

http://www.zlib.net/

Build command:
The following command builds all the modules in the Hadoop project and generates the tar.gz file
in hadoop-dist/target upon successful build. Run these commands from an
"x64 Native Tools Command Prompt for VS 2019" which can be found under "Visual Studio 2019" in the
Windows start menu. If you're using the Docker image from Dockerfile_windows_10, you'll be
logged into "x64 Native Tools Command Prompt for VS 2019" automatically when you start the
container.

> set classpath=
> set PROTOBUF_HOME=C:\vcpkg\installed\x64-windows
> mvn clean package -Dhttps.protocols=TLSv1.2 -DskipTests -DskipDocs -Pnative-win,dist^
 -Drequire.openssl -Drequire.test.libhadoop -Pyarn-ui -Dshell-executable=C:\Git\bin\bash.exe^
 -Dtar -Dopenssl.prefix=C:\vcpkg\installed\x64-windows^
 -Dcmake.prefix.path=C:\vcpkg\installed\x64-windows^
 -Dwindows.cmake.toolchain.file=C:\vcpkg\scripts\buildsystems\vcpkg.cmake -Dwindows.cmake.build.type=RelWithDebInfo^
 -Dwindows.build.hdfspp.dll=off -Dwindows.no.sasl=on -Duse.platformToolsetVersion=v142

Building the release tarball:
Assuming that we're still running in the Docker container hadoop-windows-10-builder, run the
following commands to create the Apache Hadoop release tarball (these assume the Hadoop
repository was cloned to C:\hadoop and Git was installed to C:\Git) -

> set IS_WINDOWS=1
> set MVN_ARGS="-Dshell-executable=C:\Git\bin\bash.exe -Dhttps.protocols=TLSv1.2 -Pnative-win -Drequire.openssl -Dopenssl.prefix=C:\vcpkg\installed\x64-windows -Dcmake.prefix.path=C:\vcpkg\installed\x64-windows -Dwindows.cmake.toolchain.file=C:\vcpkg\scripts\buildsystems\vcpkg.cmake -Dwindows.cmake.build.type=RelWithDebInfo -Dwindows.build.hdfspp.dll=off -Duse.platformToolsetVersion=v142 -Dwindows.no.sasl=on -DskipTests -DskipDocs -Drequire.test.libhadoop"
> C:\Git\bin\bash.exe C:\hadoop\dev-support\bin\create-release --mvnargs=%MVN_ARGS%

Note:
If the build fails due to an issue with long paths, rename the Hadoop root directory to just a
letter (like 'h') and rebuild -

> C:\Git\bin\bash.exe C:\h\dev-support\bin\create-release --mvnargs=%MVN_ARGS%

----------------------------------------------------------------------------------
Building distributions:

 * Build distribution with native code : mvn package [-Pdist][-Pdocs][-Psrc][-Dtar][-Dmaven.javadoc.skip=true]

----------------------------------------------------------------------------------
Running compatibility checks with checkcompatibility.py

Invoke `./dev-support/bin/checkcompatibility.py` to run Java API Compliance Checker
to compare the public Java APIs of two git objects. This can be used by release
managers to compare the compatibility of a previous and current release.

As an example, this invocation will check the compatibility of interfaces annotated as Public or LimitedPrivate:

./dev-support/bin/checkcompatibility.py --annotation org.apache.hadoop.classification.InterfaceAudience.Public --annotation org.apache.hadoop.classification.InterfaceAudience.LimitedPrivate --include "hadoop.*" branch-2.7.2 trunk

----------------------------------------------------------------------------------
Changing the Hadoop version declared and returned by VersionInfo

If for compatibility reasons the version of Hadoop has to be declared as a 2.x release in the
information returned by org.apache.hadoop.util.VersionInfo, set the property
declared.hadoop.version to the desired version.
For example: mvn package -Pdist -Ddeclared.hadoop.version=2.11

If unset, the project version declared in the POM file is used.
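
A quick way to verify the effect (assuming a freshly built and unpacked binary
distribution) is to run the bundled version command, which reports the version
string from VersionInfo:

  $ bin/hadoop version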