<!---
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->

Native Libraries Guide
======================

* [Native Libraries Guide](#Native_Libraries_Guide)
    * [Overview](#Overview)
    * [Native Hadoop Library](#Native_Hadoop_Library)
    * [Usage](#Usage)
    * [Components](#Components)
    * [Supported Platforms](#Supported_Platforms)
    * [Download](#Download)
    * [Build](#Build)
    * [Runtime](#Runtime)
    * [Check](#Check)
    * [Native Shared Libraries](#Native_Shared_Libraries)

Overview
--------

This guide describes the native hadoop library and includes a short discussion of native shared libraries.

Note: Depending on your environment, the term "native libraries" could refer to all \*.so's you need to compile, and the term "native compression" could refer to all \*.so's you need to compile that are specifically related to compression. Currently, however, this document only addresses the native hadoop library (`libhadoop.so`). The documentation for the libhdfs library (`libhdfs.so`) is [here](../hadoop-hdfs/LibHdfs.html).

Native Hadoop Library
---------------------

Hadoop has native implementations of certain components, either for performance reasons or because no Java implementation is available. These components are available in a single, dynamically-linked native library called the native hadoop library. On the \*nix platforms the library is named `libhadoop.so`.

Usage
-----

It is fairly easy to use the native hadoop library:

1. Review the components.
2. Review the supported platforms.
3. Either download a hadoop release, which includes a pre-built version of the native hadoop library, or build your own version of the native hadoop library. Whether you download or build, the name of the library is the same: `libhadoop.so`.
4. Install the compression codec development packages (\>zlib-1.2, \>gzip-1.2):
    * If you download the library, install one or more development packages - whichever compression codecs you want to use with your deployment.
    * If you build the library, it is mandatory to install both development packages.
5. Check the runtime log files.

Components
----------

The native hadoop library includes various components:

* Compression Codecs (bzip2, lz4, snappy, zlib)
* Native IO utilities for [HDFS Short-Circuit Local Reads](../hadoop-hdfs/ShortCircuitLocalReads.html) and [Centralized Cache Management in HDFS](../hadoop-hdfs/CentralizedCacheManagement.html)
* CRC32 checksum implementation

Supported Platforms
-------------------

The native hadoop library is supported on \*nix platforms only. The library does not work with Cygwin or the Mac OS X platform.

The native hadoop library is mainly used on the GNU/Linux platform and has been tested on these distributions:

* RHEL4/Fedora
* Ubuntu
* Gentoo

On all the above distributions, a 32-bit or 64-bit native hadoop library will work with the corresponding 32-bit or 64-bit JVM.

Download
--------

The pre-built 32-bit i386-Linux native hadoop library is available as part of the hadoop distribution and is located in the `lib/native` directory. You can download the hadoop distribution from Hadoop Common Releases.

Be sure to install the zlib and/or gzip development packages - whichever compression codecs you want to use with your deployment.
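
After unpacking a release, a quick way to confirm that the library is where this section says it should be is to list it. The sketch below assumes you pass the top-level directory of the unpacked distribution; the helper name `find_native_hadoop_lib` is hypothetical and not part of the hadoop scripts.

```shell
# Print the native hadoop library files found under <hadoop-dir>/lib/native.
# Returns non-zero if none are found.
# find_native_hadoop_lib is a hypothetical helper, not part of the hadoop distribution.
find_native_hadoop_lib() {
    native_dir="$1/lib/native"
    found=1
    for f in "$native_dir"/libhadoop.so*; do
        # When the glob matches nothing, the pattern itself is returned; skip it.
        [ -e "$f" ] || continue
        printf '%s\n' "$f"
        found=0
    done
    return "$found"
}
```

For example, `find_native_hadoop_lib /path/to/hadoop` prints each matching file on its own line, where `/path/to/hadoop` stands for wherever you unpacked the distribution.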

Build
-----

The native hadoop library is written in ANSI C and is built using the GNU autotools-chain (autoconf, autoheader, automake, autoscan, libtool). This means it should be straightforward to build the library on any platform with a standards-compliant C compiler and the GNU autotools-chain (see the supported platforms).

The packages you need to install on the target platform are:

* C compiler (e.g. GNU C Compiler)
* GNU Autotools Chain: autoconf, automake, libtool
* zlib-development package (stable version \>= 1.2.0)
* openssl-development package (e.g. libssl-dev)

Once you have installed the prerequisite packages, use the standard hadoop pom.xml file and pass along the native flag to build the native hadoop library:

    $ mvn package -Pdist,native -DskipTests -Dtar

You should see the newly-built library in:

    $ hadoop-dist/target/hadoop-${project.version}/lib/native

Please note the following:

* It is mandatory to install both the zlib and gzip development packages on the target platform in order to build the native hadoop library; however, for deployment it is sufficient to install just one package if you wish to use only one codec.
* It is necessary to have the correct 32-bit or 64-bit libraries for zlib, matching the 32-bit or 64-bit JVM on the target platform, in order to build and deploy the native hadoop library.
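
Because the library and the JVM must agree on word size, it can help to check what was actually built. The sketch below reads the ELF header directly: the fifth byte of an ELF file (`EI_CLASS`) is 1 for 32-bit objects and 2 for 64-bit objects. The helper name `elf_bits` is hypothetical, and the approach assumes a Linux-style ELF shared object.

```shell
# Print 32 or 64 according to the EI_CLASS byte of an ELF file.
# elf_bits is a hypothetical helper name used only for this illustration.
elf_bits() {
    # The first four bytes of an ELF file are 0x7f 'E' 'L' 'F'.
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    if [ "$magic" != "7f454c46" ]; then
        echo "not-elf"
        return 1
    fi
    # Byte 4 (EI_CLASS) is 1 for 32-bit objects and 2 for 64-bit objects.
    class=$(od -An -tu1 -j4 -N1 "$1" | tr -d ' \n')
    case "$class" in
        1) echo 32 ;;
        2) echo 64 ;;
        *) echo "unknown"; return 1 ;;
    esac
}
```

Running `elf_bits` on the built `libhadoop.so` and comparing the result with the word size of your JVM is one way to catch a 32/64-bit mismatch before deployment.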

Runtime
-------

The `bin/hadoop` script ensures that the native hadoop library is on the library path via the system property: `-Djava.library.path=<path>`

During runtime, check the hadoop log files for your MapReduce tasks.

* If everything is all right, you will see: `DEBUG util.NativeCodeLoader - Trying to load the custom-built native-hadoop library...` `INFO util.NativeCodeLoader - Loaded the native-hadoop library`
* If something goes wrong, you will see: `INFO util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable`
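
The log check described above can be scripted. The following is a minimal sketch that classifies a log file by the two `NativeCodeLoader` messages quoted in this section; the helper name `native_load_status` is hypothetical, and the location of the log file depends on your deployment.

```shell
# Classify a log file by the NativeCodeLoader messages quoted above.
# native_load_status is a hypothetical helper; pass it the path of a task log.
native_load_status() {
    if grep -q 'Loaded the native-hadoop library' "$1"; then
        echo loaded
    elif grep -q 'Unable to load native-hadoop library' "$1"; then
        echo fallback
    else
        # Neither message is present, e.g. the log level filtered them out.
        echo unknown
    fi
}
```

A result of `fallback` means the task is running with the builtin Java classes rather than the native library.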

Check
-----

NativeLibraryChecker is a tool to check whether native libraries are loaded correctly. You can launch NativeLibraryChecker as follows:

    $ hadoop checknative -a
    14/12/06 01:30:45 WARN bzip2.Bzip2Factory: Failed to load/initialize native-bzip2 library system-native, will use pure-Java version
    14/12/06 01:30:45 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
    Native library checking:
    hadoop: true /home/ozawa/hadoop/lib/native/libhadoop.so.1.0.0
    zlib: true /lib/x86_64-linux-gnu/libz.so.1
    snappy: true /usr/lib/libsnappy.so.1
    lz4: true revision:99
    bzip2: false
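
When this check runs as part of a provisioning script, it is convenient to extract just the libraries that failed to load. The sketch below assumes the `name: true|false` layout shown in the sample output above; the helper name `missing_native_libs` is hypothetical.

```shell
# Read `hadoop checknative -a` output on stdin and print the names of
# libraries reported as false, one per line.
# missing_native_libs is a hypothetical helper name.
missing_native_libs() {
    # A result line looks like "bzip2: false"; strip the trailing colon from the name.
    awk '$2 == "false" { sub(/:$/, "", $1); print $1 }'
}
```

It is meant to be used in a pipeline such as `hadoop checknative -a | missing_native_libs`. Log lines and the `Native library checking:` header are ignored because their second field is never `false`.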

Native Shared Libraries
-----------------------

You can load any native shared library using DistributedCache for distributing and symlinking the library files.

This example shows you how to distribute a shared library, `mylib.so`, and load it from a MapReduce task.

1. First copy the library to HDFS: `bin/hadoop fs -copyFromLocal mylib.so.1 /libraries/mylib.so.1`
2. The job launching program should contain the following: `DistributedCache.createSymlink(conf);` `DistributedCache.addCacheFile(new URI("hdfs://host:port/libraries/mylib.so.1#mylib.so"), conf);`
3. The MapReduce task can contain: `System.loadLibrary("mylib.so");`
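
The URI in step 2 has two parts: the HDFS location of the file and, after the `#`, the name of the symlink that DistributedCache creates for the task. As a small illustration of that layout, the sketch below assembles such a URI from its pieces; the helper name `cache_uri` is hypothetical, and `host:port` remains a placeholder for your NameNode address.

```shell
# Build a DistributedCache URI whose fragment names the symlink created for
# the task. cache_uri is a hypothetical helper name.
cache_uri() {
    # $1 = namenode as host:port, $2 = absolute HDFS path, $3 = symlink name
    printf 'hdfs://%s%s#%s\n' "$1" "$2" "$3"
}
```

Calling `cache_uri host:port /libraries/mylib.so.1 mylib.so` reproduces the string used in step 2.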

Note: If you downloaded or built the native hadoop library, you don’t need to use DistributedCache to make the library available to your MapReduce tasks.