~~ Licensed under the Apache License, Version 2.0 (the "License");
~~ you may not use this file except in compliance with the License.
~~ You may obtain a copy of the License at
~~
~~   http://www.apache.org/licenses/LICENSE-2.0
~~
~~ Unless required by applicable law or agreed to in writing, software
~~ distributed under the License is distributed on an "AS IS" BASIS,
~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~~ See the License for the specific language governing permissions and
~~ limitations under the License. See accompanying LICENSE file.
  ---
  C API libhdfs
  ---
  ---
  ${maven.build.timestamp}

C API libhdfs

%{toc|section=1|fromDepth=0}
* Overview

  libhdfs is a JNI-based C API for Hadoop's Distributed File System
  (HDFS). It provides C APIs to a subset of the HDFS APIs to manipulate
  HDFS files and the filesystem. libhdfs is part of the Hadoop
  distribution and comes pre-compiled in
  <<<${HADOOP_PREFIX}/libhdfs/libhdfs.so>>>.
* The APIs

  The libhdfs APIs are a subset of the {{{hadoop fs APIs}}}.

  The header file for libhdfs describes each API in detail and is
  available in <<<${HADOOP_PREFIX}/src/c++/libhdfs/hdfs.h>>>.
* A Sample Program

----
\#include <fcntl.h>    /* O_WRONLY, O_CREAT */
\#include <stdio.h>
\#include <stdlib.h>
\#include <string.h>
\#include "hdfs.h"

int main(int argc, char **argv) {
    hdfsFS fs = hdfsConnect("default", 0);
    if (!fs) {
        fprintf(stderr, "Failed to connect to hdfs!\n");
        exit(-1);
    }
    const char* writePath = "/tmp/testfile.txt";
    hdfsFile writeFile = hdfsOpenFile(fs, writePath, O_WRONLY|O_CREAT, 0, 0, 0);
    if (!writeFile) {
        fprintf(stderr, "Failed to open %s for writing!\n", writePath);
        exit(-1);
    }
    const char* buffer = "Hello, World!";
    tSize num_written_bytes =
        hdfsWrite(fs, writeFile, (void*)buffer, strlen(buffer)+1);
    if (hdfsFlush(fs, writeFile)) {
        fprintf(stderr, "Failed to 'flush' %s\n", writePath);
        exit(-1);
    }
    hdfsCloseFile(fs, writeFile);
    hdfsDisconnect(fs);
    return 0;
}
----
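
  The sample above only writes; reading the file back is analogous. The
  following sketch is illustrative only (it assumes the file written by
  the sample above exists and that a default filesystem is reachable):

----
\#include <fcntl.h>    /* O_RDONLY */
\#include <stdio.h>
\#include <stdlib.h>
\#include "hdfs.h"

int main(int argc, char **argv) {
    hdfsFS fs = hdfsConnect("default", 0);
    const char* readPath = "/tmp/testfile.txt";
    hdfsFile readFile = hdfsOpenFile(fs, readPath, O_RDONLY, 0, 0, 0);
    if (!readFile) {
        fprintf(stderr, "Failed to open %s for reading!\n", readPath);
        exit(-1);
    }
    char buffer[32];
    /* hdfsRead returns the number of bytes actually read, or -1 on error */
    tSize num_read_bytes = hdfsRead(fs, readFile, (void*)buffer, sizeof(buffer));
    if (num_read_bytes < 0) {
        fprintf(stderr, "Failed to read %s!\n", readPath);
        exit(-1);
    }
    printf("Read %d bytes\n", (int)num_read_bytes);
    hdfsCloseFile(fs, readFile);
    hdfsDisconnect(fs);
    return 0;
}
----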

* How To Link With The Library

  See the Makefile for <<<hdfs_test.c>>> in the libhdfs source directory
  (<<<${HADOOP_PREFIX}/src/c++/libhdfs/Makefile>>>) or something like:
  <<<gcc above_sample.c -I${HADOOP_PREFIX}/src/c++/libhdfs -L${HADOOP_PREFIX}/libhdfs -lhdfs -o above_sample>>>
* Common Problems

  The most common problem is that the <<<CLASSPATH>>> is not set properly
  when calling a program that uses libhdfs. Make sure you set it to all the
  Hadoop jars needed to run Hadoop itself. Currently, there is no way to
  programmatically generate the classpath, but a good bet is to include
  all the jar files in <<<${HADOOP_PREFIX}>>> and <<<${HADOOP_PREFIX}/lib>>> as well
  as the right configuration directory containing <<<hdfs-site.xml>>>.

* Thread Safe

  libhdfs is thread safe.
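
  For example, several threads can issue independent operations against a
  shared <<<hdfsFS>>> handle. The sketch below is illustrative only (the
  paths checked are arbitrary):

----
\#include <pthread.h>
\#include <stdio.h>
\#include "hdfs.h"

static hdfsFS fs;   /* a single handle shared across threads */

static void* worker(void* arg) {
    const char* path = (const char*)arg;
    /* hdfsExists returns 0 if the path exists */
    if (hdfsExists(fs, path) == 0) {
        printf("%s exists\n", path);
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    fs = hdfsConnect("default", 0);
    pthread_create(&t1, NULL, worker, "/tmp");
    pthread_create(&t2, NULL, worker, "/user");
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    hdfsDisconnect(fs);
    return 0;
}
----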
* Concurrency and Hadoop FS "handles"

  The Hadoop FS implementation includes an FS handle cache which
  caches based on the URI of the namenode along with the user
  connecting. So, all calls to <<<hdfsConnect>>> will return the same
  handle, but calls to <<<hdfsConnectAsUser>>> with different users will
  return different handles. But, since HDFS client handles are
  completely thread safe, this has no bearing on concurrency.
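
  The two connect flavors can be sketched as follows; the user names
  are hypothetical, and the comments restate the caching behavior
  described in this section:

----
\#include "hdfs.h"

int main(void) {
    /* Connections to the same namenode URI as the same (implicit)
       user share a cached FS handle underneath */
    hdfsFS fs = hdfsConnect("default", 0);

    /* Connecting as different users returns different handles
       ("alice" and "bob" are hypothetical user names) */
    hdfsFS fsAlice = hdfsConnectAsUser("default", 0, "alice");
    hdfsFS fsBob   = hdfsConnectAsUser("default", 0, "bob");

    /* ... the handles may then be used from any number of threads ... */

    hdfsDisconnect(fsBob);
    hdfsDisconnect(fsAlice);
    hdfsDisconnect(fs);
    return 0;
}
----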
* Concurrency and libhdfs/JNI

  The libhdfs calls to JNI should always be creating thread local
  storage, so (in theory), libhdfs should be as thread safe as the
  underlying calls to the Hadoop FS.