Published May 2013
How to build and deploy native libraries that accelerate the performance of Hadoop on Oracle Solaris 11.
The Hadoop 1.0.4 download is a functional product that works out of the box with Oracle Solaris 11. Hadoop can additionally use native platform libraries that accelerate the Hadoop suite. These native libraries need to be downloaded or built.
This article walks through the steps to build and deploy the native libraries.
The following conditions are assumed to be met:
- Hadoop is installed in /usr/local/hadoop, and this directory is owned by the Hadoop administration user hadoop and is writable for this user.
- The Snappy compression libraries are installed in /usr/local/lib
.The Oracle Solaris 11 system needs the following packages to be installed in order to build the native Hadoop libraries:
- solarisstudio-123 (Oracle Solaris Studio)
- automake
- autoconf
- ant
- libtool
- gcc
- JDK 6
The JDK 6 package is the only package needed for Hadoop at runtime. The other packages are required only to build the libraries.
The Oracle Solaris 11 build system needs to be able to interact with its solaris
repository, and it needs to be able to access a repository that hosts Oracle Solaris Studio 12.3.
Oracle Solaris Studio is a free developer suite that is described and available for download on the Oracle Solaris Studio page.
You can install the other packages using the following command with root
privileges from an Oracle Solaris 11 IPS server:
$ pkg install automake autoconf ant libtool gcc-45 jdk-6
This installation command is idempotent, which means that it can be called multiple times without a negative effect. The packages will be installed when needed. The command will not have any effect if all packages are already installed.
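After installation, you can sanity-check that the build tools are reachable from the shell. The loop below is a generic sketch; on Oracle Solaris 11 you can also query the packages directly with pkg list.

```shell
# Hedged sketch: verify the build tools are on the PATH before starting.
# (On Oracle Solaris 11, `pkg list automake autoconf ant libtool gcc-45 jdk-6`
# gives the package-level view.)
missing=""
for tool in automake autoconf ant libtool gcc; do
    command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
done
if [ -n "$missing" ]; then
    echo "missing tools:$missing"
else
    echo "all build tools found"
fi
```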
It is assumed that you will build the native Hadoop libraries with the Hadoop administration account. The sources for the native libraries are part of the Hadoop installation tarball. Using the Hadoop administration account ensures that the libraries end up with the same ownership and access rights as the Hadoop account that is used at runtime.
You should have a few shell variables set in order to allow the build process to work with JDK 6. Also, the build process needs to be able to find the Hadoop configuration directory.
It is assumed that the Hadoop administration account uses the bash shell. You can set the variables by adding the following lines to the .profile and .bashrc files in the home directory of the Hadoop administration account:
export PATH=/usr/jdk/instances/jdk1.6.0/bin:$PATH:/usr/local/hadoop/bin
export JAVA_HOME=/usr/jdk/instances/jdk1.6.0
export HADOOP_CONF_DIR=/usr/local/hadoop/conf
The Google Snappy compression libraries speed up Hadoop compression tasks. You can find a description of the Oracle Solaris 11 build process for these libraries at Scalingbits.com.
For the remainder of this document, it is assumed that the HADOOP_DIR
variable is set to the appropriate value:
HADOOP_DIR=/usr/local/hadoop
Patch the NativeIO.java File
Oracle Solaris has specific flags for file I/O. You need to update the value of these flags in the file $HADOOP_DIR/src/core/org/apache/hadoop/io/nativeio/NativeIO.java
.
The following are the Java constants you need to change:
public static final int O_CREAT = 0x100;
public static final int O_EXCL = 0x400;
public static final int O_NOCTTY = 0x800;
public static final int O_TRUNC = 0x200;
public static final int O_APPEND = 0x08;
public static final int O_NONBLOCK = 0x80;
public static final int O_SYNC = 0x10;
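Rather than editing the file by hand, the substitutions can be scripted. The snippet below is a sketch that demonstrates the sed expressions on a sample line; the same expressions can be applied, after taking a backup, to $HADOOP_DIR/src/core/org/apache/hadoop/io/nativeio/NativeIO.java.

```shell
# Demonstration input; a real run would read NativeIO.java instead.
sample='    public static final int O_CREAT = 64;
    public static final int O_TRUNC = 512;'
# Rewrite the flag values to the Oracle Solaris constants.
patched=$(printf '%s\n' "$sample" | sed \
    -e 's/O_CREAT = [^;]*/O_CREAT = 0x100/' \
    -e 's/O_TRUNC = [^;]*/O_TRUNC = 0x200/')
printf '%s\n' "$patched"
```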
Patch the Makefile.am File
Oracle Solaris requires some specific flags for the gcc compiler. In the $HADOOP_DIR/src/native/Makefile.am file, update the AM_CFLAGS variable to the following value:
AM_CFLAGS = -D_POSIX_C_SOURCE=199506L -D__EXTENSIONS__ -g -Wall -fPIC -O2 -m$(JVM_DATA_MODEL)
Update the Hadoop Configuration
The Hadoop configuration file ${HADOOP_DIR}/conf/hadoop-env.sh
has to point to JDK 6. This might already be the case. If not, use the following command:
echo "JAVA_HOME=/usr/jdk/instances/jdk1.6.0" >> ${HADOOP_DIR}/conf/hadoop-env.sh
The build process requires internet access to Maven and Ivy repositories. If the build system sits behind an HTTP proxy, you can direct Ant to use it by setting an option in a shell variable as follows:
$ export ANT_OPTS="-Dhttp.proxyHost=myhttpprox.mydomain.com"
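If the proxy also requires a port number, the standard Java http.proxyPort property can be added. The host name and port below are placeholders; substitute your proxy's values.

```shell
# Placeholder proxy host and port; substitute your own values.
export ANT_OPTS="-Dhttp.proxyHost=myhttpprox.mydomain.com -Dhttp.proxyPort=80"
echo "$ANT_OPTS"
```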
You can start the build from the Hadoop home directory, HADOOP_DIR
, using the ant tool. So, first change to that directory:
$ cd ${HADOOP_DIR}
Run the following commands to set all the required shell variables and kick off the build process:
export PATH=/usr/jdk/instances/jdk1.6.0/bin:$PATH
echo PATH $PATH
export JAVA_HOME=/usr/jdk/instances/jdk1.6.0
echo JAVA_HOME $JAVA_HOME
export LD_RUN_PATH=/usr/local/lib
export LD_LIBRARY_PATH=/usr/local/lib
ant -verbose -logfile ant.log -Dcompile.native=true -Dmake.cmd=gmake -Dos.name=SunOS -Dnonspace.os=SunOS
The file ant.log
keeps all the information that is generated while the libraries are built. This file is important for analyzing and fixing problems with the build process.
The build process creates a directory structure under ${HADOOP_DIR}/build
. The native libraries are stored in ${HADOOP_DIR}/build/native
. This directory contains a subdirectory whose name starts with SunOS
and ends with the name of the architecture. So, the name will be SunOS-x86-32
for an Intel-based Oracle Solaris system, and it will be SunOS-SPARC-32
for a SPARC-based system.
Copy the subdirectory structure under ${HADOOP_DIR}/build/native/SunOS*
to ${HADOOP_DIR}/lib/native/SunOS*
using a command such as this:
$ cp -R ${HADOOP_DIR}/build/native/SunOS-x86-32 ${HADOOP_DIR}/lib/native/SunOS-x86-32
The Snappy library needs to be available in the directory ${HADOOP_DIR}/lib/native/SunOS*
, so use a command such as this:
$ cp /usr/local/lib/libsnappy.so ${HADOOP_DIR}/lib/native/SunOS-x86-32
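If you deploy on both architectures, the directory name can be derived instead of hard-coded. The mapping below is an assumption based on the SunOS-x86-32 and SunOS-SPARC-32 names mentioned above; verify it against the directory the build actually produced before copying.

```shell
# Sketch: pick the native-library directory name from the processor type.
# `uname -p` reports sparc on SPARC systems and i386 on Intel systems.
case "$(uname -p)" in
    sparc*) NATIVE_DIR=SunOS-SPARC-32 ;;
    *)      NATIVE_DIR=SunOS-x86-32   ;;
esac
echo "native library directory: ${NATIVE_DIR}"
# cp -R ${HADOOP_DIR}/build/native/${NATIVE_DIR} ${HADOOP_DIR}/lib/native/${NATIVE_DIR}
# cp /usr/local/lib/libsnappy.so ${HADOOP_DIR}/lib/native/${NATIVE_DIR}
```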
Hadoop will report at startup time whether it was able to load the native libraries. Therefore, you can validate the success of the deployment by checking the log files.
The log file's name carries a time stamp in the format YYYY-MM-DD; an example log file name is hadoop-hadoop-jobtracker-solaris.log.YYYY-MM-DD. The log file should contain a line similar to this:
2013-03-31 10:42:10,487 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
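This check can be scripted. The helper below is a sketch; point it at your actual log file (the temporary file merely stands in for a real jobtracker log in this illustration).

```shell
# Returns success if the given log records the native-library load.
native_loaded() {
    grep -q 'Loaded the native-hadoop library' "$1"
}

# Illustration only: a temporary file stands in for a real log.
tmp=$(mktemp)
echo 'INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library' > "$tmp"
if native_loaded "$tmp"; then status=loaded; else status=missing; fi
rm -f "$tmp"
echo "native libraries: $status"
```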
Stefan Schneider worked with SunOS doing 3D robot simulation research while obtaining a PhD degree in object oriented databases for manufacturing. He then productized an object oriented database for a startup company in the early 1990s. He joined Sun in 1994 to port and optimize a SAP application to Solaris 2, and he worked with Sun's key partners to support the adoption of Sun technologies, such as Oracle Solaris, multithreading, Java SE, and Swing. As the CTO of Oracle's ISV Engineering group, he currently manages Oracle Solaris 11 adoption for all Oracle partners.
Revision 1.1, 05/31/2013