How to Build Native Hadoop Libraries for Oracle Solaris 11

by Stefan Schneider

How to build and deploy native libraries that accelerate the performance of Hadoop on Oracle Solaris 11.


Published May 2013


The Hadoop 1.0.4 download is a functional product that works out of the box with Oracle Solaris 11. Hadoop can additionally use native platform libraries that accelerate the Hadoop suite, but these native libraries need to be downloaded or built separately.


The steps to build and deploy the native libraries are described in the sections below.

Prerequisites

The following conditions are assumed to be met:

  • The Hadoop installation is available in the directory /usr/local/hadoop, and this directory is owned by the Hadoop administration user hadoop and is writable for this user.
  • The Snappy compression libraries are available in the directory /usr/local/lib.
  • The Oracle Solaris 11 system with Hadoop can use a repository to install additional packages.
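The assumptions above can be checked quickly from a shell before starting. This is a minimal sketch using the paths assumed throughout this article; adjust them if your installation differs:

```shell
# Quick sanity check of the assumed prerequisites. The paths are the
# ones used throughout this article; adjust if your layout differs.
for p in /usr/local/hadoop /usr/local/lib/libsnappy.so; do
  if [ -e "$p" ]; then
    echo "found:   $p"
  else
    echo "missing: $p"
  fi
done
```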

Configure the Oracle Solaris 11 System

The Oracle Solaris 11 system needs the following packages to be installed in order to build the native Hadoop libraries:

  • solarisstudio-123 (Oracle Solaris Studio)
  • automake
  • autoconf
  • ant
  • libtool
  • gcc
  • JDK 6

The JDK 6 package is the only package needed for Hadoop at runtime. The other packages are required only to build the libraries.

The Oracle Solaris 11 build system needs to be able to interact with its solaris repository, and it needs to be able to access a repository that hosts Oracle Solaris Studio 12.3.

Oracle Solaris Studio is a free developer suite that is described and available for download on the Oracle Solaris Studio page.
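As a sketch, registering an Oracle Solaris Studio publisher and installing the suite could look like the following. The publisher URL here is an assumption; release repositories have changed over time, so verify the current location on the Oracle Solaris Studio page:

```shell
# Sketch: register the Oracle Solaris Studio IPS publisher and install
# Studio 12.3. The publisher URL is an assumption; check the Oracle
# Solaris Studio page for the current repository location.
pkg set-publisher -g http://pkg.oracle.com/solarisstudio/release solarisstudio
pkg install solarisstudio-123
```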

You can install the other packages from an Oracle Solaris 11 IPS repository by running the following command with root privileges:

# pkg install automake autoconf ant libtool gcc-45 jdk-6

This installation command is idempotent: it can be run multiple times without negative effects. Any packages that are missing will be installed, and the command does nothing if all the packages are already present.

Configure a Hadoop Administration Account

It is assumed that you will build the native Hadoop libraries with the Hadoop administration account. The sources for the native libraries are part of the Hadoop installation tarball. Building with the Hadoop administration account ensures that the libraries are created with the correct ownership and access rights, namely those of the Hadoop account that is used at runtime.

You should have a few shell variables set in order to allow the build process to work with JDK 6. Also, the build process needs to be able to find the Hadoop configuration directory.

It is assumed that the Hadoop administration account uses the bash shell. You can set the variables by adding the following lines to the .profile and .bashrc files in the home directory of the Hadoop administration account:

export PATH=/usr/jdk/instances/jdk1.6.0/bin:$PATH:/usr/local/hadoop/bin
export JAVA_HOME=/usr/jdk/instances/jdk1.6.0
export HADOOP_CONF_DIR=/usr/local/hadoop/conf 
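After sourcing the profile, a quick check confirms that the intended JDK is first in the PATH:

```shell
# Confirm that the shell picked up the JDK 6 settings.
. ~/.profile
echo "JAVA_HOME=$JAVA_HOME"
which java        # should resolve to /usr/jdk/instances/jdk1.6.0/bin/java
java -version     # should report a 1.6.0 JVM
```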

Download and Install Google Snappy

The Google Snappy compression libraries speed up Hadoop compression tasks. You can find a description of the Oracle Solaris 11 build process for these libraries at Scalingbits.com.

Build the Native Hadoop Libraries

For the remainder of this document, it is assumed that the HADOOP_DIR variable is set to the appropriate value:

HADOOP_DIR=/usr/local/hadoop

Patch the NativeIO.java File

Oracle Solaris has specific flags for file I/O. You need to update the value of these flags in the file $HADOOP_DIR/src/core/org/apache/hadoop/io/nativeio/NativeIO.java.

The following are the Java constants you need to change:

   public static final int O_CREAT    = 0x100;
   public static final int O_EXCL     = 0x400;
   public static final int O_NOCTTY   = 0x800;
   public static final int O_TRUNC    = 0x200;
   public static final int O_APPEND   = 0x08;
   public static final int O_NONBLOCK = 0x80;
   public static final int O_SYNC     = 0x10;
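Rather than editing the file by hand, each substitution can be scripted with GNU sed. The following is a minimal sketch demonstrated on a scratch file so the pattern can be verified before it is applied to the real NativeIO.java; the starting value shown is a stand-in, not necessarily what your copy of the file contains:

```shell
# Sketch: scripting one of the flag substitutions with GNU sed.
# Demonstrated on a scratch file; the starting value is a stand-in,
# not necessarily what your NativeIO.java contains.
src=$(mktemp)
echo '   public static final int O_CREAT    = 64;' > "$src"
# Replace whatever value follows O_CREAT with the Oracle Solaris value.
sed -i 's/O_CREAT\( *\)= *[0-9x]*;/O_CREAT\1= 0x100;/' "$src"
cat "$src"
rm -f "$src"
```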

Patch the Makefile.am File

Oracle Solaris requires some specific flags for the gcc compiler. In the $HADOOP_DIR/src/native/Makefile.am file, update the AM_CFLAGS variable to the following value:

AM_CFLAGS = -D_POSIX_C_SOURCE=199506L -D__EXTENSIONS__ -g -Wall -fPIC -O2 -m$(JVM_DATA_MODEL)

Update the Hadoop Configuration

The Hadoop configuration file ${HADOOP_DIR}/conf/hadoop-env.sh has to point to JDK 6. This might already be the case. If not, use the following command:

echo "JAVA_HOME=/usr/jdk/instances/jdk1.6.0" >> ${HADOOP_DIR}/conf/hadoop-env.sh
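Note that a plain `>>` append adds a duplicate line every time it runs. A guarded variant, sketched here on a scratch file standing in for hadoop-env.sh, only appends when JAVA_HOME is not yet set:

```shell
# Sketch: append JAVA_HOME only if the file does not set it already.
# A scratch file stands in for ${HADOOP_DIR}/conf/hadoop-env.sh.
conf=$(mktemp)
grep -q '^JAVA_HOME=' "$conf" || \
  echo 'JAVA_HOME=/usr/jdk/instances/jdk1.6.0' >> "$conf"
grep -q '^JAVA_HOME=' "$conf" || \
  echo 'JAVA_HOME=/usr/jdk/instances/jdk1.6.0' >> "$conf"  # second run adds nothing
grep -c '^JAVA_HOME=' "$conf"    # prints 1
rm -f "$conf"
```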

Launch the Build Process

The build process downloads dependencies from Maven and Ivy repositories, so the build system needs access to the internet. If the system sits behind an HTTP proxy, you can direct ant to use it by setting an option in a shell variable as follows:

$ export ANT_OPTS="-Dhttp.proxyHost=myhttpprox.mydomain.com"

You can start the build from the Hadoop home directory, HADOOP_DIR, using the ant tool. So, first change to that directory:

$ cd  ${HADOOP_DIR}

Run the following commands to set all the required shell variables and kick off the build process:

export PATH=/usr/jdk/instances/jdk1.6.0/bin:$PATH
echo PATH $PATH
export JAVA_HOME=/usr/jdk/instances/jdk1.6.0
echo JAVA_HOME $JAVA_HOME
export LD_RUN_PATH=/usr/local/lib
export LD_LIBRARY_PATH=/usr/local/lib
ant -verbose -logfile ant.log -Dcompile.native=true -Dmake.cmd=gmake -Dos.name=SunOS -Dnonspace.os=SunOS

The file ant.log keeps all the information that is generated while the libraries are built. This file is important for analyzing and fixing problems with the build process.
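Ant ends a successful run with the line BUILD SUCCESSFUL (and a failed run with BUILD FAILED), so a quick grep on ant.log tells you whether the native build completed. The sketch below demonstrates the check on a stand-in log file:

```shell
# Sketch: pass/fail check on the ant log. ant prints "BUILD SUCCESSFUL"
# or "BUILD FAILED" at the end of a run; a stand-in file replaces
# ${HADOOP_DIR}/ant.log here for illustration.
log=$(mktemp)
echo 'BUILD SUCCESSFUL' > "$log"
if grep -q 'BUILD SUCCESSFUL' "$log"; then
  echo 'native build completed'
else
  echo 'build failed, inspect the log'
fi
rm -f "$log"
```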

The build process creates a directory structure under ${HADOOP_DIR}/build. The native libraries are stored in ${HADOOP_DIR}/build/native. This directory contains a subdirectory whose name starts with SunOS and ends with the name of the architecture. So, the name will be SunOS-x86-32 for an Intel-based Oracle Solaris system, and it will be SunOS-SPARC-32 for a SPARC-based system.

Deploy the Native Hadoop Libraries

Copy the subdirectory structure under ${HADOOP_DIR}/build/native/SunOS* to ${HADOOP_DIR}/lib/native/SunOS* using a command such as this:

$ cp -R ${HADOOP_DIR}/build/native/SunOS-x86-32 ${HADOOP_DIR}/lib/native/SunOS-x86-32

The Snappy library needs to be available in the directory ${HADOOP_DIR}/lib/native/SunOS*, so use a command such as this:

$ cp /usr/local/lib/libsnappy.so ${HADOOP_DIR}/lib/native/SunOS-x86-32
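Because the architecture directory name varies between Intel and SPARC systems, the two copy steps can be combined into a short loop that handles whatever SunOS-* tree the build produced. This is a sketch using the paths assumed throughout this article:

```shell
# Sketch: deploy every built SunOS-* tree and place the Snappy library
# next to the Hadoop native libraries. Paths follow this article's layout.
HADOOP_DIR=/usr/local/hadoop
for d in ${HADOOP_DIR}/build/native/SunOS-*; do
  arch=$(basename "$d")
  cp -R "$d" "${HADOOP_DIR}/lib/native/${arch}"
  cp /usr/local/lib/libsnappy.so "${HADOOP_DIR}/lib/native/${arch}/"
done
```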

Validate the Deployment of the Native Hadoop Libraries

Hadoop will report at startup time whether it was able to load the native libraries. Therefore, you can validate the success of the deployment by checking the log files.

The log file name ends with a date stamp in the format YYYY-MM-DD. An example log file name is hadoop-hadoop-jobtracker-solaris.log.YYYY-MM-DD. The log file should contain a line similar to this:

2013-03-31 10:42:10,487 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
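A grep across the Hadoop log directory finds the loader message without opening each file. The log directory path below is the default for the layout assumed in this article:

```shell
# Sketch: search the daemon logs for the native-library loader message.
# ${HADOOP_DIR}/logs is the default log location for this layout.
grep -l 'Loaded the native-hadoop library' /usr/local/hadoop/logs/*.log*
```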


About the Author

Stefan Schneider worked with SunOS doing 3D robot simulation research while obtaining a PhD degree in object oriented databases for manufacturing. He then productized an object oriented database for a startup company in the early 1990s. He joined Sun in 1994 to port and optimize a SAP application to Solaris 2, and he worked with Sun's key partners to support the adoption of Sun technologies, such as Oracle Solaris, multithreading, Java SE, and Swing. As the CTO of Oracle's ISV Engineering group, he currently manages Oracle Solaris 11 adoption for all Oracle partners.

Revision 1.1, 05/31/2013
