How to Set Up a Hadoop 2.2 Cluster From the Unified Archive in Oracle Solaris 11

by Orgad Kimchi

How to set up an Apache Hadoop 2.2 (YARN) cluster on a single system using Oracle Solaris Zones, the ZFS file system, and the new Unified Archive capabilities of Oracle Solaris 11.2. Also see how to configure manual or automatic failover, and how to use the Unified Archive to create a "cloud in a box" and deploy a bare-metal system.


Published May 2014 (updated June 2014)


Table of Contents
About Hadoop and Oracle Solaris Zones
Download and Install Hadoop
Configure the Network Time Protocol
Create the Scripts
Create the NameNodes, DataNodes, and ResourceManager Zones
Configure the Active NameNode
Set Up SSH
Set Up the Standby NameNode and the ResourceManager
Set Up the DataNode Zones
Verify the SSH Setup
Verify Name Resolution
Format the Hadoop File System
Start the Hadoop Cluster
About Hadoop High Availability
Configure Manual Failover
About Apache ZooKeeper and Automatic Failover
Configure Automatic Failover
Create a "Cloud in a Box" Using Unified Archive
Deploy a Bare-Metal System from a Unified Archive
Conclusion
See Also
About the Author

This article starts with a brief overview of Hadoop and follows with an example of setting up a Hadoop cluster with two NameNodes, a Resource Manager, a History Server, and three DataNodes. As a prerequisite, you should have a basic understanding of Oracle Solaris Zones and network administration.

About Hadoop and Oracle Solaris Zones

The Apache Hadoop software is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.

The following are benefits of using Oracle Solaris Zones for a Hadoop cluster:

  • Fast provisioning of new cluster members using the zone cloning feature
  • Optimized disk I/O utilization for better I/O performance with ZFS built-in compression and the ability to tune ZFS for faster throughput for the I/O workload
  • Ability to build a "cloud in a box solution" for rapid cluster provisioning using the Unified Archive capability of Oracle Solaris

To store data, Hadoop uses the Hadoop Distributed File System (HDFS), which provides high-throughput access to application data and is suitable for applications that have large data sets. For more information about Hadoop and HDFS see http://hadoop.apache.org/.

The Hadoop cluster building blocks are as follows:

  • Active NameNode: The centerpiece of HDFS, which stores file system metadata and is responsible for all client operations
  • Standby NameNode: A second NameNode that synchronizes its state with the active NameNode in order to provide fast failover if the active NameNode goes down
  • ResourceManager: The global resource scheduler, which allocates cluster resources to running applications and directs the slave NodeManager daemons on the DataNodes
  • DataNodes: Nodes that store the data in the HDFS file system and are also known as slaves; these nodes run the NodeManager process that communicates with the ResourceManager
  • History Server: Provides REST APIs and a web UI that allow users to query the status of, and information about, finished jobs

In the previous Hadoop version, the NameNode was a single point of failure (SPOF) in an HDFS cluster. Hadoop version 2.2 provides the ability to build an HDFS cluster with high availability (HA), and this article describes the steps involved in building such a configuration.

In the example presented in this article, all the Hadoop cluster building blocks are installed using Oracle Solaris Zones, ZFS, and Unified Archive. Figure 1 shows the architecture:

Diagram of the cluster architecture

Figure 1. Diagram of the cluster architecture

Download and Install Hadoop

Important: In the examples presented in this article, the command prompt indicates both the user that needs to run each command and the environment where the command should be run. For example, the command prompt root@global_zone:~# indicates that user root needs to run the command from the global zone.

This article uses the Apache Hadoop "15 October, 2013: Release 2.2.0" release.

Note: Oracle Solaris 11.2 must be installed on the system on which you install Hadoop. Refer to the "See Also" section for Oracle Solaris download and documentation links.

  1. On the global zone, create the /usr/local directory if it doesn't exist.

    Note: The cluster configuration will share the Hadoop directory structure (/usr/local/hadoop) across the zones as a read-only file system. Every Hadoop cluster node still needs to write its logs to its own directory; /var/log, which is private to each Oracle Solaris Zone, is the best-practice location for this.

    root@global_zone:~# mkdir -p /usr/local
    
  2. Download the Hadoop tarball hadoop-2.2.0.tar.gz and the MD5 checksum file from http://www.apache.org/dyn/closer.cgi/hadoop/common/.
  3. Verify the Hadoop tarball's integrity:

    1. Run the md5sum command:

      root@global_zone:~# md5sum /tmp/hadoop-2.2.0.tar.gz
      25f27eb0b5617e47c032319c0bfd9962  /tmp/hadoop-2.2.0.tar.gz
      
    2. Compare the checksum from the md5sum output with the value recorded in the downloaded .mds checksum file:

      root@global_zone:~# cat "hadoop-2.2.0.tar.gz.mds" | grep -i md5
      hadoop-2.2.0.tar.gz:   MD5 = 25 F2 7E B0 B5 61 7E 47  C0 32 31 9C 0B FD 99 62
      
  4. Copy the Hadoop tarball to /usr/local:

    root@global_zone:~# cp /tmp/hadoop-2.2.0.tar.gz /usr/local
    
  5. Unpack the tarball:

    root@global_zone:~# cd /usr/local
    root@global_zone:~# tar -xzvf /usr/local/hadoop-2.2.0.tar.gz
    
  6. Create the hadoop group:

    root@global_zone:~# groupadd -g 200 hadoop
    
  7. Create a symlink for the Hadoop binaries:

    root@global_zone:~# ln -s /usr/local/hadoop-2.2.0 /usr/local/hadoop
    
  8. Give ownership to the hadoop group:

    root@global_zone:~# chown -R root:hadoop /usr/local/hadoop-2.2.0
    
  9. Change the permissions:

    root@global_zone:~# chmod -R 755 /usr/local/hadoop-2.2.0
    
  10. Edit the Hadoop configuration files, which are shown in Table 1:

    Table 1. Hadoop Configuration Files
    File Name Description
    hadoop-env.sh Specifies environment variable settings used by Hadoop
    yarn-env.sh Specifies environment variable settings used by YARN
    mapred-env.sh Specifies environment variable settings used by MapReduce
    slaves Contains a list of machine names that run the DataNode and NodeManager pair of daemons
    core-site.xml Specifies parameters relevant to all Hadoop daemons and clients
    hdfs-site.xml Specifies parameters used by the HDFS daemons and clients
    mapred-site.xml Specifies parameters used by the MapReduce daemons and clients
    yarn-site.xml Specifies the configurations for the ResourceManager and NodeManager

    1. Run the following commands to change the hadoop-env.sh script:

      root@global_zone:~# export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
      root@global_zone:~# cd $HADOOP_CONF_DIR
      
    2. Append the following lines to the hadoop-env.sh script:

      root@global_zone:~# echo "export JAVA_HOME=/usr/java" >> hadoop-env.sh
      root@global_zone:~# echo "export HADOOP_LOG_DIR=/var/log/hadoop/hdfs" >> hadoop-env.sh
      
    3. Append the following lines to the yarn-env.sh script:

      root@global_zone:~# vi yarn-env.sh
      
      export JAVA_HOME=/usr/java
      export YARN_LOG_DIR=/var/log/hadoop/yarn
      export YARN_CONF_DIR=/usr/local/hadoop/etc/hadoop
      export HADOOP_HOME=/usr/local/hadoop
      export HADOOP_MAPRED_HOME=$HADOOP_HOME
      export HADOOP_COMMON_HOME=$HADOOP_HOME
      export HADOOP_HDFS_HOME=$HADOOP_HOME
      export YARN_HOME=$HADOOP_HOME
      export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
      export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
      
    4. Append the following lines to the mapred-env.sh script:

      root@global_zone:~# echo "export JAVA_HOME=/usr/java" >> mapred-env.sh
      root@global_zone:~# echo "export HADOOP_MAPRED_LOG_DIR=/var/log/hadoop/mapred" >> mapred-env.sh
      root@global_zone:~# echo "export HADOOP_MAPRED_IDENT_STRING=mapred" >> mapred-env.sh
      
    5. Edit the slaves file to replace the localhost entry with the following lines:

      root@global_zone:~# vi slaves
      
      data-node1
      data-node2
      data-node3
      
    6. Edit the core-site.xml file so it looks like the following:

      Note: fs.defaultFS is the URI that describes the NameNode address (protocol specifier, hostname, and port) for the cluster. Each DataNode instance will register with this NameNode and make its data available through it. In addition, the DataNodes send heartbeats to the NameNode to confirm that each DataNode is operating and the block replicas it hosts are available.

      root@global_zone:~# vi core-site.xml
      
      <?xml version="1.0"?>
      <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
      <!-- Put site-specific property overrides in this file. -->
      <configuration>
        <property>
          <name>fs.defaultFS</name>
          <value>hdfs://name-node1</value>
        </property>
      </configuration>
      
    7. Edit the hdfs-site.xml file so it looks like the following.

      Notes:

      dfs.datanode.data.dir The path on the local file system in which the DataNode instance should store its data.
      dfs.namenode.name.dir The path on the local file system of the NameNode instance where the NameNode metadata is stored. It is used only by the NameNode instance to find its information.
      dfs.replication The default replication factor for each block of data in the file system. (For a production cluster, this should usually be left at its default value of 3).
      dfs.permissions.superusergroup Specifies the UNIX group whose members are treated as superusers by HDFS. You can stick with the value of hadoop or pick your own group depending on the security policies at your site.

      root@global_zone:~# vi hdfs-site.xml
      
      <?xml version="1.0"?>
      <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
      <!-- Put site-specific property overrides in this file. -->
      <configuration>
        <property>
          <name>dfs.datanode.data.dir</name>
          <value>/var/data/1/dfs/dn</value>
        </property>
        <property>
          <name>dfs.namenode.name.dir</name>
          <value>/var/data/1/dfs/nn</value>
        </property>
        <property>
          <name>dfs.replication</name>
          <value>3</value>
        </property>
        <property>
          <name>dfs.permissions.superusergroup</name>
          <value>hadoop</value>
        </property>
      </configuration>
      
    8. Edit the mapred-site.xml file so it looks like the following:

      Notes:

      mapreduce.framework.name Sets the execution framework to Hadoop YARN
      mapreduce.jobhistory.address Specifies the MapReduce History Server's host:port
      mapreduce.jobhistory.webapp.address Specifies the MapReduce History Server's web UI host:port
      yarn.app.mapreduce.am.staging-dir Specifies a staging directory, which YARN requires for temporary files created by running jobs

      root@global_zone:~# vi mapred-site.xml
      
      <?xml version="1.0"?>
      <configuration>
        <property>
          <name>mapreduce.framework.name</name>
          <value>yarn</value>
        </property>
        <property>
          <name>mapreduce.jobhistory.address</name>
          <value>resource-manager:10020</value>
        </property>
        <property>
          <name>mapreduce.jobhistory.webapp.address</name>
          <value>resource-manager:19888</value>
        </property>
        <property>
          <name>yarn.app.mapreduce.am.staging-dir</name>
          <value>/user</value>
        </property>
      </configuration>
      
    9. Edit the yarn-site.xml file so it looks like the following:

      Notes:

      yarn.nodemanager.aux-services Specifies the shuffle service that needs to be set for MapReduce applications.
      yarn.nodemanager.aux-services.mapreduce.shuffle.class Specifies the exact name of the class for shuffle service.
      yarn.resourcemanager.hostname Specifies the ResourceManager's host name.
      yarn.nodemanager.local-dirs Specifies a comma-separated list of paths on the local file system where intermediate data is written.
      yarn.nodemanager.log-dirs Specifies a comma-separated list of paths on the local file system where the NodeManager stores container log files.
      yarn.log-aggregation-enable Specifies whether log aggregation is enabled.
      yarn.nodemanager.remote-app-log-dir Specifies the HDFS directory to which aggregated application logs are moved.

      root@global_zone:~# vi yarn-site.xml
      
      <?xml version="1.0"?>
      <configuration>
      <!-- Site specific YARN configuration properties -->
        <property>
          <name>yarn.nodemanager.aux-services</name>
          <value>mapreduce_shuffle</value>
        </property>
        <property>
          <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
          <value>org.apache.hadoop.mapred.ShuffleHandler</value>
        </property>
        <property>
          <name>yarn.resourcemanager.hostname</name>
          <value>resource-manager</value>
        </property>
        <property>
          <name>yarn.nodemanager.local-dirs</name>
          <value>file:///var/data/1/yarn/local</value>
        </property>
        <property>
          <name>yarn.nodemanager.log-dirs</name>
          <value>file:///var/data/1/yarn/logs</value>
        </property>
        <property>
          <name>yarn.log-aggregation-enable</name>
          <value>true</value>
        </property>
        <property>
          <description>Where to aggregate logs</description>
          <name>yarn.nodemanager.remote-app-log-dir</name>
          <value>/var/log/hadoop-yarn/apps</value>
        </property>
      </configuration>
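
Before moving on, it is worth making sure the four XML files are still well-formed after hand editing. A quick, hedged check (assuming the xmllint utility from libxml2 is available on your system) is to run xmllint in no-output mode; it prints nothing when a file parses cleanly:

    root@global_zone:~# cd /usr/local/hadoop/etc/hadoop
    root@global_zone:~# xmllint --noout core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml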
      

Configure the Network Time Protocol

The system clocks of the Hadoop zones should be kept synchronized by using the Network Time Protocol (NTP).

Note: It is best to select an NTP server that can be a dedicated time synchronization source so that other services are not negatively affected if the node is brought down for planned maintenance.

In the following example, the global zone is configured as an NTP server.

  1. Configure an NTP server:

    root@global_zone:~# cp /etc/inet/ntp.server /etc/inet/ntp.conf
    root@global_zone:~# chmod +w /etc/inet/ntp.conf
    root@global_zone:~# touch /var/ntp/ntp.drift
    
  2. Edit the NTP server configuration file:

    root@global_zone:~# vi /etc/inet/ntp.conf
    
    server 127.127.1.0 prefer
    broadcast 224.0.1.1 ttl 4
    enable auth monitor
    driftfile /var/ntp/ntp.drift
    statsdir /var/ntp/ntpstats/
    filegen peerstats file peerstats type day enable
    filegen loopstats file loopstats type day enable
    filegen clockstats file clockstats type day enable
    keys /etc/inet/ntp.keys
    trustedkey 0
    requestkey 0
    controlkey 0
    
  3. Enable the NTP server service:

    root@global_zone:~# svcadm enable ntp
    
  4. Verify that the NTP server is online by using the following command:

    root@global_zone:~# svcs  ntp
    STATE          STIME    FMRI
    online         15:27:55 svc:/network/ntp:default
    

Create the Scripts

First, create a directory in the global zone to hold the scripts:

    root@global_zone:~# mkdir /usr/local/Scripts

  1. Create the createzone script using your favorite editor, as shown in Listing 1. We will use this script to set up the Oracle Solaris Zones.

    root@global_zone:~# vi /usr/local/Scripts/createzone
    
    #!/bin/ksh
    
    # FILENAME:    createzone
    # Create a zone 
    # Usage:
    # createzone <zone name>
    if [ $# != 1 ]
    then
        echo "Usage: createzone <zone name> "
        exit 1
    fi
    
    ZONENAME=$1
    
    zonecfg -z $ZONENAME > /dev/null 2>&1 << EOF
    create
    set autoboot=true
    set limitpriv=default,dtrace_proc,dtrace_user,sys_time
    set zonepath=/zones/$ZONENAME
    add fs
    set dir=/usr/local
    set special=/usr/local
    set type=lofs
    set options=[ro,nodevices]
    end
    verify
    exit
    EOF
    if [ $? == 0 ] ; then
        echo "Successfully created the $ZONENAME zone"
    else
        echo "Error: unable to create the $ZONENAME zone"
        exit 1
    fi
    

    Listing 1. createzone script

  2. Create the buildprofile script using your favorite editor, as shown in Listing 2. We will use this script to generate the system configuration profiles for the zones.

    root@global_zone:~# vi /usr/local/Scripts/buildprofile
    
    #!/bin/ksh
    #
    # Copyright 2006-2011 Oracle Corporation. All rights reserved.
    # Use is subject to license terms.
    #
    # This script serves as an example of how to instantiate several zones
    # with no administrative interaction.  Run the script with no arguments to
    # get a usage message.
    #
    export PATH=/usr/bin:/usr/sbin
    
    me=$(basename $0)
    
    function fail_usage {
        print -u2 "Usage: $me <sysconfig.xml> <zone> <ipaddr>"
        exit 2
    }
    
    function error {
        print -u2 "$me: ERROR: $@"
    }
    
    # Parse and check arguments
    #
    (( $# != 3 )) && fail_usage
    
    # Be sure the sysconfig profile is readable and ends in .xml
    sysconfig=$1
    zone=$2
    ipaddr=$3
    
    if [[ ! -f $sysconfig || ! -r $sysconfig || $sysconfig != *.xml ]] ; then
        error "sysconfig profile missing, unreadable, or not *.xml"
        fail_usage
    fi
    #
    # Create a temporary directory for all temp files
    #
    export TMPDIR=$(mktemp -d /tmp/$me.XXXXXX)
    if [[ -z $TMPDIR ]]; then
        error "Could not create temporary directory"
        exit 1
    fi
    trap 'rm -rf $TMPDIR' EXIT
    
    
        # Customize the nodename in the sysconfig profile
        z_sysconfig=$TMPDIR/${zone}.xml
        z_sysconfig2=$TMPDIR/${zone}2.xml
    
        search="<propval type=\"astring\" name=\"nodename\" value=\"name-node1\"/>"
        replace="<propval type=\"astring\" name=\"nodename\" value=\"$zone\"/>"
        sed "s|$search|$replace|" $sysconfig > $z_sysconfig
    
    
        search="<propval type=\"net_address_v4\" name=\"static_address\" value=\"192.168.1.1/24\"/>"
        replace="<propval type=\"net_address_v4\" name=\"static_address\" value=\"$ipaddr\"/>"
        sed "s|$search|$replace|" $z_sysconfig > $z_sysconfig2
    
        cp $z_sysconfig2 ./$zone-template.xml
        rm -rf $TMPDIR
    
    
    exit 0
    

    Listing 2. buildprofile script

  3. Create the verifycluster script using your favorite editor, as shown in Listing 3. We will use this script to verify the Hadoop cluster setup.

    root@global_zone:~# vi /usr/local/Scripts/verifycluster
    
    #!/bin/ksh
    
    # FILENAME:   verifycluster
    # Verify the hadoop cluster configuration
    # Usage:
    # verifycluster
    
    NODES="name-node1 name-node2 resource-manager data-node1 data-node2 data-node3"
    RET=1

    for transaction in _; do

        # Verify that every zone can see the shared /usr/local file system
        for i in $NODES
        do
            zlogin $i ls /usr/local > /dev/null 2>&1 || break 2
        done

        # Verify that every zone can reach every other zone by name
        for target in $NODES
        do
            for i in $NODES
            do
                zlogin $i ping $target > /dev/null 2>&1 || break 3
            done
        done

        RET=0
    done

    if [ $RET == 0 ] ; then
        echo "The cluster is verified"
    else
        echo "Error: unable to verify the cluster"
    fi
    exit $RET
    

    Listing 3. verifycluster script

  4. Create the testssh script, as shown in Listing 4. We will use this script to verify the SSH setup.

    root@global_zone:~# vi /usr/local/Scripts/testssh
    
    #!/bin/ksh
    
    for i in name-node1 name-node2 resource-manager data-node1 data-node2 data-node3
       do
    
       ssh $i exit
    
       done 
    

    Listing 4. testssh script

  5. Create the startcluster script, as shown in Listing 5. We will use this script to start all the services on the Hadoop cluster.

    root@global_zone:~# vi /usr/local/Scripts/startcluster
    
    #!/bin/ksh
    
    zlogin -l hdfs name-node1 'hadoop-daemon.sh start namenode'
    zlogin -l hdfs data-node1 'hadoop-daemon.sh start datanode'
    zlogin -l hdfs data-node2 'hadoop-daemon.sh start datanode'
    zlogin -l hdfs data-node3 'hadoop-daemon.sh start datanode'
    zlogin -l yarn resource-manager 'yarn-daemon.sh start resourcemanager'
    zlogin -l yarn data-node1 'yarn-daemon.sh start nodemanager'
    zlogin -l yarn data-node2 'yarn-daemon.sh start nodemanager'
    zlogin -l yarn data-node3 'yarn-daemon.sh start nodemanager'
    zlogin -l mapred  resource-manager 'mr-jobhistory-daemon.sh start historyserver'
    

    Listing 5. startcluster script

  6. Create the stopcluster script, as shown in Listing 6. We will use this script to stop all the services on the Hadoop cluster.

    root@global_zone:~# vi /usr/local/Scripts/stopcluster 
    
    #!/bin/ksh
    
    zlogin -l hdfs name-node1 'hadoop-daemon.sh stop namenode'
    zlogin -l hdfs data-node1 'hadoop-daemon.sh stop datanode'
    zlogin -l hdfs data-node2 'hadoop-daemon.sh stop datanode'
    zlogin -l hdfs data-node3 'hadoop-daemon.sh stop datanode'
    zlogin -l yarn resource-manager 'yarn-daemon.sh stop resourcemanager'
    zlogin -l yarn data-node1 'yarn-daemon.sh stop nodemanager'
    zlogin -l yarn data-node2 'yarn-daemon.sh stop nodemanager'
    zlogin -l yarn data-node3 'yarn-daemon.sh stop nodemanager'
    zlogin -l mapred  resource-manager 'mr-jobhistory-daemon.sh stop historyserver'
    

    Listing 6. stopcluster script

  7. Change the scripts' permissions:

    root@global_zone:~# chmod +x /usr/local/Scripts/*
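
Because these scripts were typed in by hand, a quick syntax-only pass can catch obvious mistakes before any zones are created. This is just a hedged convenience check; ksh -n parses each script without executing it and stays silent when the syntax is clean:

    root@global_zone:~# for s in /usr/local/Scripts/*; do ksh -n $s; done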
    

Create the NameNodes, DataNodes, and ResourceManager Zones

We will leverage the integration between Oracle Solaris Zones virtualization technology and the ZFS file system that is built into Oracle Solaris.

Table 2 shows a summary of the Hadoop zones we will create:

Table 2. Zone Summary
Function Zone Name ZFS Mount Point IP Address
Active NameNode name-node1 /zones/name-node1 192.168.1.1/24
Standby NameNode name-node2 /zones/name-node2 192.168.1.2/24
ResourceManager resource-manager /zones/resource-manager 192.168.1.3/24
DataNode data-node1 /zones/data-node1 192.168.1.4/24
DataNode data-node2 /zones/data-node2 192.168.1.5/24
DataNode data-node3 /zones/data-node3 192.168.1.6/24

  1. Create the name-node1 zone using the createzone script, which will create the zone configuration file. For the argument, the script needs the zone's name, for example, createzone <zone name>.

    root@global_zone:~# /usr/local/Scripts/createzone name-node1 
    Successfully created the name-node1 zone
    
  2. Create the name-node2 zone using the createzone script:

    root@global_zone:~# /usr/local/Scripts/createzone name-node2 
    Successfully created the name-node2 zone
    
  3. Create the resource-manager zone using the createzone script:

    root@global_zone:~# /usr/local/Scripts/createzone resource-manager
    Successfully created the resource-manager zone
    
  4. Create the three DataNode zones using the createzone script:

    root@global_zone:~# /usr/local/Scripts/createzone data-node1 
    Successfully created the data-node1 zone
    
    root@global_zone:~# /usr/local/Scripts/createzone data-node2 
    Successfully created the data-node2 zone
    
    root@global_zone:~# /usr/local/Scripts/createzone data-node3 
    Successfully created the data-node3 zone
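
If you want to review what the createzone heredoc actually produced, zonecfg can print the stored configuration for any of the new zones; for example, the fs resource should show the read-only lofs mount of /usr/local (output omitted here):

    root@global_zone:~# zonecfg -z name-node1 info
    root@global_zone:~# zonecfg -z name-node1 info fs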
    

Configure the Active NameNode

Let's create a system configuration profile template for the name-node1 zone. The system configuration profile will include the host information, such as the host name, IP address, and name services.

  1. Run the sysconfig command, which will start the System Configuration Tool (see Figure 2):

    root@global_zone:~# sysconfig create-profile  
    
    System Configuration Tool

    Figure 2. System Configuration Tool

  2. Press ESC-2 and go through the System Configuration Tool screens. Provide the zone's host information by using the following configuration for the name-node1 zone:

    1. For the host name, use name-node1.
    2. Select manual network configuration.
    3. Ensure the network interface net0 has an IP address of 192.168.1.1 and a netmask of 255.255.255.0.
    4. Ensure the name service is based on your network configuration. In this article, we will use /etc/hosts for name resolution, so we won't set up DNS for host name resolution. Select Do not configure DNS.
    5. For Alternate Name Service, select None.
    6. For Time Zone Regions, select Americas.
    7. For Time Zone Locations, select United States.
    8. For Time Zone, select Pacific Time.
    9. For Locale: Language, select English.
    10. For Locale: Territory, select United States.
    11. For Keyboard, select US-English.
    12. Enter your root password.
    13. For Support – Registration, provide your My Oracle Support credentials.
    14. For Support – Network Configuration, select an internet access method for Oracle Configuration Manager and Oracle Auto Service Request.
  3. Copy the profile to /root/name-node1-template.xml:

    root@global_zone:~# cp /system/volatile/profile/sc_profile.xml /root/name-node1-template.xml
    
  4. Now, install the name-node1 zone; later we will clone this zone in order to accelerate the creation of the other zones:

    root@global_zone:~# zoneadm -z name-node1 install -c /root/name-node1-template.xml
    The following ZFS file system(s) have been created:
        rpool/zones/name-node1
    Progress being logged to /var/log/zones/zoneadm.20140225T111519Z.name-node1.install
           Image: Preparing at /zones/name-node1/root.
    ...
    
  5. Boot the name-node1 zone:

    root@global_zone:~# zoneadm -z name-node1 boot
    
  6. Check the status of the zones we've created:

    root@global_zone:~# zoneadm list -cv
    
    ID NAME             STATUS      PATH                         BRAND      IP
       0 global           running     /                            solaris    shared
       1 name-node1       running     /zones/name-node1            solaris    excl
       - name-node2       configured  /zones/name-node2            solaris    excl
       - resource-manager configured  /zones/resource-manager      solaris    excl
       - data-node1       configured  /zones/data-node1            solaris    excl
       - data-node2       configured  /zones/data-node2            solaris    excl
       - data-node3       configured  /zones/data-node3            solaris    excl
    

    We can see the six zones that we have created.

  7. Log in to the name-node1 zone:

    root@global_zone:~# zlogin name-node1
    
  8. Wait one minute for the zone services to come up and then verify that all the services are up and running:

    root@name-node1:~# svcs -xv
    

    If all the services are up and running without any issues, the command will return to the system prompt without any error message.

  9. Developing for Hadoop requires a Java programming environment. You can install Java Development Kit (JDK) 7 using the following command:

    root@name-node1:~# pkg install --accept jdk-7
    
  10. Verify the Java installation:

    root@name-node1:~# java -version
    java version "1.7.0_45"
    Java(TM) SE Runtime Environment (build 1.7.0_45-b18)
    Java HotSpot(TM) Server VM (build 24.45-b08, mixed mode)
    
  11. Create the hadoop group:

    root@name-node1:~# groupadd -g 200 hadoop
    
  12. For the Hadoop cluster, create the four users shown in Table 3.

    Table 3. Hadoop Users Summary
    User:Group Description
    hdfs:hadoop The NameNodes and DataNodes run as this user.
    yarn:hadoop The ResourceManager and NodeManager services run as this user.
    mapred:hadoop The History Server runs as this user.
    bob:staff This user will run the MapReduce jobs.

    1. Add the hdfs user:

      root@name-node1:~# useradd -u 100 -m -g hadoop hdfs
      
    2. Set the hdfs user's password. You can use whatever password you want, but be sure you remember the password.

      root@name-node1:~# passwd hdfs
      New Password:<enter hdfs password>
      Re-enter new Password: <re-enter hdfs password>
      passwd: password successfully changed for hdfs
      
    3. Add the yarn user:

      root@name-node1:~# useradd -u 101 -m -g hadoop yarn
      root@name-node1:~# passwd yarn
      New Password: <enter yarn password>
      Re-enter new Password: <re-enter yarn password>
      passwd: password successfully changed for yarn
      
    4. Add the mapred user:

      root@name-node1:~# useradd -u 102 -m -g hadoop mapred
      root@name-node1:~# passwd mapred
      New Password: <enter mapred password>
      Re-enter new Password: <re-enter mapred password>
      passwd: password successfully changed for mapred
      
    5. Create a directory for the YARN log files:

      root@name-node1:~# mkdir -p /var/log/hadoop/yarn
      root@name-node1:~# chown yarn:hadoop /var/log/hadoop/yarn
      
    6. Create a directory for the HDFS log files:

      root@name-node1:~# mkdir -p /var/log/hadoop/hdfs
      root@name-node1:~# chown hdfs:hadoop /var/log/hadoop/hdfs
      
    7. Create a directory for the mapred log files:

      root@name-node1:~# mkdir -p /var/log/hadoop/mapred
      root@name-node1:~# chown mapred:hadoop /var/log/hadoop/mapred
      
    8. Create a directory for the HDFS metadata:

      root@name-node1:~# mkdir -p /var/data/1/dfs/nn
      root@name-node1:~# chmod 700 /var/data/1/dfs/nn
      root@name-node1:~# chown -R hdfs:hadoop /var/data/1/dfs/nn
      
    9. Create a Hadoop data directory to store the HDFS blocks:

      root@name-node1:~# mkdir -p /var/data/1/dfs/dn
      root@name-node1:~# chown -R hdfs:hadoop /var/data/1/dfs/dn
      
    10. Configure local storage directories for use by YARN:

      root@name-node1:~# mkdir -p /var/data/1/yarn/local
      root@name-node1:~# mkdir -p /var/data/1/yarn/logs
      root@name-node1:~# chown -R yarn:hadoop /var/data/1/yarn/local
      root@name-node1:~# chown -R yarn:hadoop /var/data/1/yarn/logs
      
    11. Create the runtime directories:

      root@name-node1:~# mkdir -p /var/hadoop/run/yarn
      root@name-node1:~# chown yarn:hadoop /var/hadoop/run/yarn
      root@name-node1:~# mkdir -p /var/hadoop/run/hdfs
      root@name-node1:~# chown hdfs:hadoop /var/hadoop/run/hdfs
      root@name-node1:~# mkdir -p /var/hadoop/run/mapred
      root@name-node1:~# chown mapred:hadoop /var/hadoop/run/mapred
      
    12. Add the user bob (later this user will run the MapReduce jobs):

      root@name-node1:~# useradd -m -u 1000 bob
      root@name-node1:~# passwd bob
      New Password: <enter bob password>
      Re-enter new Password: <re-enter bob password>
      passwd: password successfully changed for bob
      
  13. Switch to user bob:

    root@name-node1:~# su - bob
    
  14. Using your favorite editor, append the following lines to .profile:
    bob@name-node1:~$ vi $HOME/.profile
    
    # Set JAVA_HOME
    export JAVA_HOME=/usr/java
    # Add Hadoop bin/ directory to PATH
    export PATH=$PATH:/usr/local/hadoop/bin:/usr/local/hadoop/sbin
    export HADOOP_HOME=/usr/local/hadoop
    export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
    
  15. Log out using the exit command:

    bob@name-node1:~$ exit
    logout
    
  16. Configure an NTP client, as shown in the following example:

    1. Install the NTP package:

      root@name-node1:~# pkg install ntp
      
    2. Create the NTP client configuration files:

      root@name-node1:~# cp /etc/inet/ntp.client /etc/inet/ntp.conf
      root@name-node1:~# chmod +w /etc/inet/ntp.conf
      root@name-node1:~# touch /var/ntp/ntp.drift
      
    3. Edit the NTP client configuration file:

      Note: In this setup, we are using the global zone as a time server so we add its name (for example, global-zone) to /etc/inet/ntp.conf.

      root@name-node1:~# vi /etc/inet/ntp.conf
      
      server global-zone prefer
      driftfile /var/ntp/ntp.drift
      statsdir /var/ntp/ntpstats/
      filegen peerstats file peerstats type day enable
      filegen loopstats file loopstats type day enable 
      
  17. Add the Hadoop cluster members' host names and IP addresses to /etc/hosts:

    root@name-node1:~# vi /etc/hosts
    
    ::1             localhost
    127.0.0.1       localhost loghost
    192.168.1.1 name-node1
    192.168.1.2 name-node2
    192.168.1.3 resource-manager
    192.168.1.4 data-node1
    192.168.1.5 data-node2
    192.168.1.6 data-node3
    192.168.1.100 global-zone
    
  18. Enable the NTP client service:

    root@name-node1:~# svcadm enable ntp
    
  19. Verify the NTP client status:

    root@name-node1:~# svcs ntp
    STATE          STIME    FMRI
    online          1:04:35 svc:/network/ntp:default
    
  20. Check whether the NTP client can synchronize its clock with the NTP server:

    root@name-node1:~# ntpq -p
         remote        refid         st t when poll reach   delay   offset  jitter
    ==============================================================================
     global-zone     LOCAL(0)         6 u   19   64    1    0.374    0.119   0.000
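
It is also worth a quick check that the shared Hadoop tree is visible inside the zone through the read-only lofs mount defined by the createzone script (a simple sanity check; output not shown):

    root@name-node1:~# df -h /usr/local
    root@name-node1:~# ls /usr/local/hadoop/bin/hadoop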
    

Set Up SSH

Set up SSH key-based authentication for the Hadoop users on the name-node1 zone in order to enable password-less login to other zones in the Hadoop cluster:

  1. First, switch to the user hdfs, and then generate the SSH public key and copy it into the ~/.ssh/authorized_keys file:

    root@name-node1:~# su - hdfs
    Oracle Corporation      SunOS 5.11      11.1    September 2012
    hdfs@name-node1:~$ ssh-keygen -t dsa -P "" -f ~/.ssh/id_dsa
    hdfs@name-node1:~$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
    
  2. Edit $HOME/.profile and append to the end of the file the following lines:

    hdfs@name-node1:~$ vi $HOME/.profile
    
    # Set JAVA_HOME
    export JAVA_HOME=/usr/java
    # Add Hadoop bin/ directory to PATH
    export PATH=$PATH:/usr/local/hadoop/bin:/usr/local/hadoop/sbin
    export HADOOP_HOME=/usr/local/hadoop
    export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
    export HADOOP_PID_DIR=/var/hadoop/run/hdfs
    
  3. Switch to user yarn and edit $HOME/.profile to append to the end of the file the following lines:

    hdfs@name-node1:~$ su - yarn
    Password: <provide yarn password>
    Oracle Corporation      SunOS 5.11      11.1    September 2012
    yarn@name-node1:~$ vi $HOME/.profile
    
    # Set JAVA_HOME
    export JAVA_HOME=/usr/java
    # Add Hadoop bin/ directory to PATH
    export PATH=$PATH:/usr/local/hadoop/bin:/usr/local/hadoop/sbin
    export HADOOP_HOME=/usr/local/hadoop
    export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
    export YARN_PID_DIR=/var/hadoop/run/yarn
    
  4. Generate the SSH public key and copy it into the ~/.ssh/authorized_keys file:

    yarn@name-node1:~$ ssh-keygen -t dsa -P "" -f ~/.ssh/id_dsa
    yarn@name-node1:~$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
    
  5. Switch to user mapred and edit $HOME/.profile to append to the end of the file the following lines:

    yarn@name-node1:~$ su - mapred
    Password: <provide mapred password>
    Oracle Corporation      SunOS 5.11      11.1    September 2012
    mapred@name-node1:~$ vi $HOME/.profile
    
    # Set JAVA_HOME
    export JAVA_HOME=/usr/java
    # Add Hadoop bin/ directory to PATH
    export PATH=$PATH:/usr/local/hadoop/bin:/usr/local/hadoop/sbin
    export HADOOP_HOME=/usr/local/hadoop
    export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
    export MAPRED_PID_DIR=/var/hadoop/run/mapred
    
  6. Generate the SSH public key and copy it into the ~/.ssh/authorized_keys file:

    mapred@name-node1:~$ ssh-keygen -t dsa -P "" -f ~/.ssh/id_dsa
    mapred@name-node1:~$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
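
Because the remaining zones are cloned from name-node1 later in this article, the key pairs and authorized_keys files created here are carried over to every node, which is what makes the cluster-wide password-less logins work. If you want an immediate check before cloning, a simple local test for each of the three users is to run ssh against the zone itself; answer yes to the host-key prompt, and the command should return without asking for a password:

    mapred@name-node1:~$ ssh localhost exit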
    

Set Up the Standby NameNode and the ResourceManager

  1. Run the following command to execute the .profile script:

    mapred@name-node1:~$ source $HOME/.profile
    
  2. Check that Hadoop runs by running the following command:

    mapred@name-node1:~$ hadoop version
    Hadoop 2.2.0
    Subversion https://svn.apache.org/repos/asf/hadoop/common -r 1529768
    Compiled by hortonmu on 2013-10-07T06:28Z
    Compiled with protoc 2.5.0
    From source with checksum 79e53ce7994d1628b240f09af91e1af4
    This command was run using /usr/local/hadoop-2.2.0/share/hadoop/common/hadoop-common-2.2.0.jar
    

    Note: Press Ctrl-D repeatedly until you exit from the name-node1 console and return to the global zone. You can verify that you are in the global zone by using the zonename command:

    root@global_zone:~# zonename
    global
    
  3. Create a profile for the name-node2 zone using the name-node1 profile as a template and using the buildprofile script. In a later step, we will use this profile in order to create the name-node2 zone.

    Note: For arguments, the script needs the template profile's name (/root/name-node1-template.xml, which we created in a previous step), the zone's name (name-node2), and the zone's IP address (192.168.1.2, as shown in Table 2).

    1. Change to the /root directory and create the zone profile there:

      root@global_zone:~# cd /root
      root@global_zone:~# /usr/local/Scripts/buildprofile /root/name-node1-template.xml name-node2 192.168.1.2/24
      
    2. Verify the profile's creation:

      root@global_zone:~# ls -l /root/name-node2-template.xml
      -rw-r--r--   1 root     root        3715 Feb 25 05:59 /root/name-node2-template.xml
      
  4. From the global zone, run the following command to create the name-node2 zone as a clone of the name-node1:

    1. Shut down the name-node1 zone (we can clone only halted zones):

      root@global_zone:~# zoneadm -z name-node1 shutdown 
      
    2. Then clone the zone using the profile we created for name-node2:

      root@global_zone:~# zoneadm -z name-node2 clone -c /root/name-node2-template.xml name-node1
      
  5. Boot the name-node2 zone:

    root@global_zone:~# zoneadm -z name-node2 boot
    
  6. Log in to the name-node2 zone:

    root@global_zone:~# zlogin name-node2
    
  7. Verify that all the services are up and running:

    root@name-node2:~# svcs -xv
    

    If all the services are up and running without any issues, the command will return to the system prompt without any error message.

  8. Exit from the name-node2 zone by pressing Ctrl- D.
  9. Create the resource-manager profile using the name-node1 profile as a template:

    root@global_zone:~# /usr/local/Scripts/buildprofile /root/name-node1-template.xml resource-manager 192.168.1.3/24
    
  10. Create the data-node1 profile using the name-node1 profile as a template:

    root@global_zone:~# /usr/local/Scripts/buildprofile /root/name-node1-template.xml data-node1 192.168.1.4/24
    
  11. Create the data-node2 profile using the name-node1 profile as a template:

    root@global_zone:~# /usr/local/Scripts/buildprofile /root/name-node1-template.xml data-node2 192.168.1.5/24
    
  12. Create the data-node3 profile using the name-node1 profile as a template:

    root@global_zone:~# /usr/local/Scripts/buildprofile /root/name-node1-template.xml data-node3 192.168.1.6/24
    
  13. Verify the creation of the profiles:

    root@global_zone:~# ls -l /root/*.xml
    -rw-r--r-- 1 root   root  3715 Feb 25 08:05 /root/data-node1-template.xml
    -rw-r--r-- 1 root   root  3715 Feb 25 08:05 /root/data-node2-template.xml
    -rw-r--r-- 1 root   root  3715 Feb 25 08:05 /root/data-node3-template.xml
    -r-------- 1 root   root  3715 Feb 25 03:11 /root/name-node1-template.xml
    -rw-r--r-- 1 root   root  3715 Feb 25 07:57 /root/name-node2-template.xml
    -rw-r--r-- 1 root root  3735 Feb 25 08:04 /root/resource-manager-template.xml
    
  14. From the global zone, run the following command to create the resource-manager zone as a clone of name-node1:

    root@global_zone:~# zoneadm -z resource-manager clone -c /root/resource-manager-template.xml name-node1
    
  15. Boot the resource-manager zone:

    root@global_zone:~# zoneadm -z resource-manager boot
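
As with name-node2, you can wait a minute for the cloned zone's services to come up and then confirm that the resource-manager zone booted cleanly; the command returns to the prompt without output when all services are online:

    root@global_zone:~# zlogin resource-manager svcs -xv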
    

Set Up the DataNode Zones

In this section, we again leverage the integration between the Oracle Solaris Zones virtualization technology and the ZFS file system that is built into Oracle Solaris.

Hadoop best practice is to use a separate hard disk for each DataNode. Therefore, every DataNode zone will have its own hard disk in order to provide better I/O distribution, as shown in Figure 3.

Using a separate disk for each DataNode

Figure 3. Using a separate disk for each DataNode

  1. Get the hard disk names using the following command:

    root@global_zone:~# format < /dev/null
    Searching for disks...done
    AVAILABLE DISK SELECTIONS:
           0. c0t5000C5001CB478AFd0 <SEAGATE-ST930003SSUN300G-0868-279.40GB>
              /scsi_vhci/disk@g5000c5001cb478af
              /dev/chassis//SYS/HDD0/disk
           1. c0t5000CCA00AC5D5C8d0 <HITACHI-H103030SCSUN300G-A2A8-279.40GB>
              /scsi_vhci/disk@g5000cca00ac5d5c8
              /dev/chassis//SYS/HDD1/disk
           2. c0t5000C50014D5DED4d0 <ATA-SEAGATE ST95000N-SF02-465.76GB>
              /scsi_vhci/disk@g5000c50014d5ded4
              /dev/chassis//SYS/HDD2/disk
           3. c0t5000CCA00AC2A038d0 <SUN300G cyl 46873 alt 2 hd 20 sec 625>
              /scsi_vhci/disk@g5000cca00ac2a038
              /dev/chassis//SYS/HDD3/disk
    Specify disk (enter its number):
    
  2. Create a separate ZFS file system for each DataNode in order to provide better disk I/O performance:

    root@global_zone:~# zpool create data-node1-pool c0t5000CCA00AC5D5C8d0
    root@global_zone:~# zpool create data-node2-pool c0t5000C50014D5DED4d0
    root@global_zone:~# zpool create data-node3-pool c0t5000CCA00AC2A038d0
    
  3. HDFS is designed for write-once read-many and provides high-throughput access to application data. Tune ZFS in order to support throughput mode for data-node1:

    root@global_zone:~# zfs create -o logbias=throughput -o compression=on -o mountpoint=/zones/data-node1 data-node1-pool/data-node1
    

    where:

     • -o logbias=throughput enables throughput mode rather than the default mode (latency).
    • -o compression=on enables compression.
    • -o mountpoint=/zones/data-node1 specifies the mount point location.
    • data-node1-pool/data-node1 is the ZFS file system name.
  4. Do the same for data-node2 and data-node3:

     root@global_zone:~# zfs create -o logbias=throughput -o compression=on -o mountpoint=/zones/data-node2 data-node2-pool/data-node2
     root@global_zone:~# zfs create -o logbias=throughput -o compression=on -o mountpoint=/zones/data-node3 data-node3-pool/data-node3
    
  5. Change the directories' permissions in order to avoid warning messages during zone creation:

    root@global_zone:~# chmod 700 /zones/data-node1
    root@global_zone:~# chmod 700 /zones/data-node2
    root@global_zone:~# chmod 700 /zones/data-node3
    
  6. Run the following commands to create the three DataNode zones as a clone of the name-node1 zone, and then boot the new zones:

    root@global_zone:~# zoneadm -z data-node1 clone -c /root/data-node1-template.xml name-node1
    root@global_zone:~# zoneadm -z data-node1 boot
    root@global_zone:~# zoneadm -z data-node2 clone -c /root/data-node2-template.xml name-node1
    root@global_zone:~# zoneadm -z data-node2 boot
    root@global_zone:~# zoneadm -z data-node3 clone -c /root/data-node3-template.xml name-node1
    root@global_zone:~# zoneadm -z data-node3 boot
    
  7. Boot the name-node1 zone:

    root@global_zone:~# zoneadm -z name-node1 boot
    
  8. Check the status of the zones we've created:

    root@global_zone:~# zoneadm list -cv
      ID NAME             STATUS      PATH                         BRAND      IP
       0 global           running     /                            solaris    shared
       6 name-node1       running     /zones/name-node1            solaris    excl
      10 name-node2       running     /zones/name-node2            solaris    excl
      11 resource-manager running     /zones/resource-manager      solaris    excl
      12 data-node1       running     /zones/data-node1            solaris    excl
      13 data-node2       running     /zones/data-node2            solaris    excl
      14 data-node3       running     /zones/data-node3            solaris    excl
    

    We can see that all the zones are running now.
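
You can also confirm that the three DataNode pools are healthy and that the compression and logbias tuning from the zfs create commands took effect (a quick check using the pool and dataset names from the previous steps):

    root@global_zone:~# zpool list data-node1-pool data-node2-pool data-node3-pool
    root@global_zone:~# zfs get logbias,compression data-node1-pool/data-node1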

Verify the SSH Setup

  1. Log in to the name-node1 zone:

    root@global_zone:~# zlogin name-node1
    [Connected to zone 'name-node1' pts/1]
    Oracle Corporation      SunOS 5.11      11.1    September 2012
    root@name-node1:~# su - hdfs
    Oracle Corporation      SunOS 5.11      11.1    September 2012
    
  2. Run the testssh script to log in to the cluster nodes using the ssh command:

    Note: Enter yes at the command prompt for the "Are you sure you want to continue connecting (yes/no)?" question.

    hdfs@name-node1:~$ /usr/local/Scripts/testssh
    The authenticity of host 'name-node1 (192.168.1.1)' can't be established.
    RSA key fingerprint is 07:b6:b7:40:0c:39:cd:60:32:c4:98:07:66:79:63:1c.
    Are you sure you want to continue connecting (yes/no)? yes
    
  3. Switch to user yarn and run the testssh script again:

    root@name-node1:~# su - yarn
    Password: <enter yarn password>
    yarn@name-node1:~$ /usr/local/Scripts/testssh
    
  4. Switch to user mapred and run the testssh script again:

    yarn@name-node1:~$ su - mapred
    Password: <enter mapred password>
    mapred@name-node1:~$ /usr/local/Scripts/testssh
    
  5. Return to the global zone and repeat similar steps for name-node2:

    1. Edit the /etc/hosts file inside name-node2 in order to add the name-node1 entry:

      root@global_zone:~# zlogin name-node2 'echo "192.168.1.1 name-node1" >> /etc/hosts'
      
    2. Log in to the name-node2 zone:

      root@global_zone:~# zlogin name-node2
      [Connected to zone 'name-node2' pts/1]
      Oracle Corporation      SunOS 5.11      11.1    September 2012
      root@name-node2:~# su - hdfs
      Oracle Corporation      SunOS 5.11      11.1    September 2012
      
    3. Run the testssh script in order to log in to the cluster nodes using the ssh command.

      Note: Enter yes at the command prompt for the "Are you sure you want to continue connecting (yes/no)?" question.

      hdfs@name-node2:~$ /usr/local/Scripts/testssh
      The authenticity of host 'name-node1 (192.168.1.1)' can't be established.
      RSA key fingerprint is 07:b6:b7:40:0c:39:cd:60:32:c4:98:07:66:79:63:1c.
      Are you sure you want to continue connecting (yes/no)? yes
      
    4. Switch to user yarn:

      root@name-node2:~# su - yarn
      Password: <enter yarn password>
      
    5. Run the testssh script:

      yarn@name-node2:~$ /usr/local/Scripts/testssh
      
    6. Switch to user mapred:

      yarn@name-node2:~$ su - mapred
      Password: <enter mapred password>
      
    7. Run the testssh script:

      mapred@name-node2:~$ /usr/local/Scripts/testssh
      

Verify Name Resolution

  1. From the global zone, edit the /etc/hosts files inside resource-manager and the DataNodes in order to add the name-node1 entry:

    root@global_zone:~# zlogin resource-manager 'echo "192.168.1.1 name-node1" >> /etc/hosts'
    root@global_zone:~# zlogin data-node1 'echo "192.168.1.1 name-node1" >> /etc/hosts'
    root@global_zone:~# zlogin data-node2 'echo "192.168.1.1 name-node1" >> /etc/hosts'
    root@global_zone:~# zlogin data-node3 'echo "192.168.1.1 name-node1" >> /etc/hosts'
    
  2. Verify name resolution by ensuring that the /etc/hosts files for the global zone and all the Hadoop zones have the host entries shown below:

    # vi /etc/hosts
    ::1             localhost
    127.0.0.1       localhost loghost
    192.168.1.1 name-node1
    192.168.1.2 name-node2
    192.168.1.3 resource-manager
    192.168.1.4 data-node1
    192.168.1.5 data-node2
    192.168.1.6 data-node3
    192.168.1.100 global-zone
    

    Note: If you are using the global zone as an NTP server, you must also add its host name and IP address to /etc/hosts.

  3. Verify the cluster using the verifycluster script:

    root@global_zone:~# /usr/local/Scripts/verifycluster
    

    If the cluster setup is correct, you will get the message "The cluster is verified."

    Note: If the verifycluster script fails with an error message, check that the /etc/hosts file in every zone includes all the zone names shown above, and then rerun the verifycluster script.
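
A quick way to spot-check name resolution from any single zone is getent, which consults the same name service configuration that Hadoop will use. For example, from the global zone (shown for data-node1; repeat for the other zones as needed), the command should print the entry that was just added, similar to the following:

    root@global_zone:~# zlogin data-node1 getent hosts name-node1
    192.168.1.1     name-node1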

Format the Hadoop File System

  1. To format HDFS, run the following commands:

    root@global_zone:~# zlogin -l hdfs name-node1
    hdfs@name-node1:~$ hdfs namenode -format
    
  2. Look for the following message, which indicates HDFS has been set up:

    ...
    INFO common.Storage: Storage directory /var/data/1/dfs/nn has been successfully formatted.
    ...
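
As an additional sanity check, the formatted metadata is now visible on disk: listing the NameNode directory should show a current subdirectory containing a VERSION file and an initial fsimage (exact file names vary):

    hdfs@name-node1:~$ ls /var/data/1/dfs/nn/current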
    

Start the Hadoop Cluster

Table 4 describes the startup scripts.

Table 4. Startup Scripts
User Command Command Description
hdfs hadoop-daemon.sh start namenode Starts the HDFS daemon (NameNode process)
hdfs hadoop-daemon.sh start datanode Starts the DataNode process on all DataNodes
yarn yarn-daemon.sh start resourcemanager Starts YARN on the ResourceManager
yarn yarn-daemon.sh start nodemanager Starts the NodeManager process on all DataNodes
mapred mr-jobhistory-daemon.sh start historyserver Starts the MapReduce History Server

  1. Start HDFS by running the following command:

    hdfs@name-node1:~$ hadoop-daemon.sh start namenode
    starting namenode, logging to /var/log/hadoop/hdfs/hadoop--namenode-name-node1.out
    
  2. Run the jps command to verify that the NameNode process has been started:

    hdfs@name-node1:~$ /usr/jdk/latest/bin/jps | grep NameNode
    4223 NameNode
    

    You should see the NameNode process ID (for example, 4223). If the process did not start, look at the log file /var/log/hadoop/hdfs/hadoop--namenode-name-node1.log to find the reason.

  3. Exit from the name-node1 zone by pressing Ctrl-D.
  4. Start the DataNodes on all the slaves (data-node1, data-node2, and data-node3):

    1. Run the following commands for data-node1:

      root@global_zone:~# zlogin -l hdfs data-node1
      hdfs@data-node1:~$ hadoop-daemon.sh start datanode
      hdfs@data-node1:~$ /usr/jdk/latest/bin/jps | grep DataNode
      19762 DataNode
      
    2. Exit from the data-node1 zone by pressing Ctrl-D.
    3. Run the following commands for data-node2:

      root@global_zone:~# zlogin -l hdfs data-node2
      hdfs@data-node2:~$ hadoop-daemon.sh start datanode
      hdfs@data-node2:~$ /usr/jdk/latest/bin/jps | grep DataNode
      21525 DataNode
      
    4. Exit from the data-node2 zone by pressing Ctrl-D.
    5. Run the following commands for data-node3:

      root@global_zone:~# zlogin -l hdfs data-node3
      hdfs@data-node3:~$ hadoop-daemon.sh start datanode
      hdfs@data-node3:~$ /usr/jdk/latest/bin/jps | grep DataNode
      29699 DataNode
      
    6. Exit from the data-node3 zone by pressing Ctrl-D.
  5. Using the hadoop fs command, create a /tmp directory in HDFS and set its permissions to 1777 (drwxrwxrwt):

    root@global_zone:~# zlogin -l hdfs name-node1
    hdfs@name-node1:~$ hadoop fs -mkdir /tmp
    hdfs@name-node1:~$ hadoop fs -chmod -R 1777 /tmp
    

    Note: You might get the warning message NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable. Hadoop is able to use native platform libraries that accelerate the Hadoop suite. These native libraries are optional; the port of the Hadoop 2.x native libraries to Oracle Solaris is a work in progress.

  6. Create a history directory and set permissions and ownership:

    hdfs@name-node1:~$ hadoop fs -mkdir /user
    hdfs@name-node1:~$ hadoop fs -mkdir /user/history
    hdfs@name-node1:~$ hadoop fs -chmod -R 1777 /user/history
    hdfs@name-node1:~$ hadoop fs -chown yarn /user/history
    
  7. Create the log directories:

    hdfs@name-node1:~$ hadoop fs -mkdir /var
    hdfs@name-node1:~$ hadoop fs -mkdir /var/log
    hdfs@name-node1:~$ hadoop fs -mkdir /var/log/hadoop-yarn
    hdfs@name-node1:~$ hadoop fs -chown yarn:mapred /var/log/hadoop-yarn
    
  8. Create a directory for user bob and set ownership:

    hdfs@name-node1:~$ hadoop fs -mkdir /user/bob
    hdfs@name-node1:~$ hadoop fs -chown bob /user/bob
    
  9. Verify the HDFS file structure:

    hdfs@name-node1:~$ hadoop fs -ls -R /
    drwxrwxrwt   - hdfs supergroup          0 2014-02-26 10:43 /tmp
    drwxr-xr-x   - hdfs supergroup          0 2014-02-26 10:58 /user
    drwxr-xr-x   - bob  supergroup          0 2014-02-26 10:58 /user/bob
    drwxrwxrwt   - yarn supergroup          0 2014-02-26 10:50 /user/history
    drwxr-xr-x   - hdfs supergroup          0 2014-02-26 10:53 /var
    drwxr-xr-x   - hdfs supergroup          0 2014-02-26 10:53 /var/log
    drwxr-xr-x   - yarn mapred              0 2014-02-26 10:53 /var/log/hadoop-yarn
    
  10. Exit from the name-node1 zone by pressing Ctrl-D.
  11. Start the YARN resource-manager service using the following commands:

    root@global_zone:~# zlogin -l yarn resource-manager
    yarn@resource-manager:~$ yarn-daemon.sh start resourcemanager 
    yarn@resource-manager:~$ /usr/jdk/latest/bin/jps | grep ResourceManager
    29776 ResourceManager
    
  12. Start the NodeManager process on all DataNodes and verify the status:

    root@global_zone:~# zlogin -l yarn data-node1 yarn-daemon.sh start nodemanager
    root@global_zone:~# zlogin -l yarn data-node1 /usr/jdk/latest/bin/jps | grep NodeManager 
    29920 NodeManager
    root@global_zone:~# zlogin -l yarn data-node2 yarn-daemon.sh start nodemanager
    root@global_zone:~# zlogin -l yarn data-node2 /usr/jdk/latest/bin/jps | grep NodeManager 
    29930 NodeManager
    root@global_zone:~# zlogin -l yarn data-node3 yarn-daemon.sh start nodemanager
    root@global_zone:~# zlogin -l yarn data-node3 /usr/jdk/latest/bin/jps | grep NodeManager 
    29982 NodeManager
    
  13. Start the MapReduce History Server and verify its status:

    root@global_zone:~# zlogin -l mapred resource-manager
     mapred@resource-manager:~$ mr-jobhistory-daemon.sh start historyserver
     mapred@resource-manager:~$ /usr/jdk/latest/bin/jps | grep JobHistoryServer
    654 JobHistoryServer
    
  14. Log in to name-node1:

    root@global_zone:~# zlogin -l hdfs name-node1
    
  15. Use the following command to show basic HDFS statistics for the cluster:

    hdfs@name-node1:~$ hdfs dfsadmin -report 
    
    13/11/26 05:16:51 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Configured Capacity: 1077762507264 (1003.74 GB)
    Present Capacity: 1075847407736 (1001.96 GB)
    DFS Remaining: 1075845337088 (1001.96 GB)
    DFS Used: 2070648 (1.97 MB)
    DFS Used%: 0.00%
    Under replicated blocks: 4
    Blocks with corrupt replicas: 0
    Missing blocks: 0
    
    -------------------------------------------------
    Datanodes available: 3 (3 total, 0 dead)
    
  16. Use the following command to show the cluster topology:

    hdfs@name-node1:~$ hdfs dfsadmin -printTopology 
    
    13/11/26 05:19:03 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Rack: /default-rack
       10.153.111.222:50010 (data-node1)
       10.153.111.223:50010 (data-node2)
       10.153.111.224:50010 (data-node3)
    
  17. Run a simple MapReduce job:

    root@global_zone:~# zlogin -l bob name-node1 hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 10 20
    

    where:

    • zlogin -l bob name-node1 specifies that the command be run as user bob on the name-node1 zone.
    • hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi specifies the Hadoop examples .jar file and the pi example program to run.
    • 10 specifies the number of maps.
    • 20 specifies the number of samples.
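
    When the job completes, the pi example prints its estimate of Pi to the terminal. As an optional extra check, you can list the applications that YARN knows about, or browse the finished job in the History Server web UI; the sketch below assumes the default History Server web port (19888, at http://resource-manager:19888):

    root@global_zone:~# zlogin -l yarn resource-manager yarn application -list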

About Hadoop High Availability

In earlier Hadoop releases, the NameNode was a single point of failure (SPOF) in a Hadoop cluster. Each cluster had a single NameNode, and the secondary NameNode could not provide failover capability. Therefore, if the NameNode became unavailable, the cluster as a whole would be unavailable until the NameNode was either restarted or brought up on a separate node.

The Hadoop HA feature allows running two NameNodes in the same cluster in an active/passive configuration. These are referred to as the Active NameNode and the Standby NameNode. Unlike a traditional secondary NameNode, a Standby NameNode has hot standby capability, allowing fast failover to a new NameNode in case of an unplanned or planned outage.

Note: The maximum number of NameNodes in an HA configuration is two.

The Active NameNode is responsible for all client operations in the cluster, while the Standby NameNode simply acts as a slave, maintaining enough state to provide fast failover if necessary.

In order for the Standby NameNode to keep its state synchronized with the Active NameNode in an HA implementation, both nodes communicate with separate daemons called JournalNodes.

The JournalNodes provide the capability to store HDFS edits at several locations, and they use a distributed protocol to ensure that these locations stay correctly synchronized.

When the Active NameNode performs a namespace modification, it logs a record of the modification to a majority of the JournalNodes. The Standby NameNode constantly monitors the edit log for changes and, once it observes the edits, applies them to its own namespace.

Note: In an HA cluster, since the Standby NameNode also performs checkpoints of the namespace state, it is not necessary to run a secondary NameNode.

In the event of a failover, the Standby NameNode ensures it has read all of the edits from the JournalNodes before promoting itself to the Active state. This ensures that the namespace state is fully synchronized before a failover occurs.

To provide fast failover, the Standby NameNode needs to have up-to-date information regarding the location of HDFS blocks in the cluster. Therefore, all DataNodes are configured with the location of both NameNodes, and they send block location information and heartbeats to both NameNodes.

To prevent a "split-brain" scenario, in which both NameNodes believe they are active, the JournalNodes allow only a single NameNode to be a writer at any given time.

During a failover, the NameNode that is to become active simply takes over the role of writing to the JournalNodes, which effectively prevents the other NameNode from continuing in the Active state and allows the new Active NameNode to safely proceed with failover.
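
Once HA is configured (as described in the following sections), a quick way to see which NameNode currently holds the Active role, in addition to the hdfs haadmin command used later in this article, is to query a NameNode's built-in /jmx servlet over HTTP. This is only a sketch; it assumes the curl utility is installed and that the FSNamesystem MBean exposes a tag.HAState attribute, as it does in the Hadoop 2.x releases we have seen:

root@global_zone:~# curl -s "http://name-node1:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem" | grep HAState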

For better manageability, each NameNode has a logical name, as shown in Table 5.

Table 5. NameNodes' Logical Names
Logical Name    Zone Name in Our Example Cluster
NN1             name-node1
NN2             name-node2

Note: NN1 is the original NameNode (name-node1) in our non-HA cluster.

Requirements for Quorum-Based Storage

In order to deploy an HA cluster using quorum-based storage, the following requirements must be met:

  • The zones on which you run the Active and Standby NameNodes should have hardware that is equivalent to each other and equivalent to what would be used in a non-HA cluster. In a production environment, you need two separate machines for this task.
  • The JournalNode daemon is relatively lightweight, so JournalNodes can reasonably be collocated on nodes with other Hadoop daemons, for example, NameNodes and the YARN ResourceManager.
  • There must be at least three JournalNode daemons, since edit log modifications must be written to a majority of the JournalNodes. With three JournalNodes, the system can tolerate the failure of a single node (in general, N JournalNodes tolerate (N-1)/2 failures).

Figure 4 shows the architecture:

Diagram of the HA architecture

Figure 4. Diagram of the HA architecture

Configure Manual Failover

Note: This section describes how to configure manual failover. In this mode, the system will not automatically trigger a failover from the Active NameNode to the Standby NameNode, even if the Active NameNode has failed. The "Configure Automatic Failover" section describes how to configure and deploy automatic failover.

  1. Before starting the HA configuration, we need to stop the Hadoop cluster:

    root@global_zone:~# /usr/local/Scripts/stopcluster
    
  2. Verify that all the Hadoop services are down by using the following command:

    root@global_zone:~# ps -ef | grep java 
    

    If any services are still up, stop them using the pkill java command.
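
    If you prefer to stop any leftover processes zone by zone from the global zone, the following one-liner is only a convenience sketch; it assumes the zone names used in this article and relies on the Oracle Solaris pkill -z option to restrict matching to one zone at a time:

    root@global_zone:~# for z in name-node1 name-node2 resource-manager data-node1 data-node2 data-node3; do pkill -z $z java; done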

  3. Add the following lines to hdfs-site.xml:

    Notes:

    dfs.nameservices: The logical name for this new name service
    dfs.ha.namenodes.mycluster: A unique identifier for each NameNode in the name service
    dfs.namenode.rpc-address.mycluster.nn1: The fully qualified RPC address for the Active NameNode to listen on
    dfs.namenode.rpc-address.mycluster.nn2: The fully qualified RPC address for the Standby NameNode to listen on
    dfs.namenode.http-address.mycluster.nn1: The fully qualified HTTP address for the Active NameNode to listen on
    dfs.namenode.http-address.mycluster.nn2: The fully qualified HTTP address for the Standby NameNode to listen on
    dfs.namenode.shared.edits.dir: The URI that identifies the group of JournalNodes where the NameNodes will write/read edits
    dfs.client.failover.proxy.provider.mycluster: The Java class that HDFS clients use to contact the Active NameNode
    dfs.ha.fencing.methods: A list of scripts or Java classes that will be used to fence the Active NameNode during a failover
    dfs.ha.fencing.ssh.private-key-files: SSH private key files for password-less authentication between the NameNodes

    root@global_zone:~# vi /usr/local/hadoop/etc/hadoop/hdfs-site.xml
    
    <property>
      <name>dfs.nameservices</name>
      <value>mycluster</value>
    </property>
    <property>
      <name>dfs.ha.namenodes.mycluster</name>
      <value>nn1,nn2</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.mycluster.nn1</name>
      <value>name-node1:8020</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.mycluster.nn2</name>
      <value>name-node2:8020</value>
    </property>
    <property>
      <name>dfs.namenode.http-address.mycluster.nn1</name>
      <value>name-node1:50070</value>
    </property>
    <property>
      <name>dfs.namenode.http-address.mycluster.nn2</name>
      <value>name-node2:50070</value>
    </property>
    <property>
      <name>dfs.namenode.shared.edits.dir</name>
      <value>qjournal://name-node1:8485;name-node2:8485;resource-manager:8485/mycluster</value>
    </property>
    <property>
      <name>dfs.client.failover.proxy.provider.mycluster</name>
      <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <property>
      <name>dfs.ha.fencing.methods</name>
      <value>sshfence</value>
    </property>
    <property>
      <name>dfs.ha.fencing.ssh.private-key-files</name>
      <value>/export/home/hdfs/.ssh/id_dsa</value>
    </property>
    
  4. Configure the fs.defaultFS and dfs.journalnode.edits.dir properties in the core-site.xml file:

    Notes:

    fs.defaultFS: The default path prefix used by the Hadoop HDFS clients
    dfs.journalnode.edits.dir: The path where the JournalNode daemon will store its local state

    root@global_zone:~# vi /usr/local/hadoop/etc/hadoop/core-site.xml
    
    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://mycluster</value>
    </property>
    <property>
      <name>dfs.journalnode.edits.dir</name>
      <value>/var/data/1/dfs/jn</value>
    </property>
    
  5. Create the JournalNode directories:

    root@global_zone:~# zlogin name-node1 mkdir -p /var/data/1/dfs/jn
    root@global_zone:~# zlogin name-node2 mkdir -p /var/data/1/dfs/jn
    root@global_zone:~# zlogin resource-manager mkdir -p /var/data/1/dfs/jn
    
  6. Change the directories' ownership:

    root@global_zone:~# zlogin name-node1 chown -R hdfs:hadoop /var/data/1/dfs/jn
    root@global_zone:~# zlogin name-node2 chown -R hdfs:hadoop /var/data/1/dfs/jn
    root@global_zone:~# zlogin resource-manager chown -R hdfs:hadoop /var/data/1/dfs/jn
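
    Steps 5 and 6 can also be combined into a single loop run from the global zone; this is just a convenience sketch that assumes the same zone names and directory:

    root@global_zone:~# for z in name-node1 name-node2 resource-manager; do zlogin $z mkdir -p /var/data/1/dfs/jn; zlogin $z chown -R hdfs:hadoop /var/data/1/dfs/jn; done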
    
  7. Start the JournalNodes:

    1. In the NN1 (name-node1) zone, execute the following command to start the JournalNode daemon:

      root@global_zone:~# zlogin -l hdfs name-node1
      hdfs@name-node1:$ hadoop-daemon.sh start journalnode
      
    2. Verify that the JournalNode process has been started by using the jps command:

      hdfs@name-node1:~$ /usr/jdk/latest/bin/jps | grep JournalNode
      10475 JournalNode
      
    3. In the NN2 (name-node2) zone, execute the following command to start the JournalNode daemon:

      root@global_zone:~# zlogin -l hdfs name-node2
      hdfs@name-node2:$ hadoop-daemon.sh start journalnode
      
    4. Verify that the JournalNode process has been started by using the jps command:

      hdfs@name-node2:~$ /usr/jdk/latest/bin/jps | grep JournalNode
      10521 JournalNode
      
    5. In the resource-manager zone, execute the following command to start the JournalNode daemon:

      root@global_zone:~# zlogin -l hdfs resource-manager hadoop-daemon.sh start journalnode
      
    6. Verify that the JournalNode process has been started:

      root@global_zone:~# zlogin -l hdfs resource-manager /usr/jdk/latest/bin/jps | grep JournalNode
      10569 JournalNode
      
  8. Execute the following commands on the NN1 node:

    root@global_zone:~# zlogin -l hdfs name-node1
    hdfs@name-node1:~$ hdfs namenode -initializeSharedEdits
    

    The second command performs the following tasks:

    • Formats all the JournalNodes
    • Copies all the edit data after the most recent checkpoint from the edit directory of NN1 to the edit directories of the JournalNodes
  9. Start the NameNode process on NN1:

    hdfs@name-node1:~$ hadoop-daemon.sh start namenode
    
  10. Ensure that NN1 is running correctly:

    hdfs@name-node1:~$ /usr/jdk/latest/bin/jps | grep NameNode
    10658 NameNode
    
  11. Initialize and format NN2 and copy the latest checkpoint (FSImage) from NN1 to NN2 by executing the following commands:

    root@global_zone:~# zlogin -l hdfs name-node2
    hdfs@name-node2:~$ hdfs namenode -bootstrapStandby
    INFO util.ExitUtil: Exiting with status 0
    
  12. Start NN2 by executing the following command on the NN2 node:

    hdfs@name-node2:~$ hadoop-daemon.sh start namenode
    
  13. Ensure that NN2 is running correctly:

    hdfs@name-node2:~$ /usr/jdk/latest/bin/jps | grep NameNode
    10658 NameNode
    
  14. Initially, both NN1 and NN2 are in Standby state. Use the following command to query the state of NN1:

    hdfs@name-node1:~$ hdfs haadmin -getServiceState nn1
    standby
    
  15. Transition NN1 to Active state:

    hdfs@name-node1:~$ hdfs haadmin -failover --forceactive nn2 nn1
    Failover from nn2 to nn1 successful
    
  16. Verify the status of NN1:

    hdfs@name-node1:~$ hdfs haadmin -getServiceState nn1
    active
    

    You can see that it is in Active state now.
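
    To check both NameNodes in one pass, a small loop over the logical names also works (a sketch only):

    hdfs@name-node1:~$ for n in nn1 nn2; do echo "$n: $(hdfs haadmin -getServiceState $n)"; done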

  17. Start the rest of the cluster:

    root@global_zone:~# zlogin -l hdfs data-node1 'hadoop-daemon.sh start datanode'
    root@global_zone:~# zlogin -l hdfs data-node2 'hadoop-daemon.sh start datanode'
    root@global_zone:~# zlogin -l hdfs data-node3 'hadoop-daemon.sh start datanode'
    root@global_zone:~# zlogin -l yarn resource-manager 'yarn-daemon.sh start resourcemanager'
    root@global_zone:~# zlogin -l yarn data-node1 'yarn-daemon.sh start nodemanager'
    root@global_zone:~# zlogin -l yarn data-node2 'yarn-daemon.sh start nodemanager'
    root@global_zone:~# zlogin -l yarn data-node3 'yarn-daemon.sh start nodemanager'
    

About Apache ZooKeeper and Automatic Failover

The previous section described how to configure manual failover. In that mode, the system does not automatically trigger a failover from the Active NameNode to the Standby NameNode, even if the Active NameNode has failed. This section and the "Configure Automatic Failover" section describe how to configure and use automatic failover.

Apache ZooKeeper is a highly available, distributed synchronization service. ZooKeeper nodes store their data in a hierarchical namespace. In addition, ZooKeeper monitors its clients for failure and notifies them when data changes occur.

Automatic failover adds two new components to an HDFS deployment: a ZooKeeper quorum and the ZKFailoverController process (abbreviated as ZKFC).

The implementation of automatic HDFS failover relies on ZooKeeper for the following activities:

  • NameNode failure detection. Each NameNode in the cluster maintains a persistent session in ZooKeeper. If a NameNode goes down, its ZooKeeper session expires and the other NameNode is notified that a failover should be triggered.
  • Active NameNode election. ZooKeeper provides a mechanism to elect the Active NameNode. If the current Active NameNode crashes, another node can take a special exclusive lock in ZooKeeper indicating that it should become the next Active NameNode.

The ZKFailoverController (ZKFC) is a ZooKeeper client that monitors and manages the state of the NameNodes. Each system that runs a NameNode also runs ZKFC. ZKFC is responsible for the following activities:

  • Health monitoring. ZKFC pings its local NameNode on a periodic basis using a health-check command. If the NameNode responds with a healthy status, ZKFC considers the node healthy. If the node doesn't respond in a timely manner because it has crashed or is frozen, ZKFC marks it as unhealthy.
  • ZooKeeper session management. When the local NameNode is healthy, ZKFC holds a session open in ZooKeeper. If the local NameNode is active, it also holds a special lock; if the session expires, the lock will be automatically deleted.
  • ZooKeeper-based election. If the local NameNode is healthy and ZKFC sees that no other node currently holds the lock, ZKFC will try to acquire the lock. If it succeeds, then it has "won the election," and it is responsible for running a failover to make its local NameNode active. The failover process is similar to the manual failover described earlier: first, the previously active NameNode is fenced, if necessary, and then the local NameNode transitions to the active state.
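
Once ZooKeeper is installed and automatic failover is configured (see the following sections), you can inspect the election state that ZKFC maintains by connecting to the ensemble with the ZooKeeper command-line client. This is only a sketch; the znode layout shown (a per-nameservice directory under /hadoop-ha containing ActiveBreadCrumb and ActiveStandbyElectorLock) is what we have typically seen ZKFC create:

root@global_zone:~# zlogin -l hdfs name-node1 /usr/local/zookeeper/bin/zkCli.sh -server name-node1:2181 ls /hadoop-ha/mycluster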

Requirements for ZooKeeper

Figure 5 shows a typical ZooKeeper deployment.

Diagram of a typical ZooKeeper deployment

Figure 5. Diagram of a typical ZooKeeper deployment

For the election process, in a typical deployment the ZooKeeper daemons are configured to run on an odd number of nodes (three or five), which is known as a ZooKeeper ensemble.

ZooKeeper has light resource requirements; it is acceptable to deploy ZooKeeper on the NameNodes and the YARN ResourceManager.

Note: In production environments, best practice is to configure the ZooKeeper nodes to store their data on separate disk drives from the HDFS metadata for best performance and isolation.

Configure Automatic Failover

  1. Before you begin configuring automatic failover, you should shut down your cluster. It is not currently possible to transition from a manual failover setup to an automatic failover setup while the cluster is running.

    1. Shut down the cluster:

      root@global_zone:~# /usr/local/Scripts/stopcluster
      
    2. Verify that all the Hadoop services are down by using the following command:

      root@global_zone:~# ps -ef | grep java 
      
    3. If any services are still up, stop them using the pkill java command.
  2. Set up a ZooKeeper ensemble:

    1. Download a recent, stable ZooKeeper release from one of the Apache download mirrors and copy it to /usr/local:

      root@global_zone:~# cp /tmp/zookeeper-3.4.6.tar.gz /usr/local
      
    2. Unpack the tarball:

      root@global_zone:~# cd /usr/local
      root@global_zone:~# tar -xzvf /usr/local/zookeeper-3.4.6.tar.gz
      
    3. Create a symlink for the ZooKeeper binaries:

      root@global_zone:~# ln -s /usr/local/zookeeper-3.4.6 /usr/local/zookeeper
      
    4. Edit the zoo.cfg file so it looks like the following:

      Notes:

      tickTime: The number of milliseconds for each tick
      initLimit: The number of ticks that the initial synchronization phase can take
      syncLimit: The number of ticks that can pass between sending a request and getting an acknowledgment
      dataDir: The directory where the snapshot is stored
      clientPort: The port at which the clients will connect
      server.1, server.2, server.3: The ZooKeeper ensemble host names

      root@global_zone:~# vi /usr/local/zookeeper/conf/zoo.cfg
      
      tickTime=2000
      initLimit=10
      syncLimit=5
      dataDir=/var/zookeeper/data
      clientPort=2181
      
      server.1=name-node1:2888:3888
      server.2=name-node2:2888:3888
      server.3=resource-manager:2888:3888
      
  3. For simplicity, the hdfs user will run the ZooKeeper server and client (ZKFC) processes.

    Note: You can create a dedicated user for the ZooKeeper process.

    1. In the name-node1 zone, edit the hdfs user's $HOME/.profile file and append to the end of the file the following lines:

      root@global_zone:~# zlogin -l hdfs name-node1
      hdfs@name-node1:~$ vi .profile
      
      export ZOOKEEPER_USER=hdfs
      export ZOO_LOG_DIR=/var/log/zookeeper
      export ZOO_PID_DIR=/var/zookeeper/run
      export ZOOPIDFILE=$ZOO_PID_DIR/zookeeper_server.pid
      export ZOO_DATADIR=/var/zookeeper/data
      export ZOOCFG=/usr/local/zookeeper/conf
      export HADOOP_GROUP=hadoop
      
    2. In the name-node2 zone, edit the hdfs user's $HOME/.profile file and append to the end of the file the following lines:

      root@global_zone:~# zlogin -l hdfs name-node2
      hdfs@name-node2:~$ vi .profile
      
      export ZOOKEEPER_USER=hdfs
      export ZOO_LOG_DIR=/var/log/zookeeper
      export ZOO_PID_DIR=/var/zookeeper/run
      export ZOOPIDFILE=$ZOO_PID_DIR/zookeeper_server.pid
      export ZOO_DATADIR=/var/zookeeper/data
      export ZOOCFG=/usr/local/zookeeper/conf
      export HADOOP_GROUP=hadoop
      
    3. In the resource-manager zone, edit the hdfs user's $HOME/.profile file and append to the end of the file the following lines:

      root@global_zone:~# zlogin -l hdfs resource-manager
      hdfs@resource-manager:~$ vi .profile
      
      export ZOOKEEPER_USER=hdfs
      export ZOO_LOG_DIR=/var/log/zookeeper
      export ZOO_PID_DIR=/var/zookeeper/run
      export ZOOPIDFILE=$ZOO_PID_DIR/zookeeper_server.pid
      export ZOO_DATADIR=/var/zookeeper/data
      export ZOOCFG=/usr/local/zookeeper/conf
      export HADOOP_GROUP=hadoop
      
  4. Create the /var/zookeeper/run directories. ZooKeeper will use them to store daemon information, such as process IDs.

    root@global_zone:~# zlogin name-node1 mkdir -p /var/zookeeper/run
    root@global_zone:~# zlogin name-node2 mkdir -p /var/zookeeper/run
    root@global_zone:~# zlogin resource-manager mkdir -p /var/zookeeper/run
    
  5. Change the directories' ownership:

    root@global_zone:~# zlogin name-node1 chown -R hdfs:hadoop /var/zookeeper/run
    root@global_zone:~# zlogin name-node2 chown -R hdfs:hadoop /var/zookeeper/run
    root@global_zone:~# zlogin resource-manager chown -R hdfs:hadoop /var/zookeeper/run
    
  6. Create the /var/zookeeper/data directories. ZooKeeper will use them to store its data.

    root@global_zone:~# zlogin name-node1 mkdir -p /var/zookeeper/data
    root@global_zone:~# zlogin name-node2 mkdir -p /var/zookeeper/data
    root@global_zone:~# zlogin resource-manager mkdir -p /var/zookeeper/data
    
  7. Change the directories' ownership:

    root@global_zone:~# zlogin name-node1 chown -R hdfs:hadoop /var/zookeeper/data
    root@global_zone:~# zlogin name-node2 chown -R hdfs:hadoop /var/zookeeper/data
    root@global_zone:~# zlogin resource-manager chown -R hdfs:hadoop /var/zookeeper/data
    
  8. Create the /var/log/zookeeper directories. ZooKeeper will use them to store its logs.

    root@global_zone:~# zlogin name-node1 mkdir -p /var/log/zookeeper
    root@global_zone:~# zlogin name-node2 mkdir -p /var/log/zookeeper
    root@global_zone:~# zlogin resource-manager mkdir -p /var/log/zookeeper
    
  9. Change the directories' ownership:

    root@global_zone:~# zlogin name-node1 chown -R hdfs:hadoop /var/log/zookeeper
    root@global_zone:~# zlogin name-node2 chown -R hdfs:hadoop /var/log/zookeeper
    root@global_zone:~# zlogin resource-manager chown -R hdfs:hadoop /var/log/zookeeper
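
    Steps 4 through 9 can also be collapsed into one nested loop run from the global zone; again, this is only a convenience sketch that assumes the same zone names and directories:

    root@global_zone:~# for z in name-node1 name-node2 resource-manager; do for d in /var/zookeeper/run /var/zookeeper/data /var/log/zookeeper; do zlogin $z mkdir -p $d; zlogin $z chown -R hdfs:hadoop $d; done; done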
    
  10. Ensure each ZooKeeper node has a unique value in its myid file:

    1. On the first zone, enter the value 1:

      root@global_zone:~# zlogin -l hdfs name-node1 'echo "1" > /var/zookeeper/data/myid'
      
    2. On the second zone, enter the value 2:

      root@global_zone:~# zlogin -l hdfs name-node2 'echo "2" > /var/zookeeper/data/myid'
      
    3. On the third zone, enter the value 3:

      root@global_zone:~# zlogin -l hdfs resource-manager 'echo "3" > /var/zookeeper/data/myid'
      
  11. Start the ZooKeeper server on the name-node1, name-node2, and resource-manager zones:

    1. Start the server on name-node1:

      root@global_zone:~# zlogin -l hdfs name-node1  
      hdfs@name-node1:~$ /usr/local/zookeeper/bin/zkServer.sh start /usr/local/zookeeper/conf/zoo.cfg
      
      JMX enabled by default
      Using config:
      Starting zookeeper ... STARTED
      
    2. Verify that the service is up and running by using the following command:

      hdfs@name-node1:~$ ps -ef | grep zoo | grep -v grep
          hdfs 28530 26390   0 04:28:03 pts/2       0:01 /usr/java/bin/java 
      -Dzookeeper.log.dir=/var/log/zookeeper -Dzookeeper.root.logg
      

      You can see the ZooKeeper server's process ID (28530).

    3. Start the server on name-node2:

      root@global_zone:~# zlogin -l hdfs  name-node2
      hdfs@name-node2:~$ /usr/local/zookeeper/bin/zkServer.sh start /usr/local/zookeeper/conf/zoo.cfg
      
    4. Start the server on resource-manager:

      root@global_zone:~# zlogin -l hdfs  resource-manager
      hdfs@resource-manager:~$ /usr/local/zookeeper/bin/zkServer.sh start /usr/local/zookeeper/conf/zoo.cfg
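
      After all three servers are up, you can also ask each one for its role; zkServer.sh status is a quick sanity check (one member of the ensemble should report Mode: leader and the other two Mode: follower):

      hdfs@resource-manager:~$ /usr/local/zookeeper/bin/zkServer.sh status /usr/local/zookeeper/conf/zoo.cfg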
      
  12. Edit the core-site.xml file and append the following lines:

    Note: ha.zookeeper.quorum lists the host-port pairs running the ZooKeeper service.

    root@global_zone:~# vi /usr/local/hadoop/etc/hadoop/core-site.xml
    
    <property>
       <name>ha.zookeeper.quorum</name>
       <value>name-node1:2181,name-node2:2181,resource-manager:2181</value>
    </property>
    
  13. Edit the hdfs-site.xml file and append the following lines:

    Note: dfs.ha.automatic-failover.enabled specifies that the cluster should be set up for automatic failover.

    root@global_zone:~# vi /usr/local/hadoop/etc/hadoop/hdfs-site.xml
    
    <property>
      <name>dfs.ha.automatic-failover.enabled</name>
      <value>true</value>
    </property>
    
  14. Initialize the HA state in ZooKeeper by executing the following commands on the NameNode zones. (The confirmation prompt shown below appears when the command is run on the second NameNode, because the first run already created the /hadoop-ha/mycluster znode; answer Y to proceed.)

    root@global_zone:~# zlogin -l hdfs name-node1  
    hdfs@name-node1:~$ hdfs zkfc -formatZK
    root@global_zone:~# zlogin -l hdfs name-node2  
    hdfs@name-node2:~$ hdfs zkfc -formatZK
    
    The configured parent znode /hadoop-ha/mycluster already exists.
    Are you sure you want to clear all failover information from
    ZooKeeper?
    WARNING: Before proceeding, ensure that all HDFS services and
    failover controllers are stopped!
    ===============================================
    Proceed formatting /hadoop-ha/mycluster? (Y or N) Y
    
  15. Start the JournalNodes, NameNodes, and DataNodes using the following commands:

    root@global_zone:~# zlogin -l hdfs name-node1 hadoop-daemon.sh start journalnode
    root@global_zone:~# zlogin -l hdfs name-node2 hadoop-daemon.sh start journalnode
    root@global_zone:~# zlogin -l hdfs resource-manager hadoop-daemon.sh start journalnode
    root@global_zone:~# zlogin -l hdfs name-node1 'hadoop-daemon.sh start namenode'
    root@global_zone:~# zlogin -l hdfs name-node2 'hadoop-daemon.sh start namenode'
    root@global_zone:~# zlogin -l hdfs data-node1 'hadoop-daemon.sh start datanode'
    root@global_zone:~# zlogin -l hdfs data-node2 'hadoop-daemon.sh start datanode'
    root@global_zone:~# zlogin -l hdfs data-node3 'hadoop-daemon.sh start datanode'
    root@global_zone:~# zlogin -l yarn resource-manager 'yarn-daemon.sh start resourcemanager'
    root@global_zone:~# zlogin -l yarn data-node1 'yarn-daemon.sh start nodemanager'
    root@global_zone:~# zlogin -l yarn data-node2 'yarn-daemon.sh start nodemanager'
    root@global_zone:~# zlogin -l yarn data-node3 'yarn-daemon.sh start nodemanager'
    root@global_zone:~# zlogin -l mapred  resource-manager 'mr-jobhistory-daemon.sh start historyserver'
    
  16. Start the ZKFC daemon on each of the NameNode host nodes using the following commands:

    root@global_zone:~# zlogin -l hdfs name-node1 hadoop-daemon.sh start zkfc
    root@global_zone:~# zlogin -l hdfs name-node2 hadoop-daemon.sh start zkfc
    

    The sequence of starting ZKFC determines which NameNode will become active. For example, if ZKFC is started on NN1 first, NN1 will become active.
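
    You can verify that the ZKFC daemons are running with jps; the process typically appears as DFSZKFailoverController:

    root@global_zone:~# zlogin -l hdfs name-node1 /usr/jdk/latest/bin/jps | grep DFSZKFailoverController
    root@global_zone:~# zlogin -l hdfs name-node2 /usr/jdk/latest/bin/jps | grep DFSZKFailoverController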

  17. Verify the Hadoop HA automatic failover:

    1. Locate the Active NameNode:

      hdfs@name-node2:~$ hdfs haadmin -getServiceState nn1
      active
      
    2. Cause a failure on the Active NameNode by killing the NameNode daemon on NN1. To do this, first get the NameNode process ID and then use the kill command:

      hdfs@name-node1:~$ ps -ef | grep namenode | grep -v grep
          hdfs 16571 26390   0 03:58:17 ?           0:33 /usr/java/bin/java -Dproc_namenode -Xmx1000m 
      -Djava.net.preferIPv4Stack=true -D
      hdfs@name-node1:~$ kill -9 $PID_of_Active_NameNode
      
    3. Verify that NN2 is the Active NameNode now:

      hdfs@name-node2:~$ hdfs haadmin -getServiceState nn2
      active
      

      You can see that the automatic failover is working, because NN2 is the Active NameNode now.

  18. Start the NameNode process again on NN1:

    hdfs@name-node1:~$ hadoop-daemon.sh start namenode
    
  19. Verify its status:

    hdfs@name-node1:~$ hdfs haadmin -getServiceState nn1
    standby
    

Create a "Cloud in a Box" Using Unified Archive

New in Oracle Solaris 11.2 is the Unified Archive feature. A Unified Archive contains a point-in-time archive of a host system, optionally including individual archives of selected installed zones. Users can select installed zones for inclusion or exclusion, and can also explicitly exclude ZFS datasets by name. Once created, a Unified Archive can be deployed as a means of system recovery or system cloning.

The Oracle Solaris Automated Installer is able to wholly or partially deploy a Unified Archive as an Automated Installer deployment. The Unified Archive software also provides the ability to create system installation media from a Unified Archive.

You can create an archive that includes the global zone and the non-global zones, in addition to Oracle Solaris Kernel Zones.

Let's create an archive of the Hadoop cluster, including all the zones, by using the archiveadm command. Using this tool, we can build a Hadoop cluster in a box, as shown in Figure 6. Later, we can use this archive to rapidly set up a new cluster, since the archive contains the entire environment—the global zone, the non-global zones, and the Hadoop software and configuration files.

Deploying the Unified Archive

Figure 6. Deploying the Unified Archive

  1. Run the following command to create the archive:

    root@global_zone:~# archiveadm create /root/hadoop-archive.uar
    Initializing Unified Archive creation resources...
    Unified Archive initialized: /root/hadoop-archive.uar
    

    Note: This step can take up to one hour to complete, based on the number of zones you back up and how much data each zone has.

  2. Verify the creation of the archive:

    root@global_zone:~# ls -lh /root/hadoop-archive.uar
    -rw-r--r--   1 root     root       10.0G Mar  2 02:18 /root/hadoop-archive.uar
    
  3. Use the archiveadm utility's info subcommand to retrieve information about the created Unified Archive. Verbose information can be requested using the -v option.

    root@global_zone:~# archiveadm info -v /root/hadoop-archive.uar
    
  4. (Optional) Now let's create an archive excluding the name-node1, name-node2, and resource-manager zones. This is useful if you want to create a Hadoop archive that will be used only for DataNodes.

    root@global_zone:~# archiveadm create -Z name-node1,name-node2,resource-manager /root/hadoop-archive.uar
    

Deploy a Bare-Metal System from a Unified Archive

You can use the Automated Installer server to deploy a bare-metal system from a Unified Archive. A new manifest example for deployment from a Unified Archive can be found in /usr/share/auto_install/manifest/default_archive.xml.

A software node of type "ARCHIVE" can be placed in a manifest to point an Automated Installer deployment at a Unified Archive:

root@global_zone:~# cat /usr/share/auto_install/manifest/default_archive.xml
...
     <software type="ARCHIVE">
     <source>
       <!-- Specify the location of the archive via file path or HTTP/HTTPS URL. -->
       <!-- CHANGE THIS LINE TO THE LOCATION OF YOUR ARCHIVE -->
       <file uri="/net/server.domain/system_images/hadoop-archive.uar"/>
     </source>
     <software_data action="install">
       <!-- Specify the name of the system from within the archive by its zone name. -->
       <!-- CHANGE THIS LINE TO THE NAME OF THE ZONE TO DEPLOY AS THE HOST GLOBAL ZONE -->
       <name>global</name>
     </software_data>
   </software>
...
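
After editing a copy of the manifest, you would associate it with your Automated Installer install service. The following is only a sketch; the service name (s11_2-ai in this example) and the manifest path are placeholders for your own environment:

root@global_zone:~# cp /usr/share/auto_install/manifest/default_archive.xml /var/tmp/hadoop_archive.xml
root@global_zone:~# installadm create-manifest -n s11_2-ai -f /var/tmp/hadoop_archive.xml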

Another option is to create bootable media from the archive that we created in the previous section. The bootable media can be either a USB or ISO image.

For example, use the following command to create a USB image from the archive:

root@global_zone:~# archiveadm create-media -f usb /root/hadoop-archive.uar
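
Similarly, specifying iso as the format produces a bootable ISO image; this is the same command with only the format changed:

root@global_zone:~# archiveadm create-media -f iso /root/hadoop-archive.uar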

Conclusion

In this article, we saw how to leverage Oracle Solaris Zones, ZFS, and the new Unified Archive feature of Oracle Solaris 11.2 to build a multi-node, highly available Hadoop cluster.


About the Author

Orgad Kimchi is a principal software engineer on the ISV Engineering team at Oracle (formerly Sun Microsystems). For six years he has specialized in virtualization, big data, and cloud computing technologies.

Revision 1.0, 05/13/2014
Revision 2.0, 06/19/2014
- Added important note at beginning of the "Download and Install Hadoop" section.
- Changed Step 6 and Step 10h of the "Download and Install Hadoop" section.
- Changed Step 2, Step 11, and Step 12k of the "Configure the Active NameNode" section.
- Added new Step 13 and Step 15 (and renumbered subsequent steps) in the "Configure the Active NameNode" section.
- Changed all the steps in the "Set Up SSH" section.
- Moved the note from Step 16 to Step 5 of the "Start the Hadoop Cluster" section and also changed Step 12.
- Added new Step 2 (and renumbered subsequent steps) in the "Configure Manual Failover" section. Also changed Step 14.
- Added new Step 1 (and renumbered subsequent steps) in the "Configure Automatic Failover" section. Also changed Step 3a through 3c, Step 4, Step 5, and Step 12.
