Wednesday, June 17, 2009

HBase setup (0.19.3)

Before you begin:

Before you start configuring HBase, you need a running Hadoop cluster, which will provide the storage for HBase. Please refer to the Hadoop cluster setup document before continuing.

On the HBaseMaster (master) machine:

1. In the file /etc/hosts, define the IP addresses of the masterserver machine, all the regionserver machines, and the Hadoop namenode machine. Make sure you use the actual IP (e.g. 192.168.1.9) and not the localhost IP (e.g. 127.0.0.1) for every machine, including the masterserver itself; otherwise the regionservers will not be able to connect to the masterserver.

    192.168.1.9    hbase-masterserver
    192.168.1.8    hbase-regionserver1
    192.168.1.7    hbase-regionserver2
    192.168.1.6    hadoop-nameserver

    Note: Check that the namenode machine name resolves to the actual IP and not the localhost IP, for example with "ping hadoop-nameserver".
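
The resolution check from the note can also be scripted. This is a minimal sketch, assuming a Linux system where `getent` is available; the function names `is_loopback_ip` and `resolves_to_real_ip` are made up for illustration and are not part of HBase or Hadoop.

```shell
# Succeeds if the given address is a loopback address.
is_loopback_ip() {
    case "$1" in
        127.*|::1) return 0 ;;
        *)         return 1 ;;
    esac
}

# Succeeds only if the host name resolves and its first address is not
# a loopback address. Uses getent, which consults /etc/hosts.
resolves_to_real_ip() {
    ip=$(getent hosts "$1" | awk 'NR == 1 { print $1 }')
    if [ -z "$ip" ]; then
        echo "$1: cannot be resolved" >&2
        return 1
    fi
    if is_loopback_ip "$ip"; then
        echo "$1: resolves to loopback address $ip" >&2
        return 1
    fi
    echo "$1 -> $ip"
}

# Example:
#   resolves_to_real_ip hadoop-nameserver
```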

2. Configure passwordless login from the masterserver to all regionserver machines. Refer to "Configuring passwordless ssh access" for instructions on how to set up passwordless ssh access.
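
Once the keys are in place, it may help to confirm that login really works without a prompt before going further. The sketch below uses the OpenSSH options `BatchMode` and `ConnectTimeout`; the helper name `check_passwordless_ssh` is hypothetical.

```shell
# check_passwordless_ssh is a hypothetical helper name.
# BatchMode=yes makes ssh fail instead of prompting for a password,
# so a failure here means key-based login is not working yet.
check_passwordless_ssh() {
    failed=0
    for host in "$@"; do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
            echo "$host: ok"
        else
            echo "$host: passwordless login failed" >&2
            failed=1
        fi
    done
    return "$failed"
}

# Example, run on the masterserver machine:
#   check_passwordless_ssh hbase-regionserver1 hbase-regionserver2
```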

3. Download hbase-0.19.3.tar.gz from the HBase website and unpack it to some path on your computer (we'll call the HBase installation root $HBASE_INSTALL_DIR from now on).
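
As a rough sketch of the unpack step: the helper name `unpack_hbase` and the /opt target in the example are assumptions, and the archive is expected (not guaranteed) to unpack into a directory named after the tarball.

```shell
# unpack_hbase is a hypothetical helper.
# Usage: unpack_hbase <tarball> <parent-dir>
# Extracts the tarball and prints the resulting installation directory.
unpack_hbase() {
    tarball=$1
    parent=$2
    mkdir -p "$parent" || return 1
    tar -xzf "$tarball" -C "$parent" || return 1
    # Assumption: hbase-0.19.3.tar.gz unpacks into a directory hbase-0.19.3
    echo "$parent/$(basename "$tarball" .tar.gz)"
}

# Example (the /opt location is only an illustration):
#   export HBASE_INSTALL_DIR=$(unpack_hbase hbase-0.19.3.tar.gz /opt)
```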

4. Edit the file $HBASE_INSTALL_DIR/conf/hbase-env.sh and define JAVA_HOME.

    export JAVA_HOME=/usr/lib/jvm/java-6-sun
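
A wrong JAVA_HOME usually shows up only when the daemons fail to start, so a quick check can save time. A minimal sketch, with the hypothetical helper name `check_java_home`:

```shell
# check_java_home is a hypothetical helper.
# Succeeds if the given directory contains an executable bin/java.
check_java_home() {
    if [ -x "$1/bin/java" ]; then
        echo "JAVA_HOME looks valid: $1"
    else
        echo "no executable found at $1/bin/java" >&2
        return 1
    fi
}

# Example, using the value set in hbase-env.sh:
#   check_java_home "$JAVA_HOME"
```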

5. Edit the file $HBASE_INSTALL_DIR/conf/hbase-site.xml and add the following properties. (These configurations are required on all the nodes in the cluster.)

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

    <configuration>
        <property>
            <name>hbase.master</name>
            <value>hbase-masterserver:60000</value>
            <description>The host and port that the HBase master runs at.
            A value of 'local' runs the master and a regionserver in
            a single process.
            </description>
        </property>

        <property>
            <name>hbase.rootdir</name>
            <value>hdfs://hadoop-nameserver:9000/hbase</value>
            <description>The directory shared by region servers.</description>
        </property>

        <property>
            <name>hbase.regionserver.class</name>
            <value>org.apache.hadoop.hbase.ipc.IndexedRegionInterface</value>
            <description>This configuration is required to enable indexing on
            hbase and to be able to create secondary indexes
            </description>
        </property>

        <property>
            <name>hbase.regionserver.impl</name>
            <value>org.apache.hadoop.hbase.regionserver.tableindexed.IndexedRegionServer</value>
            <description>This configuration is required to enable indexing on
            hbase and to be able to create secondary indexes
            </description>
        </property>
    </configuration>
                
    Note: Remember to replace the machine names used here (hbase-masterserver and hadoop-nameserver) with your real machine names.

6. Edit $HBASE_INSTALL_DIR/conf/regionservers and add the regionserver machine names, one per line.

    hbase-regionserver1
    hbase-regionserver2
    hbase-masterserver

    Note: Add the masterserver machine name only if you are also running a regionserver on the masterserver machine.
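
Since the same configuration is needed on every node, one option is to copy it from the masterserver to the machines listed in conf/regionservers. This is only a sketch: the helper name `push_hbase_conf` is hypothetical, and it assumes HBase is unpacked at the same path on every machine and that passwordless ssh from the masterserver already works.

```shell
# push_hbase_conf is a hypothetical helper.
# Usage: push_hbase_conf <hbase-install-dir> <name-of-this-machine>
# Copies hbase-site.xml and hbase-env.sh to every host listed in
# conf/regionservers, skipping blank lines and the local machine
# (copying a file onto itself over scp can damage it).
push_hbase_conf() {
    conf_dir=$1/conf
    self=$2
    while read -r host || [ -n "$host" ]; do
        if [ -z "$host" ] || [ "$host" = "$self" ]; then
            continue
        fi
        echo "copying configuration to $host"
        # </dev/null keeps scp from consuming the host list on stdin
        scp "$conf_dir/hbase-site.xml" "$conf_dir/hbase-env.sh" \
            "$host:$conf_dir/" </dev/null || return 1
    done < "$conf_dir/regionservers"
}

# Example, run on the masterserver machine:
#   push_hbase_conf "$HBASE_INSTALL_DIR" hbase-masterserver
```

If you copy the files this way, you can still follow the regionserver steps below to double-check the result on each machine.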

On HRegionServer (slave) machine:


1. In the file /etc/hosts, define the IP addresses of the masterserver machine and the Hadoop namenode machine. Make sure you use the actual IP (e.g. 192.168.1.9) and not the localhost IP (e.g. 127.0.0.1).

    192.168.1.9    hbase-masterserver
    192.168.1.6    hadoop-nameserver

    Note: Check that the masterserver machine name resolves to the actual IP and not the localhost IP, for example with "ping hbase-masterserver".

2. Configure passwordless login from all regionserver machines to the masterserver machine. Refer to "Configuring passwordless ssh access" for instructions on how to set up passwordless ssh access.

3. Download hbase-0.19.3.tar.gz from the HBase website and unpack it to some path on your computer (we'll call the HBase installation root $HBASE_INSTALL_DIR from now on).

4. Edit the file $HBASE_INSTALL_DIR/conf/hbase-env.sh and define JAVA_HOME.

    export JAVA_HOME=/usr/lib/jvm/java-6-sun

5. Edit the file $HBASE_INSTALL_DIR/conf/hbase-site.xml and add the following properties. (These configurations are required on all the nodes in the cluster.)

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

    <configuration>
        <property>
            <name>hbase.rootdir</name>
            <value>hdfs://hadoop-nameserver:9000/hbase</value>
            <description>The directory shared by region servers.</description>
        </property>

        <property>
            <name>hbase.regionserver.class</name>
            <value>org.apache.hadoop.hbase.ipc.IndexedRegionInterface</value>
            <description>This configuration is required to enable indexing on
            hbase and to be able to create secondary indexes
            </description>
        </property>

        <property>
            <name>hbase.regionserver.impl</name>
            <value>org.apache.hadoop.hbase.regionserver.tableindexed.IndexedRegionServer</value>
            <description>This configuration is required to enable indexing on
            hbase and to be able to create secondary indexes.
            </description>
        </property>
    </configuration>

Start and stop HBase daemons:

You only need to start or stop the daemons on the masterserver machine; the scripts start or stop the daemons on all regionserver machines as well. To start HBase, run:

    $HBASE_INSTALL_DIR/bin/start-hbase.sh

To stop HBase, run:

    $HBASE_INSTALL_DIR/bin/stop-hbase.sh
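
To confirm that the daemons actually came up, you can look for their Java processes. The sketch below relies on `jps`, which ships with the JDK, and assumes the daemons show up under the class names HMaster and HRegionServer; the helper name `hbase_daemon_running` is hypothetical.

```shell
# hbase_daemon_running is a hypothetical helper.
# jps prints one "<pid> <main class>" line per running JVM; this succeeds
# if any line has the given class name in its second column.
hbase_daemon_running() {
    jps | awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}

# Example, on the masterserver machine:
#   hbase_daemon_running HMaster && echo "master is up"
# and on each regionserver machine:
#   hbase_daemon_running HRegionServer && echo "regionserver is up"
```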