Wednesday, June 17, 2009

Hadoop cluster setup (0.19.1)

On the NameNode (master) machine:

1. In the file /etc/hosts, define the IP addresses of the namenode machine and all the datanode machines. Make sure you use the actual IP (e.g. 192.168.1.9) and not the loopback IP (e.g. 127.0.0.1) for every machine, including the namenode; otherwise the datanodes will not be able to connect to the namenode machine.

    192.168.1.9    hadoop-namenode
    192.168.1.8    hadoop-datanode1
    192.168.1.7    hadoop-datanode2

    Note: Verify that the namenode hostname resolves to its actual IP and not the loopback IP, e.g. with "ping hadoop-namenode".
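
    A minimal sketch of this check (assuming the standard ping and getent tools are available):

    # Should resolve to 192.168.1.9, not 127.0.0.1 (or 127.0.1.1, which some
    # distributions map to the local hostname in /etc/hosts by default)
    getent hosts hadoop-namenode

    # Should show replies from 192.168.1.9
    ping -c 2 hadoop-namenode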

2. Configure passwordless SSH login from the namenode to all datanode machines. Refer to Configure passwordless ssh access for instructions on how to set up passwordless SSH access.
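
    A minimal sketch of the usual key-based setup (assuming OpenSSH with ssh-keygen and ssh-copy-id, and the hostnames defined in /etc/hosts above):

    # Generate a key pair on the namenode (accept the defaults, empty passphrase)
    ssh-keygen -t rsa

    # Copy the public key to each datanode
    ssh-copy-id hadoop-datanode1
    ssh-copy-id hadoop-datanode2

    # Verify that no password prompt appears
    ssh hadoop-datanode1 hostname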

3. Download and unpack hadoop-0.19.1.tar.gz from the Hadoop website to some path on your machine (we'll refer to the Hadoop installation root as $HADOOP_INSTALL_DIR from now on).
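
    For example (the download URL and the /opt paths below are assumptions; adjust them to your mirror and preferred install location):

    cd /opt
    wget http://archive.apache.org/dist/hadoop/core/hadoop-0.19.1/hadoop-0.19.1.tar.gz
    tar -xzf hadoop-0.19.1.tar.gz
    export HADOOP_INSTALL_DIR=/opt/hadoop-0.19.1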

4. Edit the file $HADOOP_INSTALL_DIR/conf/hadoop-env.sh and set JAVA_HOME.

    export JAVA_HOME=/usr/lib/jvm/java-6-sun
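
    If you are not sure where your JDK is installed, the following check (assuming a Linux system with java on the PATH) prints the real path of the java binary, from which the JDK directory to use as JAVA_HOME can be read off:

    # e.g. /usr/lib/jvm/java-6-sun/jre/bin/java -> use JAVA_HOME=/usr/lib/jvm/java-6-sun
    readlink -f "$(which java)"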

5. Edit the file $HADOOP_INSTALL_DIR/conf/hadoop-site.xml and add the following properties. (These configurations are required on all the nodes in the cluster.)

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

    <configuration>
        <property>
            <name>fs.default.name</name>
            <value>hdfs://hadoop-namenode:9000</value>
            <description>This is the namenode uri</description>
        </property>

        <property>
            <name>dfs.name.dir</name>
            <value>/opt/hdfs-0.19.0/name</value>
        </property>

        <property>
            <name>dfs.data.dir</name>
            <value>/opt/hdfs-0.19.0/data</value>
        </property>

        <property>
            <name>mapred.system.dir</name>
            <value>/system</value>
            <description>This is the path where mapred will store its data in
            HDFS.
            </description>
        </property>

        <property>
            <name>mapred.local.dir</name>
            <value>/opt/hdfs-0.19.0/mapred</value>
            <description>This is the path where mapred will store its temporary
            data in the local file system.
            </description>
        </property>

        <property>
            <name>dfs.replication</name>
            <value>1</value>
            <description>Default block replication.
            The actual number of replications can be specified when the file is
            created. The default is used if replication is not specified at
            create time.
            </description>
        </property>

        <property>
            <name>dfs.datanode.du.reserved</name>
            <value>53687090000</value>
            <description>This is the reserved space in bytes for non-DFS use (about 50 GiB here).</description>
        </property>
    </configuration>
                
    Note: Remember to replace the namenode and datanode machine names here with your real machine names.
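
    The local directories referenced above (dfs.name.dir, dfs.data.dir and mapred.local.dir) should exist and be writable by the user that runs the Hadoop daemons. A minimal sketch, assuming the paths from the configuration above and a user named "hadoop" (adjust both to your setup):

    sudo mkdir -p /opt/hdfs-0.19.0/name /opt/hdfs-0.19.0/data /opt/hdfs-0.19.0/mapred
    sudo chown -R hadoop:hadoop /opt/hdfs-0.19.0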

6. Edit $HADOOP_INSTALL_DIR/conf/masters and add the machine names where the secondary namenode(s) will run.

    hadoop-secondarynamenode1
    hadoop-secondarynamenode2
    
    Note: If you want the secondary namenode to run on the same machine as the primary namenode, enter the machine name of the primary namenode.

7. Edit $HADOOP_INSTALL_DIR/conf/slaves and add all the datanode machine names.

    hadoop-namenode
    hadoop-datanode1
    hadoop-datanode2

    Note: If you are running a datanode on the namenode machine, add that machine name as well.

On each DataNode (slave) machine:

1. In the file /etc/hosts, define the IP address of the namenode machine. Make sure you use the actual IP (e.g. 192.168.1.9) and not the loopback IP (e.g. 127.0.0.1).

    192.168.1.9    hadoop-namenode

    Note: Verify that the namenode hostname resolves to its actual IP and not the loopback IP, e.g. with "ping hadoop-namenode".

2. Configure passwordless SSH login from all datanode machines to the namenode machine. Refer to Configuring passwordless ssh access for instructions on how to set up passwordless SSH access.

3. Download and unpack hadoop-0.19.1.tar.gz from the Hadoop website to some path on your machine (we'll refer to the Hadoop installation root as $HADOOP_INSTALL_DIR from now on).

4. Edit the file $HADOOP_INSTALL_DIR/conf/hadoop-env.sh and define the $JAVA_HOME.

    export JAVA_HOME=/usr/lib/jvm/java-6-sun

5. Edit the file $HADOOP_INSTALL_DIR/conf/hadoop-site.xml and add the following properties. (These configurations are required on all the nodes in the cluster.)

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

    <configuration>
        <property>
            <name>fs.default.name</name>
            <value>hdfs://hadoop-namenode:9000</value>
            <description>This is the namenode uri</description>
        </property>

        <property>
            <name>dfs.name.dir</name>
            <value>/opt/hdfs-0.19.0/name</value>
        </property>

        <property>
            <name>dfs.data.dir</name>
            <value>/opt/hdfs-0.19.0/data</value>
        </property>

        <property>
            <name>mapred.system.dir</name>
            <value>/system</value>
            <description>This is the path where mapred will store its data in
            HDFS.
            </description>
        </property>

        <property>
            <name>mapred.local.dir</name>
            <value>/opt/hdfs-0.19.0/mapred</value>
            <description>This is the path where mapred will store its temporary
            data in the local file system.</description>
        </property>

        <property>
            <name>dfs.replication</name>
            <value>1</value>
            <description>Default block replication.
            The actual number of replications can be specified when the file is
            created. The default is used if replication is not specified at
            create time.
            </description>
        </property>

        <property>
            <name>dfs.datanode.du.reserved</name>
            <value>53687090000</value>
            <description>This is the reserved space in bytes for non-DFS use (about 50 GiB here).</description>
        </property>
    </configuration>

Start and stop the Hadoop daemons:

1. Before you start the Hadoop daemons for the first time, you need to format HDFS. Execute the following command on the namenode machine.

    $HADOOP_INSTALL_DIR/bin/hadoop namenode -format

2. You need to start/stop the daemons only on the master machine; the scripts will start/stop the daemons on all the slave machines as well.

    To start/stop all the daemons, execute one of the following commands.

    $HADOOP_INSTALL_DIR/bin/start-all.sh
    or
    $HADOOP_INSTALL_DIR/bin/stop-all.sh

    To start/stop only the DFS daemons, execute one of the following commands.

    $HADOOP_INSTALL_DIR/bin/start-dfs.sh
    or
    $HADOOP_INSTALL_DIR/bin/stop-dfs.sh
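
    To check that the daemons actually came up, one quick sanity check (assuming the JDK's jps tool is on the PATH) is to list the running Java processes on each machine:

    # On the master you should typically see NameNode, SecondaryNameNode and JobTracker;
    # on the slaves, DataNode and TaskTracker.
    jps

    The namenode web UI (normally at http://hadoop-namenode:50070) and the jobtracker web UI (port 50030) are another way to confirm that the cluster is up.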

Comments:

sid said...

Hey, I just want to install it on my personal computer. Do I need to change the IP addresses in the beginning?

Yogesh said...

Hello Rajiv, Yogesh here

I have Windows 7 as the OS and installed Cygwin and Hadoop (1.0.0).

I am trying to create a single-node setup. I have set JAVA_HOME (through Cygwin [export JAVA_HOME='C:\Program Files\Java\jdk1.6.0_20'] and also in the Environment Variables), but when I run bin/hadoop start-all.sh it starts only the NameNode and JobTracker, and for all the other daemons it shows the error that JAVA_HOME is not set.

Please help me.

vinay said...

Hi, do I need to install Hadoop on the client machine as well (besides the master and slave machines)? If yes, how do I configure the files in the client's Hadoop installation, and how do I copy data from the client's local disk to HDFS? Thanks!

what_u_r said...

Please provide some screenshots, that would make it faster to understand. Cheers, stay tuned.
