Wednesday, June 17, 2009

Hadoop cluster setup (0.19.1)

On the NameNode (master) machine:

1. In file /etc/hosts, define the IP addresses of the namenode machine and all the datanode machines. Make sure you define each machine's actual IP (e.g. 192.168.1.9) and not the localhost IP (e.g. 127.0.0.1), including for the namenode itself; otherwise the datanodes will not be able to connect to the namenode machine.

    192.168.1.9    hadoop-namenode
    192.168.1.8    hadoop-datanode1
    192.168.1.7    hadoop-datanode2

    Note: Verify that the namenode machine name resolves to the actual IP, not the localhost IP, using "ping hadoop-namenode".
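    A quick way to catch the loopback pitfall described above is a small shell check (a sketch; it assumes `getent` is available, as on most Linux systems):

    ```shell
    #!/bin/sh
    # Warn if a hostname resolves to a loopback address instead of a real LAN IP.
    # hadoop-namenode is the example hostname from the hosts file above.
    check_not_loopback() {
        ip=$(getent hosts "$1" | awk '{print $1}')
        case "$ip" in
            "")        echo "WARNING: $1 does not resolve"; return 1 ;;
            127.*|::1) echo "WARNING: $1 resolves to loopback ($ip)"; return 1 ;;
            *)         echo "OK: $1 -> $ip"; return 0 ;;
        esac
    }

    if check_not_loopback hadoop-namenode; then
        echo "hosts file looks good"
    fi
    ```

    The same check can be run against each datanode hostname before starting the cluster.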

2. Configure passwordless login from the namenode to all datanode machines. Refer to "Configure passwordless ssh access" for instructions on how to set up passwordless ssh access.
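    In practice the passwordless setup comes down to generating a key pair on the namenode and installing the public key on each slave. A sketch (hostnames are the examples from step 1; `ssh-copy-id` is assumed to be installed):

    ```shell
    #!/bin/sh
    # On the namenode: create an RSA key pair with an empty passphrase,
    # unless one already exists.
    mkdir -p ~/.ssh
    test -f ~/.ssh/id_rsa || ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa -q

    # Install the public key on every datanode (asks for each password once).
    for host in hadoop-datanode1 hadoop-datanode2; do
        ssh-copy-id "$host" || echo "could not reach $host"
    done

    # Verify: this should print the remote hostname without a password prompt.
    ssh hadoop-datanode1 hostname || echo "passwordless login not working yet"
    ```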

3. Download and unpack hadoop-0.19.1.tar.gz from the Hadoop website to some path on your machine (we'll refer to the Hadoop installation root as $HADOOP_INSTALL_DIR from now on).

4. Edit the file $HADOOP_INSTALL_DIR/conf/hadoop-env.sh and define the $JAVA_HOME.

    export JAVA_HOME=/usr/lib/jvm/java-6-sun

5. Edit the file $HADOOP_INSTALL_DIR/conf/hadoop-site.xml and add the following properties. (These configurations are required on all the nodes in the cluster.)

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

    <configuration>
        <property>
            <name>fs.default.name</name>
            <value>hdfs://hadoop-namenode:9000</value>
            <description>This is the namenode uri</description>
        </property>

        <property>
            <name>dfs.name.dir</name>
            <value>/opt/hdfs-0.19.0/name</value>
        </property>

        <property>
            <name>dfs.data.dir</name>
            <value>/opt/hdfs-0.19.0/data</value>
        </property>

        <property>
            <name>mapred.system.dir</name>
            <value>/system</value>
            <description>This is the path where mapred will store its data in HDFS.</description>
        </property>

        <property>
            <name>mapred.local.dir</name>
            <value>/opt/hdfs-0.19.0/mapred</value>
            <description>This is the path where mapred will store its temporary data in the local file system.</description>
        </property>

        <property>
            <name>dfs.replication</name>
            <value>1</value>
            <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified at create time.</description>
        </property>

        <property>
            <name>dfs.datanode.du.reserved</name>
            <value>53687090000</value>
            <description>This is the reserved space (in bytes, per volume) for non-DFS use</description>
        </property>
    </configuration>
                
    Note: Remember to replace the namenode and datanode machine names with your real machine names here.

6. Edit $HADOOP_INSTALL_DIR/conf/masters and add the machine names where the secondary namenode(s) will run.

    hadoop-secondarynamenode1
    hadoop-secondarynamenode2
    
    Note: If you want the secondary namenode to run on the same machine as the primary namenode, enter the primary namenode's machine name.

7. Edit $HADOOP_INSTALL_DIR/conf/slaves and add all the datanode machine names.

    hadoop-namenode
    hadoop-datanode1
    hadoop-datanode2

    Note: If you are running a datanode on the namenode machine, add that machine name as well.

On the DataNode (slave) machines:

1. In file /etc/hosts, define the IP address of the namenode machine. Make sure you define the actual IP (e.g. 192.168.1.9) and not the localhost IP (e.g. 127.0.0.1).

    192.168.1.9    hadoop-namenode

    Note: Verify that the namenode machine name resolves to the actual IP, not the localhost IP, using "ping hadoop-namenode".

2. Configure passwordless login from all datanode machines to the namenode machine. Refer to "Configure passwordless ssh access" for instructions on how to set up passwordless ssh access.

3. Download and unpack hadoop-0.19.1.tar.gz from the Hadoop website to some path on your machine (we'll refer to the Hadoop installation root as $HADOOP_INSTALL_DIR from now on).

4. Edit the file $HADOOP_INSTALL_DIR/conf/hadoop-env.sh and define the $JAVA_HOME.

    export JAVA_HOME=/usr/lib/jvm/java-6-sun

5. Edit the file $HADOOP_INSTALL_DIR/conf/hadoop-site.xml and add the following properties. (These configurations are required on all the nodes in the cluster.)

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

    <configuration>
        <property>
            <name>fs.default.name</name>
            <value>hdfs://hadoop-namenode:9000</value>
            <description>This is the namenode uri</description>
        </property>

        <property>
            <name>dfs.name.dir</name>
            <value>/opt/hdfs-0.19.0/name</value>
        </property>

        <property>
            <name>dfs.data.dir</name>
            <value>/opt/hdfs-0.19.0/data</value>
        </property>

        <property>
            <name>mapred.system.dir</name>
            <value>/system</value>
            <description>This is the path where mapred will store its data in HDFS.</description>
        </property>

        <property>
            <name>mapred.local.dir</name>
            <value>/opt/hdfs-0.19.0/mapred</value>
            <description>This is the path where mapred will store its temporary data in the local file system.</description>
        </property>

        <property>
            <name>dfs.replication</name>
            <value>1</value>
            <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified at create time.</description>
        </property>

        <property>
            <name>dfs.datanode.du.reserved</name>
            <value>53687090000</value>
            <description>This is the reserved space (in bytes, per volume) for non-DFS use</description>
        </property>
    </configuration>

Start and stop the Hadoop daemons:

1. Before you start the Hadoop daemons for the first time, you need to format the filesystem. Execute the following command:

    $HADOOP_INSTALL_DIR/bin/hadoop namenode -format
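    If the format succeeds, the directory given as dfs.name.dir is populated with a fresh filesystem image. A small sketch of a sanity check (the path is the example value from the configuration above):

    ```shell
    #!/bin/sh
    # After "hadoop namenode -format", dfs.name.dir should contain a
    # current/ subdirectory holding the new filesystem image.
    check_name_dir() {
        if [ -d "$1/current" ]; then
            echo "namenode formatted"
        else
            echo "name directory missing - did the format run, and as the right user?"
        fi
    }

    check_name_dir /opt/hdfs-0.19.0/name
    ```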

2. You need to start/stop the daemons only on the master machine; it will start/stop the daemons on all slave machines as well.

    To start or stop all the daemons, execute the corresponding command:

    $HADOOP_INSTALL_DIR/bin/start-all.sh
    or
    $HADOOP_INSTALL_DIR/bin/stop-all.sh

    To start or stop only the DFS daemons, execute the corresponding command:

    $HADOOP_INSTALL_DIR/bin/start-dfs.sh
    or
    $HADOOP_INSTALL_DIR/bin/stop-dfs.sh
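
    To confirm that the daemons actually came up, `jps` on each machine lists the running Hadoop JVMs. A sketch of a check for the master's daemons (the daemon names are those started by a 0.19 master: NameNode, SecondaryNameNode, and JobTracker):

    ```shell
    #!/bin/sh
    # Check that the expected master daemons appear in jps-style output.
    check_daemons() {
        out="$1"
        missing=""
        for d in NameNode SecondaryNameNode JobTracker; do
            echo "$out" | grep -qw "$d" || missing="$missing $d"
        done
        if [ -n "$missing" ]; then
            echo "missing:$missing"
            return 1
        fi
        echo "all master daemons running"
    }

    # On a live master, run: check_daemons "$(jps)"
    # Shown here against sample output:
    check_daemons "12034 NameNode
    12101 SecondaryNameNode
    12188 JobTracker"
    ```

    On the slave machines the daemons to look for are DataNode and TaskTracker.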

20 comments :

sid said...

hey i just want to install it in my personal computer. do i need to change the ip address in the beginning.

Yogesh said...

Hello Rajiv, Yogesh here

I have windows 7 as O.S and installed cygwin n Hadoop(1.0.0),

I am trying to create Single-Node-Setup, I have set the JAVA_HOME(throug cygwin [export JAVA_HOME='C:\Program Files\Java\jdk1.6.0_20'] and also on Environment Variables ) but when I start the bin/hadoop start-all.sh then it starts only Namenode and Jobtracker and for all nodes it shows error i.e JAVA_HOME is not set.

Please help me.

vinay said...

Hi, Do i need to install hadoop on client machine also(besides master and slaves machines). If yes, How do i configure the files in client hadoop and how to i copy the data from clients local disk to HDFS? Thanks!

what_u_r said...

please provide some screenshot ,we get understand fast ... cheer ..stay tune
