Wednesday, June 17, 2009

Hadoop cluster setup (0.20.0)

On the NameNode (master) machine:

1. In file /etc/hosts, define the IP addresses of the namenode machine and all the datanode machines. Make sure you use the actual IP (e.g. 192.168.1.9) and not the localhost IP (e.g. 127.0.0.1) for all the machines, including the namenode; otherwise the datanodes will not be able to connect to the namenode machine.

    192.168.1.9    hadoop-namenode
    192.168.1.8    hadoop-datanode1
    192.168.1.7    hadoop-datanode2

    Note: Verify that the namenode machine name resolves to the actual IP, not the localhost IP, using "ping hadoop-namenode".
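This check can also be scripted. A minimal sketch (the check_hosts helper is ours, not part of Hadoop, and the hostnames are this guide's examples) that flags hostnames missing from a hosts file or mapped to a loopback address:

```shell
# check_hosts FILE HOST...
# For each HOST, prints OK if FILE maps it to a real address,
# LOOPBACK if it is mapped to a 127.x address, MISSING if absent.
check_hosts() {
    file=$1; shift
    for host in "$@"; do
        # take the first matching hosts entry, if any
        entry=$(grep -w "$host" "$file" | head -n 1)
        case "$entry" in
            "")    echo "MISSING: $host" ;;
            127.*) echo "LOOPBACK: $entry" ;;
            *)     echo "OK: $entry" ;;
        esac
    done
}

# Example: check_hosts /etc/hosts hadoop-namenode hadoop-datanode1 hadoop-datanode2
```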

2. Configure passwordless login from the namenode to all datanode machines. Refer to Configure passwordless ssh access for instructions on how to set up passwordless ssh access.
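For reference, the usual recipe looks like this (a sketch, assuming a standard OpenSSH install with the ssh-copy-id helper; the hostnames are the examples above):

```shell
# On the namenode, as the user that will run Hadoop:
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa   # generate a key pair with an empty passphrase
ssh-copy-id hadoop-datanode1               # install the public key on each datanode
ssh-copy-id hadoop-datanode2

# Sanity check: this should log in without prompting for a password.
ssh hadoop-datanode1 hostname
```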

3. Download and unpack hadoop-0.20.0.tar.gz from the Hadoop website to some path on your machine (we'll refer to the Hadoop installation root as $HADOOP_INSTALL_DIR from now on).
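For example (the archive URL is an assumption — use whichever mirror the Hadoop download page gives you — and /opt is just a sample location):

```shell
wget http://archive.apache.org/dist/hadoop/core/hadoop-0.20.0/hadoop-0.20.0.tar.gz
tar xzf hadoop-0.20.0.tar.gz -C /opt
export HADOOP_INSTALL_DIR=/opt/hadoop-0.20.0
```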

4. Edit the file $HADOOP_INSTALL_DIR/conf/hadoop-env.sh and set JAVA_HOME.

    export JAVA_HOME=/usr/lib/jvm/java-6-sun

5. Edit the file $HADOOP_INSTALL_DIR/conf/core-site.xml and add the following properties. (These configurations are required on all the nodes in the cluster.)

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
        <property>
            <name>fs.default.name</name>
            <value>hdfs://hadoop-namenode:54310</value>
            <description>The name of the default file system.  A URI whose
            scheme and authority determine the FileSystem implementation. The
            uri's scheme determines the config property (fs.SCHEME.impl) naming
            the FileSystem implementation class.  The uri's authority is used to
            determine the host, port, etc. for a filesystem.
            </description>
        </property>

        <property>
            <name>hadoop.tmp.dir</name>
            <value>/opt/hdfs/tmp</value>
            <description>A base for other temporary directories.</description>
        </property>
    </configuration>

    Note: Remember to replace hadoop-namenode with your namenode's real machine name here.

6. Edit the file $HADOOP_INSTALL_DIR/conf/hdfs-site.xml and add the following properties. (This file defines the properties of the namenode and datanode).

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

    <configuration>
        <property>
            <name>dfs.name.dir</name>
            <value>/opt/hdfs/name</value>
            <description>Determines where on the local filesystem the DFS name
            node should store the name table (fsimage).  If this is a
            comma-delimited list of directories, then the name table is
            replicated in all of the directories, for redundancy.
            </description>
        </property>

        <property>
            <name>dfs.data.dir</name>
            <value>/opt/hdfs/data</value>
            <description>Determines where on the local filesystem a DFS data
            node should store its blocks.  If this is a comma-delimited list of
            directories, then data will be stored in all named directories,
            typically on different devices. Directories that do not exist are
            ignored.
            </description>
        </property>

        <property>
            <name>dfs.replication</name>
            <value>1</value>
            <description>Default block replication.
            The actual number of replications can be specified when the file is
            created. The default is used if replication is not specified at
            create time.
            </description>
        </property>

        <property>
            <name>dfs.datanode.du.reserved</name>
            <value>53687090000</value>
            <description>Reserved space in bytes per volume for non-DFS use.</description>
        </property>
    </configuration>

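Note that dfs.datanode.du.reserved is specified in bytes. Shell arithmetic makes it easy to derive a value for a given reservation (50 GiB here is an example; it is close to, but not exactly, the 53687090000 used above):

```shell
# Reserve 50 GiB per volume for non-DFS use:
echo $(( 50 * 1024 * 1024 * 1024 ))   # prints 53687091200
```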
7. Edit $HADOOP_INSTALL_DIR/conf/mapred-site.xml and add the following configuration. (This file holds the MapReduce (mapper and reducer) configuration.)

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

    <configuration>
    </configuration>

    Note: If you only need to use HDFS, leave this file empty.
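If you do plan to run MapReduce jobs, the one property this file typically needs is the JobTracker address. A minimal sketch (port 54311 is a common convention, not a requirement, and the hostname matches the examples above):

```xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>hadoop-namenode:54311</value>
        <description>The host and port that the MapReduce job tracker
        runs at.</description>
    </property>
</configuration>
```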

8. Edit $HADOOP_INSTALL_DIR/conf/masters and add the machine names where the secondary namenode(s) will run.

    hadoop-secondarynamenode1
    hadoop-secondarynamenode2
    
    Note: If you want the secondary namenode to run on the same machine as the primary namenode, enter the primary namenode's machine name.

9. Edit $HADOOP_INSTALL_DIR/conf/slaves and add all the datanode machine names. If you are running a datanode on the namenode machine, remember to add that as well.

    hadoop-namenode
    hadoop-datanode1
    hadoop-datanode2

    Note: Add namenode machine name only if you are running a datanode on namenode machine.

On each DataNode (slave) machine:

1. In file /etc/hosts, define the IP address of the namenode machine. Make sure you use the actual IP (e.g. 192.168.1.9) and not the localhost IP (e.g. 127.0.0.1).

    192.168.1.9    hadoop-namenode

    Note: Verify that the namenode machine name resolves to the actual IP, not the localhost IP, using "ping hadoop-namenode".

2. Configure passwordless login from all datanode machines to the namenode machine. Refer to Configure passwordless ssh access for instructions on how to set up passwordless ssh access.

3. Download and unpack hadoop-0.20.0.tar.gz from the Hadoop website to some path on your machine (we'll refer to the Hadoop installation root as $HADOOP_INSTALL_DIR from now on).

4. Edit the file $HADOOP_INSTALL_DIR/conf/hadoop-env.sh and set JAVA_HOME.

    export JAVA_HOME=/usr/lib/jvm/java-6-sun

5. Edit the file $HADOOP_INSTALL_DIR/conf/core-site.xml and add the following properties. (These configurations are required on all the nodes in the cluster.)

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

    <configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://hadoop-namenode:54310</value>
        <description>The name of the default file system.  A URI whose scheme
        and authority determine the FileSystem implementation. The uri's scheme
        determines the config property (fs.SCHEME.impl) naming the FileSystem
        implementation class.  The uri's authority is used to determine the
        host, port, etc. for a filesystem.
        </description>
    </property>

    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/hdfs/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    </configuration>

6. Edit the file $HADOOP_INSTALL_DIR/conf/hdfs-site.xml and add the following properties. (This file defines the properties of the namenode and datanode).

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

    <configuration>
        <property>
            <name>dfs.name.dir</name>
            <value>/opt/hdfs/name</value>
            <description>Determines where on the local filesystem the DFS name
            node should store the name table (fsimage).  If this is a
            comma-delimited list of directories, then the name table is
            replicated in all of the directories, for redundancy.
            </description>
        </property>

        <property>
            <name>dfs.data.dir</name>
            <value>/opt/hdfs/data</value>
            <description>Determines where on the local filesystem a DFS data
            node should store its blocks.  If this is a comma-delimited list
            of directories, then data will be stored in all named directories,
            typically on different devices. Directories that do not exist are
            ignored.
            </description>
        </property>

        <property>
            <name>dfs.replication</name>
            <value>1</value>
            <description>Default block replication.
            The actual number of replications can be specified when the file is
            created. The default is used if replication is not specified at
            create time.
            </description>
        </property>

        <property>
            <name>dfs.datanode.du.reserved</name>
            <value>53687090000</value>
            <description>Reserved space in bytes per volume for non-DFS use.</description>
        </property>
    </configuration>

7. Edit $HADOOP_INSTALL_DIR/conf/mapred-site.xml and add the following configuration. (This file holds the MapReduce (mapper and reducer) configuration.)

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

    <configuration>
    </configuration>

    Note: If you only need to use HDFS, leave this file empty.

Starting and stopping the Hadoop daemons:

1. Before you start the Hadoop daemons for the first time, you need to format the HDFS filesystem. Execute the following command on the namenode (only once: reformatting destroys any existing HDFS data).

    $HADOOP_INSTALL_DIR/bin/hadoop namenode -format

2. You need to run the start/stop scripts only on the master machine; they will start and stop the daemons on all the slave machines as well.

    To start or stop all the daemons, execute one of the following commands.

    $HADOOP_INSTALL_DIR/bin/start-all.sh
    or
    $HADOOP_INSTALL_DIR/bin/stop-all.sh

    To start or stop only the DFS daemons, execute one of the following commands.

    $HADOOP_INSTALL_DIR/bin/start-dfs.sh
    or
    $HADOOP_INSTALL_DIR/bin/stop-dfs.sh
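Two quick ways to confirm the cluster actually came up (run on the master; jps ships with the Sun JDK, and dfsadmin -report is part of the stock hadoop command):

```shell
jps                                              # should list NameNode, SecondaryNameNode, JobTracker
                                                 # (plus DataNode/TaskTracker if the master is also a slave)
$HADOOP_INSTALL_DIR/bin/hadoop dfsadmin -report  # lists live datanodes and their capacity
```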

49 comments :

Anonymous said...

I performed all the steps specified by you on my 2-node cluster. When I run the jps command on the master node, I get the following:
8895 TaskTracker
7920 JobTracker
8945 Jps
7832 SecondaryNameNode
7520 NameNode

It seems like the datanode is not working.
Meanwhile on the slave node, the jps command outputs:
2445 Jps

Help !!

KAMALAKUR said...

Hi, I am also getting the same result.

Unknown said...

Hi.
I am facing a problem with the installation of a Hadoop multi-node cluster. I installed a single-node cluster on 2 machines and configured the conf folder on both machines. I took one as the master machine and the other as the slave. When I execute start-all.sh on the master machine, all the daemons come up properly on the master but not on the slave, and I get an error that log files cannot be copied to the slave.

The error is as below,

hadoop@user-G41M-Combo:/usr/local/hadoop/hadoop$ bin/start-all.sh
namenode running as process 25724. Stop it first.
slave: mv: cannot move `/usr/local/hadoop/hadoop/bin/../logs/hadoop-hadoop-datanode-user-HP-dx2480-MT-NA125PA.out' to `/usr/local/hadoop/hadoop/bin/../logs/hadoop-hadoop-datanode-user-HP-dx2480-MT-NA125PA.out.1': Permission denied
slave: starting datanode, logging to /usr/local/hadoop/hadoop/bin/../logs/hadoop-hadoop-datanode-user-HP-dx2480-MT-NA125PA.out
master: starting datanode, logging to /usr/local/hadoop/hadoop/bin/../logs/hadoop-hadoop-datanode-user-G41M-Combo.out
slave: log4j:ERROR Failed to rename [/usr/local/hadoop/hadoop/bin/../logs/hadoop-hadoop-datanode-user-HP-dx2480-MT-NA125PA.log] to [/usr/local/hadoop/hadoop/bin/../logs/hadoop-hadoop-datanode-user-HP-dx2480-MT-NA125PA.log.2011-04-11].
master: starting secondarynamenode, logging to /usr/local/hadoop/hadoop/bin/../logs/hadoop-hadoop-secondarynamenode-user-G41M-Combo.out
jobtracker running as process 27006. Stop it first.
slave: mv: cannot move `/usr/local/hadoop/hadoop/bin/../logs/hadoop-hadoop-tasktracker-user-HP-dx2480-MT-NA125PA.out' to `/usr/local/hadoop/hadoop/bin/../logs/hadoop-hadoop-tasktracker-user-HP-dx2480-MT-NA125PA.out.1': Permission denied
slave: starting tasktracker, logging to /usr/local/hadoop/hadoop/bin/../logs/hadoop-hadoop-tasktracker-user-HP-dx2480-MT-NA125PA.out
master: starting tasktracker, logging to /usr/local/hadoop/hadoop/bin/../logs/hadoop-hadoop-tasktracker-user-G41M-Combo.out
slave: log4j:ERROR Failed to rename [/usr/local/hadoop/hadoop/bin/../logs/hadoop-hadoop-tasktracker-user-HP-dx2480-MT-NA125PA.log] to [/usr/local/hadoop/hadoop/bin/../logs/hadoop-hadoop-tasktracker-user-HP-dx2480-MT-NA125PA.log.2011-04-11].

Anonymous said...

/usr/local/hadoop/hadoop-0.20.203.0/bin/start-dfs.sh
starting namenode, logging to /usr/local/hadoop/hadoop-0.20.203.0/bin/../logs/hadoop-hduser-namenode-hadoop-ThinkCentre-A51.out
localhost: starting datanode, logging to /usr/local/hadoop/hadoop-0.20.203.0/bin/../logs/hadoop-hduser-datanode-hadoop-ThinkCentre-A51.out
localhost: starting secondarynamenode, logging to /usr/local/hadoop/hadoop-0.20.203.0/bin/../logs/hadoop-hduser-secondarynamenode-hadoop-ThinkCentre-A51.out
hduser@hadoop-ThinkCentre-A51:~$ jps
12799 SecondaryNameNode
12837 Jps



It seems like the datanode is not working. In the log file it shows that /app/hadoop/tmp/dfs/data does not exist.
Please help us solve this.

Amit Patel said...

Hi Rajeev, you are doing great work by posting such a blog.

Anonymous said...

At step 7, do you mean mapred-site.xml? Because there is no mapred.xml in my hadoop folder.
