Multi-Node Hadoop Set-up
- Assign host names to the master, slave, and client machines
- Install ssh
-"sudo yum install openssh-server" - Generate .ssh password less key
-"ssh-keygen"
-press Enter - press Enter - Now we should add IP address and host name of the master,slave and client machine in /etc/hosts file
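- For example, the /etc/hosts entries could look like the following (the IP addresses are placeholders; NameNode is the master's host name as used later in the configuration, and slave1, slave2, client are example names):
192.168.1.10 NameNode
192.168.1.11 slave1
192.168.1.12 slave2
192.168.1.13 client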
- Share the generated key across all slave nodes
- "ssh-copy-id -i $HOME/.ssh/id.rsa.pub <user>@<hostname>"
- Set up Java on the master node
- Download Java from Oracle and export JAVA_HOME and PATH in the .bashrc file, then run "source .bashrc" (example entries below)
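- For example, the .bashrc entries could look like this (the JDK path matches the one used later in hadoop-env.sh; adjust it to your actual install location):
export JAVA_HOME=/usr/local/java/jdk1.7.0_75
export PATH=$PATH:$JAVA_HOME/bin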
- Set up Hadoop on the master node; let's say /usr/local/hadoop is the HADOOP_HOME
- Download Hadoop from Apache and export HADOOP_HOME and PATH in the .bashrc file, then run "source .bashrc" (example entries below)
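- For example, the .bashrc entries could look like this:
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin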
- Now we need to configure 6 files in the $HADOOP_HOME/conf/ directory: hadoop-env.sh, core-site.xml, mapred-site.xml, hdfs-site.xml, masters, and slaves
- -"sudo nano /usr/local/hadoop/conf/hadoop-env.sh"
- add JAVA_HOME (export JAVA_HOME=/usr/local/java/jdk1.7.0_75)
- "sudo nano /usr/local/hadoop/conf/core-site.xml"
- add the entries below. Note that NameNode is the host name of the machine where the NameNode process of Hadoop HDFS is running; in my case NameNode is the master node's host name.
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/tmp</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://NameNode:54310</value>
</property>
</configuration>
- "sudo mkdir -p /usr/local/hadoop/tmp"
- "sudo nano /usr/local/hadoop/conf/mapred-site.xml"
- add the entries below. Note that NameNode is the host name of the machine where the JobTracker process of Hadoop MapReduce is running.
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>NameNode:54311</value>
</property>
</configuration>
- "sudo nano /usr/local/hadoop/conf/hdfs-site.xml"
- add the entries below. dfs.replication is the number of copies HDFS keeps of each block, and dfs.block.size is the block size in bytes (67108864 bytes = 64 MB).
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.block.size</name>
<value>67108864</value>
</property>
</configuration>
- Now we need to configure the masters file; all we need to do here is add the host name of the node where the NameNode and JobTracker processes will run
- Similarly, add the host names to the slaves file, so that the DataNode and TaskTracker processes will run on those slave machines (example contents of both files below)
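- For example, with the host names from the /etc/hosts example above, /usr/local/hadoop/conf/masters would contain just:
NameNode
- and /usr/local/hadoop/conf/slaves would contain:
slave1
slave2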
o "rsync -avg /usr/local/hadoop/ <user>@<hosatname>/usr/local/hadoop/"
- Note: here I followed the same directory structure on both the master and slave machines
10. Similarly, we can also sync Java from the master machine to the slave machines
o"rsync -avg /usr/local/java/jdk1.7.0_75/ <user>@<hosatname>/usr/local/java/jdk1.7.0_75/"
- Note: here I maintained the same directory, so there is no need to change JAVA_HOME in /usr/local/hadoop/conf/hadoop-env.sh
o But JAVA_HOME and HADOOP_HOME should also be set properly on each slave machine, or we can use rsync for the .bashrc file as well (example below).
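- For example, assuming the same user name and home directory layout on every node, and a slave's host name in place of <hostname>:
"rsync -avg $HOME/.bashrc <user>@<hostname>:$HOME/.bashrc"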
11. Now delete /usr/local/hadoop/tmp on all the nodes, and from the master node run
"hadoop namenode -format"
12. Now we can start our Hadoop multi-node set-up by running "start-all.sh" from the master machine
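- To verify that the cluster came up (assuming the set-up above), run "jps" on each node; roughly, the expected processes are:
on the master: NameNode, SecondaryNameNode, JobTracker
on each slave: DataNode, TaskTracker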