Setting up a single-node Hadoop cluster on Ubuntu
1. Install ssh
-"sudo apt-get install openssh-server"
2. Generate a passwordless SSH key
-"ssh-keygen"
-Press Enter at each prompt to accept the defaults and an empty passphrase
(You can try "ssh localhost", but at this point it will still ask for a password)
3. Copy the generated key to authorized_keys so SSH stops asking for a password
-"cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys"
(Now "ssh localhost" won't ask you for a password any more)
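(If "ssh localhost" still prompts for a password, the usual culprit is file permissions; tightening them is a safe fix)
-"chmod 700 ~/.ssh"
-"chmod 600 ~/.ssh/authorized_keys"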
4. Download and install Oracle Java
-Download the JDK from Oracle; in my case it is "jdk-7u75-linux-x64.tar.gz"
-Extract it with "tar -zxvf jdk-7u75-linux-x64.tar.gz"
-Make the directory: "sudo mkdir -p /usr/local/java"
-Move the JDK into it: "sudo mv jdk1.7.0_75 /usr/local/java"
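(Optional, and not required for the steps below: you can register this JDK with Ubuntu's alternatives system so "java" resolves everywhere; adjust the path to match your JDK version)
-"sudo update-alternatives --install /usr/bin/java java /usr/local/java/jdk1.7.0_75/bin/java 1"
-"sudo update-alternatives --install /usr/bin/javac javac /usr/local/java/jdk1.7.0_75/bin/javac 1"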
5. Download Apache Hadoop from "http://hadoop.apache.org/releases.html#Download"
-In my case "hadoop-1.2.1-bin.tar.gz"
-Extract it with "tar -zxvf hadoop-1.2.1-bin.tar.gz"
-Make the directory: "sudo mkdir -p /usr/local/hadoop"
-Move the contents into it: "sudo mv hadoop-1.2.1/* /usr/local/hadoop"
(Move the contents, not the folder itself; otherwise everything lands under /usr/local/hadoop/hadoop-1.2.1 and the /usr/local/hadoop/conf paths used in the later steps will not exist)
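(The Hadoop daemons will run as your own user and need to write logs and PID files under this directory, so hand it over to your user; $USER expands to your login name)
-"sudo chown -R $USER /usr/local/hadoop"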
6. Hadoop 1.x does not support IPv6, so IPv6 should be disabled
-Run the command below to check
-"cat /proc/sys/net/ipv6/conf/all/disable_ipv6"
-0 means IPv6 is enabled and 1 means it is disabled
-If it prints 0, disable IPv6: "sudo nano /etc/sysctl.conf"
-Add the lines below, each set to 1
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
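(Apply the new settings without rebooting, then re-check; the cat command should now print 1)
-"sudo sysctl -p"
-"cat /proc/sys/net/ipv6/conf/all/disable_ipv6"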
7. Now set the environment variables
-"cd" to go to your home directory
-"nano ~/.bashrc" and add the lines below at the end of the file
(Note: adjust JAVA_HOME to match the Java version you installed)
JAVA_HOME=/usr/local/java/jdk1.7.0_75
PATH=$PATH:$JAVA_HOME/bin
HADOOP_HOME=/usr/local/hadoop
PATH=$PATH:$HADOOP_HOME/bin
export JAVA_HOME
export HADOOP_HOME
export PATH
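(Reload the file and verify that both tools are on the PATH; the versions printed should match what you installed)
-"source ~/.bashrc"
-"java -version"
-"hadoop version"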
8. Now set up Hadoop
-"sudo nano /usr/local/hadoop/conf/hadoop-env.sh"
-add JAVA_HOME (export JAVA_HOME=/usr/local/java/jdk1.7.0_75)
-"sudo nano /usr/local/hadoop/conf/core-site.xml"
-Add the entries below
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/tmp</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
</property>
</configuration>
-"sudo mkdir -p /usr/local/hadoop/tmp"
-"sudo nano /usr/local/hadoop/conf/mapred-site.xml"
-Add the entries below
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
</property>
</configuration>
-"sudo nano /usr/local/hadoop/conf/hdfs-site.xml"
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.block.size</name>
<value>67108864</value>
</property>
</configuration>
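(For reference: 67108864 bytes is 64 MB, i.e. 64 x 1024 x 1024, the default HDFS block size in Hadoop 1.x. dfs.replication is 1 because a single node can only hold one copy of each block.)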
9. Format the Hadoop file system
-"hadoop namenode -format"
10. Start the Hadoop processes
- "start-all.sh"
11. Check the running processes
-"jps"
12. If the DataNode is not up and running
-On a fresh install the usual cause is a namespaceID mismatch left behind by an earlier format; see the recovery steps below
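(A common recovery, assuming there is no data in HDFS worth keeping yet; this wipes HDFS, so only do it on a fresh install)
-"stop-all.sh"
-"rm -rf /usr/local/hadoop/tmp/*"
-"hadoop namenode -format"
-"start-all.sh"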