Setting up a single-node Hadoop cluster on Ubuntu


1. Install ssh
    -"sudo apt-get install openssh-server"

2. Generate a passwordless SSH key
    -"ssh-keygen"
    -press Enter to accept the default file location, then press Enter again for an empty passphrase
    (You can try "ssh localhost", but at this point it will still ask for a password)

3. Copy the generated public key into authorized_keys to get rid of the password prompt
    -"cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys"
    (Now "ssh localhost" should no longer ask for a password)
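    (If "ssh localhost" still prompts for a password, the key files may have permissions that sshd refuses; tightening them usually helps)
    -"chmod 700 ~/.ssh"
    -"chmod 600 ~/.ssh/authorized_keys"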

4. Download and install Oracle Java
    -Download Java from Oracle; in my case it is "jdk-7u75-linux-x64.tar.gz"
    -Extract it with "tar -zxvf jdk-7u75-linux-x64.tar.gz"
    -Create the target directory: "sudo mkdir -p /usr/local/java"
    -Move the extracted JDK into it: "sudo mv jdk1.7.0_75 /usr/local/java"
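    (To confirm the JDK works, you can call the binary directly; the path assumes the jdk1.7.0_75 build used above)
    -"/usr/local/java/jdk1.7.0_75/bin/java -version"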

5. Download Apache Hadoop from "http://hadoop.apache.org/releases.html#Download"
    -In my case it is "hadoop-1.2.1-bin.tar.gz"
    -Extract it with "tar -zxvf hadoop-1.2.1-bin.tar.gz"
    -Create the target directory: "sudo mkdir -p /usr/local/hadoop"
    -Copy the extracted contents into it: "sudo cp -r hadoop-1.2.1/* /usr/local/hadoop"
    (Copy the contents rather than the folder itself, so that the paths used later, such as /usr/local/hadoop/conf, resolve correctly)
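    (A quick sanity check; the conf directory edited in step 8 should now exist and contain the *-site.xml files)
    -"ls /usr/local/hadoop/conf"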

6. Hadoop 1.x works reliably only over IPv4, so IPv6 should be disabled

    -Run the command below to check
    -"cat /proc/sys/net/ipv6/conf/all/disable_ipv6"
    -0 means IPv6 is still enabled, 1 means it is disabled
    -If it prints 0, disable IPv6 with "sudo nano /etc/sysctl.conf"
    -Add (or modify) the values below, setting each to 1

                       net.ipv6.conf.all.disable_ipv6 = 1
                       net.ipv6.conf.default.disable_ipv6 = 1
                       net.ipv6.conf.lo.disable_ipv6 = 1
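
    -To apply the change without rebooting, reload the sysctl settings and re-run the check; it should now print 1
    -"sudo sysctl -p"
    -"cat /proc/sys/net/ipv6/conf/all/disable_ipv6"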

7. Now set the environment variables

    -"cd" to move to your home directory
    -"nano .bashrc" and add the lines below at the end of the file
      (Note: adjust JAVA_HOME to the Java version you installed)
                      JAVA_HOME=/usr/local/java/jdk1.7.0_75

                      PATH=$PATH:$JAVA_HOME/bin
                      HADOOP_HOME=/usr/local/hadoop
                      PATH=$PATH:$HADOOP_HOME/bin
                      export JAVA_HOME
                      export HADOOP_HOME
                      export PATH
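
    -Reload .bashrc and verify that both Java and Hadoop are now on the PATH
    -"source ~/.bashrc"
    -"java -version"
    -"hadoop version"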

8. Now configure Hadoop

-"sudo nano /usr/local/hadoop/conf/hadoop-env.sh"
    -add JAVA_HOME (export JAVA_HOME=/usr/local/java/jdk1.7.0_75)

-"sudo nano /usr/local/hadoop/conf/core-site.xml"
-add the entries below
                  <configuration>
                       <property> 
                              <name>hadoop.tmp.dir</name>
                              <value>/usr/local/hadoop/tmp</value>
                       </property> 

                       <property> 
                              <name>fs.default.name</name> 
                              <value>hdfs://localhost:54310</value>
                       </property> 
                 </configuration>

-"sudo mkdir -p /usr/local/hadoop/tmp"
      
-"sudo nano /usr/local/hadoop/conf/mapred-site.xml"
      -add the entries below
                    <configuration>
                         <property> 
                            <name>mapred.job.tracker</name> 
                            <value>localhost:54311</value>
                         </property> 
                    </configuration>
    
-"sudo nano /usr/local/hadoop/conf/hdfs-site.xml"
                    <configuration>
                              <property>
                                   <name>dfs.replication</name>
                                   <value>1</value>
                              </property>

                              <property>
                                 <name>dfs.block.size</name>
                                 <value>67108864</value>
                              </property>   
                   <configuration>

9. Format the Hadoop file system (HDFS)

         -"hadoop namenode -format"

10. Start the processes
         - "start-all.sh"

11. Check the processes
          -"jps"

12. If the DataNode is not up and running
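
    (The usual cause is a namespaceID mismatch: the NameNode was re-formatted, but the DataNode still holds the old ID under hadoop.tmp.dir. On a fresh setup with no data worth keeping, a simple fix, assuming the tmp directory configured above, is to stop the daemons, clear it, and format again; note that this wipes everything stored in HDFS)
    -"stop-all.sh"
    -"rm -rf /usr/local/hadoop/tmp/*"
    -"hadoop namenode -format"
    -"start-all.sh"
    (If it still fails, the DataNode log under /usr/local/hadoop/logs shows the exact error)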
