Setting up a single-node Hadoop cluster on Ubuntu
1. Install ssh
-"sudo apt-get install openssh-server"
2. Generate a passwordless SSH key
-"ssh-keygen"
-Press Enter at each prompt to accept the defaults and an empty passphrase
(You can try "ssh localhost", but at this point it will still ask for a password)
3. Copy the generated key to authorized_keys so SSH stops asking for a password
-"cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys"
(Now "ssh localhost" won't ask you for a password any more)
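(If "ssh localhost" still prompts for a password, the usual culprit is file permissions; tightening them is a safe fix)
-"chmod 700 ~/.ssh"
-"chmod 600 ~/.ssh/authorized_keys"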
4. Download and install Oracle Java
-Download the JDK from Oracle; in my case it is "jdk-7u75-linux-x64.tar.gz"
-Extract it with "tar -zxvf jdk-7u75-linux-x64.tar.gz"
-Make the directory: "sudo mkdir -p /usr/local/java"
-Move the JDK into it: "sudo mv jdk1.7.0_75 /usr/local/java"
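(Optional, and not required for the steps below: you can register this JDK with Ubuntu's alternatives system so "java" resolves everywhere; adjust the path to match your JDK version)
-"sudo update-alternatives --install /usr/bin/java java /usr/local/java/jdk1.7.0_75/bin/java 1"
-"sudo update-alternatives --install /usr/bin/javac javac /usr/local/java/jdk1.7.0_75/bin/javac 1"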
5. Download Apache Hadoop from "http://hadoop.apache.org/releases.html#Download"
-In my case "hadoop-1.2.1-bin.tar.gz"
-Extract it with "tar -zxvf hadoop-1.2.1-bin.tar.gz"
-Make the directory: "sudo mkdir -p /usr/local/hadoop"
-Move the contents into it: "sudo mv hadoop-1.2.1/* /usr/local/hadoop"
(Move the contents, not the folder itself; otherwise everything lands under /usr/local/hadoop/hadoop-1.2.1 and the /usr/local/hadoop/conf paths used in the later steps will not exist)
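(The Hadoop daemons will run as your own user and need to write logs and PID files under this directory, so hand it over to your user; $USER expands to your login name)
-"sudo chown -R $USER /usr/local/hadoop"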
6. Hadoop 1.x does not support IPv6, so IPv6 should be disabled
-Run the command below to check
-"cat /proc/sys/net/ipv6/conf/all/disable_ipv6"
-0 means IPv6 is enabled and 1 means it is disabled
-If it prints 0, disable IPv6: "sudo nano /etc/sysctl.conf"
-Add the lines below, each set to 1
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
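(Apply the new settings without rebooting, then re-check; the cat command should now print 1)
-"sudo sysctl -p"
-"cat /proc/sys/net/ipv6/conf/all/disable_ipv6"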
7. Now set the environment variables
-"cd" to go to your home directory
-"nano ~/.bashrc" and add the lines below at the end of the file
(Note: adjust JAVA_HOME to match the Java version you installed)
JAVA_HOME=/usr/local/java/jdk1.7.0_75
PATH=$PATH:$JAVA_HOME/bin
HADOOP_HOME=/usr/local/hadoop
PATH=$PATH:$HADOOP_HOME/bin
export JAVA_HOME
export HADOOP_HOME
export PATH
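(Reload the file and verify that both tools are on the PATH; the versions printed should match what you installed)
-"source ~/.bashrc"
-"java -version"
-"hadoop version"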
8. Now set up Hadoop
-"sudo nano /usr/local/hadoop/conf/hadoop-env.sh"
-add JAVA_HOME (export JAVA_HOME=/usr/local/java/jdk1.7.0_75)
-"sudo nano /usr/local/hadoop/conf/core-site.xml"
-Add the entries below
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/tmp</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
</property>
</configuration>
-"sudo mkdir -p /usr/local/hadoop/tmp"
-"sudo nano /usr/local/hadoop/conf/mapred-site.xml"
-Add the entries below
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
</property>
</configuration>
-"sudo nano /usr/local/hadoop/conf/hdfs-site.xml"
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.block.size</name>
<value>67108864</value>
</property>
</configuration>
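(For reference: 67108864 bytes is 64 MB, i.e. 64 x 1024 x 1024, the default HDFS block size in Hadoop 1.x. dfs.replication is 1 because a single node can only hold one copy of each block.)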
9. Format the Hadoop file system
-"hadoop namenode -format"
10. Start the Hadoop processes
- "start-all.sh"
11. Check the running processes
-"jps"
12. If the DataNode is not up and running
-On a fresh install the usual cause is a namespaceID mismatch left behind by an earlier format; see the recovery steps below
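(A common recovery, assuming there is no data in HDFS worth keeping yet; this wipes HDFS, so only do it on a fresh install)
-"stop-all.sh"
-"rm -rf /usr/local/hadoop/tmp/*"
-"hadoop namenode -format"
-"start-all.sh"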