Setting up Hadoop made easy
This blog walks through installing Hadoop 1.2.1 on Linux, first in standalone mode and then in pseudo-distributed mode.
It takes at most 2 hours if you are lucky 🙂
Please follow the steps below:
Step-1:
1. Download a stable release ending with tar.gz (e.g. hadoop-1.2.1.tar.gz)
2. In Linux, create a new folder "/home/hadoop"
3. Move the downloaded file to "/home/hadoop" using WinSCP or FileZilla.
4. In PuTTY type: cd /home/hadoop
5. Type: tar xvf hadoop-1.2.1.tar.gz
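The sub-steps above can be sketched as one small helper script. This is only a sketch: the `install_hadoop_tarball` name is made up for illustration, and the archive is assumed to be already downloaded and copied onto the Linux box.

```shell
# Sketch of Step-1 as a reusable helper. Assumes the hadoop-1.2.1.tar.gz
# archive has already been downloaded and copied over (e.g. with WinSCP).
install_hadoop_tarball() {   # usage: install_hadoop_tarball <tarball> <dest-dir>
  tarball="$1"; dest="$2"
  mkdir -p "$dest"                                    # step 2: create the folder
  mv "$tarball" "$dest/"                              # step 3: move the archive
  ( cd "$dest" && tar xzf "$(basename "$tarball")" )  # steps 4-5: cd and extract
}

# install_hadoop_tarball hadoop-1.2.1.tar.gz /home/hadoop
```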
Step-2:
Downloading and setting up java:
1. Check if Java is present
Type: java -version
2. If java is not present, please install it by following the below steps
3. Make a directory where we can install Java (/usr/local/java)
4. Download 64-bit Linux Java JDK and JRE ending with tar.gz from the below link:
http://oracle.com/technetwork/java/javase/downloads/index.html
5. Copy the downloaded files to the created folder
6. Extract Java:
Type: cd /usr/local/java
Type: tar xvzf jdk*.tar.gz
Type: tar xvzf jre*.tar.gz
7. Add the path and home variables at the end of /etc/profile:
JAVA_HOME=/usr/local/java/jdk1.7.0_40
PATH=$PATH:$JAVA_HOME/bin
JRE_HOME=/usr/local/java/jre1.7.0_40
PATH=$PATH:$JRE_HOME/bin
HADOOP_INSTALL=/home/hadoop/hadoop-1.2.1
PATH=$PATH:$HADOOP_INSTALL/bin
export JAVA_HOME
export JRE_HOME
export HADOOP_INSTALL
export PATH
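The profile additions can also be appended in one shot with a heredoc. A sketch only: `append_java_profile` is a made-up helper name, and the version numbers assume the same 1.7.0_40 directories used throughout this post.

```shell
# Append the Java/Hadoop environment variables to a profile file in one go.
# The quoted 'EOF' keeps $PATH etc. unexpanded until the profile is sourced.
append_java_profile() {   # usage: append_java_profile <profile-file>
  cat >> "$1" <<'EOF'
JAVA_HOME=/usr/local/java/jdk1.7.0_40
JRE_HOME=/usr/local/java/jre1.7.0_40
HADOOP_INSTALL=/home/hadoop/hadoop-1.2.1
PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin:$HADOOP_INSTALL/bin
export JAVA_HOME JRE_HOME HADOOP_INSTALL PATH
EOF
}

# append_java_profile /etc/profile   # then: source /etc/profile
```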
8. Run the below commands so that Linux knows where Java is installed:
sudo update-alternatives --install "/usr/bin/java" "java" "/usr/local/java/jre1.7.0_40/bin/java" 1
sudo update-alternatives --install "/usr/bin/javac" "javac" "/usr/local/java/jdk1.7.0_40/bin/javac" 1
sudo update-alternatives --install "/usr/bin/javaws" "javaws" "/usr/local/java/jre1.7.0_40/bin/javaws" 1
sudo update-alternatives --set java /usr/local/java/jre1.7.0_40/bin/java
sudo update-alternatives --set javac /usr/local/java/jdk1.7.0_40/bin/javac
sudo update-alternatives --set javaws /usr/local/java/jre1.7.0_40/bin/javaws
9. Test Java by typing: java -version
10. Check if JAVA_HOME is set by typing: echo $JAVA_HOME
Now we are done with installing Hadoop in standalone mode. 🙂
Step-3:
We can check whether the installation was successful by running one of the bundled examples.
Go to the Hadoop installation directory.
Type: mkdir input
Type: cp conf/*.xml input
Type: bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
Type: cat output/*
If the job succeeds, the matches are displayed. (Note: do not create the output directory yourself; Hadoop creates it and fails if it already exists.)
Step-4:
As a next step, change the configuration in the below files:
1. In the Hadoop installation folder, change conf/core-site.xml to:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
2. Change conf/hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
3. Change conf/mapred-site.xml:
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
</configuration>
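Since each of the three files holds a single property, they can be generated with one small helper. A sketch: `write_conf` is a hypothetical function, and it assumes you run it from the Hadoop installation directory.

```shell
# Write a one-property Hadoop configuration file.
write_conf() {   # usage: write_conf <file> <property-name> <property-value>
  cat > "$1" <<EOF
<configuration>
  <property>
    <name>$2</name>
    <value>$3</value>
  </property>
</configuration>
EOF
}

# write_conf conf/core-site.xml   fs.default.name    hdfs://localhost:9000
# write_conf conf/hdfs-site.xml   dfs.replication    1
# write_conf conf/mapred-site.xml mapred.job.tracker localhost:9001
```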
4. Edit conf/hadoop-env.sh:
export JAVA_HOME=/usr/local/java/jdk1.7.0_40
Step-5:
1. Set up passwordless ssh by running the below commands:
Type: ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
Type: cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
2. Check that ssh no longer asks for a password
Type: ssh localhost (it should not prompt for a password)
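The two key-setup commands can be wrapped in a small function. A sketch: `setup_passwordless_ssh` is an illustrative name, and the key type is a parameter because newer OpenSSH releases reject DSA keys, so you may need rsa or ed25519 instead of the dsa used above.

```shell
# Generate a passphrase-less key and authorize it for the local machine.
setup_passwordless_ssh() {   # usage: setup_passwordless_ssh <home-dir> [key-type]
  home="$1"; type="${2:-dsa}"
  mkdir -p "$home/.ssh" && chmod 700 "$home/.ssh"
  ssh-keygen -q -t "$type" -P '' -f "$home/.ssh/id_$type"   # no passphrase
  cat "$home/.ssh/id_$type.pub" >> "$home/.ssh/authorized_keys"
  chmod 600 "$home/.ssh/authorized_keys"
}

# setup_passwordless_ssh "$HOME"   # then: ssh localhost
```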
3. Format the name node:
Type: bin/hadoop namenode -format
Step-6:
To start all the Hadoop services, from the Hadoop installation directory:
Type: bin/start-all.sh
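Once the services are up, `jps` (shipped with the JDK) should list the five Hadoop 1.x daemons. A sketch helper (the `missing_daemons` name is made up) that prints whichever expected daemons are absent from `jps` output:

```shell
# The five daemons a Hadoop 1.x pseudo-distributed cluster should run.
expected_daemons="NameNode DataNode SecondaryNameNode JobTracker TaskTracker"

missing_daemons() {   # usage: jps | missing_daemons
  jps_out=$(cat)                        # read the jps listing from stdin
  for d in $expected_daemons; do
    echo "$jps_out" | grep -qw "$d" || echo "$d"   # report absent daemons
  done
}

# jps | missing_daemons   # no output means all five daemons are running
```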
Now try the same example which we tried earlier; in pseudo-distributed mode the input files must first be copied into HDFS:
Type: bin/hadoop fs -put conf input
Type: bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
Type: bin/hadoop fs -cat output/*
It should print the matches as before.
To stop all the Hadoop services:
Type: bin/stop-all.sh