Learning Hadoop Video Series : Creating Single Node Hadoop 2.2.0 Cluster

Here I am going to write out the commands and show snapshots taken from my virtual machine. I am writing out the whole process instead of showing it in a video so that you can copy and paste the commands into your own VM as needed.

First, let's summarize the whole process of creating and testing a Hadoop cluster.

Here are the steps to create a single node Hadoop cluster on your laptop.

Creating the basic infrastructure for the deployment.

1. Download and install VMware Player.
2. Download and install CentOS as a VM.
3. Download and install Java 6.
4. Create a dedicated Hadoop group: hadoop.
5. Create a dedicated Hadoop user, hduser, and add it to the hadoop group.
6. Configure SSH to use public/private keys for authentication.
7. Download Hadoop and extract it to the target directory.

Setup of OS and Hadoop Software

8. Set up environment variables in /etc/profile.
9. Create Hadoop data directories for the namenode and datanode.
10. Configure the Hadoop cluster.
11. Format the namenode.
12. Start the HDFS and MapReduce (YARN) processes.
13. Verify the installation.

Run Java Application

14. Run a Java Hadoop application on the installed single node cluster.
15. Check the output at the web interface.

Now let me explain the whole process in detail.

1. Download and install VMware Player.

2. Download the CentOS ISO and install it on VMware Player.

For this, please see my previous post:

https://panchaleswar.wordpress.com/2013/12/22/learning-hadoop-video-series-1/

3. Download and install Java 6.

We need to install Java. Since Java 6 is fully supported by Hadoop, I am going to use it, but you can also try Java 7 if needed.

See my previous post on installing Java on CentOS:

https://panchaleswar.wordpress.com/2013/12/22/learning-hadoop-video-series-2-installing-java-on-centos-vm/

4. Create a dedicated Hadoop group: hadoop

# groupadd hadoop

5. Create a dedicated Hadoop user, hduser, and add it to the hadoop group.

If creating a new user, use:

# useradd hduser -g hadoop

If changing an existing user, use:

# usermod -g hadoop hduser
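
Optionally, and this is my own addition rather than part of the original steps, give hduser a password and confirm the group membership:

# passwd hduser
# id hduser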

6. Configure SSH to use public/private keys for authentication.

Generate the key pair as hduser, since the keys must belong to the user that will run the Hadoop daemons:

$ ssh-keygen -t rsa

This will create two files in your (hidden) ~/.ssh directory: id_rsa and id_rsa.pub.
The first, id_rsa, is your private key, and the other, id_rsa.pub, is your public key.

Now set permissions on your private key:

$ chmod 700 ~/.ssh
$ chmod 600 ~/.ssh/id_rsa

Append the public key (id_rsa.pub) to the authorized_keys list. On this single node setup the "server" is the same machine, so:

$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

and finally set file permissions on the server:

$ chmod 700 ~/.ssh
$ chmod 600 ~/.ssh/authorized_keys

The above permissions are required if StrictModes is set to yes in /etc/ssh/sshd_config (the default).

Ensure the correct SELinux contexts are set:

$ restorecon -Rv ~/.ssh
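
To confirm that key-based login works, try connecting to localhost as hduser; it should log you in without asking for a password (the very first connection may ask you to accept the host key):

$ ssh localhost
$ exit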

7. Download Hadoop and extract it to the target directory.

$ wget http://www.interior-dsgn.com/apache/hadoop/common/hadoop-2.2.0/hadoop-2.2.0.tar.gz

Switch to the root user and do the following:

$ mv /home/hduser/hadoop-2.2.0.tar.gz /usr/local

$ cd /usr/local

$ tar -xvzf hadoop-2.2.0.tar.gz
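
As a quick sanity check, list the extracted directory; it should contain the usual Hadoop layout with directories such as bin, etc, sbin and share:

$ ls /usr/local/hadoop-2.2.0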

8. Set up environment variables in /etc/profile.

The following environment variables have to be set. Set them in /etc/profile so that they are available to all users.

As root, open the file with # vi /etc/profile and add the following lines to it:

export JAVA_HOME=/usr/local/jdk1.6.0_45
export ANT_HOME=/usr/local/apache-ant-1.9.2
export HADOOP_HOME=/usr/local/hadoop-2.2.0
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME

export PATH=$PATH:$JAVA_HOME/bin:$ANT_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

Also give hduser ownership of the Hadoop installation directory:

$ sudo chown -R hduser:hadoop /usr/local/hadoop-2.2.0
$ sudo chmod -R 755 /usr/local/hadoop-2.2.0

Now log out of root and log back into the VM as hduser.
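
Alternatively, if you already have a shell open as hduser, you can simply reload the profile in place instead of logging out and back in:

$ source /etc/profile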

9. Create Hadoop data directories for the namenode and datanode.

Check that the environment variables have been set using the following:

$ echo $HADOOP_HOME

Now create two directories, one for the namenode and one for the datanode:

$ mkdir -p $HOME/myhadoop-data/hdfs/namenode
$ mkdir -p $HOME/myhadoop-data/hdfs/datanode

10. Configure the Hadoop cluster.

$ cd $HADOOP_HOME
$ vi etc/hadoop/yarn-site.xml

Add the following inside the <configuration> tag:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>

<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>

Now edit the core-site.xml file.

$ vi etc/hadoop/core-site.xml

Add the following contents inside the <configuration> tag:

<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>

Now edit the hdfs-site.xml file.

$ vi etc/hadoop/hdfs-site.xml

Add the following contents inside the <configuration> tag:

<property>
<name>dfs.replication</name>
<value>1</value>
</property>

<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hduser/myhadoop-data/hdfs/namenode</value>
</property>

<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hduser/myhadoop-data/hdfs/datanode</value>
</property>

Now edit the mapred-site.xml file. Hadoop 2.2.0 ships this file only as a template, so if etc/hadoop/mapred-site.xml does not exist yet, create it first with: cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml

$ vi etc/hadoop/mapred-site.xml

Add the following contents inside the <configuration> tag:

<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>

11. Format a new Hadoop distributed filesystem:

$HADOOP_HOME/bin/hdfs namenode -format <cluster_name>

For example, the command for formatting the namenode on this setup is:

$ /usr/local/hadoop-2.2.0/bin/hdfs namenode -format

12. To start the Hadoop cluster, you need to start both the HDFS and YARN daemons.

Start the HDFS with the following command, run on the designated NameNode:

$ hadoop-daemon.sh start namenode

Run the script to start DataNodes on all slaves:

$ hadoop-daemon.sh start datanode

Start YARN with the following commands, run on the designated ResourceManager node. For MapReduce this means starting the ResourceManager, the NodeManager and the Job History Server:

$ yarn-daemon.sh start resourcemanager

Run the script to start NodeManagers on all slaves:

$ yarn-daemon.sh start nodemanager

Start the MapReduce JobHistory Server with the following command, run on the designated server:

$ mr-jobhistory-daemon.sh start historyserver
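
Since these five commands are needed every time the VM is restarted, it can be handy to put them in a small script. Below is a minimal sketch; the file name start-hadoop.sh is my own choice, and it assumes the PATH settings from /etc/profile above are in effect:

#!/bin/bash
# start-hadoop.sh - start all daemons of the single node Hadoop 2.2.0 cluster
hadoop-daemon.sh start namenode
hadoop-daemon.sh start datanode
yarn-daemon.sh start resourcemanager
yarn-daemon.sh start nodemanager
mr-jobhistory-daemon.sh start historyserver

The matching stop script is the same with every start replaced by stop.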

13. Verify the installation.

Use the jps command to check which Java processes are running:

$ jps

This lists the running Java processes; all five Hadoop daemons (NameNode, DataNode, ResourceManager, NodeManager and JobHistoryServer) should appear.
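
For reference, the listing should look roughly like the following (the process IDs here are placeholders and will differ on your machine):

2287 NameNode
2385 DataNode
2601 ResourceManager
2699 NodeManager
2845 JobHistoryServer
2998 Jps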

14. Run a Java Hadoop application on the installed single node cluster.

$ mkdir input
$ cat > input/textfile

word count example using hadoop 2.2.0.
Here we count the number of words this file has.

(Type the two lines above as the file contents, then press Ctrl+D to finish.)

Add the input directory to HDFS:

$ bin/hdfs dfs -copyFromLocal input /input

Run the wordcount example jar provided in HADOOP_HOME:

$ hadoop jar /usr/local/hadoop-2.2.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /input /output
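
Once the job finishes, you can also read the result directly from the command line; with a single reducer the output file is normally named part-r-00000:

$ bin/hdfs dfs -ls /output
$ bin/hdfs dfs -cat /output/part-r-00000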

15. Check the output at the web interface.

http://localhost:50070
Browse the HDFS directory tree for the /output folder.
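
Besides the NameNode UI on port 50070, Hadoop 2.2.0 also serves, on its default ports, the ResourceManager UI at http://localhost:8088 (where you can watch the wordcount job run) and the JobHistory Server UI at http://localhost:19888 for completed jobs.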


3 thoughts on “Learning Hadoop Video Series : Creating Single Node Hadoop 2.2.0 Cluster”

  1. Bruce says:

    Thanks for taking the time to create these instructions. It seems the needed information is scattered all over for 2.2.0; it’s nice to see it all in one place.


  2. Bruce says:

    Ran into a problem – NodeManager wouldn’t start and I couldn’t find any errors. Turned out there’s a typo in this line of your instructions above: yarn.nodemanager.aux-services.mapreduce.shuffle.class
    Should be yarn.nodemanager.aux-services.mapreduce_shuffle.class (underscore between ‘mapreduce’ and ‘shuffle’, not a period). Thanks again!

