Last year I wrote a comparative study of the two big Hadoop distributions, Cloudera and Hortonworks, using their learning products: the QuickStart VM from Cloudera and the Sandbox from Hortonworks.
Now let’s do the same thing again. One year on, let’s see what has changed, and what the differences and similarities are between these two products, or Hadoop distros.
So let me act like a new user who is trying to learn Hadoop and Big Data through Cloudera or Hortonworks. I’ll analyze these two products from a new user’s perspective.
The pic of the Cloudera QuickStart VM, with Cloudera Manager started.
Well, the first thing that comes to my mind is creating a working Hadoop cluster. So let’s create one by adding more hosts to this QuickStart VM using Cloudera Manager.
I created another CentOS VM on my desktop.
Open the browser and access Cloudera Manager.
Add the newly created VM to the cluster by adding it as a host in Cloudera Manager.
Here is the output
and the details of the error
So maybe we are getting more advanced in terms of new features and functionality, but we are still lacking the basic functionality that a new user needs to learn Hadoop. I tried to install cloudera-manager-agent manually on the datanode host with the command
“sudo yum install cloudera-manager-agent” and it installed correctly. The download of the Cloudera Manager agent packages was quite slow, though. The download is around 416 MB, which should be pretty fast on a 50 MBps pipe. I doubt there are many simultaneous downloads of the agent, so server load should not explain the slowness. (The speed was really bad; I had to wait almost 30 minutes to download the packages.)
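To put rough numbers on that complaint, here is a minimal back-of-the-envelope sketch using only the figures quoted above. It assumes decimal megabytes and ignores protocol overhead, and since “50 MBps” could mean 50 megabits or 50 megabytes per second, it works out both readings. The function names are made up for illustration.

```python
def transfer_seconds(size_mb: float, link_mbit_per_s: float) -> float:
    """Ideal time to move size_mb megabytes over a link rated in megabits/s."""
    return size_mb * 8 / link_mbit_per_s


def effective_mbit_per_s(size_mb: float, seconds: float) -> float:
    """Average throughput actually achieved, in megabits/s."""
    return size_mb * 8 / seconds


PACKAGE_MB = 416            # approximate size of the agent download
OBSERVED_SECONDS = 30 * 60  # "almost 30 mins"

# Reading 1: the pipe is 50 megabits per second.
print(f"at 50 Mbit/s: {transfer_seconds(PACKAGE_MB, 50):.1f} s")
# Reading 2: the pipe is 50 megabytes per second (400 megabits per second).
print(f"at 50 MB/s:   {transfer_seconds(PACKAGE_MB, 50 * 8):.1f} s")
# What the 30-minute wait works out to.
print(f"observed:     {effective_mbit_per_s(PACKAGE_MB, OBSERVED_SECONDS):.2f} Mbit/s")
```

Under either reading the download should finish in about a minute or less, while the 30-minute wait works out to under 2 megabits per second, so the bottleneck is almost certainly not the local connection.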
Now let’s see whether I can add the host to the Hadoop cluster.
No, it is still not going through. Well, I won’t try any further. Any new user would rather spend time learning the cluster and Hadoop than waste it digging into what is going wrong when adding hosts with Cloudera Manager.
Now let’s go and try the same thing with Hortonworks.
The counterpart of Cloudera Manager is Ambari from Hortonworks, with which one can manage the Hadoop cluster.
So let’s check the Ambari functionality on the Hortonworks Sandbox. I have logged in to the Sandbox and enabled Ambari.
Now let’s log in to Ambari and try to add a host to the Sandbox.
I created a new VM, datanode2.localdomain, and tried to add it to the Sandbox as a new host in the cluster. Well, before I can add a host, it asks me to provide an SSH private key.
My comment:
Why should I create an SSH private key and provide it here? I have a VM ready and I need to add it to the cluster. Simple: let the tool create the key. This is a simple environment for learning, so why do I need to do the basic admin work of creating an SSH private key?
Well, let me create an SSH private key and see if I can succeed.
It immediately came back as failed to add. It seems it didn’t connect to the VM at all.
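When a host fails this quickly, one cheap thing to rule out is whether the new VM is reachable over SSH at all from the machine running Ambari. The following is a minimal sketch of such a check, not anything Ambari itself does. It assumes the SSH daemon on the new VM listens on the default port 22, and the function names are hypothetical.

```python
import socket


def resolves(host: str) -> bool:
    """Return True if the host name can be resolved to an address."""
    try:
        socket.getaddrinfo(host, None)
        return True
    except socket.gaierror:
        return False


def port_reachable(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False


def diagnose(host: str, port: int = 22) -> str:
    """Summarize the first connectivity problem found, if any."""
    if not resolves(host):
        return f"{host}: name does not resolve (check /etc/hosts or DNS)"
    if not port_reachable(host, port):
        return f"{host}: port {port} not reachable (sshd down or firewall?)"
    return f"{host}: port {port} is reachable"
```

Running `print(diagnose("datanode2.localdomain"))` on the Sandbox would at least separate a name-resolution problem from a blocked or closed SSH port. It says nothing about whether the key itself is accepted.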
Let me try without SSH, by manually installing the agent on the host. But is there any way to find out how to install the Hortonworks agent? I have to search on Google because there is no info about that on the page 😦.
Well, I found the following info on the Hortonworks site:
“Chapter 5. Appendix: Installing Ambari Agents Manually
In some situations you may decide you do not want to have the Ambari Install Wizard install and configure the Agent software on your cluster hosts automatically. In this case you can install the software manually.
Before you begin: on every host in your cluster download the HDP repository as described in Set Up the Bits.”
My comments:
This tells me I have to install the HDP repository first. Come on, if I have to install everything manually, why should I use your tool? I can do everything manually by learning Hadoop.
Should I install the HDP repository now? I don’t think I am in the mood to do that now, so I won’t proceed further on this.
So we have learned that both learning VMs lack the basic functionality of creating a Hadoop cluster. Maybe they are good for a one-VM show. This is not good for people who want to learn the admin side of a Hadoop cluster. Sorry guys, you are better off creating your own cluster from scratch by following the documents provided by Apache, or maybe those provided individually by Cloudera and Hortonworks.
These VMs are no good for learning Hadoop administration. Better to create your own Hadoop cluster manually. To create a cluster, please follow my earlier blogs.
Tomorrow I’ll look at the development aspect of these products (VMs).
Let’s see how mature they are and how helpful they are for a Hadoop developer.