Cloudera’s QuickStart VM vs Hortonworks Sandbox Part – I

Big Data is a term that is making waves around the world, and people say it is going to change the way business is done today.

Cloud Computing is changing the way computing is done, and Big Data is going to change the way business is done.

One Big Data technology, Hadoop, is now well accepted for managing and processing this so-called Big Data. I am not going to discuss Big Data or Hadoop in depth here; instead I am going to compare some learning tools for Big Data, Hadoop, and related technologies.

As far as I know, there are two startup companies (there may be more) doing a lot of development on Hadoop technology: one is Cloudera and the other is Hortonworks. These two companies develop many Hadoop-related software tools to make Hadoop easier to use and to develop applications on, writing much of the Hadoop code and donating it to the Hadoop open source project.

These companies provide tools that can be downloaded and used for Hadoop educational purposes.

From Cloudera it is the “Cloudera QuickStart VM”, and from Hortonworks it is the “Hortonworks Sandbox”. These tools are virtual machines in which Hadoop is installed and configured along with the tools these companies provide and support; they can be downloaded and run on any of your preferred hypervisors.

Though I am not new to these tools or to Hadoop, I will pretend here that I am new and write up my experience for others to follow or comment on, because this post is intended for learning Hadoop from the basics.

What I am going to do is set up a running environment for these VMs, then download and run them on my laptop to get a quick view of these tools and learn Hadoop and Big Data.

My Setup Details.

I have Windows 7 running on an Intel i5 processor with 8 GB of RAM on my laptop.

I downloaded VMware Player, which is free for non-commercial use, and installed it on my laptop. I downloaded the Cloudera QuickStart VM 4.4.0-1 and the Hortonworks Sandbox 2.0 and extracted the files to two different folders. Then I double-clicked the file with the .vmx extension in each folder to start the VMs in VMware Player.


Here are my observations of these fast Hadoop-learning VMs from two major companies that develop and donate code to the Hadoop open source community.

Cloudera’s QuickStart VM vs Hortonworks Sandbox

I am not going to compare who is good and who is bad; rather, this is a “who offers what” comparison from a new learner’s perspective.

OS: Both are based on CentOS.

Cloudera: Full Desktop GUI with Eclipse Pre-Installed


As soon as I started the Cloudera VM, Firefox opened with a page telling me what is available to use as a Hadoop user and what is available to use as an administrator.

With just one click I can open Cloudera Manager or the Hadoop user front end, the Hue GUI.


The browser also has bookmarks for the UIs of various roles and services such as MapReduce JobTracker, HDFS NameNode, HBase Master, Cloudera Manager, Hue, and Solr.
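For reference, these bookmarked UIs usually sit on well-known default ports. The ports below are assumed Hadoop 1.x / CDH4-era defaults and may differ in a given VM image; a small sketch of how you would build the URLs by hand:

```python
# Typical default web UI ports for the bookmarked roles and services.
# These are assumed Hadoop 1.x / CDH4-era defaults; verify against your VM.
DEFAULT_UI_PORTS = {
    "MapReduce JobTracker": 50030,
    "HDFS NameNode": 50070,
    "HBase Master": 60010,
    "Cloudera Manager": 7180,
    "Hue": 8888,
}

def ui_url(host, service):
    """Build the browser URL for a service UI on the given host."""
    return "http://%s:%d/" % (host, DEFAULT_UI_PORTS[service])

print(ui_url("192.168.1.50", "Hue"))  # http://192.168.1.50:8888/
```

Here 192.168.1.50 is a placeholder for whatever IP address your VM reports.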

My Comment:

As a Java developer, I have seen that many developers around the world use a Windows-based desktop or laptop as their development environment, so they are used to a GUI rather than a text-mode command-line interface. Linux and Unix developers may prefer the command line.

I clicked the Use Hadoop link, but no login username and password are given. I clicked back to return to the page and then clicked Administer Hadoop; again, no username and password are given. I don’t know how to log in now; I have to search the documentation or Google to find the username and password.

My Comment:

Well, this is a learning VM, and I wish I could log in right away with a given username and password. I searched Google and found the username and password: cloudera and cloudera.

After Login as Cloudera

The first page says “Potential misconfiguration detected. Fix and restart Hue.”

My Comment:

I don’t yet know what configuration this VM needs; I am a Hadoop developer, and I don’t know about Hadoop VM configuration. So I will just ignore it and try to move forward. I feel this misconfiguration could have been fixed by Cloudera before releasing the VM.

Hortonworks: Text Mode Only, i.e. you can log in only at the text console.


All other user and admin work has to be done from another computer, or from my base Windows 7 OS, by pointing a browser at the IP address of the VM. After opening the browser and pointing it at the VM, I found the following page.


When I clicked the Go To Sandbox link I found the following page.


It is quite obvious why this page appears. The link points to localhost, which in this case is not the VM but the machine where you opened the browser, so it will not open. Instead, you should point the browser to the IP address of the VM.

I changed localhost in the URL to the VM’s IP, which is where it should point, and then I found the following page.
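The fix above is just a host swap in the URL. A minimal sketch of that rewrite using only the Python standard library (the IP address here is a placeholder for whatever your VM reports):

```python
from urllib.parse import urlsplit, urlunsplit

def repoint(url, vm_ip):
    """Swap the host part of a URL (e.g. localhost) for the VM's IP,
    keeping the scheme, port, path, and query intact."""
    parts = urlsplit(url)
    netloc = vm_ip if parts.port is None else "%s:%d" % (vm_ip, parts.port)
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))

# 192.168.1.50 stands in for the Sandbox VM's actual IP address.
print(repoint("http://localhost:8888/about/", "192.168.1.50"))
# http://192.168.1.50:8888/about/
```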

Go To Sandbox Page, After Changing the IP

Well, here I found the username and password to log in with: hue and 1111.

My Comment:

I am good to go and can log in right away as a Hadoop user. And I found a good number of Hadoop examples from which to learn Hadoop concepts.

After Login User Page

I clicked back to go to the homepage and then I clicked on the Start tutorial button.

Tutorial Page

Let’s look at the contents of the home pages and of the pages after logging in as a user.

As a Hadoop User or Developer

Cloudera :

It is a Hue application front end with the following extra Cloudera-developed applications.


Cloudera Impala: Real-Time Queries in Apache Hadoop

With Impala, you can query data, whether stored in HDFS or Apache HBase – including SELECT, JOIN, and aggregate functions – in real time.
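To make that concrete, here is a sketch of the kind of join-plus-aggregate query Impala runs interactively. The table and column names are hypothetical examples, not tables shipped with the VM; `impala-shell` with `-i` (impalad host) and `-q` (query text) is the standard way to run a query non-interactively:

```python
# A sketch of the kind of SQL Impala executes in real time.
# The table and column names below are hypothetical examples.
query = (
    "SELECT c.region, COUNT(*) AS orders, SUM(o.amount) AS total "
    "FROM orders o JOIN customers c ON o.customer_id = c.id "
    "GROUP BY c.region"
)

# One way to run it non-interactively is the impala-shell CLI;
# -i names the impalad host, -q passes the query text.
cmd = ["impala-shell", "-i", "192.168.1.50", "-q", query]
print(" ".join(cmd[:3]))
```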

When I tried to access Impala from outside the Cloudera VM I got an error page, because it connects to localhost only; so I have to go to the VM and log in to the Hue page there to reach the Impala page.

Impala From Outside the VM

Well, I allocated 4 GB of RAM to the VM, but when I clicked on the Impala Query UI it just hung. It still hadn’t come up after 5 minutes of waiting; the VM became unresponsive for a while, and it is still not up after 10 minutes.

So I can’t show you the Impala GUI now. Maybe later, when I get it working.

Sqoop: Sqoop (“SQL-to-Hadoop”) is a straightforward command-line tool with the following capabilities:

1. Imports individual tables or entire databases to files in HDFS

2. Generates Java classes to allow you to interact with your imported data

3. Provides the ability to import from SQL databases straight into your Hive data warehouse

After setting up an import job in Sqoop, you can start working with SQL database-backed data from your Hadoop MapReduce cluster in minutes.
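As a sketch of what such an import job looks like, here is a typical `sqoop import` invocation covering capabilities 1 and 3 above, assembled as an argument list. The JDBC URL, credentials, and table name are hypothetical examples; the flags (`--connect`, `--username`, `--table`, `--hive-import`) are standard Sqoop options:

```python
# A sketch of a typical Sqoop import covering capabilities 1 and 3 above.
# The JDBC URL, credentials, and table name are hypothetical examples.
sqoop_import = [
    "sqoop", "import",
    "--connect", "jdbc:mysql://dbhost/sales",  # source database
    "--username", "demo",
    "--table", "orders",                       # one table (capability 1)
    "--hive-import",                           # straight into Hive (capability 3)
]
print(" ".join(sqoop_import))
```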

Solr Search: Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene™ project. Its major features include powerful full-text search, hit highlighting, faceted search, near real-time indexing, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search. Solr is highly reliable, scalable and fault tolerant, providing distributed indexing, replication and load-balanced querying, automated failover and recovery, centralized configuration and more. Solr powers the search and navigation features of many of the world’s largest internet sites.

Solr Search

HBase Browser: The Web UI for HBase

Hue provides a tool to visualize HBase data: the HBase Browser. Given HBase’s structure, it can be difficult to explore, understand, and search data. Hue’s smart view enables intelligent browsing of the data in HBase. It also makes adding, removing, and mutating cells easier. Users can limit the number of rows retrieved and choose which column families to show.


Hortonworks:

It is a Hue application front end with one extra application, Apache HCatalog.


Apache™ HCatalog is a table and storage management layer for Hadoop that enables users with different data processing tools – Apache Pig, Apache MapReduce, and Apache Hive – to more easily read and write data on the grid.

HCatalog’s table abstraction presents users with a relational view of data in the Hadoop Distributed File System (HDFS) and ensures that users need not worry about where or in what format their data is stored. HCatalog displays data from RCFile format, text files, or sequence files in a tabular view.
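As a sketch of that table abstraction: one Hive DDL statement defines a table, and Pig (or MapReduce) can then read it through HCatalog without knowing where or how the data is stored. The table and column names below are hypothetical examples; `org.apache.hcatalog.pig.HCatLoader` is the HCatalog Pig loader class used in releases of this era:

```python
# A sketch of the table abstraction: one Hive DDL statement defines a
# table that Pig and MapReduce can then read through HCatalog.
# The table and column names are hypothetical examples.
ddl = (
    "CREATE TABLE page_views ("
    "user_id STRING, url STRING, ts BIGINT"
    ") STORED AS RCFILE;"
)

# From Pig, the same table is loaded via HCatalog's loader class:
pig_load = "views = LOAD 'page_views' USING org.apache.hcatalog.pig.HCatLoader();"
print(pig_load)
```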


There are a good number of tutorials available for learning Hadoop, and within the next hour I was able to get a good feel for the Hadoop development environment.

As a Hadoop Administrator :

Cloudera:

I can click and open the Administrator page from the open Firefox page; it then pops up a username and password prompt, where I log in with cloudera and cloudera.


After logging in to the Admin page with username cloudera and password cloudera, here is the page.



Hortonworks:

After running the Sandbox, I had no idea how to administer this Hadoop VM from a web browser, so I had to do all admin work from the console of the VM using the command line.

I am going to write more tomorrow on the usability of these tools as a normal user and as an administrator.


6 thoughts on “Cloudera’s QuickStart VM vs Hortonworks Sandbox Part – I”

  1. Michael Aube says:

    Very nice initial work here!

    If you are looking for the Ambari cluster monitoring and management tool inside the Hortonworks Sandbox, you first need to enable it. Take a look at the Sandbox home page on the lower right . There you will find a link to “Enable Ambari.” Click that link, and you will then be able to see the administrator’s view of the cluster. I think you will like what you see!

    Also, I would humbly suggest that you revisit both companies periodically for updates to their VM learning environments. The Hortonworks Sandbox has a collection of syndicated tutorials for learning different facets of using Hadoop, and you can download tutorial updates and new tutorials with the click of a button from within the Sandbox itself. These tutorials cover a wide range of Hadoop topics, and are written by a variety of companies like Splunk, Revelytix, Concurrent, Tableau, Datameer, Talend, SAP, Syncsort and Actuate.


    • panchaleswar says:

Hi, thanks for your comments on my page. Well, I don’t understand why you are telling me about the Ambari cluster monitoring and management tool here. Of course I know it is there; I am not new to your Sandbox. If you read the post properly, I tried to pretend that I am a new user and commented on that basis. Where do your pages say that you have an administrator page? It just says “Enable Ambari”. Why should I know from the beginning that Ambari is an administrator tool? You are asking the user to know things before even installing the Sandbox; that is my point. I am talking about simplicity and usability. I know that with Ambari you can do some monitoring.


    • Rajeev says:

Tharun, install both so you can compare and contrast. They each take 4 GB, so they have a relatively light footprint and are easy to install. Impala is only available in Cloudera. As a newbie you will also appreciate the tutorials from both of these companies. I believe Cloudera also has a free course about Hadoop on Udacity.

