In this post, I will show you how to set up a Hadoop cluster on Google Cloud Platform.

Register on Google Cloud Platform

First, register on Google Cloud. It’s easy: sign in with your Gmail ID and fill in your credit card details. Once registered, you get a free trial with 300 USD of credit, valid for one year.

How to create Virtual Machines

  • Create a new project. Give your project a name or keep the one suggested by Google.
  • Now click the menu icon in the top-left corner of the home page. A list of the products and services that Google Cloud provides will appear. Click Compute Engine and then VM instances. On the VM instances page, select Create Instance.

Create four machines as below.

  • Name – CM, DN1, DN2 and RM.
  • Zone – Select nearest Zone.
  • Machine Type – 3 vCPU x 12 GB (CM), 1 vCPU x 3.5 GB (DN1 and DN2) and 2 vCPU x 7 GB (RM).
  • Boot disk – Click Change. I am familiar with Red Hat Enterprise Linux 7, so I chose that. Leave the boot disk type as it is and increase the disk size to 150 GB for all machines.
  • Identity and API access – Leave as it is.
  • Firewall – Allow HTTP traffic.
  • Click Create.
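
The same settings could also be expressed on the command line. The following is only a sketch under assumptions, not what the console does internally: `print_create_cmd` is a hypothetical helper that prints (and does not run) one `gcloud compute instances create` command per node so it can be reviewed first. It assumes `rhel-7`/`rhel-cloud` is the image family and project for Red Hat Enterprise Linux 7, and that the `http-server` tag corresponds to the "Allow HTTP traffic" checkbox, which only has an effect if a matching firewall rule exists in the project. The zone and machine type are left as arguments because they depend on your region and quota.

```shell
# Print (not run) a gcloud command that mirrors the console settings above.
# Arguments: instance name (GCP requires lowercase names), zone, machine type.
print_create_cmd() {
  name="$1"; zone="$2"; machine_type="$3"
  printf 'gcloud compute instances create %s' "$name"
  printf ' --zone=%s --machine-type=%s' "$zone" "$machine_type"
  printf ' --image-family=rhel-7 --image-project=rhel-cloud'   # assumed RHEL 7 image
  printf ' --boot-disk-size=150GB'                             # disk size from the list above
  printf ' --tags=http-server\n'                               # assumed match for "Allow HTTP traffic"
}
```

For example, `print_create_cmd dn1 "$ZONE" "$MACHINE_TYPE"` prints the command for the first data node; run the printed line only after checking it against your project.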

Once created, the four instances will appear in the VM instances list.

Configure the Machines:

First, click SSH next to an instance (SSH is a network protocol that lets you access a remote computer securely). A terminal will open. Now perform the following steps on every machine:

Log in as the root user. Use the command: sudo su

Disable firewall –

Stop the currently running firewall and disable it so that it does not start again at boot.
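
On Red Hat Enterprise Linux 7 the firewall service is normally firewalld, managed through systemctl. A minimal sketch of this step, assuming firewalld is the firewall in use on your image; `disable_firewall` and the `SYSTEMCTL` override are hypothetical names added only so the function can be dry-run:

```shell
# Stop firewalld now and keep it from starting at boot (run as root).
# SYSTEMCTL is a hypothetical override for dry runs, e.g. SYSTEMCTL=echo.
disable_firewall() {
  "${SYSTEMCTL:-systemctl}" stop firewalld &&
    "${SYSTEMCTL:-systemctl}" disable firewalld
}
```

Call `disable_firewall` as root on each node. With `SYSTEMCTL=echo` set, it only prints the two systemctl sub-commands instead of running them.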

Disable SELinux –

In /etc/selinux/config, write “disabled” in place of “enforcing” on the SELINUX line.
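
The edit can also be scripted instead of done by hand. A minimal sketch, assuming the standard RHEL 7 location /etc/selinux/config; `set_selinux_disabled` is a hypothetical helper name, and it accepts another path so you can try it on a copy first:

```shell
# Switch SELINUX=enforcing to SELINUX=disabled in an SELinux config file.
# Defaults to /etc/selinux/config, the standard location on RHEL 7.
set_selinux_disabled() {
  cfg="${1:-/etc/selinux/config}"
  tmp="$(mktemp)" || return 1
  # Only the SELINUX= line is touched; SELINUXTYPE= is left alone.
  sed 's/^SELINUX=enforcing[[:space:]]*$/SELINUX=disabled/' "$cfg" > "$tmp" &&
    cat "$tmp" > "$cfg"      # write back in place to keep owner and mode
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

The new setting is read at boot, so it takes effect after the restart at the end of this section.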

Setup SSHD –

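Update the “sshd_config” file (/etc/ssh/sshd_config); the cluster installation later logs in to each host over SSH with the password you set.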
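
This post does not list the exact settings, so the following is a sketch under assumptions rather than the original configuration. It assumes the goal is to allow root to log in over SSH with a password, which usually means `PermitRootLogin yes` and `PasswordAuthentication yes`. `set_sshd_option` is a hypothetical helper that replaces any existing line for a key, commented out or not, and appends the new value.

```shell
# Set "key value" in an sshd_config-style file, replacing any existing
# (possibly commented-out) line for that key.
set_sshd_option() {
  file="$1"; key="$2"; value="$3"
  [ -f "$file" ] || return 1
  tmp="$(mktemp)" || return 1
  # Keep every line except earlier settings of this key, then append ours.
  grep -v -E "^[[:space:]]*#?[[:space:]]*${key}[[:space:]]" "$file" > "$tmp"
  printf '%s %s\n' "$key" "$value" >> "$tmp"
  cat "$tmp" > "$file"       # write back in place to keep owner and mode
  rm -f "$tmp"
}

# Assumed settings, not taken from the original post (run as root):
#   set_sshd_option /etc/ssh/sshd_config PermitRootLogin yes
#   set_sshd_option /etc/ssh/sshd_config PasswordAuthentication yes
#   systemctl restart sshd
```

Because the new line is appended at the end, check by hand if your file ends with a `Match` block, since options placed after it would only apply inside that block.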

Password change –

Set the password for all machines.

Once done, restart the node by running init 6.

Cloudera Manager Install:

Download the cloudera-manager-installer.bin file on the CM node, make it executable and run the installer.

The installation is simple and straightforward. Accept the licence for both Cloudera and the Oracle JDK.
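
The permission change and launch can be sketched as below. `run_installer` is a hypothetical helper name. The download itself is left out because the link depends on the Cloudera Manager version you pick, so the function assumes the file is already on the CM node.

```shell
# Make a downloaded installer executable and launch it (run as root).
run_installer() {
  installer="${1:-./cloudera-manager-installer.bin}"
  # A bare file name would be looked up in PATH, so anchor it to the current directory.
  case "$installer" in
    */*) ;;
    *) installer="./$installer" ;;
  esac
  [ -f "$installer" ] || { echo "not found: $installer" >&2; return 1; }
  chmod u+x "$installer" &&   # the "change the permissions" step
    "$installer"              # start the installer
}
```

Running `run_installer` in the directory that holds the downloaded file is enough; it returns a non-zero status if the file is missing.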

Click Next and install JDK and CDH.

Once the installation is complete, open the Cloudera Manager URL, which uses the external IP address of the machine where you ran the installer (CM in my case) and port 7180.

http://35.197.189.176:7180/cmf/login
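
The login URL has the form host, colon, port, with no slash between the IP address and the port. A small sketch that builds it for any node; `cm_login_url` is a hypothetical helper, and 7180 is simply the port used in this post:

```shell
# Build the Cloudera Manager login URL for a given host.
# The port defaults to 7180, the one used in this post.
cm_login_url() {
  host="$1"; port="${2:-7180}"
  printf 'http://%s:%s/cmf/login\n' "$host" "$port"
}
```

For example, `cm_login_url 35.197.189.176` prints the address to open in the browser.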

Log in with admin/admin. Accept the licence and select Cloudera Enterprise Data Hub.

The next screen shows all the available services. Press Continue.

On the next page, specify the IP address or hostname of every instance and then click Search.

Once the search has finished, the page lists all the hosts that were found.

On the next page, select the repository.

  • Choose Method -> select Use Parcels
  • Select the version of CDH -> select CDH 5.14

Click Next and check the box to install the JDK.

The next page offers to enable Single User Mode. There is no need to enable it; just click Continue.

Provide the SSH login credentials. Enter the password you set while configuring the servers and hit Continue.

The wizard now installs the required packages, which takes some time.

The last step is validation, after which the cluster installation part is complete.

Setup Cluster:

You are now on the Cluster Setup page, where you select which services to install. At the bottom, you will find Custom Services, which lets you choose exactly the services you want on your cluster. I selected HDFS and YARN and clicked Continue.

On the role assignment page, assign roles to the different nodes and use View By Host to check the final distribution. I assigned them as below:

  • CM – NameNode, SecondaryNameNode and Cloudera Manager related services.
  • RM – ResourceManager, JobHistoryServer
  • DN1/DN2 – DataNode1/2 and NodeManager1/2

Set up the database for the Reports Manager. Use the embedded database here, but in production use a custom database such as MySQL or MS SQL. Click Test Connection and then Continue.

Keep the default settings for block size and data directories and press Continue.

The wizard now starts all the services on your cluster, which also takes some time. This concludes the cluster installation part.

You will see the Cloudera Manager home page.

The yellow signs shown next to the services are warnings. They can be ignored for now but should be analyzed and fixed before you take the cluster into production.

If you like this article, please share it further.

Mandy
