In this post, I will show you how to set up a Hadoop cluster on Google Cloud Platform.
Register on Google Cloud Platform
First of all, you have to register on Google Cloud. It’s easy: sign in with your Gmail ID and fill in your credit card details. Once registered, you get a 1-year free trial with 300 USD of credit on Google Cloud.
How to create Virtual Machines
- Create a new project. Give it a name or keep the one Google suggests.
- Now click the menu icon in the top left corner of the homepage. A list of the products and services that Google Cloud provides will appear. Click Compute Engine, then VM instances. On the VM instances page, select Create Instance.
Create four machines as below.
- Name – CM, DN1, DN2 and RM.
- Zone – Select nearest Zone.
- Machine Type – 3 vCPU x 12 GB (CM), 1 vCPU x 3.5 GB (DN1 and DN2) and 2 vCPU x 7 GB (RM)
- Boot disk – Click on Change. I am familiar with Red Hat Enterprise Linux 7, so I chose that. Leave the boot disk type as it is and increase the disk size to 150 GB for all machines.
- Identity and API access – Leave as it is.
- Firewall – Allow HTTP traffic.
- Click on create.
Once created, the four instances will appear in the VM instances list.
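The console steps above could also be scripted with the gcloud CLI. The sketch below is a rough equivalent rather than the exact steps used in this post: the helper name `vm_create_cmd`, the zone, the machine type, and the `rhel-7`/`rhel-cloud` image names are all assumptions to adapt, and the machine type shown does not reproduce the sizes listed above. Instance names are written in lowercase because GCP requires that. The helper only prints the commands so they can be reviewed before running.

```shell
# Hypothetical helper: print (not run) a gcloud command for one VM.
# Arguments: instance name, zone, machine type (all chosen by you).
vm_create_cmd() {
    name=$1 zone=$2 machine_type=$3
    printf 'gcloud compute instances create %s' "$name"
    printf ' --zone=%s --machine-type=%s' "$zone" "$machine_type"
    # Assumed image names for Red Hat Enterprise Linux 7; check what your project offers.
    printf ' --image-family=rhel-7 --image-project=rhel-cloud'
    # 150 GB boot disk; the http-server tag corresponds to "Allow HTTP traffic".
    printf ' --boot-disk-size=150GB --tags=http-server\n'
}

# Example: print one command per machine; zone and machine type are placeholders.
for vm in cm dn1 dn2 rm; do
    vm_create_cmd "$vm" "us-central1-a" "n1-standard-2"
done
```

Copy the printed lines into a shell where gcloud is authenticated once the zone and machine types match what you want.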
Configure the Machines:
First of all, click on SSH (SSH is a network protocol that allows you to access a remote computer in a secure way). A terminal will open. Now do the following steps:
Switch to the root user. Use the command: sudo su
Disable the firewall and stop the one that is currently running:
chkconfig iptables off
service iptables stop
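On Red Hat Enterprise Linux 7 the default firewall service is usually firewalld, managed through systemctl, so the iptables service commands above may report that the service does not exist. A minimal sketch that prints the matching commands for either case follows; the helper name `firewall_off_cmds` and the idea of keying on the major release number are assumptions for illustration.

```shell
# Hypothetical helper: print the commands that stop and disable the firewall
# for a given RHEL major release (7 by default). Nothing is executed here.
firewall_off_cmds() {
    major=${1:-7}
    if [ "$major" -ge 7 ]; then
        # RHEL 7 and later: firewalld under systemd.
        echo "systemctl stop firewalld"
        echo "systemctl disable firewalld"
    else
        # RHEL 6 and earlier: the iptables init script.
        echo "service iptables stop"
        echo "chkconfig iptables off"
    fi
}

firewall_off_cmds 7
```

Review the output, then run it as root, for example with `firewall_off_cmds 7 | sh`.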
Disable SELinux –
In /etc/selinux/config, write “disabled” in place of “enforcing” on the SELINUX= line.
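The same edit can be made without opening an editor. This is a minimal sketch, assuming the standard /etc/selinux/config layout with a line that reads SELINUX=enforcing; the helper name `disable_selinux` is made up for illustration, and it takes the file path as an argument so it can be tried on a copy first.

```shell
# Hypothetical helper: switch SELINUX=enforcing to SELINUX=disabled.
# Defaults to the real config file; pass another path to test safely.
disable_selinux() {
    conf=${1:-/etc/selinux/config}
    # Keep a backup, then rewrite the file from that backup.
    cp "$conf" "$conf.bak" || return 1
    sed 's/^SELINUX=enforcing/SELINUX=disabled/' "$conf.bak" > "$conf"
}

# Try it on a scratch file rather than the real one.
scratch=$(mktemp)
printf 'SELINUX=enforcing\nSELINUXTYPE=targeted\n' > "$scratch"
disable_selinux "$scratch"
grep '^SELINUX=' "$scratch"   # prints SELINUX=disabled
rm -f "$scratch" "$scratch.bak"
```

On the real machines, run `disable_selinux` as root with no argument; the new setting takes effect after the reboot.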
Setup SSHD –
Update the “sshd_config” file as below.
[root@cm ~]# vi /etc/ssh/sshd_config
# Enable root login: change PermitRootLogin from no to yes.
# Similarly change the following two parameters.
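For repeatable setups, the PermitRootLogin change can be scripted instead of edited by hand. The sketch below is one possible approach, not part of the original procedure: `set_sshd_option` is a hypothetical helper that rewrites an existing line for a given key, commented out or not, and refuses to act if the key is not present at all.

```shell
# Hypothetical helper: set "key value" in an sshd_config-style file,
# replacing every existing (possibly commented-out) line for that key.
set_sshd_option() {
    file=$1 key=$2 value=$3
    # Fail rather than guess where to add the key if no line mentions it.
    grep -q "^#\{0,1\}[[:space:]]*$key[[:space:]]" "$file" || return 1
    cp "$file" "$file.bak" || return 1
    sed "s/^#\{0,1\}[[:space:]]*$key[[:space:]].*/$key $value/" "$file.bak" > "$file"
}

# Try it on a scratch file rather than the real one.
scratch=$(mktemp)
printf 'Port 22\nPermitRootLogin no\n' > "$scratch"
set_sshd_option "$scratch" PermitRootLogin yes
grep PermitRootLogin "$scratch"   # prints PermitRootLogin yes
rm -f "$scratch" "$scratch.bak"
```

On the real file, `set_sshd_option /etc/ssh/sshd_config PermitRootLogin yes` followed by `sshd -t` checks the syntax before the restart picks up the change.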
Password change –
Set the password for all machines.
[root@cm ~]# passwd
Changing password for user root.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
Once done, restart the node by running init 6.
Download the cloudera-manager-installer.bin file on the CM node, make it executable, and run the installer.
[root@cm ~]# wget http://archive.cloudera.com/cm5/installer/5.12.0/cloudera-manager-installer.bin
[root@cm ~]# chmod u+x cloudera-manager-installer.bin
[root@cm ~]# sudo ./cloudera-manager-installer.bin
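If you expect to repeat this step, the download and permission change can be wrapped so that a second run does not fetch the file again. This is only a sketch: `prepare_installer` is a hypothetical helper name, it assumes wget is installed, and it assumes the URL above is still reachable from your machine.

```shell
# Hypothetical helper: fetch the installer only if it is not already present,
# mark it executable, and print its file name.
prepare_installer() {
    url=$1
    file=${url##*/}   # the file name is the last path component of the URL
    if [ ! -f "$file" ]; then
        # Remove a partial download so the next run starts clean.
        wget "$url" || { rm -f "$file"; return 1; }
    fi
    chmod u+x "$file" && printf '%s\n' "$file"
}

# Usage (as root on the CM node):
#   installer=$(prepare_installer http://archive.cloudera.com/cm5/installer/5.12.0/cloudera-manager-installer.bin)
#   ./"$installer"
```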
The installation is simple and straightforward. Accept the license for both Cloudera and the Oracle JDK.
Click Next and install JDK and CDH.
When the installation is complete, open the URL with the external IP address of the machine where you ran the installer, which is CM in my case.
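Cloudera Manager's web UI listens on port 7180 by default, so the address is the external IP followed by that port. The small sketch below builds it; the helper name `cm_url` is hypothetical and the example address comes from the 203.0.113.0/24 documentation range, so substitute your own. Because the instances were created with only HTTP traffic allowed, you will probably also need a GCP firewall rule that opens TCP 7180 to your own IP.

```shell
# Hypothetical helper: build the Cloudera Manager URL from an external IP.
# 7180 is Cloudera Manager's default HTTP port; pass another port if you changed it.
cm_url() {
    ip=$1
    port=${2:-7180}
    printf 'http://%s:%s\n' "$ip" "$port"
}

cm_url 203.0.113.10   # prints http://203.0.113.10:7180
```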
Log in to the page with admin/admin. Accept the license and choose Cloudera Enterprise Data Hub.
The next screen shows all the available services. Press Continue.
On the next page, specify the IP address or hostname of each of your instances, then click Search.
After the search completes, the page lists all the hosts that were found.
On the next page, select the repository.
- Choose Method -> select Use Parcels
- Select the version of CDH -> select CDH 5.14
Click Next and tick the box to install the JDK.
The next page is about enabling Single User Mode. There is no need to enable it; just click Continue.
Provide the SSH login credentials. Enter the password you set while configuring the machines and hit Continue.
It will install the required packages, which takes some time.
The last step is validation, after which the cluster installation part is complete.
You are now on the Cluster Setup page, where you select the services to install. At the bottom, you will find Custom Services, which lets you choose exactly the services you want on your cluster. I selected HDFS and YARN and clicked Continue.
On the role assignment page, assign roles to the different nodes; View By Host shows the final distribution. I assigned them as below:
- CM – NameNode, SecondaryNameNode and ClouderaManager related services.
- RM – ResourceManager, JobHistoryServer
- DN1/DN2 – DataNode1/2 and NodeManager1/2
Set up the database for Reports Manager. I used the embedded database, but in production use a custom database such as MySQL or PostgreSQL. Click Test Connection and then Continue.
Keep the default settings for block size and data directories and press Continue.
Cloudera Manager will now start all the services on your cluster, which also takes some time. This concludes the cluster installation part.
You will see the Cloudera Manager home page.
The yellow signs shown near the services are warnings. They can be ignored for now but should be analyzed and fixed if you are going to bring the cluster into production.
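Besides the Cloudera Manager home page, a quick command-line check can confirm that both DataNodes registered with the NameNode. The sketch below assumes that `hdfs dfsadmin -report` prints a line of the form `Live datanodes (N):`, as Hadoop 2.x releases do; the helper name `live_datanodes` is made up for illustration.

```shell
# Hypothetical helper: read an "hdfs dfsadmin -report" dump on stdin and
# print the number from its "Live datanodes (N):" line.
live_datanodes() {
    sed -n 's/^Live datanodes (\([0-9][0-9]*\)):.*/\1/p'
}

# Usage on a cluster node (run as the hdfs user):
#   sudo -u hdfs hdfs dfsadmin -report | live_datanodes
# Offline demonstration with a canned line:
printf 'Live datanodes (2):\n' | live_datanodes   # prints 2
```

With the role assignment used here, the expected value is 2, one for each of DN1 and DN2.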
If you like this article, please share it further.