I have written a couple of blogs on setting up Hadoop as a single-node and a multi-node cluster. Deploying, configuring and running a Hadoop cluster manually takes a lot of time and effort. Here's a helping hand to create a fully distributed Hadoop cluster with Cloudera Manager. In this blog, we'll see how quickly and easily you can install a Hadoop cluster with Cloudera Manager.
Software used:
- CDH5
- Cloudera Manager – 5.7
- OS – RHEL 7
- VirtualBox – 5.2
Prepare Servers:
For a minimal non-production cluster, we need three servers:
- CM – Cloudera Manager + other Hadoop services (minimum 8 GB)
- DN1/DN2 – DataNodes
Perform the following steps on one machine, the Cloudera Manager node (CM). It will be cloned later to create DN1 and DN2.
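Disable SELinux:
```
# In /etc/selinux/config, set SELINUX=disabled (takes effect after a reboot)
[root@CM ~]# vi /etc/selinux/config
# To stop enforcing immediately, without a reboot:
[root@CM ~]# setenforce 0
```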
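If you prefer to script this edit instead of using vi, a minimal sketch could look like the function below. It assumes the file has an uncommented `SELINUX=` line, as the stock RHEL 7 file does. The function name `disable_selinux_config` is hypothetical. You can pass a different path to try it on a copy first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set SELINUX=disabled in an SELinux config file.
# Defaults to /etc/selinux/config; pass another path to test on a copy.
disable_selinux_config() {
  local cfg=${1:-/etc/selinux/config}
  # Rewrite only the active SELINUX= line; SELINUXTYPE= and comments are untouched.
  sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$cfg"
  # Print the resulting line so the change can be checked.
  grep '^SELINUX=' "$cfg"
}
```

Run it as root with `disable_selinux_config`, then reboot when convenient.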
Set up NTP:
```
[root@CM ~]# yum install ntp -y
[root@CM ~]# chkconfig ntpd on
Note: Forwarding request to 'systemctl enable ntpd.service'.
Created symlink from /etc/systemd/system/multi-user.target.wants/ntpd.service to /usr/lib/systemd/system/ntpd.service.
[root@CM ~]# service ntpd start
Redirecting to /bin/systemctl start ntpd.service
[root@CM ~]# hwclock --systohc
```
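After it starts, ntpd usually needs a few minutes to select a time source. In `ntpq -p` output, a `*` in the first column marks the peer currently used for synchronization. The small sketch below checks for that marker. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ntpq -p` output on stdin and succeed only if a
# peer line starts with '*', i.e. ntpd has selected a system peer.
ntp_has_sync_peer() {
  # awk consumes all of stdin and exits 0 only if the marker was seen.
  awk 'substr($0, 1, 1) == "*" { found = 1 } END { exit !found }'
}
```

Usage: `ntpq -p | ntp_has_sync_peer && echo "ntpd is synchronized"`.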
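Disable firewall:
```
[root@CM ~]# yum install iptables-services
[root@CM ~]# service iptables stop
Redirecting to /bin/systemctl stop iptables.service
[root@CM ~]# chkconfig iptables off
Note: Forwarding request to 'systemctl disable iptables.service'.
# RHEL 7 uses firewalld by default; stop and disable it as well
[root@CM ~]# systemctl stop firewalld
[root@CM ~]# systemctl disable firewalld
```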
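Distribute Authentication Key Pairs:
```
[root@CM ~]# ssh-keygen
[root@CM ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub 192.168.1.80
```
The key is copied to CM itself. DN1 and DN2 are cloned from this machine later, so they inherit the same key pair and authorized_keys file, and password-less SSH works between all nodes.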
Define host names:
Edit the /etc/hosts file on the Cloudera Manager node (CM) and list the IP address of each machine, followed by its hostname. Each machine needs a static IP address, and all VMs must be able to ping each other.
```
[root@CM ~]# cat /etc/hosts
192.168.1.80 CM
192.168.1.81 DN1
192.168.1.82 DN2
```
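If you want to add these entries with a script that you can safely rerun, the sketch below appends each "IP hostname" pair only if the hostname is not already listed. The function name is hypothetical, and the check ignores commented lines.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append "IP HOSTNAME" lines to a hosts file, skipping
# hostnames that already appear on a non-comment line.
# Usage: add_host_entries FILE "IP NAME" ["IP NAME" ...]
add_host_entries() {
  local file=$1
  shift
  local entry ip name
  for entry in "$@"; do
    read -r ip name <<< "$entry"
    # Look for the hostname as a whole field after the IP column.
    if awk -v n="$name" '/^[[:space:]]*#/ { next }
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }' "$file"; then
      echo "skip: $name already present"
    else
      printf '%s %s\n' "$ip" "$name" >> "$file"
    fi
  done
}
```

Usage, as root: `add_host_entries /etc/hosts "192.168.1.80 CM" "192.168.1.81 DN1" "192.168.1.82 DN2"`.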
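Now clone the CM machine to create DN1 and DN2, and update the IP address and hostname on each clone. Then test password-less SSH by displaying each node's hostname:
```
# "hosts" here is a plain file in the current directory listing one hostname per line (CM, DN1, DN2)
[root@CM ~]# for i in $(cat hosts); do ssh "$i" "hostname -f"; done
CM
DN1
DN2
```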
Install Cloudera Manager and Agents:
The installation can be divided into the following steps:
- Install the MySQL database
- Set up Java
- Install and run the Cloudera Manager server and agents
Install MySQL Database:
```
# If MariaDB is installed, remove it first.
[root@CM ~]# yum remove mariadb mariadb-server
[root@CM ~]# rpm -ivh "https://dev.mysql.com/get/mysql57-community-release-el7-11.noarch.rpm"
[root@CM ~]# yum install mysql mysql-server
[root@CM ~]# systemctl start mysqld
# Start MySQL automatically at boot; Cloudera Manager depends on it.
[root@CM ~]# systemctl enable mysqld
# MySQL 5.7 writes a temporary root password to its log.
[root@CM ~]# grep password /var/log/mysqld.log
[root@CM ~]# /usr/bin/mysql_secure_installation

Securing the MySQL server deployment.

Enter password for user root:

The existing password for the user account root has expired. Please set a new password.

New password:

Re-enter new password:
The 'validate_password' plugin is installed on the server.
The subsequent steps will run with the existing configuration
of the plugin.
Using existing password for root.

Estimated strength of the password: 100
Change the password for root ? ((Press y|Y for Yes, any other key for No) : y

New password:

Re-enter new password:

Estimated strength of the password: 100
Do you wish to continue with the password provided?(Press y|Y for Yes, any other key for No) : y
By default, a MySQL installation has an anonymous user,
allowing anyone to log into MySQL without having to have
a user account created for them. This is intended only for
testing, and to make the installation go a bit smoother.
You should remove them before moving into a production
environment.

Remove anonymous users? (Press y|Y for Yes, any other key for No) : y
Success.

Normally, root should only be allowed to connect from
'localhost'. This ensures that someone cannot guess at
the root password from the network.

Disallow root login remotely? (Press y|Y for Yes, any other key for No) : y
Success.

By default, MySQL comes with a database named 'test' that
anyone can access. This is also intended only for testing,
and should be removed before moving into a production
environment.

Remove test database and access to it? (Press y|Y for Yes, any other key for No) : y
 - Dropping test database...
Success.

 - Removing privileges on test database...
Success.

Reloading the privilege tables will ensure that all changes
made so far will take effect immediately.

Reload privilege tables now? (Press y|Y for Yes, any other key for No) : y
Success.

All done!
```
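Later, the setup wizard's database step asks for access details for services such as the Activity Monitor, Reports Manager, Hive Metastore, Hue and Oozie, depending on which services you select. Those databases must exist beforehand. The sketch below generates the SQL. It assumes one database and one user with the same name per service. The database names, the `'%'` host wildcard and the password are hypothetical placeholders. The password must satisfy the `validate_password` policy shown above and must not contain a single quote.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print MySQL 5.7 statements that create one database
# and one same-named user per service. Names and password are placeholders;
# the password must not contain a single quote.
# Usage: gen_service_db_sql PASSWORD DB [DB ...]
gen_service_db_sql() {
  local password=$1
  shift
  local db
  for db in "$@"; do
    printf "CREATE DATABASE IF NOT EXISTS %s DEFAULT CHARACTER SET utf8;\n" "$db"
    # '%%' prints a literal '%' (any client host).
    printf "CREATE USER IF NOT EXISTS '%s'@'%%' IDENTIFIED BY '%s';\n" "$db" "$password"
    printf "GRANT ALL ON %s.* TO '%s'@'%%';\n" "$db" "$db"
  done
}
```

For example, `gen_service_db_sql 'Your#Passw0rd' amon rman metastore hue oozie | mysql -u root -p` feeds the statements to MySQL. You would then enter these names and credentials in the wizard.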
Java Setup:
Hadoop is written in Java, so we need to set up Java. Install the Oracle Java Development Kit (JDK) on all nodes as shown below.
```
[root@CM ~]# mkdir /usr/java/
[root@CM ~]# mv /root/software/jdk-7u75-linux-x64.tar.gz /usr/java/
[root@CM ~]# cd /usr/java/
[root@CM java]# tar -xvf jdk-7u75-linux-x64.tar.gz
[root@CM java]# ln -s jdk1.7.0_75 default
[root@CM java]# vim /etc/profile.d/java.sh
# Contents of /etc/profile.d/java.sh:
export JAVA_HOME=/usr/java/default
export PATH=$PATH:$JAVA_HOME/bin
# Make the file executable.
[root@CM java]# chmod +x /etc/profile.d/java.sh
```
Log out, log back in, and check the Java version:
```
[root@CM ~]# java -version
java version "1.7.0_75"
Java(TM) SE Runtime Environment (build 1.7.0_75-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.75-b04, mixed mode)
```
Copy the JDK to the other nodes, DN1 and DN2, as well.
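One way to copy the JDK from CM is sketched below. It relies on the setup from the steps above: password-less root SSH to DN1/DN2 and the JDK unpacked at /usr/java/jdk1.7.0_75. The function name and the `DRY_RUN` switch are hypothetical. The symlink is recreated on each node rather than copied, because `scp -r` would follow it and duplicate the JDK.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy the JDK directory and profile script from CM to
# other nodes over password-less SSH, then recreate the "default" symlink.
# Set DRY_RUN=1 to print the commands instead of running them.
# Usage: copy_jdk_to_nodes JDK_DIR HOST [HOST ...]
copy_jdk_to_nodes() {
  local jdk_dir=$1
  shift
  local run=()
  if [ "${DRY_RUN:-0}" = "1" ]; then
    run=(echo)
  fi
  local host
  for host in "$@"; do
    "${run[@]}" ssh "root@$host" "mkdir -p /usr/java"
    # -p keeps file modes and times, -r copies the directory tree.
    "${run[@]}" scp -rp "$jdk_dir" "root@$host:/usr/java/"
    "${run[@]}" ssh "root@$host" "ln -sfn /usr/java/$(basename "$jdk_dir") /usr/java/default"
    "${run[@]}" scp -p /etc/profile.d/java.sh "root@$host:/etc/profile.d/"
  done
}
```

Usage: run `DRY_RUN=1 copy_jdk_to_nodes /usr/java/jdk1.7.0_75 DN1 DN2` to review the commands, then run it again without `DRY_RUN`.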
Get the MySQL JDBC connector:
```
[root@CM ~]# wget "https://dev.mysql.com/get/downloads/connector-j/mysql-connector-java-5.1.43.tar.gz"
[root@CM ~]# tar -xvf mysql-connector-java-5.1.43.tar.gz
[root@CM ~]# mkdir -p /usr/share/java
[root@CM ~]# cd /root/mysql-connector-java-5.1.43
[root@CM mysql-connector-java-5.1.43]# cp mysql-connector-java-5.1.43-bin.jar /usr/share/java/mysql-connector-java.jar
```
Install Cloudera Manager and Agents:
Now deploy the Cloudera Manager server and agents.
Set up the Cloudera repository:
```
[root@CM ~]# wget "http://archive.cloudera.com/cm5/redhat/7/x86_64/cm/cloudera-manager.repo"
[root@CM ~]# mv cloudera-manager.repo /etc/yum.repos.d/
```
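The downloaded repo file may point to the latest Cloudera Manager 5 release rather than 5.7. If you want to stay on 5.7, as listed under "Software used", one option is to edit its `baseurl`. The sketch below assumes two things you should check before relying on it: the `baseurl` ends in `/cm/5/`, and the archive has a matching `/cm/5.7/` directory. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pin the Cloudera Manager yum repo to one release by
# replacing a trailing "/cm/5/" in baseurl (assumed layout) with "/cm/VERSION/".
# Usage: pin_cm_repo_version REPO_FILE VERSION
pin_cm_repo_version() {
  local repo=$1 version=$2
  sed -i "s#^\(baseurl *=.*/cm/\)5/#\1${version}/#" "$repo"
  # Print the resulting baseurl so the change can be checked.
  grep '^baseurl' "$repo"
}
```

For example, run `pin_cm_repo_version /etc/yum.repos.d/cloudera-manager.repo 5.7`, followed by `yum clean all` so yum picks up the new URL.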
Now install the Cloudera Manager packages on the CM node:
```
[root@CM ~]# yum install cloudera-manager-server cloudera-manager-agent cloudera-manager-daemons -y
```
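On the other two nodes, install only the agent and daemons packages. DN1 and DN2 were cloned before the repository was added, so copy the repo file to them first:
```
[root@CM ~]# scp /etc/yum.repos.d/cloudera-manager.repo DN1:/etc/yum.repos.d/
[root@CM ~]# scp /etc/yum.repos.d/cloudera-manager.repo DN2:/etc/yum.repos.d/
[root@DN1 ~]# yum install cloudera-manager-agent cloudera-manager-daemons -y
[root@DN2 ~]# yum install cloudera-manager-agent cloudera-manager-daemons -y
```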
Prepare Cloudera Manager Database:
```
# Creates the "scm" database and user, connecting as the MySQL root user
[root@CM ~]# /usr/share/cmf/schema/scm_prepare_database.sh mysql -h localhost -uroot -p --scm-host localhost scm scm
```
Start the Cloudera Manager server:
```
[root@CM ~]# systemctl start cloudera-scm-server
```
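The Cloudera Manager server can take a while to start listening on its web port, 7180, which is the port in the login URL used later. The minimal polling sketch below assumes bash with `/dev/tcp` redirection support. The function name and the default retry values are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a TCP port until it accepts connections.
# Uses bash's /dev/tcp redirection, so it needs bash (not sh). Against an
# unreachable remote host, each attempt may block until the TCP connect timeout.
# Usage: wait_for_port HOST PORT [TRIES] [DELAY_SECONDS]
wait_for_port() {
  local host=$1 port=$2 tries=${3:-60} delay=${4:-5}
  local i
  for ((i = 1; i <= tries; i++)); do
    # Opening fd 3 in a subshell succeeds only if the connection is accepted.
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      echo "$host:$port is up"
      return 0
    fi
    if ((i < tries)); then
      sleep "$delay"
    fi
  done
  echo "$host:$port not reachable after $tries attempts" >&2
  return 1
}
```

Usage: run `wait_for_port CM 7180` before opening the web UI.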
Before starting the agents, set the Cloudera Manager server host in the agent configuration file on every node:
```
# Update this file on CM, DN1 and DN2
[root@CM ~]# vi /etc/cloudera-scm-agent/config.ini
# Hostname of the CM server.
server_host=CM

[root@CM ~]# systemctl start cloudera-scm-agent
[root@DN1 ~]# systemctl start cloudera-scm-agent
[root@DN2 ~]# systemctl start cloudera-scm-agent
```
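Editing config.ini by hand on three machines is repetitive. The minimal sketch below sets `server_host` with `sed`. It assumes the file contains an uncommented `server_host=` line, as the stock file does (it ships with `server_host=localhost`). The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set server_host in a Cloudera Manager agent config.ini.
# Defaults to /etc/cloudera-scm-agent/config.ini.
# Usage: set_agent_server_host HOST [CONFIG_FILE]
set_agent_server_host() {
  local host=$1
  local cfg=${2:-/etc/cloudera-scm-agent/config.ini}
  # Replace the value of the active server_host line only.
  sed -i "s/^server_host=.*/server_host=${host}/" "$cfg"
  # Print the resulting line so the change can be checked.
  grep '^server_host=' "$cfg"
}
```

On each node, run `set_agent_server_host CM` and then `systemctl start cloudera-scm-agent`.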
Hadoop Cluster Setup via Cloudera Manager:
Go to http://192.168.1.80:7180/cmf/ and the login page will appear. Log in with admin/admin as the user name and password.
Read and accept the license agreement, then choose “Cloudera Enterprise Data Hub Edition Trial” on the next page. After that, you'll be offered the option to set up a new cluster.
Because the agents are already installed and running, the hosts list already shows the cluster nodes. Select all hosts.
Press “Continue”, select the CDH version, and choose the parcel installation method.
Press “Continue” and wait for the parcels to be distributed and activated.
Wait for the Cluster Inspector to finish its inspection. You'll then see all installed components.
Install Hadoop cluster:
Next, choose how the roles are distributed across the cluster. Accept the default options. You can see a summary via “Host view detail”.
The next part is the database setup. Provide the database access details (database host, name, user name and password).
Accept the defaults and continue.
Wait for Cloudera Manager to set up the cluster roles.
Once the cluster is installed, you can see it in Cloudera Manager. From there you can monitor the cluster state, add and remove services, change configurations, identify problems in the cluster and so on. The yellow signs next to the services are warnings. You can ignore them for now, but analyze and fix them before you bring the cluster into production.
Summary
Cloudera Manager makes creating and maintaining Hadoop clusters significantly easier than managing them manually. Following these instructions, you can create a Hadoop cluster in less than an hour, whereas manual configuration and deployment could take several hours or even days.