Set up Hadoop Cluster – Multi-Node

In my previous post, we learnt how to set up a single-node Hadoop installation. Now I will show how to set up a multi-node Hadoop cluster. A multi-node cluster contains two or more DataNodes in a distributed Hadoop environment. This is the kind of setup organizations use in practice to store and analyse petabytes and exabytes of data.

In this post, we use three machines to set up the multi-node cluster: MN, DN1 and DN2.

  • The master node (MN) will run the NameNode and ResourceManager daemons.
  • DN1 and DN2 will be our data nodes, which store the actual data and provide the processing power to run the jobs. Both hosts will run the DataNode and NodeManager daemons.

Software Required:

  • RHEL 7 – Set up MN and DN1/DN2 with the RHEL 7 operating system (Minimal Install).
  • Hadoop-2.7.3
  • Java 7
  • SSH

Configure the System

First of all, we have to edit the hosts file in the /etc/ folder on the master node (MN) and specify the IP address of each system followed by its host name.
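As a rough sketch of this step, the entries could be added as shown below. The IP addresses are hypothetical placeholders (the post does not give them), and the helper names hosts_entries and add_hosts_entries are made up for illustration. The function only appends a host name that is not already present, so it can be run more than once.

```shell
# Hypothetical addresses; substitute the real IPs of MN, DN1 and DN2.
hosts_entries() {
  cat <<'EOF'
192.168.56.101 MN
192.168.56.102 DN1
192.168.56.103 DN2
EOF
}

# Append each entry to the target file unless the host name is already listed.
# The target defaults to /etc/hosts; pass another path to try it out safely.
add_hosts_entries() {
  target="${1:-/etc/hosts}"
  hosts_entries | while read -r ip name; do
    grep -qwF "$name" "$target" || echo "$ip $name" >> "$target"
  done
}

# Usage (as root on MN): add_hosts_entries
```

Passing a scratch file as the first argument lets you check the result before touching the real /etc/hosts.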

Disable the firewall restrictions.
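The excerpt does not show the commands used here. A minimal sketch, assuming the machines run firewalld (the default firewall service on RHEL 7), is given below. The run helper and the DRY_RUN variable are hypothetical additions that let you print the commands before executing them.

```shell
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Stop firewalld now and keep it from starting at boot.
# Assumes firewalld is the active firewall; run as root on every node.
disable_firewall() {
  run systemctl stop firewalld
  run systemctl disable firewalld
}

# Usage: DRY_RUN=1 disable_firewall   (preview)
#        disable_firewall             (apply)
```

Turning the firewall off entirely is acceptable for a test cluster on a private network; on shared networks, opening only the ports Hadoop needs is the safer choice.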

Install Apache Hadoop – Single Node RHEL 7

Hadoop is a Java-based programming framework that supports the processing and storage of extremely large datasets on a cluster of inexpensive machines. It was the first major open source project in the big data playing field and provides high-throughput access to application data.

The main goal of this tutorial is to get a simple Hadoop installation up and running so that you can play around with the software and learn more about it.

Environment: This post has been tested with the following software versions.

  • RHEL (Red Hat Enterprise Linux 7.4) on VirtualBox 5.2
  • Hadoop 2.7.3
  • Update the /etc/hosts file with the hostname and IP address.

[root@cdhs ~]# cat /etc/hosts
<IP address> cdhs

Dedicated Hadoop system user:

After the VM is set up, add a non-sudo user dedicated to Hadoop, which will be used to configure Hadoop. The following command will add the user hduser and the group hadoop to the VM.
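The command itself is cut off in this excerpt. A minimal sketch, assuming the standard groupadd, useradd and passwd tools that ship with RHEL 7, could look like this. The run helper and DRY_RUN variable are hypothetical and only there so the commands can be previewed first.

```shell
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Create the hadoop group, then hduser with hadoop as its primary group,
# and finally set a password for the new user (passwd prompts interactively).
create_hadoop_user() {
  run groupadd hadoop
  run useradd -g hadoop hduser
  run passwd hduser
}

# Usage (as root): DRY_RUN=1 create_hadoop_user   (preview)
#                  create_hadoop_user             (apply)
```

Because hduser is not added to the wheel group, it has no sudo rights, which matches the non-sudo requirement above.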