In this blog, I am going to talk about how to configure and manage a High Availability (HA) HDFS cluster on CDH 5.12.0. In earlier releases, the NameNode was a single point of failure (SPOF) in an HDFS cluster. Each cluster had a single NameNode, and if that machine or process became unavailable, the cluster as a whole would be unavailable until the NameNode was either restarted or brought up on a separate machine. The Secondary NameNode did not provide failover capability.
The HA architecture solves this NameNode availability problem by allowing us to run two NameNodes in an active/passive configuration. So, in a High Availability cluster, two NameNodes run at the same time:
- Active NameNode
- Standby/Passive NameNode
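To make this concrete, here is a minimal hdfs-site.xml sketch of what a two-NameNode nameservice looks like. The nameservice name (mycluster), hostnames, and ports below are placeholder assumptions for illustration, not values taken from this cluster:

```xml
<!-- hdfs-site.xml: one logical nameservice backed by two NameNodes (assumed names/hosts) -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>nn1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>nn2.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.http-address.mycluster.nn1</name>
  <value>nn1.example.com:50070</value>
</property>
<property>
  <name>dfs.namenode.http-address.mycluster.nn2</name>
  <value>nn2.example.com:50070</value>
</property>
```

Clients then address the cluster through the logical name (hdfs://mycluster) rather than a single NameNode host, which is what makes failover transparent to them.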
We can implement the Active and Standby NameNode configuration in the following two ways:
- Using Quorum Journal Nodes
- Shared Storage using NFS
Using the Quorum Journal Manager (QJM) is the preferred method for achieving high availability for HDFS. Read here to know more about the QJM and NFS methods. In this blog, I'll implement the HA configuration with quorum-based storage; here are the machines and their corresponding names/roles.
- NameNode machines – NN1 and NN2, with equivalent hardware and specifications.
- JournalNode machines – The JournalNode daemon is relatively lightweight, so these daemons can reasonably be collocated on machines with other Hadoop daemons, for example the NameNodes, the JobTracker, or the YARN ResourceManager. There must be at least three JournalNode daemons, since edit log modifications must be written to a majority of JournalNodes. So the three JournalNodes run on NN1, NN2, and the MGT server (see the shared edits configuration sketched after this list).
- Note that when running with N JournalNodes, the system can tolerate at most (N – 1) / 2 failures and continue to function normally; with three JournalNodes, for example, it can tolerate one failure.
- The ZKFailoverController (ZKFC) is a ZooKeeper client that monitors and manages the state of its NameNode. Each NameNode also runs a ZKFC, which periodically checks the health of that NameNode (the automatic-failover configuration is sketched after this list).
- ResourceManager running on the same NameNode machines, NN1 and NN2.
- Two DataNodes – DN1 and DN2
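To show how the quorum journal is wired up for the layout above, here is a minimal hdfs-site.xml sketch for the shared edits directory and the JournalNode storage. The hostnames (nn1, nn2, mgt), the local edits path, and the nameservice name mycluster are assumptions based on the roles listed, not values from a real cluster:

```xml
<!-- hdfs-site.xml: quorum journal spread across the three JournalNode hosts (assumed hostnames) -->
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://nn1.example.com:8485;nn2.example.com:8485;mgt.example.com:8485/mycluster</value>
</property>
<!-- Local directory where each JournalNode stores its copy of the edit log -->
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/data/hadoop/journal</value>
</property>
<!-- Proxy provider that lets HDFS clients find the currently active NameNode -->
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
```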
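For automatic failover with ZKFC, the sketch below assumes a three-node ZooKeeper ensemble running on the same three machines (NN1, NN2, MGT) and uses sshfence as the fencing method; both are common choices, not requirements of this setup:

```xml
<!-- hdfs-site.xml: enable automatic failover driven by the ZKFC on each NameNode -->
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
<!-- Fence the old active NameNode over SSH before promoting the standby -->
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/hdfs/.ssh/id_rsa</value>
</property>

<!-- core-site.xml: ZooKeeper quorum used by the ZKFCs (assumed hosts) -->
<property>
  <name>ha.zookeeper.quorum</name>
  <value>nn1.example.com:2181,nn2.example.com:2181,mgt.example.com:2181</value>
</property>
```

Once this configuration is in place, the failover state in ZooKeeper is typically initialized once from one of the NameNodes with `hdfs zkfc -formatZK`, and the standby NameNode is seeded from the active one with `hdfs namenode -bootstrapStandby`.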