Commissioning a node means adding a new node to the cluster running your Hadoop framework; decommissioning means removing a node from the cluster. This is a very useful feature for handling node failures during the operation of a Hadoop cluster without stopping the entire cluster.
You can’t decommission a DataNode (or a host running a DataNode) if the number of DataNodes equals the replication factor. If you attempt to decommission a DataNode in that situation, the decommission process will never complete; you have to abort it and lower the replication factor first.
In my case, I have two DataNodes, and decommissioning one will leave only one. Before starting the decommission, change the replication factor to 1.
The same can be done via the command line.
hdfs dfs -setrep -R -w 1 /
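To confirm the change took effect, you can check the replication factor of an individual file with `hdfs dfs -stat`; the path below is only an example, substitute one of your own files.

```shell
# Print the replication factor of an example file (path is illustrative)
hdfs dfs -stat "Replication: %r" /user/demo/sample.txt
```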
Now restart the stale services.
Now you can decommission the DataNode.
- Go to Hosts.
- Select the host or hosts that you want to decommission.
- Click Actions -> select “Hosts Decommission/suppress Alert”.
- The host decommission is now in progress and will take some time.
- Once the host is decommissioned, the “Commission State” of the host changes to “Decommissioned”.
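For reference, on a cluster not managed by Cloudera Manager the same decommission is driven by the excludes file referenced by `dfs.hosts.exclude` in hdfs-site.xml. A minimal sketch, assuming an excludes file at /etc/hadoop/conf/dfs.exclude and a hostname of worker2.example.com (both illustrative):

```shell
# Add the host to the HDFS excludes file (path comes from dfs.hosts.exclude)
echo "worker2.example.com" >> /etc/hadoop/conf/dfs.exclude

# Tell the NameNode to re-read the include/exclude lists
hdfs dfsadmin -refreshNodes

# Watch the node move to "Decommission in progress" and finally "Decommissioned"
hdfs dfsadmin -report
```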
Recommissioning is applicable only to hosts that were decommissioned using Cloudera Manager.
- Go to CM -> Hosts -> select the decommissioned hosts -> Actions for Selected -> Hosts Recommission -> confirm.
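For completeness, on a manually managed cluster the reverse operation is removing the host from the excludes file and refreshing the NameNode; the file path and hostname below are the same illustrative assumptions as above.

```shell
# Remove the host from the excludes file (illustrative path and hostname)
sed -i '/worker2.example.com/d' /etc/hadoop/conf/dfs.exclude

# Re-read the lists; the DataNode rejoins once its daemon is running
hdfs dfsadmin -refreshNodes
```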
Remove host from cluster:
Once decommissioned, the host can be removed from the cluster. Remove all roles from the host, leaving only the management role.
Remove host from Cloudera Manager:
- Go to Hosts -> select the host to delete.
- Stop the agent on the host first:
service cloudera-scm-agent stop
- Now click Remove Host from Cloudera Manager.
Add New Host to Cluster:
A new host can be added through the Add Host wizard: select the cluster -> Add Host wizard and follow the steps.
- Search for the host by IP address or hostname and select it.
- Select the type of CDH software installation; choose the one matching the existing setup.
- This step distributes and activates the CDH software.
- Create a new host template, select the roles, and apply it to this host.
- Wait for the deployment step to complete and verify that the roles have started on the new host.
- The addition is now complete.
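As a point of comparison, on a cluster without Cloudera Manager, adding a DataNode amounts to installing Hadoop on the host and starting the daemon yourself (Hadoop 3.x syntax shown; Hadoop 2.x uses hadoop-daemon.sh instead):

```shell
# Start the DataNode daemon on the new host (Hadoop 3.x)
hdfs --daemon start datanode

# Confirm the new node has registered with the NameNode
hdfs dfsadmin -report
```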
Additional role instances can now be added to the new host.
Rebalance the cluster:
In HDFS, file blocks are distributed among the DataNodes according to the replication factor. When you add a new DataNode, it starts receiving and storing blocks only of newly written files. While this sounds fine, from an administrative point of view the cluster is no longer balanced. HDFS provides a balancer utility that analyzes block placement and rebalances data across the DataNodes. You can run it from Cloudera Manager after adding a new DataNode.
- Go to the HDFS service.
- Ensure the service has a Balancer role.
- Select Actions > Rebalance.
- Click Rebalance to confirm.
You can also do the same from the command line.
# Run the balancer with a 5% threshold (the default is 10%)
hdfs balancer -threshold 5
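The balancer can be slow on busy clusters because each DataNode caps the bandwidth it will spend on balancing traffic. The cap can be raised at runtime; the value is in bytes per second, and 100 MB/s below is just an example figure.

```shell
# Allow each DataNode to use up to ~100 MB/s for balancer traffic
hdfs dfsadmin -setBalancerBandwidth 104857600
```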