Apache Hive is a data warehouse software project built on top of Apache Hadoop that provides data summarization, query, and analysis. Hive offers a SQL-like interface for querying data stored in the various databases and file systems that integrate with Hadoop.

Configuring high availability (HA) for Hive requires each of the following components to be fault tolerant:

  • Hive Metastore database – RDBMS (MySQL)
  • ZooKeeper
  • Hive Metastore Server
  • HiveServer2

Set up the MySQL database:

First, set up a MySQL database to back the Hive metastore.

Log in to MySQL, create the Hive metastore database and user, and grant the user privileges on that database.
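The database step can be sketched as a small shell helper that prints the SQL for review before it is run. The database name `metastore`, the user `hive`, the `'%'` host wildcard, and the password are placeholders assumed for illustration; they are not taken from the original setup.

```shell
#!/bin/sh
# Print the SQL that creates the metastore database and user.
# All names passed in are illustrative assumptions, not values from this cluster.
metastore_sql() {
  db="$1"; user="$2"; pass="$3"
  cat <<EOF
CREATE DATABASE IF NOT EXISTS ${db};
CREATE USER '${user}'@'%' IDENTIFIED BY '${pass}';
GRANT ALL PRIVILEGES ON ${db}.* TO '${user}'@'%';
FLUSH PRIVILEGES;
EOF
}

# Review the statements first; pipe them into the mysql client once they look right:
#   metastore_sql metastore hive 'change-me' | mysql -u root -p
metastore_sql metastore hive 'change-me'
```

Printing the statements instead of executing them directly keeps the password and grants visible for review; restrict the `'%'` host pattern to the metastore hosts where possible.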

Install Hive:

Add the Hive service to the cluster through Cloudera Manager.

Assign the roles to nodes (CM is the master node).

Test the MySQL database connection.

On the next page, keep the default configuration.

Wait while the service is added; the wizard reports when the Hive service has been added successfully.

Test Hive:

Enabling High Availability for Hive Metastore Server:

  • Select the Hive service -> Configuration.
  • Select Scope -> Hive Metastore Server and Category -> Advanced.
  • Locate the Hive Metastore Delegation Token Store property, or search for it by typing its name in the Search box.
  • Select org.apache.hadoop.hive.thrift.DBTokenStore.
  • Click Save Changes.
  • Click the Instances tab and add a role instance.
  • Click the text field under Hive Metastore Server.
  • Click Select Hosts for Hive Metastore Server.
  • Choose another host (RM) to run the Hive Metastore Server on.
  • Click Finish. You should now see the new host added as a Hive Metastore Server.
  • Restart the services that have stale configurations.

Notice that you now have multiple instances of Hive Metastore Server.
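One way to confirm that clients know about both instances is to read `hive.metastore.uris` from the deployed client configuration, where multiple Metastore Servers appear as a comma-separated list of Thrift URIs. A minimal sketch, assuming the client configuration lives at `/etc/hive/conf/hive-site.xml` and that each `<value>` sits on the same line as its `<name>` or on the next one (both are assumptions about this cluster's layout; the `HIVE_SITE` override is a hypothetical convenience):

```shell
# Print each metastore URI listed under hive.metastore.uris, one per line.
metastore_uris() {
  grep -A1 '<name>hive.metastore.uris</name>' "$1" \
    | sed -n 's:.*<value>\(.*\)</value>.*:\1:p' \
    | tr ',' '\n'
}

# Assumed default location of the deployed client configuration.
conf="${HIVE_SITE:-/etc/hive/conf/hive-site.xml}"
if [ -r "$conf" ]; then
  metastore_uris "$conf"
fi
```

After the restart, the list should contain one URI per Hive Metastore Server instance.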

Test the HA setup for the Hive Metastore:

SSH to any DataNode and connect to HiveServer2 using Beeline.

# beeline -u "jdbc:hive2://cm:10000"

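Issue show databases.

Now, from CM, select the first Hive Metastore Server and stop it. Issue show databases again; it should still succeed, because the second Hive Metastore Server is still running.

Now stop the second Hive Metastore Server as well. The same command should now fail, which is expected because no Metastore Server is left to answer.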
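The manual check above can be scripted so it is easy to repeat after each stop. A minimal sketch, assuming `beeline` is on the PATH, HiveServer2 listens on `cm:10000` as in the earlier command, and beeline exits non-zero when the query fails (worth confirming on your Hive version). The `BEELINE` override variable and the function name are hypothetical conveniences:

```shell
# Run "show databases" through HiveServer2 and report whether it succeeded.
# BEELINE can be overridden (hypothetical hook) to point at a different client binary.
check_metastore() {
  url="$1"
  if "${BEELINE:-beeline}" -u "$url" -e "show databases;" >/dev/null 2>&1; then
    echo "OK: metastore reachable through $url"
  else
    echo "FAIL: query failed through $url"
    return 1
  fi
}

# Example: run after stopping each Hive Metastore Server in CM.
#   check_metastore "jdbc:hive2://cm:10000"
```

With one Metastore Server stopped the check is expected to print OK; with both stopped it is expected to print FAIL.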



Configure load balancing for HiveServer2:

To enable high availability for multiple HiveServer2 hosts, configure a load balancer to manage them. To increase stability and security, configure the load balancer on a proxy server.

Add a couple of HiveServer2 role instances on other hosts. You should now see the new hosts added as HiveServer2.

  • Go to the Hive service.
  • Click the Configuration tab, then select Scope -> HiveServer2 and Category -> Advanced.
  • Locate the HiveServer2 Advanced Configuration Snippet (Safety Valve) for hive-site.xml property, or search for it by typing its name in the Search box.
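The value entered into the snippet is not shown here. For ZooKeeper-based service discovery, the entries typically look like the following sketch, which uses Hive's dynamic service discovery properties. The namespace `hiveserver2` matches the one used in the connection string in this walkthrough; treat the exact set of properties as an assumption to verify against your Hive and Cloudera Manager versions. `hive.zookeeper.quorum` must also point at the ZooKeeper ensemble, which Cloudera Manager normally sets when the Hive service depends on a ZooKeeper service.

```xml
<!-- Assumed snippet contents; not taken from the original setup. -->
<property>
  <name>hive.server2.support.dynamic.service.discovery</name>
  <value>true</value>
</property>
<property>
  <!-- znode under which each HiveServer2 instance registers itself -->
  <name>hive.server2.zookeeper.namespace</name>
  <value>hiveserver2</value>
</property>
```

Save the changes and restart the HiveServer2 instances so that each one registers itself in ZooKeeper.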


Clients connecting to HiveServer2 now go through ZooKeeper.

beeline -u "jdbc:hive2://dn1:2181,dn2:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2"


Each new connection is routed to one of the HiveServer2 instances registered in ZooKeeper, so client load is spread across the instances.
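To exercise the discovery path repeatedly, the URL can be assembled from the ZooKeeper host list and reused for several short-lived connections. A minimal sketch, assuming the hosts and namespace from the command above; the helper name `hs2_zk_url` is hypothetical:

```shell
# Build a HiveServer2 JDBC URL that uses ZooKeeper service discovery.
# $1: comma-separated ZooKeeper host:port list, $2: ZooKeeper namespace.
hs2_zk_url() {
  printf 'jdbc:hive2://%s/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=%s\n' "$1" "$2"
}

url="$(hs2_zk_url "dn1:2181,dn2:2181" "hiveserver2")"
echo "$url"

# Open a few connections in a row (needs a running cluster):
#   for i in 1 2 3 4; do beeline -u "$url" -e "show databases;"; done
```

Stopping one HiveServer2 instance in CM and rerunning the loop is a simple way to check that new connections still succeed through the remaining instances.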







