Apache Hive is a data warehouse software project built on top of Apache Hadoop that provides data summarization, query, and analysis. Hive offers an SQL-like interface for querying data stored in the various databases and file systems that integrate with Hadoop.

Configuring high availability (HA) for Hive requires each of the following components to be fault tolerant:

  • Hive MetaStore – RDBMS (MySQL)
  • ZooKeeper
  • Hive MetaStore Server
  • HiveServer2

Set up MySQL db:

First, set up a MySQL database to back the Hive metastore. Log in to MySQL, create the hive database and user, and grant the user privileges on that database.
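
The database setup described above could look roughly like the sketch below. The database name hive, the user hive, the password change_me, the '%' host pattern, and the file name are all placeholders chosen for illustration, not values taken from this cluster; CREATE USER IF NOT EXISTS assumes MySQL 5.7 or later.

```shell
# Write the SQL to a file so it can be reviewed before it is applied.
# Every name and the password below are placeholders (assumptions).
cat > hive_metastore_setup.sql <<'SQL'
CREATE DATABASE IF NOT EXISTS hive;
CREATE USER IF NOT EXISTS 'hive'@'%' IDENTIFIED BY 'change_me';
GRANT ALL PRIVILEGES ON hive.* TO 'hive'@'%';
FLUSH PRIVILEGES;
SQL

# Apply it as the MySQL root user (prompts for the root password):
#   mysql -u root -p < hive_metastore_setup.sql
```

Where possible, narrow '%' to the hosts that will run the Hive Metastore Server.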

Install Hive:

Add the Hive service to the cluster through Cloudera Manager.

Assign the Hive roles to nodes (here, CM is the master node).

Test the MySQL database connection.

On the next page, keep the default configuration.

Wait while the service is being added. When the wizard finishes, the Hive service has been added successfully.

Test Hive:

Enabling High Availability for Hive Metastore Server:

  • Select the Hive service -> Configuration.
  • Select Scope -> Hive Metastore Server and Category -> Advanced.
  • Locate the Hive Metastore Delegation Token Store property, or search for it by typing its name in the search box.
  • Select org.apache.hadoop.hive.thrift.DBTokenStore.
  • Click Save Changes.
  • Click the Instances tab, then click Add Role Instances.
  • Click the text field under Hive Metastore Server.
  • Click Select Hosts for Hive Metastore Server.

  • Choose another Host (RM) to configure Hive Metastore Server on.

  • Click Finish. You should now see the new host added as a Hive Metastore Server.
  • Restart the services that show a stale configuration.

Notice that you now have multiple instances of Hive Metastore Server.

Test the HA setup for the Hive Metastore:

SSH to any DataNode and connect to HiveServer2 using Beeline.

# beeline -u "jdbc:hive2://cm:10000"

Run show databases; to confirm that the query succeeds.

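Now, from CM, select the first Hive Metastore Server and stop it. Run show databases; again. It should still succeed, because the second Hive Metastore Server serves the request.

Now stop the second Hive Metastore Server as well and run the command again. This time the command should fail, which is expected because no Hive Metastore Server is left running.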
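
The manual failover check above can also be scripted. The sketch below assumes beeline is on the PATH and reuses the cm:10000 endpoint from this walkthrough; the function name check_hive and the HS2_URL variable are hypothetical helpers, and the result relies on beeline returning a non-zero exit status when the query fails.

```shell
# Endpoint used in this walkthrough; override HS2_URL to test another host.
HS2_URL="${HS2_URL:-jdbc:hive2://cm:10000}"

# Run one metadata query through HiveServer2 and report the outcome.
check_hive() {
  if ! command -v beeline >/dev/null 2>&1; then
    echo "SKIPPED: beeline not found on PATH"
    return 2
  fi
  # SHOW DATABASES needs a working Hive Metastore Server behind HiveServer2.
  if beeline -u "$HS2_URL" -e "SHOW DATABASES;" >/dev/null 2>&1; then
    echo "OK: query succeeded"
  else
    echo "FAILED: query did not complete"
    return 1
  fi
}
```

Call check_hive after stopping each Hive Metastore Server: it should report OK while at least one instance is running and FAILED once both are stopped.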

Configure load balancing for HiveServer2:

To enable high availability for multiple HiveServer2 hosts, configure a load balancer to manage them. To increase stability and security, configure the load balancer on a proxy server.

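Add a couple of HiveServer2 role instances on additional hosts, following the same Add Role Instances steps used for the Hive Metastore Server. You should now see the new hosts added as HiveServer2.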

  • Go to the Hive service.
  • Click the Configuration tab, then select Scope -> HiveServer2 and Category -> Advanced.
  • Locate the HiveServer2 Advanced Configuration Snippet (Safety Valve) for hive-site.xml property, or search for it by typing its name in the Search box.
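
For ZooKeeper-based discovery, the value entered in this property is usually a hive-site.xml fragment like the one sketched below. hive.server2.support.dynamic.service.discovery is a standard Hive property, but whether this exact fragment matches the original setup is an assumption, and the file name is a placeholder.

```shell
# Save the hive-site.xml fragment to a scratch file, then paste its contents
# into the safety-valve field in Cloudera Manager. File name is a placeholder.
cat > hs2_discovery_snippet.xml <<'XML'
<property>
  <name>hive.server2.support.dynamic.service.discovery</name>
  <value>true</value>
</property>
XML
```

With discovery enabled, each HiveServer2 instance registers itself in ZooKeeper under the hive.server2.zookeeper.namespace value, which defaults to hiveserver2. Save the change and restart HiveServer2 for it to take effect.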

Clients connecting to HiveServer2 now go through ZooKeeper.

beeline -u "jdbc:hive2://dn1:2181,dn2:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2"

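Each new connection is routed to one of the HiveServer2 instances registered in ZooKeeper. The JDBC driver picks an instance at random, so connections are spread across the instances over time.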
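
When the ZooKeeper ensemble has more than a couple of hosts, the discovery URL is easier to generate than to type. The helper below is a minimal sketch: the function name hs2_zk_url and the ZK_PORT and ZK_NAMESPACE variables are hypothetical, with defaults taken from the example URL above.

```shell
# Build a ZooKeeper-discovery JDBC URL from a list of ZooKeeper hosts.
# Port 2181 and the hiveserver2 namespace match the example URL above.
hs2_zk_url() {
  zk_port="${ZK_PORT:-2181}"
  namespace="${ZK_NAMESPACE:-hiveserver2}"
  quorum=""
  for host in "$@"; do
    # Append host:port, adding a comma only between entries.
    quorum="${quorum:+$quorum,}${host}:${zk_port}"
  done
  printf 'jdbc:hive2://%s/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=%s\n' \
    "$quorum" "$namespace"
}

# Prints the same URL as the beeline example above.
hs2_zk_url dn1 dn2
```

The result can be passed straight to Beeline, for example beeline -u "$(hs2_zk_url dn1 dn2)".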

Mandy!!!