Can we have two Hive metastores in the same cluster?
Will the two Hive metastores access the same backend database (MySQL, etc.)?
If yes, you can have two Hive metastores.
You need to modify the warden configuration.
Change "services" in /opt/mapr/conf/conf.d/warden.hivemeta.conf on the Hive metastore nodes as below.
Then restart mapr-warden on those nodes.
Two Hive metastores will then launch in the cluster.
Please provide more info about the environment you are using and what you wish to accomplish for us to further assist you.
We are using MapR 5.1 Enterprise. Hive is used by many client connections (Hive CLI, Hue, SQuirreL). A few of the Hive tables are also used by Drill via the Hive storage plugin. I am now looking into the possibility of setting up a separate Hive metastore in the same cluster, migrating the tables used by Drill to the newly configured metastore, and then accessing them via Drill. The only motivation is to improve Drill performance without affecting existing client connections.
Are you looking for HA for the Hive metastore? If yes, it's possible. For this you have to configure two metastore services on two nodes and make changes in hive-site.xml. Steps:
1. Install two hivemeta services on two different nodes, say VM204 and VM201.
2. Add these two Thrift URIs to the hive-site.xml file, separated by a comma. Example:
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://VM204:9083,thrift://VM201:9083</value>
</property>
3. Restart the hivemeta services on the VM204 and VM201 nodes using the command below.
#maprcli node services -name hivemeta -action restart -nodes VM201 VM204
Thanks, Basapuram Kumar, for the response. Please see my earlier response on the thread for more details.
You mentioned that access from multiple sessions (the CLI, Drill plugins, Hue and SQuirreL) is causing performance problems.
Can you please provide the info below?
What kind of performance have you observed? How many connections at most have you found in the cluster at a time?
What kind of slowness have you found? Was there any hang in any of the client connections?
Was there any instability in the running services?
Which backend RDBMS have you used for the metastore?
Please find my responses below:
What kind of performance have you observed?
Nothing specific observed yet.
How many connections at most have you found in the cluster at a time?
Mostly 10 HiveServer2 connections via SQuirreL, 10 Drill connections via SQuirreL (JDBC), and 10 in Hue. These numbers may vary during peak times. All these connections run in parallel, and a few of them may be serving more resource-intensive YARN jobs.
What kind of slowness have you found?
Since our cluster is shared (multi-tenant), we are seeing resource contention between YARN, Drill and the OS. A few times the Drillbits crashed. The Drillbits and NodeManagers are co-located.
Was there any hang in any of the client connections?
A few HiveServer2 OOM issues.
Apart from resource contention, I don't see any.
We use MySQL for the Hive schema.
Are Takeshi and Basapuram's advice helpful or correct? Please show your appreciation by marking their replies "helpful" or "correct".
How can we use the same MySQL for two Hive metastores?
http://maprdocs.mapr.com/home/Hive/Config-MySQLForHiveMetastore.html
javax.jdo.option.ConnectionURL must point to the same MySQL server.
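As a rough sketch of what that could look like, the same property would go into hive-site.xml on both metastore nodes. The host name mysql-host.example.com, port 3306 and database name hive are placeholders assumed for illustration, not values from this thread; use whatever your existing metastore already points to.

```xml
<!-- hive-site.xml on BOTH metastore nodes (sketch only) -->
<!-- Host, port and database name below are hypothetical placeholders. -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://mysql-host.example.com:3306/hive?createDatabaseIfNotExist=true</value>
</property>
```

The value should be identical on both nodes so that both metastore services read and write the same Hive schema.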