nishchai manjula

Handling Hive Metastore Database Table Version Inconsistency When Hive Client Uses an Embedded Metastore

Blog Post created by nishchai manjula Employee on Jun 19, 2017

Using an Embedded Metastore (Like MySQL or Oracle) to Connect Directly to a Metastore Database 

 

When your Hive Client (Hive CLI) connects directly to a metastore database, without using hive.metastore.uris, then Hive CLI will be using an embedded metastore to run its Hive job. During the upgrade process, if you have missed upgrading Hive CLI and are using a lower version, but your metastore database is already upgraded to the latest schema version using metastore scripts like (upgrade-<OLDVERSION>-to-<NEWVERSION>.mysql.sql, it can still connect. The old version of Hive Client will still be able to talk to the metastore database by modifying the schema to version, even though Hive has come up with property (hive.metastore.schema.verification) that can restrict schema modification. However, this property’s default value changes in different versions of Hive.    

 

Example 1:

Let’s say you are upgrading from Hive 1.2 to Hive 2.1.0. You have missed upgrading a few Clients, and therefore some of the nodes are still in Hive 1.2. And you have used the schematool to upgrade the metastore database. Now, after an upgrade of schema, your new schema will have its Version Table changed to the new Hive version 2.1. And your hive-site.xml is having the below properties: 

  1. jdo.option.ConnectionURL
  2. jdo.option.ConnectionDriverName
  3. jdo.option.ConnectionUserName
  4. jdo.option.ConnectionPassword

When you accidentally invoke the Hive Client that is on a lower Hive version (like Hive 1.2), Hive CLI will still connect to the metastore database by doing schema changes, using hive-metastore-1.2.0-mapr-*.jar.

 

Below is the Hive information message:

ERROR [main]: metastore.ObjectStore (ObjectStore.java:checkSchema(6691)) - Version information found in metastore differs 2.1.0 from expected schema version 1.2.0. Schema verification is disabled hive.metastore.schema.verification so setting version.

 

When checked, the metastore MySQL Version Table is now reflecting the Hive version 1.2. The below screenshot explains it:

 

When a user tries Hive Client using an upgraded Hive CLI (like Hive 2.1), then Hive CLI errors out with the below message:

Caused by: MetaException(message:Hive Schema version 2.1.0 does not match metastore's schema version 1.2.0 Metastore is not upgraded or corrupt)

This is because the metastore database version is already downgraded by the lower version Hive Client (Hive 1.2), and when the new version of Hive CLI tries to connect to the metastore database, it fails to connect.

 

Example 2: 

A similar issue is observed when Spark Client (like SPARK-SUBMIT,SPARK-SQL) connects to the Hive metastore database using hive-metastore-1.2.0-mapr-*.jar and having Hive metastore SCHEMA_VERSION=2.1.0 in its Version Table. This behavior is observed in Spark 2.1.0, because the Spark package is shipped with Hive 1.2 jars:
 

 

  

A user not using a remote Hive metastore service and neither schema verification will connect directly to the Hive metastore database (property hive.metastore.uris for remote metastore in hive-site.xml under Spark conf directory). Spark Client will be able to modify the schema of the Hive metastore database.

 

Remediation: 

1) The metastore version is controlled by a parameter (hive.metastore.schema.verification), which is default false in Hive 1.2 and below versions. This allows the metastore Client to connect a metastore database by doing the schema modification. Whereas in Hive 2.1, hive.metastore.schema.verification is true by default, which prevents Hive Client from changing the schema of the metastore database. Hence, when you try a connection to the Hive metastore database (which is already modified by Hive 1.2 Client) from Hive 2.1.0 Client, it errors out, even though the user hasn't used a hive.metastore.schema.verification, like Hive 1.2.

 

Best practice: 

i) Make sure hive.metastore.schema.verification is always set to true in hive-site.xml, when you are in Hive 1.2 and lower.

ii) Make the metastore database into read-only tables, so that the database administrator can control the upgrade of the tables during and after the upgrade process. Below are the grant privileges needed for Hive tables (tables may differ for different versions of Hive), so that you can prevent accidental changes to the schema of the Hive metastore database.

 

Grant privileges

Metastore DB tables

SELECT,INSERT,UPDATE,DELETE

BUCKETING_COLS ,CDS ,COLUMNS_V2 ,COMPACTION_QUEUE ,COMPLETED_TXN_COMPONENTS ,
DATABASE_PARAMS ,DBS ,DB_PRIVS ,DELEGATION_TOKENS ,FUNCS ,FUNC_RU ,GLOBAL_PRIVS ,
HIVE_LOCKS ,IDXS ,INDEX_PARAMS ,MASTER_KEYS ,NEXT_COMPACTION_QUEUE_ID ,
NEXT_LOCK_ID ,NEXT_TXN_ID ,NUCLEUS_TABLES ,PARTITIONS ,PARTITION_EVENTS ,PARTITION_KEYS ,
PARTITION_KEY_VALS ,PARTITION_PARAMS ,PART_COL_PRIVS ,PART_COL_STATS ,PART_PRIVS ,ROLES ,
ROLE_MAP ,SDS ,SD_PARAMS ,SEQUENCE_TABLE ,SERDES ,SERDE_PARAMS ,SKEWED_COL_NAMES ,
SKEWED_COL_VALUE_LOC_MAP ,SKEWED_STRING_LIST ,SKEWED_STRING_LIST_VALUES ,
SKEWED_VALUES ,SORT_COLS ,TABLE_PARAMS ,TAB_COL_STATS ,TBLS ,TBL_COL_PRIVS
,TBL_PRIVS ,TXNS ,TXN_COMPONENTS ,TYPES ,TYPE_FIELDS ,VERSION

SELECT

Version

 

2) For handling Spark 2.1.0, which is currently shipped with Hive 1.2 jar, users need to use a Hive remote metastore service (hive.metastore.uris), where metastore service is started with hive.metastore.schema.verification as TRUE for any Spark SQL context. This will force the Spark Client to talk to a higher version of the Hive metastore (like Hive 2.1.0), using lower Hive jars (like Hive 1.2), without modifying or altering the existing Hive schema of the metastore database.  

Outcomes