Hive Locks -- Table/Partition Level

Document created by Hao Zhu Employee on Feb 17, 2016
Version 1Show Document
  • View in full screen mode

Author: Hao Zhu

Original Publication Date: November 11, 2014

 

Hive supports concurrency and table/partition level locks. Currently only "Share" and "Exclusive" type locks are available in Hive. Share lock is for reading, and anything else requires an Exclusive lock.

 

The below information can help you get started with using this feature.

1. To enable the locking feature, hive.zookeeper.quorum and hive.support.concurrency need to be set in hive-site.xml.  If using Hiveserver, Hiveserver2 and/or Hive metastore the service needs to be restarted after updating the configuration in hive-site.xml.

<property>

  <name>hive.zookeeper.quorum</name>

  <value>n1a,n2a,n3a</value>

</property>

 

<property>

  <name>hive.support.concurrency</name>

  <value>true</value>

</property>

Replace the value of 'hive.zookeeper.quorum' with a comma separated list of hostnames for your nodes running Zookeeper. 

2. How to check if a specific table or partition is locked.

SHOW LOCKS <TABLE_NAME>; 
SHOW LOCKS <TABLE_NAME> EXTENDED;
SHOW LOCKS <TABLE_NAME> PARTITION (<PARTITION_DESC>);
SHOW LOCKS <TABLE_NAME> PARTITION (<PARTITION_DESC>) EXTENDED;

3.  How to manually lock and unlock table or partitions in a Hive session.

Prepare the table and partition table:

create table passwords (col0 string, col1 string, col2 string, col3 string, col4 string,

col5 string, col6 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ":";

 

load data local inpath "/etc/passwd" into table passwords;

 

create table testpart (

col0 string, col1 string, col2 string,

col3 string, col4 string,col5 string, col6 string)

PARTITIONED BY (one STRING, two STRING) ;

 

INSERT OVERWRITE TABLE testpart

PARTITION(one = 'a', two = 'b')

SELECT * FROM passwords;

For Table:

 

0: jdbc:hive2://localhost:10000/default> lock table passwords shared;

No rows affected (0.226 seconds)

0: jdbc:hive2://localhost:10000/default> show locks extended;

+-----------------------------------------------+---------+

| tab_name | mode |

+-----------------------------------------------+---------+

| default@passwords | SHARED |

| LOCK_QUERYID: | NULL |

| LOCK_TIME:1415758308800 | NULL |

| LOCK_MODE:EXPLICIT | NULL |

| LOCK_QUERYSTRING:lock table passwords shared | NULL |

+-----------------------------------------------+---------+

5 rows selected (0.123 seconds)

0: jdbc:hive2://localhost:10000/default> unlock table passwords;

No rows affected (0.177 seconds)

0: jdbc:hive2://localhost:10000/default> show locks extended;

+-----------+-------+

| tab_name | mode |

+-----------+-------+

+-----------+-------+

No rows selected (0.105 seconds)

For partition:

 

0: jdbc:hive2://localhost:10000/default> lock table testpart PARTITION(one = 'a', two = 'b') exclusive;

No rows affected (0.205 seconds)

0: jdbc:hive2://localhost:10000/default> show locks extended; 

+---------------------------------------------------------------------------------+------------+

| tab_name | mode |

+---------------------------------------------------------------------------------+------------+

| default@testpart@one=a/two=b | EXCLUSIVE |

| LOCK_QUERYID: | NULL |

| LOCK_TIME:1415758674295 | NULL |

| LOCK_MODE:EXPLICIT | NULL |

| LOCK_QUERYSTRING:lock table testpart PARTITION(one = 'a', two = 'b') exclusive | NULL |

+---------------------------------------------------------------------------------+------------+

5 rows selected (0.104 seconds)

0: jdbc:hive2://localhost:10000/default> unlock table testpart PARTITION(one = 'a', two = 'b');

No rows affected (0.209 seconds)

0: jdbc:hive2://localhost:10000/default> show locks extended;

+-----------+-------+

| tab_name | mode |

+-----------+-------+

+-----------+-------+

No rows selected (0.174 seconds)

4.  How to get the Hive lock information from Zookeeper when locking occurs.

For example, if we lock the table "passwords" in shared mode:

lock table passwords shared;

Login to the Zookeeper quorum:

/opt/mapr/zookeeper/zookeeper-3.4.5/bin/zkCli.sh -server <IP of zookeeper>:<port of zookeeper>

 

[zk: xx.xx.xx.xx:5181(CONNECTED) 39] ls /hive_zookeeper_namespace/default/passwords 

[LOCK-SHARED-0000000000]

[zk: xx.xx.xx.xx:5181(CONNECTED) 40] get /hive_zookeeper_namespace/default/passwords

cZxid = 0x600001eeb

ctime = Tue Nov 11 18:21:04 GMT-08:00 2014

mZxid = 0x600001eeb

mtime = Tue Nov 11 18:21:04 GMT-08:00 2014

pZxid = 0x600001eec

cversion = 1

dataVersion = 0

aclVersion = 0

ephemeralOwner = 0x0

dataLength = 0

numChildren = 1

5. If DML/DDL only happens on leaf partition, then only leaf partition will be locked in Exclusive mode. Parent partitions and parent table will be locked in Shared mode. 

For example: If inserting to the leaf partition of table "testpart":

INSERT OVERWRITE TABLE testpart 
PARTITION(one = 'a', two = 'b')
SELECT  * FROM passwords;

Here is locking status from hive:

0: jdbc:hive2://localhost:10000/default> show locks extended;

+--------------------------------------------------------------------------------------------------------------------+------------+

| tab_name | mode |

+--------------------------------------------------------------------------------------------------------------------+------------+

| default@passwords | SHARED |

| LOCK_QUERYID:mapr_20141111174545_9d578e82-0588-4dcf-80a5-0cd6ce8f05a9 | NULL |

| LOCK_TIME:1415756712558 | NULL |

| LOCK_MODE:IMPLICIT | NULL |

| LOCK_QUERYSTRING:INSERT OVERWRITE TABLE testpart PARTITION(one = 'a', two = 'b') SELECT * FROM passwords limit 1 | NULL |

| default@testpart | SHARED |

| LOCK_QUERYID:mapr_20141111174545_9d578e82-0588-4dcf-80a5-0cd6ce8f05a9 | NULL |

| LOCK_TIME:1415756712558 | NULL |

| LOCK_MODE:IMPLICIT | NULL |

| LOCK_QUERYSTRING:INSERT OVERWRITE TABLE testpart PARTITION(one = 'a', two = 'b') SELECT * FROM passwords limit 1 | NULL |

| default@testpart@one=a | SHARED |

| LOCK_QUERYID:mapr_20141111174545_9d578e82-0588-4dcf-80a5-0cd6ce8f05a9 | NULL |

| LOCK_TIME:1415756712558 | NULL |

| LOCK_MODE:IMPLICIT | NULL |

| LOCK_QUERYSTRING:INSERT OVERWRITE TABLE testpart PARTITION(one = 'a', two = 'b') SELECT * FROM passwords limit 1 | NULL |

| default@testpart@one=a/two=b | EXCLUSIVE |

| LOCK_QUERYID:mapr_20141111174545_9d578e82-0588-4dcf-80a5-0cd6ce8f05a9 | NULL |

| LOCK_TIME:1415756712558 | NULL |

| LOCK_MODE:IMPLICIT | NULL |

| LOCK_QUERYSTRING:INSERT OVERWRITE TABLE testpart PARTITION(one = 'a', two = 'b') SELECT * FROM passwords limit 1 | NULL |

+--------------------------------------------------------------------------------------------------------------------+------------+

Here is the corresponding Zookeeper znode information for the Hive locks:

[zk: xx.xx.xx.xx:5181(CONNECTED) 35] ls /hive_zookeeper_namespace/default/testpart 
[LOCK-SHARED-0000000000, one=a]
[zk: xx.xx.xx.xx:5181(CONNECTED) 37] ls /hive_zookeeper_namespace/default/testpart/one=a
[LOCK-SHARED-0000000000, two=b]
[zk: xx.xx.xx.xx:5181(CONNECTED) 38] ls /hive_zookeeper_namespace/default/testpart/one=a/two=b
[LOCK-EXCLUSIVE-0000000000]

6. Lock holder and lock waiter behavior.

When the lock waiter (the process trying to obtain a lock) is asking for a Share or Exclusive lock while the lock holder is holding an Exclusive lock OR the lock waiter is asking for an Exclusive lock while the lock holder is holding a Share lock the waiter will retry for "hive.lock.numretries" times (default 100 times). Each retry interval is "hive.lock.sleep.between.retries" seconds (default 60 seconds).

 

If the waiter still cannot obtain the desired lock after  "hive.lock.numretries" retries, the waiter will fail.
The two parameters can be set in hive-site.xml for system level or at session level.

 

For example, HiveServer1:

hive> set hive.lock.numretries=5;

hive> set hive.lock.sleep.between.retries=3;

hive> select count(*) from passwords;      

conflicting lock present for default@passwords mode SHARED

conflicting lock present for default@passwords mode SHARED

conflicting lock present for default@passwords mode SHARED   

conflicting lock present for default@passwords mode SHARED

FAILED: Error in acquiring locks: Locks on the underlying objects cannot be acquired. retry after some time

HiveSever2:

0: jdbc:hive2://n1a:10000/default> set hive.lock.numretries=5;

No rows affected (0.005 seconds)

0: jdbc:hive2://n1a:10000/default> set hive.lock.sleep.between.retries=3;

No rows affected (0.003 seconds)

0: jdbc:hive2://n1a:10000/default> select count(*) from passwords;

Error: Error while processing statement: FAILED: Error in acquiring locks: Locks on the underlying objects cannot be acquired. retry after some time (state=42000,code=10)

7. What logging occurs when locking happens?

In /tmp/mapr/hive.log, eg:

ERROR [pool-1-thread-88]: ZooKeeperHiveLockManager (SessionState.java:printError(545)) - conflicting lock present for default@passwords mode SHARED

  Review the Hive session log, typically hive.log under /tmp/<user> where <user> is the user launching the Hive session for more information about Hive lock activity. 

Attachments

    Outcomes