Hive Metastore configuration with PostgreSQL

Blog Post created by bkumar on Dec 22, 2016

In this blog, I would like to add steps for MapR-Hive metastore configuration with PostgreSQL.


The Hive metastore service stores the metadata for Hive tables and partitions in a relational database, and provides clients access to this information using the MetaStore service. By default  Hive uses the Embedded database  Derby to store  Hive tables and partitions  info as its metadata. Since the Embedded database  Derby limited features and recommended to use any relational databases as its metastore . Here, I would like to add PostgreSQL as hive metastore. 



mapr-core-5.1 : MapR-5.1 core packages downloadable link
mapr-hivemetastore-1.2 : MapR-hive-1.2 downloadable link
mapr-hiveserver2-1.2 :MapR-Hive-1.2 downloadable link

mapr-hive-1.2 MapR-Hive-1.2 Downloadable link

Opearating System: CentOS6.x/Redhat-6.x


Here I have used two different nodes for database and hive services.


PostgresNode --->postgres-server running node.
HiveNode ---> hs2 , hivemetastore and postgres client  running node.


Configuring postgres server:


Step 1:

Installing postgres-server at PostgresNode node
[root@PostgresNode mapr]#yum install postgresql-server

Start the postgresql service using below command.
[root@PostgresNode mapr]#service postgresql initdb


Step 2:

Add the listen_addresses to all and standard_conforming_strings to off state. By default it is local
[root@PostgresNode mapr]# vim /var/lib/pgsql/data/postgresql.conf
                        listen_addresses = '*'
                        standard_conforming_strings = off


Step 3:

Configure the authentication for the network in pg_hba.conf.
The following example allows all users to connect from all hosts to all databases:
[root@PostgresNode mapr]# vim /var/lib/pgsql/data/pg_hba.conf
                     host all all md5


Restart the postgresql service
[root@PostgresNode mapr]# service postgresql start
Starting postgresql service: [ OK ]


Use chkconfig utility to ensure that PostgreSQL server will start at a boot time.
For example:
[root@PostgresNode mapr]#chkconfig postgresql on

Use the chkconfig utility to verify that PostgreSQL server will be started at boot time,
for example:
[root@PostgresNode mapr]# chkconfig --list postgresql
                     postgresql 0:off 1:off 2:off 3:off 4:off 5:off 6:off
[root@PostgresNode mapr]#


Step 4:

Install the Postgres JDBC Driver on metastore running node.
[root@HiveNode ~]#yum install postgresql-jdbc

[root@HiveNode ~]# ll /usr/share/java/postgresql-jdbc.jar
lrwxrwxrwx. 1 root root 27 Dec 19 09:57 /usr/share/java/postgresql-jdbc.jar -> postgresql-jdbc-8.4.704.jar


create a symbolic link to postgres-jdbc driver using below command.
[root@HiveNode ~]# ln -s /usr/share/java/postgresql-jdbc.jar /opt/mapr/hive/hive-1.2/lib/postgresql-jdbc.jar


verify the symbolic link has done or not.
[root@HiveNode ~]# ll /opt/mapr/hive/hive-1.2/lib/postgresql-jdbc.jar
lrwxrwxrwx. 1 root root 35 Dec 19 09:58 /opt/mapr/hive/hive-1.2/lib/postgresql-jdbc.jar -> /usr/share/java/postgresql-jdbc.jar
[root@HiveNode ~]#


Step 5:

Create the metastore database and user account
[root@HiveNode ~]# locate hive-schema

copy this required hive-schema file into postgres-server running node and grand required privileges
[root@HiveNode ~]# ll /opt/mapr/hive/hive-1.2/scripts/metastore/upgrade/postgres/hive-schema-1.2.0.postgres.sql
-rw-r--r--. 1 mapr root 41969 Sep 27 00:50 /opt/mapr/hive/hive-1.2/scripts/metastore/upgrade/postgres/hive-schema-1.2.0.postgres.sql

Copying this hive-schema file to postgres-server node.
[root@HiveNode ~]# scp /opt/mapr/hive/hive-1.2/scripts/metastore/upgrade/postgres/hive-schema-1.2.0.postgres.sql root@PostgresNode:/tmp/
root@PostgresNode's password:
hive-schema-1.2.0.postgres.sql 100% 41KB 41.0KB/s 00:00


Step 6:
Creating metastore in postgres DB and provide required privileges to it.
[root@PostgresNode mapr]# su postgres
bash-4.1$ psql -U postgres
postgres=# CREATE USER hiveuser WITH PASSWORD 'mapr'
postgres-# CREATE DATABASE metastore;
postgres=# \c metastore
psql (8.4.20)
You are now connected to database "metastore".
metastore=# \i /tmp/hive-schema-1.2.0.postgres.sql





postgres=# \c metastore
psql (8.4.20)
You are now connected to database "metastore".
metastore=# \pset tuples_only on
Showing only tuples.
metastore=# \o /tmp/grant-privs
metastore=# SELECT 'GRANT SELECT,INSERT,UPDATE,DELETE ON "' || schemaname || '". "' ||tablename ||'" TO hiveuser ;' FROM pg_tables WHERE tableowner = CURRENT_USER and schemaname = 'public';

metastore=# \o
metastore=# \pset tuples_only off
Tuples only is off.
metastore=# \i /tmp/grant-privs

100% 41KB 41.0KB/s 00:00



Step 7:
From the metastore running node install the postgres client package and try to loggining in
[root@HiveNode ~]#yum install postgresql
[root@HiveNode ~]# psql -h PostgresNode -U hiveuser -d metastore
Password for user hiveuser:<mapr>

metastore=> \c metastore
psql (8.4.20)
You are now connected to database "metastore".
metastore=> \l
                                       List of databases
Name | Owner | Encoding | Collation | Ctype | Access privileges
metastore | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
postgres | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
template0 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres: postgres=CTc/postgres
template1 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres: postgres=CTc/postgres
test | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |


Step 8:

Configure the Metastore Service to Communicate with the PostgreSQL Database

[root@HiveNode mapr]#  vim /opt/mapr/hive/hive-1.2/conf/hive-site.xml







<description>IP address (or fully-qualified domain name) and port of the metastore host</description>



Step 9:
Restart the hs2 and hivemeta services using below comand on these services running node.
[root@HiveNode mapr]# maprcli node services -name hs2 -action restart -nodes `hostname`

[root@HiveNode mapr]# maprcli node services -name hivemeta -action restart -nodes `hostname`


Step 10:
Once hs2 and hivemeta services are up , loggin to hive client and do operations.
[mapr@HiveNode mapr]$ hive
hive> show databases;
hive> create table test(id int);
hive> load data local inpath '/home/mapr/test.txt' into table test;

hive> select *from test;


Step 11:

Check the created hive table  schema in PostgreSQL metastore DB "TBLS" table. use below query to check the same.

metastore=> select *from TBLS;



The output will be looking like as above.


So from the above seen TBLS table, we have observed that the hive created tables are created inside the postgreSQL DB in TBLS table for its metadata. If the hive created tables are not shown in this TBLS table inside metastore Database(In PostgreSQL), check whether the metastore properly configured or not and metastore service is properly pointed to DB or not.


For any clarifications, feel free to comment here at MapR converged Community..