AnsweredAssumed Answered

Improve hive query performance underlying HBase table

Question asked by Santhoshkb on Jun 1, 2017
Latest reply on Jun 5, 2017 by maprcommunity

I have created Hive table from HBase using Create external table.

 

Command:

CREATE EXTERNAL TABLE table_abc(rowkey string, value map<String,String>) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = "data:")
TBLPROPERTIES("hbase.table.name" = "hbase_table","hbase.mapred.output.outputtable" = "hbase_table");


Hive table name is table_abc with 2 columns 1. Row Key (Default) 2. Value(Map datatype)

 

Total count in hbase_table/Table_abc is around 300 million

 

Data looks like(In Hive table)

 

RowKey(Column 1)       Value(Map Datatype - Column 2)

abc                                  { "Akey":"Avalue","Bkey":"Bvalue","Ckey":"Cvalue","Dkey":"Dvalue","Ekey":"Evalue"}
bdc                                  {"Akey":"Avalue","Bkey":"Bvalue","Ckey":"Cvalue","Dkey":"Dvalue","Ekey":"Evalue"}
efg                                  {"Akey":"Avalue","Bkey":"Bvalue","Ckey":"Cvalue","Dkey":"Dvalue","Ekey":"Evalue"}
hik                                  {"Akey":"Avalue","Bkey":"Bvalue","Ckey":"Cvalue","Dkey":"Dvalue","Ekey":"Evalue"}

 

Select * from table_abc where rowkey='abc' is working fine.

But I have a use case where I can search a record using map column as well
i.e Select * from table_abc where value["Akey"] = "Avalue"
above query is taking more than 10-15 mins to execute, sometimes timeout error.

 

How can I improve the performance of Hive underlying Hbase table, Please help

 

I have used below options, nothing seems to be working out

 

set mapreduce.map.memory.mb=7000;
set mapreduce.reduce.memory.mb=7000;
set mapred.reduce.child.java.opts=-Xmx8000m;
set mapred.map.child.java.opts=-Xmx8000m;
set hive.auto.convert.join=false;
set hive.cbo.enable=true;
set hive.compute.query.using.stats=true;
set hive.stats.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true; 

Outcomes