AnsweredAssumed Answered

Improve hive query performance underlying HBase table

Question asked by Santhoshkb on Jun 1, 2017
Latest reply on Jun 5, 2017 by maprcommunity

I have created Hive table from HBase using Create external table.



CREATE EXTERNAL TABLE table_abc(rowkey string, value map<String,String>) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = "data:")
TBLPROPERTIES("" = "hbase_table","hbase.mapred.output.outputtable" = "hbase_table");

Hive table name is table_abc with 2 columns 1. Row Key (Default) 2. Value(Map datatype)


Total count in hbase_table/Table_abc is around 300 million


Data looks like(In Hive table)


RowKey(Column 1)       Value(Map Datatype - Column 2)

abc                                  { "Akey":"Avalue","Bkey":"Bvalue","Ckey":"Cvalue","Dkey":"Dvalue","Ekey":"Evalue"}
bdc                                  {"Akey":"Avalue","Bkey":"Bvalue","Ckey":"Cvalue","Dkey":"Dvalue","Ekey":"Evalue"}
efg                                  {"Akey":"Avalue","Bkey":"Bvalue","Ckey":"Cvalue","Dkey":"Dvalue","Ekey":"Evalue"}
hik                                  {"Akey":"Avalue","Bkey":"Bvalue","Ckey":"Cvalue","Dkey":"Dvalue","Ekey":"Evalue"}


Select * from table_abc where rowkey='abc' is working fine.

But I have a use case where I can search a record using map column as well
i.e Select * from table_abc where value["Akey"] = "Avalue"
above query is taking more than 10-15 mins to execute, sometimes timeout error.


How can I improve the performance of Hive underlying Hbase table, Please help


I have used below options, nothing seems to be working out


set mapreduce.reduce.memory.mb=7000;
set hive.cbo.enable=true;
set hive.compute.query.using.stats=true;
set hive.stats.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true;