Hello, can anyone share tips on performance tuning when running Hive queries against an HBase table?
Hive returns results quickly, but when we run Hive on top of HBase tables it is slower.
Hi Datta Sri,
It would help if you could provide more details about your environment and the versions you are running.
We are running a Hive query that accesses an HBase table. We are on version 5.1.
The Hive query is just a SELECT statement. We have the same table in HBase, and we run the Hive query against that table.
Hive-on-HBase is slower.
Hi dong meng,
Do you have any suggestions to help Datta improve speed on HBase?
Can you share a few thoughts?
Not sure if Carol McDonald's "An In-Depth Look at the HBase Architecture" will help you.
Try EXPLAIN to see what your query is doing. See: LanguageManual Explain - Apache Hive - Apache Software Foundation
Hive on top of HBase will be slower than Hive going against ORC or Parquet file formats. The short answer is that there's a higher cost in retrieving data from HBase than from reading directly from the file system.
The other issue is that the query tends to perform a complete table scan unless you modify it to perform range scans. You have to add a filter on the row key to your WHERE clause.
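To make the full-scan vs. range-scan point concrete, here is a toy Python model. It is not HBase code and uses no HBase or Hive API; `ToyTable`, `full_scan` and `range_scan` are hypothetical names for illustration. It assumes only that rows are kept sorted by row key, so a scan bounded by the row key can seek straight to its start row instead of reading everything.

```python
import bisect

# Toy model (assumption, not HBase's implementation): a table is a list of
# (row_key, value) pairs kept sorted by row key.
class ToyTable:
    def __init__(self, rows):
        self.rows = sorted(rows)
        self.keys = [k for k, _ in self.rows]

    def full_scan(self, predicate):
        """Read every row and filter afterwards, like a query with no row-key bound."""
        examined = 0
        hits = []
        for key, value in self.rows:
            examined += 1
            if predicate(key):
                hits.append((key, value))
        return hits, examined

    def range_scan(self, start_key, stop_key):
        """Seek to start_key and stop before stop_key (exclusive), like a scan
        whose bounds come from a row-key filter. The count ignores the few
        binary-search steps used to find the bounds."""
        lo = bisect.bisect_left(self.keys, start_key)
        hi = bisect.bisect_left(self.keys, stop_key)
        return self.rows[lo:hi], hi - lo


# Zero-padded keys sort lexically in numeric order.
table = ToyTable((f"user{i:05d}", i) for i in range(100_000))
hits_full, examined_full = table.full_scan(lambda k: "user00100" <= k < "user00200")
hits_range, examined_range = table.range_scan("user00100", "user00200")
print(len(hits_full), examined_full)    # 100 100000
print(len(hits_range), examined_range)  # 100 100
```

Both calls return the same 100 rows, but the unbounded scan reads all 100,000 rows to find them, which is roughly the situation described above for a query whose WHERE clause doesn't filter on the row key.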
HBase was really designed for fast access to small ranges within a table (a get() is really a specialized scan), so you end up paying a price when you run queries that need to hit the full table.
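Continuing the same toy idea (plain Python again, not HBase's API; `scan` and `get` are hypothetical helpers), a point lookup can be written as a scan whose bounds cover exactly one row key. Appending a NUL character gives the smallest string that sorts after the key, so the scan range contains that key and nothing else.

```python
import bisect

def scan(keys, values, start_key, stop_key):
    # Range scan over parallel lists sorted by key: start inclusive, stop exclusive.
    lo = bisect.bisect_left(keys, start_key)
    hi = bisect.bisect_left(keys, stop_key)
    return list(zip(keys[lo:hi], values[lo:hi]))

def get(keys, values, row_key):
    # A point lookup expressed as a scan: row_key + "\x00" is the smallest
    # string greater than row_key, so the range holds at most one key.
    rows = scan(keys, values, row_key, row_key + "\x00")
    return rows[0][1] if rows else None


keys = ["a", "ab", "b"]
values = [1, 2, 3]
print(get(keys, values, "a"))   # 1 (does not pick up "ab")
print(get(keys, values, "c"))   # None
```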
Please let us know if Carol's and Michael's feedback is helpful. If it is, please mark their replies "Helpful" or "Correct" to show your appreciation and help the rest of the community learn. If you need further assistance, please answer Carol's question.