HBase completebulkload in M7

Question asked by volans on Dec 9, 2013
Latest reply on Dec 11, 2013 by volans
We are writing a MapReduce job that applies a large number of updates to an HBase table. The job is currently being tested on an M5 cluster.

Our current approach is to use HFileOutputFormat, which essentially generates HBase data files (HFiles) in HDFS. We then run the completebulkload tool (or call LoadIncrementalHFiles.doBulkLoad() in code) to import the updates into HBase.

Now the question is whether this approach still works in M7. M7 has no region servers, but part of what completebulkload does is determine the region that each output HFile belongs to, talk to that region's server, and then move the HFile into its storage directory. Given that M7 does not have region servers, I have the following questions:

 - Would the HBase jar in M7 come with completebulkload? More importantly, would completebulkload work in M7?
 - If so, would it work just as efficiently, or not?
 - If not, how would I bring HFiles into HBase? If that is not possible, what is the recommended way of doing bulk HBase updates in a MapReduce job other than using TableOutputFormat?

We need an approach that works in both M5 and M7. Any input is appreciated. Thanks.
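
To make the region-lookup step mentioned above concrete, here is a minimal, self-contained sketch of how an HFile's key range can be matched to a region by its start key. It is only an illustration and uses no HBase classes: the class name RegionLookupSketch, its methods, and the region names are hypothetical, and it is not how completebulkload is actually implemented.

```java
import java.nio.charset.StandardCharsets;
import java.util.Comparator;
import java.util.TreeMap;

public class RegionLookupSketch {

    // Lexicographic comparison of unsigned bytes, the ordering HBase uses for row keys.
    static final Comparator<byte[]> ROW_ORDER = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    };

    // Region start key -> region name (names are hypothetical placeholders).
    private final TreeMap<byte[], String> regionsByStartKey = new TreeMap<>(ROW_ORDER);

    public void addRegion(String startKey, String regionName) {
        regionsByStartKey.put(bytes(startKey), regionName);
    }

    // A row belongs to the region with the greatest start key <= the row key.
    // Assumes the first region was registered with the empty start key.
    public String regionFor(String rowKey) {
        return regionsByStartKey.floorEntry(bytes(rowKey)).getValue();
    }

    // An HFile can be handed to a single region only if its first and last
    // row keys fall into the same region.
    public boolean fitsInOneRegion(String firstKey, String lastKey) {
        return regionFor(firstKey).equals(regionFor(lastKey));
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        RegionLookupSketch table = new RegionLookupSketch();
        table.addRegion("", "region-a");   // first region: empty start key
        table.addRegion("m", "region-b");  // second region starts at "m"

        System.out.println(table.regionFor("apple"));                 // region-a
        System.out.println(table.regionFor("melon"));                 // region-b
        System.out.println(table.fitsInOneRegion("apple", "zebra"));  // false
    }
}
```

Whether M7 performs an equivalent lookup without region servers is exactly the open question here; the sketch only shows the kind of key-range matching involved.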