HBase Region Sizing / Splitting

Question asked by tc_dev on Sep 18, 2013
Latest reply on Sep 18, 2013 by aditya
I have several related questions / comments about region sizing.

## 1) Ideal Region Sizing?

The HBase documentation strongly recommends "Bigger Regions" to reduce per-region overhead (per the HBase book). Is there a counterpoint to this view?

In other words, why have 256M as the default instead of just making all regions 4G or so? Specifically, are there any issues or caveats with 4G vs. 256M regions on MapRFS?
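To put numbers on the per-region overhead argument, here is a back-of-the-envelope sketch. It assumes a hypothetical 1 TiB table and perfectly even region sizes, which real tables never have; `regions_needed` is an illustrative helper, not an HBase API.

```python
import math

MIB = 1024 ** 2
GIB = 1024 ** 3
TIB = 1024 ** 4

def regions_needed(table_bytes: int, region_bytes: int) -> int:
    """Approximate region count if every region grew to exactly region_bytes."""
    return math.ceil(table_bytes / region_bytes)

# Compare the two region sizes discussed above for a hypothetical 1 TiB table.
for label, size in [("256M", 256 * MIB), ("4G", 4 * GIB)]:
    print(label, regions_needed(1 * TIB, size))
```

Under those assumptions, 256M regions give 4096 regions versus 256 at 4G, a 16x difference in the number of regions whose fixed costs the cluster has to carry.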

## 2) Manual Compactions After Merge Cause Splits?

After getting rid of several small regions with the merge tool, I found that the merge simply copies the store files together without compacting them. So I decided to run a major compaction manually via the shell (`major_compact 'tablename'`).

I was very surprised to find that, for every table, this operation forced a split of __exactly__ one region. In the bigger tables, the largest region split. A smaller table with just one 139M region also split in two, despite being under the 256M default. This behavior seems very wrong. What could be causing it, and how can I fix it?

## 3) Region Max Size Defaults Ignored?

The previous item raises a bigger issue: the max size defaults seem to be ignored. Besides the case above, the initial load of data into the table created several 20M, 50M and 100M regions, while other regions grew to >2G.

I did not set *hbase.hregion.max.filesize*, but isn't it supposed to default to 256M when not specified explicitly? Whatever the effective default is, it seems really strange for the same table to produce regions two orders of magnitude apart in size sitting side by side, as well as regions split below the max size.
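One possible explanation worth checking, assuming this is a 0.94-era HBase whose default split policy is `IncreasingToUpperBoundRegionSplitPolicy`: as I understand that policy, the split threshold is not simply `hbase.hregion.max.filesize` but `min(max filesize, memstore flush size * R^2)`, where R is the number of the table's regions on that regionserver. The sketch below models that rule in plain Python. The 128M flush size and 10G max filesize are assumed defaults to verify against your own `hbase-default.xml` and HBase source, and `size_to_check` / `should_split` are hypothetical helpers, not HBase APIs.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

# Assumed 0.94-era defaults; verify against your own hbase-default.xml.
FLUSH_SIZE = 128 * MIB    # hbase.hregion.memstore.flush.size
MAX_FILESIZE = 10 * GIB   # hbase.hregion.max.filesize

def size_to_check(table_regions_on_server: int,
                  flush_size: int = FLUSH_SIZE,
                  max_filesize: int = MAX_FILESIZE) -> int:
    """Split threshold modeled on IncreasingToUpperBoundRegionSplitPolicy (assumed)."""
    r = table_regions_on_server
    if r == 0 or r > 100:
        # Outside this range the policy falls back to the configured max filesize.
        return max_filesize
    # Threshold grows with the square of the region count, capped at max filesize.
    return min(max_filesize, flush_size * r * r)

def should_split(largest_store_bytes: int, table_regions_on_server: int) -> bool:
    """A region splits once its largest store exceeds the current threshold."""
    return largest_store_bytes > size_to_check(table_regions_on_server)

# Threshold in MiB for 1..10 regions of the table on one regionserver.
for r in range(1, 11):
    print(r, size_to_check(r) // MIB, "MiB")

print(should_split(139 * MIB, 1))  # True: 139M exceeds the 128M threshold
```

Under these assumptions, a table with a single region on a server splits at 128M, two regions at 512M, and from nine regions on the threshold is capped at 10G, which would produce neighbouring regions of very different sizes. A split check also appears to run after compactions complete, which would fit the split showing up right after `major_compact`.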

## Summary

Do you have any recommendations for fixing the above issues, choosing sensible region sizes, and actually enforcing them (i.e., preventing premature or unsanctioned region splits)? I understand one answer would be "use M7 and get rid of regions", but for now I just want to deploy regular HBase before exploring an upgrade to M7.
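If the goal is a fixed region size, one option to evaluate, assuming your HBase version supports the `hbase.regionserver.region.split.policy` setting, is to switch to `ConstantSizeRegionSplitPolicy` and set `hbase.hregion.max.filesize` explicitly. The sketch below only renders those two properties as an `hbase-site.xml` fragment. The 4G value is the figure floated in question 1, not a recommendation, and `site_properties` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

def site_properties(props: dict) -> str:
    """Render a minimal hbase-site.xml <configuration> fragment."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")

fragment = site_properties({
    # Fixed split threshold of 4 GiB (the size discussed in question 1).
    "hbase.hregion.max.filesize": 4 * 1024 ** 3,
    # Split only when a store exceeds the fixed threshold above.
    "hbase.regionserver.region.split.policy":
        "org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy",
})
print(fragment)
```

The generated `<property>` elements would go inside the existing `<configuration>` element of `hbase-site.xml` on the regionservers.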