What is meant by Compaction in HBase?
(At a very high level...)
Suppose you have an HBase table, and the data in that table changes (or is added to) over time. As an example, take customer data. Initially, you may only have a potential customer's name. In later interactions, you collect additional information on the customer (like purchases made, or credit card information). Or that information might change (the customer updates their credit card, or gets a new email address). The initial entry (the customer name) is written as part of one file. HBase does not update data in place: new writes are buffered in memory and periodically flushed to disk as new, immutable files. So the new (or updated) information ends up in a different file from the original entry. Over time, the information for that one customer is spread across several files. To read all of that customer's information, HBase has to check multiple files, which slows reads down.
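Compaction reads the data from several of those files and rewrites it into a single, larger file. A minor compaction merges a few smaller files. A major compaction rewrites all the files of a store into one and also removes data that is no longer valid, such as deleted cells and expired or excess versions. With fewer files to check, reads become more efficient.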
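To make the idea concrete, here is a toy in-memory model of the customer example. This is not HBase code. Each "file" is an immutable sorted map from a cell coordinate (written here as `row:column`) to a timestamped value or a delete marker. A read has to consult every file and take the newest cell, while compaction folds all files into one, keeping only the newest cell per coordinate and dropping deleted ones, roughly what a major compaction does for a column family that keeps a single version. The class and method names (`CompactionSketch`, `Cell`, `read`, `compact`, `demoFiles`) and the sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: HBase's real files and compaction logic are far more involved.
public class CompactionSketch {

    // A cell is either a value written at a timestamp, or a delete marker (tombstone).
    record Cell(long timestamp, String value, boolean tombstone) {
        static Cell put(long ts, String v) { return new Cell(ts, v, false); }
        static Cell delete(long ts) { return new Cell(ts, null, true); }
    }

    // Before compaction: every file must be checked, and the newest cell wins.
    static String read(List<TreeMap<String, Cell>> files, String key) {
        Cell newest = null;
        for (TreeMap<String, Cell> file : files) {
            Cell c = file.get(key);
            if (c != null && (newest == null || c.timestamp() > newest.timestamp())) {
                newest = c;
            }
        }
        return (newest == null || newest.tombstone()) ? null : newest.value();
    }

    // Merge all files into one, keep only the newest cell per key,
    // then drop keys whose newest cell is a delete marker.
    static TreeMap<String, Cell> compact(List<TreeMap<String, Cell>> files) {
        TreeMap<String, Cell> merged = new TreeMap<>();
        for (TreeMap<String, Cell> file : files) {
            for (Map.Entry<String, Cell> e : file.entrySet()) {
                merged.merge(e.getKey(), e.getValue(),
                        (a, b) -> a.timestamp() >= b.timestamp() ? a : b);
            }
        }
        merged.values().removeIf(Cell::tombstone);
        return merged;
    }

    // Hypothetical history for one customer, written over three flushes.
    static List<TreeMap<String, Cell>> demoFiles() {
        TreeMap<String, Cell> f1 = new TreeMap<>();
        f1.put("cust42:name", Cell.put(1, "Ada"));
        f1.put("cust42:email", Cell.put(1, "ada@old.example"));
        TreeMap<String, Cell> f2 = new TreeMap<>();
        f2.put("cust42:email", Cell.put(2, "ada@new.example")); // updated email
        f2.put("cust42:card", Cell.put(2, "card-1111"));        // new card
        TreeMap<String, Cell> f3 = new TreeMap<>();
        f3.put("cust42:card", Cell.delete(3));                   // card removed
        List<TreeMap<String, Cell>> files = new ArrayList<>();
        files.add(f1);
        files.add(f2);
        files.add(f3);
        return files;
    }

    public static void main(String[] args) {
        List<TreeMap<String, Cell>> files = demoFiles();
        System.out.println(read(files, "cust42:email")); // checks all 3 files
        TreeMap<String, Cell> compacted = compact(files);
        System.out.println(compacted.keySet());          // [cust42:email, cust42:name]
        System.out.println(read(List.of(compacted), "cust42:email")); // one file
    }
}
```

After compaction the same read touches one file instead of three, and the deleted card no longer takes up space. In a real cluster you would not write this yourself; HBase schedules compactions automatically, and a major compaction can also be requested manually, for example with the HBase shell command `major_compact 'table_name'`.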