
MapR M7: Best strategy for deleting data from M7 (last version) for a given set of keys

Question asked by bhardwaj_rajesh on Jul 24, 2014
We have complex Spark jobs that sometimes fail partway through. If some data has already been written to HBase at that point, that data is wrong. Is there a way to delete the last version of the data for a given set of keys, or can anyone suggest a better approach?

**Update: Found the answer**
Delete functionality in HBase:
public class Delete extends Mutation implements Comparable<Row>
Used to perform Delete operations on a single row.
To delete an entire row, instantiate a Delete object with the row to delete. To narrow the scope of what is deleted, call the additional methods outlined below.

To delete specific families, execute deleteFamily for each family to delete.

To delete multiple versions of specific columns, execute deleteColumns for each column to delete.

To delete specific versions of specific columns, execute deleteColumn for each column version to delete.
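
To make the four scopes above concrete, here is a minimal sketch in plain Java. It is a toy in-memory model of one row, not the HBase client API: the class `DeleteScopeModel` and its `put` and `cellCount` helpers are hypothetical names invented for illustration. The delete methods only borrow the names used in the Javadoc, and they remove data directly instead of writing delete markers as a real store would.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy model of a single row: family -> qualifier -> version timestamps. Not the HBase API. */
public class DeleteScopeModel {
    private final Map<String, Map<String, TreeSet<Long>>> row = new TreeMap<>();

    /** Hypothetical helper: record one cell version. */
    public void put(String family, String qualifier, long ts) {
        row.computeIfAbsent(family, f -> new TreeMap<>())
           .computeIfAbsent(qualifier, q -> new TreeSet<>())
           .add(ts);
    }

    /** Scope 1: the entire row. */
    public void deleteRow() {
        row.clear();
    }

    /** Scope 2: every column in one family. */
    public void deleteFamily(String family) {
        row.remove(family);
    }

    /** Scope 3: all versions of one column. */
    public void deleteColumns(String family, String qualifier) {
        Map<String, TreeSet<Long>> cols = row.get(family);
        if (cols != null) {
            cols.remove(qualifier);
        }
    }

    /** Scope 4: one specific version of one column. */
    public void deleteColumn(String family, String qualifier, long ts) {
        Map<String, TreeSet<Long>> cols = row.get(family);
        if (cols != null && cols.containsKey(qualifier)) {
            cols.get(qualifier).remove(ts);
        }
    }

    /** Hypothetical helper: number of cell versions still present in the row. */
    public int cellCount() {
        int n = 0;
        for (Map<String, TreeSet<Long>> cols : row.values()) {
            for (TreeSet<Long> versions : cols.values()) {
                n += versions.size();
            }
        }
        return n;
    }

    public static void main(String[] args) {
        DeleteScopeModel r = new DeleteScopeModel();
        r.put("cf1", "a", 1L);
        r.put("cf1", "a", 2L);
        r.put("cf1", "b", 1L);
        r.put("cf2", "c", 1L);
        System.out.println(r.cellCount()); // 4
        r.deleteColumn("cf1", "a", 2L);    // one version of cf1:a
        System.out.println(r.cellCount()); // 3
        r.deleteColumns("cf1", "a");       // remaining versions of cf1:a
        System.out.println(r.cellCount()); // 2
        r.deleteFamily("cf1");             // everything in cf1
        System.out.println(r.cellCount()); // 1
        r.deleteRow();                     // whatever is left
        System.out.println(r.cellCount()); // 0
    }
}
```

Each call in `main` removes a progressively wider slice of the row, which mirrors how the scope of a Delete widens from a single column version to the whole row.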

When a timestamp is specified, deleteFamily and deleteColumns delete all versions with a timestamp less than or equal to the one passed. If no timestamp is specified, an entry is added with a timestamp of 'now', where 'now' is the server's System.currentTimeMillis(). Specifying a timestamp to the deleteColumn method deletes only the version whose timestamp equals the one specified. If no timestamp is passed to deleteColumn, it internally looks up the most recent cell's timestamp and adds a delete at that timestamp; i.e. it deletes the most recently added cell.
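
The timestamp rules above are easy to mix up, so here is a small sketch of them in plain Java. It is a toy model of the versions of a single column, not the HBase client API. The class `DeleteTimestampModel` and the method names `deleteUpTo`, `deleteExact`, `deleteLatest`, and `remaining` are hypothetical, chosen only to mirror the three behaviours described. It also removes versions directly, whereas a real store would record delete markers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Toy model of one column's version timestamps. Not the HBase API. */
public class DeleteTimestampModel {
    // Timestamps of the versions currently visible, kept in ascending order.
    private final TreeSet<Long> versions = new TreeSet<>();

    public DeleteTimestampModel(long... timestamps) {
        for (long ts : timestamps) {
            versions.add(ts);
        }
    }

    /** Mirrors deleteColumns/deleteFamily with a timestamp: removes versions <= ts. */
    public void deleteUpTo(long ts) {
        versions.headSet(ts, true).clear();
    }

    /** Mirrors deleteColumn with a timestamp: removes only the version == ts. */
    public void deleteExact(long ts) {
        versions.remove(ts);
    }

    /** Mirrors deleteColumn without a timestamp: removes the most recent version. */
    public void deleteLatest() {
        versions.pollLast();
    }

    public List<Long> remaining() {
        return new ArrayList<>(versions);
    }

    public static void main(String[] args) {
        DeleteTimestampModel a = new DeleteTimestampModel(100L, 200L, 300L);
        a.deleteLatest();
        System.out.println(a.remaining()); // [100, 200]
        a.deleteExact(100L);
        System.out.println(a.remaining()); // [200]

        DeleteTimestampModel b = new DeleteTimestampModel(100L, 200L, 300L);
        b.deleteUpTo(200L);
        System.out.println(b.remaining()); // [300]
    }
}
```

For the original question, `deleteLatest` is the behaviour of interest: applied once per key, it drops only the newest version and leaves the older ones in place.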

The timestamp passed to the constructor is used ONLY for deleting whole rows. For anything narrower (a deleteColumn, deleteColumns, or deleteFamily), you need to use the method overloads that take a timestamp; the constructor timestamp is not referenced.