MapR DB uses tablets and stores its data in a proprietary file format, so I have to ask: what statistics, if any, are captured (e.g. min/max of the rowkey)?
What about ORC or Parquet files? Since these are columnar formats, are they capturing more metadata that would allow for a more efficient query against the data?
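To make concrete what I mean by statistics enabling a more efficient query, here is a toy sketch in plain Python. It is not the API of MapR DB, ORC, or Parquet; the names `build_chunks` and `range_scan` and the `user0000`-style keys are made up for illustration. It only assumes that data is stored in sorted chunks and that each chunk records the min and max rowkey, so a range scan can skip any chunk whose key range cannot match.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, int]  # (rowkey, value); hypothetical layout for illustration


def build_chunks(rows: List[Row], chunk_size: int) -> List[Dict]:
    """Sort rows by key and split them into chunks that carry min/max key stats."""
    ordered = sorted(rows)
    chunks = []
    for start in range(0, len(ordered), chunk_size):
        part = ordered[start:start + chunk_size]
        chunks.append({"min": part[0][0], "max": part[-1][0], "rows": part})
    return chunks


def range_scan(chunks: List[Dict], lo: str, hi: str) -> Tuple[List[Row], int]:
    """Return rows with lo <= key <= hi and the number of chunks actually read."""
    hits: List[Row] = []
    chunks_read = 0
    for chunk in chunks:
        # Skip the chunk entirely if its key range cannot overlap [lo, hi].
        if chunk["max"] < lo or chunk["min"] > hi:
            continue
        chunks_read += 1
        hits.extend(row for row in chunk["rows"] if lo <= row[0] <= hi)
    return hits, chunks_read


# Zero-padded keys so that string order matches numeric order.
rows = [(f"user{n:04d}", n) for n in range(1000)]
chunks = build_chunks(rows, chunk_size=100)
hits, chunks_read = range_scan(chunks, "user0250", "user0260")
print(len(hits), "rows from", chunks_read, "of", len(chunks), "chunks")
```

Without the min/max check, every chunk would have to be read to answer the same range predicate, which is the full table scan behavior I would like to avoid.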
The issue is one of performance. Since MapR DB doesn't have its own query language, you have to run either Hive or Drill on top of it. I haven't tested Drill, but Hive has some challenges when you want to do range scans instead of full table scans. You can work around this by implementing secondary indexes and writing inner select queries, but it would be nice to know whether there is anything else we can do to improve performance.
Oh, and I guess I could include SparkSQL, but I would still need to know what we can do to take advantage of the different file structures.