I am wondering how we can handle an inflow of frequently changing data that lands in MapR-FS as files, and later query those files from Drill.
The scenario is this: we have purchase order (PO) data coming from our ERP as file extracts into MapR-FS (we have no plans to use MapR-DB, as we would have to do further processing to load the data into MapR-DB tables).
On day 1, the file contains PO1, PO2 and PO3, all with the same values in columns C1, C2, C3 (X, Y, blank), with Order Qty = 100 units and Received Qty = 0.
On day 2, the file contains a few changes to PO1, with new values in columns C1, C2, C3 (X, Z, Z), Order Qty = 100 units and Received Qty = 95 units. We also get new PO4 and PO5.
In such cases, how should Drill know which files to pick, which columns to sum and which not to sum, and from which file to take the latest information? Each file is not a full snapshot of all POs; each day's file contains only the new POs and the changes to existing POs.
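To make the expected result concrete, here is a minimal Python sketch of the logic I have in mind. It is not Drill, only an illustration of "take the latest row per PO, then aggregate". The field names (`po`, `c1`, `order_qty`, and so on), the day numbers used as file keys, and the values for PO4 and PO5 are all assumptions made up for this example.

```python
# Hypothetical daily extracts, keyed by day number (assumed layout).
daily_files = {
    1: [
        {"po": "PO1", "c1": "X", "c2": "Y", "c3": "", "order_qty": 100, "received_qty": 0},
        {"po": "PO2", "c1": "X", "c2": "Y", "c3": "", "order_qty": 100, "received_qty": 0},
        {"po": "PO3", "c1": "X", "c2": "Y", "c3": "", "order_qty": 100, "received_qty": 0},
    ],
    2: [
        # Changed PO1 from the day 2 file.
        {"po": "PO1", "c1": "X", "c2": "Z", "c3": "Z", "order_qty": 100, "received_qty": 95},
        # New POs; their column values are assumed for illustration.
        {"po": "PO4", "c1": "X", "c2": "Y", "c3": "", "order_qty": 100, "received_qty": 0},
        {"po": "PO5", "c1": "X", "c2": "Y", "c3": "", "order_qty": 100, "received_qty": 0},
    ],
}


def latest_per_po(files):
    """Return one row per PO, taken from the most recent file that mentions it."""
    latest = {}
    for day in sorted(files):          # oldest file first
        for row in files[day]:
            # A later file overwrites the row from an earlier file.
            latest[row["po"]] = {**row, "day": day}
    return latest


current = latest_per_po(daily_files)

# Summing over every row in every file double-counts PO1.
naive_order_qty = sum(r["order_qty"] for rows in daily_files.values() for r in rows)

# Summing over the latest row per PO gives the intended totals.
total_order_qty = sum(r["order_qty"] for r in current.values())
total_received_qty = sum(r["received_qty"] for r in current.values())
```

With this data the naive sum of Order Qty counts PO1 twice, while the sum over the latest rows counts each of the five POs once and picks up C2 = Z and Received Qty = 95 for PO1 from the day 2 file. This is the behaviour I would like to get when querying the files from Drill.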
Has anyone built a scenario like this in big data directly at the FS level?
As I understand it, big data prefers immutable data, like web logs: once a record arrives, it never changes.