
Best Spark MapR-DB Bulk Loading Strategy

Question asked by john.humphreys on Jun 8, 2017
Latest reply on Apr 19, 2018 by john.humphreys



I just found the MapR-DB OJAI Connector for Apache Spark documentation, and it seems like what I'll need to efficiently load a large volume of data (230,000 data points per second) into MapR-DB. I have some questions about the best way to do this, though.
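For reference, here is roughly what I understand the connector's save path to look like, based on the docs (sketch only — the table path `/apps/metrics` and the `records` RDD are placeholders, and I'm assuming each document carries an `_id` field):

```scala
import com.mapr.db.spark._                 // MapR-DB OJAI Connector implicits
import org.apache.spark.{SparkConf, SparkContext}

object BulkLoadSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("BulkLoadSketch"))

    // Hypothetical RDD of OJAI documents; in practice this would come from
    // the 230K points/sec stream, batched per minute.
    val records = sc.parallelize(Seq(
      MapRDBSpark.newDocument("""{"_id": "host1:1496900000", "value": 42.0}""")
    ))

    // bulkInsert = true requests a bulk load; my understanding is the target
    // table must have been created with bulk load enabled for this to apply.
    records.saveToMapRDB("/apps/metrics", createTable = true, bulkInsert = true)
  }
}
```

This is exactly the part I'm unsure about: whether `bulkInsert = true` can be used repeatedly against a live, queryable table, or only on a one-shot initial load.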


Bulk Loading

I assume that for data rates like mine, I need to use bulk loading (tell me if not!).

  • Can I repeatedly bulk-load into the same table while it is being queried? I thought I read that this was not possible.
  • If you can't repeatedly bulk-load into the same table, how would you go about quickly loading a big block of data into MapR-DB once a minute? A new table every minute would be a little crazy.
  • Is there an optimal batch size for batch-inserting records? (And does binary vs. JSON change this?)
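For scale, a quick back-of-envelope on the once-a-minute option above (assuming a steady 230K points/sec, which is my nominal rate, not a measured peak):

```scala
object BatchSizeEstimate {
  def main(args: Array[String]): Unit = {
    val pointsPerSecond = 230000L
    val batchWindowSeconds = 60L

    // Records accumulated per one-minute batch: 230,000 * 60 = 13,800,000
    val rowsPerBatch = pointsPerSecond * batchWindowSeconds
    println(s"Rows per one-minute batch: $rowsPerBatch")
  }
}
```

So each per-minute load would be on the order of 13.8 million records, which is why I'm asking whether there's a sweet spot for batch/partition size when inserting.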


Binary vs JSON

Related to the last bullet above: does choosing binary tables vs. JSON tables materially affect load performance at these rates?


Thank You!