I just found the documentation for the MapR-DB OJAI Connector for Apache Spark, and it looks like what I need to efficiently load a large volume of data (230,000 data points per second) into MapR-DB. I have some questions about the best way to do this, though.
I assume that at these data rates I need to use bulk loading (tell me if not!).
- Can I repeatedly bulk-load to the same table while it is being queried? I thought I read that this was not possible.
- If you can't repeatedly load to the same table, how would you go about quickly loading a big block of data once a minute into MapR-DB? A table a minute would be a little crazy.
- Is there an optimal batch size for batch-inserting records? (And does binary vs JSON change this?)
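To put the once-a-minute load in concrete terms, here is a quick back-of-the-envelope calculation of how many data points pile up per interval. It is only a sketch: it assumes the 230,000 points/second rate is steady, and the helper name `points_per_interval` is made up for illustration and has nothing to do with the connector's API.

```python
# Back-of-the-envelope sizing for the ingest rate quoted above.
# Assumption: the rate is constant; real traffic may be burstier.
POINTS_PER_SECOND = 230_000  # rate stated in the question


def points_per_interval(seconds: int, rate: int = POINTS_PER_SECOND) -> int:
    """Number of data points that accumulate over `seconds` at a steady rate."""
    return rate * seconds


per_minute = points_per_interval(60)          # one load per minute
per_day = points_per_interval(24 * 60 * 60)   # total written per day

print(f"per minute: {per_minute:,}")
print(f"per day:    {per_day:,}")
```

So each one-minute block would be about 13.8 million data points, which is the batch size I am asking about.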
Binary vs JSON
- Will binary be much faster than JSON during loading?
- How do binary and JSON compare for storage size in general? Is JSON vastly larger?