
MapR Doc DB Storage?

Question asked by MichaelSegel on Jan 9, 2017
Latest reply on Jan 16, 2017 by maprcommunity

MapR-DB does not store the documents as a whole in a single location. Instead, MapR-DB creates fields for each attribute and nested documents/attributes. This allows MapR-DB to access the information very efficiently. When you read, for example using projection, or when you edit a document, only the necessary fields will be modified. MapR-DB can store very large documents, for example, multi-GB documents, if the application requires it.

This was taken from the MapR Documentation.

So if I understand this correctly...


Each JSON record has the structure <attribute label>:<object>, where the object is a single attribute value, an array of values, or a nested object (a set of attributes that themselves contain objects).

See link:

A document looks like this:

{
    "_id" : "001",
    "first_name" : "John",
    "last_name" : "Doe",
    "age" : 45,
    "email" : "jd@mydoc.com",
    "interests" : ["sports", "movies"],
    "address" : {
        "street" : "1015 Main Street",
        "city" : "San Jose",
        "state" : "CA",
        "zip" : "95106"
    }
}

So when it comes to storing the data...

 

For a given record, will there be one row in MapR-DB, where each attribute is a single cell and the attribute name is the column descriptor?
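
To make this first option concrete, here is a rough Python sketch of what I mean. It only illustrates the question and is not a description of how MapR-DB actually stores data; `to_row` and the dict-of-cells model are hypothetical stand-ins.

```python
# Hypothetical illustration of "one row per document, one cell per attribute".
# This is NOT MapR-DB's actual storage layout, only a model of the question.

doc = {
    "_id": "001",
    "first_name": "John",
    "last_name": "Doe",
    "age": 45,
    "email": "jd@mydoc.com",
    "interests": ["sports", "movies"],
    "address": {
        "street": "1015 Main Street",
        "city": "San Jose",
        "state": "CA",
        "zip": "95106",
    },
}


def to_row(document):
    """Return (row_key, cells): the _id becomes the row key and every other
    top-level attribute becomes one cell keyed by its full attribute name."""
    row_key = document["_id"]
    cells = {name: value for name, value in document.items() if name != "_id"}
    return row_key, cells


row_key, cells = to_row(doc)
for descriptor, value in cells.items():
    print(row_key, descriptor, value)
```

In this model the full attribute name is repeated as the descriptor in every row, and arrays and nested objects each sit in one cell.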

Or do you store the attribute name separately and assign a tag of sorts as the column descriptor?

(This would reduce storage requirements, because attribute descriptors tend to be long, e.g. first_name.)

[The attribute-to-tag mapping could be stored in a separate table.]
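
A minimal sketch of the tag idea, assuming a simple in-memory dictionary stands in for the separate mapping table. The names `encode_with_tags`, `decode_with_tags`, and `tag_table` are made up for illustration, and nothing here is taken from MapR-DB itself.

```python
# Hypothetical illustration of replacing long attribute names with short tags.
# `tag_table` stands in for the separate attribute-to-tag mapping table.


def encode_with_tags(document, tag_table):
    """Replace each top-level attribute name with a small integer tag."""
    encoded = {}
    for name, value in document.items():
        # The first time a name is seen it gets the next free tag;
        # afterwards the existing tag is reused.
        tag = tag_table.setdefault(name, len(tag_table))
        encoded[tag] = value
    return encoded


def decode_with_tags(encoded, tag_table):
    """Reverse the mapping to recover the original attribute names."""
    names = {tag: name for name, tag in tag_table.items()}
    return {names[tag]: value for tag, value in encoded.items()}


tag_table = {}
first = encode_with_tags({"first_name": "John", "last_name": "Doe"}, tag_table)
second = encode_with_tags({"last_name": "Smith", "age": 30}, tag_table)
restored = decode_with_tags(second, tag_table)
```

Each stored cell carries only a small tag, and the long name is stored once in the mapping, at the price of an extra lookup on every read and write.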

 

But then do you store the arrays in a single cell?

 

What about sub-records like address? Would they also be stored in a single cell, or in a set of cells with different tags?
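
The two alternatives for arrays and sub-records can be sketched as follows. This is an assumption-laden illustration of the question only: `single_cell` and `flatten` are hypothetical helpers, and the dotted-path and `[index]` naming is my own convention, not something from the MapR documentation.

```python
import json

# Hypothetical illustration of two ways to store arrays and sub-records.
# Neither is claimed to be what MapR-DB does internally.


def single_cell(value):
    """Option A: serialize the whole array or sub-record into one cell."""
    return json.dumps(value, sort_keys=True)


def flatten(value, prefix=""):
    """Option B: one cell per leaf value, keyed by a path such as
    'address.city' or 'interests[0]'."""
    cells = {}
    if isinstance(value, dict):
        for name, child in value.items():
            path = f"{prefix}.{name}" if prefix else name
            cells.update(flatten(child, path))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            cells.update(flatten(child, f"{prefix}[{index}]"))
    else:
        cells[prefix] = value
    return cells


record = {
    "interests": ["sports", "movies"],
    "address": {"city": "San Jose", "state": "CA"},
}

option_a = {name: single_cell(value) for name, value in record.items()}
option_b = flatten(record)
```

With option A, reading or updating one element means rewriting the whole cell. With option B, a single leaf can be read or changed on its own, but there are more cells and longer descriptors.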

 

The reason I ask is that there are a couple of ways to do this and each option has its pluses and minuses.

 

While this is internal to MapR-DB, understanding the storage schema is important for understanding how to improve performance.

 

Thx

 

-Mike
