
How does node and volume topology impact storage efficiency and how rigidly is it enforced?

Question asked by davidehle on Mar 15, 2017
Latest reply on Mar 24, 2017 by mufeed

Suppose a cluster is segmented into multiple layers of node topologies, for example:







Based on the MapR 5.2 docs, I believe the CLDB tries to distribute container replicas so that losing any one rack does not take out all replicas of a container.
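To make that expectation concrete, here is a toy Python sketch of rack-aware placement. It is not MapR's actual placement algorithm, only an illustration of the behavior described above. The function names are made up, the node names (nodeA to nodeD) are hypothetical, and the switch1/rack2/rack3 layout is borrowed from question 2 below.

```python
from collections import defaultdict
from itertools import zip_longest


def rack_of(topology_path):
    """Return the parent topology of a node, e.g. '/switch1/rack2/nodeA' -> '/switch1/rack2'."""
    return topology_path.rsplit("/", 1)[0]


def place_replicas(node_paths, replication):
    """Pick `replication` nodes, visiting racks round-robin so copies spread out."""
    by_rack = defaultdict(list)
    for path in sorted(node_paths):
        by_rack[rack_of(path)].append(path)
    # Interleave racks: the first node of every rack, then the second node of every rack, ...
    ordered = [
        node
        for layer in zip_longest(*(by_rack[rack] for rack in sorted(by_rack)))
        for node in layer
        if node is not None
    ]
    return ordered[:replication]


# Hypothetical nodes under the switch1 topology.
nodes = [
    "/switch1/rack2/nodeA", "/switch1/rack2/nodeB",
    "/switch1/rack3/nodeC", "/switch1/rack3/nodeD",
]
chosen = place_replicas(nodes, 3)
racks_used = {rack_of(node) for node in chosen}
print(chosen)
print(racks_used)
```

With only two racks available, three replicas necessarily put two copies in one rack, which is the situation question 2 asks about.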


Question: If MapR-FS is near capacity and the balancer cannot free enough space to rebalance large containers, what happens? Is any alert raised when replica distribution across the topology is not at target?


Using a Volume Topology allows assigning volumes to a specific Node Topology. All the containers in those volumes are then stored on nodes in the associated Node Topology.


1. If a volume with an assigned Volume Topology, or the containers in that volume, grows too big to be accommodated by the storage pools (SPs) in the associated Node Topology, what happens?

    * Do new writes fail, even if there is sufficient capacity on Nodes/SP outside the assigned topology?  

    * Is data written to nodes outside the topology?

    * What alert, if any, would be raised?

2. In the example above, if 3 replicas are requested but the Node Topology associated with the Volume Topology is switch1, which has only two sub-topologies (rack2 and rack3), how would the 3rd replica be placed to reduce risk?
3. How does capacity balancing work with regard to node and volume topologies? For example, what if "cldb.balancer.disk.threshold.percentage" were set to 50%?
5. Are there any problematic interactions between label-based scheduling and topologies if the labels and topologies do not match?
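The capacity scenario behind questions 1 and 3 can be sketched with a small Python example. All numbers and node names here are invented, the helper names are hypothetical, and the sketch assumes the balancer threshold is read as a per-storage-pool utilization percentage. It says nothing about what MapR actually does in this situation; it only shows how a restricted topology can be nearly full while the cluster as a whole still has room.

```python
def under(topology, path):
    """True if `path` is the topology itself or lies somewhere beneath it."""
    return path == topology or path.startswith(topology.rstrip("/") + "/")


def free_gb(pools, topology="/"):
    """Sum the free space of all storage pools under `topology`."""
    return sum(cap - used for path, used, cap in pools if under(topology, path))


def over_threshold(pools, topology, threshold_pct):
    """List pools under `topology` whose utilization exceeds `threshold_pct`."""
    return [
        path
        for path, used, cap in pools
        if under(topology, path) and 100 * used / cap > threshold_pct
    ]


# Hypothetical storage pools: (node topology path, used GB, capacity GB).
pools = [
    ("/switch1/rack2/nodeA", 90, 100),
    ("/switch1/rack3/nodeC", 95, 100),
    ("/switch2/rack4/nodeE", 20, 100),
]

free_in_volume_topology = free_gb(pools, "/switch1")  # space the volume may use
free_cluster_wide = free_gb(pools)                    # space that exists overall
hot_pools = over_threshold(pools, "/switch1", 50)     # pools above a 50% threshold

print(free_in_volume_topology, free_cluster_wide, hot_pools)
```

In this made-up layout every pool the volume is allowed to use is above the 50% threshold, so any rebalancing that stays inside /switch1 has nowhere to move data, even though /switch2 is mostly empty.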

In general, are there any frequently overlooked pitfalls or risks involved in using node or volume topologies? I have read the MapR 5.2 documentation, which does a good job of explaining what can be done but does not detail the risks or issues that might arise.

Thanks in advance for any answers!