Can anyone clarify how load balancing is handled in a MapR cluster?
Load balancing is a way to make sure that popular data does not cause bottlenecks in the system when many users or applications try to access that data at the same time. If you make a mirror copy of a volume that is frequently accessed, read requests can be served by the mirror volume as well as the source volume.
Can you provide a little more detail about what you are asking? The possibilities that come to mind are using local mirrors for load balancing, or using the role and/or disk balancers. Or you might be asking about load balancing in general. If you can provide a little more detail, I will try to help!
I am asking this question with respect to ESS 102, which says: "MapR-FS also provides the option of mirroring entire volumes. Mirroring provides additional remote disaster recovery back-ups, as well as local load balancing."
By default, there are 3 replicas of all data. If many users or applications try to access the data at the same time, are the requests served from replica 1, replica 2, and replica 3, which all belong to the same container? Correct me if I am wrong. When does reading from a mirror volume come into play?
To answer your question directly: yes, all replicas can and will be read from.
In more detail:
There are two types of containers in MapR: Name Containers and Data Containers.
Assuming three-way replication, you have a Master Container, a Secondary container, and a Tail container. All writes go to the Master, but reads can be served from any of the three.
When a client makes a file access request to the CLDB, the CLDB returns a mapping to the client, and the client can read from any of the containers in that mapping.
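A minimal Python sketch of the read/write split described above. `ContainerMapping` is a hypothetical stand-in for the mapping the CLDB returns, not the real client protocol, and the round-robin read policy is an assumption chosen for illustration: writes always go to the Master, while reads rotate across every copy.

```python
import itertools


class ContainerMapping:
    """Hypothetical stand-in for the replica locations the CLDB returns for one container.

    nodes[0] holds the Master Container; the remaining entries are the
    Secondary and Tail copies, in chain order.
    """

    def __init__(self, container_id: int, nodes: list[str]) -> None:
        self.container_id = container_id
        self.nodes = list(nodes)
        # Round-robin iterator over every copy, Master included (assumed policy).
        self._read_cycle = itertools.cycle(self.nodes)

    def node_for_write(self) -> str:
        # All writes are sent to the Master Container.
        return self.nodes[0]

    def node_for_read(self) -> str:
        # Any copy can serve a read; rotating spreads the load across nodes.
        return next(self._read_cycle)


mapping = ContainerMapping(2049, ["node-a", "node-b", "node-c"])
print(mapping.node_for_write())                     # node-a
print([mapping.node_for_read() for _ in range(4)])  # ['node-a', 'node-b', 'node-c', 'node-a']
```

The point of the sketch is only that read traffic for one container lands on three nodes instead of one, while the write path stays fixed.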
The terms Name Container and Data Container are new to me. Can you please elaborate on them?
The storage structure of MapR-FS is as follows:
Each volume that is created receives a single Name Container, which is replicated throughout its volume. The number of Data Containers in your volume depends on your data set. For the most part this doesn't matter, so for this explanation let's assume your data set requires 100 Data Containers.
Your volume consists of 1 Name Container and 100 Data Containers. The Name Container holds all metadata for all of the Data Containers in that volume, as well as the first 64 KB of each file in that volume. After replication, you have 3 copies of each of these containers.
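The numbers in this example can be checked with a small Python sketch. It models only the accounting described here (one Name Container per volume holding metadata plus the first 64 KB of each file, three copies of every container); the assumption that the rest of each file lives in Data Containers and the function names are illustrative, not MapR-FS internals.

```python
REPLICATION = 3                   # default three-way replication
NAME_CONTAINER_BYTES = 64 * 1024  # first 64 KB of each file is kept in the Name Container


def container_copies(data_containers: int, replication: int = REPLICATION) -> int:
    """Total container copies in one volume: 1 Name Container + N Data Containers, each replicated."""
    return (1 + data_containers) * replication


def split_file(size_bytes: int) -> tuple[int, int]:
    """Return (bytes in the Name Container, bytes in Data Containers) for one file (assumed split)."""
    head = min(size_bytes, NAME_CONTAINER_BYTES)
    return head, size_bytes - head


print(container_copies(100))    # 303
print(split_file(10 * 1024))    # (10240, 0)
print(split_file(1024 * 1024))  # (65536, 983040)
```

So the 100-Data-Container volume in the example amounts to 101 containers, or 303 container copies once replicated.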
To simplify: the Name Container is responsible for tracking what's in the Data Containers, and the Data Containers store the raw data.
Hope this clears it up.
To add to what Deborah mentioned: MapR can spread read requests across mirrors if they are set up correctly.
I have three volumes:
If I want to increase read spread when accessing /1/2, I need to create local mirrors on all 3 volumes. This essentially makes the file system read-only; however, a special .rw directory is created at /.rw for any updates that are needed after load balancing is enabled.
Found 53 items
drwxr-xr-x - mapr mapr 42 2016-07-12 11:26 /.rw/1
drwxr-xr-x - mapr mapr 42 2016-07-12 11:26 /.rw/1-m
drwxr-xr-x - mapr mapr 43 2016-07-12 11:35 /.rw/2-m
-rw-r--r-- 3 mapr mapr 45 2016-07-12 11:01 /.rw/Trolltech.conf
drwxr-xr-x - mapr mapr 0 2016-07-12 10:07 /.rw/apps
-rw-r--r-- 3 mapr mapr 148 2016-07-12 11:01 /.rw/asound.conf
drwxr--r-- - root root 1 2016-07-12 10:28 /.rw/benchmarks
Trying to copy a file into any of my volumes results in an error unless I copy it through /.rw.
Also note that any break in the mirror volume chain removes load balancing, for example if I stopped mirroring /1.
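A minimal Python sketch of that rule, assuming the root volume is served from local mirrors as in this example: any path you want to write to is rewritten to live under /.rw. The helper `writable_path` is hypothetical; in practice you simply give the /.rw path to your copy command.

```python
import posixpath


def writable_path(path: str, root_is_mirrored: bool = True) -> str:
    """Map a cluster path to the path to use for writes (hypothetical helper).

    When the volumes are served from read-only local mirrors, updates have to
    go through /.rw instead of the normal path.
    """
    normalized = posixpath.normpath("/" + path.lstrip("/"))
    # Nothing to rewrite if mirroring is off or the path is already under /.rw.
    if not root_is_mirrored or normalized == "/.rw" or normalized.startswith("/.rw/"):
        return normalized
    if normalized == "/":
        return "/.rw"
    return "/.rw" + normalized


print(writable_path("/1/2/file.txt"))  # /.rw/1/2/file.txt
print(writable_path("/.rw/apps"))      # /.rw/apps
```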
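The chain rule can be sketched in Python as follows. This models the rule as stated in this thread, not MapR's implementation, and the layout with volumes mounted at /, /1 and /1/2 is a hypothetical example: reads on a path are spread over mirrors only if every volume mounted along that path has a local mirror.

```python
def mount_chain(path: str, mount_points: set[str]) -> list[str]:
    """Return the mount points on the way from / down to path, outermost first."""
    parts = [p for p in path.split("/") if p]
    prefixes = ["/"] + ["/" + "/".join(parts[:i]) for i in range(1, len(parts) + 1)]
    return [p for p in prefixes if p in mount_points]


def reads_load_balanced(path: str, mirrored: dict[str, bool]) -> bool:
    """True only if every volume in the path's mount chain has a local mirror."""
    chain = mount_chain(path, set(mirrored))
    return bool(chain) and all(mirrored[mp] for mp in chain)


volumes = {"/": True, "/1": True, "/1/2": True}       # hypothetical layout, all mirrored
print(reads_load_balanced("/1/2/data.csv", volumes))  # True

volumes["/1"] = False                                 # stop mirroring /1: chain is broken
print(reads_load_balanced("/1/2/data.csv", volumes))  # False
```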
Hope this helps.