
The balancer is not working correctly

Question asked by pablo on Jun 1, 2012
Latest reply on Jun 1, 2012 by steven
Hello,

The data is not being balanced correctly across the cluster. Here is my scenario:

 1. I'm working with a trial of the M5 version.
 2. I have one cluster with 6 nodes.
 3. Three of those nodes run the CLDB.
 4. I checked that the CLDB is running correctly (see the command just after this list).
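
This is roughly how I checked it (I'm assuming `svc` is the shorthand for the services column in `node list`):

    # Show which services each node reports, to confirm where the CLDB runs
    maprcli node list -columns svc,hostname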

Before inserting the data, the balancer was turned **on**:

    maprcli config save -values {"cldb.balancer.disk.paused":"0"}
    maprcli config save -values {"cldb.balancer.role.paused":"0"}

Also, purely for testing, I configured the balancer to move data as soon as a node is at least 5% full, because I want balancing to start well before any node reaches 70% of its capacity:

    maprcli config save -values {"cldb.balancer.disk.threshold.percentage":"5"}
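
To double-check that the values were actually stored, I read them back (I'm assuming `config load -keys` is the right way to query individual settings):

    maprcli config load -keys cldb.balancer.disk.paused,cldb.balancer.role.paused,cldb.balancer.disk.threshold.percentage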

But the nodes are not balanced correctly: the data ends up only on the nodes where the CLDB process is running.

    $ maprcli dump balancerinfo
    ip:port              usedMB  fullnessLevel  fsid                 spid                              percentage  outTransitMB  inTransitMB  capacityMB 
    10.90.245.70:5660-   4246    Average        5596617164943689343  23788a04beb467dc004fa16f5f06bf1a  9           0             0            44641      
    10.93.73.184:5660-   4246    Average        6472875469568091424  b9585b01f7c2dd52004fa16fa3032d2d  9           0             0            44641      
    10.87.141.213:5660-  4246    Average        8510787454975032236  f44cb37836c1029b004fa1db6a068688  9           0             0            44641      
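
Since only three storage pools show up there, I can also check where the containers of my test volume were placed (`my.test.volume` is just a placeholder for the volume I loaded the data into, and I'm assuming `dump volumeinfo` is the command that reports container locations):

    maprcli dump volumeinfo -volumename my.test.volume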

So I checked the node info:

    $ maprcli node list -columns id,h,hn,br,da,dtotal,dused,davail,fs-heartbeat
    id                   davail  dused  bytesReceived  hostname                                   dtotal  health  fs-heartbeat  ip            
    357962820359655901   0       0      0              domU-12-31-38-00-4E-75.compute-1.internal  0       4       1338570769    10.252.81.127 
    6472875469568091424  39      4      565            ip-10-120-179-43.ec2.internal              43      0       0             10.120.179.43 
    8510787454975032236  39      4      561            ip-10-120-241-56.ec2.internal              43      0       0             10.120.241.56 
    1592873460759174276  0       0      0              ip-10-87-147-136.ec2.internal              0       4       1338570769    10.87.147.136 
    5596617164943689343  39      4      3474           ip-10-93-73-184.ec2.internal               43      0       0             10.93.73.184  
    6028873629841166767  0       0      0              ip-10-93-73-28.ec2.internal                0       4       1338570769    10.93.73.28   

 - As you can see, three nodes are in health state 4 (Upgrading). What does that mean?
 - I think this could be the reason why the data is not distributed correctly across all the nodes. What do you think?
 - Do you know how I can resolve health state 4 on those nodes?
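
In the meantime, this is what I plan to run on one of the nodes reporting health 4 (taking ip-10-87-147-136.ec2.internal as an example; I'm assuming these are the right commands to list its services and its MapR-FS disks):

    # Which services does this node report?
    maprcli service list -node ip-10-87-147-136.ec2.internal
    # Are any disks set up for MapR-FS on it? (dtotal is 0 in the output above)
    maprcli disk list -host 10.87.147.136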

Thanks!
