Balancing after adding drives

Question asked by nneubauer on Dec 10, 2012
Latest reply on Dec 16, 2012 by nabeel
I recently added 3 drives to 3 nodes in my MapR deployment, which seemed to work fine. Afterwards I increased the redundancy level from 2 to 3 on a specific volume (the Hypertable database volume) and MapR started to re-replicate. However, the database didn't come up, so I decided to reboot all nodes. Since the reboot, the volume in question shows:

18% 1x redundancy
13% 2x redundancy
68% 3x redundancy

(which doesn't add up to 100, but that might be due to rounding...)
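
As a quick check of the rounding guess, here is a minimal sketch. It assumes the interface truncates each percentage to a whole number (I don't know whether it actually does), and the container counts are hypothetical values chosen only to reproduce the figures above.

```python
# Hypothetical container counts per redundancy level (not taken from the cluster)
counts = {"1x": 185, "2x": 135, "3x": 680}
total = sum(counts.values())

# Assumption: each share is truncated to a whole percent before display
shown = {level: (100 * n) // total for level, n in counts.items()}

print(shown)
print("sum of displayed percentages:", sum(shown.values()))
```

With truncation, shares of 18.5%, 13.5% and 68.0% are displayed as 18, 13 and 68, so a total of 99 is consistent with plain rounding loss.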

However, that has not changed in the past 24 hours, and I see lots of entries like this in the cldb.log file:

    2012-12-11 11:52:00,616 INFO  com.mapr.fs.cldb.Containers [pool-1-thread-719]: Processing stale containers  on StoragePool 0cb6fa6e3376285800503ca4d001d25c from FileServer XXX.XXX.XXX.XXX:5660-

There are no errors in mfs.log on the master node. I'm not sure what is going on, or whether this process is supposed to take this long. Can/should I start re-replication manually? Machine monitoring shows almost no CPU, hard drive, or network activity.

Any advice?


#edit

----


1) I'm running: 1.2.3.12961.GA

2) maprcli dump rereplicationinfo -json:

    [root@masternode ~]# maprcli dump rereplicationinfo -json
    {
     "timestamp":1355301682947,
     "status":"OK",
     "total":0,
     "data":[
     
     ]
    }

---

IPs shortened. Interestingly, one of the nodes is flagged as "INVALID". I don't know exactly what that means, but the node is green in the MapR web interface and has plenty of space left on its (two) hard drives. It was *not* one of the nodes I added disks to. However, I noticed that the interface shows 64% disk usage for that node, while the breakdown of the disks in the machine shows one volume with 48% usage and one volume with 0% usage. That does not seem to add up.

    [root@masternode ~]# maprcli dump replicationmanagerinfo -volumename hypertable -json
    {
     "timestamp":1355301757255,
     "status":"OK",
     "total":5,
     "data":[
      {
       "VolumeName":"hypertable",
       "VolumeId":187333067,
       "VolumeTopology":"/default-rack",
       "VolumeUsedSizeMB":334466,
       "VolumeReplication":3,
       "VolumeMinReplication":2
      },
      {
       "ContainerId":2322,
       "Epoch":20,
       "Master":"131.173.32.143:5660--20-VALID",
       "ActiveServers":{
        "IP:Port":[
         "143:5660--20-VALID",
         "119:5660--18-RESYNC"
        ]
       },
       "InactiveServers":{
        "IP:Port":"145:5660--20-INVALID"
       },
       "UnusedServers":{
       
       },
       "OwnedSizeMB":"3.15 GB",
       "SharedSizeMB":"0 MB",
       "LogicalSizeMB":"3.15 GB",
       "Mtime":"Mon Dec 10 14:57:15 CET 2012"
      },
      {
       "ContainerId":2140,
       "Epoch":31,
       "Master":"131.173.32.143:5660--31-VALID",
       "ActiveServers":{
        "IP:Port":[
         "143:5660--31-VALID",
         "114:5660--31-VALID",
         "119:5660--29-RESYNC"
        ]
       },
       "InactiveServers":{
       
       },
       "UnusedServers":{
       
       },
       "OwnedSizeMB":"568 MB",
       "SharedSizeMB":"0 MB",
       "LogicalSizeMB":"568 MB",
       "Mtime":"Mon Dec 10 14:57:15 CET 2012",
       "NameContainer":"true"
      },
      {
       "ContainerId":2569,
       "Epoch":17,
       "Master":"131.173.32.143:5660--17-VALID",
       "ActiveServers":{
        "IP:Port":[
         "143:5660--17-VALID",
         "119:5660--16-RESYNC"
        ]
       },
       "InactiveServers":{
        "IP:Port":"145:5660--17-INVALID"
       },
       "UnusedServers":{
       
       },
       "OwnedSizeMB":"12.41 GB",
       "SharedSizeMB":"0 MB",
       "LogicalSizeMB":"12.52 GB",
       "Mtime":"Mon Dec 10 13:19:03 CET 2012"
      },
      {
       "ContainerId":2571,
       "Epoch":17,
       "Master":"131.173.32.143:5660--17-VALID",
       "ActiveServers":{
        "IP:Port":[
         "143:5660--17-VALID",
         "119:5660--16-RESYNC"
        ]
       },
       "InactiveServers":{
        "IP:Port":"145:5660--17-INVALID"
       },
       "UnusedServers":{
       
       },
       "OwnedSizeMB":"5.72 GB",
       "SharedSizeMB":"0 MB",
       "LogicalSizeMB":"5.72 GB",
       "Mtime":"Mon Dec 10 11:54:38 CET 2012"
      }
     ]
    }
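
To get an overview of which containers have replicas stuck in RESYNC or INVALID, the output above can be tallied with a short script. This is only a sketch: the `sample` string is a trimmed copy of two container entries from the dump, the helper names (`server_entries`, `replica_states`, `summarize`) are made up for illustration, and it assumes the state is always the last `-`-separated token of each server entry, as it is in the output shown.

```python
import json
from collections import Counter

# Trimmed copy of the dump above (volume header plus two containers)
sample = '''
{
 "status": "OK",
 "data": [
  {"VolumeName": "hypertable", "VolumeReplication": 3, "VolumeMinReplication": 2},
  {"ContainerId": 2322,
   "ActiveServers": {"IP:Port": ["143:5660--20-VALID", "119:5660--18-RESYNC"]},
   "InactiveServers": {"IP:Port": "145:5660--20-INVALID"},
   "UnusedServers": {}},
  {"ContainerId": 2140,
   "ActiveServers": {"IP:Port": ["143:5660--31-VALID", "114:5660--31-VALID",
                                 "119:5660--29-RESYNC"]},
   "InactiveServers": {},
   "UnusedServers": {}}
 ]
}
'''

def server_entries(group):
    """Return the server strings of a group; "IP:Port" may be a list, a string, or absent."""
    value = group.get("IP:Port", [])
    return [value] if isinstance(value, str) else list(value)

def replica_states(container):
    """Count replicas per state (VALID, RESYNC, INVALID, ...) for one container."""
    states = Counter()
    for key in ("ActiveServers", "InactiveServers", "UnusedServers"):
        for entry in server_entries(container.get(key, {})):
            # Assumption: the state is the text after the last '-'
            states[entry.rsplit("-", 1)[1]] += 1
    return states

def summarize(report):
    """Map ContainerId to its state counts, skipping the volume header entry."""
    return {c["ContainerId"]: replica_states(c)
            for c in report["data"] if "ContainerId" in c}

summary = summarize(json.loads(sample))
for container_id, states in summary.items():
    print(container_id, dict(states))
```

To run it against the full dump, the output of `maprcli dump replicationmanagerinfo -volumename hypertable -json` could be saved to a file and loaded with `json.load` instead of using the embedded sample.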
