
Replication failure after CLDB outage

Question asked by terrys on Jan 1, 2013
Latest reply on Jan 8, 2013 by terrys
I am currently having an issue with replication completing on 2 nodes in my cluster, specifically the logs and metrics volumes.  The problem started when we encountered an out-of-space condition on the cluster, which caused a CLDB failure.  I was able to restart the CLDB on a different node that still had some free disk, and we were then able to add more disks to the cluster and delete some old data.  The data volumes fully re-replicated after the outage; however, the metrics and logs volumes have been stuck at partial replication for a week.
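
In case it's useful, this is roughly how I've been checking the replication state; the volume name below is a placeholder:

     # List raised alarms; under-replicated volumes show up here.
     maprcli alarm list -json

     # Per-volume details, including replication state, for one of the
     # stuck volumes ("mapr.nodea.local.metrics" is a placeholder name).
     maprcli volume info -name mapr.nodea.local.metrics -json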

I ran a maprcli dump volumeinfo -volumename command and see some containers with a master of "unknown ip", which I assume is the problem.  Since this is log and metric data, I'm less interested in recovering it and would rather see the cluster's health return to normal.  What is the best way to resolve this?
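
For reference, this is roughly the command I ran; the volume name is a placeholder, and -json produces the structured output shown below:

     # Dump container-level details for one of the stuck volumes.
     # "mapr.nodea.local.logs" is a placeholder volume name.
     maprcli dump volumeinfo -volumename mapr.nodea.local.logs -json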

Snipped output of maprcli dump volumeinfo

     {
                        "ContainerId":2355,
                        "Epoch":9,
                        "Master":"unknown ip (0)-0-VALID",
                        "ActiveServers":{

                        },
                        "InactiveServers":{

                        },
                        "UnusedServers":{
                                "IP:Port":[
                                        "172.20.3.44:5660--9",
                                        "172.20.3.41:5660--9"
                                ]
                        },
                        "OwnedSizeMB":"115 MB",
                        "SharedSizeMB":"0 MB",
                        "LogicalSizeMB":"115 MB",
                        "TotalSizeMB":"115 MB",
                        "NumInodesInUse":1899,
                        "Mtime":"Fri Dec 21 09:27:58 CST 2012",
                        "NameContainer":"false"
                },
