
How to fix stale containers? Removing the stale containers is also fine

Question asked by hittalamani on Mar 10, 2017
Latest reply on Mar 29, 2017 by maprcommunity

We have a cluster with 6 nodes. During a power outage one of the nodes went down; since then a few of the containers are stale and the cluster has slowed down.

5309 is one of the containers, with the following status:

 

5309   6760522710595 MB   10.0.28.153--10-BECOME_MASTER   10.0.28.153--10-BECOME_MASTER, 10.0.28.156--2-RESYNC
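For anyone who wants to check many containers at once, here is a minimal sketch that pulls the replica entries out of a status line like the one above. It assumes each entry has the layout `<ip>--<number>-<STATE>`. The post does not say what the middle number means, so the sketch just calls it `number`. The function names, and the choice of `VALID` as the only healthy state, are assumptions for illustration.

```python
import re

# Assumed entry layout: <ip>--<number>-<STATE>, e.g. 10.0.28.156--2-RESYNC.
# What the middle number means is not stated in the post.
REPLICA_RE = re.compile(r"(\d+\.\d+\.\d+\.\d+)--(\d+)-([A-Z_]+)")


def parse_replicas(field):
    """Return one dict per replica entry found in the given text."""
    return [
        {"host": host, "number": int(number), "state": state}
        for host, number, state in REPLICA_RE.findall(field)
    ]


def not_valid(replicas, ok_states=("VALID",)):
    """Return replicas whose state is not in ok_states (assumed healthy set)."""
    return [r for r in replicas if r["state"] not in ok_states]


if __name__ == "__main__":
    field = "10.0.28.153--10-BECOME_MASTER,10.0.28.156--2-RESYNC,"
    for replica in not_valid(parse_replicas(field)):
        print(replica["host"], replica["state"])
```

For container 5309 this reports both replicas, since neither is in the assumed healthy state.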

 

It is neither becoming valid nor getting removed.

The CLDB logs show:

 

2017-03-10 13:14:30,7112 ERROR shard.cc:285 DeleteDanglingProc 5309.95.3280906 rpc failed 119
2017-03-10 13:14:30,7112 ERROR purgefidmap.cc:397 Purge fidmap 2049.243.7950110 : ShardDelete failed 119
2017-03-10 13:14:30,7112 ERROR truncate.cc:1189 PurgeFidmap failed 119 in inline trunc of fidmap 2049.243.7950110
2017-03-10 13:15:30,7152 INFO remotefs.cc:322 request 42 to master 10.0.28.153:5660 for container 5309 failed with 19, will retry after refreshing ContainerInfo fro
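To see which failure codes keep recurring, the log lines above can be summarized with a short script. This is only a sketch: the regular expressions are fitted to the four lines pasted here, and `parse_line` is a hypothetical helper name. Mapping the numeric codes to errno names is an assumption (the logs do not say the codes are errno values), and errno names vary by platform.

```python
import errno
import re

# Pattern fitted to the log lines in this post:
# <date> <time>,<n> <LEVEL> <source file>:<line> <message>
LOG_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d+) "
    r"(?P<level>[A-Z]+) (?P<src>[\w.]+):(?P<line>\d+) (?P<msg>.*)$"
)
# Matches "failed 119" and "failed with 19".
CODE_RE = re.compile(r"failed(?: with)? (\d+)")


def parse_line(line):
    """Split one log line into fields; return None if it does not match."""
    match = LOG_RE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    code = CODE_RE.search(record["msg"])
    record["code"] = int(code.group(1)) if code else None
    # Assumption: the code may be an OS errno value; the name is platform-specific.
    record["errno_name"] = errno.errorcode.get(record["code"])
    return record


if __name__ == "__main__":
    sample = (
        "2017-03-10 13:14:30,7112 ERROR shard.cc:285 "
        "DeleteDanglingProc 5309.95.3280906 rpc failed 119"
    )
    record = parse_line(sample)
    print(record["src"], record["code"], record["errno_name"])
```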

 

How can I remove these stale/invalid containers? I'm using M3 with MapR 5.0.

I tried restarting MFS, restarting the cluster, fsck, and gfsck.

gfsck never succeeded in any of the attempts:

 

java.lang.Exception: PhaseOne failed with status 22
at com.mapr.fs.globalfsck.GlobalFsck.Run(GlobalFsck.java:1206)
at com.mapr.fs.globalfsck.GlobalFsckClient.main(GlobalFsckClient.java:154)
remove volume mapr.cluster.root from global-fsck mode (ret = 22) ...
GlobalFsck failed (error 22)

 

And I'm not able to access the NFS data correctly; it hangs.
