
unable to restart mapr after crash

Question asked by sirpy on Nov 11, 2011
Latest reply on Nov 22, 2011 by sirpy
My MapR node crashed after issuing a delete of a large volume of files (500GB+), which caused the server to hang with some panic messages.
After rebooting, the CLDB fails to come back up. Looking at the logs I see the following.
The mfs log shows these errors:

    2011-11-12 11:08:50,0258 ERROR IOMgr fs/server/io/spinit.cc:286 clnt x.x.0.0:0 req 0 seq 0 SP(SP1) Marked with ERROR(118), hence failing online
    2011-11-12 11:08:50,0260 INFO IOMgr fs/server/io/spinit.cc:46 clnt x.x.0.0:0 req 0 seq 0 Storage Pool DeInit()
    2011-11-12 11:08:50,0261 ERROR Global fs/server/mapserver/loadsp.cc:121 clnt x.x.0.0:0 req 0 seq 0 Not a XENIX named type file. sp(/dev/sdb)->Init failed(118)
    2011-11-12 11:08:51,8431 INFO IOMgr fs/server/io/loadcidmap.cc:731 clnt x.x.0.0:0 req 0 seq 0 SP(SP2) Cidmap Loaded cid 2186 rootblk: 0x16542320
    2011-11-12 11:08:51,8431 INFO IOMgr fs/server/io/loadcidmap.cc:731 clnt x.x.0.0:0 req 0 seq 0 SP(SP2) Cidmap Loaded cid 2188 rootblk: 0x16582df8
    2011-11-12 11:08:51,8431 INFO IOMgr fs/server/io/spinit.cc:736 clnt x.x.0.0:0 req 0 seq 0 SP(SP2) Containers loaded
    2011-11-12 11:08:51,8431 INFO IOMgr fs/server/io/spinit.cc:744 clnt x.x.0.0:0 req 0 seq 0 Deleting empty container done for sp SP2
    2011-11-12 11:08:51,8431 INFO IOMgr fs/server/io/spinit.cc:979 clnt x.x.0.0:0 req 0 seq 0 SP(SP2) Initialized
    2011-11-12 11:08:51,8431 INFO Global fs/server/mapserver/loadsp.cc:137 clnt x.x.0.0:0 req 0 seq 0 Done loading all SPs in disktab
    2011-11-12 11:08:52,9246 ERROR Replication fs/server/common/cldbha.cc:255 clnt x.x.0.0:0 req 0 seq 0 Got error Connection reset by peer (104) while trying to register with CLDB 192.168.0.120:7222
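
For reference, both excerpts were pulled straight from the standard log locations on this node; the paths below assume a default /opt/mapr install.

    # assumption: default /opt/mapr install prefix
    grep ERROR /opt/mapr/logs/mfs.log | tail -n 20               # the mfs errors above
    grep -E 'stale|FATAL' /opt/mapr/logs/cldb.log | tail -n 50   # the cldb excerpt below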

The CLDB log shows processing of many stale containers:

    2011-11-12 01:01:49,869 INFO  com.mapr.fs.cldb.CLDBServer [pool-1-thread-2]: RPC: PROGRAMID: 2345 PROCEDUREID: 120 from 65.49.37.107:5660 Generating reply with status: 3
    2011-11-12 01:01:50,175 INFO  com.mapr.fs.cldb.Containers [pool-1-thread-2]: Processing stale containers  on StoragePool bb5ff1e6e0d312d7004e9e84b3074e6c from FileServer 65.49.37.107:5660-216.218.130.50:5660-216.218.130.52:5660-192.168.0.120:5660-
    2011-11-12 01:01:50,481 INFO  com.mapr.fs.cldb.Containers [pool-1-thread-2]: Processing stale containers  on StoragePool bb5ff1e6e0d312d7004e9e84b3074e6c from FileServer 65.49.37.107:5660-216.218.130.50:5660-216.218.130.52:5660-192.168.0.120:5660-

After it finishes with those, this error message appears:

    2011-11-12 01:06:03,143 FATAL com.mapr.fs.cldb.CLDB [WaitForLocalKvstore Thread]: CLDBShutdown: CLDB had master lock and was waiting for its local mfs to become Master.Waited for 5 (minutes) but mfs did not become Master. Shutting down CLDB to release master lock.
    2011-11-12 01:06:03,144 INFO  com.mapr.fs.cldb.CLDBServer [WaitForLocalKvstore Thread]: Shutdown: Stopping CLDB
    2011-11-12 01:06:03,144 INFO  com.mapr.fs.cldb.CLDB [Thread-10]: CLDB shutdown
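
If I'm reading this right, the FATAL message is a consequence of the first log: the CLDB holds the master lock but its local mfs never becomes master, because SP1 (/dev/sdb) fails to initialize with error 118, so after 5 minutes the CLDB gives up and shuts down. Before attempting any repair I was going to inspect the storage pool and the disk roughly like this (again assuming a default /opt/mapr install, and I'm not certain mrconfig is the right tool for this):

    # assumption: default /opt/mapr prefix; read-only checks, no repair attempted
    /opt/mapr/server/mrconfig sp list    # is SP1 listed, and is it online or in error?
    fdisk -l /dev/sdb                    # is the underlying disk still readable at all?

Is running the MapR fsck against SP1 the sensible next step here, or is there a safer way to bring the pool back online so the CLDB can come up?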
