
CLDB Move Successful, Then Overnight Cluster Dies

Question asked by mandoskippy on Oct 26, 2012
Latest reply on Oct 27, 2012 by mandoskippy
Greetings all, I am stuck. I moved my CLDB with help from MapR support on Friday. I went to bed last night happy that all was well. Then this morning I couldn't connect to anything. I brought all Wardens down and all ZooKeepers down. Then I started the ZooKeepers and the Warden on the CLDB node. The logs from the CLDB are below.


I am guessing these warnings:
<pre>
2012-10-27 07:31:24,725 WARN Containers [RPC-thread-7]: FileServer volume list from 192.168.0.100:5660-192.168.200.100:5660- is missing volume 1
2012-10-27 07:31:24,725 WARN Containers [RPC-thread-7]: FileServer VolumeList from 192.168.0.100:5660-192.168.200.100:5660- is missing volume 1 Requesting FileServer to  verify 1 containers of volume
</pre>

are the issue, but I am not sure what they mean or how I can fix it. I did add a disk to the CLDB node, but both disks appear healthy to Ubuntu. One thing of note: MapR references disks by /dev/sd[a-z], but when I added the disk, the one disk that was already in the computer, which was /dev/sdc, became /dev/sdd, and the new disk became /dev/sdc. Perhaps MapR should use UUIDs? If this is the issue, how can I fix it?
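To check whether the /dev/sdX names really did shift, one option is to compare them against the persistent names that udev keeps under /dev/disk. The following is only a minimal sketch, not anything MapR-specific: the function name `stable_names_for_devices` is made up for illustration, and it defaults to /dev/disk/by-id (hardware model and serial based) because /dev/disk/by-uuid only lists partitions that carry a filesystem UUID, which raw disks may not have. Whether MapR itself can be pointed at such names is not something this sketch answers.

```python
import os


def stable_names_for_devices(link_dir="/dev/disk/by-id"):
    """Map each resolved device path (e.g. /dev/sdc) to its persistent names.

    link_dir is a directory of udev symlinks; pass "/dev/disk/by-uuid"
    to list filesystem UUIDs instead. Hypothetical helper for illustration.
    """
    mapping = {}
    if not os.path.isdir(link_dir):
        return mapping  # directory absent, e.g. non-Linux host or no udev
    for name in sorted(os.listdir(link_dir)):
        link = os.path.join(link_dir, name)
        # Each entry is a symlink such as ../../sdc; resolve it to /dev/sdc.
        device = os.path.realpath(link)
        # Several persistent names can point at the same device.
        mapping.setdefault(device, []).append(name)
    return mapping


if __name__ == "__main__":
    for device, names in sorted(stable_names_for_devices().items()):
        print(device + "\t" + ", ".join(names))
```

Running this before and after adding a disk and comparing the output would show which physical disk ended up behind /dev/sdc and which behind /dev/sdd.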

The last part of the log just shows the CLDB sitting on status 3, waiting to become master. At this point I am lost, as I am not sure how to fix the problem or how to bring my cluster up in any shape... HELP!
<pre>

2012-10-27 07:31:10,253 INFO CLDB [main]: CLDBInit: Exporting program 2345
2012-10-27 07:31:10,253 INFO CLDB [main]: CLDBInit: Starting HTTP Server
2012-10-27 07:31:10,253 INFO CLDBServer [main]: Init: Start HTTP Server
2012-10-27 07:31:10,254 INFO HttpServer [main]: WebServer: Starting WebServer
2012-10-27 07:31:10,254 INFO HttpServer [main]: Listener started on SelectChannelConnector@0.0.0.0:7221 port 7221
2012-10-27 07:31:10,254 INFO HttpServer [main]: Starting Jetty WebServer
2012-10-27 07:31:10.254::INFO:  jetty-6.1.14
2012-10-27 07:31:10.482::INFO:  Started SelectChannelConnector@0.0.0.0:7221
2012-10-27 07:31:10,482 INFO CLDBServer [main]: HTTP Server
2012-10-27 07:31:10,970 INFO CLDBServer [RPC-thread-1]: RPC: PROGRAMID: 2345 PROCEDUREID: 103 from 192.168.0.100:47558 Rejecting rpc with status 3 as CLDB is waiting for local kvstore to become master.
2012-10-27 07:31:13,976 INFO CLDBServer [RPC-thread-2]: RPC: PROGRAMID: 2345 PROCEDUREID: 103 from 192.168.0.100:47558 Rejecting rpc with status 3 as CLDB is waiting for local kvstore to become master.
2012-10-27 07:31:14,089 INFO CLDBServer [Lookup-thread-1]: RPC: PROGRAMID: 2345 PROCEDUREID: 5 from 192.168.0.100:1111 Rejecting rpc with status 3 as CLDB is waiting for local kvstore to become master.
2012-10-27 07:31:15,095 INFO CLDBServer [Lookup-thread-1]: RPC: PROGRAMID: 2345 PROCEDUREID: 5 from 192.168.0.100:1111 Rejecting rpc with status 3 as CLDB is waiting for local kvstore to become master.
2012-10-27 07:31:17,095 INFO CLDBServer [Lookup-thread-1]: RPC: PROGRAMID: 2345 PROCEDUREID: 5 from 192.168.0.100:1111 Rejecting rpc with status 3 as CLDB is waiting for local kvstore to become master.
2012-10-27 07:31:18,977 INFO CLDBServer [RPC-thread-3]: RPC: PROGRAMID: 2345 PROCEDUREID: 103 from 192.168.0.100:47558 Rejecting rpc with status 3 as CLDB is waiting for local kvstore to become master.
2012-10-27 07:31:20,096 INFO CLDBServer [Lookup-thread-1]: RPC: PROGRAMID: 2345 PROCEDUREID: 5 from 192.168.0.100:1111 Rejecting rpc with status 3 as CLDB is waiting for local kvstore to become master.
2012-10-27 07:31:21,632 INFO CLDBServer [RPC-thread-4]: FSRegister: Request  FSID: 8690764841490470700 FSNetworkLocation: / FSHost:Port: 192.168.0.100:5660-192.168.200.100:5660- FSHostName: hadoopmapr1.local StoragePools f1367efeb792c17f00508b28da0cf1d9- Capacity: 0 Available: 0 Used: 0 Role: 0 isDCA: false Received registration request
2012-10-27 07:31:21,632 INFO CLDBServer [RPC-thread-4]: Cluster uuid is -7334162873534348519-6609004683031823124
2012-10-27 07:31:21,650 INFO FileServerMetrics [RPC-thread-4]: Initializing File Server Metrics with hostName=hadoopmapr1.local
2012-10-27 07:31:21,650 INFO FileServer [RPC-thread-4]: Instantiating fileserver metrics with context:com.mapr.fs.cldb.counters.MapRGangliaContext31
2012-10-27 07:31:21,654 INFO CLDBServer [RPC-thread-4]: FSRegister: Registered FileServer: 192.168.0.100:5660-192.168.200.100:5660-
2012-10-27 07:31:24,097 INFO CLDBServer [Lookup-thread-1]: RPC: PROGRAMID: 2345 PROCEDUREID: 5 from 192.168.0.100:1111 Rejecting rpc with status 3 as CLDB is waiting for local kvstore to become master.
2012-10-27 07:31:24,709 INFO LoadTracker [RPC-thread-5]: Adding sp sp f1367efeb792c17f00508b28da0cf1d9 on fs 192.168.0.100:5660-192.168.200.100:5660- [ percentage 6 used 94260 capacity 1371067 inTransit 0 outTransit 0]  to list Average
2012-10-27 07:31:24,710 INFO CLDBServer [RPC-thread-5]: Allocating WorkUnit type : NOCOMPRESS_LIST_UPDATED for container 0 with sequence number 0 to 192.168.0.100:5660-192.168.200.100:5660-
2012-10-27 07:31:24,725 WARN Containers [RPC-thread-7]: FileServer volume list from 192.168.0.100:5660-192.168.200.100:5660- is missing volume 1
2012-10-27 07:31:24,725 WARN Containers [RPC-thread-7]: FileServer VolumeList from 192.168.0.100:5660-192.168.200.100:5660- is missing volume 1 Requesting FileServer to  verify 1 containers of volume
2012-10-27 07:31:24,813 INFO CLDBServer [RPC-thread-8]: Allocating WorkUnit type : VOLUME_CONTAINERS_MISSING_VERIFY for container 0 with sequence number 0 to 192.168.0.100:5660-192.168.200.100:5660-
2012-10-27 07:31:25,978 INFO CLDBServer [RPC-thread-5]: RPC: PROGRAMID: 2345 PROCEDUREID: 103 from 192.168.0.100:47558 Rejecting rpc with status 3 as CLDB is waiting for local kvstore to become master.
2012-10-27 07:31:29,098 INFO CLDBServer [Lookup-thread-1]: RPC: PROGRAMID: 2345 PROCEDUREID: 5 from 192.168.0.100:1111 Rejecting rpc with status 3 as CLDB is waiting for local kvstore to become master.
</pre>
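Since most of the excerpt above is the same "status 3" rejection repeated, a small script can make the interesting lines easier to spot. This is a rough sketch that assumes the log keeps the layout shown above (timestamp, level, component, thread in brackets, message). The names `LINE_RE` and `summarize` are invented for this example, and the log file path is taken from the command line rather than assumed.

```python
import re
import sys
from collections import Counter

# Matches lines like:
# 2012-10-27 07:31:24,725 WARN Containers [RPC-thread-7]: message...
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) (?P<component>\S+) \[(?P<thread>[^\]]+)\]: (?P<msg>.*)$"
)


def summarize(lines):
    """Return (count per level, number of status 3 rejections, WARN messages)."""
    levels = Counter()
    rejections = 0
    warnings = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # e.g. the Jetty lines, which use a different format
        levels[m["level"]] += 1
        if "waiting for local kvstore to become master" in m["msg"]:
            rejections += 1
        if m["level"] == "WARN":
            warnings.append(m["msg"])
    return levels, rejections, warnings


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        levels, rejections, warnings = summarize(fh)
    print("lines per level:", dict(levels))
    print("status 3 rejections:", rejections)
    for msg in warnings:
        print("WARN:", msg)
```

It only counts and filters; it does not explain why the kvstore never becomes master.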
