
Made some partition changes; CLDB now not starting

Question asked by rpark31 on Dec 27, 2011
Latest reply on Dec 28, 2011 by rpark31
Hi,

I have an Ubuntu M3 cluster. I had to make some changes to the storage: I removed one disk partition, and resized and then reformatted an existing partition. I also deleted /opt/mapr/conf/disktab on the node that had the deleted partition, because I couldn't see any other way of removing the partition.

Now when I try to restart the cluster, the CLDB log reports that the local file server lost container 1, and CLDB shuts down.

The cluster has 3 nodes, but for troubleshooting purposes I removed all but node 1.

Here are the relevant entries from /opt/mapr/logs/cldb.log:

<code>
2011-12-27 14:52:50,685 INFO  com.mapr.fs.cldb.CLDB [main]: Initializing CLDB
2011-12-27 14:52:50,686 INFO  com.mapr.fs.cldb.CLDB [main]: Loading properties file : /opt/mapr/conf/cldb.conf
2011-12-27 14:52:50,806 INFO  com.mapr.fs.cldb.counters.CLDBMetrics [main]: Initializing CLDB Metrics with serviceName: cldbServer
2011-12-27 14:52:50,809 INFO  com.mapr.fs.cldb.CLDB [main]: CLDBInit: Using hostname file /opt/mapr/hostname
2011-12-27 14:52:50,810 INFO  com.mapr.fs.cldb.CLDB [main]: CLDBInit: Using hostid file /opt/mapr/hostid
2011-12-27 14:52:50,810 INFO  com.mapr.fs.cldb.CLDB [main]: CLDB Startup
2011-12-27 14:52:50,810 INFO  com.mapr.fs.cldb.CLDB [main]: CLDB Properties from configuration file: {cldb.jmxremote.port=7220, cldb.web.port=7221, cldb.ignore.stale.zk=true, cldb.port=7222, cldb.zookeeper.servers=192.168.2.41:5181, cldb.numthreads=10, cldb.min.fileservers=1, hadoop.version=0.20.2}
2011-12-27 14:52:50,810 INFO  com.mapr.fs.cldb.CLDB [main]: CLDB Command line args: /opt/mapr/conf/cldb.conf
2011-12-27 14:52:50,810 INFO  com.mapr.fs.cldb.CLDB [main]: CLDBInit: Initializing CLDB
2011-12-27 14:52:50,811 INFO  com.mapr.fs.cldb.CLDB [main]: CLDBInit: Starting RPCServer on port 7222 with num thread 20 and heap size of 632(MB)
2011-12-27 14:52:50,833 INFO  com.mapr.fs.cldb.CLDB [main]: MapR BuildVersion: 1.2.0.12140GA
2011-12-27 14:52:50,833 INFO  com.mapr.fs.cldb.CLDB [main]: CLDBInit: Start CLDBServer
2011-12-27 14:52:50,865 INFO  com.mapr.fs.cldb.CLDBServer [main]: CLDBInit: HostName : node1
2011-12-27 14:52:50,866 INFO  com.mapr.fs.cldb.CLDBServer [main]: CLDBInit: ServerId: 7213960227882613830
2011-12-27 14:52:50,866 INFO  com.mapr.fs.cldb.CLDBServer [main]: CLDBInit: Cluster name : my.cluster.com
2011-12-27 14:52:50,874 INFO  com.mapr.fs.cldb.CLDBServer [main]: CLDB creds setting uid as 0
2011-12-27 14:52:50,874 INFO  com.mapr.fs.cldb.CLDBServer [main]: CLDB creds setting adding gid 0
2011-12-27 14:52:50,874 INFO  com.mapr.fs.cldb.CLDB [main]: CLDBState: CLDB State change : INITIAZING
2011-12-27 14:52:50,874 INFO  com.mapr.fs.cldb.CLDBServer [main]: Read 1 nodes from the node list file
2011-12-27 14:52:50,879 INFO  com.mapr.fs.cldb.zookeeper.ZooKeeperClient [main]: ZooKeeperClient init: zk timeout = 30000 ms
2011-12-27 14:52:50,938 INFO  com.mapr.fs.cldb.CLDBServer [main-EventThread]: ZooKeeper event None on path null
2011-12-27 14:52:52,963 INFO  com.mapr.fs.cldb.CLDBServer [main]: CLDB Init: ZooKeeper Servers configured. Connected to 192.168.2.41:5181
2011-12-27 14:52:53,009 INFO  com.mapr.fs.cldb.zookeeper.ZooKeeperClient [main]: ZooKeeperClient: CLDB has latest epoch. Checking cleanbit
2011-12-27 14:52:53,009 INFO  com.mapr.fs.cldb.zookeeper.ZooKeeperClient [main]: ZooKeeperClient: KvStore is clean and of latest epoch CLDB trying to become Master
2011-12-27 14:52:53,012 INFO  com.mapr.fs.cldb.zookeeper.ZooKeeperClient [main]: ZooKeeperClient: CLDB is current Master
2011-12-27 14:52:53,012 INFO  com.mapr.fs.cldb.zookeeper.ZooKeeperClient [main]: CLDB became master. Initializing KvStoreContainer for cid: 1
2011-12-27 14:52:53,014 INFO  com.mapr.fs.cldb.zookeeper.ZooKeeperClient [main]: Storing KvStoreContainerInfo to ZooKeeper  Container ID:1 VolumeId:1 Servers:  192.168.2.41:5660--7-VALID Inactive Servers:  192.168.2.43:5660--7-VALID 192.168.2.42:5660--7-VALID Unused Servers:  Latest epoch:7
2011-12-27 14:52:53,022 INFO  com.mapr.fs.cldb.CLDBServer [main]: Starting thread to monitor waiting for local kvstore to become master
2011-12-27 14:52:53,110 INFO  com.mapr.fs.cldb.ActiveContainersMap [main]: Initializing containers cache with 1000000 number of entires
2011-12-27 14:52:53,472 INFO  com.mapr.volumemirror.VolumeMirror [main]: Initializing volume mirror thread ...
2011-12-27 14:52:53,493 INFO  com.mapr.fs.cldb.http.HttpServer [main]: Creating listener for 0.0.0.0
2011-12-27 14:52:53.512::INFO:  Logging to STDERR via org.mortbay.log.StdErrLog
2011-12-27 14:52:53,570 INFO  com.mapr.fs.cldb.CLDB [main]: CLDBState: CLDB State change : WAIT_FOR_FILESERVERS
2011-12-27 14:52:53,570 INFO  com.mapr.fs.cldb.CLDB [main]: CLDBInit: Exporting program 2345
2011-12-27 14:52:53,571 INFO  com.mapr.fs.cldb.CLDB [main]: CLDBInit: Starting HTTP Server
2011-12-27 14:52:53,571 INFO  com.mapr.fs.cldb.CLDBServer [main]: Init: Start HTTP Server
2011-12-27 14:52:53,571 INFO  com.mapr.fs.cldb.http.HttpServer [main]: WebServer: Starting WebServer
2011-12-27 14:52:53,573 INFO  com.mapr.fs.cldb.http.HttpServer [main]: Lisetner started on SelectChannelConnector@0.0.0.0:7221 port 7221
2011-12-27 14:52:53,573 INFO  com.mapr.fs.cldb.http.HttpServer [main]: Starting Jetty WebServer
2011-12-27 14:52:53.573::INFO:  jetty-6.1.14
2011-12-27 14:52:53.899::INFO:  Started SelectChannelConnector@0.0.0.0:7221
2011-12-27 14:52:53,899 INFO  com.mapr.fs.cldb.CLDBServer [main]: HTTP Server
2011-12-27 14:52:54,690 INFO  com.mapr.fs.cldb.CLDBServer [pool-1-thread-1]: RPC: PROGRAMID: 2345 PROCEDUREID: 103 from 192.168.2.41:53784 Generating reply with status: 3
2011-12-27 14:52:54,722 INFO  com.mapr.fs.cldb.CLDBServer [pool-1-thread-1]: RPC: PROGRAMID: 2345 PROCEDUREID: 40 from 192.168.2.41:53784 Generating reply with status: 3
2011-12-27 14:52:58,710 INFO  com.mapr.fs.cldb.CLDBServer [pool-1-thread-1]: RPC: PROGRAMID: 2345 PROCEDUREID: 103 from 192.168.2.41:46169 Generating reply with status: 3
2011-12-27 14:52:58,737 INFO  com.mapr.fs.cldb.CLDBServer [pool-1-thread-1]: RPC: PROGRAMID: 2345 PROCEDUREID: 40 from 192.168.2.41:46169 Generating reply with status: 3
2011-12-27 14:53:01,458 INFO  com.mapr.fs.cldb.CLDBServer [pool-1-thread-1]: FSRegister: Request  FSID: 7213960227882613830 FSNetworkLocation: / FSHost:Port: 192.168.2.41:5660- FSHostName: node1 StoragePools  Capacity: 0 Available: 0 Used: 0 Role: 0 isDCA: false Received registration request
2011-12-27 14:53:01,460 INFO  com.mapr.fs.cldb.CLDBServer [pool-1-thread-1]: Cluster uuid is -8679376084245423669-5092532303636286670
2011-12-27 14:53:01,484 INFO  com.mapr.fs.cldb.counters.FileServerMetrics [pool-1-thread-1]: Initializing File Server Metrics with hostName=node1
2011-12-27 14:53:01,484 INFO  com.mapr.fs.cldb.topology.FileServer [pool-1-thread-1]: Instantiating fileserver metrics with context:com.mapr.fs.cldb.counters.MapRGangliaContext31
2011-12-27 14:53:01,486 INFO  com.mapr.fs.cldb.CLDBServer [pool-1-thread-1]: FSRegister: Registered FileServer: 192.168.2.41:5660-
2011-12-27 14:53:01,638 INFO  com.mapr.fs.cldb.CLDBServer [pool-1-thread-1]: Allocating WorkUnit type : NOCOMPRESS_LIST_UPDATED for container 0 with sequence number 0 to 192.168.2.41:5660-
2011-12-27 14:53:02,719 INFO  com.mapr.fs.cldb.CLDBServer [pool-1-thread-1]: RPC: PROGRAMID: 2345 PROCEDUREID: 103 from 192.168.2.41:54520 Generating reply with status: 3
2011-12-27 14:53:02,747 INFO  com.mapr.fs.cldb.CLDBServer [pool-1-thread-1]: RPC: PROGRAMID: 2345 PROCEDUREID: 40 from 192.168.2.41:54520 Generating reply with status: 3
...
2011-12-27 14:54:35,013 INFO  com.mapr.fs.cldb.Containers [pool-1-thread-1]: Processing FileServerVolumeList from fileServer: 192.168.2.41:5660- VolumeID: 1
2011-12-27 14:54:35,013 WARN  com.mapr.fs.cldb.Containers [pool-1-thread-1]: FileServer volume list from 192.168.2.41:5660- is missing volume 1
2011-12-27 14:54:35,018 WARN  com.mapr.fs.cldb.Containers [pool-1-thread-1]: FileServer VolumeList from 192.168.2.41:5660- is missing volume 1 Requesting FileServer to  verify 1 containers of volume
2011-12-27 14:54:35,104 INFO  com.mapr.fs.cldb.CLDBServer [pool-1-thread-1]: Allocating WorkUnit type : VOLUME_CONTAINERS_MISSING_VERIFY for container 0 with sequence number 0 to 192.168.2.41:5660-
2011-12-27 14:54:35,111 INFO  com.mapr.fs.cldb.Containers [pool-1-thread-1]: Processing ReadWriteContainers from FileServer: 192.168.2.41:5660- for volumeID: 1 confirmed 0 missing containers
2011-12-27 14:54:35,112 INFO  com.mapr.fs.cldb.Containers [pool-1-thread-1]: Missing container containerId: 1 on StoragePool 07b8cfdb12343bc9004ea8df320446b1 from Server: 192.168.2.41:5660-
2011-12-27 14:54:35,112 FATAL com.mapr.fs.cldb.Containers missingContainersReReplicate [pool-1-thread-1]: FileServer : 192.168.2.41:5660- reported that it lost container 1 FileServer is local server. Stopping CLDB
2011-12-27 14:54:35,114 FATAL com.mapr.fs.cldb.CLDB shutdown [pool-1-thread-1]: CLDBShutdown: FileServer : 192.168.2.41:5660- reported that it lost container 1 FileServer is local server. Stopping CLDB
2011-12-27 14:54:35,114 INFO  com.mapr.fs.cldb.CLDBServer [pool-1-thread-1]: Shutdown: Stopping CLDB
2011-12-27 14:54:35,115 INFO  com.mapr.fs.cldb.CLDB [Thread-9]: CLDB ShutDown Hook called
2011-12-27 14:54:35,115 INFO  com.mapr.fs.cldb.zookeeper.ZooKeeperClient [Thread-9]: Zookeeper Client: Closing client connection:
2011-12-27 14:54:35,121 INFO  com.mapr.fs.cldb.CLDBServer [main-EventThread]: ZooKeeper event NodeDeleted on path /datacenter/controlnodes/cldb/active/CLDBMaster
2011-12-27 14:54:35,122 INFO  com.mapr.fs.cldb.CLDB [Thread-9]: CLDB shutdown
</code>
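In a log this long, the lines that matter are the WARN and FATAL entries near the end. A minimal sketch of a filter for them, assuming the line layout seen in the excerpt above (timestamp with comma-separated milliseconds, then the level, then the logger); `filter_levels` and `LINE_RE` are hypothetical names for illustration, not part of any MapR tooling:

```python
import re

# Assumed layout, taken from the excerpt above:
#   2011-12-27 14:54:35,112 FATAL com.mapr.fs.cldb.Containers ...
# Jetty lines such as "2011-12-27 14:52:53.573::INFO: ..." use a different
# layout and are deliberately not matched.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+(?P<rest>.*)$"
)


def filter_levels(lines, levels=("WARN", "ERROR", "FATAL")):
    """Return (timestamp, level, rest) tuples for lines at the given levels."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m and m.group("level") in levels:
            hits.append((m.group("ts"), m.group("level"), m.group("rest")))
    return hits
```

To use it on the real file, open /opt/mapr/logs/cldb.log and pass the file object to `filter_levels`, then print each tuple. On the excerpt above this would keep only the two "missing volume 1" warnings and the two FATAL "lost container 1" lines.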
