
CLDB shuts down with "CLDB is waiting for local kvstore to become master."

Question asked by paz on Jun 11, 2014
Latest reply on Jun 12, 2014 by paz
I am creating a new cluster manually and have reached the point of starting the cluster.

ZooKeeper has been started and, as expected, there is one leader and the remaining servers are followers.
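ZooKeeper reports each server's role as `leader` or `follower` (or `standalone` for a single-server setup). A minimal Python sketch for confirming the ensemble state sends ZooKeeper's read-only `srvr` four-letter command to each server and reads the `Mode:` line of the reply. The host list (mapr05, mapr06, mapr07) and port 5181, commonly used for MapR's ZooKeeper, are assumptions; the post does not say which nodes run ZooKeeper.

```python
import socket

# Assumed ZooKeeper hosts and client port; adjust to the actual ensemble.
ZK_HOSTS = ["mapr05", "mapr06", "mapr07"]
ZK_PORT = 5181


def parse_mode(response: str) -> str:
    """Return the value of the 'Mode:' line of a 'srvr' reply, or 'unknown'."""
    for line in response.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return "unknown"


def query_mode(host: str, port: int = ZK_PORT, timeout: float = 3.0) -> str:
    """Send 'srvr' to one server and return its reported mode."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"srvr")
            chunks = []
            while True:
                data = sock.recv(4096)
                if not data:  # the server closes the connection after replying
                    break
                chunks.append(data)
    except OSError as exc:  # refused, unresolvable host, or timed out
        return f"no response ({exc})"
    return parse_mode(b"".join(chunks).decode("utf-8", errors="replace"))


def check_ensemble(hosts=ZK_HOSTS, port: int = ZK_PORT) -> dict:
    """Map each host to its reported mode."""
    return {host: query_mode(host, port) for host in hosts}
```

A healthy three-server ensemble would show one `leader` and two `follower` entries in the dictionary returned by `check_ensemble()`.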
The warden was started on the CLDB node, and both the warden and the CLDB came up fine.
When the warden was started on the remaining nodes, the CLDB went down.
Details are as follows:

!Warden is running on all nodes:
<code>
# service mapr-warden status
</code>
!on host mapr05
<code>
WARDEN running as process 2681.
</code>
!on host mapr06
<code>
WARDEN running as process 1355.
</code>
!on host mapr07
<code>
WARDEN running as process 1354.
</code>

!On the CLDB node, the CLDB status is:
<code>
# service mapr-cldb status
0.20.2
/opt/mapr/pid/cldb.pid exists with pid 12498 but no CLDB.
</code>
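The status message says the pid file still exists but no CLDB is running under that pid. A minimal Python sketch of a similar check (an approximation, not the actual logic of the `mapr-cldb` init script) reads the pid from the file and probes it with signal 0, which tests whether the process exists without delivering a signal. The function name `pid_status` is hypothetical.

```python
import os


def pid_status(pid_file: str) -> str:
    """Classify a pid file as 'missing', 'stale', or 'running <pid>'.

    Only checks that *some* process owns the pid; it does not confirm that the
    process is actually a CLDB, so a reused pid would still count as running.
    """
    try:
        with open(pid_file) as fh:
            pid = int(fh.read().strip())
    except FileNotFoundError:
        return "missing"
    except ValueError:
        return "stale"  # file exists but holds no usable pid
    if pid <= 0:
        return "stale"  # 0 or negative would address process groups, not a pid
    try:
        os.kill(pid, 0)  # signal 0: existence/permission check only
    except ProcessLookupError:
        return "stale"  # no process with this pid
    except PermissionError:
        pass  # process exists but is owned by another user
    return f"running {pid}"
```

On the CLDB node this could be called as `pid_status("/opt/mapr/pid/cldb.pid")`.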
 
! The log files show the following

#warden.log
<pre>
2014-06-12 07:08:34,038 WARN  com.mapr.warden.service.baseservice.Service$ServiceMonitorRun [cldb_monitor]: Service: cldb is not ready to be started after 1 min wait. Starting it anyway
2014-06-12 07:08:34,038 INFO  com.mapr.warden.service.baseservice.Service$ServiceRun [cldb_monitor]: Command: [/etc/init.d/mapr-cldb, start], Directory: /etc/init.d
2014-06-12 07:08:35,251 ERROR com.mapr.warden.service.baseservice.Service$ServiceRun run [cldb_monitor]: Error while running command: [/etc/init.d/mapr-cldb, start]
2014-06-12 07:08:35,252 ERROR com.mapr.warden.service.baseservice.Service$ServiceRun run [cldb_monitor]: 0.20.2
2014-06-12 07:08:49,658 ERROR com.mapr.warden.service.baseservice.common.ZKUtilsLocking checkZKNodeForExistence [Thread-11-EventThread]: SessionExpiredException while trying to checkZKNodeForExistence of: /services/cldb/master
2014-06-12 07:08:49,658 ERROR com.mapr.warden.service.baseservice.Service process [Thread-11-EventThread]: ZK Session was either or closed or expired for service: cldb
</pre>
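With several log files involved, it can help to pull out only the warnings and errors in time order. A minimal Python sketch, assuming the `YYYY-MM-DD HH:MM:SS,mmm LEVEL message` layout seen in the excerpts here, keeps entries at or above a chosen level and skips lines without a timestamp. The names `filter_log`, `LINE_RE` and `LEVELS` are hypothetical.

```python
import re
from datetime import datetime

# Assumed layout: "2014-06-12 07:08:34,038 WARN  message..."
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(TRACE|DEBUG|INFO|WARN|ERROR|FATAL)\s+(.*)$"
)
LEVELS = ["TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL"]


def filter_log(lines, min_level="WARN"):
    """Yield (timestamp, level, message) for entries at or above min_level."""
    threshold = LEVELS.index(min_level)
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # free-form lines carry no timestamp or level
        stamp, level, message = match.groups()
        if LEVELS.index(level) >= threshold:
            # %f accepts the 3-digit millisecond field and pads it to microseconds
            yield datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f"), level, message
```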
 
#cldb.log
<pre>
2014-06-12 07:10:19,200 INFO CLDBServer [Lookup-7]: Rejecting RPC 2345.5 from 10.0.2.15:41627 with status 3 as CLDB is waiting for local kvstore to become master.
Unable to obtain binding for request from 192.168.56.226:5660. Closing connection
Unable to obtain binding for request from 192.168.56.227:5660. Closing connection
Unable to obtain binding for request from 192.168.56.226:5660. Closing connection
            <<< This appears multiple times in the log file >>>
Unable to obtain binding for request from 192.168.56.227:1111. Closing connection
Unable to obtain binding for request from 192.168.56.226:5660. Closing connection
2014-06-12 07:13:07,355 FATAL CLDB [WaitForLocalKvstore Thread]: CLDBShutdown: CLDB had master lock and was waiting for its local mfs to become Master.Waited for 7 (minutes) but mfs did not become Master. Shutting down CLDB to release master lock.
2014-06-12 07:13:07,355 INFO CLDBServer [WaitForLocalKvstore Thread]: Shutdown: Stopping CLDB
2014-06-12 07:13:07,359 INFO CLDB [Thread-13]: CLDB ShutDown Hook called
2014-06-12 07:13:07,360 INFO ZooKeeperClient [Thread-13]: Setting the clean cldbshutdown flag to true
2014-06-12 07:13:07,375 INFO ZooKeeperClient [Thread-13]: Zookeeper Client: Closing client connection:
2014-06-12 07:13:07,387 INFO CLDBServer [main-EventThread]: The CLDB received notification that a ZooKeeper event of type NodeDeleted occurred on path /datacenter/controlnodes/cldb/active/CLDBMaster
2014-06-12 07:13:07,389 INFO ZooKeeper [Thread-13]: Session: 0x2468bd60e79000b closed
2014-06-12 07:13:07,389 INFO CLDB [Thread-13]: CLDB shutdown
</pre>
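The CLDB excerpt mixes two kinds of connection messages: rejected RPCs and connections closed because no binding could be obtained. A minimal Python sketch that tallies both by source IP address (ignoring the port) shows at a glance which addresses the CLDB is hearing from. The regular expressions are written against the message wording in this excerpt and are assumptions for other CLDB versions; `tally_sources` is a hypothetical name.

```python
import re
from collections import Counter

# Patterns based on the message wording in the excerpt above.
REJECTED_RE = re.compile(r"Rejecting RPC \S+ from (\d+\.\d+\.\d+\.\d+):\d+")
UNBOUND_RE = re.compile(
    r"Unable to obtain binding for request from (\d+\.\d+\.\d+\.\d+):\d+"
)


def tally_sources(lines):
    """Count rejected-RPC and unbound-connection messages per source IP."""
    rejected, unbound = Counter(), Counter()
    for line in lines:
        match = REJECTED_RE.search(line)
        if match:
            rejected[match.group(1)] += 1
            continue
        match = UNBOUND_RE.search(line)
        if match:
            unbound[match.group(1)] += 1
    return rejected, unbound
```

For example, `tally_sources(open("/opt/mapr/logs/cldb.log"))`, where the log path is an assumption about the default MapR log directory.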

I've searched for this error but found nothing.
Could you please advise what I can do next so that the CLDB keeps running when the warden is started on the remaining nodes?

Thanks
Paz
