
Master CLDB not available with "Rejecting RPC 2345.5 from 127.0.0.1:0 with status 3 as the minimum replication count of CLDB volume is not met. Waiting for additional nodes to come online"

Question asked by paz on Jun 15, 2014
Latest reply on Jun 16, 2014 by paz
Hi, I'm new to MapR and am building out a three-node cluster as follows, the same as the M3 build in the docs. The build uses an additional two 12 GB disks per node for MapR-FS:
<pre>
IP              Disks              Node    CLDB  NFS  ZooKpr  WebSrv  Metric  FServer  JobTrak  TskTrak
192.168.56.225  /dev/sdb,/dev/sdc  mapr01  x     x    x                       x                 x
192.168.56.226  /dev/sdb,/dev/sdc  mapr02        x    x                       x        x        x
192.168.56.227  /dev/sdb,/dev/sdc  mapr03        x    x               x       x                 x
</pre>

I have installed the software and am now at the point where I start up the cluster for the first time (initialisation).
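For reference, each node was configured before first start roughly as follows; the cluster name here is a placeholder I've made up, and the flags are as I understand them from the docs:

<pre>
# Run on every node: -C names the CLDB node, -Z the ZooKeeper quorum,
# -N the cluster name (placeholder value)
/opt/mapr/server/configure.sh -C mapr01 -Z mapr01,mapr02,mapr03 -N my.cluster.com
</pre>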
So far ZooKeeper is running on all three nodes, the warden has been started on one node (the CLDB node), and the CLDB appears to have started. However, the master CLDB is not available.
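The startup sequence so far has been:

<pre>
# On all three nodes (mapr01, mapr02, mapr03):
service mapr-zookeeper start

# On the CLDB node (mapr01) only, so far:
service mapr-warden start
</pre>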
Please see the status as follows:
<pre>
[root@mapr01 conf]# service mapr-zookeeper status
JMX enabled by default
Using config: /opt/mapr/zookeeper/zookeeper-3.4.5/conf/zoo.cfg
zookeeper running as process 27637.

[root@mapr01 conf]# service mapr-warden status
WARDEN running as process 27966.

[root@mapr01 conf]# service mapr-cldb status
0.20.2
CLDB running as process 6893.

[root@mapr01 conf]# maprcli node cldbmaster
ERROR (10009) -  Couldn't connect to the CLDB service
[root@mapr01 conf]#
</pre>
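If it helps, I can also check the ZooKeeper quorum and whether anything is listening on the CLDB port (7222 is the default CLDB RPC port, if I have that right):

<pre>
# Reports this node's quorum role (leader/follower)
service mapr-zookeeper qstatus

# Look for a listener on the default CLDB port
netstat -an | grep 7222
</pre>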

From this I conclude that the master CLDB service is not available even though the CLDB process is running. Inspecting the log files for significant events, I see the following; cldb.log repeats the same errors many times.

<pre>
#cldb.log
2014-06-16 13:51:06,636 INFO WaitForLocalKvstoreThread [WaitForLocalKvstore Thread]: Tried re-replicating CLDB volume 110 times
2014-06-16 13:51:22,721 INFO CLDBServer [Lookup-5]: Rejecting RPC 2345.5 from 127.0.0.1:0 with status 3 as the minimum replication count of CLDB volume is not met. Waiting for additional nodes to come online
2014-06-16 13:52:06,639 WARN DefaultContainerPlacementPolicy [WaitForLocalKvstore Thread]: Number of fileServers in the volume's topology /data is 0. Picking nodes from topology / to do the container creates
2014-06-16 13:52:52,723 INFO CLDBServer [Lookup-7]: Rejecting RPC 2345.5 from 127.0.0.1:0 with status 3 as the minimum replication count of CLDB volume is not met. Waiting for additional nodes to come online
2014-06-16 13:54:22,725 INFO CLDBServer [Lookup-1]: Rejecting RPC 2345.5 from 127.0.0.1:0 with status 3 as the minimum replication count of CLDB volume is not met. Waiting for additional nodes to come online
2014-06-16 13:55:52,727 INFO CLDBServer [Lookup-3]: Rejecting RPC 2345.5 from 127.0.0.1:0 with status 3 as the minimum replication count of CLDB volume is not met. Waiting for additional nodes to come online

#warden.log
2014-06-16 11:47:29,274 INFO  com.mapr.warden.service.baseservice.Service [cldb_monitor]: Need delayed alarm raising for: NODE_ALARM_SERVICE_CLDB_DOWN
2014-06-16 11:47:29,763 ERROR com.mapr.warden.service.baseservice.common.ZKUtilsLocking checkZKNodeForExistence [Thread-11-EventThread]: SessionExpiredException while trying to checkZKNodeForExistence of: /services/kvstore/mapr01.localdomain
2014-06-16 11:47:29,764 ERROR com.mapr.warden.service.baseservice.DependentService checkifDependentServiceChanged [Thread-11-EventThread]: ZK Session was either or closed or expired for service: cldb

</pre>
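The rejection line recurs at steady 90-second intervals; something like this should count the occurrences (path per the standard log location):

<pre>
grep -c "minimum replication count of CLDB volume" /opt/mapr/logs/cldb.log
</pre>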

Please advise, as I don't get any hits when searching on the above, so I'm unable to diagnose this. I have two disks per node, so I would have thought replication would be satisfied:
<pre>
[root@mapr01 conf]#  cat /opt/mapr/conf/disktab
# MapR Disks Sun Jun 15 08:34:00 2014
/dev/sdb DC9701C7-83F4-DA7A-52FB-09534C9D5300
/dev/sdc 3728E2B0-7961-1A5E-1AA8-03544C9D5300

[root@mapr02 ~]# cat /opt/mapr/conf/disktab
# MapR Disks Sun Jun 15 08:07:43 2014
/dev/sdb 7EE0305A-00BF-A61B-811E-042C469D5300
/dev/sdc C8288F5F-07EB-299E-88BD-092C469D5300

[root@mapr03 ~]# cat /opt/mapr/conf/disktab
# MapR Disks Sun Jun 15 08:08:36 2014
/dev/sdb B330DC76-AFB6-BD5C-7ACC-0455469D5300
/dev/sdc 8F47644E-CFB8-2E4A-3378-0256469D5300
</pre>
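For completeness, the disks were formatted and registered with disksetup, along these lines (the disk-list path is just an example):

<pre>
# disks.txt lists one device per line: /dev/sdb and /dev/sdc
/opt/mapr/server/disksetup -F /tmp/disks.txt
</pre>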
I'm not sure how to get past this error and proceed. Let me know if you need more details.
Paz
