Trouble getting started with M5 - CentOS installation on 4 node cluster

Question asked by prasanth on Jul 29, 2012
Latest reply on May 1, 2014 by nkapinos
I followed exactly the procedure given in the link `` to set up a 4-node cluster (3 ZooKeeper nodes and 3 CLDB nodes, exactly as given in the link).

To begin with, I used a flat file of 20 GB (later I also tried 50 GB) to test without formatting the disks. The configure command and the disksetup command both ran successfully. I was careful to start the ZooKeepers on the nodes in the order given in the configure parameters. I started warden on the first node, gave it root permissions, and registered the licence, all without any trouble. After finishing the registration, I started warden on all the other nodes.

Now the dashboard shows no nodes other than the one on which the webserver is running. A JobTracker is running on the second node and I can access its web interface there, but the node itself does not appear in the dashboard. Checking the node list with **maprcli node list -columns hostname** returns only one node (the first node).
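
For anyone who wants to repeat this check across several runs, here is a rough sketch of how the output of that command could be compared against the nodes that are supposed to be in the cluster. The function name `missing_nodes` and the hostnames `node1` to `node4` are made up for illustration, and the sketch assumes only that each registered hostname shows up as a whitespace-separated token somewhere in the command output.

```python
def missing_nodes(node_list_output, expected_hostnames):
    """Return the expected hostnames that do not appear in the command output.

    node_list_output is the text printed by `maprcli node list -columns hostname`.
    Every whitespace-separated token is treated as a possible hostname, so the
    exact column layout does not matter (this is an assumption of the sketch).
    """
    seen = set(node_list_output.split())
    return [host for host in expected_hostnames if host not in seen]


# Hypothetical output resembling the situation described: only one node listed.
sample_output = "hostname\nnode1\n"
expected = ["node1", "node2", "node3", "node4"]  # hypothetical hostnames

print(missing_nodes(sample_output, expected))
```

On a real cluster the text could be captured with something like `subprocess.run(["maprcli", "node", "list", "-columns", "hostname"], capture_output=True, text=True).stdout` and passed in as the first argument.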

I looked at the CLDB logs on nodes 2 and 3. There the CLDB status is stuck at initializing. The log shows the following, and the last lines repeat forever; the CLDB never starts properly.

    ZooKeeperClient: **KvStore does not have epoch entry CLDB trying to wait until it is Ready**
    2012-07-30 01:21:37,049 INFO  com.mapr.fs.cldb.zookeeper.ZooKeeperClient [Wait for ZooKeeper Connected thread]: Waiting for local KvStoreContainer to become valid. KvStore ContainerInfo  Container ID:1 Master: Servers: Inactive Servers:  Unused Servers:  Latest epoch:4 SizeMB:0 CLDB ServerID : 8982509962313828465
    2012-07-30 01:21:47,170 INFO  com.mapr.fs.cldb.CLDBServer [pool-1-thread-1]: RPC: PROGRAMID: 2345 PROCEDUREID: 31 from Generating reply with status: 30
    2012-07-30 01:22:50,178 INFO  com.mapr.fs.cldb.CLDBServer [pool-1-thread-2]: RPC: PROGRAMID: 2345 PROCEDUREID: 31 from Generating reply with status: 30
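
To show how often these messages repeat, a small script along the following lines could count them in a CLDB log. This is only a sketch: the two patterns are copied from the excerpt above, while the function name `summarize_cldb_log` and the labels are invented for illustration and say nothing about what the messages mean internally.

```python
import re
from collections import Counter

# Patterns copied from the log excerpt; the labels are hypothetical names
# chosen for this illustration only.
PATTERNS = {
    "kvstore_wait": re.compile(r"Waiting for local KvStoreContainer to become valid"),
    "rpc_status_30": re.compile(r"PROCEDUREID: 31 .*Generating reply with status: 30"),
}


def summarize_cldb_log(lines):
    """Count how many lines match each of the repeating messages."""
    counts = Counter()
    for line in lines:
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[label] += 1
    return counts


# The three timestamped lines from the excerpt, shortened where possible.
sample = [
    "2012-07-30 01:21:37,049 INFO  com.mapr.fs.cldb.zookeeper.ZooKeeperClient "
    "[Wait for ZooKeeper Connected thread]: Waiting for local KvStoreContainer to become valid.",
    "2012-07-30 01:21:47,170 INFO  com.mapr.fs.cldb.CLDBServer [pool-1-thread-1]: "
    "RPC: PROGRAMID: 2345 PROCEDUREID: 31 from Generating reply with status: 30",
    "2012-07-30 01:22:50,178 INFO  com.mapr.fs.cldb.CLDBServer [pool-1-thread-2]: "
    "RPC: PROGRAMID: 2345 PROCEDUREID: 31 from Generating reply with status: 30",
]

print(dict(summarize_cldb_log(sample)))
```

For a real log, an open file object can be passed instead of the sample list, since the function only iterates over lines.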

I have disabled SELinux and the firewalls and enabled passwordless login between all combinations of nodes. I have root permissions. I get literally no error, only a perpetual waiting state on CLDB nodes 2 and 3, and only node 1 appears on the web dashboard. I have tried every thread in this forum but still could not make any progress.

Any help will be greatly appreciated.

log dumps are as follows:
on node 2:
on node 1:

warden.log :
on node 2:
on node 1: