
Error while starting warden

Question asked by kshitij_s on Mar 14, 2012
Latest reply on Mar 14, 2012 by kshitij_s
I'm trying to install MapR on a cluster running Ubuntu 10.04 with a custom kernel (3.3-rc3). There is only one master node (running the single ZooKeeper and the single CLDB for the entire cluster); all the other nodes of the cluster have only the mfs and tasktracker services installed.

Each cluster machine has two network interfaces, and MAPR_SUBNETS is configured to use only one of them. The hostname mappings in /etc/hosts are also set correctly.

After following the instructions for the M3 install in the documentation, I start warden on the first node (the master node) and get the following error:

--------------
2012-03-14 09:48:31,323 INFO  com.mapr.warden.service.baseservice.Service [main-EventThread]: Thread: 41, NodeCreated: /services/hoststats/tmel-bd-n11.tmel.vmem.int

2012-03-14 09:48:31,350 INFO  com.mapr.warden.service.baseservice.Service [hoststats_monitor]: Need delayed alarm clearing for: NODE_ALARM_SERVICE_HOSTSTATS_DOWN

2012-03-14 09:48:31,350 INFO  com.mapr.warden.service.baseservice.Service [main-EventThread]: Process path: /services/hoststats. Event state: SyncConnected. Event type: NodeChildrenChanged

2012-03-14 09:48:31,351 ERROR com.mapr.warden.service.CLDBService isToStartNow [cldb_monitor]: Exception while trying to get children of: /datacenter/controlnodes/cldb/active/CLDBNodes
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /datacenter/controlnodes/cldb/active/CLDBNodes
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
        at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1243)
        at com.mapr.warden.service.baseservice.common.ZKUtilsLocking.getZkNodeChildren(ZKUtilsLocking.java:89)
        at com.mapr.warden.service.CLDBService.isToStartNow(CLDBService.java:299)
        at com.mapr.warden.service.baseservice.Service$ServiceMonitorRun.run(Service.java:1559)
        at java.lang.Thread.run(Thread.java:662)

2012-03-14 09:48:31,353 INFO  com.mapr.warden.service.baseservice.Service$ServiceRun [cldb_monitor]: Command: [nice, -n, -10, /etc/init.d/mapr-cldb, start], Directory: /etc/init.d/

2012-03-14 09:48:31,379 INFO  com.mapr.warden.service.baseservice.Service$ServiceRun [hoststats_monitor]: Command: [nice, -n, -10, /etc/init.d/mapr-hoststats, start], Directory: /etc/init.d/

2012-03-14 09:48:31,434 INFO  com.mapr.warden.service.baseservice.Service$ServiceRun [hoststats_monitor]:

2012-03-14 09:48:32,443 INFO  com.mapr.warden.service.baseservice.Service$ServiceRun [cldb_monitor]: Starting CLDB, logging to /opt/mapr/logs/cldb.log


---------
**The ZooKeeper log has no errors, but the INFO messages look like this >>**

2012-03-14 09:48:34,064 - INFO  [ProcessThread:-1:PrepRequestProcessor@407] - Got user-level KeeperException when processing sessionid:0x136121abb510010 type:create cxid:0x17 zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a Error Path:/datacenter/controlnodes/cldb Error:KeeperErrorCode = NodeExists for /datacenter/controlnodes/cldb

2012-03-14 09:48:34,073 - INFO  [ProcessThread:-1:PrepRequestProcessor@407] - Got user-level KeeperException when processing sessionid:0x136121abb510010 type:create cxid:0x18 zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a Error Path:/datacenter/controlnodes/cldb/epoch Error:KeeperErrorCode = NodeExists for /datacenter/controlnodes/cldb/epoch

2012-03-14 09:48:34,081 - INFO  [ProcessThread:-1:PrepRequestProcessor@407] - Got user-level KeeperException when processing sessionid:0x136121abb510010 type:create cxid:0x19 zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a Error Path:/datacenter/controlnodes/cldb/epoch/1 Error:KeeperErrorCode = NodeExists for /datacenter/controlnodes/cldb/epoch/1
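
To make these INFO lines easier to scan, here is a minimal sketch that pulls the error code and znode path out of each one. The function name `summarize_zk_errors` and the regular expression are illustrative assumptions based only on the line format shown above, not part of any MapR or ZooKeeper tooling. A `NodeExists` code on a `create` request normally just means the znode was already there, which would fit these being logged at INFO rather than ERROR.

```python
import re

# Matches the tail of a ZooKeeper log line of the form shown above:
#   "... Error Path:<path> Error:KeeperErrorCode = <code> for <path>"
# The pattern is an assumption derived from the sample lines only.
ZK_ERROR = re.compile(
    r"Error Path:(?P<path>\S+) Error:KeeperErrorCode = (?P<code>\w+)"
)


def summarize_zk_errors(lines):
    """Return a list of (code, path) pairs, one per matching log line."""
    results = []
    for line in lines:
        match = ZK_ERROR.search(line)
        if match:
            results.append((match.group("code"), match.group("path")))
    return results


sample = [
    "2012-03-14 09:48:34,064 - INFO  [ProcessThread:-1:PrepRequestProcessor@407] - "
    "Got user-level KeeperException when processing sessionid:0x136121abb510010 "
    "type:create cxid:0x17 zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a "
    "Error Path:/datacenter/controlnodes/cldb "
    "Error:KeeperErrorCode = NodeExists for /datacenter/controlnodes/cldb",
]

for code, path in summarize_zk_errors(sample):
    print(code, path)
```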

----

**The cldb log has the following (trimmed) output >>**

2012-03-14 10:06:21,121 INFO  com.mapr.fs.cldb.CLDBServer [pool-1-thread-2]: FSRegister: Request  FSID: 2836519655559445901 FSNetworkLocation: / FSHost:Port: 10.11.91.208:5660-172.16.200.2:5660- FSHostName: tmel-bd-n12.tmel.vmem.int StoragePools f282cb583e097e88004f5ff2f804576c-7f6bacbec8c602d3004f5ff2f909a659-04bceaaae9520992004f5ff2f602a258- Capacity: 6406795 Available: 6404544 Used: 2250 Role: 0 isDCA: false Received registration request

2012-03-14 10:06:21,121 INFO  com.mapr.fs.cldb.CLDBServer [pool-1-thread-2]: FSRegister: CLDB waiting for local mfs to register and become master, requesting fileserver 10.11.91.208:5660-172.16.200.2:5660- FSID: 2836519655559445901 to try again by returning ESRCH

2012-03-14 10:06:21,910 INFO  com.mapr.fs.cldb.CLDBServer [pool-1-thread-2]: RPC: PROGRAMID: 2345 PROCEDUREID: 103 from 10.11.91.207:51687 Generating reply with status: 3

2012-03-14 10:06:21,929 INFO  com.mapr.fs.cldb.CLDBServer [pool-1-thread-2]: RPC: PROGRAMID: 2345 PROCEDUREID: 40 from 10.11.91.207:51687 Generating reply with status: 3

2012-03-14 10:06:25,674 INFO  com.mapr.fs.cldb.CLDBServer [pool-1-thread-2]: RPC: PROGRAMID: 2345 PROCEDUREID: 103 from 10.11.91.207:53592 Generating reply with status: 3
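
The FSRegister line above lists two addresses for the registering file server (10.11.91.208 and 172.16.200.2), so it may be worth checking which of them fall inside the subnet given in MAPR_SUBNETS. The sketch below does that for a single log line. The helper names and the subnet value `10.11.91.0/24` are hypothetical; substitute whatever MAPR_SUBNETS is actually set to.

```python
import ipaddress
import re


def parse_fs_endpoints(line):
    """Extract (ip, port) pairs from the 'FSHost:Port:' field of an FSRegister line."""
    match = re.search(r"FSHost:Port: (\S+)", line)
    if not match:
        return []
    endpoints = []
    # The field looks like "ip:port-ip:port-", so the final split item is empty.
    for item in match.group(1).split("-"):
        if item:
            ip, port = item.rsplit(":", 1)
            endpoints.append((ip, int(port)))
    return endpoints


def outside_subnet(endpoints, subnet):
    """Return the IPs that do not belong to the given subnet (CIDR notation)."""
    network = ipaddress.ip_network(subnet)
    return [ip for ip, _ in endpoints if ipaddress.ip_address(ip) not in network]


line = (
    "FSRegister: Request  FSID: 2836519655559445901 FSNetworkLocation: / "
    "FSHost:Port: 10.11.91.208:5660-172.16.200.2:5660- "
    "FSHostName: tmel-bd-n12.tmel.vmem.int"
)

# Hypothetical subnet, used here only for illustration.
assumed_subnet = "10.11.91.0/24"

endpoints = parse_fs_endpoints(line)
print(endpoints)
print(outside_subnet(endpoints, assumed_subnet))
```

Any address reported by the second print is one that the file server advertises even though it lies outside the assumed subnet.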
