MapR hoststats failed to start

Question asked by dretkal on Sep 15, 2014
Latest reply on Sep 16, 2014 by mufeed
When I start the warden service, hoststats does not start correctly. Everything else appears to be working, but the UI is not showing any stats. I have verified that the warden.conf file matches the one on the other nodes, which are working.

<pre>
2014-09-11 18:25:01,587 INFO  com.mapr.warden.service.baseservice.Service [main-EventThread]: Process path: /services/kvstore/lsaw01. Event state: SyncConnected. Event type: NodeDataChanged
2014-09-11 18:25:01,587 INFO  com.mapr.warden.service.baseservice.Service [main-EventThread]: ZK Connect state:State:CONNECTED Timeout:30000 sessionid:0x4864e2817c0022 local:/10.10.10.1:54691 remoteserver:lsaw01/10.10.10.1:5181 lastZxid:137438954466 xid:16 sent:16 recv:20 queuedpkts:0 pendingresp:0 queuedevents:0
2014-09-11 18:25:01,587 INFO  com.mapr.warden.service.baseservice.DependentService [main-EventThread]: Process path: /services/kvstore/lsaw01. Event state: SyncConnected. Event type: NodeDataChanged
2014-09-11 18:25:01,587 INFO  com.mapr.warden.service.baseservice.DependentService [main-EventThread]: Process path: /services/kvstore/lsaw01. Event state: SyncConnected. Event type: NodeDataChanged
2014-09-11 18:25:01,596 INFO  com.mapr.warden.service.baseservice.Service [main-EventThread]: -------------Service is starting for: cldb
2014-09-11 18:25:01,596 INFO  com.mapr.warden.service.baseservice.Service$ServiceMonitorRun [main-EventThread]: Command: [/etc/init.d/mapr-cldb, status], Directory: /etc/init.d/
2014-09-11 18:25:01,596 INFO  com.mapr.warden.service.baseservice.Service [main-EventThread]: Process path: /services/cldb/lsaw01. Event state: SyncConnected. Event type: NodeCreated
2014-09-11 18:25:01,596 INFO  com.mapr.warden.service.baseservice.Service [main-EventThread]: ZK Connect state:State:CONNECTED Timeout:30000 sessionid:0x14864e2817e001a local:/10.10.10.1:37190 remoteserver:/10.10.10.2:5181 lastZxid:137438954470 xid:26 sent:26 recv:31 queuedpkts:0 pendingresp:0 queuedevents:1
2014-09-11 18:25:01,596 INFO  com.mapr.warden.service.baseservice.Service [main-EventThread]: Thread: 36, NodeCreated: /services/cldb/lsaw01
2014-09-11 18:25:01,597 INFO  com.mapr.warden.service.baseservice.Service [main-EventThread]: Process path: /services/cldb. Event state: SyncConnected. Event type: NodeChildrenChanged
2014-09-11 18:25:01,597 INFO  com.mapr.warden.service.baseservice.Service [main-EventThread]: -------------Service is starting for: hoststats
2014-09-11 18:25:01,597 INFO  com.mapr.warden.service.baseservice.Service [main-EventThread]: ZK Connect state:State:CONNECTED Timeout:30000 sessionid:0x14864e2817e001a local:/10.10.10.1:37190 remoteserver:/10.10.10.2:5181 lastZxid:137438954470 xid:27 sent:27 recv:32 queuedpkts:0 pendingresp:0 queuedevents:0
2014-09-11 18:25:01,597 INFO  com.mapr.warden.service.baseservice.Service$ServiceMonitorRun [main-EventThread]: Command: [/etc/init.d/mapr-hoststats, status], Directory: /etc/init.d/
2014-09-11 18:25:01,597 INFO  com.mapr.warden.service.baseservice.Service [main-EventThread]: Process path: /services/hoststats/lsaw01. Event state: SyncConnected. Event type: NodeCreated
2014-09-11 18:25:01,597 INFO  com.mapr.warden.service.baseservice.Service [main-EventThread]: ZK Connect state:State:CONNECTED Timeout:30000 sessionid:0x34864e30bac000f local:/10.10.10.1:58755 remoteserver:/10.10.10.4:5181 lastZxid:137438954470 xid:22 sent:22 recv:27 queuedpkts:0 pendingresp:0 queuedevents:1
2014-09-11 18:25:01,597 INFO  com.mapr.warden.service.baseservice.Service [main-EventThread]: Thread: 39, NodeCreated: /services/hoststats/lsaw01
2014-09-11 18:25:01,598 INFO  com.mapr.warden.service.baseservice.Service [main-EventThread]: Process path: /services/hoststats. Event state: SyncConnected. Event type: NodeChildrenChanged
2014-09-11 18:25:01,598 INFO  com.mapr.warden.service.baseservice.Service [main-EventThread]: ZK Connect state:State:CONNECTED Timeout:30000 sessionid:0x34864e30bac000f local:/10.10.10.1:58755 remoteserver:/10.10.10.4:5181 lastZxid:137438954470 xid:23 sent:23 recv:28 queuedpkts:0 pendingresp:0 queuedevents:0
2014-09-11 18:25:01,599 INFO  com.mapr.warden.service.baseservice.Service [cldb_monitor]: Alarm clearing command: [/opt/mapr/bin/maprcli, alarm, clear, -alarm, NODE_ALARM_SERVICE_CLDB_DOWN, -entity, lsaw01]
2014-09-11 18:25:01,601 INFO  com.mapr.warden.service.baseservice.Service [hoststats_monitor]: Alarm clearing command: [/opt/mapr/bin/maprcli, alarm, clear, -alarm, NODE_ALARM_SERVICE_HOSTSTATS_DOWN, -entity, lsaw01]
2014-09-11 18:25:02,783 INFO  com.mapr.warden.service.baseservice.Service$ServiceRun [cldb_monitor]: Command: [/etc/init.d/mapr-cldb, start], Directory: /etc/init.d
2014-09-11 18:25:02,803 INFO  com.mapr.warden.service.baseservice.Service$ServiceRun [hoststats_monitor]: Command: [/etc/init.d/mapr-hoststats, start], Directory: /etc/init.d
2014-09-11 18:25:02,853 ERROR com.mapr.warden.service.baseservice.Service$ServiceRun run [hoststats_monitor]: Error while running command: [/etc/init.d/mapr-hoststats, start]
2014-09-11 18:25:02,853 ERROR com.mapr.warden.service.baseservice.Service$ServiceRun run [hoststats_monitor]:
2014-09-11 18:25:02,901 ERROR com.mapr.warden.service.baseservice.Service$ServiceRun run [hoststats_monitor]: Error while running command: [/etc/init.d/mapr-hoststats, start]
2014-09-11 18:25:02,902 ERROR com.mapr.warden.service.baseservice.Service$ServiceRun run [hoststats_monitor]:
2014-09-11 18:25:02,949 ERROR com.mapr.warden.service.baseservice.Service$ServiceRun run [hoststats_monitor]: Error while running command: [/etc/init.d/mapr-hoststats, start]
2014-09-11 18:25:02,950 ERROR com.mapr.warden.service.baseservice.Service$ServiceRun run [hoststats_monitor]:
2014-09-11 18:25:02,997 ERROR com.mapr.warden.service.baseservice.Service$ServiceRun run [hoststats_monitor]: Error while running command: [/etc/init.d/mapr-hoststats, start]
2014-09-11 18:25:02,997 ERROR com.mapr.warden.service.baseservice.Service$ServiceRun run [hoststats_monitor]:
2014-09-11 18:25:03,862 INFO  com.mapr.warden.service.baseservice.Service$ServiceRun [cldb_monitor]: 0.20.2
Starting CLDB, logging to /opt/mapr/logs/cldb.log

2014-09-11 18:25:12,998 INFO  com.mapr.job.mngmnt.hadoop.metrics.WardenRequestBuilder [hoststats_monitor]: [e_SERV_RUN, hostName, ma_host, ma_process]
2014-09-11 18:25:12,998 INFO  com.mapr.job.mngmnt.hadoop.metrics.WardenRequestBuilder [hoststats_monitor]: []
2014-09-11 18:25:13,013 INFO  com.mapr.warden.service.baseservice.Service [main-EventThread]: Process path: /services/hoststats/lsaw01. Event state: SyncConnected. Event type: NodeDataChanged
2014-09-11 18:25:13,013 INFO  com.mapr.warden.service.baseservice.Service [main-EventThread]: ZK Connect state:State:CONNECTED Timeout:30000 sessionid:0x34864e30bac000f local:/10.10.10.1:58755 remoteserver:/10.10.10.4:5181 lastZxid:137438954494 xid:25 sent:26 recv:32 queuedpkts:0 pendingresp:0 queuedevents:0
2014-09-11 18:25:13,061 ERROR com.mapr.warden.service.baseservice.Service$ServiceMonitorRun run [hoststats_monitor]: Monitor command: [/etc/init.d/mapr-hoststats, status]can not determine if service: hoststats is running. Retrying. Retrial #1. Total retries count is: 3
2014-09-11 18:25:13,061 ERROR com.mapr.warden.service.baseservice.Service$ServiceMonitorRun run [hoststats_monitor]: hoststats is stopped

2014-09-11 18:25:13,062 INFO  com.mapr.warden.service.baseservice.Service$ServiceRun [hoststats_monitor]: Command: [/etc/init.d/mapr-hoststats, start], Directory: /etc/init.d
2014-09-11 18:25:13,108 ERROR com.mapr.warden.service.baseservice.Service$ServiceRun run [hoststats_monitor]: Error while running command: [/etc/init.d/mapr-hoststats, start]
2014-09-11 18:25:13,109 ERROR com.mapr.warden.service.baseservice.Service$ServiceRun run [hoststats_monitor]:
2014-09-11 18:25:13,157 ERROR com.mapr.warden.service.baseservice.Service$ServiceRun run [hoststats_monitor]: Error while running command: [/etc/init.d/mapr-hoststats, start]
2014-09-11 18:25:13,157 ERROR com.mapr.warden.service.baseservice.Service$ServiceRun run [hoststats_monitor]:
2014-09-11 18:25:13,204 ERROR com.mapr.warden.service.baseservice.Service$ServiceRun run [hoststats_monitor]: Error while running command: [/etc/init.d/mapr-hoststats, start]
2014-09-11 18:25:13,204 ERROR com.mapr.warden.service.baseservice.Service$ServiceRun run [hoststats_monitor]:
2014-09-11 18:25:13,254 ERROR com.mapr.warden.service.baseservice.Service$ServiceRun run [hoststats_monitor]: Error while running command: [/etc/init.d/mapr-hoststats, start]
2014-09-11 18:25:13,255 ERROR com.mapr.warden.service.baseservice.Service$ServiceRun run [hoststats_monitor]:
2014-09-11 18:25:13,862 INFO  com.mapr.job.mngmnt.hadoop.metrics.WardenRequestBuilder [cldb_monitor]: [e_SERV_RUN, hostName, ma_host, ma_process]
2014-09-11 18:25:13,862 INFO  com.mapr.job.mngmnt.hadoop.metrics.WardenRequestBuilder [cldb_monitor]: []
2014-09-11 18:25:13,876 INFO  com.mapr.warden.service.baseservice.Service [main-EventThread]: Process path: /services/cldb/lsaw01. Event state: SyncConnected. Event type: NodeDataChanged
2014-09-11 18:25:13,876 INFO  com.mapr.warden.service.baseservice.Service [main-EventThread]: ZK Connect state:State:CONNECTED Timeout:30000 sessionid:0x14864e2817e001a local:/10.10.10.1:37190 remoteserver:/10.10.10.2:5181 lastZxid:137438954495 xid:30 sent:31 recv:37 queuedpkts:0 pendingresp:0 queuedevents:0
2014-09-11 18:25:23,255 INFO  com.mapr.job.mngmnt.hadoop.metrics.WardenRequestBuilder [hoststats_monitor]: [e_SERV_RUN, hostName, ma_host, ma_process]
2014-09-11 18:25:23,255 INFO  com.mapr.job.mngmnt.hadoop.metrics.WardenRequestBuilder [hoststats_monitor]: []
2014-09-11 18:25:23,317 ERROR com.mapr.warden.service.baseservice.Service$ServiceMonitorRun run [hoststats_monitor]: Monitor command: [/etc/init.d/mapr-hoststats, status]can not determine if service: hoststats is running. Retrying. Retrial #2. Total retries count is: 3
2014-09-11 18:25:23,317 ERROR com.mapr.warden.service.baseservice.Service$ServiceMonitorRun run [hoststats_monitor]: hoststats is stopped

2014-09-11 18:25:23,318 INFO  com.mapr.warden.service.baseservice.Service$ServiceRun [hoststats_monitor]: Command: [/etc/init.d/mapr-hoststats, start], Directory: /etc/init.d
2014-09-11 18:25:23,364 ERROR com.mapr.warden.service.baseservice.Service$ServiceRun run [hoststats_monitor]: Error while running command: [/etc/init.d/mapr-hoststats, start]
2014-09-11 18:25:23,364 ERROR com.mapr.warden.service.baseservice.Service$ServiceRun run [hoststats_monitor]:
2014-09-11 18:25:23,409 ERROR com.mapr.warden.service.baseservice.Service$ServiceRun run [hoststats_monitor]: Error while running command: [/etc/init.d/mapr-hoststats, start]
2014-09-11 18:25:23,409 ERROR com.mapr.warden.service.baseservice.Service$ServiceRun run [hoststats_monitor]:
2014-09-11 18:25:23,457 ERROR com.mapr.warden.service.baseservice.Service$ServiceRun run [hoststats_monitor]: Error while running command: [/etc/init.d/mapr-hoststats, start]
2014-09-11 18:25:23,457 ERROR com.mapr.warden.service.baseservice.Service$ServiceRun run [hoststats_monitor]:
2014-09-11 18:25:23,504 ERROR com.mapr.warden.service.baseservice.Service$ServiceRun run [hoststats_monitor]: Error while running command: [/etc/init.d/mapr-hoststats, start]
2014-09-11 18:25:23,504 ERROR com.mapr.warden.service.baseservice.Service$ServiceRun run [hoststats_monitor]:
2014-09-11 18:25:33,505 INFO  com.mapr.job.mngmnt.hadoop.metrics.WardenRequestBuilder [hoststats_monitor]: [e_SERV_RUN, hostName, ma_host, ma_process]
2014-09-11 18:25:33,505 INFO  com.mapr.job.mngmnt.hadoop.metrics.WardenRequestBuilder [hoststats_monitor]: []
2014-09-11 18:25:33,611 ERROR com.mapr.warden.service.baseservice.Service$ServiceMonitorRun run [hoststats_monitor]: Monitor command: [/etc/init.d/mapr-hoststats, status]cannot determine if service: hoststats is running. Number of retrials exceeded. Closing Zookeeper
2014-09-11 18:25:33,611 INFO  com.mapr.warden.service.baseservice.Service [hoststats_monitor]: 63 about to close zk for service: hoststats
2014-09-11 18:25:33,614 INFO  com.mapr.warden.service.baseservice.Service [main-EventThread]: Process path: /services/hoststats/lsaw01. Event state: SyncConnected. Event type: NodeDeleted
2014-09-11 18:25:33,614 INFO  com.mapr.warden.service.baseservice.Service [main-EventThread]: ZK Connect state:State:CLOSED sessionid:0x34864e30bac000f local:/10.10.10.1:58755 remoteserver:/10.10.10.4:5181 lastZxid:137438954500 xid:27 sent:30 recv:37 queuedpkts:0 pendingresp:0 queuedevents:1
2014-09-11 18:25:33,614 INFO  com.mapr.warden.service.baseservice.Service [main-EventThread]: ZK is closed for service: hoststats
2014-09-11 18:25:33,616 INFO  com.mapr.job.mngmnt.hadoop.metrics.WardenRequestBuilder [hoststats_monitor]: [e_SERV_FAIL, hostName, ma_host, ma_process]
2014-09-11 18:25:33,616 INFO  com.mapr.job.mngmnt.hadoop.metrics.WardenRequestBuilder [hoststats_monitor]: []
2014-09-11 18:25:33,617 INFO  com.mapr.warden.service.baseservice.Service [hoststats_monitor]: Alarm raising command: [/opt/mapr/bin/maprcli, alarm, raise, -alarm, NODE_ALARM_SERVICE_HOSTSTATS_DOWN, -entity, lsaw01, -description, Can not determine if service: hoststats is running. Check logs at: /opt/mapr/logs/hoststats.log]
2014-09-11 18:26:03,619 INFO  com.mapr.warden.service.baseservice.Service [Thread-14-EventThread]: Process path: null. Event state: SyncConnected. Event type: None
2014-09-11 18:26:03,619 INFO  com.mapr.warden.service.baseservice.Service [Thread-14-EventThread]: ZK Connect state:State:CONNECTED Timeout:30000 sessionid:0x14864e2817e001b local:/10.10.10.1:37196 remoteserver:/10.10.10.2:5181 lastZxid:0 xid:2 sent:1 recv:1 queuedpkts:0 pendingresp:0 queuedevents:1
2014-09-11 18:26:03,619 INFO  com.mapr.warden.service.baseservice.Service [Thread-14]: Connected to ZK: 10.10.10.1:5181,10.10.10.2:5181,10.10.10.3:5181,10.10.10.4:5181,10.10.10.5:5181With State: State:CONNECTED Timeout:30000 sessionid:0x14864e2817e001b local:/10.10.10.1:37196 remoteserver:/10.10.10.2:5181 lastZxid:0 xid:2 sent:1 recv:1 queuedpkts:0 pendingresp:0 queuedevents:1
2014-09-11 18:26:03,619 INFO  com.mapr.warden.service.baseservice.Service [Thread-14-EventThread]: Process path: null. Event state: SaslAuthenticated. Event type: None
2014-09-11 18:26:03,621 INFO  com.mapr.warden.service.baseservice.Service [Thread-14]: Node: /nodes/lsaw01/services/hoststats does not exist yet
2014-09-11 18:29:47,606 INFO  com.mapr.warden.centralconfig.PullCentralConfigTaskScheduler [PullCentralConfigTask]: Launching a separate process to execute /opt/mapr/server/pullcentralconfig
2014-09-11 18:29:47,920 ERROR com.mapr.job.mngmnt.hadoop.metrics.MaprRPCContext run [Thread-5]: Response is null. Most likely hoststats is not accepting requests or it is down.
2014-09-11 18:29:48,452 INFO  com.mapr.warden.centralconfig.PullCentralConfigTaskScheduler [PullCentralConfigTask]: /opt/mapr/server/pullcentralconfig process terminated with status: 0
2014-09-11 18:34:47,606 INFO  com.mapr.warden.centralconfig.PullCentralConfigTaskScheduler [PullCentralConfigTask]: Launching a separate process to execute /opt/mapr/server/pullcentralconfig
2014-09-11 18:34:48,460 INFO  com.mapr.warden.centralconfig.PullCentralConfigTaskScheduler [PullCentralConfigTask]: /opt/mapr/server/pullcentralconfig process terminated with status: 0
2014-09-11 18:35:17,924 ERROR com.mapr.job.mngmnt.hadoop.metrics.MaprRPCContext run [Thread-5]: Response is null. Most likely hoststats is not accepting requests or it is down.
2014-09-11 18:39:47,606 INFO  com.mapr.warden.centralconfig.PullCentralConfigTaskScheduler [PullCentralConfigTask]: Launching a separate process to execute /opt/mapr/server/pullcentralconfig
2014-09-11 18:39:48,457 INFO  com.mapr.warden.centralconfig.PullCentralConfigTaskScheduler [PullCentralConfigTask]: /opt/mapr/server/pullcentralconfig process terminated with status: 0
2014-09-11 18:40:33,926 ERROR com.mapr.job.mngmnt.hadoop.metrics.MaprRPCContext run [Thread-5]: Response is null. Most likely hoststats is not accepting requests or it is down.
2014-09-11 18:44:47,607 INFO  com.mapr.warden.centralconfig.PullCentralConfigTaskScheduler [PullCentralConfigTask]: Launching a separate process to execute /opt/mapr/server/pullcentralconfig
2014-09-11 18:44:48,453 INFO  com.mapr.warden.centralconfig.PullCentralConfigTaskScheduler [PullCentralConfigTask]: /opt/mapr/server/pullcentralconfig process terminated with status: 0
2014-09-11 18:46:02,928 ERROR com.mapr.job.mngmnt.hadoop.metrics.MaprRPCContext run [Thread-5]: Response is null. Most likely hoststats is not accepting requests or it is down.
</pre>

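For anyone triaging a log like the one above, the hoststats failures are easier to see once the INFO noise is filtered out. The following is only a rough sketch, not a MapR tool: the function name `summarize_errors` and the regular expression are my own assumptions, based solely on the line layout visible in this post (timestamp, level, logger, optional method name, thread in brackets, message). It counts distinct ERROR messages that mention the service either in the thread name or in the message text.

```python
import re
from collections import Counter

# Assumed line layout, inferred from the log pasted in this post:
#   <date> <time>,<ms> <LEVEL> <logger> [method] [<thread>]: <message>
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"(?P<logger>\S+)(?:\s+\w+)?\s+"
    r"\[(?P<thread>[^\]]+)\]:\s?(?P<msg>.*)$"
)


def summarize_errors(lines, service="hoststats"):
    """Count distinct ERROR messages related to `service`.

    `lines` is any iterable of log lines (a list, or an open file).
    Lines that do not match the assumed layout (for example wrapped
    script output such as "Starting CLDB, ...") are skipped.
    """
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match is None or match.group("level") != "ERROR":
            continue
        message = match.group("msg").strip()
        if not message:
            # The warden log above contains ERROR lines with an empty body.
            continue
        if service in match.group("thread") or service in message:
            counts[message] += 1
    return counts
```

To use it, pass an open handle to your warden log (the path depends on your installation) and print `counts.most_common()`. On a log like this one, the repeated "Error while running command" entries should dominate, which points at the init script itself rather than at ZooKeeper; the alarm text in the log also suggests checking /opt/mapr/logs/hoststats.log.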