
mfs is not running. Because of this, the remaining services (hoststats, cldb, nfs) are also not running.

Question asked by pavankumar on Jun 16, 2015
Latest reply on Jun 17, 2015 by pavankumar
I am trying to install MapR on 9 nodes manually, following the instructions here: http://doc.mapr.com/display/MapR3/Advanced+Installation+Topics
mfs.err log file content
=======================================
tcmalloc: large alloc 2235367424 bytes == 0x1aaa000 @
tcmalloc: large alloc 152600707072 bytes == 0x8708e000 @
tcmalloc: large alloc 8541085696 bytes == 0x241d114000 @
tcmalloc: large alloc 22776225792 bytes == 0x261aa72000 @
Loading /opt/mapr/server/permissions/libmapr_roles_refimpl.so
Resolving function 'getSecurityMembership()'
Resolving function 'cleanup()'
Scanning directory '/opt/mapr/server/filters'
Loading /opt/mapr/server/filters/libmaprhbase-filters.so
Resolving function 'maprhbase_RegisterFilters()'
bind: error 98
bind: error 98
tcmalloc: large alloc 43832893440 bytes == 0x27f9a000 @
tcmalloc: large alloc 1826373632 bytes == 0xa60bb8000 @
tcmalloc: large alloc 4870324224 bytes == 0xacdb30000 @
Loading /opt/mapr/server/permissions/libmapr_roles_refimpl.so
Resolving function 'getSecurityMembership()'
Resolving function 'cleanup()'
Scanning directory '/opt/mapr/server/filters'
Loading /opt/mapr/server/filters/libmaprhbase-filters.so
Resolving function 'maprhbase_RegisterFilters()'
tcmalloc: large alloc 2235367424 bytes == 0x1aaa000 @
tcmalloc: large alloc 152600707072 bytes == 0x8708e000 @
tcmalloc: large alloc 8541085696 bytes == 0x241d114000 @
tcmalloc: large alloc 22776225792 bytes == 0x261aa72000 @
Loading /opt/mapr/server/permissions/libmapr_roles_refimpl.so
Resolving function 'getSecurityMembership()'
Resolving function 'cleanup()'
Scanning directory '/opt/mapr/server/filters'
Loading /opt/mapr/server/filters/libmaprhbase-filters.so
Resolving function 'maprhbase_RegisterFilters()'
bind: error 98
bind: error 98
bind: error 98
bind: error 98
bind: error 98
tcmalloc: large alloc 3002728448 bytes == 0x1aaa000 @
tcmalloc: large alloc 204986023936 bytes == 0xb4d16000 @
tcmalloc: large alloc 8541085696 bytes == (nil) @
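
Two things stand out in the mfs.err excerpt above: the repeated `bind: error 98` lines and the final `large alloc` line that ends in `(nil)`. On Linux, errno 98 is EADDRINUSE, which usually means another process already holds the port; the `(nil)` address suggests that allocation request did not succeed. Both readings are assumptions, not something the log itself confirms. The largest request shown, 204986023936 bytes, is roughly 191 GiB. A minimal Python sketch for summarizing such lines follows; the `summarize` helper and the `LINUX_ERRNO` table are hypothetical names introduced for illustration and are not part of MapR.

```python
import re

# Sample lines copied from the mfs.err excerpt above.
SAMPLE = """\
tcmalloc: large alloc 204986023936 bytes == 0xb4d16000 @
tcmalloc: large alloc 8541085696 bytes == (nil) @
bind: error 98
"""

# Assumption: the node runs Linux, where errno 98 is EADDRINUSE.
LINUX_ERRNO = {98: "EADDRINUSE"}

ALLOC_RE = re.compile(r"tcmalloc: large alloc (\d+) bytes == (\S+) @")
BIND_RE = re.compile(r"bind: error (\d+)")


def summarize(text):
    """Return (allocs, binds) parsed from mfs.err-style text.

    allocs: list of (size_bytes, size_gib, returned_nil)
    binds:  list of (errno_number, symbolic_name)
    """
    allocs = []
    binds = []
    for line in text.splitlines():
        m = ALLOC_RE.search(line)
        if m:
            size = int(m.group(1))
            # "(nil)" in place of an address is read here as a failed request.
            allocs.append((size, size / 2**30, m.group(2) == "(nil)"))
            continue
        m = BIND_RE.search(line)
        if m:
            code = int(m.group(1))
            binds.append((code, LINUX_ERRNO.get(code, "unknown")))
    return allocs, binds


if __name__ == "__main__":
    allocs, binds = summarize(SAMPLE)
    for size, gib, returned_nil in allocs:
        status = "returned nil" if returned_nil else "ok"
        print(f"alloc {size} bytes ({gib:.1f} GiB): {status}")
    for code, name in binds:
        print(f"bind failed with errno {code} ({name})")
```

If the bind errors really are EADDRINUSE, checking which process is listening on the mfs port on that node would be a reasonable next step.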

warden.log file
==========================================================
2015-06-16 13:56:02,539 INFO  com.mapr.warden.service.baseservice.Service [Thread-12-EventThread]: ZK Connect state:State:CONNECTED Timeout:30000 sessionid:0x14dfd34b4c1002b local:/10.25.18.121:43276 remoteserver:apus5.labs.teradata.com/10.25.18.125:5181 lastZxid:8589936096 xid:44 sent:223 recv:239 queuedpkts:0 pendingresp:0 queuedevents:2
2015-06-16 13:56:02,539 INFO  com.mapr.warden.service.baseservice.Service [Thread-12-EventThread]: ZK is closed for service: hoststats
2015-06-16 13:56:02,540 INFO  com.mapr.warden.service.baseservice.Service [Thread-12-EventThread]: Process path: /services/hoststats/master. Event state: SyncConnected. Event type: NodeDeleted
2015-06-16 13:56:02,540 INFO  com.mapr.warden.service.baseservice.Service [Thread-12-EventThread]: ZK Connect state:State:CLOSED sessionid:0x14dfd34b4c1002b local:0.0.0.0/0.0.0.0:43276 remoteserver:apus5.labs.teradata.com/10.25.18.125:5181 lastZxid:8589936096 xid:44 sent:223 recv:239 queuedpkts:0 pendingresp:0 queuedevents:1
2015-06-16 13:56:02,540 INFO  com.mapr.warden.service.baseservice.Service [Thread-12-EventThread]: ZK is closed for service: hoststats
2015-06-16 13:56:02,567 INFO  com.mapr.job.mngmnt.hadoop.metrics.WardenRequestBuilder [hoststats_monitor]: [e_SERV_FAIL, hostName, ma_host, ma_process]
2015-06-16 13:56:02,567 INFO  com.mapr.job.mngmnt.hadoop.metrics.WardenRequestBuilder [hoststats_monitor]: []
2015-06-16 13:56:02,567 INFO  com.mapr.warden.service.baseservice.Service [hoststats_monitor]: Need delayed alarm raising for: NODE_ALARM_SERVICE_HOSTSTATS_DOWN
2015-06-16 13:56:18,451 INFO  com.mapr.warden.service.baseservice.DependentService [main-EventThread]: can not start until kvstore is started.
2015-06-16 13:56:18,657 INFO  com.mapr.warden.service.baseservice.DependentService [main-EventThread]: Process path: /services/kvstore/apus1.labs.teradata.com. Event state: SyncConnected. Event type: NodeCreated
2015-06-16 13:56:18,658 ERROR com.mapr.warden.service.baseservice.DependentService process [main-EventThread]: Keeper Exception during watcher processing for: /services/kvstore/apus1.labs.teradata.com. Ignoring event
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /services/kvstore/apus1.labs.teradata.com
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
        at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
        at com.mapr.warden.service.baseservice.common.ZKUtilsLocking.getData(ZKUtilsLocking.java:39)
        at com.mapr.warden.service.baseservice.DependentService$DependentWatcher.process(DependentService.java:378)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
2015-06-16 13:56:28,592 INFO  com.mapr.warden.service.baseservice.Service [Thread-10-EventThread]: Process path: null. Event state: SyncConnected. Event type: None
2015-06-16 13:56:28,592 INFO  com.mapr.warden.service.baseservice.Service [Thread-10-EventThread]: ZK Connect state:State:CONNECTED Timeout:30000 sessionid:0x4dfd35b5d0001e local:/10.25.18.121:60617 remoteserver:apus4.labs.teradata.com/10.25.18.124:5181 lastZxid:0 xid:2 sent:1 recv:1 queuedpkts:0 pendingresp:0 queuedevents:1
2015-06-16 13:56:28,592 INFO  com.mapr.warden.service.baseservice.Service [Thread-10]: Connected to ZK: apus4:5181,apus5:5181,apus6:5181With State: State:CONNECTED Timeout:30000 sessionid:0x4dfd35b5d0001e local:/10.25.18.121:60617 remoteserver:apus4.labs.teradata.com/10.25.18.124:5181 lastZxid:0 xid:2 sent:1 recv:1 queuedpkts:0 pendingresp:0 queuedevents:1
2015-06-16 13:56:28,592 INFO  com.mapr.warden.service.baseservice.Service [Thread-10-EventThread]: Process path: null. Event state: SaslAuthenticated. Event type: None
2015-06-16 13:56:28,593 INFO  com.mapr.warden.service.baseservice.Service [Thread-10]: Node: /nodes/apus1.labs.teradata.com/services/kvstore does not exist yet
2015-06-16 13:56:32,388 ERROR com.mapr.job.mngmnt.hadoop.metrics.MaprRPCContext run [Thread-5]: RPC request failed with status: 1
2015-06-16 13:56:32,554 INFO  com.mapr.warden.service.baseservice.Service [Thread-12-EventThread]: Process path: null. Event state: SyncConnected. Event type: None
2015-06-16 13:56:32,554 INFO  com.mapr.warden.service.baseservice.Service [Thread-12-EventThread]: ZK Connect state:State:CONNECTED Timeout:30000 sessionid:0x24dfd3494d20034 local:/10.25.18.121:35747 remoteserver:apus6.labs.teradata.com/10.25.18.126:5181 lastZxid:0 xid:2 sent:1 recv:1 queuedpkts:0 pendingresp:0 queuedevents:0
2015-06-16 13:56:32,554 INFO  com.mapr.warden.service.baseservice.Service [Thread-12]: Connected to ZK: apus4:5181,apus5:5181,apus6:5181With State: State:CONNECTED Timeout:30000 sessionid:0x24dfd3494d20034 local:/10.25.18.121:35747 remoteserver:apus6.labs.teradata.com/10.25.18.126:5181 lastZxid:0 xid:2 sent:1 recv:1 queuedpkts:0 pendingresp:0 queuedevents:0
2015-06-16 13:56:32,554 INFO  com.mapr.warden.service.baseservice.Service [Thread-12-EventThread]: Process path: null. Event state: SaslAuthenticated. Event type: None
2015-06-16 13:56:32,554 INFO  com.mapr.warden.service.baseservice.Service [Thread-12]: Node: /nodes/apus1.labs.teradata.com/services/hoststats does not exist yet
2015-06-16 13:58:58,801 INFO  com.mapr.warden.centralconfig.PullCentralConfigTaskScheduler [PullCentralConfigTask]: Launching a separate process to execute /opt/mapr/server/pullcentralconfig
2015-06-16 13:58:59,513 INFO  com.mapr.warden.centralconfig.PullCentralConfigTaskScheduler [PullCentralConfigTask]: /opt/mapr/server/pullcentralconfig process terminated with status: 0
2015-06-16 14:00:00,548 INFO  com.mapr.warden.service.baseservice.Service [Thread-12-EventThread]: Process path: /services/hoststats. Event state: SyncConnected. Event type: NodeChildrenChanged
2015-06-16 14:00:00,548 INFO  com.mapr.warden.service.baseservice.Service [Thread-12-EventThread]: ZK Connect state:State:CONNECTED Timeout:30000 sessionid:0x24dfd3494d20034 local:/10.25.18.121:35747 remoteserver:apus6.labs.teradata.com/10.25.18.126:5181 lastZxid:8589936103 xid:6 sent:25 recv:27 queuedpkts:0 pendingresp:0 queuedevents:0
2015-06-16 14:00:00,551 INFO  com.mapr.warden.service.baseservice.Service [main-EventThread]: Process path: /services/nfs/master. Event state: SyncConnected. Event type: NodeCreated
2015-06-16 14:00:00,551 INFO  com.mapr.warden.service.baseservice.Service [main-EventThread]: ZK Connect state:State:CONNECTED Timeout:30000 sessionid:0x24dfd3494d20025 local:/10.25.18.121:35731 remoteserver:apus6.labs.teradata.com/10.25.18.126:5181 lastZxid:8589935983 xid:42 sent:255 recv:276 queuedpkts:0 pendingresp:0 queuedevents:0
2015-06-16 14:00:00,551 INFO  com.mapr.warden.service.baseservice.Service [main-EventThread]: Thread: 26, NodeCreated: /services/nfs/master
2015-06-16 14:00:00,558 INFO  com.mapr.warden.service.baseservice.Service [main-EventThread]: NodeCreated: Thread: 26, MasterIP: apus7.labs.teradata.com
2015-06-16 14:00:10,742 INFO  com.mapr.warden.service.baseservice.Service [main-EventThread]: Process path: /services/nfs/master. Event state: SyncConnected. Event type: NodeDataChanged
2015-06-16 14:00:10,743 INFO  com.mapr.warden.service.baseservice.Service [main-EventThread]: ZK Connect state:State:CONNECTED Timeout:30000 sessionid:0x24dfd3494d20025 local:/10.25.18.121:35731 remoteserver:apus6.labs.teradata.com/10.25.18.126:5181 lastZxid:8589936142 xid:44 sent:258 recv:280 queuedpkts:0 pendingresp:0 queuedevents:0
2015-06-16 14:00:31,171 INFO  com.mapr.warden.service.baseservice.Service [main-EventThread]: Process path: /services/nfs/master. Event state: SyncConnected. Event type: NodeDeleted
2015-06-16 14:00:31,171 INFO  com.mapr.warden.service.baseservice.Service [main-EventThread]: ZK Connect state:State:CONNECTED Timeout:30000 sessionid:0x24dfd3494d20025 local:/10.25.18.121:35731 remoteserver:apus6.labs.teradata.com/10.25.18.126:5181 lastZxid:8589936148 xid:46 sent:262 recv:285 queuedpkts:0 pendingresp:0 queuedevents:0

hoststats.log file
====================
Hoststats is shutting down with signal: 15. No further requests will be served
**** starting hoststats **** args: 5660 /opt/mapr/server/data/TaskTracker.stats -S 1
isGatherStats=true
Setting continuous mode
2015-06-16 13:24:23,6606 Program: hoststats on Host:  IP: 0.0.0.0, Port: 1111, PID: 22547
2015-06-16 13:24:27,3444 ERROR Hoststats hoststats.cc:260 Thread: 140535019980544 Oops! Failed to connect to maprfs!  with error count since last shown error: 0
2015-06-16 13:29:34,0312 ERROR Hoststats hoststats.cc:260 Thread: 140535077947136 Oops! Failed to connect to maprfs!  with error count since last shown error: 48
2015-06-16 13:34:44,2990 ERROR Hoststats hoststats.cc:260 Thread: 140535077947136 Oops! Failed to connect to maprfs!  with error count since last shown error: 40
2015-06-16 13:39:45,3758 ERROR Hoststats hoststats.cc:260 Thread: 140535019980544 Oops! Failed to connect to maprfs!  with error count since last shown error: 37
2015-06-16 13:44:54,8128 ERROR Hoststats hoststats.cc:260 Thread: 140535077947136 Oops! Failed to connect to maprfs!  with error count since last shown error: 36
2015-06-16 13:49:55,0502 ERROR Hoststats hoststats.cc:260 Thread: 140535077947136 Oops! Failed to connect to maprfs!  with error count since last shown error: 35
2015-06-16 13:55:05,2957 ERROR Hoststats hoststats.cc:260 Thread: 140535077947136 Oops! Failed to connect to maprfs!  with error count since last shown error: 35
2015-06-16 14:00:15,7764 ERROR Hoststats hoststats.cc:260 Thread: 140535077947136 Oops! Failed to connect to maprfs!  with error count since last shown error: 35
2015-06-16 14:05:16,0193 ERROR Hoststats hoststats.cc:260 Thread: 140535077947136 Oops! Failed to connect to maprfs!  with error count since last shown error: 33
2015-06-16 14:10:26,2700 ERROR Hoststats hoststats.cc:260 Thread: 140535077947136 Oops! Failed to connect to maprfs!  with error count since last shown error: 35
2015-06-16 14:15:36,5159 ERROR Hoststats hoststats.cc:260 Thread: 140535077947136 Oops! Failed to connect to maprfs!  with error count since last shown error: 34
