
Can't start MapR after install: Cannot open /proc/net/dev : No such file or directory

Question asked by laurent55 on Sep 9, 2013
Latest reply on Sep 9, 2013 by laurent55
Hi

I recently tried to install the MapR M5 version, and I'm not able to start the warden service on my node01 (the machine warden is installed on).

Here is an extract of the logs; I can't figure out what's going on:

    2013-09-04 18:45:55,950 INFO  com.mapr.job.mngmnt.hadoop.metrics.WardenRequestBuilder [cldb_monitor]: []
    2013-09-04 18:45:55,965 INFO  com.mapr.warden.service.baseservice.Service [main-EventThread]: Process path: /services/cldb/ns308207.ovh.net. Event state: SyncConnected. Event type: NodeDataChanged
    2013-09-04 18:45:55,965 INFO  com.mapr.warden.service.baseservice.Service [main-EventThread]: ZK Connect state:State:CONNECTED Timeout:30000 sessionid:0x40e9bf6c300009 local:/xxx.xxx.xxx.xxx:36512 remoteserver:node01/xxx.xxx.xxx.xxx:5181 lastZxid:4294967542 xid:27 sent:29 recv:35 queuedpkts:0 pendingresp:0 queuedevents:0
    2013-09-04 18:46:03,491 INFO  com.mapr.warden.service.CLDBService [main-EventThread]: Process path: /datacenter/controlnodes/cldb/active/CLDBRunningMaster. Event state: SyncConnected. Event type: NodeCreated
    2013-09-04 18:46:03,491 INFO  com.mapr.warden.WardenManager [main-EventThread]: Process path: /datacenter/controlnodes/cldb/active/CLDBRunningMaster. Event state: SyncConnected. Event type: NodeCreated
    2013-09-04 18:46:03,493 INFO  com.mapr.warden.service.baseservice.Service [main-EventThread]: Process delayed alarms: 3
    2013-09-04 18:46:03,499 INFO  com.mapr.warden.service.baseservice.Service [main-EventThread]: Process path: /services/cldb/master. Event state: SyncConnected. Event type: NodeDataChanged
    2013-09-04 18:46:03,499 INFO  com.mapr.warden.service.baseservice.Service [main-EventThread]: ZK Connect state:State:CONNECTED Timeout:30000 sessionid:0x40e9bf6c300009 local:/xxx.xxx.xxx.xxx:36512 remoteserver:node01/xxx.xxx.xxx.xxx:5181 lastZxid:4294967546 xid:32 sent:34 recv:42 queuedpkts:0 pendingresp:0 queuedevents:0
    2013-09-04 18:46:05,020 INFO  com.mapr.job.mngmnt.hadoop.metrics.WardenRequestBuilder [hoststats_monitor]: [e_SERV_RUN, hostName, ma_host, ma_process]
    2013-09-04 18:46:05,021 INFO  com.mapr.job.mngmnt.hadoop.metrics.WardenRequestBuilder [hoststats_monitor]: []
    2013-09-04 18:46:05,034 ERROR com.mapr.warden.service.baseservice.Service$ServiceMonitorRun run [hoststats_monitor]: Monitor command: [/etc/init.d/mapr-hoststats, status]can not determine if service: hoststats is running. Retrying. Retrial #2. Total retries count is: 3
    2013-09-04 18:46:05,034 ERROR com.mapr.warden.service.baseservice.Service$ServiceMonitorRun run [hoststats_monitor]:  * hoststats is not running
    
    2013-09-04 18:46:05,034 INFO  com.mapr.warden.service.baseservice.Service$ServiceRun [hoststats_monitor]: Command: [/etc/init.d/mapr-hoststats, start], Directory: /etc/init.d/
    2013-09-04 18:46:05,065 INFO  com.mapr.warden.service.baseservice.Service$ServiceRun [hoststats_monitor]:
    2013-09-04 18:46:15,065 INFO  com.mapr.job.mngmnt.hadoop.metrics.WardenRequestBuilder [hoststats_monitor]: [e_SERV_RUN, hostName, ma_host, ma_process]
    2013-09-04 18:46:15,065 INFO  com.mapr.job.mngmnt.hadoop.metrics.WardenRequestBuilder [hoststats_monitor]: []
    2013-09-04 18:46:15,083 ERROR com.mapr.warden.service.baseservice.Service$ServiceMonitorRun run [hoststats_monitor]: Monitor command: [/etc/init.d/mapr-hoststats, status]cannot determine if service: hoststats is running. Number of retrials exceeded. Closing Zookeeper
    2013-09-04 18:46:15,083 INFO  com.mapr.warden.service.baseservice.Service [hoststats_monitor]: 46 about to close zk for service: hoststats
    2013-09-04 18:46:15,090 INFO  com.mapr.warden.service.baseservice.Service [main-EventThread]: Process path: /services/hoststats/ns308207.ovh.net. Event state: SyncConnected. Event type: NodeDeleted
    2013-09-04 18:46:15,090 INFO  com.mapr.warden.service.baseservice.Service [main-EventThread]: ZK Connect state:State:CONNECTED Timeout:30000 sessionid:0x140e9bf6c0f000b local:/xxx.xxx.xxx.xxx:35846 remoteserver:node02/xxx.xxx.xxx.xxx:5181 lastZxid:4294967541 xid:30 sent:34 recv:41 queuedpkts:0 pendingresp:1 queuedevents:1
    2013-09-04 18:46:15,091 ERROR com.mapr.warden.service.baseservice.common.ZKUtilsLocking checkZKNodeForExistence [main-EventThread]: SessionExpiredException while trying to checkZKNodeForExistence of: /services/kvstore/ns308207.ovh.net
    2013-09-04 18:46:15,091 ERROR com.mapr.warden.service.baseservice.DependentService checkifDependentServiceChanged [main-EventThread]: ZK Session was either or closed or expired for service: hoststats
    2013-09-04 18:46:15,091 ERROR com.mapr.warden.service.baseservice.common.ZKUtilsLocking checkZKNodeForExistence [main-EventThread]: SessionExpiredException while trying to checkZKNodeForExistence of: /services/hoststats/ns308207.ovh.net
    2013-09-04 18:46:15,091 ERROR com.mapr.warden.service.baseservice.Service process [main-EventThread]: ZK Session was either or closed or expired for service: hoststats
    2013-09-04 18:46:15,091 INFO  com.mapr.warden.service.baseservice.Service [main-EventThread]: Process path: /services/hoststats/master. Event state: SyncConnected. Event type: NodeDeleted
    2013-09-04 18:46:15,092 INFO  com.mapr.warden.service.baseservice.Service [main-EventThread]: ZK Connect state:State:CLOSED sessionid:0x140e9bf6c0f000b local:0.0.0.0/0.0.0.0:35846 remoteserver:node02/xxx.xxx.xxx.xxx:5181 lastZxid:4294967550 xid:32 sent:34 recv:42 queuedpkts:0 pendingresp:0 queuedevents:1
    2013-09-04 18:46:15,092 INFO  com.mapr.warden.service.baseservice.Service [main-EventThread]: ZK is closed for service: hoststats
    2013-09-04 18:46:15,115 INFO  com.mapr.job.mngmnt.hadoop.metrics.WardenRequestBuilder [hoststats_monitor]: [e_SERV_FAIL, hostName, ma_host, ma_process]
    2013-09-04 18:46:15,115 INFO  com.mapr.job.mngmnt.hadoop.metrics.WardenRequestBuilder [hoststats_monitor]: []
    2013-09-04 18:46:15,115 INFO  com.mapr.warden.service.baseservice.Service [hoststats_monitor]: Alarm raising command: [/opt/mapr/bin/maprcli, alarm, raise, -alarm, NODE_ALARM_SERVICE_HOSTSTATS_DOWN, -entity, ns308207.ovh.net, -description, Can not determine if service: hoststats is running. Check logs at: /opt/mapr/logs/hoststats.log]
    2013-09-04 18:46:45,101 INFO  com.mapr.warden.service.baseservice.Service [Thread-12-EventThread]: Process path: null. Event state: SyncConnected. Event type: None
    2013-09-04 18:46:45,101 INFO  com.mapr.warden.service.baseservice.Service [Thread-12]: Connected to ZK: xxx.xxx.xxx.xxx:5181,xxx.xxx.xxx.xxx:5181,xxx.xxx.xxx.xxx:5181With State: State:CONNECTED Timeout:30000 sessionid:0x240e9d9f6450003 local:/xxx.xxx.xxx.xxx:51769 remoteserver:node03/xxx.xxx.xxx.xxx:5181 lastZxid:0 xid:1 sent:1 recv:1 queuedpkts:0 pendingresp:0 queuedevents:0
    2013-09-04 18:46:45,101 INFO  com.mapr.warden.service.baseservice.Service [Thread-12-EventThread]: ZK Connect state:State:CONNECTED Timeout:30000 sessionid:0x240e9d9f6450003 local:/xxx.xxx.xxx.xxx:51769 remoteserver:node03/xxx.xxx.xxx.xxx:5181 lastZxid:0 xid:1 sent:1 recv:1 queuedpkts:0 pendingresp:0 queuedevents:0
    2013-09-04 18:46:45,103 INFO  com.mapr.warden.service.baseservice.Service [Thread-12]: Node: /nodes/ns308207.ovh.net/services/hoststats does not exist yet
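With a log this verbose, it can help to pull out only the ERROR lines written by one monitor thread. This is a minimal sketch: `warden_errors` is a hypothetical helper, and the path in the usage line assumes the extract above came from /opt/mapr/logs/warden.log, which the post doesn't actually state.

```shell
#!/bin/sh
# Hypothetical helper: print the ERROR lines that one warden thread
# (for example "hoststats_monitor") wrote to a warden-style log file.
# Usage: warden_errors LOGFILE THREAD
warden_errors() {
  logfile="$1"
  thread="$2"
  # First keep lines whose level column is ERROR, then keep only those
  # tagged with the thread name in square brackets (fixed-string match).
  grep -F " ERROR " "$logfile" | grep -F "[$thread]"
}
```

For example, `warden_errors /opt/mapr/logs/warden.log hoststats_monitor` would print just the hoststats retry and failure messages, such as the "can not determine if service: hoststats is running" lines above.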

Looking at the logs, the problem seems to be with hoststats, so let's take a look at the hoststats log:

    root@node01:~# tail -f /opt/mapr/logs/hoststats.log
    
    **** starting hoststats **** args: 5660 /opt/mapr/logs/TaskTracker.stats -S 1
    isGatherStats=true
    Error: Cannot open /proc/net/dev : No such file or directory
    Setting continuous mode
    2013-09-04 18:45:55,0265 Program: hoststats on Host:  IP: 0.0.0.0, Port: 1111, PID: 5733
    **** starting hoststats **** args: 5660 /opt/mapr/logs/TaskTracker.stats -S 1
    isGatherStats=true
    Error: Cannot open /proc/net/dev : No such file or directory
    Setting continuous mode
    2013-09-04 18:46:05,0709 Program: hoststats on Host:  IP: 0.0.0.0, Port: 1111, PID: 6294

I suspect a permissions problem. I created a "mapr" user as described in the tutorial, and when I try to cat the file as mapr, I get nothing at all:

    root@node01:~# cat /proc/net/dev
    Inter-|   Receive                                                |  Transmit
     face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    dummy0:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
     bond0:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
      eth0: 880504758  616319    0   18    0     0          0         0  8399989  102548    0    0    0     0       0          0
    ip6tnl0:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
        lo: 4880020   49941    0    0    0     0          0         0  4880020   49941    0    0    0     0       0          0
      sit0:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
     tunl0:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
    
    root@node01:~# su mapr
    mapr@node01:~# cat /proc/net/dev
    cat: /proc/net/dev: No such file or directory
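To tell a missing file apart from one that exists but can't be read, a small check can be run once as root and once as mapr. This is a sketch; `check_readable` is a hypothetical helper, not part of MapR.

```shell
#!/bin/sh
# Hypothetical helper: report whether PATH can be read by the current user,
# distinguishing "missing" from "present but not readable".
check_readable() {
  path="$1"
  if [ -r "$path" ]; then
    echo "readable: $path"
  elif [ -e "$path" ] || [ -L "$path" ]; then
    # -L also catches a symlink whose target is not visible.
    echo "present but not readable: $path"
  else
    echo "missing: $path"
  fi
}
```

Run `check_readable /proc/net/dev` as root, then the same check as mapr (for example with `su mapr -s /bin/sh -c '...'`, defining the function inside the quoted command). Since Linux 2.6.25, /proc/net is a symbolic link to /proc/self/net, so what it shows depends on the process reading it; `ls -ld /proc/net` shows the link. A "No such file or directory" error, rather than "Permission denied", points away from ordinary file modes. One hypothesis worth checking, not confirmed here, is a hardened kernel (for example one with grsecurity's /proc restrictions enabled) that hides network information from non-root users; `uname -r` shows which kernel is running.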

I also added the mapr user to /etc/sudoers with passwordless sudo, to make sure that wasn't the issue.

Any clues?
