
TaskTracker, JobTracker and CLDB not working

Question asked by satyajit on Sep 4, 2013
Latest reply on Sep 14, 2013 by nabeel
Hi,

I have installed MapR on a single node with all the services, but the warden service is failing to bring up the JobTracker and TaskTracker services.

Even the CLDB is not running. When I start it manually using "service mapr-cldb start", it sometimes works, but most of the time it doesn't.

Please find below the log data in cldb.log after "service mapr-cldb start" is issued.

<pre>
2013-09-05 08:45:30,791 INFO ClientCnxn [main-SendThread(mapr:5181)]: Opening socket connection to server mapr/192.168.2.3:5181
2013-09-05 08:45:32,516 WARN ClientCnxn [main-SendThread(mapr:5181)]: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:708)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1143)
2013-09-05 08:45:33,860 INFO ClientCnxn [main-SendThread(mapr:5181)]: Opening socket connection to server mapr/192.168.2.3:5181
2013-09-05 08:45:36,859 WARN ClientCnxn [main-SendThread(mapr:5181)]: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:708)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1143)
2013-09-05 08:45:38,693 INFO ClientCnxn [main-SendThread(mapr:5181)]: Opening socket connection to server mapr/192.168.2.3:5181
2013-09-05 08:45:39,860 WARN ClientCnxn [main-SendThread(mapr:5181)]: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:708)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1143)
2013-09-05 08:45:41,108 INFO ClientCnxn [main-SendThread(mapr:5181)]: Opening socket connection to server mapr/192.168.2.3:5181
2013-09-05 08:45:44,108 WARN ClientCnxn [main-SendThread(mapr:5181)]: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:708)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1143)
2013-09-05 08:45:45,931 INFO ClientCnxn [main-SendThread(mapr:5181)]: Opening socket connection to server mapr/192.168.2.3:5181
2013-09-05 08:45:47,108 WARN ClientCnxn [main-SendThread(mapr:5181)]: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:708)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1143)

</pre>

After that, however, I do see the CLDB process up in jps.

warden.log data:
<pre>
2013-09-05 08:42:45,984 ERROR com.mapr.warden.service.baseservice.zksessionmgmnt.ZookeeperClientSessionManagement connect [main]: Could not connect to ZK within: 30000 ms. Check if ZK connection defined correctly: mapr:5181. No data from ZK will be returned.
2013-09-05 08:43:15,985 ERROR com.mapr.warden.service.baseservice.zksessionmgmnt.ZookeeperClientSessionManagement connect [main]: Could not connect to ZK within: 30000 ms. Check if ZK connection defined correctly: mapr:5181. No data from ZK will be returned.
2013-09-05 08:43:45,987 ERROR com.mapr.warden.service.baseservice.zksessionmgmnt.ZookeeperClientSessionManagement connect [main]: Could not connect to ZK within: 30000 ms. Check if ZK connection defined correctly: mapr:5181. No data from ZK will be returned.
2013-09-05 08:44:15,988 ERROR com.mapr.warden.service.baseservice.zksessionmgmnt.ZookeeperClientSessionManagement connect [main]: Could not connect to ZK within: 30000 ms. Check if ZK connection defined correctly: mapr:5181. No data from ZK will be returned.
2013-09-05 08:44:45,988 ERROR com.mapr.warden.service.baseservice.zksessionmgmnt.ZookeeperClientSessionManagement connect [main]: Could not connect to ZK within: 30000 ms. Check if ZK connection defined correctly: mapr:5181. No data from ZK will be returned.
2013-09-05 08:44:51,315 INFO  com.mapr.warden.centralconfig.PullCentralConfigTaskScheduler [PullCentralConfigTask]: /opt/mapr/server/pullcentralconfig process terminated with status: 0
2013-09-05 08:45:15,989 ERROR com.mapr.warden.service.baseservice.zksessionmgmnt.ZookeeperClientSessionManagement connect [main]: Could not connect to ZK within: 30000 ms. Check if ZK connection defined correctly: mapr:5181. No data from ZK will be returned.
2013-09-05 08:45:45,990 ERROR com.mapr.warden.service.baseservice.zksessionmgmnt.ZookeeperClientSessionManagement connect [main]: Could not connect to ZK within: 30000 ms. Check if ZK connection defined correctly: mapr:5181. No data from ZK will be returned.
2013-09-05 08:46:15,991 ERROR com.mapr.warden.service.baseservice.zksessionmgmnt.ZookeeperClientSessionManagement connect [main]: Could not connect to ZK within: 30000 ms. Check if ZK connection defined correctly: mapr:5181. No data from ZK will be returned.
2013-09-05 08:46:45,992 ERROR com.mapr.warden.service.baseservice.zksessionmgmnt.ZookeeperClientSessionManagement connect [main]: Could not connect to ZK within: 30000 ms. Check if ZK connection defined correctly: mapr:5181. No data from ZK will be returned.
2013-09-05 08:47:15,762 INFO  com.mapr.warden.centralconfig.PullCentralConfigTaskScheduler [PullCentralConfigTask]: Launching a separate process to execute /opt/mapr/server/pullcentralconfig
2013-09-05 08:47:15,993 ERROR com.mapr.warden.service.baseservice.zksessionmgmnt.ZookeeperClientSessionManagement connect [main]: Could not connect to ZK within: 30000 ms. Check if ZK connection defined correctly: mapr:5181. No data from ZK will be returned.

mfs.log data:
2013-09-04 23:21:06,4604 INFO  fileserver.cc:7777 x.x.0.0:0 recieved updated no-compress list from cldb: bz2,gz,tgz,tbz2,zip,z,Z,mp3,jpg,jpeg,mpg,mpeg,avi,gif,png,lzo,j
2013-09-04 23:21:06,5615 ERROR  fileserver.cc:6642 x.x.0.0:0 heartbeat thread not scheduled for 72371 msec
2013-09-04 23:23:53,4511 INFO  fileserver.cc:7168 x.x.0.0:0 Sending full container report to cldb.
2013-09-04 23:23:53,4512 INFO  fs/server/container/containerreport.h:77 x.x.0.0:0 ID : 1
2013-09-04 23:23:53,4721 INFO  fileserver.cc:7278 x.x.0.0:0 Sending vol list with 1 volumes.
2013-09-04 23:26:58,0445 INFO  fileserver.cc:7168 x.x.0.0:0 Sending full container report to cldb.
2013-09-04 23:26:58,0445 INFO  fs/server/container/containerreport.h:77 x.x.0.0:0 ID : 1
2013-09-04 23:26:58,0458 INFO  fileserver.cc:7278 x.x.0.0:0 Sending vol list with 1 volumes.
2013-09-04 23:28:05,8219 ERROR  cldbha.cc:929 x.x.0.0:0 Failed to reach CLDB node due to error Connection reset by peer (104) for operation 2345.33 at 192.168.2.3:7222. Will retry after finding CLDB master.
2013-09-04 23:28:05,8221 ERROR  cldbha.cc:675 x.x.0.0:0 Got error Connection reset by peer (104) while trying to register with CLDB 192.168.2.3:7222
</pre>
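
Both cldb.log and warden.log point at the ZooKeeper endpoint mapr:5181, and mfs.log points at the CLDB endpoint 192.168.2.3:7222. One way to narrow this down, independent of the MapR services themselves, would be to probe those two TCP endpoints directly from the node. The following is only a minimal sketch: the `probe` helper and the `ENDPOINTS` list are hypothetical names introduced here, and the host/port pairs are simply copied from the logs above. It does not explain why the route is missing; it only shows whether a plain TCP connect succeeds and which error the OS reports.

```python
import socket


def probe(host, port, timeout=3.0):
    """Try a plain TCP connect to host:port.

    Returns (True, "connected") on success, otherwise
    (False, "<ExceptionName>: <message>") describing the OS-level failure
    (name resolution failure, no route to host, connection refused, timeout).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except OSError as exc:
        return False, f"{type(exc).__name__}: {exc}"


# Endpoints taken from the logs above (assumption: they apply to this node).
ENDPOINTS = [
    ("mapr", 5181),         # ZooKeeper, as seen in cldb.log and warden.log
    ("192.168.2.3", 7222),  # CLDB, as seen in mfs.log
]

if __name__ == "__main__":
    for host, port in ENDPOINTS:
        ok, detail = probe(host, port)
        status = "OK  " if ok else "FAIL"
        print(f"{status} {host}:{port} -> {detail}")
```

If the probe of mapr:5181 fails with a "No route to host" style error, that would match the NoRouteToHostException in cldb.log and suggest looking at name resolution for "mapr", the address 192.168.2.3, or a local firewall rule rather than at the CLDB itself.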