
ZooKeeper dies after running service mapr-warden start on a CLDB node

Question asked by suxingfate on Jan 20, 2015
Latest reply on Jan 21, 2015 by suxingfate
I'm using MapR 3.1.

1. I start ZooKeeper on all 3 of my nodes.
2. service mapr-zookeeper qstatus shows that the follower and leader roles are assigned correctly.
3. When I run service mapr-warden start to start Warden, ZooKeeper dies and writes the log below.
4. If I then run service mapr-zookeeper qstatus to query the status, it hangs and gives no response.

Could you help me? Thanks.
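
Because the qstatus call in step 4 hangs, a probe with an explicit timeout may help tell a dead listener apart from a slow one. The ZooKeeper log shows a client connecting to localhost 5181 and the server processing a srvr four-letter command, so the minimal Python sketch below sends that same command over a plain socket. The function names `four_letter_word` and `parse_mode` and the 5-second timeout are assumptions made for illustration; they are not part of MapR or ZooKeeper.

```python
import socket


def four_letter_word(host, port, command="srvr", timeout=5.0):
    """Send a ZooKeeper four-letter command and return the raw text reply.

    Raises OSError (including socket.timeout) instead of blocking forever
    when the server does not answer within `timeout` seconds.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.settimeout(timeout)
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mode(reply):
    """Return the value of the 'Mode:' line (e.g. 'follower'), or None."""
    for line in reply.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None
```

Calling `parse_mode(four_letter_word("localhost", 5181))` on each node should return the role that node reports, and an `OSError` would suggest that the listener on that node is no longer answering.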

**zookeeper.log**

2015-01-20 18:53:17,299 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:5181:Environment@100] - Server environment:java.class.path=/opt/mapr/zookeeper/zookeeper-3.4.5/bin/../build/classes:/opt/mapr/zookeeper/zookeeper-3.4.5/bin/../build/lib/*.jar:/opt/mapr/zookeeper/zookeeper-3.4.5/bin/../lib/slf4j-log4j12-1.6.1.jar:/opt/mapr/zookeeper/zookeeper-3.4.5/bin/../lib/slf4j-api-1.6.1.jar:/opt/mapr/zookeeper/zookeeper-3.4.5/bin/../lib/netty-3.2.2.Final.jar:/opt/mapr/zookeeper/zookeeper-3.4.5/bin/../lib/log4j-1.2.15.jar:/opt/mapr/zookeeper/zookeeper-3.4.5/bin/../lib/jline-0.9.94.jar:/opt/mapr/zookeeper/zookeeper-3.4.5/bin/../zookeeper-3.4.5-mapr-1406.jar:/opt/mapr/zookeeper/zookeeper-3.4.5/bin/../src/java/lib/*.jar:/opt/mapr/zookeeper/zookeeper-3.4.5/conf::/opt/mapr/lib/maprfs-1.0.3-mapr-3.1.1.jar::::/opt/mapr/lib/json-20080701.jar:/opt/mapr/lib/flexjson-2.1.jar:
2015-01-20 18:53:17,299 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:5181:Environment@100] - Server environment:java.library.path=/opt/mapr/lib
2015-01-20 18:53:17,299 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:5181:Environment@100] - Server environment:java.io.tmpdir=/tmp
2015-01-20 18:53:17,300 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:5181:Environment@100] - Server environment:java.compiler=
2015-01-20 18:53:17,300 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:5181:Environment@100] - Server environment:os.name=Linux
2015-01-20 18:53:17,300 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:5181:Environment@100] - Server environment:os.arch=amd64
2015-01-20 18:53:17,301 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:5181:Environment@100] - Server environment:os.version=3.0.80-0.5.1.5639.0.PTF-default
2015-01-20 18:53:17,301 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:5181:Environment@100] - Server environment:user.name=mapr
2015-01-20 18:53:17,301 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:5181:Environment@100] - Server environment:user.home=/home/mapr
2015-01-20 18:53:17,301 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:5181:Environment@100] - Server environment:user.dir=/root
2015-01-20 18:53:17,303 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:5181:ZooKeeperServer@162] - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir /opt/mapr/zkdata/version-2 snapdir /opt/mapr/zkdata/version-2
2015-01-20 18:53:17,303 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:5181:Follower@63] - FOLLOWING - LEADER ELECTION TOOK - 229
2015-01-20 18:53:17,332 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:5181:Learner@322] - Getting a diff from the leader 0xa0000004c
2015-01-20 18:53:17,336 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:5181:FileTxnSnapLog@240] - Snapshotting: 0xa0000004c to /opt/mapr/zkdata/version-2/snapshot.a0000004c
2015-01-20 18:53:17,338 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:5181:FileTxnSnapLog@240] - Snapshotting: 0xa0000004c to /opt/mapr/zkdata/version-2/snapshot.a0000004c
STARTED
2015-01-20 18:53:26,044 [myid:] - INFO  [main:FourLetterWordMain@43] - connecting to localhost 5181
2015-01-20 18:53:26,059 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:5181:NIOServerCnxnFactory@197] - Accepted socket connection from /127.0.0.1:42991
2015-01-20 18:53:26,126 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:5181:NIOServerCnxn@821] - Processing srvr command from /127.0.0.1:42991
2015-01-20 18:53:26,129 [myid:1] - INFO  [Thread-3:NIOServerCnxn@1001] - Closed socket connection for client /127.0.0.1:42991 (no session established for client)
2015-01-20 18:54:46,848 [myid:1] - WARN  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:5181:Follower@118] - Got zxid 0xb00000001 expected 0x1
2015-01-20 18:54:46,848 [myid:1] - INFO  [SyncThread:1:FileTxnLog@199] - Creating new log file: log.b00000001
2015-01-20 18:55:27,844 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:5181:NIOServerCnxnFactory@197] - Accepted socket connection from /169.254.200.2:51628
2015-01-20 18:55:27,848 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:5181:ZooKeeperServer@832] - Client attempting to renew session 0x24b06f99e9c0000 at /169.254.200.2:51628
2015-01-20 18:55:27,849 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:5181:Learner@107] - Revalidating client: 0x24b06f99e9c0000
2015-01-20 18:55:27,855 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:5181:ZooKeeperServer@595] - Established session 0x24b06f99e9c0000 with negotiated timeout 30000 for client /169.254.200.2:51628
2015-01-20 18:55:27,858 [myid:1] - ERROR [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:5181:NIOServerCnxnFactory$1@44] - Thread Thread[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:5181,5,main] died
java.lang.NoClassDefFoundError: org/apache/commons/codec/binary/Base64
        at com.mapr.security.simplesasl.SimpleSaslServer.evaluateResponse(SimpleSaslServer.java:33)
        at org.apache.zookeeper.server.ZooKeeperSaslServer.evaluateResponse(ZooKeeperSaslServer.java:149)
        at org.apache.zookeeper.server.ZooKeeperServer.processSasl(ZooKeeperServer.java:932)
        at org.apache.zookeeper.server.ZooKeeperServer.processPacket(ZooKeeperServer.java:905)
        at org.apache.zookeeper.server.NIOServerCnxn.readRequest(NIOServerCnxn.java:365)
        at org.apache.zookeeper.server.NIOServerCnxn.readPayload(NIOServerCnxn.java:202)
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:236)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.codec.binary.Base64
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        ... 9 more
2015-01-20 18:55:58,004 [myid:1] - INFO  [CommitProcessor:1:NIOServerCnxn@1001] - Closed socket connection for client /169.254.200.2:51628 which had sessionid 0x24b06f99e9c0000
2015-01-20 18:57:23,538 [myid:] - INFO  [main:FourLetterWordMain@43] - connecting to localhost 5181
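
The stack trace above ends in a NoClassDefFoundError for org/apache/commons/codec/binary/Base64, and the java.class.path line at the top of this log does not name a commons-codec jar explicitly, although one could still be picked up through a wildcard entry. One way to check is to scan every classpath entry for the class file. The Python sketch below does that; `jars_providing` is a hypothetical helper name, and its wildcard handling is more permissive than the JVM's, so treat a hit on a `*.jar` pattern as a hint rather than proof.

```python
import glob
import os
import zipfile


def jars_providing(class_name, classpath):
    """Return the classpath entries (jars or directories) containing class_name."""
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for element in classpath.split(os.pathsep):
        if not element:
            continue  # empty entries (the "::" runs in the logged classpath) are skipped in this sketch
        if element.endswith("*"):
            # Java-style wildcard "dir/*": every .jar file in that directory
            candidates = sorted(glob.glob(element + ".jar"))
        elif "*" in element:
            # Literal patterns such as "lib/*.jar" are globbed here as a convenience
            candidates = sorted(glob.glob(element))
        else:
            candidates = [element]
        for path in candidates:
            if os.path.isdir(path):
                # A directory entry holds unpacked .class files
                if os.path.isfile(os.path.join(path, *entry.split("/"))):
                    hits.append(path)
            elif os.path.isfile(path) and zipfile.is_zipfile(path):
                with zipfile.ZipFile(path) as jar:
                    if entry in jar.namelist():
                        hits.append(path)
    return hits
```

Passing the java.class.path value from the log together with `"org.apache.commons.codec.binary.Base64"` returns the entries that provide the class. An empty list would be consistent with the ClassNotFoundException shown above.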
                  

**warden.log**

no jobtracker to stop
Header: hostName: SC-2, Time Zone: China Standard Time, processName: warden, processId: 26146, MapR Build Version: 3.1.1.26113.GA
2015-01-20 18:54:46,655 INFO  com.mapr.warden.WardenMain [main]: Log dir: /opt/mapr/hadoop/hadoop-0.20.2/logs
2015-01-20 18:54:46,688 INFO  com.mapr.job.mngmnt.hadoop.metrics.MaprRPCContext [main]: init MAPRContext
2015-01-20 18:54:46,688 INFO  com.mapr.job.mngmnt.hadoop.metrics.MaprRPCContext [main]: init MAPRContextHS
2015-01-20 18:54:46,701 INFO  com.mapr.warden.WardenMain [main]: My pid: 26372
2015-01-20 18:54:46,859 INFO  com.mapr.warden.service.baseservice.zksessionmgmnt.ZookeeperClientSessionManagement [main]: Connected to ZK: 169.254.200.1:5181,169.254.200.2:5181,169.254.200.3:5181With State: State:CONNECTED Timeout:30000 sessionid:0x24b06f99e9c0000 local:/169.254.200.2:40816 remoteserver:PL-3/169.254.200.3:5181 lastZxid:0 xid:1 sent:1 recv:1 queuedpkts:0 pendingresp:0 queuedevents:0
2015-01-20 18:54:46,860 INFO  com.mapr.warden.service.baseservice.zksessionmgmnt.ZookeeperClientSessionManagement [main-EventThread]: ZK Connect state:State:CONNECTED Timeout:30000 sessionid:0x24b06f99e9c0000 local:/169.254.200.2:40816 remoteserver:PL-3/169.254.200.3:5181 lastZxid:0 xid:1 sent:1 recv:1 queuedpkts:0 pendingresp:0 queuedevents:0
2015-01-20 18:54:46,860 INFO  com.mapr.warden.service.baseservice.zksessionmgmnt.ZookeeperClientSessionManagement [main-EventThread]: Process path: null. Event state: SyncConnected. Event type: None
2015-01-20 18:54:46,867 INFO  com.mapr.warden.service.baseservice.zksessionmgmnt.ZookeeperClientSessionManagement [main-EventThread]: Process path: null. Event state: SaslAuthenticated. Event type: None
Warden started
Warden started
2015-01-20 18:55:06,959 INFO  com.mapr.warden.service.baseservice.zksessionmgmnt.ZookeeperClientSessionManagement [main-EventThread]: Process path: null. Event state: Disconnected. Event type: None
2015-01-20 18:55:06,966 ERROR com.mapr.warden.service.baseservice.common.ZKUtilsLocking checkZKNodeForExistence [main]: Lost connection to ZK while trying to checkZKNodeForExistence of: /nodes/SC-2/stop. Retrying...
2015-01-20 18:55:07,210 INFO  com.mapr.warden.service.baseservice.zksessionmgmnt.ZookeeperClientSessionManagement [main-EventThread]: ZK Connect state:State:CONNECTED Timeout:30000 sessionid:0x24b06f99e9c0000 local:/169.254.200.2:38102 remoteserver:SC-1/169.254.200.1:5181 lastZxid:0 xid:3 sent:2 recv:2 queuedpkts:0 pendingresp:0 queuedevents:1
2015-01-20 18:55:07,210 INFO  com.mapr.warden.service.baseservice.zksessionmgmnt.ZookeeperClientSessionManagement [main-EventThread]: Process path: null. Event state: SyncConnected. Event type: None
2015-01-20 18:55:07,210 INFO  com.mapr.warden.service.baseservice.zksessionmgmnt.ZookeeperClientSessionManagement [main-EventThread]: Process path: null. Event state: SaslAuthenticated. Event type: None
2015-01-20 18:55:27,319 ERROR com.mapr.warden.service.baseservice.common.ZKUtilsLocking checkZKNodeForExistence [main]: Lost connection to ZK while trying to checkZKNodeForExistence of: /nodes/SC-2/stop. Retrying...
2015-01-20 18:55:27,319 INFO  com.mapr.warden.service.baseservice.zksessionmgmnt.ZookeeperClientSessionManagement [main-EventThread]: Process path: null. Event state: Disconnected. Event type: None
2015-01-20 18:55:27,856 INFO  com.mapr.warden.service.baseservice.zksessionmgmnt.ZookeeperClientSessionManagement [main-EventThread]: ZK Connect state:State:CONNECTED Timeout:30000 sessionid:0x24b06f99e9c0000 local:/169.254.200.2:51628 remoteserver:SC-2/169.254.200.2:5181 lastZxid:0 xid:4 sent:3 recv:3 queuedpkts:1 pendingresp:0 queuedevents:0
2015-01-20 18:55:27,856 INFO  com.mapr.warden.service.baseservice.zksessionmgmnt.ZookeeperClientSessionManagement [main-EventThread]: Process path: null. Event state: SyncConnected. Event type: None
2015-01-20 18:55:27,856 INFO  com.mapr.warden.service.baseservice.zksessionmgmnt.ZookeeperClientSessionManagement [main-EventThread]: Process path: null. Event state: SaslAuthenticated. Event type: None
2015-01-20 18:55:47,955 ERROR com.mapr.warden.service.baseservice.common.ZKUtilsLocking checkZKNodeForExistence [main]: Lost connection to ZK while trying to checkZKNodeForExistence of: /nodes/SC-2/stop. Retrying...
2015-01-20 18:55:47,955 INFO  com.mapr.warden.service.baseservice.zksessionmgmnt.ZookeeperClientSessionManagement [main-EventThread]: Process path: null. Event state: Disconnected. Event type: None
2015-01-20 18:55:59,041 ERROR com.mapr.warden.service.baseservice.common.ZKUtilsLocking checkZKNodeForExistence [main]: Lost connection to ZK while trying to checkZKNodeForExistence of: /nodes/SC-2/stop. Retrying...
2015-01-20 18:56:09,490 ERROR com.mapr.warden.service.baseservice.common.ZKUtilsLocking checkZKNodeForExistence [main]: Lost connection to ZK while trying to checkZKNodeForExistence of: /nodes/SC-2/stop. Retrying...
2015-01-20 18:56:21,308 ERROR com.mapr.warden.service.baseservice.common.ZKUtilsLocking checkZKNodeForExistence [main]: Lost connection to ZK while trying to checkZKNodeForExistence of: /nodes/SC-2/stop. Retrying...
2015-01-20 18:56:32,098 ERROR com.mapr.warden.service.baseservice.common.ZKUtilsLocking checkZKNodeForExistence [main]: Lost connection to ZK while trying to checkZKNodeForExistence of: /nodes/SC-2/stop. Retrying...
2015-01-20 18:56:42,883 ERROR com.mapr.warden.service.baseservice.common.ZKUtilsLocking checkZKNodeForExistence [main]: Lost connection to ZK while trying to checkZKNodeForExistence of: /nodes/SC-2/stop. Retrying...
                                                                                                                            

