
ZooKeeper fault tolerance does not work

Question asked by tienthanhakay on Aug 8, 2013
Latest reply on Aug 12, 2013 by tienthanhakay
Hi all!

I'm using the MapR M5 trial.

I want to test the fault tolerance of ZooKeeper.

As I understand it, if a cluster has 2n+1 ZooKeeper nodes, it can tolerate n failed nodes.
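To make the expectation concrete, here is a minimal sketch of the majority arithmetic behind the 2n+1 rule. The function names are made up for illustration and are not part of ZooKeeper or MapR.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be down while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


# Show the quorum size and failure budget for a few ensemble sizes.
for n in (1, 3, 4, 5):
    print(f"ensemble={n} quorum={quorum_size(n)} tolerated={tolerated_failures(n)}")
```

By this arithmetic, a 3-node ensemble should keep serving with one node stopped, which is what the test in this post expects.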

My cluster has 3 ZooKeeper nodes (87.mapr, 88.mapr, 89.mapr), but when I stop ZooKeeper on one node (87.mapr), the whole cluster goes down.

CLDB cannot connect to ZooKeeper.

This is the CLDB log on 87.mapr (after stopping ZooKeeper on 87.mapr):

`2013-08-09 18:11:56,051 WARN ClientCnxn [main-SendThread(87.mapr:5181)]: Session 0x4062b853df000c for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:692)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1143)
2013-08-09 18:11:56,281 INFO ClientCnxn [RPC-thread-8-SendThread(87.mapr:5181)]: Opening socket connection to server 88.mapr/10.0.1.88:5181
2013-08-09 18:11:56,282 INFO ClientCnxn [RPC-thread-8-SendThread(88.mapr:5181)]: Socket connection established to 88.mapr/10.0.1.88:5181, initiating session
2013-08-09 18:11:56,283 INFO ClientCnxn [RPC-thread-8-SendThread(88.mapr:5181)]: Unable to read additional data from server sessionid 0x4062b853df000d, likely server has closed socket, closing socket connection and attempting reconnect
2013-08-09 18:11:56,694 INFO ClientCnxn [RPC-thread-8-SendThread(88.mapr:5181)]: Opening socket connection to server 89.mapr/10.0.1.89:5181
2013-08-09 18:11:56,695 INFO ClientCnxn [RPC-thread-8-SendThread(89.mapr:5181)]: Socket connection established to 89.mapr/10.0.1.89:5181, initiating session
2013-08-09 18:11:56,696 INFO ClientCnxn [RPC-thread-8-SendThread(89.mapr:5181)]: Unable to read additional data from server sessionid 0x4062b853df000d, likely server has closed socket, closing socket connection and attempting reconnect
2013-08-09 18:11:56,828 INFO ClientCnxn [main-SendThread(87.mapr:5181)]: Opening socket connection to server 89.mapr/10.0.1.89:5181
2013-08-09 18:11:56,828 INFO ClientCnxn [main-SendThread(89.mapr:5181)]: Socket connection established to 89.mapr/10.0.1.89:5181, initiating session
2013-08-09 18:11:56,829 INFO ClientCnxn [main-SendThread(89.mapr:5181)]: Unable to read additional data from server sessionid 0x4062b853df000c, likely server has closed socket, closing socket connection and attempting reconnect
2013-08-09 18:11:57,425 INFO ClientCnxn [RPC-thread-8-SendThread(89.mapr:5181)]: Opening socket connection to server 87.mapr/10.0.1.87:5181
2013-08-09 18:11:57,425 WARN ClientCnxn [RPC-thread-8-SendThread(87.mapr:5181)]: Session 0x4062b853df000d for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:692)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1143)`

And this is the ZooKeeper log on 88.mapr (after stopping ZooKeeper on 87.mapr):

`2013-08-09 18:13:52,434 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:5181:NIOServerCnxn$Factory@251] - Accepted socket connection from /10.0.1.90:40785
2013-08-09 18:13:52,434 - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:5181:NIOServerCnxn@639] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2013-08-09 18:13:52,434 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:5181:NIOServerCnxn@1435] - Closed socket connection for client /10.0.1.90:40785 (no session established for client)
2013-08-09 18:13:52,526 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:5181:NIOServerCnxn$Factory@251] - Accepted socket connection from /10.0.1.89:55510
2013-08-09 18:13:52,527 - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:5181:NIOServerCnxn@639] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
2013-08-09 18:13:52,527 - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:5181:NIOServerCnxn@1435] - Closed socket connection for client /10.0.1.89:55510 (no session established for client)
`
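One way to see what state each remaining server is in is to send it the `stat` four-letter command on the client port. The sketch below assumes the nodes accept four-letter commands on port 5181 (the `clientPort` from this post). The helper names are hypothetical, and a server that is not serving requests may return no `Mode:` line at all.

```python
import socket


def four_letter_word(host, port=5181, cmd="stat", timeout=3.0):
    """Send a ZooKeeper four-letter command and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mode(stat_output):
    """Return the value of the 'Mode:' line (e.g. leader, follower), or None."""
    for line in stat_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


# Example usage (requires network access to the nodes, so it is left commented out):
# for host in ("88.mapr", "89.mapr"):
#     print(host, parse_mode(four_letter_word(host)))
```

A result of `None` for both surviving nodes would be consistent with the "ZooKeeperServer not running" messages above, meaning the processes are up but no quorum has formed.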

Please point out my mistake or share any helpful information.

Thanks a lot!

Update:

This is the content of the /opt/mapr/zookeeper/zookeeper-3.3.6/conf/zoo.cfg file, which is identical on all ZooKeeper nodes:

   # The number of milliseconds of each tick
tickTime=2000

   # The number of ticks that the initial
   # synchronization phase can take
initLimit=20

   # The number of ticks that can pass between
   # sending a request and getting an acknowledgement
syncLimit=10

   # the directory where the snapshot is stored.
dataDir=/opt/mapr/zkdata

   # the port at which the clients will connect
clientPort=5181

   # max number of client connections
maxClientCnxns=100

server.0=87.mapr:2888:3888
server.1=88.mapr:2888:3888
server.2=89.mapr:2888:3888
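As a cross-check of the configuration above, the following sketch extracts the `server.N` entries and derives the quorum size from them. The `parse_servers` helper is hypothetical. In stock ZooKeeper each server reads its own id from a `myid` file under `dataDir`, so the ids listed here are expected to match those files; whether that is relevant to this problem cannot be told from the post.

```python
import re

# The server entries from the zoo.cfg in this post, plus two settings for context.
ZOO_CFG = """\
# the directory where the snapshot is stored.
dataDir=/opt/mapr/zkdata
clientPort=5181
server.0=87.mapr:2888:3888
server.1=88.mapr:2888:3888
server.2=89.mapr:2888:3888
"""

SERVER_LINE = re.compile(r"server\.(\d+)=([^:]+):(\d+):(\d+)$")


def parse_servers(cfg_text):
    """Map server id -> (host, quorum port, election port)."""
    servers = {}
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = SERVER_LINE.match(line)
        if match:
            sid, host, quorum_port, election_port = match.groups()
            servers[int(sid)] = (host, int(quorum_port), int(election_port))
    return servers


servers = parse_servers(ZOO_CFG)
print("ensemble members:", servers)
print("quorum size:", len(servers) // 2 + 1)
```

The second port in each entry (3888) is the election address and the first (2888) is the port followers use to reach the leader, which matches the ports seen in the ZooKeeper logs in this post.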

Update:

Here are more ZooKeeper logs:

On the 88.mapr ZooKeeper node (log when I restart ZooKeeper on 87.mapr):

`2013-08-10 09:20:15,894 - WARN  [QuorumPeer:/0:0:0:0:0:0:0:0:5181:Learner@228] - Unexpected exception, tries=0, connecting to 87.mapr/10.0.1.87:2888
java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391)
        at java.net.Socket.connect(Socket.java:579)
        at org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.java:220)
        at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:65)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:645)
2013-08-10 09:20:15,924 - INFO  [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 25769805178 (n.zxid), 7 (n.round), LOOKING (n.state), 0 (n.sid), FOLLOWING (my state)
`

On the 89.mapr ZooKeeper node (log when I restart ZooKeeper on 87.mapr):

`
2013-08-10 09:19:35,505 - WARN  [WorkerSender Thread:QuorumCnxManager@384] - Cannot open channel to 0 at election address 87.mapr/10.0.1.87:3888
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:692)
        at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:115)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:371)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:340)
        at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:360)
        at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:333)
        at java.lang.Thread.run(Thread.java:722)
2013-08-10 09:19:35,506 - INFO  [WorkerReceiver Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 25769805178 (n.zxid), 7 (n.round), LOOKING (n.state), 2 (n.sid), LOOKING (my state)
`
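The `Notification:` lines in these logs are easier to compare once split into fields. The sketch below is based only on the line format shown in this post; the helper name and dictionary keys are made up for illustration. It also splits the zxid into its epoch (high 32 bits) and counter (low 32 bits), which is how ZooKeeper encodes transaction ids.

```python
import re

NOTIFICATION = re.compile(
    r"Notification: (?P<leader>\d+) \(n\.leader\), (?P<zxid>\d+) \(n\.zxid\), "
    r"(?P<round>\d+) \(n\.round\), (?P<state>\w+) \(n\.state\), "
    r"(?P<sid>\d+) \(n\.sid\), (?P<my_state>\w+) \(my state\)"
)


def parse_notification(line):
    """Split a FastLeaderElection notification log line into named fields."""
    match = NOTIFICATION.search(line)
    if match is None:
        return None
    zxid = int(match.group("zxid"))
    return {
        "proposed_leader": int(match.group("leader")),
        "zxid_epoch": zxid >> 32,            # high 32 bits
        "zxid_counter": zxid & 0xFFFFFFFF,   # low 32 bits
        "round": int(match.group("round")),
        "sender_state": match.group("state"),
        "sender_sid": int(match.group("sid")),
        "my_state": match.group("my_state"),
    }


# The notification logged on 88.mapr, copied from the post.
line = ("2013-08-10 09:20:15,924 - INFO  [WorkerReceiver Thread:FastLeaderElection@496] - "
        "Notification: 2 (n.leader), 25769805178 (n.zxid), 7 (n.round), "
        "LOOKING (n.state), 0 (n.sid), FOLLOWING (my state)")
print(parse_notification(line))
```

For the 88.mapr line this yields a vote for server 2 sent by server 0 (87.mapr) while it is LOOKING, with zxid epoch 6 and counter 1402, received by a node that considers itself FOLLOWING.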
