Fixing a ZooKeeper quorum issue

Document created by rsingh on Feb 13, 2016

Author: Rajkumar Singh

 

Original Publication Date: May 3, 2015

 

Env:

M3/M5

 

Symptom:

When a user checks the quorum status with the command below, ZooKeeper logs NoRouteToHostException warnings:

service mapr-zookeeper qstatus

 

Zookeeper logs:

WARN [WorkerSender Thread:QuorumCnxManager@384] - Cannot open channel to 2 at election address mr3/10.10.10.4:3888
java.net.NoRouteToHostException: No route to host
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:708)
    at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:115)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:371)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:340)
    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:360)
    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:333)
    at java.lang.Thread.run(Thread.java:724)
2013-07-29 12:14:01,262 - WARN [QuorumPeer:/0:0:0:0:0:0:0:0:5181:QuorumCnxManager@384] - Cannot open channel to 2 at election address mr3/10.10.10.4:3888
java.net.NoRouteToHostException: No route to host
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:708)
    at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:115)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:371)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:404)
    at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:688)
    at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:622)
2013-07-29 12:14:01,264 - WARN
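
When the log is long, it can help to list which peers the node failed to reach. The following is a minimal sketch, assuming the warning text has the same shape as in the log above; the function name unreachable_peers and the regex group names are illustrative and not part of ZooKeeper or MapR.

```python
import re

# Matches the "Cannot open channel" warning text seen in the log above.
# The group names (peer_id, host, ip, port) are illustrative choices.
PATTERN = re.compile(
    r"Cannot open channel to (?P<peer_id>\d+) at election address "
    r"(?P<host>[^/\s]*)/(?P<ip>[\d.]+):(?P<port>\d+)"
)


def unreachable_peers(log_text):
    """Return the set of (peer_id, host, ip, port) tuples found in log_text."""
    return {
        (int(m["peer_id"]), m["host"], m["ip"], int(m["port"]))
        for m in PATTERN.finditer(log_text)
    }


sample = (
    "WARN [WorkerSender Thread:QuorumCnxManager@384] - Cannot open channel to 2 "
    "at election address mr3/10.10.10.4:3888 "
    "java.net.NoRouteToHostException: No route to host"
)
print(unreachable_peers(sample))  # {(2, 'mr3', '10.10.10.4', 3888)}
```

Repeated warnings for the same peer collapse into one entry, so the result is a short list of nodes and ports to test.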

Root Cause:

During leader election, a ZooKeeper node cannot connect to another node in the ensemble, either because of a network problem or because of an intermittent issue with the ZooKeeper process on that node. To verify this, ping the destination ZooKeeper node, or use telnet to connect to it on the election port, 3888.

Solution:

Check the connectivity between the ZooKeeper nodes using telnet, from each node to every other node, on the election port reported in the log (3888 here):

telnet xxx.xxx.xxx.xxx 3888

 

Trying xxx.xxx.xxx.xxx...

Connected to xxx.xxx.xxx.xxx.

Escape character is '^]'.

Connection closed by foreign host.

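If telnet cannot connect, restore network connectivity between the ZooKeeper nodes (for example, by checking routing and firewall rules on the affected hosts), then check the quorum status again.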
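
With several ZooKeeper nodes, running telnet by hand against each one gets tedious. The check can be scripted; the sketch below is one possible approach, not a MapR tool. The function names port_is_reachable and check_quorum_ports are hypothetical, and the default port 3888 is taken from the election address in the log above. Run it from each ZooKeeper node against the other nodes.

```python
import socket


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "No route to host", "Connection refused", timeouts,
        # and name-resolution failures.
        return False


def check_quorum_ports(hosts, ports=(3888,)):
    """Map each (host, port) pair to True (reachable) or False (unreachable)."""
    return {
        (host, port): port_is_reachable(host, port)
        for host in hosts
        for port in ports
    }
```

For example, check_quorum_ports(["10.10.10.4"]) returns a dictionary with one entry for ("10.10.10.4", 3888). A False value means the same kind of connection failure telnet would report, so the network path to that node needs attention. Other ports, such as the client port 5181 that appears in the log, can be passed through the ports argument.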
