AnsweredAssumed Answered

org.apache.zookeeper.KeeperException$SessionExpiredException and CLDB shutdown

Question asked by akusanagi Employee on Aug 30, 2012
Latest reply on Aug 30, 2012 by nabeel
I have a 8-node cluster and tested CLDB failover. When the node 'mr2' is a CLDB master, I shutdown 'mr2', then 'mr1' was going to become a new CLDB master. However, before finishing the failover process, CLDB on 'mr1' went down due to org.apache.zookeeper.KeeperException$SessionExpiredException, and finally 'mr3' became a CLDB master.

What are possible reasons that prevented 'mr1' from becoming a CLDB master?

Is the problem related to this?

> http://www.mapr.com/doc/display/MapR/Version+1.2.9+Release+Notes
>
> (Issue 7218) Fixed CLDB rpc problems
> on multi-homed servers (caused CLDB
> shutdown on ZooKeeper restart)

Here is cldb.log on 'mr1'  ('mr2' shutdown happened at 15:12:41):

    2012-07-20 15:04:04,571 INFO  com.mapr.fs.cldb.CLDBServer [main-EventThread]: ZooKeeper event None on path null
    2012-07-20 15:04:04,571 INFO  com.mapr.fs.cldb.CLDBServer [main-EventThread]: onZKDisconnect: Received disconnect from ZooKeeper.
    2012-07-20 15:04:05,243 INFO  com.mapr.fs.cldb.CLDBServer [main-EventThread]: ZooKeeper event None on path null
    2012-07-20 15:04:05,243 INFO  com.mapr.fs.cldb.CLDBServer [main-EventThread]: onZKConnect: Received connect from ZooKeeper.
    2012-07-20 15:04:05,246 INFO  com.mapr.fs.cldb.CLDBServer [main-EventThread]: onZKConnect: Established connection to ZooKeeper
    2012-07-20 15:04:25,248 INFO  com.mapr.fs.cldb.zookeeper.ZooKeeperClient [Wait for ZooKeeper Connected thread]: ZooKeeperClient: CLDB has latest epoch. Checking cleanbit
    2012-07-20 15:04:25,248 INFO  com.mapr.fs.cldb.zookeeper.ZooKeeperClient [Wait for ZooKeeper Connected thread]: ZooKeeperClient: KvStore is clean and of latest epoch CLDB trying to become Master
    2012-07-20 15:04:25,251 INFO  com.mapr.fs.cldb.zookeeper.ZooKeeperClient [Wait for ZooKeeper Connected thread]: ZooKeeperClient createActiveEphemeralMasterZNode: /datacenter/controlnodes/cldb/active/CLDBMaster already exists
    2012-07-20 15:04:25,251 INFO  com.mapr.fs.cldb.zookeeper.ZooKeeperClient [Wait for ZooKeeper Connected thread]: ZooKeeperClient: Some other CLDB become master. Current CLDB is Slave
    2012-07-20 15:04:25,252 INFO  com.mapr.fs.cldb.zookeeper.ZooKeeperClient [Wait for ZooKeeper Connected thread]: CLDB got role of slave
    2012-07-20 15:09:13,114 ERROR com.mapr.fs.cldb.CLDBServer [pool-2-thread-3]: ContainerLookup: CID: 0 REQFROM: 10.10.10.1:9998 Container not found
    2012-07-20 15:13:23,423 INFO  com.mapr.fs.cldb.CLDBServer [main-EventThread]: ZooKeeper event NodeDeleted on path /datacenter/controlnodes/cldb/active/CLDBMaster
    2012-07-20 15:13:43,425 INFO  com.mapr.fs.cldb.zookeeper.ZooKeeperClient [Wait for ZooKeeper Connected thread]: ZooKeeperClient: CLDB has latest epoch. Checking cleanbit
    2012-07-20 15:13:43,425 INFO  com.mapr.fs.cldb.zookeeper.ZooKeeperClient [Wait for ZooKeeper Connected thread]: ZooKeeperClient: KvStore is clean and of latest epoch CLDB trying to become Master
    2012-07-20 15:13:43,429 INFO  com.mapr.fs.cldb.zookeeper.ZooKeeperClient [Wait for ZooKeeper Connected thread]: ZooKeeperClient: CLDB is current Master
    2012-07-20 15:13:43,429 INFO  com.mapr.fs.cldb.zookeeper.ZooKeeperClient [Wait for ZooKeeper Connected thread]: CLDB became master. Initializing KvStoreContainer for cid: 1
    2012-07-20 15:13:43,430 INFO  com.mapr.fs.cldb.zookeeper.ZooKeeperClient [Wait for ZooKeeper Connected thread]: Storing KvStoreContainerInfo to ZooKeeper  Container ID:1 Servers:  10.10.10.1:5660--23-VALID(6831581122999395286) Inactive Servers:  10.10.10.2:5660--23-VALID(7388837246089933381) 10.10.10.3:5660--23-VALID(5789078640259552394) Unused Servers:  Latest epoch:23 SizeMB:0
    2012-07-20 15:13:43,432 INFO  com.mapr.fs.cldb.CLDBServer [Wait for ZooKeeper Connected thread]: Starting thread to monitor waiting for local kvstore to become master
    2012-07-20 15:17:45,079 INFO  com.mapr.fs.cldb.CLDBServer [pool-1-thread-6]: FSRegister: Request  FSID: 1733265305624617987 FSNetworkLocation: / FSHost:Port: 10.10.10.4:5660- FSHostName: mr4 StoragePools 5a02633da77c4de4005004bf22008798-8f8050ee3c1bd4e8005004bf36091a41-34ec0c6b7035c458005004bf2c05b50a- Capacity: 999262 Available: 811808 Used: 187453 Role: 0 isDCA: false Received registration request
       :
    2012-07-20 15:18:00,238 INFO  com.mapr.fs.cldb.CLDBServer [pool-1-thread-7]: RPC: PROGRAMID: 2345 PROCEDUREID: 8 from 10.10.10.8:52289 Generating reply with status: 3
    2012-07-20 15:18:00,238 INFO  com.mapr.fs.cldb.CLDBServer [pool-1-thread-6]: RPC: PROGRAMID: 2345 PROCEDUREID: 8 from 10.10.10.8:52761 Generating reply with status: 3
    2012-07-20 15:18:00,251 INFO  com.mapr.fs.cldb.CLDBServer [pool-1-thread-6]: RPC: PROGRAMID: 2345 PROCEDUREID: 8 from 10.10.10.3:60942 Generating reply with status: 3
    2012-07-20 15:18:00,261 INFO  com.mapr.fs.cldb.CLDBServer [pool-1-thread-6]: FSRegister: Request  FSID: 6831581122999395286 FSNetworkLocation: / FSHost:Port: 10.10.10.1:5660- FSHostName: mr1 StoragePools b3c908bc172c59fe005004bfe7018710-64da57498c143800005004bfe80571a0-cc635544c4023b1f005004bfe90b3d18- Capacity: 4258570 Available: 3917780 Used: 340790 Role: 0 isDCA: false Received registration request
    2012-07-20 15:18:00,262 INFO  com.mapr.fs.cldb.CLDBServer [pool-1-thread-6]: Cluster uuid is -8018526689394170827-1451703810922269592
    2012-07-20 15:18:00,275 INFO  com.mapr.fs.cldb.CLDBServer [pool-1-thread-7]: RPC: PROGRAMID: 2345 PROCEDUREID: 8 from 10.10.10.7:42329 Generating reply with status: 3
    2012-07-20 15:18:00,282 INFO  com.mapr.fs.cldb.counters.FileServerMetrics [pool-1-thread-6]: Initializing File Server Metrics with hostName=mr1
    2012-07-20 15:18:00,282 INFO  com.mapr.fs.cldb.topology.FileServer [pool-1-thread-6]: Instantiating fileserver metrics with context:com.mapr.fs.cldb.counters.MapRGangliaContext31
    2012-07-20 15:18:00,283 INFO  com.mapr.fs.cldb.CLDBServer [pool-1-thread-6]: FSRegister: Registered FileServer: 10.10.10.1:5660-
    2012-07-20 15:18:00,287 INFO  com.mapr.fs.cldb.CLDBServer [pool-1-thread-7]: RPC: PROGRAMID: 2345 PROCEDUREID: 8 from 10.10.10.5:45302 Generating reply with status: 3
    2012-07-20 15:18:00,289 INFO  com.mapr.fs.cldb.CLDBServer [pool-1-thread-7]: RPC: PROGRAMID: 2345 PROCEDUREID: 8 from 10.10.10.7:58045 Generating reply with status: 3
    2012-07-20 15:18:00,294 INFO  com.mapr.fs.cldb.CLDBServer [pool-1-thread-7]: RPC: PROGRAMID: 2345 PROCEDUREID: 8 from 10.10.10.7:52217 Generating reply with status: 3
    2012-07-20 15:18:00,296 INFO  com.mapr.fs.cldb.CLDBServer [pool-1-thread-7]: RPC: PROGRAMID: 2345 PROCEDUREID: 2 from 10.10.10.1:5660 Generating reply with status: 3
    2012-07-20 15:18:00,303 INFO  com.mapr.fs.cldb.CLDBServer [pool-1-thread-6]: Allocating WorkUnit type : NOCOMPRESS_LIST_UPDATED for container 0 with sequence number 0 to 10.10.10.1:5660-
    2012-07-20 15:18:00,319 INFO  com.mapr.fs.cldb.Containers [pool-1-thread-6]: Processing stale containers  on StoragePool b3c908bc172c59fe005004bfe7018710 from FileServer 10.10.10.1:5660-
    2012-07-20 15:18:00,320 FATAL com.mapr.fs.cldb.CLDB [pool-1-thread-6]: CLDBShutdown: ZooKeeperClient getKvStoreContainerInfo: Could not read container info from ZooKeeper
    2012-07-20 15:18:00,321 FATAL com.mapr.fs.cldb.CLDB [pool-1-thread-6]: CLDB Exception
    org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /datacenter/controlnodes/cldb/epoch/1/KvStoreContainerInfo
at org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:809)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:837)
at com.mapr.fs.cldb.zookeeper.ZooKeeperClient.getKvStoreContainerInfo(ZooKeeperClient.java:304)
at com.mapr.fs.cldb.ActiveContainersMap.getKvStoreContainerInfo(ActiveContainersMap.java:773)
at com.mapr.fs.cldb.ActiveContainersMap.containerLookupWithoutLocations(ActiveContainersMap.java:335)
at com.mapr.fs.cldb.ActiveContainersMap.containerLookup(ActiveContainersMap.java:325)
at com.mapr.fs.cldb.Containers.processStaleContainer(Containers.java:5487)
at com.mapr.fs.cldb.Containers.processACRStaleContainersOnStoragePool(Containers.java:6067)
at com.mapr.fs.cldb.Containers.fileServerActiveContainerReport(Containers.java:3130)
at com.mapr.fs.cldb.CLDBServer.fileServerActiveContainerReport(CLDBServer.java:4821)
at com.mapr.fs.cldb.CLDBServer.processRpc(CLDBServer.java:2623)
at com.mapr.fs.cldb.CLDBServer.requestArrived(CLDBServer.java:1723)
at com.mapr.fs.Rpc$RpcExecutor.run(Rpc.java:133)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
    2012-07-20 15:18:00,321 INFO  com.mapr.fs.cldb.CLDBServer [pool-1-thread-6]: Shutdown: Stopping CLDB
    2012-07-20 15:18:00,321 INFO  com.mapr.fs.cldb.CLDBServer [pool-1-thread-7]: RPC: PROGRAMID: 2345 PROCEDUREID: 8 from 10.10.10.3:59233 Generating reply with status: 30
    2012-07-20 15:18:00,322 INFO  com.mapr.fs.cldb.CLDB [Thread-10]: CLDB ShutDown Hook called
    2012-07-20 15:18:00,322 INFO  com.mapr.fs.cldb.zookeeper.ZooKeeperClient [Thread-10]: Zookeeper Client: Closing client connection:
    2012-07-20 15:18:00,325 INFO  com.mapr.fs.cldb.CLDBServer [main-EventThread]: ZooKeeper event NodeDeleted on path /datacenter/controlnodes/cldb/active/CLDBMaster
    2012-07-20 15:18:00,325 INFO  com.mapr.fs.cldb.CLDBServer [main-EventThread]: ZooKeeper event of type: NodeDeleted on path /datacenter/controlnodes/cldb/active/CLDBMaster
    2012-07-20 15:18:00,325 INFO  com.mapr.fs.cldb.CLDB [Thread-10]: CLDB shutdown
    Header: hostName: mr1, Time Zone: Japan Standard Time, processName: cldb, processId: 6251, MapR Build Version: 1.2.7.14133EMC
    2012-07-20 15:18:48,512 INFO  com.mapr.fs.cldb.CLDB [main]: Initializing CLDB

Outcomes