Spark Streaming - ZooKeeper Timeout?

Question asked by john.humphreys on Jan 30, 2018
Latest reply on Feb 20, 2018 by Harikrishnan Cheneperth Kunhumveettil

My long-running Spark Streaming job is clearly dying periodically from a ZooKeeper session expiration.


The job also sometimes dies after a week due to a YARN ticket expiration (I know how to fix that one). Since the job sometimes survives long enough to hit that, I think the ZK timeout may be longer than a week and may not be directly tied to Spark.


Is there a configuration setting I can use to prevent this, or a function I can call to recover when it happens (for example, by responding to a listener)?


Note that I think the critical line here is probably "Session expired for /services/resourcemanager/master".


[2018-01-30 00:23:14,134] WARN ZK Reset due to SessionExpiration for ZK:,, (com.mapr.util.zookeeper.ZKDataRetrieval)
[2018-01-30 00:23:22,564] ERROR ZK Session expired. Need to reset ZK completely for node: /services/resourcemanager/master (com.mapr.baseutils.zookeeper.ZKUtils)
[2018-01-30 00:23:22,564] ERROR Most likely SessionExpirationException. Need to reset ZK and call myself again (com.mapr.util.zookeeper.ZKDataRetrieval)
com.mapr.baseutils.zookeeper.ZKClosedException: ZK client was closed


Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /services/resourcemanager/master
at org.apache.zookeeper.KeeperException.create(
at org.apache.zookeeper.KeeperException.create(
at org.apache.zookeeper.ZooKeeper.getData(
at com.mapr.baseutils.zookeeper.ZKUtils.getData(
... 112 more
[2018-01-30 00:23:22,574] ERROR Unable to determine ResourceManager service address from Zookeeper at,, (org.apache.hadoop.yarn.client.MapRZKRMFinderUtils)
[2018-01-30 00:23:22,576] ERROR Failed to properly truncate all lineage (and checkpoint). ($)
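
To check whether the expirations follow a fixed period (and so whether they line up with the week-long ticket lifetime or with something else), one option is to pull the timestamps of the "ZK Session expired" lines out of the driver log and look at the gaps between them. The following is a minimal sketch, not a definitive tool. It assumes every log line starts with a bracketed timestamp in the format shown above and that each incident produces exactly one "ZK Session expired" line; the function names are made up for illustration.

```python
import re
from datetime import datetime

# Matches the prefix seen in the log above, e.g.
# "[2018-01-30 00:23:22,564] ERROR ZK Session expired. ..."
LINE_RE = re.compile(r"^\[([^\]]+)\]\s+(\w+)\s+(.*)$")

# Assumption: one line containing this text is logged per expiration incident.
NEEDLE = "ZK Session expired"


def expiration_times(lines):
    """Return the timestamps of log lines that report a ZK session expiration."""
    times = []
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # stack-trace lines and other lines without a timestamp
        stamp, _level, message = match.groups()
        if NEEDLE in message:
            # %f accepts the three-digit millisecond field after the comma
            times.append(datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f"))
    return times


def gaps(times):
    """Return the time elapsed between consecutive expirations."""
    return [later - earlier for earlier, later in zip(times, times[1:])]
```

Calling gaps(expiration_times(open(path))) on a driver log covering several failures (path being wherever that log is kept) gives a list of intervals. Roughly constant intervals would point to a fixed timeout or ticket lifetime; irregular ones would point more toward connectivity problems or long pauses in the driver.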