
CLDB Down Alarm on several nodes

Question asked by communityadmin on Jun 25, 2014
Latest reply on May 18, 2016 by hejorgel
I'm seeing the "CLDB Down Alarm" in the dashboard for several nodes in a four-node cluster. On each node where the alarm is present, jps shows that a CLDB process *is* running, but 'service mapr-cldb status' returns "/opt/mapr/logs/cldb.pid exists with pid 21000 but no CLDB." Additionally, the PID listed in /opt/mapr/logs/cldb.pid does not match the PID of the running CLDB process. /opt/mapr/logs/cldb.log is also filled with this error:
<pre>
Error: Exception thrown by the agent : java.rmi.server.ExportException: Port already in use: 7220; nested exception is: java.net.BindException: Address already in use
</pre>
which I suppose is because CLDB is already running.
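
To make the PID mismatch concrete, here is a minimal sketch of the check I have in mind: read the PID recorded in the pid file and test whether any process with that PID still exists. The helper names (`read_pid`, `pid_is_alive`, `pid_file_is_stale`) are my own and are not part of MapR; the script assumes a POSIX system and only says whether the recorded PID is alive, not whether that process is actually CLDB.

```python
import os


def read_pid(pid_file):
    """Return the integer PID stored in pid_file, or None if the file
    is missing, unreadable, or does not contain a number."""
    try:
        with open(pid_file) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def pid_is_alive(pid):
    """Return True if a process with this PID currently exists (POSIX)."""
    if pid is None or pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 only checks existence; nothing is sent
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def pid_file_is_stale(pid_file):
    """True when the file records a PID that no longer maps to a process."""
    pid = read_pid(pid_file)
    return pid is not None and not pid_is_alive(pid)


if __name__ == "__main__":
    path = "/opt/mapr/logs/cldb.pid"  # pid file path from the status message
    print(path, "recorded pid:", read_pid(path),
          "stale:", pid_file_is_stale(path))
```

On the affected nodes I would expect this to report the pid file as stale, while jps lists CLDB under a different PID.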

I've tried killing the running CLDB process, stopping Warden, removing the cldb.pid file, and then restarting Warden; this clears the alarm and makes the PIDs for the running CLDB service and cldb.pid match, but the alarm seems to come back after a while, with the same symptoms described above.

It seems that Warden doesn't know about the running CLDB process, and is trying to start another one, but is failing because port 7220 is already being used. Could CLDB somehow be getting started outside of Warden? Or if not, why doesn't Warden know about the running CLDB process?
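
For anyone who wants to confirm the port conflict independently of the CLDB log, here is a rough sketch that tries to bind the port and reports whether something already holds it. `port_in_use` is a hypothetical helper of mine, not a MapR tool. It assumes Linux socket semantics, where `SO_REUSEADDR` still refuses a bind while another socket is listening on the port. It only shows that the port is taken, not which process owns it.

```python
import errno
import socket


def port_in_use(port, host="0.0.0.0"):
    """Return True if binding (host, port) fails with EADDRINUSE."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # SO_REUSEADDR ignores lingering TIME_WAIT sockets, so only an
        # active listener on the port makes the bind fail (Linux behavior).
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            s.bind((host, port))
        except OSError as e:
            if e.errno == errno.EADDRINUSE:
                return True
            raise  # any other failure (e.g. permissions) is not a conflict
        return False


if __name__ == "__main__":
    # 7220 is the port named in the BindException above.
    print("port 7220 in use:", port_in_use(7220))
```

If this reports the port as in use while Warden believes CLDB is down, that would be consistent with a second CLDB start attempt failing against the instance that is already running.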
