
Machine overload causes apparent CLDB failure

Question asked by jimmy888 on Jul 15, 2011
Latest reply on Jul 15, 2011 by lohit
Hi there,
  I am stress testing MapR. I prepared a 2.8G directory and wrote a simple script that copies the directory to the MapR NFS directory, deletes it, and then repeats the process. I also have HBase running there, receiving a modest amount of data insertion.
  The CLDB died within 3 minutes. I restarted the warden, waited a moment, and tried to connect again, but the CLDB is still dead. At this point it looks like I can't do anything to revive the cluster. I dumped the debug log to ubuntu@ec2-204-236-174-71.us-west-1.compute.amazonaws.com:/home/ubuntu/2011-07-15_16-28-04.tbz2. Does anybody know what I need to do?
  The cluster consists of 7 nodes, each running on an 80G /dev/loop0 device. I installed the CLDB on 1 node and the cluster worked fine. I then wanted to see whether CLDB HA works, so I installed the CLDB on 2 other nodes and found that it doesn't run there because of the M3 license restriction. So I removed the CLDB from those 2 nodes and restarted the warden. The MapR admin console shows the status of those 2 nodes as yellow; it seems that even after I removed the CLDB and restarted the warden, the 2 nodes still try to restart the CLDB. I ignored that warning and continued with the additional testing.
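
  For anyone who wants to reproduce the load, the copy-and-delete loop described above could be sketched roughly as follows. This is only an illustration, not the script that was actually run: the function name `copy_delete_loop`, the iteration count, and the temporary directories in the demo are all assumptions. On a real cluster the destination would be a directory on the MapR NFS mount.

```python
import shutil
import tempfile
from pathlib import Path


def copy_delete_loop(src: Path, dest_parent: Path, iterations: int) -> int:
    """Copy src under dest_parent, delete the copy, and repeat.

    Hypothetical helper: dest_parent stands in for a directory on the
    MapR NFS mount. Returns the number of completed copy/delete cycles.
    """
    target = dest_parent / src.name
    completed = 0
    for _ in range(iterations):
        shutil.copytree(src, target)  # write the whole tree to the destination
        shutil.rmtree(target)         # remove it again before the next pass
        completed += 1
    return completed


if __name__ == "__main__":
    # Demo against temporary directories so the sketch runs without a cluster.
    with tempfile.TemporaryDirectory() as src_root, \
            tempfile.TemporaryDirectory() as dest_root:
        sample = Path(src_root) / "sample_data"
        sample.mkdir()
        (sample / "file.bin").write_bytes(b"\0" * 1024)
        cycles = copy_delete_loop(sample, Path(dest_root), iterations=5)
        print(f"completed {cycles} copy/delete cycles")
```

  Bounding the loop with an iteration count, rather than running it forever, makes it easier to note how many cycles completed before the CLDB stopped responding.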
