
Getting HeapDumpOnOutOfMemoryError

Question asked by rpark31 on Oct 25, 2011
Latest reply on Oct 25, 2011 by rpark31
I have a 3-node M3 cluster, and each node has 4 GB of RAM. I was able to start the cluster successfully. Each node was using about 65-70% of its available memory, even though the cluster was still starting up.

When I start up mapr-warden now, however, I get these entries in my process list:

```
16178 pts/0    Sl     0:00 java -XX:ErrorFile=/opt/cores/hs_err_pid%p.log -XX:-HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/core
16200 pts/0    S      0:00 bash /opt/mapr/server/createsystemvolumes.sh
16330 ?        S<Lsl   0:01 /opt/mapr/server/mfs -b -f /ramfs/mapr/cachefile -O /opt/mapr/conf/mapr-clusters.conf -p 5660 -n inode:6
16503 ?        S<s    0:00 /opt/mapr/server/hoststats 5660 /opt/mapr/logs/TaskTracker.stats
19419 pts/0    S<l    0:02 java -server -Xmx479m -XX:ErrorFile=/opt/cores/hs_err_pid%p.log -XX:+HeapDumpOnOutOfMemoryError -XX:HeapD
```
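For reference, `-XX:-HeapDumpOnOutOfMemoryError` and `-XX:+HeapDumpOnOutOfMemoryError` in this listing are HotSpot JVM command-line options rather than error messages: `-XX:+Name` enables a boolean option and `-XX:-Name` disables it, while `-Xmx479m` sets the maximum Java heap to 479 MB. The sketch below pulls those two settings out of a single `ps` line. `summarize_jvm_flags` is a hypothetical helper written for illustration; it only inspects the text it is given and does not query the running JVM.

```python
import re


def summarize_jvm_flags(cmdline: str) -> dict:
    """Report heap-related JVM options found in one `ps` command line.

    Hypothetical helper: it parses text only and knows nothing about MapR.
    """
    summary = {"max_heap": None, "heap_dump_on_oom": None}
    for token in cmdline.split():
        # -Xmx<size> sets the maximum Java heap size, e.g. -Xmx479m.
        if token.startswith("-Xmx"):
            summary["max_heap"] = token[len("-Xmx"):]
        # -XX:+Name turns a boolean HotSpot option on; -XX:-Name turns it off.
        match = re.fullmatch(r"-XX:([+-])HeapDumpOnOutOfMemoryError", token)
        if match:
            summary["heap_dump_on_oom"] = match.group(1) == "+"
    return summary


if __name__ == "__main__":
    # Fragments copied from the process listing above (they are truncated there too).
    warden = "java -XX:ErrorFile=/opt/cores/hs_err_pid%p.log -XX:-HeapDumpOnOutOfMemoryError"
    server = "java -server -Xmx479m -XX:+HeapDumpOnOutOfMemoryError"
    print(summarize_jvm_flags(warden))  # → {'max_heap': None, 'heap_dump_on_oom': False}
    print(summarize_jvm_flags(server))  # → {'max_heap': '479m', 'heap_dump_on_oom': True}
```

In other words, a `+` only means a heap dump would be written if that JVM ever ran out of memory; the flag's presence on the command line does not by itself indicate that an out-of-memory error occurred.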

Any idea what this means? I checked the logs but couldn't find any entries that help diagnose the problem. I get a similar out-of-memory error on each cluster node.

I increased the memory on each node to 6 or 8 GB, but I still get the error.

Do I have to tweak a heapsize variable somewhere in the conf directory to allocate more memory?

(After edit)

OK, when I run `ps -aef | grep mapr`, this is what I get:

```
root       918     1  0 Oct25 ?        00:00:03 java -XX:ErrorFile=/opt/cores/hs_err_pid%p.log -XX:-HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/cores -XX:+UseConcMarkSweepGC -Dlog.file=/opt/mapr/logs/warden.log -Djava.library.path=/opt/mapr/lib -classpath /opt/mapr:/opt/mapr/conf:/opt/mapr/lib/JPam-1.1.jar:/opt/mapr/lib/adminuiapp-0.1.jar:/opt/mapr/lib/ant-1.7.1.jar:/opt/mapr/lib/baseutils-0.1.jar:/opt/mapr/lib/cldb-0.1.jar:/opt/mapr/lib/cliframework-0.1.jar:/opt/mapr/lib/commons-codec-1.4.jar:/opt/mapr/lib/commons-el-1.0.jar:/opt/mapr/lib/commons-email-1.2.jar:/opt/mapr/lib/commons-logging-1.0.4.jar:/opt/mapr/lib/commons-logging-api-1.0.4.jar:/opt/mapr/lib/eval-0.5.jar:/opt/mapr/lib/globalfsck-0.1.jar:/opt/mapr/lib/google-collect-1.0.jar:/opt/mapr/lib/hadoop-metrics-0.20.2-dev.jar:/opt/mapr/lib/jasper-compiler-5.5.12.jar:/opt/mapr/lib/jasper-runtime-5.5.12.jar:/opt/mapr/lib/jetty-6.1.14.jar:/opt/mapr/lib/jetty-plus-6.1.14.jar:/opt/mapr/lib/jetty-util-6.1.14.jar:/opt/mapr/lib/json-20080701.jar:/opt/mapr/lib/jsp-2.1.jar:/opt/mapr/lib/jsp-api-2.1.jar:/opt/mapr/lib/junit-3.8.1.jar:/opt/mapr/lib/junit-4.5.jar:/opt/mapr/lib/kvstore-0.1.jar:/opt/mapr/lib/libprotodefs.jar:/opt/mapr/lib/log4j-1.2.14.jar:/opt/mapr/lib/log4j-1.2.15.jar:/opt/mapr/lib/logging-0.1.jar:/opt/mapr/lib/mail.jar:/opt/mapr/lib/maprbuildversion.jar:/opt/mapr/lib/maprcli-0.1.jar:/opt/mapr/lib/maprsecurity-0.1.jar:/opt/mapr/lib/maprutil-0.1.jar:/opt/mapr/lib/protobuf-java-2.3.0-lite.jar:/opt/mapr/lib/servlet-api-2.5-6.1.14.jar:/opt/mapr/lib/volumemirror-0.1.jar:/opt/mapr/lib/warden-0.1.jar:/opt/mapr/lib/zookeeper-3.3.2.jar -Dcom.sun.management.jmxremote -Dpid=873 -Dpname=warden -Dmapr.home.dir=/opt/mapr com.mapr.warden.WardenMain /opt/mapr/conf/warden.conf
root       996     1  0 Oct25 ?        00:00:03 java -Dzookeeper.log.dir=/opt/mapr/zookeeper/zookeeper-3.3.2/logs -Dzookeeper.root.logger=WARN, ROLLINGFILE -XX:ErrorFile=/opt/mapr/zookeeper/zookeeper-3.3.2/logs/hs_err_pid%p.log -cp /opt/mapr/zookeeper/zookeeper-3.3.2/bin/../build/classes:/opt/mapr/zookeeper/zookeeper-3.3.2/bin/../build/lib/*.jar:/opt/mapr/zookeeper/zookeeper-3.3.2/bin/../zookeeper-3.3.2.jar:/opt/mapr/zookeeper/zookeeper-3.3.2/bin/../lib/log4j-1.2.15.jar:/opt/mapr/zookeeper/zookeeper-3.3.2/bin/../lib/jline-0.9.94.jar:/opt/mapr/zookeeper/zookeeper-3.3.2/bin/../src/java/lib/*.jar:/opt/mapr/zookeeper/zookeeper-3.3.2/conf: -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.local.only=false org.apache.zookeeper.server.quorum.QuorumPeerMain /opt/mapr/zookeeper/zookeeper-3.3.2/conf/zoo.cfg
root      1226     1  0 Oct25 ?        00:00:01 /opt/mapr/server/mfs -b -f /ramfs/mapr/cachefile -O /opt/mapr/conf/mapr-clusters.conf -p 5660 -n inode:6:log:6:meta:10:dir:40:small:15 -m 1197
root      1406     1  0 Oct25 ?        00:00:01 /opt/mapr/server/hoststats 5660 /opt/mapr/logs/TaskTracker.stats
```

This looks a lot better.

But now when I type this command:

`/opt/mapr/bin/maprcli acl edit -type cluster -user <user>:fc`

I get the dreaded "ERROR (10009) - Couldn't connect to the CLDB service".

I'm not sure why I'm getting this. I looked through other threads mentioning that error and tried running zkdatacleaner.sh, restarting mapr-warden and mapr-zookeeper, and rerunning both configure.sh and disksetup.
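Since error 10009 says that maprcli could not connect to the CLDB, one quick thing to rule out is whether the CLDB's TCP port is reachable at all from the node running the command. The sketch below is a plain TCP connect test with a timeout; it does not speak the CLDB protocol. `can_reach` is a hypothetical helper, and the host name and port in the usage note are assumptions: 7222 is assumed to be the CLDB port, so check the host:port entries in `/opt/mapr/conf/mapr-clusters.conf` for the actual values.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper: a successful connect only shows that something is
    listening on the port, not that it is a healthy CLDB.
    """
    try:
        # create_connection resolves the host and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, unreachable hosts, DNS failures, and timeouts
        # all surface as OSError subclasses in Python 3.
        return False
```

For example, `can_reach("node1", 7222)` (with `node1` standing in for one of your CLDB hosts) returning `False` would suggest that no CLDB is listening on that host and port, or that something between the two nodes is blocking the connection.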
