Which process/service is consuming most of the memory on MapR node and why?

Question asked by AmarnathVibhute on Aug 14, 2016
Latest reply on Aug 16, 2016 by AmarnathVibhute

Hello all,

I have a 3-node cluster running MapR M3 v5 (Hadoop 2.7.0-mapr-1506). I am facing a memory consumption issue on the first node after starting the MapR services.

Below is some information that should give more insight into the problem:

# clush -a free -h

Node 1:                   total       used       free     shared    buffers     cached
Node 1: Mem:          252G       234G        17G        50M       445M       962M
Node 1: -/+ buffers/cache:       232G        19G
Node 1: Swap:          23G         0B        23G
Node 2:                   total       used       free     shared    buffers     cached
Node 2: Mem:          252G        16G      235G        63M       472M       1.4G
Node 2: -/+ buffers/cache:        15G       237G
Node 2: Swap:          23G         0B        23G
Node 3:                   total       used       free     shared    buffers     cached
Node 3: Mem:          252G        19G      232G        80M       440M       3.1G
Node 3: -/+ buffers/cache:        16G       236G
Node 3: Swap:          23G         0B        23G
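
Since 'used' is far beyond what buffers/cache explain, one extra check I can run on node 1 is /proc/meminfo, in case the memory is sitting in kernel slab or shared memory (neither of which is charged to any PID in 'top'):

# grep -Ei 'slab|shmem|anonpages|mapped' /proc/meminfo
# slabtop -o | head -15     # largest kernel slab caches, if slabtop is installed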

 

To check which processes were using the most memory, I ran the 'top' command on node 1 and sorted by memory:

PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
9261 mapr      10 -10 10.6g  10g  16m S  0.7  4.1  25:09.07 mfs
10397 mapr      20   0 11.8g 1.3g  26m S  0.7  0.5  15:23.15 java
8562 mapr      20   0 37.5g 755m  21m S  0.0  0.3   4:00.35 java
9009 mapr      20   0 6569m 670m  30m S  0.0  0.3   2:19.63 java
7659 mapr      10 -10 33.8g 668m  13m S  0.0  0.3   4:32.67 java
15718 mapr      10 -10  724m 598m 1688 S  0.0  0.2   0:15.66 nfsserver

 

So this information is not giving me the root cause of the 92% memory utilisation on node 1 (the 'mfs' service is only using around 12GB of memory).
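
As a rough cross-check (it over-counts shared pages, since each process is charged for them separately), I can sum the resident sizes of all processes and compare the total against the 234G 'used':

# ps -eo rss --no-headers | awk '{sum+=$1} END {printf "%.1f GB total RSS\n", sum/1024/1024}'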

 

I checked a few more details:

# maprcli node list -columns csvc

hostname   configuredservice
Host 1     hbmaster,hbregionserver,hbaserest,webserver,nodemanager,drill-bits,cldb,fileserver,nfs,resourcemanager,hoststats
Host 2     fileserver,hivemeta,hbregionserver,hbaserest,webserver,nodemanager,drill-bits,hs2,hcat,resourcemanager,hoststats,hue
Host 3     fileserver,oozie,hbasethrift,httpfs,historyserver,hbregionserver,hbaserest,sqoop2,nodemanager,drill-bits,spark-historyserver,hoststats
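
Node 1 carries the heaviest roles (cldb, resourcemanager, hbmaster, nfs), so one thing worth checking is how much memory warden allocates to each service; as far as I understand, this is driven by the heapsize settings in the warden config (file locations as on my install, key names may differ by version):

# grep -i heapsize /opt/mapr/conf/warden.conf
# grep -ri heapsize /opt/mapr/conf/conf.d/ 2>/dev/null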

 

# maprcli service list

name                 state  logpath                                    displayname
oozie                0      /opt/mapr/oozie/oozie-4.2.0/logs           Oozie
hue                  0      /opt/mapr/hue/hue-3.9.0/logs/              HueWebServer
resourcemanager      0      /opt/mapr/hadoop/hadoop-2.7.0/logs         ResourceManager
hivemeta             0      /opt/mapr/hive/hive-1.2/logs/mapr          HiveMetastore
hbregionserver       0      /opt/mapr/hbase/hbase-0.98.12/logs         HBase RegionServer
nfs                  0      /opt/mapr/logs/nfsserver.log               NFS Gateway
hbasethrift          0      /opt/mapr/hbase/hbase-0.98.12/logs         HBaseThriftServer
httpfs               0      /opt/mapr/httpfs/httpfs-1.0/logs           Httpfs
webserver            0      /opt/mapr/logs/adminuiapp.log              Webserver
cldb                 0      /opt/mapr/logs/cldb.log                    CLDB
nodemanager          0      /opt/mapr/hadoop/hadoop-2.7.0/logs         NodeManager
historyserver        0      /opt/mapr/hadoop/hadoop-2.7.0/logs         JobHistoryServer
hoststats            0      /opt/mapr/logs/hoststats.log               HostStats
spark-historyserver  0      /opt/mapr/spark/spark-1.5.2/logs/          SparkHistoryServer
sqoop2               0      /opt/mapr/sqoop/sqoop-2.0.0/server/logs/   Sqoop2
hbmaster             0      /opt/mapr/hbase/hbase-0.98.12/logs         HBase Master
fileserver           0      /opt/mapr/logs/mfs.log                     FileServer
drill-bits           0      /opt/mapr/drill/drill-1.4.0/logs/          Drillbit
hcat                 0      /opt/mapr/hive/hive-1.2/logs/mapr/webhcat  WebHcat
hs2                  0      /opt/mapr/hive/hive-1.2/logs/mapr          HiveServer2
hbaserest            0      /opt/mapr/hbase/hbase-0.98.12/logs         HBase Rest Server

 

This result was a bit surprising to me, as it shows all the services in state '0', which means not configured. But I did run the command below on all 3 nodes to configure the cluster:

# ./configure.sh -C Node1:7222 -Z Node1:5181 -N ClusterName
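
For reference, my understanding of the 'state' codes from the documentation (worth confirming for this release) is: 0 = NOT_CONFIGURED, 2 = RUNNING, 3 = STOPPED, 4 = FAILED, 5 = STAND_BY. A per-node view can also be pulled with:

# maprcli service list -node `hostname -f` -json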

 

Attached is /opt/mapr/logs/configure.log for reference.

 

Also, I checked the '/etc/security/limits.conf' file; below is what it contains:

mapr - memlock unlimited
mapr - core unlimited
mapr - nofile 64000
mapr - nproc unlimited
mapr - nice -10
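
Since limits.conf only applies to new login sessions, I can also verify that the running mfs process actually picked these limits up (assuming the process name is 'mfs', as in the top output above):

# cat /proc/$(pgrep -o -x mfs)/limits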

 

Some additional checks I did:

# lsof | wc -l
606566
# lsof | grep java | wc -l
601885
# ps auxf | grep java | wc -l
803
# ps auxf | grep mapr | wc -l
821
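
These raw counts may overstate things: 'lsof' prints one line per file descriptor per process (or per task on newer versions), and the grep can match itself. Distinct process and per-thread counts for comparison:

# pgrep -c java               # number of java processes
# ps -eLf | grep -c '[j]ava'  # one row per thread, grep pattern avoids matching itself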

 

A few more steps I took to try to overcome the memory issue:

1. Stopped the warden & zookeeper services and checked the memory status; it still showed 227G consumed even after 2-3 hours (see the shared-memory check after this list).

2. When I killed all processes owned by 'mapr' (around 821 of them), the node immediately showed only 17GB of memory utilised.

3. I started MapR on the cluster again; for the first couple of hours memory consumption stayed within 20GB, but today it is again showing 234GB utilised.
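
The fact that memory stayed consumed after stopping warden and zookeeper, but was freed once the mapr-owned processes were killed, makes me suspect shared memory; as far as I know the mfs process uses shared memory segments, and those would not be released while anything still holds them. A check for that:

# ipcs -m    # look for large segments, and entries with nattch=0 that outlived their owner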

 

Attaching warden.log for more details.

 

I tried to collect support-dump information, but because memory is fully utilised I get the error below:

# /opt/mapr/support/tools/mapr-support-dump.sh
2016-08-14 15:49:57.435 INFO Starting Support dump collection. For diagnostics, refer to support_dump.log inside the dump
2016-08-14 15:49:57.443 INFO Collecting system information
2016-08-14 15:50:11.174 INFO Skipping /etc/pam.conf since it does not exist
2016-08-14 15:50:12.944 INFO Collecting mapr logs
/opt/mapr/support/tools/utils.sh: fork: Cannot allocate memory
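
The 'fork: Cannot allocate memory' failure suggests the kernel could not commit memory for a new process, so besides the obvious memory pressure, the overcommit settings may be relevant (standard kernel names, nothing MapR-specific):

# sysctl vm.overcommit_memory vm.overcommit_ratio
# grep -Ei 'commitlimit|committed_as' /proc/meminfo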

 

Please suggest how to identify which process/service is using most of the memory, and how to release it.

 

Thanks,

Amarnath
