
Best practices way to investigate mapr resource utilization?

Question asked by reedv on Jan 23, 2018
Latest reply on Feb 15, 2018 by maprcommunity

Currently have a situation where I'm seeing high mem., CPU, and disk utilization across all nodes in the MCS and would like to better investigate what the problem is. What are some standard things to look at to try to arrive at some actionable response?
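As a first triage pass, these standard per-node checks help narrow down which resource is actually saturated and by what. The Linux commands are standard; the commented `maprcli` invocations are from memory and should be verified against your MapR release's docs (`maprcli <command> -help`) before relying on them:

```shell
#!/bin/sh
# Quick per-node resource triage using standard Linux tools.

# CPU: load averages relative to the number of cores
uptime
nproc

# Memory: the "available" column is what matters, not "free"
free -m

# Disk: local filesystem usage (note: this will NOT show MapR's raw data disks)
df -h /

# MapR-specific views (run where the MapR client is installed; flags and
# column names may vary by release -- check `maprcli <cmd> -help`):
# maprcli node list -columns svc,cpu,mem,disk
# maprcli disk list -host <hostname>
```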

This is what I am seeing in the MCS:


Looking at mem. usage (node001):

Running top on node001, I see that the main memory consumer on the host is the MapR FS service (mfs), using 10% of RAM:

   PID USER  PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
 66405 mapr  20   0 9430872 7.147g  27408 S  3.7 10.0   1686:06 mfs
160245 mapr  20   0 7185556 2.877g  47348 S  0.3  4.0 161:32.77 java
  3821 mapr  20   0 3830500 1.227g  29268 S  2.7  1.7   1893:09 collectd
 39444 mapr  10 -10  911332 826164  35552 S  0.0  1.1  33:45.93 nfsserver
161817 mapr  20   0 3041392 722632  43600 S  0.3  1.0 139:33.07 java
 73244 mapr  10 -10 5762212 661584  32912 S  3.7  0.9   1281:29 java
  6444 mapr  20   0 3061988 441112  41644 S  0.3  0.6 149:12.51 java
137457 mapr  20   0 1990884 429708  48760 S  0.0  0.6  78:44.17 gnome-she+
 54184 mapr  20   0 3092952 394424  93564 S  3.7  0.5 581:49.39 Web Conte+
153187 mapr  20   0 2250048 352268  48780 S  0.0  0.5 296:30.21 gnome-she+

and the host is locally using about 30% of available memory in total. Combining this with the memory usage listed in the Services tab for node001 in the MCS, I see ~20% of available memory in use there (which I assume is already included in the host node percentage).

That would put total memory usage at roughly 50%, not the 80% the MCS reports, so I don't understand what is happening here.
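One common source of this kind of mismatch (an assumption worth verifying on your nodes) is that different tools define "used" differently: per-process %MEM in top only counts resident memory, a naive node-level "used" figure counts page cache and buffers as used, and mfs additionally pre-allocates a large cache at startup, so a monitoring view based on allocated rather than resident memory will read much higher. This sketch computes "used %" two ways from /proc/meminfo to show how far apart they can be on the same host:

```shell
#!/bin/sh
# Two ways of computing "memory used %" that often disagree:
#  - "naive" counts page cache and buffers as used (MemTotal - MemFree)
#  - "effective" uses the kernel's MemAvailable estimate instead
awk '/^MemTotal/     {t=$2}
     /^MemFree/      {f=$2}
     /^MemAvailable/ {a=$2}
     END {
       printf "naive used%%:     %.1f\n", (t - f) * 100 / t
       printf "effective used%%: %.1f\n", (t - a) * 100 / t
     }' /proc/meminfo
```

If the MCS figure tracks the naive (or allocated) number while your mental arithmetic tracks resident usage, 50% and 80% can both be "correct" at the same time.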

 

Looking at disk usage (node001):

Looking at the disk usage of node001 with "ncdu /" (note that the MapR cluster is mounted on the host nodes via NFS), I see:

16.8 GiB [##########] /mapr
12.6 GiB [#######   ] /opt
 8.3 GiB [####      ] /home
 3.6 GiB [##        ] /usr

... <negligible>

There is 108 GB allocated to the root partition of the host node, yet the used space sums to only ~40 GB, so ~40% of disk seems to be in use, not 90%. Even if I assume the /mapr NFS mount is being counted on top of the MapR-FS data hosted on the node (i.e., the data is stored twice and should be added again), that only brings total usage up to ~60 GB, or 60%, which still does not come close to the disk usage reported by the MCS.
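A likely explanation (an assumption to confirm for your deployment) is that the MCS disk figure refers to the MapR-FS storage pools on the raw disks handed to MapR, not to the OS root partition: MapR-FS writes to raw block devices with no mountpoint, so ncdu and du on / never see that space, and the NFS-mounted /mapr shows logical usage while the cluster-level figure includes replication overhead. This sketch contrasts what ncdu can see with the full block-device picture:

```shell
#!/bin/sh
# What ncdu/du can see: mounted filesystems only
df -h -x tmpfs -x devtmpfs

# The full picture: all block devices. MapR data disks typically show up
# here with no MOUNTPOINT, which is why ncdu on / never counts them.
lsblk -o NAME,SIZE,TYPE,MOUNTPOINT

# MapR's own view of its disks (hedged: verify the flags with -help):
# maprcli disk list -host $(hostname)
```

With default 3x replication, 16.8 GiB of logical data in /mapr would correspond to roughly 50 GiB of physical disk across the cluster, which goes a long way toward reconciling 40% against 90%.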

 

These are the steps I took to investigate the resource usage, but I could not arrive at a good answer. Does anyone have advice on how to track down where these utilization numbers come from and how to fix them?
