How to handle core files found in MapR

Document created by jbubier Employee on Feb 7, 2016
Version 1

Author: Jonathan Bubier

 

Original Publication Date: March 27, 2015

 

When a core file is found under /opt/cores on a node, MapR raises an alarm to alert an administrator that a service may have exited unexpectedly. This alarm is logged as NODE_ALARM_CORE_PRESENT. All core files generated by MapR software should be reported to MapR Support for investigation. The information to gather depends on the type of core file found. Use the following steps to determine the type of core and gather the corresponding diagnostics to provide to MapR Support.

 

Identify core file

MapR raises the 'Cores Present' alarm whenever a file is found under /opt/cores/ on any cluster node. This is where cores generated by the system are typically placed, and it is the default location used by MFS, MapR's NFS server, and the Java processes (CLDB, JobTracker, TaskTracker, map-reduce task attempts, etc.) running on MapR. First identify which process generated the file under /opt/cores/, since that determines which binary is needed to debug it. Core files generated by MFS are prefixed with 'mfs' and core files generated by MapR's NFS server are prefixed with 'nfsserver'. Ex:

$ ls -l /opt/cores/
-rw-r--r--  1 root  root  4126609408 May  2  2014 mfs.core.6246.host1.domain.prv
-rw-r--r--  1 root  root  6062415872 May 16  2013 nfsserver.core.22447.host1.domain.prv

Core files generated by a Java process will be prefixed with 'java' and will typically be accompanied by an error log file which provides more diagnostic information about the core file and the JVM before it exited. 
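The prefix convention described above can be checked with a small script. The following is only a sketch: classify_core is a hypothetical helper name, not a MapR tool, and it assumes the file names follow the prefixes shown in the listing above.

```shell
#!/bin/sh
# Classify a file under /opt/cores by its filename prefix.
# classify_core is a hypothetical helper, not part of MapR.
classify_core() {
    case "$(basename "$1")" in
        mfs*)       echo "mfs" ;;
        nfsserver*) echo "nfsserver" ;;
        java*)      echo "java" ;;
        *)          echo "unknown" ;;   # fall back to the file command
    esac
}

# Print the apparent origin of each file under /opt/cores.
for f in /opt/cores/*; do
    [ -e "$f" ] || continue   # nothing to do if the directory is empty or missing
    printf '%s\t%s\n' "$(classify_core "$f")" "$f"
done
```

Files reported as unknown still need to be identified by other means before deciding how to handle them.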

 

If the core file found under /opt/cores/ is not prefixed with 'mfs', 'nfsserver', or 'java', use the file command to identify which binary created the core file. The same command works on any core file. Ex:

$ file /opt/cores/mfs.core.6246.host1.domain.prv 
/opt/cores/mfs.core.6246.host1.domain.prv: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from '/opt/mapr/server/mfs -b -p 5660 -m 45158 -g -O /opt/mapr/conf/mapr-clusters.conf'
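When several cores need to be checked, the binary path can be pulled out of the file output with sed. This is a rough sketch that assumes the output has the "from '...'" form shown above; other versions of file may format the description differently, and the recorded command line may be truncated. The function names are hypothetical.

```shell
#!/bin/sh
# Print the executable recorded in a core file description: the first
# word inside the quotes that follow "from" in the output of file.
# Prints nothing when the description has no such field.
core_binary_from_desc() {
    printf '%s\n' "$1" | sed -n "s/.*from '\([^' ]*\).*/\1/p"
}

# Convenience wrapper: run file on a core (-b omits the file name)
# and extract the binary from its description.
core_binary() {
    core_binary_from_desc "$(file -b "$1")"
}
```

For the example above, core_binary /opt/cores/mfs.core.6246.host1.domain.prv would be expected to print /opt/mapr/server/mfs.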

If the file under /opt/cores/ is either not a legitimate core file or is a core file generated by a non-MapR process, move the file out of /opt/cores/ to clear the alarm in MapR. If the file is found to be a core generated by a MapR process, the next step is to gather diagnostic information about the core to debug it further.
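Moving files that are not cores at all can be scripted along the following lines. This is a sketch under stated assumptions: the function names and the /var/tmp/not-cores holding directory are hypothetical, and the check only looks for the words "core file" in the output of file. Cores produced by non-MapR processes still have to be identified and moved by hand.

```shell
#!/bin/sh
# Succeed when a description from file looks like a core file.
is_core_desc() {
    case "$1" in
        *"core file"*) return 0 ;;
        *)             return 1 ;;
    esac
}

# Move a file out of /opt/cores when file does not report it as a core.
# The default destination is a hypothetical choice; any directory
# outside /opt/cores will do.
quarantine_if_not_core() {
    src=$1
    dest=${2:-/var/tmp/not-cores}
    if is_core_desc "$(file -b "$src")"; then
        echo "keeping $src"
    else
        mkdir -p "$dest" && mv -- "$src" "$dest"/ && echo "moved $src to $dest"
    fi
}
```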

 

Capture core diagnostics

For core files generated by MFS or MapR's NFS server, use gdb to capture the backtrace from the core. This is best done on the node where the core was generated, so that the system libraries match those in use when the process exited. To use gdb, specify the path to the binary that generated the core followed by the path to the core file. Ex:

$ gdb /opt/mapr/server/mfs /opt/cores/mfs.core.6246.host1.domain.prv 
$ gdb /opt/mapr/server/nfsserver /opt/cores/nfsserver.core.22447.host1.domain.prv

If the core file was generated by a Java process, use the java binary that generated the core to debug it with gdb. Because the node may have multiple versions of Java installed, determine which java binary generated the core using the file command as described above.

 

Once the core file is opened in gdb, enable output logging to an external file and capture the backtrace using the following commands:

(gdb) set logging file /tmp/gdb.out
(gdb) set logging on
Copying output to /tmp/gdb.out.
(gdb) thread apply all bt

Note that /tmp/gdb.out above is just an example of the output file path; a different file name or path can be used if desired. Use the 'quit' command to exit gdb and verify that the output file (/tmp/gdb.out) contains the backtrace from the core file. Once the output is captured, provide it along with the core file and a support-dump from the node to MapR Support for further analysis.
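The same backtrace can also be captured without an interactive session by using gdb's -batch and -ex options and redirecting the output. The sketch below is one way to wrap that. The capture_backtrace name and the GDB override variable are hypothetical conveniences, and /tmp/gdb.out is only the default output path.

```shell
#!/bin/sh
# Dump the backtrace of every thread in a core file to a log file.
# Arguments: binary, core file, optional output path.
# GDB may be set to an alternate gdb executable (hypothetical override).
capture_backtrace() {
    binary=$1
    core=$2
    out=${3:-/tmp/gdb.out}
    # -batch exits after running the commands; -ex supplies one command.
    "${GDB:-gdb}" -batch -ex "thread apply all bt" "$binary" "$core" > "$out" 2>&1
}
```

For example, capture_backtrace /opt/mapr/server/mfs /opt/cores/mfs.core.6246.host1.domain.prv would write the backtrace to /tmp/gdb.out. As with the interactive steps, check the output file before sending it.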

 

Summary of diagnostics to collect

For a quick summary, the following is the information needed when debugging a particular type of core file.

MFS / NFS:

  • Core file backtrace from gdb - /tmp/gdb.out above
  • Core file generated by MFS / NFS
  • MFS or NFS server binary - /opt/mapr/server/mfs or /opt/mapr/server/nfsserver respectively
  • Support-dump generated by /opt/mapr/support/tools/mapr-support-dump.sh

Java:

  • Core file backtrace from gdb
  • Core file generated by Java process
  • Java binary
  • Java error log associated with core file - Typically mapreduce error log or Hotspot error log
  • Support-dump generated by /opt/mapr/support/tools/mapr-support-dump.sh

 

Collect the necessary diagnostics for each core file and contact MapR Support at support@mapr.com or using the Support Portal to diagnose each core further. 

 

 

 
