MCS shows YARN services down while they are running

Document created by najmuddin_chirammal on Feb 7, 2016

Author: Najmuddin Chirammal

Original Publication Date: June 26, 2015

Issue

The MapR Control System (MCS) UI shows the NodeManager, ResourceManager, and HistoryServer services as down even though they are running on their respective nodes.

Environment

  • MapR cluster versions 4.0.1, 4.0.2, and 4.1.0
  • YARN services in use (NodeManager, ResourceManager, HistoryServer)

Resolution

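  • On each node, copy the current NodeManager, HistoryServer, and ResourceManager PID files to the /opt/mapr/pid directory. cp reports an error for any of these files that does not exist on that node and still copies the others.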
# cp -p /tmp/yarn-mapr-nodemanager.pid /tmp/yarn-mapr-resourcemanager.pid /tmp/mapred-mapr-historyserver.pid /opt/mapr/pid/

If the -p option is not used, make sure each destination file has the same permissions and ownership as its source (the files should be owned by MAPR_USER).
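The copy-and-check step above can be sketched as a small shell function. This is a minimal sketch, not a MapR-provided tool: the function names file_attrs and copy_pid_files are hypothetical, and it assumes GNU coreutils (for stat -c) and the default /tmp pid file names shown in the command above. It skips pid files that do not exist on the node and then compares owner, group, and mode of each copy against its source.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; assumes GNU coreutils (stat -c) and the default
# /tmp pid file names used in this article.

# file_attrs FILE: print "owner:group mode", e.g. "mapr:mapr 644".
file_attrs() {
  stat -c '%U:%G %a' "$1"
}

# copy_pid_files SRC_DIR DEST_DIR: copy each known pid file found in SRC_DIR
# to DEST_DIR with cp -p, then compare owner, group and mode with the source.
# Returns non-zero if a copy fails or any attribute differs.
copy_pid_files() {
  local src_dir=$1 dest_dir=$2 rc=0 f
  for f in yarn-mapr-nodemanager.pid yarn-mapr-resourcemanager.pid \
           mapred-mapr-historyserver.pid; do
    # Not every node runs every service, so skip pid files that are absent.
    [ -f "$src_dir/$f" ] || continue
    if ! cp -p "$src_dir/$f" "$dest_dir/"; then
      rc=1
      continue
    fi
    if [ "$(file_attrs "$src_dir/$f")" != "$(file_attrs "$dest_dir/$f")" ]; then
      echo "attribute mismatch: $f" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

On a node, after sourcing the function as root, it would be called as `copy_pid_files /tmp /opt/mapr/pid`; a non-zero return or an "attribute mismatch" message means the ownership or permissions need to be fixed by hand.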

  • Update the PID file location

Add or modify the following environment variables in /opt/mapr/conf/env.sh:

export YARN_PID_DIR="${MAPR_HOME}/pid" 
export HADOOP_MAPRED_PID_DIR="${MAPR_HOME}/pid"
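Adding these two lines can be made safe to re-run with a small idempotent helper. This is a sketch under assumptions: ensure_line and set_pid_dirs are hypothetical names, and it assumes env.sh is sourced as a shell script, so an export appended at the end overrides any earlier assignment of the same variable. Single quotes keep ${MAPR_HOME} literal so that it is expanded when env.sh is loaded, not when the line is written.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for appending the PID-directory exports to env.sh.

# ensure_line FILE LINE: append LINE to FILE unless an identical line exists.
ensure_line() {
  local file=$1 line=$2
  # -q quiet, -x whole-line match, -F treat LINE as a fixed string.
  grep -qxF -- "$line" "$file" 2>/dev/null && return 0
  # If FILE does not end in a newline, add one so LINE starts on its own line.
  if [ -s "$file" ] && [ -n "$(tail -c1 "$file")" ]; then
    printf '\n' >> "$file"
  fi
  printf '%s\n' "$line" >> "$file"
}

# set_pid_dirs [ENV_SH]: add both exports; defaults to the MapR env.sh path.
set_pid_dirs() {
  local env_sh=${1:-/opt/mapr/conf/env.sh}
  ensure_line "$env_sh" 'export YARN_PID_DIR="${MAPR_HOME}/pid"'
  ensure_line "$env_sh" 'export HADOOP_MAPRED_PID_DIR="${MAPR_HOME}/pid"'
}
```

Running `set_pid_dirs` twice leaves exactly one copy of each export in the file.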

  • Run the following commands as MAPR_USER (mapr by default) on the respective nodes to confirm that the status is reported correctly.
su - mapr -c '/opt/mapr/hadoop/hadoop-2.4.1/sbin/mr-jobhistory-daemon.sh status historyserver' 
su - mapr -c '/opt/mapr/hadoop/hadoop-2.4.1/sbin/yarn-daemon.sh status nodemanager'
su - mapr -c '/opt/mapr/hadoop/hadoop-2.4.1/sbin/yarn-daemon.sh status resourcemanager'

Root Cause

The status of the YARN services is determined by reading the PID from the pid file that each service creates and checking the status of that process. If a pid file is removed, the check raises a false alarm and the service is listed as down even though it is still running. Because the YARN services store their pid files under /tmp by default, the issue is most often triggered by tmpwatch, which deletes files from /tmp according to the options passed to it (many Linux distributions enable a tmpwatch cron job by default). Moving the pid files to the /opt/mapr/pid directory resolves the issue.
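The mechanism described above can be illustrated with a generic pid-file check. This is a minimal sketch of the common `kill -0` pattern, not MapR's actual status implementation; pid_status and find_tmpwatch_jobs are hypothetical names, and the cron locations searched by default (/etc/cron.d, /etc/cron.daily, /etc/crontab) are an assumption about a typical Linux layout. The "no pid file" case is the false alarm from this article: the daemon may still be running, but nothing records its PID.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating a pid-file liveness check.

# pid_status PIDFILE: print "running", "stale pid file" or "no pid file".
# kill -0 sends no signal; it only tests that the PID exists and may be
# signalled, so as a non-root user it also fails for other users' processes.
# Run it as MAPR_USER or root.
pid_status() {
  local pidfile=$1 pid
  if [ ! -f "$pidfile" ]; then
    echo "no pid file"
    return 2
  fi
  pid=$(cat "$pidfile")
  if kill -0 "$pid" 2>/dev/null; then
    echo "running"
    return 0
  fi
  echo "stale pid file"
  return 1
}

# find_tmpwatch_jobs [PATH...]: list files that mention tmpwatch.
find_tmpwatch_jobs() {
  local paths=("$@")
  [ "${#paths[@]}" -gt 0 ] || paths=(/etc/cron.d /etc/cron.daily /etc/crontab)
  # -r recurse into directories, -l print matching file names, -s hide errors.
  grep -rls -- tmpwatch "${paths[@]}"
}
```

For example, `pid_status /opt/mapr/pid/yarn-mapr-nodemanager.pid` reports whether the copied pid file still points at a live process, and `find_tmpwatch_jobs` shows whether a tmpwatch cron job is present on the node.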
