
hadoop pid directory - too many files?

Question asked by jsumali on May 27, 2014
Latest reply on Dec 10, 2014 by nabeel
We are running MapR 3.1.0.23703.GA.

The directory /opt/mapr/hadoop/hadoop-0.20.2/pids gets a pid file for every JVM launched (e.g. MapReduce tasks), but the files are not cleaned up after the JVMs exit.

When we restart or stop warden, it tries to stop the tasktracker with this code in /opt/mapr/hadoop/hadoop-0.20.2/bin/hadoop-daemon.sh:

      (stop)
        if [ "$command" = "tasktracker" ]; then
          # kill all tasks
          TASKS=`find $HADOOP_PID_DIR -name "jvm*.pid" -exec cat {} \;`
          for task in $TASKS ; do
            echo "Killing process $task "
            kill -9 $task
          done
          rm -f $HADOOP_PID_DIR/jvm*.pid > /dev/null 2>&1;
          rm -f $HADOOP_PID_DIR/.jvm*.pid.crc > /dev/null 2>&1;

Currently our pid directory has around 545k files (including the .crc files), so the TASKS string becomes enormous, and the `rm` statements fail because the expanded globs exceed the kernel's argument-length limit ("Argument list too long").
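
For reference, here is a rough sketch of how that stop block could be made robust against a large pids directory by streaming pids out of find and letting find unlink the files itself (same paths and variables as in hadoop-daemon.sh above; this is only an illustration, not the script MapR ships):

      (stop)
        if [ "$command" = "tasktracker" ]; then
          # kill tasks one pid file at a time instead of building one huge string
          find "$HADOOP_PID_DIR" -name "jvm*.pid" | while read -r pidfile; do
            task=$(cat "$pidfile")
            echo "Killing process $task"
            [ -n "$task" ] && kill -9 "$task"
          done
          # let find unlink the files itself: no shell glob, so the file count
          # never hits the kernel's argument-length limit
          find "$HADOOP_PID_DIR" -name "jvm*.pid" -delete
          find "$HADOOP_PID_DIR" -name ".jvm*.pid.crc" -delete
        fi

Since nothing is expanded onto a single command line, the number of pid files no longer matters, but that still leaves the directory growing between restarts.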

Is there a better way to do this without having to keep that many pid files around?
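
Would something like the following, run periodically from cron, be a reasonable stopgap for pruning stale pid files in the meantime? (Just a sketch; kill -0 is only used to check whether the pid is still alive.)

    #!/bin/bash
    # prune pid files whose JVM has already exited
    # (run as root or the mapr user so kill -0 can see the task JVMs)
    PID_DIR=/opt/mapr/hadoop/hadoop-0.20.2/pids
    find "$PID_DIR" -name "jvm*.pid" | while read -r pidfile; do
      pid=$(cat "$pidfile" 2>/dev/null)
      # kill -0 sends no signal; it only reports whether the pid still exists
      if [ -n "$pid" ] && ! kill -0 "$pid" 2>/dev/null; then
        rm -f "$pidfile" "$PID_DIR/.$(basename "$pidfile").crc"
      fi
    done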
