MapReduce tasks fail due to missing directory - "failed to initialize user directory"

Document created by wade on Feb 27, 2016

Original Publication Date: July 22, 2014

 

Version tested:

# cat /opt/mapr/MapRBuildVersion

3.0.3.25439.GA

 

This issue affects all available MapR releases.

 

14/07/22 18:13:22 INFO mapred.JobClient: Task Id : attempt_201407081757_0009_r_000000_2, Status : FAILED on node rh-2

Error initializing attempt_201407081757_0009_r_000000_2 java.io.IOException: Job initialization failed (20). with output: Reading task controller config from /opt/mapr/hadoop/hadoop-0.20.2/conf/taskcontroller.cfg

number of groups = 1

main : command provided 0

main : user is root

Failed to create directory /tmp/mapr-hadoop/mapred/local/taskTracker/root - No such file or directory
failed to initialize user directory

 

            at org.apache.hadoop.mapred.LinuxTaskController.initializeJob(LinuxTaskController.java:195)

            at org.apache.hadoop.mapred.TaskTracker$4.run(TaskTracker.java:1632)

            at java.security.AccessController.doPrivileged(Native Method)

            at javax.security.auth.Subject.doAs(Subject.java:415)

            at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)

            at org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1608)

            at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1493)

            at org.apache.hadoop.mapred.TaskTracker$6.run(TaskTracker.java:3882)

Caused by: org.apache.hadoop.util.Shell$ExitCodeException:

            at org.apache.hadoop.util.Shell.runCommand(Shell.java:322)

            at org.apache.hadoop.util.Shell.run(Shell.java:249)

            at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:442)

            at org.apache.hadoop.mapred.LinuxTaskController.initializeJob(LinuxTaskController.java:188)

            ... 7 more

 

14/07/22 18:13:22 WARN mapred.JobClient: Error reading task outputhttp://rh-2:50060/tasklog?plaintext=true&attemptid=attempt_201407081757_0009_r_000000_2&filter=stdout

14/07/22 18:13:22 WARN mapred.JobClient: Error reading task outputhttp://rh-2:50060/tasklog?plaintext=true&attemptid=attempt_201407081757_0009_r_000000_2&filter=stderr

14/07/22 18:13:23 INFO mapred.JobClient: Job job_201407081757_0009 failed with state FAILED due to: NA

14/07/22 18:13:23 INFO mapred.JobClient: Counters: 6

14/07/22 18:13:23 INFO mapred.JobClient:   Job Counters

14/07/22 18:13:23 INFO mapred.JobClient:     Aggregate execution time of mappers(ms)=3817

14/07/22 18:13:23 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0

14/07/22 18:13:23 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0

14/07/22 18:13:23 INFO mapred.JobClient:     Launched map tasks=7

14/07/22 18:13:23 INFO mapred.JobClient:     Failed map tasks=1

14/07/22 18:13:23 INFO mapred.JobClient:     Aggregate execution time of reducers(ms)=0

 

 

After the TaskTracker starts, it creates the following directory structure on each node:

 

 

# for i in `cat /root/nodelist`; do echo $i ; ssh $i ls -ls /tmp/mapr-hadoop/mapred/local;  done

rh-1

total 16

4 drwxr-xr-x. 2 mapr mapr 4096 Jul 22 19:01 taskTracker   <<<<

4 drwxr-xr-x. 2 mapr mapr 4096 Jul 22 19:01 toBeDeleted

4 drwxr-xr-x. 2 mapr mapr 4096 Jul 22 19:01 tt_log_tmp

4 drwx------. 2 mapr mapr 4096 Jul 22 19:01 ttprivate

rh-2

total 16

4 drwxr-xr-x. 2 mapr mapr 4096 Jul 22 19:01 taskTracker  <<<<

4 drwxr-xr-x. 2 mapr mapr 4096 Jul 22 19:01 toBeDeleted

4 drwxr-xr-x. 2 mapr mapr 4096 Jul 22 19:01 tt_log_tmp

4 drwx------. 2 mapr mapr 4096 Jul 22 19:01 ttprivate

rh-3

total 16

4 drwxr-xr-x. 2 mapr mapr 4096 Jul 22 19:01 taskTracker  <<<<

4 drwxr-xr-x. 2 mapr mapr 4096 Jul 22 19:00 toBeDeleted

4 drwxr-xr-x. 2 mapr mapr 4096 Jul 22 19:01 tt_log_tmp

4 drwx------. 2 mapr mapr 4096 Jul 22 19:01 ttprivate

rh-4

total 16

4 drwxr-xr-x. 2 mapr mapr 4096 Jul 22 19:01 taskTracker  <<<<

4 drwxr-xr-x. 2 mapr mapr 4096 Jul 22 19:01 toBeDeleted

4 drwxr-xr-x. 2 mapr mapr 4096 Jul 22 19:01 tt_log_tmp

4 drwx------. 2 mapr mapr 4096 Jul 22 19:01 ttprivate

rh-5

total 16

4 drwxr-xr-x. 2 mapr mapr 4096 Jul 22 19:01 taskTracker  <<<<

4 drwxr-xr-x. 2 mapr mapr 4096 Jul 22 19:01 toBeDeleted

4 drwxr-xr-x. 2 mapr mapr 4096 Jul 22 19:01 tt_log_tmp

4 drwx------. 2 mapr mapr 4096 Jul 22 19:01 ttprivate
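
The listing above suggests a simple health check: confirm that the four directories the TaskTracker creates are present. The following is a minimal sketch, not a MapR-supplied tool. The function name check_tt_dirs is hypothetical, and the default path is the one shown in this article.

```shell
#!/bin/sh
# check_tt_dirs (hypothetical helper): report which of the directories
# that the TaskTracker creates at startup are missing.
# $1 = local directory to inspect; defaults to the path used in this article.
check_tt_dirs() {
    local_dir="${1:-/tmp/mapr-hadoop/mapred/local}"
    missing=0
    for d in taskTracker toBeDeleted tt_log_tmp ttprivate; do
        if [ ! -d "$local_dir/$d" ]; then
            echo "MISSING: $local_dir/$d"
            missing=1
        fi
    done
    # Exit status 0 means every directory is present, 1 means at least one is missing.
    return "$missing"
}
```

To check only the taskTracker directory across nodes, in the same style as the loop above, something like the following should work (an unreachable node is also reported, because ssh then returns a non-zero status):

for i in $(cat /root/nodelist); do ssh "$i" test -d /tmp/mapr-hadoop/mapred/local/taskTracker || echo "$i: taskTracker missing"; done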

 

 

 

If the “taskTracker” directory is deleted on a node for any reason, tasks assigned to that node fail:

 

(Delete TaskTracker Directory)

 

# for i in `cat /root/nodelist`; do echo $i ; ssh $i rm -fr /tmp/mapr-hadoop/mapred/local/taskTracker/;  done

rh-1

rh-2

rh-3

rh-4

rh-5

 

# for i in `cat /root/nodelist`; do echo $i ; ssh $i ls -ls /tmp/mapr-hadoop/mapred/local;  done

rh-1

total 12

4 drwxr-xr-x. 2 mapr mapr 4096 Jul 22 19:01 toBeDeleted

4 drwxr-xr-x. 2 mapr mapr 4096 Jul 22 19:01 tt_log_tmp

4 drwx------. 2 mapr mapr 4096 Jul 22 19:01 ttprivate

rh-2

total 12

4 drwxr-xr-x. 2 mapr mapr 4096 Jul 22 19:01 toBeDeleted

4 drwxr-xr-x. 2 mapr mapr 4096 Jul 22 19:01 tt_log_tmp

4 drwx------. 3 mapr mapr 4096 Jul 22 19:04 ttprivate

rh-3

total 16

4 drwxr-xr-x. 2 mapr mapr 4096 Jul 22 19:04 jobTracker

4 drwxr-xr-x. 2 mapr mapr 4096 Jul 22 19:00 toBeDeleted

4 drwxr-xr-x. 2 mapr mapr 4096 Jul 22 19:01 tt_log_tmp

4 drwx------. 3 mapr mapr 4096 Jul 22 19:04 ttprivate

rh-4

total 12

4 drwxr-xr-x. 2 mapr mapr 4096 Jul 22 19:01 toBeDeleted

4 drwxr-xr-x. 2 mapr mapr 4096 Jul 22 19:01 tt_log_tmp

4 drwx------. 2 mapr mapr 4096 Jul 22 19:01 ttprivate

rh-5

total 12

4 drwxr-xr-x. 2 mapr mapr 4096 Jul 22 19:01 toBeDeleted

4 drwxr-xr-x. 2 mapr mapr 4096 Jul 22 19:01 tt_log_tmp

4 drwx------. 3 mapr mapr 4096 Jul 22 19:04 ttprivate

 

 

 

# hadoop jar hadoop-0.20.2-dev-examples.jar teragen 10000 /myvolume/teraout

14/07/22 19:05:51 INFO fs.JobTrackerWatcher: Current running JobTracker is: rh-3/10.1.0.72:9001

14/07/22 19:05:51 INFO terasort.TeraSort: Generating 10000 using 2

14/07/22 19:05:51 INFO mapred.JobClient: Creating job's output directory at /myvolume/teraout

14/07/22 19:05:51 INFO mapred.JobClient: Creating job's user history location directory at /myvolume/teraout/_logs

14/07/22 19:05:51 INFO mapred.JobClient: Running job: job_201407221900_0002

14/07/22 19:05:52 INFO mapred.JobClient:  map 0% reduce 0%

14/07/22 19:05:52 INFO mapred.JobClient: Task Id : attempt_201407221900_0002_m_000003_0, Status : FAILED on node rh-4

Error initializing attempt_201407221900_0002_m_000003_0 java.io.IOException: Job initialization failed (20). with output: Reading task controller config from /opt/mapr/hadoop/hadoop-0.20.2/conf/taskcontroller.cfg

number of groups = 1

main : command provided 0

main : user is root

 

Failed to create directory /tmp/mapr-hadoop/mapred/local/taskTracker/root - No such file or directory

failed to initialize user directory

 

            at org.apache.hadoop.mapred.LinuxTaskController.initializeJob(LinuxTaskController.java:195)

            at org.apache.hadoop.mapred.TaskTracker$4.run(TaskTracker.java:1632)

            at java.security.AccessController.doPrivileged(Native Method)

            at javax.security.auth.Subject.doAs(Subject.java:415)

            at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)

            at org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1608)

            at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1493)

            at org.apache.hadoop.mapred.TaskTracker$6.run(TaskTracker.java:3882)

Caused by: org.apache.hadoop.util.Shell$ExitCodeException:

            at org.apache.hadoop.util.Shell.runCommand(Shell.java:322)

            at org.apache.hadoop.util.Shell.run(Shell.java:249)

            at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:442)

            at org.apache.hadoop.mapred.LinuxTaskController.initializeJob(LinuxTaskController.java:188)

            ... 7 more

 

14/07/22 19:05:52 WARN mapred.JobClient: Error reading task outputhttp://rh-4:50060/tasklog?plaintext=true&attemptid=attempt_201407221900_0002_m_000003_0&filter=stdout

14/07/22 19:05:52 WARN mapred.JobClient: Error reading task outputhttp://rh-4:50060/tasklog?plaintext=true&attemptid=attempt_201407221900_0002_m_000003_0&filter=stderr

14/07/22 19:05:53 INFO mapred.JobClient: Task Id : attempt_201407221900_0002_r_000001_0, Status : FAILED on node rh-4

Error initializing attempt_201407221900_0002_r_000001_0 java.io.IOException: Job initialization failed (20). with output: Reading task controller config from /opt/mapr/hadoop/hadoop-0.20.2/conf/taskcontroller.cfg

number of groups = 1

main : command provided 0

main : user is root

 

Failed to create directory /tmp/mapr-hadoop/mapred/local/taskTracker/root - No such file or directory
failed to initialize user directory

 

            at org.apache.hadoop.mapred.LinuxTaskController.initializeJob(LinuxTaskController.java:195)

            at org.apache.hadoop.mapred.TaskTracker$4.run(TaskTracker.java:1632)

            at java.security.AccessController.doPrivileged(Native Method)

            at javax.security.auth.Subject.doAs(Subject.java:415)

            at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)

            at org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1608)

            at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1493)

            at org.apache.hadoop.mapred.TaskTracker$6.run(TaskTracker.java:3882)

Caused by: org.apache.hadoop.util.Shell$ExitCodeException:

            at org.apache.hadoop.util.Shell.runCommand(Shell.java:322)

            at org.apache.hadoop.util.Shell.run(Shell.java:249)

            at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:442)

            at org.apache.hadoop.mapred.LinuxTaskController.initializeJob(LinuxTaskController.java:188)

 

 

A common reason for the directory being removed is the daily tmpwatch cron job, which cleans up anything under /tmp that is older than 10 days:

 

# tail /etc/cron.daily/tmpwatch

flags=-umc

/usr/sbin/tmpwatch "$flags" -x /tmp/.X11-unix -x /tmp/.XIM-unix \

            -x /tmp/.font-unix -x /tmp/.ICE-unix -x /tmp/.Test-unix \

            -X '/tmp/hsperfdata_*' 10d /tmp

/usr/sbin/tmpwatch "$flags" 30d /var/tmp

for d in /var/{cache/man,catman}/{cat?,X11R6/cat?,local/cat?}; do

    if [ -d "$d" ]; then

            /usr/sbin/tmpwatch "$flags" -f 30d "$d"

    fi

done

 

 

Solution:

1. If the directory was removed by the daily tmpwatch cron job:

 

Add an exclusion covering /tmp/mapr-hadoop/mapred/local/taskTracker/ to the tmpwatch daily cron job, then restart the TaskTracker service so that it recreates the directory.

For example:

 

/usr/sbin/tmpwatch "$flags" -x /tmp/.X11-unix -x /tmp/.XIM-unix \
            -x /tmp/.font-unix -x /tmp/.ICE-unix -x /tmp/.Test-unix \
            -X '/tmp/hsperfdata_*' -X '/tmp/mapr-hadoop/*' 10d /tmp
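
After editing the cron script, it can help to confirm that an exclusion for the MapR directory tree is actually present. A minimal sketch follows. The name has_mapr_exclusion is a hypothetical helper, and the pattern only looks for a -x or -X option followed by a path beginning with /tmp/mapr-hadoop, so it does not validate the rest of the tmpwatch command line.

```shell
#!/bin/sh
# has_mapr_exclusion (hypothetical helper): succeed if the given tmpwatch
# cron script contains a -x or -X option whose argument starts with
# /tmp/mapr-hadoop (optionally preceded by a single quote).
# $1 = cron script to inspect; defaults to the file shown in this article.
has_mapr_exclusion() {
    cron_file="${1:-/etc/cron.daily/tmpwatch}"
    # "--" stops option parsing, because the pattern itself begins with "-".
    grep -Eq -- "-[xX][[:space:]]+'?/tmp/mapr-hadoop" "$cron_file"
}
```

Typical use would be: has_mapr_exclusion && echo "exclusion present" || echo "exclusion missing". For a dry run of the cleanup itself, the tmpwatch man page on your system should describe a test mode that lists what would be removed without deleting anything.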

       

 

2. If the directory was accidentally deleted:

  • Restart the TaskTracker service on the affected data node. When the TaskTracker restarts, it recreates the directory structure under /tmp/.
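
Once the restart has been issued, a short polling loop can confirm that the directory has come back before jobs are resubmitted. This is a sketch under stated assumptions: wait_for_dir is a hypothetical helper, and the 60-second default timeout is arbitrary rather than a documented TaskTracker startup time.

```shell
#!/bin/sh
# wait_for_dir (hypothetical helper): poll once per second until a
# directory exists or the timeout expires.
# $1 = directory to wait for
# $2 = timeout in seconds (default 60, an arbitrary choice)
wait_for_dir() {
    dir="$1"
    timeout="${2:-60}"
    elapsed=0
    while [ ! -d "$dir" ]; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1    # gave up: directory never appeared
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0            # directory is present
}
```

On the affected node this might be used as: wait_for_dir /tmp/mapr-hadoop/mapred/local/taskTracker 120 || echo "taskTracker directory still missing". The restart itself is usually issued with something like maprcli node services -tasktracker restart -nodes rh-4; treat that command line as an assumption and check it against the maprcli reference for your release.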

 
