Tasktracker getting hung up on missing attempt

Question asked by stormcrow on Sep 17, 2013
Latest reply on Feb 28, 2014 by Ted Dunning
Our task trackers sometimes lose track of a task attempt and then get stuck on it. A tracker starts processing an attempt, then complains endlessly that the attempt is unknown. This rapidly fills up the log files and leads to real problems. It is not limited to one node: it happens on all nodes, seemingly at random. Is this a known issue of some kind? We're running v. 2.1.2.18401.GA.

    2013-09-08 03:06:28,481 INFO org.apache.hadoop.mapred.TaskTracker: Setting pid 24517 for jvm jvm_201309030534_30083_r_-1520388758
    2013-09-08 03:06:28,483 INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_201309030534_30083_r_-1520388758 given task: attempt_201309030534_30083_r_000078_0


    2013-09-08 03:06:35,096 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201309030534_30083_r_000078_0 0.11692308% reduce > copy (114 of 325 at 0.08 MB/s) >

Now the following repeats forever:

    2013-09-08 03:06:37,358 INFO org.apache.hadoop.ipc.Server: IPC Server handler 23 on 36490, call getMapCompletionEvents(job_201309030534_30083, 325, 10000, attempt_201309030534_30083_r_000078_0, org.apache.hadoop.mapred.JvmContext@77d86eb2) from 127.0.0.1:41135: error: java.io.IOException: Unknown task; attempt_201309030534_30083_r_000078_0. Ignoring getMapCompletionEvents Request
    java.io.IOException: Unknown task; attempt_201309030534_30083_r_000078_0. Ignoring getMapCompletionEvents Request
            at org.apache.hadoop.mapred.TaskTracker.getMapCompletionEvents(TaskTracker.java:4982)
            at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
            at java.lang.reflect.Method.invoke(Method.java:597)
            at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:993)
            at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1326)
            at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1322)
            at java.security.AccessController.doPrivileged(Native Method)
            at javax.security.auth.Subject.doAs(Subject.java:396)
            at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
            at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1320)
    2013-09-08 03:06:37,362 WARN org.apache.hadoop.mapred.TaskTracker: Unknown child task fatalError: attempt_201309030534_30083_r_000078_0. Ignored.
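
To gauge how many attempts are affected on a given node, a small script can tally the repeating error per attempt ID. This is only a rough sketch: it assumes the log lines look exactly like the excerpt above, and the function names and the log path in the usage note are placeholders, not anything shipped with Hadoop or MapR. It counts only timestamped log records, so the stack-trace line that repeats the same message is not counted twice.

```python
import re
from collections import Counter

# Assumed line shape, taken from the excerpt above: a timestamp, then
# somewhere later "Unknown task; attempt_<jobstart>_<job>_<m|r>_<task>_<try>".
# Anchoring on the timestamp skips the stack-trace line that echoes the message.
UNKNOWN_TASK = re.compile(
    r"\s*\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} "
    r".*Unknown task; (attempt_\d+_\d+_[mr]_\d+_\d+)"
)


def count_unknown_attempts(lines):
    """Return a Counter mapping attempt ID -> number of 'Unknown task' records."""
    counts = Counter()
    for line in lines:
        match = UNKNOWN_TASK.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def count_in_file(path):
    """Convenience wrapper: tally 'Unknown task' records in one log file."""
    with open(path, errors="replace") as handle:
        return count_unknown_attempts(handle)
```

For example, `count_in_file("/path/to/tasktracker.log").most_common(10)` (with the path replaced by wherever the TaskTracker log lives on the node) lists the attempts responsible for most of the noise.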
