Task Tracker is going down abruptly

Question asked by communityadmin on Apr 15, 2012
Latest reply on Apr 23, 2012 by aaron
The cluster was running fine, and then all of a sudden every task tracker went down.

Task Tracker Log:
<pre>
2012-04-16 06:36:43,602 ERROR org.apache.hadoop.mapred.TaskTracker: Failed to send heartbeat to jobTracker. java.io.IOException: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException
        at org.apache.hadoop.mapred.SchedulingAlgorithms$FairShareComparator.compare(SchedulingAlgorithms.java:95)
        at org.apache.hadoop.mapred.SchedulingAlgorithms$FairShareComparator.compare(SchedulingAlgorithms.java:68)
        at java.util.Arrays.mergeSort(Arrays.java:1270)
        at java.util.Arrays.sort(Arrays.java:1210)
        at java.util.Collections.sort(Collections.java:159)
        at org.apache.hadoop.mapred.FairScheduler.assignTasks(FairScheduler.java:629)
        at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:3763)
        at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:964)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1318)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1314)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1109)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1312)
. Exiting...
2012-04-16 06:36:43,607 INFO org.apache.hadoop.util.AsyncDiskService: Shutting down all AsyncDiskService threads...
2012-04-16 06:36:43,608 INFO org.apache.hadoop.util.AsyncDiskService: All AsyncDiskService threads are terminated.
2012-04-16 06:36:43,608 INFO org.apache.hadoop.util.MRAsyncDiskService: Deleting toBeDeleted directory.
2012-04-16 06:36:43,609 INFO org.apache.hadoop.mapred.TaskTracker: Shutting down: Map-events fetcher for all reduce tasks on tracker_nia-dev15.eng.narus.com:localhost.localdomain/127.0.0.1:46400
2012-04-16 06:36:43,712 INFO org.apache.hadoop.ipc.Server: Stopping server on 46400
2012-04-16 06:36:43,712 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 46400: exiting
2012-04-16 06:36:43,712 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 46400: exiting
.....
.....
2012-04-16 06:36:43,716 INFO org.apache.hadoop.ipc.Server: IPC Server handler 20 on 46400: exiting
2012-04-16 06:36:43,716 INFO org.apache.hadoop.ipc.Server: IPC Server handler 32 on 46400: exiting
2012-04-16 06:36:43,716 INFO org.apache.hadoop.ipc.Server: IPC Server handler 29 on 46400: exiting
2012-04-16 06:36:43,820 INFO org.apache.hadoop.mapred.TaskTracker: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down TaskTracker at nia-dev15/172.31.3.158
************************************************************/
</pre>
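For anyone reading the trace above: the NullPointerException is raised inside a comparator while Collections.sort is running, on the JobTracker side, and it comes back to the TaskTracker as a RemoteException on the heartbeat call. Below is a minimal sketch of that failure pattern only. The class Job, its runningTasks field, and the comparator are hypothetical stand-ins and not the actual Hadoop FairScheduler code. It assumes the comparator dereferences a value that can be null.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class ComparatorNpeSketch {
    // Hypothetical stand-in for a schedulable job; NOT the real Hadoop class.
    static final class Job {
        final String name;
        final Integer runningTasks; // assumption: may be null if state is not initialised

        Job(String name, Integer runningTasks) {
            this.name = name;
            this.runningTasks = runningTasks;
        }
    }

    // Dereferences runningTasks with no null check, so a null value
    // makes compare() throw from inside Collections.sort().
    static final Comparator<Job> UNSAFE =
            (a, b) -> a.runningTasks.compareTo(b.runningTasks);

    // Returns true if sorting the given jobs with UNSAFE throws NullPointerException.
    static boolean sortThrowsNpe(List<Job> jobs) {
        try {
            Collections.sort(jobs, UNSAFE);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<Job> jobs = new ArrayList<>();
        jobs.add(new Job("a", 3));
        jobs.add(new Job("b", null)); // the entry with missing state
        System.out.println("NPE during sort: " + sortThrowsNpe(jobs));
    }
}
```

In this sketch a single entry with missing state is enough to make the whole sort fail. Whether that is what happens at SchedulingAlgorithms.java:95 would have to be confirmed against the source for the Hadoop version in use.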
Job Tracker Log:

<code>
2012-04-16 06:36:43,603 INFO org.apache.hadoop.mapred.JobTracker: Creating a recovery entry for tasktracker: nia-dev15.eng.narus.com
2012-04-16 06:36:43,610 INFO org.apache.hadoop.mapred.JobTracker: Adding tracker tracker_nia-dev15.eng.narus.com:localhost.localdomain/127.0.0.1:46400 to host nia-dev15.eng.narus.com
2012-04-16 06:36:43,611 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 9001, call heartbeat(org.apache.hadoop.mapred.TaskTrackerStatus@79f1d448, true, true, true, -1) from 172.31.3.158:41371: error: java.io.IOException: java.lang.NullPointerException
java.io.IOException: java.lang.NullPointerException
        at org.apache.hadoop.mapred.SchedulingAlgorithms$FairShareComparator.compare(SchedulingAlgorithms.java:95)
        at org.apache.hadoop.mapred.SchedulingAlgorithms$FairShareComparator.compare(SchedulingAlgorithms.java:68)
        at java.util.Arrays.mergeSort(Arrays.java:1270)
        at java.util.Arrays.sort(Arrays.java:1210)
        at java.util.Collections.sort(Collections.java:159)
        at org.apache.hadoop.mapred.FairScheduler.assignTasks(FairScheduler.java:629)
        at org.apache.hadoop.mapred.JobTracker.heartbeat(JobTracker.java:3763)
        at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:964)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1318)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1314)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1109)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1312)
</code>
