
Hanging Reducers in 2.1.2 rpc err

Question asked by jerdavis on Mar 22, 2013
Latest reply on Mar 31, 2013 by nabeel
Since upgrading to 2.1.2, I've been getting reducer tasks that periodically hang for no apparent reason. Has anyone else experienced this, and are there any debugging steps I can take?

Here is something suspicious from stderr (lots of these over about 10 mills):
<pre>
2013-03-22 00:05:04,9544 ERROR Client fs/client/fileclient/cc/client.cc:3439 Thread: 139852444841728 rpc err Connection timed out(110) 28.21 to 10.100.0.114:5660, fid 3512.37.1053351480, upd 0, failed err 249889867872
2013-03-22 00:05:05,9554 ERROR Client fs/client/fileclient/cc/client.cc:3439 Thread: 139852439578368 rpc err Connection timed out(110) 28.21 to 10.100.0.114:5660, fid 3512.37.1053351480, upd 0, failed err -25200
</pre>
Can anyone tell me what those error codes are?
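Partial answer to my own question: on Linux, errno 110 is `ETIMEDOUT` ("Connection timed out"), which matches the message text, so the `(110)` part looks like an ordinary socket timeout talking to 10.100.0.114:5660. I don't know what the `failed err` values encode, and they don't look like standard errno numbers. As a rough triage step, a sketch like this tallies the rpc errors by errno and destination so you can see whether one server dominates. The regex is a hypothetical pattern fitted only to the two lines quoted above, not a documented MapR log format.

```python
import errno
import re
from collections import Counter

# Hypothetical pattern fitted to the two "rpc err" lines quoted above;
# it is not a documented MapR log format.
RPC_ERR = re.compile(
    r"rpc err (?P<msg>.+?)\((?P<errno>\d+)\).*?"
    r"to (?P<host>[\d.]+):(?P<port>\d+), fid (?P<fid>[^,\s]+)"
)


def parse_rpc_err(line):
    """Return (errno, errno name, 'host:port', fid) for an rpc err line, else None."""
    m = RPC_ERR.search(line)
    if m is None:
        return None
    code = int(m.group("errno"))
    # errno.errorcode uses the numbering of the platform Python runs on;
    # on Linux, 110 maps to 'ETIMEDOUT'.
    name = errno.errorcode.get(code, "UNKNOWN")
    target = f"{m.group('host')}:{m.group('port')}"
    return code, name, target, m.group("fid")


def count_by_target(lines):
    """Tally rpc errors by (errno name, destination); other lines are ignored."""
    counts = Counter()
    for line in lines:
        parsed = parse_rpc_err(line)
        if parsed is not None:
            _, name, target, _ = parsed
            counts[(name, target)] += 1
    return counts
```

For example, `count_by_target(open("stderr")).most_common()` lists the busiest (errno, destination) pairs first; if nearly all of them point at one fileserver, that node's logs would be the next place to look.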

Here is an example syslog from a reducer that was stuck for 7 hours. After I failed the task, another reducer completed it in under 5 minutes.
<pre>
2013-03-22 00:03:22,920 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
2013-03-22 00:03:22,926 INFO org.apache.hadoop.security.JniBasedUnixGroupsMapping: Using JniBasedUnixGroupsMapping for Group resolution
2013-03-22 00:03:23,063 INFO org.apache.hadoop.mapred.Child: JVM: jvm_201303191245_1954_r_1749316548 pid: 21708
2013-03-22 00:03:23,363 INFO org.apache.hadoop.mapred.Child: Starting task attempt_201303191245_1954_r_000525_0
2013-03-22 00:03:23,364 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=SHUFFLE, sessionId=
2013-03-22 00:03:23,447 INFO org.apache.hadoop.mapreduce.util.ProcessTree: setsid exited with exit code 0
2013-03-22 00:03:23,464 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: /proc/<pid>/status does not have information about swap space used(VmSwap). Can not track swap usage of a task.
2013-03-22 00:03:23,465 INFO org.apache.hadoop.mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.mapreduce.util.LinuxResourceCalculatorPlugin@24c0f1ec
2013-03-22 00:03:23,629 INFO com.hadoop.compression.lzo.GPLNativeCodeLoader: Loaded native gpl library
2013-03-22 00:03:23,633 INFO com.hadoop.compression.lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev 9a6528b91e9a2da20d30fb78941fbbbf54d4278b]
2013-03-22 00:03:23,664 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor

2013-03-22 00:04:58,423 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 19555 may have finished in the interim.
...

2013-03-22 07:19:37,906 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 19687 may have finished in the interim.
</pre>
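To spot where an attempt like this went quiet without scrolling through the whole syslog, one option is to look for the longest silence between consecutive timestamped lines. This is a minimal sketch assuming every log line starts with a `YYYY-MM-DD HH:MM:SS,mmm` timestamp as in the excerpt above; lines without one (blank lines, `...`) are skipped. If the task kept logging periodically during the stall, the largest gap will understate how long it was stuck.

```python
from datetime import datetime

# Assumes each syslog line starts with a timestamp like "2013-03-22 00:03:22,920".
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
TS_LEN = len("2013-03-22 00:03:22,920")


def parse_ts(line):
    """Return the leading timestamp as a datetime, or None if there isn't one."""
    try:
        return datetime.strptime(line[:TS_LEN], TS_FORMAT)
    except ValueError:
        return None


def largest_gap(lines):
    """Return (gap, line_before, line_after) for the longest silence between
    consecutive timestamped lines, or None if fewer than two lines have one."""
    stamped = []
    for line in lines:
        ts = parse_ts(line)
        if ts is not None:
            stamped.append((ts, line.rstrip("\n")))
    if len(stamped) < 2:
        return None
    best = None
    for (t0, before), (t1, after) in zip(stamped, stamped[1:]):
        gap = t1 - t0
        if best is None or gap > best[0]:
            best = (gap, before, after)
    return best
```

On the excerpt above (with the `...` line skipped), this reports a gap of 7:14:39.483 between the 00:04:58 and 07:19:37 ProcfsBasedProcessTree warnings; on a full syslog you would call it as `largest_gap(open("syslog"))`.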
