
Terasort failing / hanging

Question asked by shaka on Jan 23, 2014
Latest reply on Feb 10, 2014 by gera
Hi all,
We are running TeraSort, but it keeps hanging at 33.33% in the reduce phase.
We are only running it on 500 GB of data. We have a 12-node cluster with 200+ TB and 100+ GB of memory.

The reduce task log is below:

    2014-01-23 08:02:29,395 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
    2014-01-23 08:02:29,396 INFO org.apache.hadoop.security.JniBasedUnixGroupsMapping: Using JniBasedUnixGroupsMapping for Group resolution
    2014-01-23 08:02:29,468 INFO org.apache.hadoop.mapred.Child: JVM: jvm_201311182247_12868_r_-536966358 pid: 762
    2014-01-23 08:02:29,601 INFO org.apache.hadoop.mapred.TaskRunner: Creating symlink: /opt/mapr/tmp/mapred/local/taskTracker/distcache/-5200828558314565779_-1597801028_1070096176/maprfs/app/dev/SmartAnalytics/terasort/test_tb2_output/_partition.lst <- /opt/mapr/tmp/mapred/local/taskTracker/hddvsman/jobcache/job_201311182247_12868/attempt_201311182247_12868_r_000000_0/work/_partition.lst
    2014-01-23 08:02:29,604 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating symlink: /opt/mapr/tmp/mapred/local/taskTracker/hddvsman/jobcache/job_201311182247_12868/jars/.job.jar.crc <- /opt/mapr/tmp/mapred/local/taskTracker/hddvsman/jobcache/job_201311182247_12868/attempt_201311182247_12868_r_000000_0/work/.job.jar.crc
    2014-01-23 08:02:29,604 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating symlink: /opt/mapr/tmp/mapred/local/taskTracker/hddvsman/jobcache/job_201311182247_12868/jars/job.jar <- /opt/mapr/tmp/mapred/local/taskTracker/hddvsman/jobcache/job_201311182247_12868/attempt_201311182247_12868_r_000000_0/work/job.jar
    2014-01-23 08:02:29,623 INFO org.apache.hadoop.mapred.Child: Starting task attempt_201311182247_12868_r_000000_0
    2014-01-23 08:02:29,623 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=SHUFFLE, sessionId=
    2014-01-23 08:02:29,694 INFO org.apache.hadoop.mapreduce.util.ProcessTree: setsid exited with exit code 0
    2014-01-23 08:02:29,701 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: /proc/<pid>/status does not have information about swap space used(VmSwap). Can not track swap usage of a task.
    2014-01-23 08:02:29,701 INFO org.apache.hadoop.mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.mapreduce.util.LinuxResourceCalculatorPlugin@20b40ec4
    2014-01-23 08:02:41,938 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 1389 may have finished in the interim.
    2014-01-23 08:04:37,777 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 6113 may have finished in the interim.
    2014-01-23 08:04:46,980 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 5599 may have finished in the interim.
    2014-01-23 08:04:53,060 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 7211 may have finished in the interim.
    2014-01-23 08:05:20,457 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 9890 may have finished in the interim.
    2014-01-23 08:06:18,416 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 17399 may have finished in the interim.
    2014-01-23 08:06:48,969 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 21351 may have finished in the interim.
    2014-01-23 08:07:10,301 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 23783 may have finished in the interim.
    2014-01-23 08:09:20,016 INFO org.apache.hadoop.mapred.Merger: Merging 256 sorted segments
    2014-01-23 08:09:31,432 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 256 segments left of total size: 71468244424 bytes
    2014-01-23 08:11:25,957 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 29319 may have finished in the interim.
    2014-01-23 08:11:41,198 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 22341 may have finished in the interim.
    2014-01-23 08:12:27,237 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 31801 may have finished in the interim.
    2014-01-23 08:12:51,715 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 4241 may have finished in the interim.
    2014-01-23 08:12:51,715 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 4242 may have finished in the interim.
    2014-01-23 08:12:51,715 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 4245 may have finished in the interim.
    2014-01-23 08:12:51,715 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 4246 may have finished in the interim.
    2014-01-23 08:14:48,086 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 27951 may have finished in the interim.
    2014-01-23 08:19:06,767 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 23147 may have finished in the interim.
    2014-01-23 08:19:06,768 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 23148 may have finished in the interim.
    2014-01-23 08:19:06,768 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 23149 may have finished in the interim.
    2014-01-23 08:20:07,678 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 24701 may have finished in the interim.
    2014-01-23 08:22:24,924 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 27492 may have finished in the interim.
    2014-01-23 08:22:46,372 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 29280 may have finished in the interim.
    2014-01-23 08:24:03,020 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 2357 may have finished in the interim.
    2014-01-23 08:24:03,021 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 2358 may have finished in the interim.
    2014-01-23 08:24:03,021 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 2359 may have finished in the interim.
    2014-01-23 08:24:06,079 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 3017 may have finished in the interim.
    2014-01-23 08:24:06,079 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 3018 may have finished in the interim.
    2014-01-23 08:24:06,079 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 3019 may have finished in the interim.
    2014-01-23 08:24:12,182 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 3900 may have finished in the interim.
    2014-01-23 08:25:13,193 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 8185 may have finished in the interim.
    2014-01-23 08:25:22,363 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 7538 may have finished in the interim.
    2014-01-23 08:26:47,765 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 10850 may have finished in the interim.
    2014-01-23 08:26:47,766 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 11386 may have finished in the interim.
    2014-01-23 08:27:00,104 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 12485 may have finished in the interim.
    2014-01-23 08:27:12,313 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 12767 may have finished in the interim.
    2014-01-23 08:27:12,313 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 13188 may have finished in the interim.
    2014-01-23 08:27:36,949 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 14964 may have finished in the interim.
    2014-01-23 08:28:10,857 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 18560 may have finished in the interim.
    2014-01-23 08:29:18,481 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 4000 may have finished in the interim.
    2014-01-23 08:29:18,481 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 4001 may have finished in the interim.
    2014-01-23 08:30:28,581 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 25081 may have finished in the interim.
    2014-01-23 08:38:59,312 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 29233 may have finished in the interim.
    2014-01-23 08:38:59,312 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 29234 may have finished in the interim.
    2014-01-23 08:42:20,209 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 981 may have finished in the interim.
    2014-01-23 08:42:20,229 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 31498 may have finished in the interim.
    2014-01-23 08:42:32,593 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 2906 may have finished in the interim.
    2014-01-23 08:42:32,593 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 2907 may have finished in the interim.
    2014-01-23 08:42:50,917 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 5718 may have finished in the interim.
    2014-01-23 08:42:50,918 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 5723 may have finished in the interim.
    2014-01-23 08:42:50,918 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 5734 may have finished in the interim.
    2014-01-23 08:42:50,918 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 5735 may have finished in the interim.
    2014-01-23 08:43:00,082 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 4184 may have finished in the interim.
    2014-01-23 08:43:06,174 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 7873 may have finished in the interim.
    2014-01-23 08:44:00,927 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 12123 may have finished in the interim.
    2014-01-23 08:44:00,927 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 12129 may have finished in the interim.
    2014-01-23 08:44:00,928 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 12131 may have finished in the interim.
    2014-01-23 08:48:40,372 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 17803 may have finished in the interim.
    2014-01-23 08:49:01,667 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 18266 may have finished in the interim.
    2014-01-23 08:49:01,667 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 18267 may have finished in the interim.
    2014-01-23 08:49:07,783 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 19455 may have finished in the interim.
    2014-01-23 08:49:07,784 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 19456 may have finished in the interim.
    2014-01-23 08:49:07,784 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 19457 may have finished in the interim.
    2014-01-23 08:49:10,833 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 19926 may have finished in the interim.
    2014-01-23 08:49:10,833 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 19927 may have finished in the interim.
    2014-01-23 08:49:10,833 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 19928 may have finished in the interim.
    2014-01-23 08:49:21,824 INFO org.apache.hadoop.mapred.Merger: Merging 1745 sorted segments
    2014-01-23 08:49:25,471 INFO org.apache.hadoop.mapred.Merger: Merging 215 intermediate segments out of a total of 1745
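Most of the log above is repeated `ProcfsBasedProcessTree` warnings, so the merge progress is easy to miss. Below is a minimal sketch, not part of any Hadoop or MapR tooling, that keeps only the `Merger` lines from a task log and converts the reported segment size to GiB. The function names `merger_events` and `total_size_gib` are made up for illustration, and the regular expression assumes the same line layout as the log shown here.

```python
import re

# Assumed layout: "<date> <time>,<millis> <LEVEL> <logger class>: <message>"
LINE_RE = re.compile(
    r"^\s*(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}\s+(\w+)\s+(\S+): (.*)$"
)
SIZE_RE = re.compile(r"total size: (\d+) bytes")


def merger_events(log_text):
    """Return (timestamp, message) pairs for lines logged by the Merger class."""
    events = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if m and m.group(3).endswith(".Merger"):
            events.append((m.group(1), m.group(4).strip()))
    return events


def total_size_gib(message):
    """Return the 'total size' in GiB if the message reports one, else None."""
    m = SIZE_RE.search(message)
    return int(m.group(1)) / 2**30 if m else None


# Three lines copied from the task log in this post.
SAMPLE_LOG = """\
2014-01-23 08:09:20,016 INFO org.apache.hadoop.mapred.Merger: Merging 256 sorted segments
2014-01-23 08:11:25,957 WARN org.apache.hadoop.mapreduce.util.ProcfsBasedProcessTree: The process 29319 may have finished in the interim.
2014-01-23 08:09:31,432 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 256 segments left of total size: 71468244424 bytes
"""

if __name__ == "__main__":
    for timestamp, message in merger_events(SAMPLE_LOG):
        size = total_size_gib(message)
        suffix = f" (~{size:.1f} GiB)" if size is not None else ""
        print(f"{timestamp}  {message}{suffix}")
```

Run against the full log, this leaves just the four `Merger` lines, including the one reporting 71468244424 bytes for this single reduce attempt.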
