AnsweredAssumed Answered

Pig Map-Reduce job hung or stuck never ending

Question asked by morillo on Sep 27, 2012
Latest reply on Feb 18, 2013 by raj_hadoop
Hi Everybody,

I have a MapR M5 v2.0.1 1-node cluster running in Ubuntu Linux 12.04.

I am trying to test the Pig Map-Reduce job example documented at [http://www.mapr.com/doc/display/MapR/Getting+Started+with+Pig][1] and I have the following output

    mapr@westeros:~$ pig
    2012-09-27 18:14:18,429 [main] INFO  org.apache.pig.Main - Apache Pig version 0.10.0 (rexported) compiled Jun 19 2012, 15:23:08
    2012-09-27 18:14:18,430 [main] INFO  org.apache.pig.Main - Logging error messages to: /home/mapr/pig_1348784058427.log
    2012-09-27 18:14:18,731 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: maprfs:///
    2012-09-27 18:14:19,048 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: maprfs:///
    grunt> A = LOAD '/carlos/in' USING TextLoader() AS (words:chararray);
    grunt> B = FOREACH A GENERATE FLATTEN(TOKENIZE(*));
    grunt> C = GROUP B BY $0;
    grunt> D = FOREACH C GENERATE group, COUNT(B);
    grunt> STORE D INTO '/carlos/wordcount';
    2012-09-27 18:15:54,337 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: GROUP_BY
    2012-09-27 18:15:54,533 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
    2012-09-27 18:15:54,549 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.CombinerOptimizer - Choosing to move algebraic foreach to combiner
    2012-09-27 18:15:54,602 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
    2012-09-27 18:15:54,603 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
    2012-09-27 18:15:54,667 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
    2012-09-27 18:15:54,699 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
    2012-09-27 18:15:54,702 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - creating jar file Job8572094332621233159.jar
    2012-09-27 18:15:57,274 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - jar file Job8572094332621233159.jar created
    2012-09-27 18:15:57,294 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
    2012-09-27 18:15:57,337 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - BytesPerReducer=1000000000 maxReducers=999 totalInputFileSize=45339
    2012-09-27 18:15:57,338 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Neither PARALLEL nor default parallelism is set for this job. Setting number of reducers to 1
    2012-09-27 18:15:57,389 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
    2012-09-27 18:15:57,486 [Thread-4] INFO  com.mapr.fs.JobTrackerWatcher - Current running JobTracker is: westeros/172.16.149.138:9001
    2012-09-27 18:15:57,775 [Thread-4] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 2
    2012-09-27 18:15:57,775 [Thread-4] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 2
    2012-09-27 18:15:57,786 [Thread-4] INFO  org.apache.hadoop.util.NativeCodeLoader - Loaded the native-hadoop library
    2012-09-27 18:15:57,787 [Thread-4] WARN  org.apache.hadoop.io.compress.snappy.LoadSnappy - Snappy native library not loaded
    2012-09-27 18:15:57,793 [Thread-4] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
    2012-09-27 18:15:57,890 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
    2012-09-27 18:15:58,936 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_201209271718_0001
    2012-09-27 18:15:58,936 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - More information at: http://maprfs:50030/jobdetails.jsp?jobid=job_201209271718_0001
    2012-09-27 18:15:58,939 [main] INFO  com.mapr.fs.JobTrackerWatcher - Current running JobTracker is: westeros/172.16.149.138:9001


The job seems to be running but I am not 100% sure if it's stuck or hung waiting for some resource like slots.

I opened the URL in the last messages [http://maprfs:50030/jobdetails.jsp?jobid=job_201209271718_0001][2] and slightly modify it to use the hostname of my system [http://westeros:50030/jobdetails.jsp?jobid=job_201209271718_0001][3] and I am able to successfully open it in Firefox and it reports the status of the job as running, but the %Complete value doesn't change at all. It's all the time 0.00%.

Here is a snippet of the tasktracker log:

    2012-09-27 18:15:58,799 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201209271718_0001_m_000002_0 task's state:UNASSIGNED
    2012-09-27 18:15:58,803 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201209271718_0001_m_000002_0 which needs 1 slots
    2012-09-27 18:15:58,803 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 1 and trying to launch attempt_201209271718_0001_m_000002_0 which needs 1 slots
    2012-09-27 18:16:01,076 INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_201209271718_0001_m_-1299692696
    2012-09-27 18:16:01,077 INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_201209271718_0001_m_-1299692696 spawned.
    2012-09-27 18:16:01,078 INFO org.apache.hadoop.mapred.TaskLog: MapR: Setup Cmds: export JVM_PID=`echo $$`
    export HADOOP_CLIENT_OPTS="-Dhadoop.tasklog.taskid=attempt_201209271718_0001_m_000002_0 -Dhadoop.tasklog.iscleanup=false -Dhadoop.tasklog.totalLogFileSize=0"
    export HADOOP_WORK_DIR="/tmp/mapr-hadoop/mapred/local/taskTracker/mapr/jobcache/job_201209271718_0001/attempt_201209271718_0001_m_000002_0/work"
    export HADOOP_TOKEN_FILE_LOCATION="/tmp/mapr-hadoop/mapred/local/taskTracker/mapr/jobcache/job_201209271718_0001/jobToken"
    export HADOOP_ROOT_LOGGER="INFO,maprfsTLA"
    export LD_LIBRARY_PATH="/tmp/mapr-hadoop/mapred/local/taskTracker/mapr/jobcache/job_201209271718_0001/attempt_201209271718_0001_m_000002_0/work:/opt/mapr/hadoop/hadoop-0.20.2/bin/../lib/native/Linux-amd64-64:"
    
    echo 10 > /proc/self/oom_score_adj;renice -n 10 -p $$ 1>/dev/null;
    2012-09-27 18:16:01,079 INFO org.apache.hadoop.mapred.TaskController: Writing commands to /tmp/mapr-hadoop/mapred/local/ttprivate/taskTracker/mapr/jobcache/job_201209271718_0001/attempt_201209271718_0001_m_000002_0/taskjvm.sh
    2012-09-27 18:16:02,308 INFO org.apache.hadoop.mapred.TaskTracker: Setting pid 4791 for jvm jvm_201209271718_0001_m_-1299692696
    2012-09-27 18:16:02,309 INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_201209271718_0001_m_-1299692696 given task: attempt_201209271718_0001_m_000002_0
    2012-09-27 18:16:02,924 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201209271718_0001_m_000002_0 0.0% setup
    2012-09-27 18:16:02,926 INFO org.apache.hadoop.mapred.TaskTracker: Task attempt_201209271718_0001_m_000002_0 is done.
    2012-09-27 18:16:02,926 INFO org.apache.hadoop.mapred.TaskTracker: reported output size for attempt_201209271718_0001_m_000002_0  was -1
    2012-09-27 18:16:02,927 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 1
    2012-09-27 18:16:03,088 INFO org.apache.hadoop.mapred.TaskTracker: Received KillTaskAction for task: attempt_201209271718_0001_m_000002_0
    2012-09-27 18:16:03,088 INFO org.apache.hadoop.mapred.TaskTracker: About to purge task: attempt_201209271718_0001_m_000002_0
    2012-09-27 18:16:13,058 INFO org.apache.hadoop.mapred.JvmManager: Killing Idle Jvm jvm_201209271718_0001_m_-1299692696 #Tasks ran: 1
    2012-09-27 18:16:13,064 WARN org.apache.hadoop.mapred.LinuxTaskController: Exit code from task is : 143
    2012-09-27 18:16:13,065 INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_201209271718_0001_m_-1299692696 exited with exit code 143. Number of tasks it ran: 1
    2012-09-27 18:16:13,399 INFO org.apache.hadoop.mapred.TaskTracker: Setting pid 4791 for jvm jvm_201209271718_0001_m_-1299692696
    2012-09-27 18:16:13,400 INFO org.apache.hadoop.mapred.TaskTracker: Killing unknown JVM jvm_201209271718_0001_m_-1299692696
    mapr@westeros:/opt/mapr/hadoop/hadoop-0.20.2/logs$


Any ideas, hints, suggestions, what could possibly be wrong?

This is the second time, I am trying after I rebooted the system and ran again the Pig job with the same results and behavior.

Is there anything I should be looking as far as slots configuration?
Something like cluster configuration, ratios or heap size?

Thanks,

Carlos.


  [1]: http://www.mapr.com/doc/display/MapR/Getting+Started+with+Pig
  [2]: http://maprfs:50030/jobdetails.jsp?jobid=job_201209271718_0001
  [3]: http://westeros:50030/jobdetails.jsp?jobid=job_201209271718_0001

Outcomes