killed jobs result in wasted memory

Question asked by jerdavis on May 23, 2013
Latest reply on May 24, 2013 by jerdavis
I have lots of MR jobs that are just hanging around, taking up memory and CPU but not doing any processing. The OutOfMemoryError is probably more effect than cause: these 'zombie' processes usually hold memory that the job tracker doesn't account for, and so we end up going OOM.
I'm running v2.1.2.19528.GA.

Running lsof lets me trace them back to the job. Output from ps, the task logs, and lsof:

    ps -eo pid,euser,s,pcpu,pmem,bsdstart,fname
    
     7206 me S 88.9  1.7  00:38 java
     9869 me S 89.3  1.7  00:40 java
    10880 me S 82.5  1.6  22:45 java
    11108 me S 82.8  1.6  22:45 java
    17994 me S 86.1  1.6  20:25 java
    24245 me S 88.9  1.6  00:52 java
    25576 me S 88.8  1.6  00:56 java

    STDERR:
        Exception in thread "Thread for syncLogs" java.lang.IllegalStateException: Shutdown in progress
                at java.lang.ApplicationShutdownHooks.add(ApplicationShutdownHooks.java:39)
                at java.lang.Runtime.addShutdownHook(Runtime.java:192)
                at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:1546)
                at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1518)
                at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:235)
                at org.apache.hadoop.fs.FileSystem.getLocal(FileSystem.java:206)
                at org.apache.hadoop.mapred.CentralTaskLogUtil.renameFile(CentralTaskLogUtil.java:220)
                at org.apache.hadoop.mapred.TaskLog.writeToIndexFile(TaskLog.java:329)
                at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:402)
                at org.apache.hadoop.mapred.Child$3.run(Child.java:157)
        
    SYSLOG:
    2013-05-23 01:07:18,714 WARN org.apache.hadoop.mapred.Child: Error running child
    cascading.flow.FlowException: internal error during mapper execution
            at cascading.flow.hadoop.FlowMapper.run(FlowMapper.java:138)
            at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:418)
            at org.apache.hadoop.mapred.MapTask.run(MapTask.java:349)
            at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
            at java.security.AccessController.doPrivileged(Native Method)
            at javax.security.auth.Subject.doAs(Subject.java:396)
            at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
            at org.apache.hadoop.mapred.Child.main(Child.java:264)
    Caused by: java.lang.OutOfMemoryError: Java heap space
            at java.util.Arrays.copyOf(Arrays.java:2882)
            at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
            at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
            at java.lang.StringBuffer.append(StringBuffer.java:219)
            at cascading.util.Util.join(Util.java:177)
            at cascading.util.Util.join(Util.java:162)
            at cascading.util.Util.join(Util.java:157)
            at cascading.util.Util.join(Util.java:140)
            at cascading.util.Util.join(Util.java:135)
            at cascading.scheme.util.DelimitedParser.parseLine(DelimitedParser.java:294)
            at cascading.scheme.hadoop.TextDelimited.source(TextDelimited.java:859)
            at cascading.tuple.TupleEntrySchemeIterator.getNext(TupleEntrySchemeIterator.java:140)
            at cascading.tuple.TupleEntrySchemeIterator.hasNext(TupleEntrySchemeIterator.java:120)
            at cascading.flow.stream.SourceStage.map(SourceStage.java:76)
            at cascading.flow.stream.SourceStage.run(SourceStage.java:58)
            at cascading.flow.hadoop.FlowMapper.run(FlowMapper.java:127)
            ... 7 more
    
    LSOF:
    java    3613 upstream  cwd    DIR              253,0         0    1577925 /tmp/mapr-hadoop/mapred/local/taskTracker/us/jobcache/job_201305061420_20440/attempt_201305061420_20440_m_000075_1/work (deleted)
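
To make the lsof step repeatable, here is a rough sketch of how the mapping from PID to job and task attempt could be scripted. It assumes the lsof output has the same column layout as the line above (PID in column 2, FD in column 4) and that leftover task JVMs show a working directory under jobcache/job_.../attempt_... marked "(deleted)". The function name zombie_tasks is made up for illustration; it is not part of Hadoop or MapR.

```shell
#!/bin/sh
# Sketch only: reads lsof-style lines on stdin and prints "PID JOB ATTEMPT"
# for every process whose cwd is a task work directory marked "(deleted)".
# zombie_tasks is a hypothetical helper name, not an existing tool.
zombie_tasks() {
    awk '
        # Column 4 is the FD column; keep only cwd entries flagged as deleted.
        $4 == "cwd" && index($0, "(deleted)") > 0 {
            # Pull out the job_<ts>_<n>/attempt_<ts>_<n>_<m|r>_<task>_<try> part of the path.
            if (match($0, "job_[0-9]+_[0-9]+/attempt_[0-9]+_[0-9]+_[mr]_[0-9]+_[0-9]+")) {
                split(substr($0, RSTART, RLENGTH), parts, "/")
                print $2, parts[1], parts[2]
            }
        }
    '
}
```

It could be fed with something like `lsof -a -c java -d cwd | zombie_tasks`, which restricts lsof to the working directories of java processes. Whether a deleted work directory always means the task was killed is an assumption on my part, so treat the result as a list of candidates to check against the job tracker.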
