
Intermittent "ClassNotFoundException" failures for jars in Distributed Cache

Question asked by thealy on Jun 19, 2013
Latest reply on Jun 4, 2014 by thealy
Running 2.1.3.19871.GA / M3 / 60 nodes

I'm having a recurring problem with M/R jobs failing with "ClassNotFoundException" errors, usually in reduce tasks and on multiple nodes. The same nodes had successfully completed Map jobs that use the same supposedly missing .jar files just seconds before. In the TaskTracker (TT) logs I can see the jars being found and used:
<pre>
2013-06-20 09:54:47,562 INFO org.apache.hadoop.filecache.TrackerDistributedCacheManager: Using existing cache of maprfs:///flow/jars/avro-1.7.3.jar->/tmp/mapr-hadoop/mapred/local/taskTracker/distcache/1039772895504509950_1188761459_1793001461/maprfs/flow/jars/avro-1.7.3.jar
</pre>
Then:
<pre>
2013-06-20 09:56:44,500 FATAL org.apache.hadoop.mapred.TaskTracker: Task: attempt_201305161027_0210_r_000006_3 - Killed : java.lang.ClassNotFoundException: org.apache.avro.mapred.AvroSerialization$AvroWrapperDeserializer
</pre>
In many cases, if I resubmit the exact same job, it runs to completion.
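
To narrow this down, it might help to log what the failing task's JVM can actually see at the moment of failure. The following is only a minimal diagnostic sketch, not part of the job itself: `ClasspathProbe`, `isLoadable`, and `describe` are hypothetical names, and it assumes the task resolves classes through the thread context class loader. It uses only the JDK, so the same calls could be made from a reducer's setup code to dump the class loader chain into the task log.

```java
import java.net.URL;
import java.net.URLClassLoader;

/** Hypothetical diagnostic helper; uses only JDK classes. */
public class ClasspathProbe {

    /** Returns true if the loader can resolve the class (without initializing it). */
    static boolean isLoadable(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Lists each loader in the parent chain, its URLs if exposed, and java.class.path. */
    static String describe(ClassLoader loader) {
        StringBuilder sb = new StringBuilder();
        for (ClassLoader cl = loader; cl != null; cl = cl.getParent()) {
            sb.append(cl.getClass().getName()).append('\n');
            // Only URLClassLoader exposes its search path directly.
            if (cl instanceof URLClassLoader) {
                for (URL url : ((URLClassLoader) cl).getURLs()) {
                    sb.append("  ").append(url).append('\n');
                }
            }
        }
        sb.append("java.class.path=")
          .append(System.getProperty("java.class.path"))
          .append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        // Default to the class named in the FATAL log line above.
        String name = args.length > 0
                ? args[0]
                : "org.apache.avro.mapred.AvroSerialization$AvroWrapperDeserializer";
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        System.out.println(name + " loadable: " + isLoadable(name, loader));
        System.out.print(describe(loader));
    }
}
```

If the avro jar's distcache path is missing from this output on a failing attempt but present on a successful one, that would point at how the task classpath is assembled rather than at the cache contents.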

A similar, related post about a cluster running an earlier version was never resolved.
