
How to find the name of the AVRO file causing "Invalid Sync!" map failures

Question asked by thealy on Apr 8, 2013
Latest reply on Apr 9, 2013 by yufeldman
Running v2.1.1, M3.

I have a serious, recurring problem with M/R jobs failing when a corrupted AVRO file is read. The entire job fails, and all my attempts to trap the error in the mapper have been unsuccessful. In the past, a tedious and time-consuming process has worked: from the Jobs panel, find the job with status = RED; select it to get the list of failed task attempts, and ssh to one of those nodes; then, in /opt/mapr/hadoop/hadoop-0.20.2/logs/[tasktracker log for job], grep for the attempt number. This used to show the dreaded "Invalid Sync!" messages along with the file/split that was being processed. I could then remove the file from the input directory and attempt to fix it with an equally convoluted process.

Now, however, the file/split related to the "Invalid Sync!" error is no longer visible. Looking in the userlogs, the job directory exists and contains files, but it does not contain an attempt log directory for the attempt number in question. (I believe the filename visibility went away with the upgrade to 2.1.1, but I'm not positive.)
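For reference, the manual grep step described above amounts to roughly the following. This is only a minimal sketch: the function name, the number of context lines, and the idea of passing an attempt ID as a filter are my own assumptions for illustration, and it simply walks whatever log directory it is given rather than relying on any particular MapR log layout.

```python
import os


def find_invalid_sync(log_root, attempt_id=None, marker="Invalid sync!", context=3):
    """Walk log_root and collect every line containing the marker.

    Returns a list of (path, line_number, context_lines) tuples.
    The marker is matched case-insensitively. If attempt_id is given
    (a hypothetical filter), a hit is kept only when the attempt ID
    appears somewhere in the surrounding context lines.
    """
    hits = []
    for dirpath, _dirs, files in os.walk(log_root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                with open(path, errors="replace") as fh:
                    lines = fh.readlines()
            except OSError:
                continue  # unreadable file; skip it
            for i, line in enumerate(lines):
                if marker.lower() not in line.lower():
                    continue
                # Keep a few lines on either side, since the file/split
                # name (when logged at all) is usually near the error.
                window = lines[max(0, i - context): i + context + 1]
                if attempt_id and not any(attempt_id in w for w in window):
                    continue
                hits.append((path, i + 1, [w.rstrip("\n") for w in window]))
    return hits


# Example use (the path is the log directory mentioned above):
# for path, lineno, window in find_invalid_sync("/opt/mapr/hadoop/hadoop-0.20.2/logs"):
#     print(path, lineno)
#     print("\n".join(window))
```

Running it over the whole logs directory, including userlogs, at least confirms whether the file/split name is recorded anywhere on the node, instead of grepping one log at a time.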

Ideally, I would like to trap the non-recoverable error in the mapper, ignore it, and end the job there, but every attempt to do so has failed.

Where can I find the information within the log structure? Any suggestions greatly appreciated.

-Terry
