
importtsv debugging bad records

Question asked by rdominelli on May 13, 2013
Does anyone know whether ImportTsv logs, or can be configured to log, the line that causes a bad-line error? I am processing a file and receiving the following:

    3:58:35 INFO mapred.JobClient: Job job_201305021335_0074 completed successfully
    13/05/13 13:58:36 INFO mapred.JobClient: Counters: 17
    13/05/13 13:58:36 INFO mapred.JobClient:   Job Counters
    13/05/13 13:58:36 INFO mapred.JobClient:     Aggregate execution time of mappers(ms)=463122
    13/05/13 13:58:36 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
    13/05/13 13:58:36 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
    13/05/13 13:58:36 INFO mapred.JobClient:     Launched map tasks=1
    13/05/13 13:58:36 INFO mapred.JobClient:     Data-local map tasks=1
    13/05/13 13:58:36 INFO mapred.JobClient:     Aggregate execution time of reducers(ms)=0
    13/05/13 13:58:36 INFO mapred.JobClient:   ImportTsv
    13/05/13 13:58:36 INFO mapred.JobClient:     Bad Lines=254069
    13/05/13 13:58:36 INFO mapred.JobClient:   FileSystemCounters
    13/05/13 13:58:36 INFO mapred.JobClient:     MAPRFS_BYTES_READ=176214930
    13/05/13 13:58:36 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=72302
    13/05/13 13:58:36 INFO mapred.JobClient:   Map-Reduce Framework
    13/05/13 13:58:36 INFO mapred.JobClient:     Map input records=884861
    13/05/13 13:58:36 INFO mapred.JobClient:     PHYSICAL_MEMORY_BYTES=162975744
    13/05/13 13:58:36 INFO mapred.JobClient:     Spilled Records=0
    13/05/13 13:58:36 INFO mapred.JobClient:     CPU_MILLISECONDS=72000
    13/05/13 13:58:36 INFO mapred.JobClient:     VIRTUAL_MEMORY_BYTES=1657634816
    13/05/13 13:58:36 INFO mapred.JobClient:     Map output records=630792
    13/05/13 13:58:36 INFO mapred.JobClient:     SPLIT_RAW_BYTES=91
    13/05/13 13:58:36 INFO mapred.JobClient:     GC time elapsed (ms)=1317


Thanks for any help,
Rich
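One way to find the offending lines before (or after) a run is to scan the input locally with the same rules ImportTsv uses to reject a line. The script below is a minimal sketch, not part of HBase or MapR: its name (`check_tsv.py`) is hypothetical, and the rejection rules are an assumption modelled on the `TsvParser` in HBase 0.94-era releases (no separator at all, more fields than entries in `importtsv.columns`, or too few fields to reach the `HBASE_ROW_KEY` column). Check the ImportTsv source for your HBase version before relying on it. It reports the line number and the byte offset of each rejected line, which is the kind of key `TextInputFormat` hands to a mapper.

```python
#!/usr/bin/env python3
"""Hypothetical pre-check (check_tsv.py): report lines ImportTsv would likely
count as "Bad Lines". ASSUMPTION: rules mirror HBase 0.94-era TsvParser."""
import sys
from typing import BinaryIO, Iterable, Iterator, Tuple


def find_bad_lines(lines: Iterable[bytes], column_spec: str,
                   separator: bytes = b"\t") -> Iterator[Tuple[int, int, str]]:
    """Yield (line_number, byte_offset, reason) for each rejected line.

    column_spec is the same comma-separated list passed as
    -Dimporttsv.columns, e.g. "HBASE_ROW_KEY,cf:a,cf:b".
    """
    columns = column_spec.split(",")
    max_columns = len(columns)
    row_key_index = columns.index("HBASE_ROW_KEY")  # ValueError if absent
    offset = 0  # byte offset where the current line starts
    for line_number, raw in enumerate(lines, start=1):
        line = raw
        # Strip the line terminator ("\n" or "\r\n") before counting fields.
        if line.endswith(b"\n"):
            line = line[:-1]
        if line.endswith(b"\r"):
            line = line[:-1]
        field_count = line.count(separator) + 1
        # Assumed rule order: no separator, too many fields, row key missing.
        if field_count == 1:
            yield line_number, offset, "No delimiter"
        elif field_count > max_columns:
            yield line_number, offset, "Excessive columns"
        elif field_count <= row_key_index:
            yield line_number, offset, "No row key"
        offset += len(raw)


def main(argv: list) -> None:
    if len(argv) < 3:
        sys.exit("usage: check_tsv.py FILE|- COLUMNS [SEPARATOR]")
    path, spec = argv[1], argv[2]
    separator = argv[3].encode() if len(argv) > 3 else b"\t"
    if len(separator) != 1:
        sys.exit("separator must be a single byte")
    # "-" reads from stdin, so the file can be piped in from the cluster.
    stream: BinaryIO = sys.stdin.buffer if path == "-" else open(path, "rb")
    with stream:
        for line_number, offset, reason in find_bad_lines(stream, spec, separator):
            print(f"line {line_number} (byte offset {offset}): {reason}")


if __name__ == "__main__":
    main(sys.argv)
```

For example, `hadoop fs -cat /path/to/input.tsv | python3 check_tsv.py - HBASE_ROW_KEY,cf:a,cf:b` (path and column list are placeholders) prints one line per rejected record, so the first few bad lines can be inspected directly.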
