
MapReduce insert job not working for large data sizes

Question asked by anup on Dec 31, 2013
Latest reply on Mar 3, 2016 by evckumar1
Hi Friends,

I have written a MapReduce job to insert data from M5 to M7. At first I wrote the Puts from the mapper through the context, but that did not work for me, so I switched to the HTablePool API to insert the data. The job works well for smaller data sizes (<= 5 GB), but when I run the same code on larger data (> 50 GB), the map tasks fail after completing some percentage of the insertion, with no reason given in the JobTracker.
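For reference, here is roughly what the HTablePool-based mapper looks like (a simplified sketch; the destination table name "dest_table" and the row-copy logic are placeholders, and it assumes the HBase 0.94-era client API):

```java
// Sketch of the HTablePool-based insert described above.
// Table name and row-copy logic are placeholders, not the real job.
import java.io.IOException;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.HTablePool;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;

public class CopyMapper extends TableMapper<ImmutableBytesWritable, Put> {

    private HTablePool pool;
    private HTableInterface destTable;

    @Override
    protected void setup(Context context) throws IOException {
        // One pool per task attempt; "dest_table" is a placeholder name.
        pool = new HTablePool(context.getConfiguration(), 1);
        destTable = pool.getTable("dest_table");
    }

    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context context)
            throws IOException, InterruptedException {
        // Copy every cell of the source row into a Put for the destination table.
        Put put = new Put(row.get());
        for (KeyValue kv : value.raw()) {
            put.add(kv);
        }
        destTable.put(put); // write directly instead of context.write(...)
    }

    @Override
    protected void cleanup(Context context) throws IOException {
        destTable.close(); // returns the table to the pool
        pool.close();
    }
}
```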

Any job with more than 200 map tasks fails every time. I also tried the auto-flush-off approach, setting auto-flush to false in the mapper's setup() and calling flushCommits() in cleanup().
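The auto-flush variant looks roughly like this (again a sketch; the table name and the write buffer size are just example values):

```java
// Sketch of the auto-flush-off variant: buffer Puts client-side in the
// destination HTable and flush them once in cleanup().
// Table name and buffer size are placeholder values.
import java.io.IOException;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;

public class BufferedCopyMapper extends TableMapper<ImmutableBytesWritable, Put> {

    private HTable destTable;

    @Override
    protected void setup(Context context) throws IOException {
        destTable = new HTable(context.getConfiguration(), "dest_table");
        destTable.setAutoFlush(false);                 // buffer Puts client-side
        destTable.setWriteBufferSize(8 * 1024 * 1024); // e.g. 8 MB write buffer
    }

    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context context)
            throws IOException, InterruptedException {
        Put put = new Put(row.get());
        for (KeyValue kv : value.raw()) {
            put.add(kv);
        }
        destTable.put(put); // held in the client buffer until it fills or we flush
    }

    @Override
    protected void cleanup(Context context) throws IOException {
        destTable.flushCommits(); // push any remaining buffered Puts
        destTable.close();
    }
}
```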

I cannot figure out where I am going wrong. Has anybody faced the same issue, or does anyone have an idea about it?

For some mappers, I see the error below in the JobTracker:
*Task task_201312251324_48243_m_000107 is complete but its information is lost. This happens if jobtracker recovered the job and tasktracker which ran this task is not running/failed.*
