Finding the specific part files whose records were dropped

Question asked by simanchal.maharana on Jan 23, 2018
I am uploading a huge amount of XML data in part-file format (the output of MapReduce/Hive jobs) to a MarkLogic database with a MapR MapReduce job. Because of some cluster or network issue, a few records (5, 10, 50, or 100 out of 20 million) are not uploaded. Because of this, I have to upload all 20 million records again, which is very time-consuming: we lose another 2 to 3 hours.


I want to find the particular split files/part files from which those few records were missed, so that I can re-ingest only those part files instead of all 20 to 30 million records. How can I find those specific part files?
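One way to frame the comparison: if every record carries a unique ID, and the IDs that actually landed in MarkLogic can be exported to a set, then each part file can be checked for IDs that are absent from that set. The sketch below is a minimal illustration under those assumptions only. The `id="..."` attribute, the one-record-per-line layout, the `part-*` file naming, and the helper names `ids_per_part_file` and `part_files_with_missing` are all hypothetical, and exporting the loaded IDs from MarkLogic is not shown.

```python
import re
from pathlib import Path

# Hypothetical assumption: each line of a part file holds one XML record
# whose unique key appears as an id="..." attribute.
ID_PATTERN = re.compile(r'id="([^"]+)"')


def ids_per_part_file(directory):
    """Map each part-file name (part-*) to the record IDs found in it."""
    result = {}
    for path in sorted(Path(directory).glob("part-*")):
        ids = []
        with path.open(encoding="utf-8") as f:
            for line in f:
                match = ID_PATTERN.search(line)
                if match:
                    ids.append(match.group(1))
        result[path.name] = ids
    return result


def part_files_with_missing(expected, loaded_ids):
    """Return {part_file: [missing IDs]} for files with at least one record
    whose ID is not in loaded_ids (a set exported from the database)."""
    missing = {}
    for name, ids in expected.items():
        absent = [record_id for record_id in ids if record_id not in loaded_ids]
        if absent:
            missing[name] = absent
    return missing


if __name__ == "__main__":
    expected = {"part-00000": ["a1", "a2"], "part-00001": ["b1", "b2"]}
    loaded = {"a1", "a2", "b1"}
    print(part_files_with_missing(expected, loaded))
    # {'part-00001': ['b2']}
```

Only the part files listed in the result would then need to be re-ingested. Keeping `loaded_ids` as a set keeps each membership check constant-time, which matters at the scale of 20 million records.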


Could you please help me with this?


Thanks a lot for your help.