
MapReduce takes time in merge-pass

Question asked by rpillai on Jan 10, 2018
Latest reply on Jan 23, 2018 by maprcommunity

I am running a simple name match-merge MapReduce job from Talend against a 7-node MapR cluster. The input file is about 200K rows, roughly 180 MB. The job moves quickly until the merge step, then the reduce-side merge phase runs for about 2 hours, and I'm not sure why it takes that long. Here is the log; it would be great if someone could help:

 

2018-01-10 21:55:41,725 INFO [main] org.apache.hadoop.mapreduce.task.reduce.DirectShuffleMergeManagerImpl: MergerManager: memoryLimit=1670591232, maxSingleShuffleLimit=417647808, mergeThreshold=1102590208, ioSortFactor=256, memToMemMergeOutputsThreshold=256
2018-01-10 21:55:41,729 INFO [EventFetcher for fetching Map Completion Events] org.apache.hadoop.mapreduce.task.reduce.DirectShuffleEventFetcher: attempt_1508344405654_15568_r_000001_0 Thread started: EventFetcher for fetching Map Completion Events
2018-01-10 21:55:41,747 INFO [EventFetcher for fetching Map Completion Events] org.apache.hadoop.mapreduce.task.reduce.DirectShuffleEventFetcher: attempt_1508344405654_15568_r_000001_0: Got 3 new map-outputs
2018-01-10 21:55:41,771 INFO [MapOutputCopier task_1508344405654_15568_r_000001.10] org.apache.hadoop.mapreduce.task.reduce.DirectShuffleFetcher: fetcher#10 about to shuffle output of map attempt_1508344405654_15568_m_000002_0 to MEMORY
2018-01-10 21:55:41,771 INFO [MapOutputCopier task_1508344405654_15568_r_000001.0] org.apache.hadoop.mapreduce.task.reduce.DirectShuffleFetcher: fetcher#0 about to shuffle output of map attempt_1508344405654_15568_m_000001_0 to MEMORY
2018-01-10 21:55:41,776 INFO [MapOutputCopier task_1508344405654_15568_r_000001.10] org.apache.hadoop.mapreduce.task.reduce.DirectShuffleMergeManagerImpl: closeInMemoryFile -> map-output of size: 9277, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->8465390
2018-01-10 21:55:41,776 INFO [MapOutputCopier task_1508344405654_15568_r_000001.11] org.apache.hadoop.mapreduce.task.reduce.DirectShuffleFetcher: fetcher#11 about to shuffle output of map attempt_1508344405654_15568_m_000000_0 to MEMORY
2018-01-10 21:55:41,777 INFO [MapOutputCopier task_1508344405654_15568_r_000001.0] org.apache.hadoop.mapreduce.task.reduce.DirectShuffleMergeManagerImpl: closeInMemoryFile -> map-output of size: 92931, inMemoryMapOutputs.size() -> 2, commitMemory -> 9277, usedMemory ->8465390
2018-01-10 21:55:41,807 INFO [MapOutputCopier task_1508344405654_15568_r_000001.11] org.apache.hadoop.mapreduce.task.reduce.DirectShuffleMergeManagerImpl: closeInMemoryFile -> map-output of size: 8363182, inMemoryMapOutputs.size() -> 3, commitMemory -> 102208, usedMemory ->8465390
2018-01-10 21:55:41,808 INFO [EventFetcher for fetching Map Completion Events] org.apache.hadoop.mapreduce.task.reduce.DirectShuffleEventFetcher: EventFetcher is interrupted.. Returning
2018-01-10 21:55:41,815 INFO [main] org.apache.hadoop.mapreduce.task.reduce.DirectShuffleMergeManagerImpl: finalMerge called with 3 in-memory map-outputs and 0 on-disk map-outputs
2018-01-10 21:55:41,829 INFO [main] org.apache.hadoop.mapred.Merger: Merging 3 sorted segments
2018-01-10 21:55:41,829 INFO [main] org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 3 segments left of total size: 8465384 bytes
2018-01-10 21:55:41,973 INFO [main] org.apache.hadoop.mapreduce.task.reduce.DirectShuffleMergeManagerImpl: Merged 3 segments, 8465390 bytes to disk to satisfy reduce memory limit
2018-01-10 21:55:41,973 INFO [main] org.apache.hadoop.mapreduce.task.reduce.DirectShuffleMergeManagerImpl: Merging 1 files, 8465406 bytes from disk
2018-01-10 21:55:41,974 INFO [main] org.apache.hadoop.mapreduce.task.reduce.DirectShuffleMergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
2018-01-10 21:55:41,974 INFO [main] org.apache.hadoop.mapred.Merger: Merging 1 sorted segments
2018-01-10 21:55:41,977 INFO [main] org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 8465404 bytes
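For context, the limits printed in the first log line (memoryLimit, mergeThreshold, ioSortFactor) correspond to reduce-side shuffle/merge settings. Below is a sketch of the standard MRv2 property names they usually map to; the exact keys and how MapR's DirectShuffle derives its limits are assumptions on my part, not confirmed for this cluster:

```xml
<!-- mapred-site.xml: reduce-side shuffle/merge tuning (standard MRv2
     property names; MapR's DirectShuffle may compute its limits
     differently, so treat these mappings as assumptions) -->
<configuration>
  <!-- fan-in of a single merge pass; the log shows ioSortFactor=256 -->
  <property>
    <name>mapreduce.task.io.sort.factor</name>
    <value>256</value>
  </property>
  <!-- fraction of reducer heap usable for shuffled map outputs;
       drives the memoryLimit value in the log -->
  <property>
    <name>mapreduce.reduce.shuffle.input.buffer.percent</name>
    <value>0.70</value>
  </property>
  <!-- usage fraction that triggers an in-memory merge;
       drives the mergeThreshold value in the log -->
  <property>
    <name>mapreduce.reduce.shuffle.merge.percent</name>
    <value>0.66</value>
  </property>
</configuration>
```

Worth noting: the log timestamps show the shuffle and final merge of all three segments (about 8.4 MB total) completing within roughly 250 ms, so the merge pass itself does not appear to be the bottleneck; the two hours may be spent elsewhere in the reduce task.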


Attached is the Hadoop configuration file
