How to Merge completely in memory

Question asked by jerdavis on Oct 4, 2012
Latest reply on Oct 5, 2012 by gera
If I have a small enough dataset and I want to do the merge phase entirely in memory, what parameters should I set?
Let's assume I have a 5 GB dataset with 50M keys at 100 bytes each, and let's assume I am sending them all to a single reducer.

It seems like the following should work, but it doesn't: I still see several merges in the reducer logs. Are there any other parameters I should look at?

    mapred.inmem.merge.threshold=0
    mapred.job.shuffle.input.buffer.percent=0.90
    mapred.job.shuffle.merge.percent=0.90
    mapred.reduce.child.java.opts=-Xmx10000m
    
    and maybe:
    mapred.reduce.parallel.copies=20
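
For reference, here is a rough sketch of the arithmetic behind those settings. It assumes the documented meanings of the two percentages: `mapred.job.shuffle.input.buffer.percent` is the fraction of the reducer's maximum heap used to hold map outputs during the shuffle, and `mapred.job.shuffle.merge.percent` is the fraction of that buffer at which an in-memory merge is started. The function name `shuffle_budget` is just for illustration, and the sketch ignores any other limits the framework may apply internally.

```python
def shuffle_budget(heap_mb, input_buffer_percent, merge_percent):
    """Return (shuffle buffer size, merge trigger point) in MB.

    Assumes the buffer is simply heap * input_buffer_percent and the merge
    trigger is buffer * merge_percent; real limits may be lower.
    """
    buffer_mb = heap_mb * input_buffer_percent
    merge_trigger_mb = buffer_mb * merge_percent
    return buffer_mb, merge_trigger_mb


# Numbers from the question: 50M keys at 100 bytes each, -Xmx10000m heap.
dataset_mb = 50_000_000 * 100 / 1_000_000
buffer_mb, merge_trigger_mb = shuffle_budget(10000, 0.90, 0.90)

print(f"dataset:        {dataset_mb:.0f} MB")
print(f"shuffle buffer: {buffer_mb:.0f} MB")
print(f"merge trigger:  {merge_trigger_mb:.0f} MB")
print("fits below merge trigger:", dataset_mb < merge_trigger_mb)
```

On paper the dataset sits well below the merge trigger, which is why I expected no merges at all.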
