It seems there is a tradeoff when setting max mappers and max reducers, and I'm wondering if there is another setting somewhere to help tune this.
I can configure my cluster to get 95% utilization with a good mix of mappers and reducers. However, I often end up with a final reduce phase that takes a long time, during which my cluster is only 50% utilized (because I have reserved the remaining resources for map tasks that might come along but never will).
On the other hand, if I allow more reduce slots, it's easy to run the nodes out of memory (OOM) during normal (mixed) execution, when map and reduce tasks run at the same time.
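To make the tradeoff concrete, here is a rough sketch of the arithmetic. All numbers are illustrative assumptions (8 map slots and 8 reduce slots per node, 1 GB of heap per task, 16 GB of RAM per node), not measurements from a real cluster, and the helper names are hypothetical.

```python
# Back-of-the-envelope model of the slot/memory tradeoff.
# Every constant below is an assumed, illustrative value.

NODE_RAM_GB = 16.0        # assumed RAM available to tasks on one node
HEAP_PER_TASK_GB = 1.0    # assumed heap size of each map or reduce task


def slot_utilization(busy_slots, map_slots, reduce_slots):
    """Fraction of a node's task slots that are doing work."""
    return busy_slots / (map_slots + reduce_slots)


def worst_case_memory_gb(map_slots, reduce_slots, heap_per_task_gb):
    """Memory needed if every map and reduce slot is occupied at once."""
    return (map_slots + reduce_slots) * heap_per_task_gb


# Config A: balanced slots. Safe on memory, but during a reduce-only
# tail the map slots sit idle.
a_map, a_reduce = 8, 8
tail_utilization_a = slot_utilization(a_reduce, a_map, a_reduce)
peak_memory_a = worst_case_memory_gb(a_map, a_reduce, HEAP_PER_TASK_GB)

# Config B: double the reduce slots so the tail can use the whole node.
# The tail is better, but mixed execution can now oversubscribe memory.
b_map, b_reduce = 8, 16
peak_memory_b = worst_case_memory_gb(b_map, b_reduce, HEAP_PER_TASK_GB)

print(f"A: tail utilization {tail_utilization_a:.0%}, "
      f"peak memory {peak_memory_a} GB of {NODE_RAM_GB} GB")
print(f"B: peak memory {peak_memory_b} GB of {NODE_RAM_GB} GB, "
      f"oversubscribed: {peak_memory_b > NODE_RAM_GB}")
```

With these assumed numbers, the balanced configuration fits in memory but leaves half the slots idle during the final reduce, while the reduce-heavy configuration can ask for more memory than the node has whenever all slots fill up.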
Any suggestions on how to handle this?