
MapR starts shuffling later than Hadoop

Question asked by feeblefakie on Sep 9, 2012
Latest reply on Sep 10, 2012 by gera
Hi,

<p>
I have been running experiments with Hadoop and MapR on a 15-node cluster, with about 5 TB of data stored in each file system (HDFS and maprfs).
</p>
<p>
I traced sar logs throughout the experiments and noticed some differences between Hadoop and MapR.
One difference I can't explain is that MapR starts shuffling much later than Hadoop.
MapR generally starts shuffling only after about 98% of the maps have completed, whereas Hadoop starts shuffling after about 10% of the maps have completed.
As a result, in MapR the shuffle overlaps with map processing for only a small part of the job.
</p>
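<p>
One way to put a number on this observation is to find, in the traced samples, the map-completion percentage at which network traffic first jumps, treating that jump as the start of the shuffle. The sketch below assumes each sample has already been paired with the job's map progress (sar only records system counters, not job progress); the `factor` threshold and the function names are hypothetical choices for illustration, not part of either distribution.
</p>

```python
from typing import Optional, Sequence, Tuple

# One sample per sar interval: (map_completion_pct, network_kb_per_s).
# Assumption: samples are in time order and have already been joined with the
# job's map progress; sar itself only reports system counters.
Sample = Tuple[float, float]


def shuffle_start_pct(samples: Sequence[Sample], baseline_kbps: float,
                      factor: float = 3.0) -> Optional[float]:
    """Map-completion % at which network traffic first exceeds
    factor * baseline_kbps (a proxy for the start of the shuffle).
    Returns None if traffic never crosses the threshold."""
    threshold = baseline_kbps * factor
    for map_pct, kbps in samples:
        if kbps > threshold:
            return map_pct
    return None


def overlap_pct(start_pct: Optional[float]) -> float:
    """Share of map progress (in %) during which shuffling was already running."""
    if start_pct is None:
        return 0.0
    return 100.0 - start_pct


if __name__ == "__main__":
    # Hypothetical samples, not measurements from the cluster described above.
    early = [(0, 100), (5, 120), (10, 900), (50, 1000), (100, 800)]
    late = [(0, 100), (50, 110), (98, 950), (100, 1000)]
    for label, samples in (("early", early), ("late", late)):
        start = shuffle_start_pct(samples, baseline_kbps=100)
        print(label, start, overlap_pct(start))
```

<p>
With these hypothetical samples, where traffic jumps at 10% and at 98% map completion, `shuffle_start_pct` returns 10 and 98, and `overlap_pct` gives 90.0 and 2.0 respectively.
</p>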
<p>
I ran both on the same cluster with the same query and almost the same configuration (mapred-site.xml).
What causes this difference?
</p>
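<p>
Because the two configurations are only almost the same, a quick way to list the differences is to diff the two mapred-site.xml files property by property. This is a minimal sketch assuming the standard Hadoop layout (a configuration root element holding property elements, each with a name and a value); the helper names are hypothetical.
</p>

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Tuple


def load_properties(xml_text: str) -> Dict[str, str]:
    """Parse a Hadoop-style site file into {property name: value}."""
    root = ET.fromstring(xml_text)
    props: Dict[str, str] = {}
    # Each <property> is a direct child of the <configuration> root.
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def diff_properties(a: Dict[str, str],
                    b: Dict[str, str]) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {name: (value_in_a, value_in_b)} for every property that
    differs between the two files or is set in only one of them."""
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }
```

<p>
This only compares properties set explicitly in the two site files. Any setting left at its default is not visible to it, and each distribution can ship its own defaults, so identical site files do not guarantee identical effective settings.
</p>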
