
MapR starts shuffling later than Hadoop

Question asked by feeblefakie on Sep 9, 2012
Latest reply on Sep 10, 2012 by gera

I have been running experiments on Hadoop and MapR.
The cluster has 15 nodes, and about 5 TB of data is stored in each filesystem (HDFS and maprfs).
I traced the sar logs throughout the experiments and noticed some differences between Hadoop and MapR.
One difference I can't understand is that MapR starts shuffling very late compared to Hadoop.
MapR basically starts shuffling only after the map phase is 98% complete, whereas Hadoop starts shuffling once the map phase is 10% complete.
So in MapR, shuffling does not overlap with map processing most of the time.
I ran both on the same cluster with the same query and almost the same configuration (mapred-site.xml).
What causes this difference?