Using MapR distributed cache is very slow

Question asked by fayue1015 on Jun 3, 2012
Latest reply on Jun 20, 2012 by fayue1015
I am distributing data with the distributed cache. On *standard* Hadoop this works well: loading about 2 GB of data takes around 250 seconds. Our server also has MapR installed, and when I run the same program on MapR, loading the data takes 20,000 seconds, which is very slow. I am not sure what the reason is.
Also, the command to run the .jar on MapR is a little different: I need to add maprfs:/, the full path prefix, to the data path so that I can use the distributed cache mechanism, while on HDFS the file path is expanded automatically.

For example, on the maprfs filesystem, I need to use the following command:

./hadoop jar x.jar  -gp maprfs:/user/data/genotype.txt   -op output_path/

On the HDFS filesystem, I can use the following command, assuming both filesystems have the same directory structure:

./hadoop jar x.jar  -gp data/genotype.txt -pp   -op output_path/
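To illustrate the path difference between the two commands, here is a minimal sketch in plain Java of how a relative path could be turned into a fully qualified one before it is handed to the distributed cache. The class name `CachePathQualifier`, the `qualify` method, the default filesystem URIs, and the working directory `/user` are all assumptions made up for this example; they are not taken from x.jar. In a real job, I believe the usual Hadoop calls for this would be `FileSystem.makeQualified(Path)` followed by `DistributedCache.addCacheFile(URI, Configuration)`, but this sketch avoids the Hadoop dependency so it can run on its own.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class CachePathQualifier {

    /**
     * Returns a fully qualified URI for a cache file.
     *
     * path       - the value passed on the command line (e.g. after -gp)
     * defaultFs  - assumed default filesystem URI, e.g. "maprfs:///" (hypothetical)
     * workingDir - assumed absolute directory that relative paths resolve against
     */
    public static URI qualify(String path, URI defaultFs, String workingDir) {
        URI given = URI.create(path);
        if (given.getScheme() != null) {
            // Already carries a scheme such as maprfs: or hdfs:, so leave it alone.
            return given;
        }
        // Make the path absolute by prefixing the working directory if needed.
        String absolute = path.startsWith("/")
                ? path
                : workingDir.replaceAll("/+$", "") + "/" + path;
        try {
            // Reuse the scheme and authority (host:port, may be null) of the default filesystem.
            return new URI(defaultFs.getScheme(), defaultFs.getAuthority(), absolute, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot qualify path: " + path, e);
        }
    }

    public static void main(String[] args) {
        URI maprDefault = URI.create("maprfs:///"); // hypothetical default filesystem
        // Relative path as used in the HDFS command.
        System.out.println(qualify("data/genotype.txt", maprDefault, "/user"));
        // Path that is already fully qualified, as used in the MapR command.
        System.out.println(qualify("maprfs:/user/data/genotype.txt", maprDefault, "/user"));
    }
}
```

With something like this in the driver, the same relative `-gp` argument could be used on both filesystems. It only addresses the path form, not the difference in load time.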

Thanks for your help!