
Evaluating against CDH

Question asked by reducer on Dec 1, 2013
Latest reply on Dec 2, 2013 by Ted Dunning
The docs suggest MapR's Hadoop distribution is 2x-5x faster than other distros, but there is also some evidence on the web suggesting there is not much difference compared to CDH4. We are considering MapR as an option and have a few questions before installing MapR M5.

 1. Has anyone tested MapR for a primarily Hive structured-data workload? Of course Hive simply creates MR jobs, but is there a benchmarked performance boost from MapR? We work with tables of tens of billions of rows, with a couple of very large joins.

 2. When you say "native compression", and that compression can be defined at a directory level, does this mean that I can simply set compression at /user/hive/warehouse/some_database.db/ and all the files will automatically be compressed? Also, is the decompression handled by the file system or by the process that is reading/writing? For example, if I have some simple Python code that reads a file with open(), would the data come back decompressed to the client code?
**Edit:**
Sorry, this has already been answered here:
http://answers.mapr.com/questions/2984/compression-in-hive-jobs and
http://answers.mapr.com/questions/2913/if-maprfs-compresses-files-transparently-is-it-unnecessary-to-compress-output-in-hive

 3. Is there a way to install MapR, or at least MapR-FS, alongside CDH? The current cluster runs CDH4 with a JBOD disk setup.

Thanks.
