If MapR-FS compresses files transparently, is it unnecessary to compress output in Hive?

Question asked by mfuery on Jul 13, 2012
Latest reply on Jul 13, 2012 by steven
My understanding is that MapR-FS compression happens at the filesystem level, so I assume it takes place on each node. What I am unsure about is whether the network traffic generated by Hive MapReduce jobs is transmitted compressed or uncompressed when I am not using compression in the Hive configuration. It seems to me that Hive reads a file from MapR-FS uncompressed and then, if blocks reside on other nodes, those blocks are sent across the network, also uncompressed. Thoughts?


    SET mapred.output.compress=true;
    SET mapred.compress.map.output=true;
    SET mapred.map.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
    SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
    SET mapred.output.compression.type=BLOCK;
    SET mapreduce.maprfs.use.compression=true;
    SET hive.exec.compress.output=true;
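
To see which of these a given session actually has in effect, Hive prints a property's current value when SET is given the name without a value. A minimal sketch, using only property names from the list above; the output depends on the cluster's configuration, so none is shown here:

    SET mapred.compress.map.output;
    SET mapred.output.compress;
    SET hive.exec.compress.output;

Checking the values this way, before and after applying the settings, would at least show whether job-level compression is on or off for a given query. It does not show whether MapR-FS itself compresses data on the wire.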
