If Drill performance is your sole criterion, then it is better to enable compression in Parquet and disable compression on MFS for the directories Drill accesses. In some of our internal performance tests, the best reads/sec were achieved when Parquet compression was enabled and MFS compression was disabled. However, this should be done with caution, bearing in mind that disabling MFS compression can adversely affect other components that access those same files within those MFS directories.
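As a sketch, the two settings above might be applied like this. The directory path is hypothetical, and `snappy` is just one of the codecs Drill supports; adjust both to your environment:

```shell
# Turn off MapR-FS compression on the directory Drill reads from
# (hypothetical path; affects files written after the change)
hadoop mfs -setcompression off /data/drill/parquet

# Then, in a Drill session (e.g. via sqlline), enable Parquet
# compression for files Drill writes:
#   ALTER SESSION SET `store.parquet.compression` = 'snappy';
```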
My experience and testing have shown the same: disable MFS compression and rely on Parquet compression. I think that is partly because the blocks in Parquet can be compressed individually, which made for better throughput. The other thing I did in Drill/MapR was to set the chunk size in MapR to 2x the default (the default is 256 MB; I set it to 512 MB to match Drill's default Parquet block size). This also made a large difference in my query times with Drill.
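A minimal sketch of that chunk-size change, assuming a hypothetical directory path (the new chunk size applies to files created after the setting is changed):

```shell
# Raise the MapR-FS chunk size on the directory to 512 MB (536870912 bytes),
# matching Drill's default `store.parquet.block-size` (hypothetical path)
hadoop mfs -setchunksize 536870912 /data/drill/parquet

# Inspect the directory to confirm the new chunk size took effect
hadoop mfs -ls /data/drill/parquet
```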
I am curious how other components would be negatively affected. Let's say Spark was reading the Parquet files: it's still using the Parquet libraries, so decompression should work in a similar way, right? Is your statement based on testing results with certain tools, or is it more of an "it COULD do this, make sure you test each component individually" type of warning?