
MapR-Drill Performance challenge

Question asked by AjayChaudhary on Aug 29, 2017
Latest reply on Sep 25, 2017 by jbates

We are seeing poor performance from MapR-Drill when querying a Parquet file stored in MapR-FS.

Querying the same file with Impala is considerably faster.


Query 1: 1 month of data
Impala: 11 s
MapR-Drill: 118 s


select yodlee_transaction_status, count(1)
from dfs.`/user/hive/warehouse/cv2_jan2015_parquet/`
where description not like '%a%'
or description like '%qwqw%'
group by yodlee_transaction_status;

Query 2: 2 years of data
Impala: 27 min
MapR-Drill: 2 hr+

file_created_date >= '2014-04-01'
and file_created_date <= '2017-04-29'
and cobrand_id in ('10006164')
and yodlee_transaction_status <> 'D'
and currency_id = '152'
and description like '%dsyg%'
and description like '%sad|tiasxas|tick|adda|asda|df%'
order by random()
limit 200000;
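
For reference, the slowdown implied by the timings above can be worked out with a few lines of Python. This is only a sketch of the arithmetic: the `slowdown` helper is a made-up name, and "2 hr+" is treated as a lower bound of exactly 2 hours, which is an assumption.

```python
def slowdown(drill_seconds: float, impala_seconds: float) -> float:
    """Return how many times slower Drill was than Impala."""
    return drill_seconds / impala_seconds


# Query 1: timings as reported above (118 s vs 11 s).
q1 = slowdown(118, 11)

# Query 2: "2 hr+" taken as a lower bound of 2 hours (assumption), vs 27 min.
q2_min = slowdown(2 * 60 * 60, 27 * 60)

if __name__ == "__main__":
    print(f"Query 1: Drill is about {q1:.1f}x slower")
    print(f"Query 2: Drill is at least {q2_min:.1f}x slower")
```

So the gap is roughly an order of magnitude on the one-month query and at least about 4x on the two-year query.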


The following parameters were changed


We made only two changes to MapR-Drill: the heap size and the spill directory.
As far as we know, all other parameters are at the standard Drill settings.
export DRILL_HEAP=${DRILL_HEAP:-"12G"}
export HADOOP_HOME=/opt/mapr/hadoop/hadoop-2.7.0
export DRILL_LOCALHOST=`hostname -i`


drill.exec: {
  cluster-id: "drillbits1",
  zk.connect: "10.11.X.XX:5181,10.11.X.XX:5181,10.11.X.XX:5181",
  sort.external.spill.directories: ["/tmp/"${DRILL_LOCALHOST}],
  sort.external.spill.fs: "maprfs:///"
}


One finding so far: whenever a query uses the OR operator, MapR-Drill performance degrades.
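
To narrow this down, each side of the OR in Query 1 could be timed on its own and its plan compared with the combined predicate. Below is a minimal Python sketch that only generates the SQL text to paste into the Drill shell. The labels and the `build_query` helper are hypothetical names, the table path and columns are copied from Query 1, and none of this has been run against our cluster. `EXPLAIN PLAN FOR` is Drill's statement for showing a query plan.

```python
# Table path and column names are copied from Query 1 above.
TABLE = "dfs.`/user/hive/warehouse/cv2_jan2015_parquet/`"

# Hypothetical labels; each maps to one WHERE clause to time on its own.
PREDICATES = {
    "not_like_only": "description not like '%a%'",
    "like_only": "description like '%qwqw%'",
    "both_with_or": "description not like '%a%' or description like '%qwqw%'",
}


def build_query(predicate: str, explain: bool = False) -> str:
    """Return Query 1 with the given WHERE predicate, optionally as EXPLAIN PLAN."""
    sql = (
        "select yodlee_transaction_status, count(1) "
        f"from {TABLE} "
        f"where {predicate} "
        "group by yodlee_transaction_status"
    )
    return f"explain plan for {sql}" if explain else sql


if __name__ == "__main__":
    # Print each variant twice: once to run and time, once to inspect the plan.
    for label, predicate in PREDICATES.items():
        print(f"-- {label}")
        print(build_query(predicate) + ";")
        print(build_query(predicate, explain=True) + ";")
```

If the two single-predicate variants are fast and only the OR variant is slow, that would point at how the combined filter is planned or evaluated rather than at the Parquet scan itself.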


Also, we are planning a star-schema style structure in MapR-DB, in which all the tables will be binary tables. How will Drill perform when querying these tables?