
MapR-Drill Performance challenge

Question asked by AjayChaudhary on Aug 29, 2017
Latest reply on Sep 25, 2017 by jbates

We are seeing performance problems with MapR Drill when querying Parquet files stored in MapR-FS.

When we query the same files with Impala, performance is considerably better.

 

Query 1: 1 month of data
Impala: 11 s
MapR Drill: 118 s

 

select
yodlee_transaction_status,
count(1)
from
dfs.`/user/hive/warehouse/cv2_jan2015_parquet/`
where
description not like '%a%'
or description like '%qwqw%'
group by
yodlee_transaction_status;


Query 2: 2 years of data
Impala: 27 min
MapR Drill: 2 hr+

select
*
from
dfs.`/mpanel/impaladb/cv2/cv2_parquet/`
where
file_created_date >= '2014-04-01'
and file_created_date <= '2017-04-29'
and cobrand_id in ('10006164')
and yodlee_transaction_status <> 'D'
and currency_id = '152'
and description like '%dsyg%'
and description like '%sad|tiasxas|tick|adda|asda|df%'
order by random()
limit 200000;

 

The following parameters were changed

 

We have made only two tweaks to MapR Drill (heap size and spill directory).
All other parameters should be at their regular Drill settings.

 

--drill-env.sh
export DRILL_HEAP=${DRILL_HEAP:-"12G"}
export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"64G"}
export HADOOP_HOME=/opt/mapr/hadoop/hadoop-2.7.0
export DRILL_LOCALHOST=`hostname -i`

--drill-override.conf

drill.exec: {
cluster-id: "drillbits1",
zk.connect: "10.11.X.XX:5181,10.11.X.XX:5181,10.11.X.XX:5181"
sort.external.spill.directories: ["/tmp/"${DRILL_LOCALHOST}],
sort.external.spill.fs: "maprfs:///"
}

 

One finding: whenever a query uses the OR operator, MapR Drill performance slows down.
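
One way to see how Drill treats the OR predicate is to look at the query plan with EXPLAIN PLAN FOR. The following is only a sketch that reuses Query 1 unchanged; the path and column names are the ones from that query, and no plan output is shown here.

```sql
-- Ask Drill for the physical plan of Query 1 without running it.
-- The filter with OR appears in the plan, so it can be compared
-- against the plan for the same query written without the OR branch.
explain plan for
select
  yodlee_transaction_status,
  count(1)
from
  dfs.`/user/hive/warehouse/cv2_jan2015_parquet/`
where
  description not like '%a%'
  or description like '%qwqw%'
group by
  yodlee_transaction_status;
```

Comparing this plan with the one for the query with only the first predicate may show whether the OR changes the scan or filter stages.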

 

Also, we are planning to build a star-schema structure in MapR-DB. All the tables will be binary tables.

What performance can we expect across all of these tables?

 

Regards,

Balaji
