Impala scan MapR-FS slow

Question asked by Jesse on May 25, 2017
Latest reply on May 30, 2017 by maprcommunity

Hi guys.

I recently installed Impala on a 3-node MapR cluster. When I run a simple query, the performance is not as good as with Impala on HDFS. Here is the query:

SELECT *
FROM ft_test, ft_wafer
WHERE ft_test_parquet.id = ft_wafer_parquet.id
  AND month = 1
  AND day = 8
  AND param = 2913;

It took about 3 s on MapR-FS. The same query against HDFS takes less than 1 s for a 30 GB table.

What I have already done: stored the tables as Parquet, partitioned them, and ran COMPUTE STATS.
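
For context, a minimal sketch of how those steps can be double-checked in Impala SQL. The table names are taken from the FROM clause of the query above; that month and day are the partition columns is an assumption, not something stated in the post.

```sql
-- Gather table and column statistics for both sides of the join.
COMPUTE STATS ft_test;
COMPUTE STATS ft_wafer;

-- Confirm the stats were recorded (#Rows should not be -1).
SHOW TABLE STATS ft_test;
SHOW COLUMN STATS ft_test;

-- List partitions with their sizes and file formats.
-- Assumption: the tables are partitioned on month and day.
SHOW PARTITIONS ft_test;

-- Inspect the plan without running the query, to see how many
-- partitions and files each scan node expects to read.
EXPLAIN
SELECT *
FROM ft_test t
JOIN ft_wafer w ON t.id = w.id
WHERE month = 1
  AND day = 8
  AND param = 2913;
```

If the scan nodes in the EXPLAIN output list far more partitions than the filters should select, the pruning is not taking effect and the scan time would be expected to grow accordingly.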

I attached the query profile. From what I can see, most of the time was spent in the HDFS scan node, which is odd because that is usually not the time-consuming part. Please take a look; any input would be appreciated. Thanks.
