
How to improve query plan time for views against parquet files?

Question asked by gesgeorge on Apr 12, 2018
Latest reply on Apr 12, 2018 by gesgeorge

My question is related to my previous question, "Does Drill understand Parquet partitions created by Spark?"

Since it was suggested that views are the best way to query partitioned Parquet files created by Spark, I set up views to make it easy for users to query these Parquet files.
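For readers unfamiliar with the setup, here is a minimal sketch of the kind of view involved. The Python below only builds the SQL text of a Drill CREATE VIEW statement. It assumes a single partition level written by Spark as key=value directories, which Drill exposes through its implicit dir0 column as the raw directory name (for example 'event_date=2018-04-12'). It then strips the "key=" prefix with Drill's STRPOS and SUBSTR functions. The helper name, workspace, path, and column name are hypothetical placeholders, not taken from the original post.

```python
def build_partitioned_view(view_name: str, table_path: str, partition_key: str) -> str:
    """Return a Drill CREATE VIEW statement exposing a Spark key=value partition as a column.

    Hypothetical helper: assumes one partition level, so the directory name
    (e.g. 'event_date=2018-04-12') appears in Drill's implicit dir0 column.
    """
    # dir0 holds the whole directory name; keep only the text after the first '='.
    partition_expr = "SUBSTR(t.dir0, STRPOS(t.dir0, '=') + 1)"
    return (
        f"CREATE VIEW {view_name} AS\n"
        f"SELECT t.*, {partition_expr} AS `{partition_key}`\n"
        f"FROM {table_path} t"
    )
```

For example, build_partitioned_view("dfs.tmp.events_v", "dfs.`/data/events`", "event_date") produces a statement that can be run in a writable workspace such as dfs.tmp. The extracted value is still a string; a CAST would be needed to get a typed column.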

What I found was that Drill takes a long time to generate query plans for queries against these views. In one example, EXPLAIN PLAN FOR <query> took more than 2.5 minutes.
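To reproduce this kind of measurement outside the Drill shell, one option is a minimal sketch that submits EXPLAIN PLAN FOR through Drill's REST endpoint /query.json, which accepts a JSON body with "queryType" and "query" fields. The base URL is an assumption (8047 is Drill's default web port). The measured time is wall-clock time for the whole HTTP round trip, so it includes planning plus network and serialization overhead, not planning alone.

```python
import json
import time
import urllib.request


def build_explain_request(base_url: str, query: str) -> urllib.request.Request:
    """Wrap `query` in EXPLAIN PLAN FOR and package it for Drill's REST /query.json endpoint."""
    body = json.dumps({"queryType": "SQL", "query": f"EXPLAIN PLAN FOR {query}"})
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/query.json",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def time_plan(request: urllib.request.Request, timeout: float = 600.0) -> float:
    """Send the request and return wall-clock seconds until Drill's full response is read."""
    start = time.perf_counter()
    with urllib.request.urlopen(request, timeout=timeout) as response:
        response.read()
    return time.perf_counter() - start
```

Against a running Drillbit, time_plan(build_explain_request("http://localhost:8047", "SELECT * FROM dfs.tmp.events_v LIMIT 10")) returns the elapsed seconds, where dfs.tmp.events_v is a hypothetical view name. The long timeout leaves room for multi-minute plans like the one described above.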

I think there are optimizations I can make to the underlying Parquet files themselves that would help, but I suspect I would still have this issue even then.


This suggests to me that the views probably don't have any metadata stored for them (and frankly, I didn't expect them to). Is there some way for me to get Drill to generate metadata for these files? For example, can I run a CREATE TABLE against the existing Parquet files so that Drill can store metadata for them and improve query time?
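For reference, Drill does have a Parquet metadata cache. Running REFRESH TABLE METADATA on a Parquet directory writes cache files that the planner can read instead of opening every file footer during planning. Whether it removes the delay in this particular setup is not established here. The sketch below assumes that the cache has to be built on the directories the view reads rather than on the view itself. The helper name, workspace, and paths are hypothetical.

```python
from typing import List


def refresh_metadata_statements(workspace: str, paths: List[str]) -> List[str]:
    """Build one REFRESH TABLE METADATA statement per underlying Parquet directory.

    Hypothetical helper: the cache is built for the directories a view reads,
    not for the view definition itself.
    """
    statements = []
    for path in paths:
        # Backticks let Drill accept paths containing '/' or '=' (Spark partition dirs).
        statements.append(f"REFRESH TABLE METADATA {workspace}.`{path}`")
    return statements
```

Each generated statement can be run in sqlline or through the REST API. The cache generally needs refreshing again after new Parquet files are written, or Drill may spend planning time rebuilding it.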
