See all Apache Drill Best Practices from the MapR Drill Team
You can determine whether partition pruning is applied or not based on the query plan.
Run EXPLAIN PLAN on your query. The SCAN portion of the explain output will indicate the number of files that need to be scanned. If partition pruning has occurred, the number of files will be a subset of the total files in the table.
In the example below where a scan is done, filtering first on a single year and 2 months within that year, only 2 files are scanned, as noted by “numFiles = 2”. The 2 files scanned are listed in the “files” list.
> explain plan for select * from dfs.drill.`part1` where dir0='2015' and (dir1 >= '02' and dir1 <= '03');+------+------+| text | json |+------+------+| 00-00 Screen00-01 Project(*=[$0])00-02 Project(*=[$0])00-03 Scan(groupscan=[EasyGroupScan [selectionRoot=/drill/part1, numFiles=2, columns=[`*`], files=[maprfs:/drill/part1/2015/02/02.csv, maprfs:/drill/part1/2015/03/03.csv]]])
Another way to determine if partition pruning is happening is to check foreman drillbit.log.
For example, I have a parquet table with 2 partitions, and I am doing a partition pruning query to only scan one partition.
> select * from dfs.tmp.prune where id=2;
| id |
| 2 |
1 row selected (0.27 seconds)
From the foreman drillbit.log for this query, below information will be printed:
INFO o.a.d.e.p.l.partition.PruneScanRule - Beginning partition pruning, pruning class: org.apache.drill.exec.planner.logical.partition.PruneScanRule$DirPruneScanFilterOnScanRule
INFO o.a.d.e.p.l.partition.PruneScanRule - No partition columns are projected from the scan..continue. Total pruning elapsed time: 2 ms
INFO o.a.d.e.p.l.partition.PruneScanRule - Beginning partition pruning, pruning class: org.apache.drill.exec.planner.logical.partition.ParquetPruneScanRule$2
INFO o.a.d.e.p.l.partition.PruneScanRule - Total elapsed time to build and analyze filter tree: 1 ms
INFO o.a.d.e.p.l.partition.PruneScanRule - Elapsed time to populate partitioning column vectors: 0 ms within batchIndex: 0
INFO o.a.d.e.p.l.partition.PruneScanRule - Elapsed time in interpreter evaluation: 18 ms within batchIndex: 0
INFO o.a.d.e.p.l.partition.PruneScanRule - Pruned 2 partitions down to 1
Above log clearly shows the how many partitions are pruned and the elapsed time for partition pruning.
Note: above log is based on Drill 1.6 GA version.
Retrieving data ...