How to utilize Drill native parquet reader to optimize queries on Hive parquet table

Document created by Hao Zhu Employee on Feb 7, 2016
Version 1Show Document
  • View in full screen mode

Author: Hao Zhu

 

Original Publication Date: October 27, 2015

 

Environment :

Drill 1.2

Goal:

Starting from Drill 1.2 release, Drill can utilize native parquet reader to optimize queries on Hive parquet table. Previously Drill has to use Hive parquet reader which causes huge java heap usage instead of utilizing java direct memory.

 

This article shows how to enable this feature and how to confirm that from viewing SQL explain plan.

Solution:

This behavior is controlled by parameter store.hive.optimize_scan_with_native_readers which is off by default in Drill 1.2 version.

0: jdbc:drill:> select * from sys.options where name='store.hive.optimize_scan_with_native_readers'; +-----------------------------------------------+----------+---------+----------+----------+-------------+-----------+------------+ |                     name                      |   kind   |  type   |  status  | num_val  | string_val  | bool_val  | float_val  | +-----------------------------------------------+----------+---------+----------+----------+-------------+-----------+------------+ | store.hive.optimize_scan_with_native_readers  | BOOLEAN  | SYSTEM  | DEFAULT  | null     | null        | false     | null       | +-----------------------------------------------+----------+---------+----------+----------+-------------+-----------+------------+ 1 row selected (0.31 seconds)

We can enable it at system level or session level.

 

For example, before enabling this feature, the explain plan will show "HiveScan" for queries on Hive parquet table.

explain plan for select * from h1_passwords_parquet; Scan(groupscan=[HiveScan [table=Table(dbName:default, tableName:h1_passwords_parquet)

 

If we enable it at session level, the explain plan will show "HiveDrillNativeParquetScan" instead:

alter session set `store.hive.optimize_scan_with_native_readers`=true; explain plan for select * from h1_passwords_parquet; Scan(groupscan=[HiveDrillNativeParquetScan [table=Table(dbName:default, tableName:h1_passwords_parquet)

Attachments

    Outcomes