AnsweredAssumed Answered

What are the best pratices for apache drill with jdbc storage ?

Question asked by mlamy on May 30, 2016
Latest reply on Oct 17, 2016 by tushu1232

Hello, MAPR community

 

 

We've been playing with Apache Drill and I notice that planning phase in Apache Drill is taking a lot of time.

 

We did some tests with and without MapR sandBox always with same results (we also did some tests with Apache Drill installed locally and remotely with a mysql storage and mssql storage). We think Apache Drill is a great product but unfortunately we think we can't get it work properly.

 

We'll great apreciate any input that can help us (we're out of ideas).

 

We're also sorry to duplicate this question. We also posted this question in Apache Drill mailling list but diidn't get any answer yet.

 

We took a look at drill log and notice that most of the time was during the execution of VOLCANO:Logical Planning (no pruning or join) and VOLCANO:Physical Planning.

 

We're using Apache Drill 1.6.0. We use drill without changing any system options.

 

Queries don't take long to execute after the plan is done (we saw the profiles and executed the physical plans).

 

Simple queries with a couple a fields from one table only, usually takes about 0,5s in planning phase, more complex queries takes a lot of time in planning usually taking more than 10s or minutes. Same queries with mysql or mssql jdbc driver usually takes less than one second to execute.

 

Can you please give us some suggestions how to debug a little further this issue, and give us some best pratices so we can understand what's happening ?

 

Thank you.

 

Miguel

 

P.S. Sorry for the revious post on discussions :-(

 

----------------------------------------------------------------------------------------------------------------------------

Example 1:

 

 

Complex query taking about 12s/13s in planning phase and about 3s on execution phase

 

Line 5: 2016-05-26 08:11:07,375

 

[28b8edf3-b035-6a01-4193-5295de7a16f0:foreman] DEBUG

 

o.a.drill.exec.store.SchemaFactory - Took 37 ms to register schemas.

 

  Line 131: 2016-05-26 08:11:09,841

 

[28b8edf3-b035-6a01-4193-5295de7a16f0:foreman] DEBUG

 

o.a.d.e.p.s.h.DefaultSqlHandler - HEP:Window Function rewrites (221ms):

 

  Line 381: 2016-05-26 08:11:10,171

 

[28b8edf3-b035-6a01-4193-5295de7a16f0:foreman] DEBUG

 

o.a.d.e.p.s.h.DefaultSqlHandler - HEP_BOTTOM_UP:Directory Prune Planning

 

(166ms):

 

  Line 506: 2016-05-26 08:11:15,299

 

[28b8edf3-b035-6a01-4193-5295de7a16f0:foreman] DEBUG

 

o.a.d.e.p.s.h.DefaultSqlHandler - VOLCANO:LOGICAL PLANNING (NO PRUNING

 

OR JOIN). (5126MS):

 

  Line 641: 2016-05-26 08:11:15,386

 

[28b8edf3-b035-6a01-4193-5295de7a16f0:foreman] DEBUG

 

o.a.d.e.p.s.h.DefaultSqlHandler - HEP_BOTTOM_UP:Partition Prune Planning

 

(86ms):

 

  Line 776: 2016-05-26 08:11:15,893

 

[28b8edf3-b035-6a01-4193-5295de7a16f0:foreman] DEBUG

 

o.a.d.e.p.s.h.DefaultSqlHandler - HEP_BOTTOM_UP:LOPT Join Planning

 

(506ms):

 

  Line 911: 2016-05-26 08:11:15,981

 

[28b8edf3-b035-6a01-4193-5295de7a16f0:foreman] DEBUG

 

o.a.d.e.p.s.h.DefaultSqlHandler - HEP_BOTTOM_UP:Convert SUM to $SUM0

 

(87ms):

 

  Line 1046: 2016-05-26 08:11:21,561

 

[28b8edf3-b035-6a01-4193-5295de7a16f0:foreman] DEBUG

 

o.a.d.e.p.s.h.DEFAULTSQLHANDLER - VOLCANO:PHYSICAL PLANNING (5579MS):

 

  Line 2145: 2016-05-26 08:11:21,742

 

[28b8edf3-b035-6a01-4193-5295de7a16f0:frag:0:0] DEBUG

 

o.a.d.exec.physical.impl.ImplCreator - Took 12 ms to create RecordBatch

 

tree

 

-----------------------------------------------------------------------------------------------------------------------------

 

Example 2 simple query:

 

 

Results with EXPLAIN PLAN FOR

 

EXPLAIN PLAN FOR SELECT e.ChvP, e.DesigEstrutura, e.CodCompiEstrutura,

 

e.ChvEEstrutura FROM siag_enidh.dbo.Estrutura AS e INNER JOIN

 

siag_enidh.dbo.Categoria AS catg ON catg.ChvP = e.ChvECategoria AND

 

catg.SiglaCategoria = 'CPBSQ11';

 

 

The query execution with physical plan takes about 0,02s. The plan takes more that 0,5s to execute as you see in detail log:

 

Line 8: 2016-05-26 07:51:34,214

 

[28b8f28b-327e-6b3e-866e-df640296211a:foreman] DEBUG

 

o.a.drill.exec.store.SchemaFactory - Took 600 ms to register schemas.

 

  Line 15: 2016-05-26 07:51:35,563

 

[28b8f28b-327e-6b3e-866e-df640296211a:foreman] DEBUG

 

o.a.d.e.p.s.h.DefaultSqlHandler - HEP:Window Function rewrites (42ms):

 

  Line 27: 2016-05-26 07:51:35,585

 

[28b8f28b-327e-6b3e-866e-df640296211a:foreman] DEBUG

 

o.a.d.e.p.s.h.DefaultSqlHandler - HEP_BOTTOM_UP:Directory Prune Planning

 

(14ms):

 

  Line 33: 2016-05-26 07:51:36,104

 

[28b8f28b-327e-6b3e-866e-df640296211a:foreman] DEBUG

 

o.a.d.e.p.s.h.DefaultSqlHandler - VOLCANO:LOGICAL PLANNING (NO PRUNING

 

OR JOIN). (517MS):

 

  Line 41: 2016-05-26 07:51:36,115

 

[28b8f28b-327e-6b3e-866e-df640296211a:foreman] DEBUG

 

o.a.d.e.p.s.h.DefaultSqlHandler - HEP_BOTTOM_UP:Partition Prune Planning

 

(10ms):

 

  Line 49: 2016-05-26 07:51:36,121

 

[28b8f28b-327e-6b3e-866e-df640296211a:foreman] DEBUG

 

o.a.d.e.p.s.h.DefaultSqlHandler - HEP_BOTTOM_UP:LOPT Join Planning

 

(5ms):

 

  Line 57: 2016-05-26 07:51:36,126

 

[28b8f28b-327e-6b3e-866e-df640296211a:foreman] DEBUG

 

o.a.d.e.p.s.h.DefaultSqlHandler - HEP_BOTTOM_UP:Convert SUM to $SUM0

 

(4ms):

 

  Line 65: 2016-05-26 07:51:36,255

 

[28b8f28b-327e-6b3e-866e-df640296211a:foreman] DEBUG

 

o.a.d.e.p.s.h.DefaultSqlHandler - VOLCANO:Physical Planning (124ms):

 

  Line 220: 2016-05-26 07:51:36,682

 

[28b8f28b-327e-6b3e-866e-df640296211a:frag:0:0] DEBUG

 

o.a.d.exec.physical.impl.ImplCreator - Took 150 ms to create RecordBatch

 

tree

 

2016-05-26 07:51:36,104 [28b8f28b-327e-6b3e-866e-df640296211a:foreman]

 

DEBUG o.a.d.e.p.s.h.DefaultSqlHandler - VOLCANO:Logical Planning (no

 

pruning or join). (517ms):

 

DrillProjectRel(ChvP=[$0], DesigEstrutura=[$12],

 

CodCompiEstrutura=[$11], ChvEEstrutura=[$8]): rowcount = 100.0,

 

cumulative cost = {415.0 rows, 402.0 cpu, 0.0 io, 0.0 network, 0.0

 

memory}, id = 486

 

  JdbcDrel: rowcount = 100.0, cumulative cost = {415.0 rows, 402.0 cpu,

 

0.0 io, 0.0 network, 0.0 memory}, id = 485

 

  JdbcJoin(condition=[=($62, $7)], joinType=[inner]): rowcount =

 

100.0, cumulative cost = {315.0 rows, 302.0 cpu, 0.0 io, 0.0 network,

 

0.0 memory}, id = 484

 

  JdbcTableScan(table=[[siag_enidh, dbo, Estrutura]]): rowcount =

 

100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io, 0.0 network,

 

0.0 memory}, id = 5

 

  JdbcFilter(condition=[=($11, 'CPBSQ11')]): rowcount = 15.0,

 

cumulative cost = {115.0 rows, 201.0 cpu, 0.0 io, 0.0 network, 0.0

 

memory}, id = 483

 

  JdbcTableScan(table=[[siag_enidh, dbo, Categoria]]): rowcount =

 

100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io, 0.0 network,

 

0.0 memory}, id = 6

Outcomes