
Apache Drill Aggregations

Question asked by sgudavalli on Jul 24, 2015
Latest reply on Jul 27, 2015 by parth
Hi,

I am doing a POC on MapR-DB and Apache Drill as a SQL-on-Hadoop reporting interface.
One of our use cases is aggregation, for example, finding all bookings in a hotel.

In the scenario below, the first query retrieves all matching bookings and the second counts them.

**Timings**
Retrieving all bookings takes 1.12 seconds,
whereas counting the bookings takes 16 seconds.

**Queries**

select * from dfs.`/user/mapr/booking` t where row_key like 'A001_%' and t.p.chkIn <= 20120103 and t.p.chkOut >= 20120104

select count(*) from dfs.`/user/mapr/booking` t where row_key like 'A001_%' and t.p.chkIn <= 20120103 and t.p.chkOut >= 20120104
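One way to see where the two queries diverge is to compare their plans. The statement below is only a sketch: it wraps the count query from above in Drill's `EXPLAIN PLAN FOR`, and the same wrapper can be put around the `select *` query for comparison. No plan output is shown here because none was captured in this post.

```sql
-- Sketch: print the physical plan for the slow count query.
-- Run the same statement with "select *" in place of "select count(*)"
-- and compare the scan and filter operators in the two plans.
EXPLAIN PLAN FOR
select count(*) from dfs.`/user/mapr/booking` t
where row_key like 'A001_%' and t.p.chkIn <= 20120103 and t.p.chkOut >= 20120104;
```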

This is crazy. How come aggregation is so slow in Drill? I tried both with and without multiphase aggregation.
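For anyone reproducing this, multiphase aggregation can be switched per session with a planner option. The snippet is a sketch that assumes the toggle in question is Drill's `planner.enable_multiphase_agg` option; the post does not say exactly how it was changed.

```sql
-- Assumption: the setting toggled was planner.enable_multiphase_agg.
-- Disable multiphase aggregation for the current session, then rerun the count query.
ALTER SESSION SET `planner.enable_multiphase_agg` = false;

-- Re-enable it to time the other case.
ALTER SESSION SET `planner.enable_multiphase_agg` = true;
```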

Row key design: HOTEL_XXXXXXX, so all bookings for a hotel are stored close to each other.

Any ideas?

Regards
Shiv


