Why does the Foreman crash during the planning stages of a query?

Question asked by futuredriller on Nov 29, 2017
Hello All,


I'm putting MicroStrategy on top of ZooKeeper, which points to a Drill cluster.  The queries MicroStrategy generates are pretty long, but I've been able to execute them successfully.  However, one that used to execute no longer does.  The query gets assigned a Foreman, goes into planning, and then just disappears, leaving the Foreman unusable and dropped from the ZooKeeper quorum.  (The quorum runs on a small machine.  Not sure if that's a problem.)


I run the same query on my laptop (embedded mode), pointing to the same data source, which happens to be S3, and although it takes 5 minutes, I get a query plan and it starts executing.


Here is what I've tried so far, including some settings, on a machine that has 36 cores and 72 GB of memory:

1.  I've set DRILL_HEAP (13 GB) and DRILL_MAX_DIRECT_MEMORY (51 GB) using the sizing formulas for the Drillbit's hardware.

2.  I've set the S3 max connection limit to 10,000 to outrun the connection pooling error.

3.  planner.memory.max_query_memory_per_node (31 GB) is set to specification for low concurrency.

4.  planner.width.max_per_node (20) is set to specification for low concurrency.

5.  All files in S3 are Parquet, with the metadata file present, and the fact table is partitioned by date (5 billion+ rows).

6.  I set planner.in_subquery_threshold = 100 so I get partition elimination on larger IN clauses.

7.  Because of point 6, I increased planner.memory_limit to 512 MB and then 1 GB, without success.

8.  I attached jconsole to the Foreman in use and watched the heap memory climb very close to its upper limit.  (Running out of memory during planning?)

9.  I do not have access to the log because the Foreman drops out of the quorum after about 7 minutes.
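
In case it helps anyone reproduce the setup, here is a rough sketch of how the numbers in the list above translate into option values. It assumes the memory options take values in bytes and that the options are applied with ALTER SYSTEM. The helper name alter_system_statements is just something I made up for illustration, and only the final 1 GB value for planner.memory_limit is shown.

```python
# Sketch only: converts the sizes quoted in the list above into bytes and
# renders them as Drill ALTER SYSTEM statements. The helper name is
# hypothetical; the values are the ones from the post.
GIB = 1024 ** 3

options = {
    "planner.memory.max_query_memory_per_node": 31 * GIB,  # point 3
    "planner.width.max_per_node": 20,                      # point 4
    "planner.in_subquery_threshold": 100,                  # point 6
    "planner.memory_limit": 1 * GIB,                       # point 7 (last value tried)
}


def alter_system_statements(opts):
    """Return one ALTER SYSTEM SET statement per option, in insertion order."""
    return [f"ALTER SYSTEM SET `{name}` = {value};" for name, value in opts.items()]


# Heap plus direct memory from point 1, compared with the 72 GB of physical RAM.
heap_gb, direct_gb, ram_gb = 13, 51, 72
headroom_gb = ram_gb - (heap_gb + direct_gb)

for stmt in alter_system_statements(options):
    print(stmt)
print(f"Memory left for the OS and other processes: {headroom_gb} GB")
```

The point of the last calculation is that the 13 GB heap is the only pool I saw filling up in jconsole during planning, while most of the machine's memory is reserved as direct memory.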


I think it's odd that my laptop running in embedded mode was able to obtain a plan, but a 6-machine cluster with 36 cores and 72 GB each was not able to.  Any guidance would be awesome!