See all drill best practice FAQs.
When joining two input streams (which may be a base table or result of previous joins), Drill will generate a plan that favors using the *estimated* smaller input as the right (build) side of the HashJoin. It builds an in-memory hash table on this input. It is possible that the planner may have under-estimated the row count especially when the input is the result of a previous join. For higher row counts the hash table may run out of memory.
It is a good practice to cross-check with the execution profile if the planner picked the actual smaller input as the build side of the hash join. For more information, see https://drill.apache.org/docs/sort-based-and-hash-based-memory-constrained-operators/
Retrieving data ...