I understand that to optimize Drill queries, the Parquet block size should match the file system's block size.
I'm not sure how many parts the Parquet output should optimally be split into, though. If I save a file in Spark and coalesce to 1,440 partitions (one sub-file per minute of a day), my performance is far worse than if I coalesce to, say, 40 (which ends up being ~1 GB per sub-file).
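To make the sizing arithmetic concrete, here is a minimal sketch of how I'm thinking about it (the `target_partitions` helper is hypothetical, not from any Drill or Spark API): pick a target sub-file size, then derive the coalesce count from the total dataset size.

```python
import math

def target_partitions(total_bytes, target_file_bytes=1 << 30):
    """Rough partition count so each Parquet sub-file lands near the
    target size (default ~1 GB). Purely illustrative arithmetic."""
    return max(1, math.ceil(total_bytes / target_file_bytes))

# A ~40 GB day of data at ~1 GB per sub-file -> 40 partitions
print(target_partitions(40 * (1 << 30)))  # 40
```

In Spark that count would then feed something like `df.coalesce(target_partitions(size)).write.parquet(path)` — but I don't know whether "~1 GB per file" is actually the right target for Drill, which is what I'm asking below.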
Is there a general target number I should aim for when coalescing Parquet files on my cluster for use in Drill (e.g., one file per node)?