I want to find out whether Drill on YARN would help with my use case. In what cases should I use Drill on YARN rather than Drill on Warden?
Drill offers many benefits when launched as a YARN managed application. You should consider deploying Drill as a YARN application if you
The YARN framework allocates memory to applications in the form of containers, each allocated a certain amount of memory. Applications then launch or release containers, as instructed by YARN, to use or free memory. When Drill on YARN is launched, the Drill cluster runs as a long-running YARN application with statically assigned memory: the amount of memory is configured manually in Drill's and YARN's configuration files before the Drill cluster is launched. Once the cluster is running, the amount of memory available to it can be increased or decreased dynamically by changing the number of running drillbits through the Drill on YARN web UI or client tools. When the number of drillbits needs to be increased, the Drill on YARN application master requests more containers, and YARN allocates them if resources are available. The application master can also, when asked, shrink the cluster by reducing the number of running drillbits. Note that when the cluster shrinks, all in-flight queries can be killed.
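The sizing behavior described above can be sketched as a small model. This is only an illustration, not Drill or YARN code: the class, its method names, and the memory figures are hypothetical, and it assumes each drillbit runs in one container of a fixed, preconfigured size.

```python
from dataclasses import dataclass


@dataclass
class DrillClusterModel:
    """Toy model of a Drill on YARN cluster (hypothetical, not a Drill API)."""

    container_mb: int   # memory per drillbit container, fixed before launch
    drillbits: int      # drillbits currently running
    yarn_free_mb: int   # memory YARN can still hand out (assumed figure)

    @property
    def cluster_mb(self) -> int:
        # Total cluster memory changes only with the number of drillbits.
        return self.container_mb * self.drillbits

    def resize(self, target: int) -> int:
        """Grow or shrink toward `target` drillbits and return the new count."""
        if target < 0:
            raise ValueError("target must be non-negative")
        if target > self.drillbits:
            # Growing: YARN grants only as many containers as fit in free memory.
            wanted = target - self.drillbits
            granted = min(wanted, self.yarn_free_mb // self.container_mb)
            self.drillbits += granted
            self.yarn_free_mb -= granted * self.container_mb
        else:
            # Shrinking: released containers return their memory to YARN.
            released = self.drillbits - target
            self.drillbits = target
            self.yarn_free_mb += released * self.container_mb
        return self.drillbits


cluster = DrillClusterModel(container_mb=8192, drillbits=3, yarn_free_mb=20480)
cluster.resize(6)  # asks for 3 more drillbits; only 2 containers fit
print(cluster.drillbits, cluster.cluster_mb, cluster.yarn_free_mb)
```

In this model a resize request can be only partly satisfied, which mirrors the point that YARN allocates containers only if resources are available.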
As a best practice, give Drill as much memory as it needs to handle the expected workload, because Drill tends to use as much memory as is available on the system, up to a configured maximum. When the memory requirements of a query or workload exceed this configured maximum, the cluster cannot handle additional queries until memory becomes available. Drill on YARN also tends to use as much CPU as is available. Limiting CPU utilization is not currently supported, but it is one of the things being worked on.
Drill on YARN can be used in a multi-tenant environment, where different clusters can be launched under different user IDs, optionally with different numbers of drillbits, providing data isolation. Clusters can either share metadata (storage plugins, views) or share nothing. Note that to share nothing, the ZooKeeper root (zk.root) must be set to a value other than the default (/drill). When running multiple Drill clusters on YARN, less memory is available per cluster than would be available to one larger cluster. Also, there is currently no limit on CPU utilization at the cluster level, so a heavy workload from one tenant can easily leave fewer CPU resources for all other tenants.
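To make the multi-tenant trade-offs concrete, here is a rough sketch. It is not Drill code: the function names, tenant names, and memory budget are made up, the even split of memory is a simplifying assumption, and the grouping rule only restates the point above that clusters with different zk.root values share nothing.

```python
from typing import Dict, List


def memory_per_cluster(total_drill_mb: int, clusters: int) -> int:
    """Evenly split a fixed Drill memory budget across clusters (assumption)."""
    if clusters < 1:
        raise ValueError("need at least one cluster")
    return total_drill_mb // clusters


def shared_metadata_groups(zk_roots: Dict[str, str]) -> Dict[str, List[str]]:
    """Group tenants by zk.root; tenants in the same group share metadata."""
    groups: Dict[str, List[str]] = {}
    for tenant, root in zk_roots.items():
        groups.setdefault(root, []).append(tenant)
    return groups


# Hypothetical tenants: two keep the default root, one uses its own.
tenants = {"sales": "/drill", "finance": "/drill", "research": "/drill-research"}

print(memory_per_cluster(65536, 1))             # one large cluster
print(memory_per_cluster(65536, len(tenants)))  # same budget, three clusters
print(shared_metadata_groups(tenants))
```

A tenant that appears alone in its group is the share-nothing case. Note that nothing in this sketch models CPU, since CPU usage is not limited per cluster.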
There are some limitations in the current release. In the current implementation, the Drill on YARN (DoY) application master (AM) is a single point of failure. When the AM crashes, or the node that runs it is lost, the cluster dies and no recovery is possible. Please see this link, which describes other limitations in detail, before deciding to use it.