I am using YARN for resource management in my MapR cluster. Most of the time, Spark jobs and Hive queries are submitted to it. Which scheduler is best suited to this workload?
The default scheduler for MapR is the Fair Scheduler. I believe Cloudera has also used the Fair Scheduler as its default since CDH4, while Hortonworks uses the Capacity Scheduler as its default. That makes it 2 out of 3 in favor of the Fair Scheduler.
The features of the two schedulers appear to be very comparable, and you can achieve the same outcomes in either one by turning different knobs. For example, if you set up the same hierarchical queue structure in both schedulers and set the per-queue scheduling policy in the Fair Scheduler to FIFO, the Fair Scheduler would effectively act like the Capacity Scheduler, and vice versa.
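As a rough illustration of the Fair Scheduler side of this, the sketch below generates a Fair Scheduler allocation file (commonly `fair-scheduler.xml`) in which each queue gets a `minResources` guarantee and uses the `fifo` scheduling policy. The parent queue `analytics`, the child queues `spark` and `hive`, and all resource figures are hypothetical; check the allocation-file format against the Fair Scheduler documentation for your Hadoop/MapR version before relying on it.

```python
import xml.etree.ElementTree as ET

# Hypothetical queue layout: name -> (min memory in MB, min vcores, weight).
QUEUES = {
    "spark": (40960, 20, 2.0),
    "hive": (20480, 10, 1.0),
}


def build_fair_allocations(queues, policy="fifo"):
    """Return a Fair Scheduler allocation file as an XML string.

    Every child queue sits under a hypothetical parent queue "analytics",
    gets a minResources guarantee, and uses `policy` to order its own apps.
    """
    allocations = ET.Element("allocations")
    parent = ET.SubElement(allocations, "queue", name="analytics")
    for name, (mem_mb, vcores, weight) in queues.items():
        queue = ET.SubElement(parent, "queue", name=name)
        ET.SubElement(queue, "minResources").text = f"{mem_mb} mb,{vcores} vcores"
        ET.SubElement(queue, "weight").text = str(weight)
        # FIFO inside the queue: apps in this queue are ordered by submission,
        # which is also the Capacity Scheduler's default within-queue ordering.
        ET.SubElement(queue, "schedulingPolicy").text = policy
    return ET.tostring(allocations, encoding="unicode")


if __name__ == "__main__":
    print(build_fair_allocations(QUEUES))
```

Generating the file from a small table like this keeps the queue hierarchy in one place, so the same layout can be reproduced if you later switch schedulers.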
My take: unless you have a specific reason to switch, I would err on the side of leaving the default scheduler as is and focus instead on the hierarchical queue structure and the minimum queue guarantees.
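If you stay on (or move to) the Capacity Scheduler instead, the same hierarchy and minimum guarantees are expressed as percentage capacities in `capacity-scheduler.xml`. The sketch below builds those properties for hypothetical `spark` and `hive` queues under a hypothetical `analytics` parent and renders them in Hadoop's `<configuration><property>` layout. The shares are illustrative only, and because this sets the root queue's child list, the stock `default` queue would no longer be defined.

```python
import xml.etree.ElementTree as ET

# Hypothetical guaranteed shares (percent of the parent queue) per child queue.
SHARES = {"spark": 60, "hive": 40}


def capacity_properties(shares, parent="analytics"):
    """Map a two-level queue hierarchy onto Capacity Scheduler properties.

    Sibling capacities must add up to 100; each value is the queue's
    guaranteed share of its parent's resources.
    """
    if sum(shares.values()) != 100:
        raise ValueError("sibling queue capacities must sum to 100")
    base = f"yarn.scheduler.capacity.root.{parent}"
    props = {
        "yarn.scheduler.capacity.root.queues": parent,
        f"{base}.capacity": "100",
        f"{base}.queues": ",".join(shares),
    }
    for name, pct in shares.items():
        props[f"{base}.{name}.capacity"] = str(pct)
    return props


def to_hadoop_xml(props):
    """Render properties in the <configuration><property> layout Hadoop uses."""
    conf = ET.Element("configuration")
    for key, value in props.items():
        prop = ET.SubElement(conf, "property")
        ET.SubElement(prop, "name").text = key
        ET.SubElement(prop, "value").text = value
    return ET.tostring(conf, encoding="unicode")


if __name__ == "__main__":
    print(to_hadoop_xml(capacity_properties(SHARES)))
```

The sum check mirrors the Capacity Scheduler's requirement that sibling capacities total 100%, which is easy to break when editing the XML by hand.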
Here is a reference with more information comparing the two schedulers: On what basis do I decide between Fair and Capacity Scheduler in YARN? - Quora
Naveen Gainedi: Thank you.