Author: Shashank Chandhok
Original Publication Date: May 13, 2015
The fair scheduler is a multi-user job scheduler that allows the jobs to be submitted across different pools and ensures that all jobs get fair resources as per the assigned resources and shares to the pools.
Enabling the fair scheduler
To enable the fair scheduler make the following changes in the mapred-site.xml
1. Define the mapred.fairscheduler.allocation.file property to conf/pools.xml in the mapred-site.xml file.
2. Define the mapred.jobtracker.taskScheduler property in the mapred-site.xml file.
3. Set the mapred.fairscheduler.assignmultiple property to true in the mapred-site.xml file.
4. Set the mapred.fairscheduler.eventlog.enabled property to false in the mapred-site.xml file.
Create allocations file
Create pools.xml and define different pools in the file as per the requirement. In current example, two pools have been created 'high' and 'low'.
The above file guarantees 5 map slots and 5 reduce slots to a job submitted to high pool. The pool also has a minimum share preemption timeout of 300 seconds (5 minutes), meaning that if it does not get its guaranteed share within this time, it is allowed to kill tasks from other pools to achieve its share.The pool has a cap of 25 map and 25 reduce slots, which means that once 25 tasks are running, no more will be scheduled even if the pool's fair share is higher. The 'high' has a weight of 2.0 which ensures that the number of shares for 'high' pool would be double that of 'low' pool.
Each pool file can have the following parameters.
pool elements, which configure each pool. These may contain the following sub-elements:
- minMaps and minReduces, to set the pool's minimum share of task slots.
- maxMaps and maxReduces, to set the pool's maximum concurrent task slots.
- schedulingMode, the pool's internal scheduling mode, which can be fair for fair sharing or fifo for first-in-first-out.
- maxRunningJobs, to limit the number of jobs from the pool to run at once (defaults to infinite).
- weight, to share the cluster non-proportionally with other pools. For example, a pool with weight 2.0 will get a 2x higher share than other pools. The default weight is 1.0.
- minSharePreemptionTimeout, the number of seconds the pool will wait before killing other pools' tasks if it is below its minimum share (defaults to infinite).
user elements, which may contain a maxRunningJobs element to limit jobs. Note that by default, there is a pool for each user, so per-user limits are not necessary.
poolMaxJobsDefault, which sets the default running job limit for any pools whose limit is not specified.
userMaxJobsDefault, which sets the default running job limit for any users whose limit is not specified.
defaultMinSharePreemptionTimeout, which sets the default minimum share preemption timeout for any pools where it is not specified.
fairSharePreemptionTimeout, which sets the preemption timeout used when jobs are below half their fair share.
defaultPoolSchedulingMode, which sets the default scheduling mode (fair or fifo) for pools whose mode is not specified.
Here we have 4 fair scheduler pools.
- ExpressLane - This pool is used for short jobs. There are certain parameters which are used to define a short job which are explained in http://doc.mapr.com/display/MapR/ExpressLane
- high - Custom pool defined in pools.xml for high priority jobs.
- low - Custom pool defined in pools.xml for low priority jobs.
- default - This is a default pool that exists in the fair scheduler by default.
Submit a job to low priority pool in the scheduler
# hadoop jar /opt/mapr/hadoop/hadoop-0.20.2/hadoop-0.20.2-dev-examples.jar teragen -Dmapred.fairscheduler.pool=low 10000 /low
Submit a job to high priority pool in the scheduler
# hadoop jar /opt/mapr/hadoop/hadoop-0.20.2/hadoop-0.20.2-dev-examples.jar teragen -Dmapred.fairscheduler.pool=high 10000 /high
As per the priority and the fair share of pools the jobs are preempted in the low priority pool and the high priority pool jobs are started.