Author: Mufeed Usman
Original Publication Date: April 14, 2015
When a MapReduce job runs, many mappers appear to be scheduled on data nodes where the input data is not local.
The goal is to increase the percentage of tasks that execute on nodes where their data is local.
The 'mapred.fairscheduler.locality.delay' parameter in 'mapred-site.xml' can be used to achieve this goal. Its value is set in milliseconds and specifies how long the JobTracker should wait before scheduling a non-local task.
For example, if the value is set to 60 seconds (60000 milliseconds), the JobTracker waits up to 60 seconds for a node that holds the task's data and gives priority to running the task there. If no local node is found within 60 seconds, the task is executed on a non-local node.
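As a minimal sketch of the 60-second example, the property entry in 'mapred-site.xml' might look like the following. The value 60000 is only illustrative and assumes the Fair Scheduler is the scheduler in use; a suitable delay depends on the cluster and workload.

```xml
<!-- mapred-site.xml: illustrative entry, not a recommended value -->
<property>
  <name>mapred.fairscheduler.locality.delay</name>
  <!-- Value is in milliseconds: 60000 ms = 60 seconds -->
  <value>60000</value>
</property>
```

A longer delay tends to favor data locality at the cost of tasks waiting longer to start, so the value is a trade-off rather than a fixed setting.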