Data Locality and Map Task Scheduling

Document created by mufeed Employee on Feb 13, 2016

Author: Mufeed Usman

 

Original Publication Date: April 14, 2015

 

Scenario:
When a MapReduce job runs, many map tasks are scheduled on nodes that do not hold the tasks' input data locally.

Goal:

To increase the percentage of map tasks that run on nodes where their input data is local.

 

Solution:
The 'mapred.fairscheduler.locality.delay' parameter in 'mapred-site.xml' can be used to achieve the goal above. Its value is set in milliseconds, and it applies when the Fair Scheduler is in use. The parameter specifies how long the JobTracker waits for a data-local node before it schedules a task on a non-local node.

For example, if the value is set to 60000 (60 seconds), the JobTracker waits up to 60 seconds for a node that holds the task's data and gives such nodes priority. If no local node becomes available within 60 seconds, the task is executed on a non-local node.
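
As a minimal sketch of the 60-second example, the property could be added to 'mapred-site.xml' on the JobTracker node as shown here. The value 60000 is only an illustration (60 seconds expressed in milliseconds), not a recommended setting, and the sketch assumes the Fair Scheduler is already configured as the task scheduler.

```xml
<!-- mapred-site.xml (JobTracker node) -->
<!-- Illustrative value only: 60000 ms = 60 seconds. Tune for your cluster. -->
<property>
  <name>mapred.fairscheduler.locality.delay</name>
  <value>60000</value>
</property>
```

A change to this file typically takes effect only after the JobTracker is restarted. A longer delay tends to improve locality at the cost of tasks waiting longer for a slot, so it is worth testing a few values against a representative job.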
