Resource Manager must be restarted when fair-scheduler.xml is first created.

Document created by Hao Zhu on Feb 18, 2016

Author: Hao Zhu

Original Publication Date: April 29, 2015

 

Env:

Hadoop 2.5.1

Symptom:

1. After fair-scheduler.xml is created for the first time, Resource Manager (RM) does not reload it every 10 seconds, even though the documentation at https://hadoop.apache.org/docs/r2.5.1/hadoop-yarn/hadoop-yarn-site/FairScheduler.html states:

The allocation file is reloaded every 10 seconds, allowing changes to be made on the fly.

2. The Resource Manager web UI (http://<RM host/IP>:8088/cluster/scheduler) does not show the changes made in fair-scheduler.xml.

Root Cause:

By default, yarn.scheduler.fair.allocation.file is set to fair-scheduler.xml. RM searches for the allocation file on the classpath (which typically includes the Hadoop conf directory) when it starts. If RM cannot find the file during startup, it prints the following warning message in the RM log:

WARN org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService: fair-scheduler.xml not found on the classpath.

After that, RM does not start the "AllocationFileLoaderService" thread, whose role is to monitor the allocation file for changes and reload it every 10 seconds. The code logic is in AllocationFileLoaderService.java: the function getAllocationFile() searches for the allocation file and, if it cannot find it, returns a null "allocFile".

 

if (url == null) {
  LOG.warn(allocFilePath + " not found on the classpath.");
  allocFile = null;
}
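The lookup above can be reproduced outside RM to check whether a given classpath would let the allocation file be found. The following is a minimal sketch, not Hadoop code: the class name AllocationFileCheck and its methods are hypothetical, and it only assumes that the file is resolved through the thread's context class loader, as the "not found on the classpath" warning suggests.

```java
import java.net.URL;

// Hypothetical helper, not part of Hadoop: checks whether a resource name
// (fair-scheduler.xml by default) is visible on the current classpath.
class AllocationFileCheck {

    /** Returns the resource URL if the context class loader can see it, else null. */
    static URL findOnClasspath(String resourceName) {
        return Thread.currentThread().getContextClassLoader().getResource(resourceName);
    }

    /** True when the resource would be found by a classpath lookup. */
    static boolean isOnClasspath(String resourceName) {
        return findOnClasspath(resourceName) != null;
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "fair-scheduler.xml";
        URL url = findOnClasspath(name);
        if (url == null) {
            System.out.println(name + " not found on the classpath.");
        } else {
            System.out.println(name + " found at " + url);
        }
    }
}
```

To be meaningful, the check should be run with the same classpath that RM uses (for example, with the Hadoop conf directory included); a different classpath may give a different answer.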

The function serviceInit() calls getAllocationFile() and creates the "AllocationFileLoaderService" reload thread only if the returned "allocFile" is not null.

public void serviceInit(Configuration conf) throws Exception {
  this.allocFile = getAllocationFile(conf);
  if (allocFile != null) {
    reloadThread = new Thread() {
      ...
    };
  }
  ...
}

In summary, if the allocation file does not exist on the classpath when RM starts, RM cannot reload it automatically.

Once the allocation file has been created on the classpath for the first time, RM must be restarted once.
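The consequence of making this decision only once, at initialization, can be illustrated with a simplified model. This is a sketch and not the Hadoop source: the class ReloadDecisionSketch is hypothetical, it checks a plain file path rather than the classpath, and it records a boolean flag instead of creating a real thread.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

// Simplified, hypothetical model of the init-time decision described above.
class ReloadDecisionSketch {
    private boolean reloadEnabled;

    /** Decides once whether reloading is enabled, based on the file's presence right now. */
    void init(File candidate) {
        File allocFile = candidate.exists() ? candidate : null;
        reloadEnabled = (allocFile != null);
    }

    boolean isReloadEnabled() {
        return reloadEnabled;
    }

    /** Returns {enabled at startup, enabled after file creation, enabled after "restart"}. */
    static boolean[] demo() {
        try {
            File dir = Files.createTempDirectory("rm-sketch").toFile();
            File file = new File(dir, "fair-scheduler.xml");

            ReloadDecisionSketch first = new ReloadDecisionSketch();
            first.init(file);                       // file does not exist yet
            boolean atStartup = first.isReloadEnabled();

            Files.write(file.toPath(), "<allocations/>".getBytes());
            boolean afterCreate = first.isReloadEnabled();   // init() is not re-run

            ReloadDecisionSketch restarted = new ReloadDecisionSketch();
            restarted.init(file);                   // models restarting RM
            boolean afterRestart = restarted.isReloadEnabled();

            file.delete();
            dir.delete();
            return new boolean[] {atStartup, afterCreate, afterRestart};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println("reload enabled at startup:      " + r[0]);
        System.out.println("reload enabled after creation:  " + r[1]);
        System.out.println("reload enabled after restart:   " + r[2]);
    }
}
```

In this model, creating the file after init() does not change the flag; only a new instance that runs init() again, which corresponds to restarting RM, sees the file.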

Solution:

Make sure the allocation file (fair-scheduler.xml by default) exists on the classpath before RM is started.

If the allocation file is created after RM has started, RM must be restarted to start the thread that reloads the allocation file automatically.

 

To confirm that RM reloads the allocation file every 10 seconds, make a change to the file and monitor the RM log. A message like the one below should appear:

INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService: Loading allocation file /opt/mapr/hadoop/hadoop-2.5.1/etc/hadoop/fair-scheduler.xml
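The log check can also be scripted. The sketch below is a hypothetical helper, not a Hadoop tool: it scans RM log lines for the two AllocationFileLoaderService messages quoted in this article and reports the most recent one. It assumes the message text is exactly as shown here for Hadoop 2.5.1; other versions may word the messages differently.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper: classifies RM log lines using the two messages quoted in this article.
class RmLogCheck {
    enum Status { RELOADING, NOT_FOUND, UNKNOWN }

    private static final String LOADER = "AllocationFileLoaderService";

    /** Returns the status implied by the last matching loader message, or UNKNOWN. */
    static Status classify(List<String> logLines) {
        Status status = Status.UNKNOWN;
        for (String line : logLines) {
            if (!line.contains(LOADER)) {
                continue;  // ignore lines from other components
            }
            if (line.contains("Loading allocation file")) {
                status = Status.RELOADING;
            } else if (line.contains("not found on the classpath")) {
                status = Status.NOT_FOUND;
            }
        }
        return status;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java RmLogCheck <path to RM log>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        System.out.println(classify(lines));
    }
}
```

A result of NOT_FOUND after the file has been created suggests that RM has not been restarted since the file appeared on the classpath.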
