Resource Manager Standby does not start

Document created by wade on Feb 27, 2016

Author: Jitendra Yadav

Original Publication Date: February 12, 2015

 

 

We generally see the exception below while configuring ResourceManager (RM) HA. The exception itself only says that an ip:port binding is already in use on the system, but the actual cause lies in the yarn-site.xml configuration on the standby RM node.

 

2015-01-19 10:12:31,350 INFO org.apache.hadoop.http.HttpServer2: HttpServer.start() threw a non Bind IOException
java.net.BindException: Port in use: <node>:8088
at org.apache.hadoop.http.HttpServer2.openListeners(HttpServer2.java:863)
at org.apache.hadoop.http.HttpServer2.start(HttpServer2.java:799)

 

As per the HA design, ZooKeeper is responsible for maintaining the RMs' HA state, and it holds the information about which RM node is active and which are standby. Apart from that, there is one property that must have a different value on each RM node, because it is what distinguishes the individual RMs in the cluster from one another.

 

Currently, configure.sh does not set the property below to a distinct value on each RM node, so it has to be changed manually on every RM node.

 

In yarn-site.xml

<property>
<name>yarn.resourcemanager.ha.id</name>
<value>rm1</value>
<source>yarn-site.xml</source>
</property>

 

To resolve the issue, give the above property a different value on each RM node.

 

Example: If you have 3 resource managers in the cluster

 

First RM Node:

 

<property>
<name>yarn.resourcemanager.ha.id</name>
<value>rm1</value>
<source>yarn-site.xml</source>
</property>

 

Second RM Node:

 

<property>
<name>yarn.resourcemanager.ha.id</name>
<value>rm2</value>
<source>yarn-site.xml</source>
</property>

 

Third RM Node:

 

<property>
<name>yarn.resourcemanager.ha.id</name>
<value>rm3</value>
<source>yarn-site.xml</source>
</property>
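
For context, each yarn.resourcemanager.ha.id value is expected to match one entry in the list of RM ids that is shared by all nodes. The fragment below is a minimal sketch of that shared part of yarn-site.xml, assuming the standard Apache Hadoop YARN HA property names. The hostnames (rm-node1.example.com and so on) are hypothetical placeholders, not values from the original cluster, and your generated file may contain additional HA properties.

```xml
<!-- Sketch only: this part is identical on every RM node. -->
<!-- Hostnames are hypothetical placeholders; substitute your own. -->
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<!-- The full list of RM ids; each node's ha.id must be one of these -->
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2,rm3</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>rm-node1.example.com</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>rm-node2.example.com</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm3</name>
<value>rm-node3.example.com</value>
</property>
```

With a layout like this, only yarn.resourcemanager.ha.id differs from node to node, and on each node it should name the id whose hostname entry points at that node.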
