Oozie job log errors "Call From xx.xx.xx.xx to 0.0.0.0:10020 failed on connection exception"

Document created by Hao Zhu on Feb 17, 2016

Author: Hao Zhu

Original Publication Date: December 30, 2014

Env:

MapR 4.0.1 + Oozie 4.0.1

Symptom:

1. The Oozie MapReduce job hangs with status "Running" in the Oozie console. You can also check the status of the Oozie job with the following command:

oozie job -info <oozie job id>

2. The Oozie job log, shown by the command "oozie job -log <oozie job id>", contains the following error:

Caused by: java.net.ConnectException: Call From xx.xx.xx.xx to 0.0.0.0:10020 failed on connection exception: java.net.ConnectException: Connection refused;
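When several jobs may be affected, checking their logs for this exact message can be scripted. The following is a minimal sketch, assuming the oozie CLI is on the PATH and the Oozie server URL is set through the OOZIE_URL environment variable (or passed with -oozie); the function name has_jhs_connect_error is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: scan Oozie job log text (read on stdin) for the JobHistoryServer
# connection error. Prints the matching line(s); returns 0 if the symptom
# is present and 1 otherwise.
has_jhs_connect_error() {
  grep -E 'Call From [^ ]+ to 0\.0\.0\.0:10020 failed on connection exception'
}

# Usage (hypothetical; needs a reachable Oozie server):
#   oozie job -log <oozie job id> | has_jhs_connect_error \
#     && echo "Oozie does not know the JobHistoryServer address"
```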

Root Cause:

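Port 10020 is the default port of the JobHistoryServer: the Hadoop default value of mapreduce.jobhistory.address is 0.0.0.0:10020. The error message shows that Oozie cannot find the real address of the JobHistoryServer and falls back to this default, which is why it tries to connect to "0.0.0.0" instead of the host that actually runs the JobHistoryServer.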
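To see whether a configuration file sets this property at all, the sketch below prints the value of mapreduce.jobhistory.address from a *-site.xml file, or the Hadoop default 0.0.0.0:10020 when the property is missing. It is a line-oriented awk match, not a real XML parser, so it assumes the simple one-tag-per-line layout used in this article; the function name jhs_address_from_conf is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the mapreduce.jobhistory.address value found in a Hadoop-style
# XML config file, or the Hadoop default (0.0.0.0:10020) if it is not set.
# Assumes <name> and <value> tags each sit on their own line (or share one).
jhs_address_from_conf() {
  local conf="$1" value
  [ -r "$conf" ] || { echo "cannot read $conf" >&2; return 2; }
  value=$(awk '
    /<name>[ \t]*mapreduce\.jobhistory\.address[ \t]*<\/name>/ { found = 1 }
    found && /<value>/ {
      # Strip everything outside the <value>...</value> pair.
      sub(/.*<value>[ \t]*/, "")
      sub(/[ \t]*<\/value>.*/, "")
      print
      exit
    }
  ' "$conf")
  echo "${value:-0.0.0.0:10020}"
}

# Usage (path placeholder kept from the article):
#   jhs_address_from_conf /opt/mapr/oozie/oozie-<version>/conf/hadoop-conf/core-site.xml
```

If this prints 0.0.0.0:10020 for the Oozie host's configuration, Oozie is using the default address described above.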

Solution:

1. Find out which server is running the JobHistoryServer service. For example:

[root@mapr4-1 ~]# clush -a jps -m|grep -i JobHistoryServer 
mapr4-3: 32295 JobHistoryServer

The output above shows that server "mapr4-3" is running the JobHistoryServer.
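On a larger cluster, the host name and PID can be pulled out of the clush output directly. A minimal sketch, assuming clush's default "node: output" line format and the short main-class names that jps prints without -l; find_jhs_hosts is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: read `clush -a jps -m` output on stdin ("node: PID MainClass args...")
# and print "node PID" for every JobHistoryServer process found.
find_jhs_hosts() {
  awk 'tolower($3) == "jobhistoryserver" {
         host = $1
         sub(/:$/, "", host)   # drop the trailing colon clush adds to the node name
         print host, $2
       }'
}

# Usage:
#   clush -a jps -m | find_jhs_hosts
# For the output shown above this prints "mapr4-3 32295".
```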

2. Confirm the port used by the JobHistoryServer. On the host found above, check mapred-site.xml for the port used by the JobHistoryServer. For MR1, mapred-site.xml is located in /opt/mapr/hadoop/hadoop-<version>/conf; for MR2/YARN, it is located in /opt/mapr/hadoop/hadoop-<version>/etc/hadoop. Look for the following property:

<property>
      <name>mapreduce.jobhistory.address</name>
      <value>xx.xx.xx.xx:10020</value>
</property>

Once you have the port, you can double-check that it is held by the JobHistoryServer process (PID 32295 in the example above) with the following command:

# lsof -i:10020 
COMMAND   PID USER   FD   TYPE   DEVICE SIZE/OFF NODE NAME
java    32295 mapr  212u  IPv4 34350382      0t0  TCP mapr4-3:10020 (LISTEN)
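The same check can be made non-interactive by comparing the PID that lsof reports as listening with the JobHistoryServer PID from step 1. A sketch, assuming lsof's default column layout (PID in the second column) and the "(LISTEN)" marker shown above; the function names listening_pids and port_held_by_pid are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: read `lsof -i:PORT` output on stdin and print the PID(s) in LISTEN state.
listening_pids() {
  # Skip the header line; keep only sockets marked (LISTEN).
  awk 'NR > 1 && /[(]LISTEN[)]/ { print $2 }' | sort -u
}

# Succeeds if the expected PID (e.g. the JobHistoryServer PID from jps)
# is listening on the port whose lsof output is piped in.
port_held_by_pid() {
  listening_pids | grep -qx -- "$1"
}

# Usage:
#   lsof -i:10020 | port_held_by_pid 32295 && echo "10020 is held by the JobHistoryServer"
```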

3. On the Oozie host, add the following property to /opt/mapr/oozie/oozie-<version>/conf/hadoop-conf/core-site.xml:

<property> 
      <name>mapreduce.jobhistory.address</name>
      <value>hostname:Port</value>
</property>

Replace "hostname" and "Port" with the hostname and port of the JobHistoryServer found in steps 1 and 2 (mapr4-3 and 10020 in this example).
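Adding the property can also be scripted. The sketch below (hypothetical helper add_jhs_address) keeps a .bak copy of the file, refuses to touch a file that already mentions the property, and assumes </configuration> sits on a line of its own; it is an illustration, not a MapR-provided tool.

```shell
#!/usr/bin/env bash
# Sketch: add mapreduce.jobhistory.address to a Hadoop-style *-site.xml file.
# The property block is inserted just before the first </configuration> line.
# The original file is kept as <file>.bak.
add_jhs_address() {
  local conf="$1" addr="$2"   # addr is host:port of the JobHistoryServer
  [ -w "$conf" ] || { echo "cannot write $conf" >&2; return 2; }
  grep -q '</configuration>' "$conf" || { echo "no </configuration> in $conf" >&2; return 2; }
  if grep -q 'mapreduce\.jobhistory\.address' "$conf"; then
    echo "mapreduce.jobhistory.address already present in $conf; edit it by hand" >&2
    return 1
  fi
  cp -p "$conf" "$conf.bak" || return 2
  # Copy the backup back into place, emitting the property block once.
  awk -v addr="$addr" '
    /<\/configuration>/ && !done {
      print "  <property>"
      print "    <name>mapreduce.jobhistory.address</name>"
      print "    <value>" addr "</value>"
      print "  </property>"
      done = 1
    }
    { print }
  ' "$conf.bak" > "$conf"
}

# Usage (path placeholder kept from the article):
#   add_jhs_address /opt/mapr/oozie/oozie-<version>/conf/hadoop-conf/core-site.xml mapr4-3:10020
```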

4. Re-run the job and monitor the Oozie job to make sure the error no longer appears.
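For the monitoring in step 4, a rough polling loop might look like the following. It assumes the `oozie job -info` output contains a line of the form `Status : RUNNING` for the workflow (verify this against your Oozie version) and that the oozie CLI reaches the server through OOZIE_URL. The names oozie_status and check_job are hypothetical, and the 30-second interval and default of 20 checks are arbitrary choices. Because the original symptom is a job stuck in "Running", the loop is bounded instead of waiting forever.

```shell
#!/usr/bin/env bash
# Sketch: print the value of the first "Status : VALUE" line in
# `oozie job -info` output read from stdin.
oozie_status() {
  awk -F':' '$1 ~ /^Status[ \t]*$/ { v = $2; gsub(/[ \t]/, "", v); print v; exit }'
}

# Poll a re-run job: fail as soon as the 0.0.0.0:10020 error shows up in its log,
# succeed once it leaves RUNNING, give up after a bounded number of checks.
check_job() {
  local job_id="$1" tries="${2:-20}" status i
  for ((i = 1; i <= tries; i++)); do
    if oozie job -log "$job_id" | grep -q 'to 0\.0\.0\.0:10020 failed'; then
      echo "still failing to reach the JobHistoryServer" >&2
      return 1
    fi
    status=$(oozie job -info "$job_id" | oozie_status)
    [ -n "$status" ] || { echo "could not read job status" >&2; return 2; }
    if [ "$status" != "RUNNING" ]; then
      echo "job finished with status: $status"
      return 0
    fi
    sleep 30
  done
  echo "job still RUNNING after $tries checks" >&2
  return 1
}

# Usage:
#   check_job <oozie job id>
```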
