
MapR 4.1 YARN: java.net.ConnectException to non-existent hosts

Question asked by dannyman on Jun 23, 2016
Latest reply on Oct 19, 2017 by cathy

An error I can consistently reproduce:

 

Command:

hadoop jar /opt/mapr/hadoop/hadoop-0.20.2/hadoop-0.20.2-dev-examples.jar pi 10000 10000

 

Transient Error:

16/06/23 11:40:37 INFO mapreduce.Job:  map 40% reduce 0%
16/06/23 11:40:37 INFO mapreduce.Job: Task Id : attempt_1466704145366_0005_m_000032_0, Status : FAILED
Container launch failed for container_1466704145366_0005_01_000035 : java.net.ConnectException: Call From c24-02-37/10.10.2.137 to c24-mtv-04-26.prod.qxxxxxxxxd.com:38969 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
    at org.apache.hadoop.ipc.Client.call(Client.java:1417)
    at org.apache.hadoop.ipc.Client.call(Client.java:1366)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
    at com.sun.proxy.$Proxy33.startContainers(Unknown Source)
    at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:96)
    at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$Container.launch(ContainerLauncherImpl.java:151)
    at org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl$EventProcessor.run(ContainerLauncherImpl.java:355)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:611)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:701)
    at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:371)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1465)
    at org.apache.hadoop.ipc.Client.call(Client.java:1384)
    ... 9 more

16/06/23 11:40:38 INFO mapreduce.Job:  map 41% reduce 0%
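
For reference, "Connection refused" generally means the name resolved and the TCP connect was actively rejected. That is different from a name that does not resolve or a host that never answers. Below is a minimal Python sketch for telling these cases apart from any node. `probe` is a hypothetical helper name, not part of Hadoop or MapR, and the 3-second timeout is an arbitrary assumption.

```python
import socket

def probe(host, port, timeout=3.0):
    """Classify the result of a plain TCP connect to host:port.

    Returns one of: "open", "refused", "unresolvable", "timeout", "error".
    """
    try:
        # create_connection resolves the name and attempts the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The address answered, but nothing is listening on that port.
        return "refused"
    except socket.gaierror:
        # The host name could not be resolved at all.
        return "unresolvable"
    except socket.timeout:
        # No answer within the timeout (host down or traffic dropped).
        return "timeout"
    except OSError:
        # Anything else, e.g. "no route to host".
        return "error"
```

Calling `probe("c24-mtv-04-26.prod.qxxxxxxxxd.com", 38969)` from one of the failing nodes would show whether the name still resolves there. The port is taken from the trace above and may differ between attempts.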

 

The failing calls can originate from any node, but the target is always c24-mtv-04-26. I have tried the following:

  • Disabled NodeManager on c24-mtv-04-26
  • Restarted Warden across the cluster
  • Stopped Warden on c24-mtv-04-26
  • Removed c24-mtv-04-26 entirely from the cluster
  • Restarted ResourceManager

 

Even so, the connection failures to this host, which is no longer in the cluster, persist.
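
One way to confirm that every failure points at the same host is to tally the call targets in the job's console output. Below is a minimal Python sketch, assuming the output has been saved as text. `tally_refused_targets` and the regular expression are illustrative, written against the message format in the trace above rather than any documented Hadoop log schema.

```python
import re
from collections import Counter

# Matches the wrapped ConnectException message, e.g.
# "Call From c24-02-37/10.10.2.137 to some-host:38969 failed on connection exception"
CALL_RE = re.compile(
    r"Call From (?P<src>[^/\s]+)/(?P<src_ip>\S+) "
    r"to (?P<dst>[^:\s]+):(?P<port>\d+) "
    r"failed on connection exception"
)

def tally_refused_targets(lines):
    """Count failed calls per destination host across the given log lines."""
    counts = Counter()
    for line in lines:
        match = CALL_RE.search(line)
        if match:
            counts[match.group("dst")] += 1
    return counts
```

Passing an open file of the saved output (for example `tally_refused_targets(open("job.log"))`, where `job.log` is a hypothetical file name) returns a `Counter` keyed by destination host. A single key would support the observation that only c24-mtv-04-26 is affected.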

 

Version: 4.1.0.31175.GA-38212

 

The same job runs without problems in classic mode.
