
Java bind exception for NodeManager on port 8040

Question asked by reza on Apr 10, 2016
Latest reply on Apr 20, 2016 by maprcommunity
Branched from an earlier discussion

The problem discussed here is similar to the one in the thread titled "NodeManager Localizer RPC cannot bind to port 8040"; however, the solution from that thread did not work here.

 

I am having problems with another cluster that I have set up. Its setup is very similar to that of the cluster in the previous post. The following alarms raised in the web control may be related:

 

NodeManager Down Alarm | 15m 3.5s ago | Can not determine if service: nodemanager is running. Check logs at: /opt/mapr/hadoop/hadoop-2.4.1/logs/
JobHistoryServer Down Alarm | 12m 6s ago | Can not determine if service: historyserver is running. Check logs at: /opt/mapr/hadoop/hadoop-2.4.1/logs

 

The output from the log files is as follows:

# tail -50 yarn-mapr-nodemanager-VMMapR02.log
        at org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:480)
        at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:742)
        at org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.createServer(RpcServerFactoryPBImpl.java:169)
        at org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:132)
        ... 13 more
2016-04-11 10:17:34,677 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NodeManager metrics system...
2016-04-11 10:17:34,678 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NodeManager metrics system stopped.
2016-04-11 10:17:34,678 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NodeManager metrics system shutdown complete.
2016-04-11 10:17:34,678 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.BindException: Problem binding to [0.0.0.0:8040] java.net.BindException: Address already in use; For more details see:  http://wiki.apache.org/hadoop/BindException
        at org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:139)
        at org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.getServer(HadoopYarnProtoRPC.java:65)
        at org.apache.hadoop.yarn.ipc.YarnRPC.getServer(YarnRPC.java:54)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.createServer(ResourceLocalizationService.java:284)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.serviceStart(ResourceLocalizationService.java:264)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceStart(ContainerManagerImpl.java:300)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStart(NodeManager.java:197)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:358)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:404)
Caused by: java.net.BindException: Problem binding to [0.0.0.0:8040] java.net.BindException: Address already in use; For more details see:  http://wiki.apache.org/hadoop/BindException
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:719)
        at org.apache.hadoop.ipc.Server.bind(Server.java:421)
        at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:563)
        at org.apache.hadoop.ipc.Server.<init>(Server.java:2153)
        at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:897)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server.<init>(ProtobufRpcEngine.java:505)
        at org.apache.hadoop.ipc.ProtobufRpcEngine.getServer(ProtobufRpcEngine.java:480)
        at org.apache.hadoop.ipc.RPC$Builder.build(RPC.java:742)
        at org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.createServer(RpcServerFactoryPBImpl.java:169)
        at org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:132)
        ... 13 more
2016-04-11 10:17:34,783 INFO org.apache.hadoop.yarn.server.nodemanager.NodeManager: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NodeManager at VMMapR02/172.27.127.206
************************************************************/
2016-04-11 10:17:39,592 WARN org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Node is out of sync with ResourceManager, hence resyncing.
2016-04-11 10:17:39,593 WARN org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Message from ResourceManager: Node not found resyncing VMMapR02.intechiq.com:54579
2016-04-11 10:17:39,608 INFO org.apache.hadoop.yarn.server.nodemanager.NodeManager: Notifying ContainerManager to block new container-requests
2016-04-11 10:17:39,608 INFO org.apache.hadoop.yarn.server.nodemanager.NodeManager: Cleaning up running containers on resync
2016-04-11 10:17:40,593 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registering with RM using finished containers :[]
2016-04-11 10:17:40,737 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager: Rolling master-key for container-tokens, got key with id -1046118172
2016-04-11 10:17:40,739 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM: Rolling master-key for nm-tokens, got key with id :2062168046
2016-04-11 10:17:40,742 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registered with ResourceManager as VMMapR02.intechiq.com:54579 with total resource of <memory:5120, vCores:2, disks:1.33>
2016-04-11 10:17:40,742 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Notifying ContainerManager to unblock new container-requests
2016-04-11 10:17:40,750 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: NodeStatusUpdater thread is reRegistered and restarted
# tail -50 yarn-mapr-resourcemanager-VMMapR02.log
2016-04-11 10:17:25,589 INFO org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager: Rolling master-key for amrm-tokens
2016-04-11 10:17:25,591 INFO org.apache.hadoop.yarn.server.resourcemanager.security.NMTokenSecretManagerInRM: Rolling master-key for nm-tokens
2016-04-11 10:17:25,591 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens
2016-04-11 10:17:25,592 INFO org.apache.hadoop.yarn.server.resourcemanager.security.RMDelegationTokenSecretManager: storing master key with keyID 1
2016-04-11 10:17:25,604 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Starting expired delegation token remover thread, tokenRemoverScanInterval=60 min(s)
2016-04-11 10:17:25,604 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens
2016-04-11 10:17:25,604 INFO org.apache.hadoop.yarn.server.resourcemanager.security.RMDelegationTokenSecretManager: storing master key with keyID 2
2016-04-11 10:17:25,809 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue
2016-04-11 10:17:25,864 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 8031
2016-04-11 10:17:25,894 INFO org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding protocol org.apache.hadoop.yarn.server.api.ResourceTrackerPB to the server
2016-04-11 10:17:25,894 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2016-04-11 10:17:25,896 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 8031: starting
2016-04-11 10:17:26,083 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue
2016-04-11 10:17:26,113 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 8030
2016-04-11 10:17:26,138 INFO org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding protocol org.apache.hadoop.yarn.api.ApplicationMasterProtocolPB to the server
2016-04-11 10:17:26,146 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2016-04-11 10:17:26,146 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 8030: starting
2016-04-11 10:17:26,278 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue
2016-04-11 10:17:26,285 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 8032
2016-04-11 10:17:26,293 INFO org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding protocol org.apache.hadoop.yarn.api.ApplicationClientProtocolPB to the server
2016-04-11 10:17:26,294 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2016-04-11 10:17:26,295 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 8032: starting
2016-04-11 10:17:26,440 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioned to active state
2016-04-11 10:17:26,895 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2016-04-11 10:17:26,920 INFO org.apache.hadoop.http.HttpRequestLog: Http request log for http.requests.resourcemanager is not defined
2016-04-11 10:17:26,966 INFO org.apache.hadoop.http.HttpServer2: Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
2016-04-11 10:17:26,969 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context cluster
2016-04-11 10:17:26,969 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context static
2016-04-11 10:17:26,970 INFO org.apache.hadoop.http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs
2016-04-11 10:17:26,985 INFO org.apache.hadoop.http.HttpServer2: adding path spec: /cluster/*
2016-04-11 10:17:26,985 INFO org.apache.hadoop.http.HttpServer2: adding path spec: /ws/*
2016-04-11 10:17:27,038 INFO org.apache.hadoop.http.HttpServer2: Jetty bound to port 8088
2016-04-11 10:17:27,047 INFO org.mortbay.log: jetty-6.1.26
2016-04-11 10:17:27,122 INFO org.mortbay.log: Extract jar:file:/opt/mapr/hadoop/hadoop-2.4.1/share/hadoop/yarn/hadoop-yarn-common-2.4.1-mapr-1408.jar!/webapps/cluster to /tmp/Jetty_VMMapR02_8088_cluster____unkmfx/webapp
2016-04-11 10:17:28,534 INFO org.mortbay.log: Started SelectChannelConnector@VMMapR02:8088
2016-04-11 10:17:28,545 INFO org.apache.hadoop.yarn.webapp.WebApps: Web app /cluster started at 8088
2016-04-11 10:17:29,405 INFO org.apache.hadoop.yarn.webapp.WebApps: Registered webapp guice modules
2016-04-11 10:17:29,456 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue class java.util.concurrent.LinkedBlockingQueue
2016-04-11 10:17:29,466 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 8033
2016-04-11 10:17:29,483 INFO org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding protocol org.apache.hadoop.yarn.server.api.ResourceManagerAdministrationProtocolPB to the server
2016-04-11 10:17:29,490 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2016-04-11 10:17:29,498 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 8033: starting
2016-04-11 10:17:39,589 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Node not found resyncing VMMapR02.intechiq.com:54579
2016-04-11 10:17:40,663 INFO org.apache.hadoop.yarn.util.RackResolver: Resolved VMMapR02.intechiq.com to /default-rack
2016-04-11 10:17:40,720 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: NodeManager from node VMMapR02.intechiq.com(cmPort: 54579 httpPort: 8042) registered with capability: <memory:5120, vCores:2, disks:1.33>, assigned nodeId VMMapR02.intechiq.com:54579
2016-04-11 10:17:40,723 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: VMMapR02.intechiq.com:54579 Node Transitioned from NEW to RUNNING
2016-04-11 10:17:40,734 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Added node VMMapR02.intechiq.com:54579 cluster capacity: <memory:5120, vCores:2, disks:1.33>
2016-04-11 10:17:40,772 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: Node VMMapR02.intechiq.com:54579 reported UNHEALTHY with details: 1/1 local-dirs turned bad: /tmp/hadoop-mapr/nm-local-dir;
2016-04-11 10:17:40,774 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: VMMapR02.intechiq.com:54579 Node Transitioned from RUNNING to UNHEALTHY
2016-04-11 10:17:40,774 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Removed node VMMapR02.intechiq.com:54579 cluster capacity: <memory:0, vCores:0, disks:0.0>
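The stack traces are long, so it can help to reduce a log to just the addresses that failed to bind. The following is a minimal sketch, not a MapR or Hadoop tool; the function name `bind_conflicts` is made up for illustration, and it assumes a grep that supports `-o` (GNU and BSD grep do).

```shell
# Hypothetical helper: list the distinct address:port pairs that a Hadoop
# daemon reported as "Problem binding to [...]" in its log.
bind_conflicts() {
  # $1 = path to the log file to scan
  grep -o 'Problem binding to \[[^]]*\]' "$1" \
    | sed -e 's/^Problem binding to \[//' -e 's/\]$//' \
    | sort -u
}
```

Run against the NodeManager log quoted above (`bind_conflicts yarn-mapr-nodemanager-VMMapR02.log`), this should print only `0.0.0.0:8040`, since that is the only address named in the BindException lines shown.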

 

The outputs of the commands requested previously are as follows:

# ls -ltr /tmp | grep pid

-rw-r--r--. 1   2000 mapr          5 Apr  5 10:46 yarn-mapr-nodemanager.pid

-rw-r--r--. 1   2000 mapr          5 Apr  5 10:46 yarn-mapr-resourcemanager.pid

-rw-r--r--. 1   2000 mapr          5 Apr  5 10:46 mapred-mapr-historyserver.pid

 

# ls -ltr /opt/mapr/pid | grep yarn

-rw-r--r--. 1 2000 mapr   5 Apr  5 10:46 yarn-mapr-nodemanager.pid

-rw-r--r--. 1 mapr shadow 6 Apr 11 10:16 yarn-mapr-resourcemanager.pid

 

# egrep 'YARN_PID_DIR | HADOOP_MAPRED_PID_DIR' /opt/mapr/conf/env.sh

export HADOOP_MAPRED_PID_DIR="${MAPR_HOME}/pid"
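One detail about the search above: the spaces around the `|` are part of the pattern, so it matches `YARN_PID_DIR` only when a space follows it and `HADOOP_MAPRED_PID_DIR` only when a space precedes it. A line such as `export YARN_PID_DIR="..."` would therefore not be reported even if it were present. Whether env.sh actually contains such a line cannot be told from this output. A sketch of the same search without the spaces, wrapped in a hypothetical function `pid_dir_settings`:

```shell
# Hypothetical helper: show PID-directory settings in a MapR env file.
# No spaces around the alternation bar, so each variable name matches on
# its own, e.g. in a line such as: export YARN_PID_DIR="..."
pid_dir_settings() {
  # $1 = env file to search; defaults to the path used in the post
  grep -E 'YARN_PID_DIR|HADOOP_MAPRED_PID_DIR' "${1:-/opt/mapr/conf/env.sh}"
}
```

Calling `pid_dir_settings` with no argument repeats the check on /opt/mapr/conf/env.sh.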

 

# jps

22593 CommandServer

29187 JobHistoryServer

21379 CLDB

20721 WardenMain

20494 QuorumPeerMain

22893 ResourceManager

19991 NodeManager

11117 Jps
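The pid files listed earlier are dated Apr 5, while jps reports a live NodeManager (19991), so it may be worth checking whether each pid file still points at a running process. A minimal sketch, assuming it is run as root (for a process owned by another user, `kill -0` fails for non-root callers even when the process exists); the function name `pid_file_status` is hypothetical:

```shell
# Hypothetical helper: report whether the PID stored in a pid file is alive.
pid_file_status() {
  # $1 = pid file, e.g. /opt/mapr/pid/yarn-mapr-nodemanager.pid
  pid=$(tr -d '[:space:]' < "$1")
  # kill -0 sends no signal; it only tests whether the process can be signalled
  if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
    echo "$1: pid $pid is running"
  else
    echo "$1: pid ${pid:-<empty>} is not running (stale file?)"
  fi
}
```

For example, `pid_file_status /opt/mapr/pid/yarn-mapr-nodemanager.pid` can be compared with the NodeManager PID that jps prints.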

I have tried restarting the warden and ZooKeeper, and also killing the process on port 8040, as suggested in the previous thread (NodeManager Localizer RPC cannot bind to port 8040).
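"Address already in use" means some other socket is already bound to the port, so before killing anything it helps to confirm which process is listening on 8040 and compare its PID with the jps output. A sketch, assuming `netstat` from net-tools is installed and run as root so the PID column is filled in; the function name `listener_on_port` is hypothetical:

```shell
# Hypothetical helper: read `netstat -tlnp` output on stdin and print the
# PID/program column for sockets listening on the given port.
listener_on_port() {
  # $1 = TCP port number; field 4 is the local address, field 6 the state
  awk -v port="$1" '$6 == "LISTEN" && $4 ~ (":" port "$") { print $7 }'
}

# Usage:
#   netstat -tlnp | listener_on_port 8040
```

If the PID printed there is that of a NodeManager still listed by jps, the failing start may simply be a second instance colliding with one that never exited.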
