Hello, we are running MapR 5.2 and numerous Spark jobs are failing because they attempt to connect to a non-existent node. The IP address in the errors is 0.0.218.212. This address shows up in several places: the ResourceManager (RM) log, the NodeManager log on the RM host, and the userlogs of the nodes acting as Application Master for the job when it crashes. For example:
/opt/mapr/hadoop/hadoop-2.7.0/logs/yarn-mapr-resourcemanager-hd19.sec.bnl.local.log:2016-11-03 12:15:49,632 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode: Assigned container container_e10_1478184432441_0018_01_000044 of capacity <memory:7168, vCores:2, disks:0.0> on host 0.0.218.212:43363, which has 2 containers, <memory:14336, vCores:4, disks:0.0> used and <memory:46853, vCores:2, disks:1.5> available after allocation
2016-10-25 17:20:16,959 INFO org.apache.hadoop.yarn.server.nodemanager.security.NMContainerTokenSecretManager: Updating node address : 0.0.218.212:37468
2016-10-25 17:21:52,986 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: ContainerManager started at /0.0.218.212:37468
2016-10-25 17:21:52,986 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: ContainerManager bound to 0.0.218.212/0.0.218.212:0
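If I read the Hadoop docs right, the NodeManager bind address comes from yarn.nodemanager.address (default ${yarn.nodemanager.hostname}:0), so I also checked whether either property is set explicitly and what the local hostname resolves to on the affected nodes. Roughly what I ran (the config path is my assumption, following the standard layout of MapR's bundled Hadoop 2.7.0 seen in the log path above):

# Look for an explicit yarn.nodemanager.address / yarn.nodemanager.hostname setting
grep -B2 -A2 'yarn.nodemanager' /opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop/yarn-site.xml

# Check what the local hostname resolves to
hostname -f
getent hosts $(hostname -f)

The idea is to rule out both an explicit misconfiguration and a bad hostname-to-IP mapping.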
maprcli node list shows no such node. I need to find where this address is being picked up so that I can purge it. I even ran grep -r 0.0.218.212 /opt/mapr/ on every node to look for a corrupted config file, but the string only turned up in the logs quoted above.
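For reference, this is roughly the sweep I used, per node plus the cluster-level checks (the dots in the grep pattern are escaped so they match literally, and the logs directory is excluded to skip the hits already quoted above):

# Cluster view: confirm the address is not a registered node
maprcli node list -columns hostname,ip
yarn node -list -all

# Per-node sweep of the MapR tree for the literal address
grep -r --exclude-dir=logs '0\.0\.218\.212' /opt/mapr/

The yarn node -list -all check is there to compare the RM's node list against maprcli's, since the RM clearly believes this address exists when it assigns containers to it.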
Can anyone tell me how I can find the phantom node and get rid of it?