Multiple clusters defined in mapr-clusters.conf: TT/JT fail to start

Question asked by mmercer on Oct 7, 2013
Latest reply on Oct 7, 2013 by nabeel
We had to add our other two clusters to our brand-new cluster (which was already running). I appended the other two clusters' info to mapr-clusters.conf, and we were able to start running distcp jobs to migrate data from one cluster to another.
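
A minimal sketch for sanity-checking mapr-clusters.conf after appending entries. It assumes each line has the form `<cluster-name> <cldb-host>:<port> ...`, optionally with a `key=value` token such as `secure=false` after the name. The function name `parse_clusters_conf`, the second cluster name, and the example.com hosts are hypothetical; only the cluster name Mapr-Test comes from this setup.

```python
def parse_clusters_conf(text):
    """Return [(cluster_name, [cldb_host:port, ...]), ...] in file order.

    Assumed line format: <cluster-name> [key=value] <cldb-host>:<port> ...
    """
    clusters = []
    for line in text.splitlines():
        tokens = line.split()
        # Skip blank lines; skipping '#' lines is purely defensive.
        if not tokens or tokens[0].startswith("#"):
            continue
        name, rest = tokens[0], tokens[1:]
        # Treat key=value tokens (e.g. secure=false) as options, not CLDB nodes.
        cldb_nodes = [t for t in rest if "=" not in t]
        clusters.append((name, cldb_nodes))
    return clusters


if __name__ == "__main__":
    # Hypothetical file contents; the real hosts will differ.
    sample = (
        "Mapr-Test cldb1.example.com:7222\n"
        "Other-Cluster secure=false cldb2.example.com:7222 cldb3.example.com:7222\n"
    )
    for name, nodes in parse_clusters_conf(sample):
        print(name, nodes)
```

On a real node, the text would be read from the actual mapr-clusters.conf file rather than from the sample string.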

I did, however, run into trouble after one machine began having issues, specifically with the JobTracker. It complained about the path not being owned by mapr; I fixed the permissions, but the JobTracker still failed, this time with problems binding to /0.0.0.0:9100 (Address already in use). That is interesting: no hostname is involved, so I would expect the problem to be with port 9100 itself, yet 9100 is not in use at all (I checked lsof, netstat, etc.). I decided to reboot.

Upon rebooting:
The heartbeat is failing (I would love to know more about how this actually works in MapR, but will get to that later).

The TaskTracker (TT) fails to start:
<pre>
stat: cannot stat `/var/mapr/local/mapr-test2.quantifind.com': No such file or directory
</pre>

The error is accurate: when I look at /var/mapr/local, it lists the *other* clusters' nodes, but not this one.
When I look at /mapr/Mapr-Test/var/mapr/local, it shows the expected results. Why isn't the TT using the fully expanded path?

The changes to mapr-clusters.conf seem to be picked up immediately, so I did not expect to need to bounce any services. Is that assumption wrong?

The JobTracker (JT) is still having the same issue as before, unable to bind to /0.0.0.0:9100:
<pre>
2013-10-07 17:37:59,754 FATAL org.apache.hadoop.mapred.JobTracker: java.net.BindException: Problem binding to /0.0.0.0:9001 : Address already in use
</pre>
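
For anyone wanting to double-check the port independently of lsof and netstat, here is a minimal sketch that simply tries to bind a TCP socket to the wildcard address. Note that the log line above names port 9001 while the text above mentions 9100, so the sketch checks both. The function name `port_is_free` is made up for illustration, and the check only reflects the state of the port at the moment it runs.

```python
import errno
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can currently bind to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError as exc:
            # EADDRINUSE is the same condition the JobTracker log reports.
            if exc.errno == errno.EADDRINUSE:
                return False
            raise  # e.g. permission problems: surface them unchanged
        return True


if __name__ == "__main__":
    # 9001 appears in the log excerpt, 9100 in the description.
    for port in (9001, 9100):
        state = "free" if port_is_free(port) else "already in use"
        print(f"port {port}: {state}")
```

Run it on the affected node while the JobTracker is stopped; if a port reports "already in use", some other process is holding it.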
Any insight would be appreciated.

Thanks
