
JobTracker fails after a few seconds.

Question asked by tristanls on Apr 9, 2012
Latest reply on Apr 10, 2012 by tristanls
I'm having issues with getting the JobTracker to stay alive.

While trying to get Mahout working, I start off by running

    env JAVA_HOME=$JAVA_HOME HADOOP_CONF_DIR=$HADOOP_CONF_DIR ./build-20news-bayes.sh

However, when I start a JobTracker from the MapR Control System, it starts running and `fs.JobTrackerWatcher` is able to find it, but the client ends up not being able to connect to it. Around the same time the connection attempts begin, the JobTracker enters the `failed` state and stops. I'm not sure where to start troubleshooting this. Port `9001` is open within the EC2 Security Group, so it should be able to connect OK, right?

Some logs:

    12/04/09 20:15:30 INFO fs.JobTrackerWatcher: Current running JobTracker is: ip-10-116-223-132.ec2.internal/10.116.223.132:9001
    12/04/09 20:15:31 INFO ipc.Client: Retrying connect to server: ip-10-116-223-132.ec2.internal/10.116.223.132:9001. Already tried 0 time(s).
    12/04/09 20:15:32 INFO ipc.Client: Retrying connect to server: ip-10-116-223-132.ec2.internal/10.116.223.132:9001. Already tried 1 time(s).
    12/04/09 20:15:33 INFO ipc.Client: Retrying connect to server: ip-10-116-223-132.ec2.internal/10.116.223.132:9001. Already tried 2 time(s).
    12/04/09 20:15:34 INFO ipc.Client: Retrying connect to server: ip-10-116-223-132.ec2.internal/10.116.223.132:9001. Already tried 3 time(s).
    12/04/09 20:15:35 INFO ipc.Client: Retrying connect to server: ip-10-116-223-132.ec2.internal/10.116.223.132:9001. Already tried 4 time(s).
    12/04/09 20:15:36 INFO ipc.Client: Retrying connect to server: ip-10-116-223-132.ec2.internal/10.116.223.132:9001. Already tried 5 time(s).
    12/04/09 20:15:37 INFO ipc.Client: Retrying connect to server: ip-10-116-223-132.ec2.internal/10.116.223.132:9001. Already tried 6 time(s).
    12/04/09 20:15:38 INFO ipc.Client: Retrying connect to server: ip-10-116-223-132.ec2.internal/10.116.223.132:9001. Already tried 7 time(s).
    12/04/09 20:15:39 INFO ipc.Client: Retrying connect to server: ip-10-116-223-132.ec2.internal/10.116.223.132:9001. Already tried 8 time(s).
    12/04/09 20:15:40 INFO ipc.Client: Retrying connect to server: ip-10-116-223-132.ec2.internal/10.116.223.132:9001. Already tried 9 time(s).
    12/04/09 20:15:40 INFO ipc.RPC: FailoverProxy: Server on ip-10-116-223-132.ec2.internal/10.116.223.132:9001 is lost due to java.net.SocketException: Call to ip-10-116-223-132.ec2.internal/10.116.223.132:9001 failed on socket exception in call getStagingAreaDir
    12/04/09 20:15:40 INFO ipc.RPC: Searching for the Active Server ...
    12/04/09 20:15:40 INFO ipc.RPC: Attempt# 1 . Trying to connect Server at ip-10-116-223-132.ec2.internal/10.116.223.132:9001
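
One way to narrow this down is to test the TCP connection directly, independent of Hadoop. The snippet below is a minimal sketch, assuming Python is available on the machine that runs the job; `probe_port` is a hypothetical helper name, and the host and port are simply the ones from the logs above. A `refused` result would usually suggest that nothing is listening on the port (for example, because the JobTracker process has already exited), while `timeout` more often points at a firewall or security group dropping the packets.

```python
import socket


def probe_port(host, port, timeout=3.0):
    """Try one TCP connection and report how it ended.

    Returns "open", "refused", "timeout", or "error: <details>".
    """
    try:
        # create_connection resolves the host name and connects in one step.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered, but no process is listening on this port.
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout, e.g. packets are being dropped.
        return "timeout"
    except OSError as exc:
        # Anything else, such as a host name that does not resolve.
        return f"error: {exc}"


if __name__ == "__main__":
    # Host and port copied from the JobTrackerWatcher log line.
    print(probe_port("ip-10-116-223-132.ec2.internal", 9001))
```

Running it a few times right after starting the JobTracker from the MapR Control System should show whether the port goes from `open` to `refused` at the moment the service enters the `failed` state.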
