Task Tracker Failing To Start

Question asked by mufeed on Nov 24, 2013
Latest reply on Dec 2, 2013 by Ted Dunning
I have set up a 5-node cluster on which I have TT (TaskTracker) running on 3 nodes. My TT continuously fails to start on node-4 and node-5, with the following in the logs:

**TaskTracker log under /opt/mapr/logs :**

<pre>
2013-11-25 09:27:56,452 ERROR org.apache.hadoop.mapred.TaskTracker: Failed to create and mount local mapreduce volume at /var/mapr/local/r4n1/mapred/. Please see logs at /opt/mapr/logs/createTTVolume.log
2013-11-25 09:27:56,452 ERROR org.apache.hadoop.mapred.TaskTracker: Command ran /opt/mapr/server/createTTVolume.sh r4n1 /var/mapr/local/r4n1/mapred/ /var/mapr/local/r4n1/mapred/taskTracker/
2013-11-25 09:27:56,452 ERROR org.apache.hadoop.mapred.TaskTracker: Command output
2013-11-25 09:27:56,453 ERROR org.apache.hadoop.mapred.TaskTracker: Can not start TaskTracker because org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:322)
at org.apache.hadoop.util.Shell.run(Shell.java:249)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:442)
at org.apache.hadoop.mapred.TaskTracker.createTTVolume(TaskTracker.java:1879)
at org.apache.hadoop.mapred.TaskTracker.initialize(TaskTracker.java:961)
at org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:2176)
at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:5310)

2013-11-25 09:27:56,455 INFO org.apache.hadoop.mapred.TaskTracker: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down TaskTracker at r4n1/10.141.159.101
************************************************************/
</pre>

**/opt/mapr/logs/createTTVolume.5000.log :**

<pre>
2013-11-25 09:27:35 INFO This script was called with the arguments: r4n1 /var/mapr/local/r4n1/mapred/ /var/mapr/local/r4n1/mapred/taskTracker/
2013-11-25 09:27:35 INFO Checking if MapRFS is online
2013-11-25 09:27:35 DEBUG Will launch command "hadoop fs -stat /" with a command attempt timeout of 60 seconds a maximum of 1000 attempts and a sleep time of 1 seconds between failed command attempts
2013-11-25 09:27:35 DEBUG Launching "hadoop fs -stat /"
2013-11-25 09:27:36 DEBUG Command attempt 1 completed successfully in 1 seconds
2013-11-25 09:27:36 DEBUG Command completed successfully after 1 attempts and after 1 seconds
2013-11-25 09:27:36 DEBUG Will launch command "hadoop fs -stat /var/mapr/local/r4n1" with a command attempt timeout of 60 seconds a maximum of 1000 attempts and a sleep time of 1 seconds between failed command attempts
2013-11-25 09:27:36 DEBUG Launching "hadoop fs -stat /var/mapr/local/r4n1"
2013-11-25 09:27:37 DEBUG Command attempt 1 completed successfully in 1 seconds
2013-11-25 09:27:37 DEBUG Command completed successfully after 1 attempts and after 1 seconds
2013-11-25 09:27:37 INFO MapRFS is online. Checking whether MFS on this node is online
2013-11-25 09:27:37 DEBUG Will launch command "/opt/mapr/server/mrconfig -p 5660 info fsstate" with a command attempt timeout of 60 seconds a maximum of 60 attempts and a sleep time of 3 seconds between failed command attempts
2013-11-25 09:27:37 DEBUG Launching "/opt/mapr/server/mrconfig -p 5660 info fsstate"
2013-11-25 09:27:38 DEBUG Command attempt 1 completed successfully in 1 seconds
2013-11-25 09:27:38 DEBUG Command completed successfully after 1 attempts and after 1 seconds
2013-11-25 09:27:38 INFO MFS on this node is online
2013-11-25 09:27:38 INFO Checking for a volume already mounted at the specified mount path
2013-11-25 09:27:38 DEBUG Will launch command "maprcli volume list -filter [p==/var/mapr/local/r4n1/mapred]and[mt==1] -columns volumename,mounted,mountdir" with a command attempt timeout of 60 seconds a maximum of 1 attempts and a sleep time of 1 seconds between failed command attempts
2013-11-25 09:27:38 DEBUG Launching "maprcli volume list -filter [p==/var/mapr/local/r4n1/mapred]and[mt==1] -columns volumename,mounted,mountdir"
2013-11-25 09:27:40 DEBUG Command attempt 1 completed successfully in 2 seconds
2013-11-25 09:27:40 DEBUG Command completed successfully after 1 attempts and after 2 seconds
2013-11-25 09:27:40 INFO The mount path is not currently being used as the primary mount path of any existing volume
2013-11-25 09:27:40 INFO Checking for a pre-existing TaskTracker volume
2013-11-25 09:27:40 DEBUG Will launch command "maprcli volume info -name mapr.r4n1.local.mapred -json" with a command attempt timeout of 60 seconds a maximum of 1 attempts and a sleep time of 1 seconds between failed command attempts
2013-11-25 09:27:40 DEBUG Launching "maprcli volume info -name mapr.r4n1.local.mapred -json"
2013-11-25 09:27:42 DEBUG Command attempt 1 failed with return code 1 after 2 seconds, sleeping for 1 seconds
2013-11-25 09:27:43 DEBUG Command did not complete successfully after 1 attempts and after 3 seconds
2013-11-25 09:27:43 INFO A pre-existing TaskTracker volume could not be found, will try to create one.
2013-11-25 09:27:43 DEBUG Will launch command "hadoop mfs -ls /var/mapr/local/r4n1" with a command attempt timeout of 60 seconds a maximum of 3 attempts and a sleep time of 1 seconds between failed command attempts
2013-11-25 09:27:43 DEBUG Launching "hadoop mfs -ls /var/mapr/local/r4n1"
2013-11-25 09:27:44 DEBUG Command attempt 1 completed successfully in 1 seconds
2013-11-25 09:27:44 DEBUG Command completed successfully after 1 attempts and after 1 seconds
2013-11-25 09:27:44 INFO A new TaskTracker volume will be created.
2013-11-25 09:27:44 DEBUG Will launch command "maprcli volume create -name mapr.r4n1.local.mapred -path /var/mapr/local/r4n1/mapred -replication 1 -localvolumehost r4n1 -localvolumeport 5660 -shufflevolume true" with a command attempt timeout of 60 seconds a maximum of 3 attempts and a sleep time of 1 seconds between failed command attempts
2013-11-25 09:27:44 DEBUG Launching "maprcli volume create -name mapr.r4n1.local.mapred -path /var/mapr/local/r4n1/mapred -replication 1 -localvolumehost r4n1 -localvolumeport 5660 -shufflevolume true"
2013-11-25 09:27:46 DEBUG Command attempt 1 failed with return code 1 after 2 seconds, sleeping for 1 seconds
2013-11-25 09:27:47 DEBUG Launching "maprcli volume create -name mapr.r4n1.local.mapred -path /var/mapr/local/r4n1/mapred -replication 1 -localvolumehost r4n1 -localvolumeport 5660 -shufflevolume true"
2013-11-25 09:27:50 DEBUG Command attempt 2 failed with return code 1 after 3 seconds, sleeping for 1 seconds
2013-11-25 09:27:51 DEBUG Launching "maprcli volume create -name mapr.r4n1.local.mapred -path /var/mapr/local/r4n1/mapred -replication 1 -localvolumehost r4n1 -localvolumeport 5660 -shufflevolume true"
2013-11-25 09:27:55 DEBUG Command attempt 3 failed with return code 1 after 4 seconds, sleeping for 1 seconds
2013-11-25 09:27:56 FATAL Command did not complete successfully after 3 attempts and after 12 seconds.
2013-11-25 09:27:56 INFO The command run was:
2013-11-25 09:27:56 INFO The output of the last failed command attempt:
</pre>

How do I resolve this?
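
Since the log leaves "The output of the last failed command attempt" blank, one way to narrow this down might be to pull the failing commands out of createTTVolume.5000.log and re-run them by hand to see the actual error text. Below is a minimal sketch in Python that does the extraction. It assumes the `Launching "..."` and `Command attempt N failed with return code R` line formats shown in the log above; the function name `failed_commands` is hypothetical and not part of MapR.

```python
import re

# Patterns assume the line formats seen in the createTTVolume log above.
LAUNCH_RE = re.compile(r'DEBUG Launching "(.+)"\s*$')
FAIL_RE = re.compile(r"DEBUG Command attempt (\d+) failed with return code (\d+)")


def failed_commands(lines):
    """Return (command, attempt, return_code) for every failed attempt.

    Each failure is paired with the most recent 'Launching' line before it.
    """
    failures = []
    current = None  # command from the most recent "Launching" line
    for line in lines:
        launched = LAUNCH_RE.search(line)
        if launched:
            current = launched.group(1)
            continue
        failed = FAIL_RE.search(line)
        if failed and current is not None:
            failures.append((current, int(failed.group(1)), int(failed.group(2))))
    return failures


# Three lines copied from the log above, used as a quick check.
sample = [
    '2013-11-25 09:27:40 DEBUG Launching "maprcli volume info -name mapr.r4n1.local.mapred -json"',
    "2013-11-25 09:27:42 DEBUG Command attempt 1 failed with return code 1 after 2 seconds, sleeping for 1 seconds",
    "2013-11-25 09:27:44 DEBUG Command attempt 1 completed successfully in 1 seconds",
]

for cmd, attempt, rc in failed_commands(sample):
    print(f"attempt {attempt} rc={rc}: {cmd}")
```

To use it on the real file, pass an open file handle instead of `sample`, for example `failed_commands(open("/opt/mapr/logs/createTTVolume.5000.log"))`. Running the reported `maprcli volume create ...` command manually on the affected node may then show the error message that the script's log does not capture.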
