
Job failure due to Stale file handle (116)

Question asked by vinod_singh on Feb 6, 2012
Latest reply on Feb 16, 2012 by vinod_singh
On several occasions I have noticed jobs failing due to file system errors, e.g. Stale File handle. The job thread dumps contain entries like the following:

<pre>
ERROR Client fs/client/fileclient/cc/client.cc:1515 Thread: 140316168423168 AllocateFid failed, File output.00242, error Stale File handle(116), primaryFid 2112.1398320.10590244
ERROR Client fs/client/fileclient/cc/writebuf.cc:229 Thread: 140316168423168 FlushWrite failed: File output.00242, error: Stale File handle(116), pfid 2112.1398320.10590244, off 2162688 6449.3374.201984
ERROR Client fs/client/fileclient/cc/client.cc:1515 Thread: 140317051700992 AllocateFid failed, File output.00242, error Stale File handle(116), primaryFid 2112.1398320.10590244</pre>
or
<pre>ERROR Client fs/client/fileclient/cc/client.cc:489 Thread: 140717038753536 Open failed for file /var/mapr/local/node/mapred/taskTracker/spill/, LookupFid error No such file or directory(2)</pre>

What could be the reason for such failures, and how can they be resolved?
