
Job History server is failing to start with stale file handle

Question asked by bgajjela on Dec 7, 2016
Latest reply on Dec 9, 2016 by maprcommunity

 Hi,

 

The Job History Server is failing to start. This started happening after a reboot of the node where the Job History Server runs.


Below is a snippet from history-server.out:

ERROR JniCommon fs/client/fileclient/cc/jni_MapRClient.cc:2348 Thread: 11879 readdirplus failed for dir /var/mapr/cluster/yarn/rm/staging/history/done/2016/11/29/00000, error = Stale File handle(116)

 

Below is a snippet from history-server.log:

 

2016-12-07 00:25:09,684 WARN org.apache.hadoop.mapreduce.v2.jobhistory.FileNameIndexUtils: Unable to parse start time from job history file job_1479362581504_0374-1480913383375-bgajjel-ORC_Query29.hql-1480913390013-0-0-FAILED-root.adhoc.standard--1.jhist : java.lang.NumberFormatException: For input string: ""
2016-12-07 00:25:09,724 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.mapreduce.v2.hs.JobHistory failed in state INITED; cause: java.lang.NullPointerException
java.lang.NullPointerException
        at org.apache.hadoop.fs.AbstractFileSystem$1.hasNext(AbstractFileSystem.java:893)
        at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.scanDirectory(HistoryFileManager.java:748)
        at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.scanDirectoryForHistoryFiles(HistoryFileManager.java:760)
        at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.addDirectoryToJobListCache(HistoryFileManager.java:724)
        at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.initExisting(HistoryFileManager.java:679)
        at org.apache.hadoop.mapreduce.v2.hs.JobHistory.serviceInit(JobHistory.java:96)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
        at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.serviceInit(JobHistoryServer.java:154)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.launchJobHistoryServer(JobHistoryServer.java:232)
        at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.main(JobHistoryServer.java:242)
2016-12-07 00:25:09,734 INFO org.apache.hadoop.mapreduce.v2.hs.JobHistory: Stopping JobHistory
2016-12-07 00:25:09,739 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer failed in state INITED; cause: java.lang.NullPointerException
java.lang.NullPointerException
        at org.apache.hadoop.fs.AbstractFileSystem$1.hasNext(AbstractFileSystem.java:893)
        at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.scanDirectory(HistoryFileManager.java:748)
        at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.scanDirectoryForHistoryFiles(HistoryFileManager.java:760)
        at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.addDirectoryToJobListCache(HistoryFileManager.java:724)
        at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.initExisting(HistoryFileManager.java:679)
        at org.apache.hadoop.mapreduce.v2.hs.JobHistory.serviceInit(JobHistory.java:96)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
        at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.serviceInit(JobHistoryServer.java:154)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.launchJobHistoryServer(JobHistoryServer.java:232)
        at org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer.main(JobHistoryServer.java:242)
2016-12-07 00:25:09,741 FATAL org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer: Error starting JobHistoryServer
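Incidentally, the NumberFormatException in the WARN line looks like a side effect of the job's start time having been recorded as -1: the .jhist filename encodes fields separated by '-', so "-1" serializes as "--1" and a naive split leaves an empty start-time field. A minimal sketch of that (Python, and my assumption about the field layout; this is not the actual Hadoop FileNameIndexUtils code):

```python
# Hedged sketch of why "Unable to parse start time ... For input
# string: \"\"" appears for this particular filename.
fname = ("job_1479362581504_0374-1480913383375-bgajjel-ORC_Query29.hql"
         "-1480913390013-0-0-FAILED-root.adhoc.standard--1.jhist")

# Assumed field order: jobId, submitTime, user, jobName, finishTime,
# numMaps, numReduces, status, queue, startTime
parts = fname[: -len(".jhist")].split("-")

print(repr(parts[9]))  # the start-time slot comes out as ''
# int("") raises ValueError, the Python analog of the Java
# NumberFormatException: For input string: "" seen in the log.
```

That would explain the WARN, but note it is only a warning; the fatal failure is the NullPointerException triggered by the stale file handle while scanning the done directory.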


1) I tried restarting Warden, but it doesn't help.

2) Looking at the logs, I found the stale file handle. When I tried to remove the affected directory, the delete failed as well, since something is still tied to it:

 

$ hadoop fs -rmr /var/mapr/cluster/yarn/rm/staging/history/done/2016/11/29/

rmr: DEPRECATED: Please use 'rm -r' instead.

16/12/07 13:10:39 INFO Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum

16/12/07 13:10:39 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.

2016-12-07 13:10:39,0914 ERROR Client fs/client/fileclient/cc/client.cc:2451 Thread: 6921 Rmdirs failed for dir 000000,Readdirplus rpc error Stale File handle(116) fid 2126.2628.3307876

2016-12-07 13:10:39,0915 ERROR Client fs/client/fileclient/cc/client.cc:2472 Thread: 6921 Rmdirs failed for dir/file 000000, rpc error 116 fid 2126.310.3021476

2016-12-07 13:10:39,0916 ERROR JniCommon fs/client/fileclient/cc/jni_MapRClient.cc:951 Thread: 6921 remove: File /var/mapr/cluster/yarn/rm/staging/history/done/2016/11/29, rpc error, Stale File handle(116)

16/12/07 13:10:39 ERROR fs.MapRFileSystem: Failed to delete path maprfs:///var/mapr/cluster/yarn/rm/staging/history/done/2016/11/29, error: Stale file handle (116)

rmr: `/var/mapr/cluster/yarn/rm/staging/history/done/2016/11/29': Input/output error
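In case it helps with diagnosis: the errors above reference fid 2126.2628.3307876. Assuming the MapR-FS fid layout is container.inode.uniquifier (my assumption), the leading component would be the container id, which can be pulled out and then inspected on a cluster node:

```shell
# Extract the container id from the fid in the error message.
# Assumption: MapR-FS fids are laid out as container.inode.uniquifier.
fid="2126.2628.3307876"
cid="${fid%%.*}"          # keep everything before the first '.'
echo "container id: $cid"

# On a cluster node, that container could then be inspected, e.g.
# (hypothetical next step, requires maprcli and cluster access):
#   maprcli dump containerinfo -ids "$cid"
```

If the container turns out to be unhealthy, a filesystem check on it (rather than deleting through the Hadoop shell) may be what is actually needed.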


Any other ideas are appreciated.


Thanks,

Bharath
