
NodeManager crashes when local dir is mounted over NFS / how to have enough scratch space for Spark without sacrificing physical disks?

Question asked by dafox on Mar 4, 2016
Latest reply on Jun 7, 2016 by Hao Zhu

We are attempting to have `yarn.nodemanager.local-dirs` mounted over NFS via the nfsloopback service, in order to establish scratch space for Spark on YARN. The mount is done like so:

 

127.0.0.1:/mapr/clustername/var/mapr/local/nodename/scratch/nm-spark-scratch    nfs    hard,nolock

 

The problem is that with this setup YARN refuses to run any job; the application fails with the following error:

 

Application application_1456699782045_0524 failed 2 times due to AM Container for appattempt_1456699782045_0524_000002 exited with exitCode: -1000
    For more detailed output, check application tracking page:http://resourcemanager:8088/cluster/app/application_1456699782045_0524Then, click on links to logs of each attempt.
    Diagnostics: Application application_1456699782045_0524 initialization failed (exitCode=255) with output: main : command provided 0
    main : user is username
    main : requested yarn user is username
    Failed to read file /nm-spark-scratch/hadoop-mapr/nm-local-dir/nmPrivate/container_e266_1456699782045_0524_02_000001.tokens - Input/output error
    Failing this attempt. Failing the application.

 

 

We then tried to read one of these token files with ordinary command-line tools and got the same kind of error:

 

# cat /nm-spark-scratch/hadoop-mapr/nm-local-dir/nmPrivate/container_e266_1456699782045_0526_01_000001.tokens
    cat: /nm-spark-scratch/hadoop-mapr/nm-local-dir/nmPrivate/container_e266_1456699782045_0526_01_000001.tokens: Input/output error

 

So,

  1. Any idea why this file gives an I/O error? All other files on that mount work fine.
  2. Any other suggestions for how to get enough scratch space for Spark without sacrificing physical disks?
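
The observation in point 1, that only the `.tokens` file fails while other files on the mount read fine, can be double-checked with a small scan. The following is a minimal sketch, assuming the mount point is `/nm-spark-scratch` as in the error messages above; the function name `find_unreadable_files` is made up for illustration and is not part of YARN or MapR.

```python
import errno
import os


def find_unreadable_files(root):
    """Walk root and return (path, errno name) for every file that cannot be read."""
    failures = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    # Reading a single byte is enough to force an actual read.
                    handle.read(1)
            except OSError as exc:
                # Translate the numeric errno (e.g. 5) into its symbolic name (e.g. EIO).
                failures.append((path, errno.errorcode.get(exc.errno, str(exc.errno))))
    return failures


if __name__ == "__main__":
    # Assumed mount point, taken from the paths in the error output.
    for path, code in find_unreadable_files("/nm-spark-scratch"):
        print(code, path)
```

Run as the same user the NodeManager runs as, this should list each failing path with its error name, which may help tell a genuine `EIO` from the NFS layer apart from a plain permission problem (`EACCES`).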
