
NFS server stopped working on cluster (M3, 4.0.2)

Question asked by minnow-noir on May 31, 2015
Latest reply on Jun 23, 2015 by minnow-noir
MapR NFS stopped working on our M3 cluster within the last day or two; it had always worked fine before that.  I suspect it has something to do with permissions under /opt/mapr, because a colleague said he had to change permissions there to get rid of an error about Spark not being able to write logs.  Other than that, no one has made any changes to the system in a while.

I noticed yesterday that even though I could still access files through Spark jobs, I couldn't access them via Linux commands at /mapr/cluster-name/user/ec2-user/the-folder-in-question

I've Googled this and searched the MapR sites, but mostly just found a number of unanswered questions from people with similar problems.  Below is the kind of output those users were asked to provide.

Manually restarting warden or mapr-nfsserver does not help.

Please advise.


sudo tail -f /opt/mapr/logs/nfsserver.log

    INFO nfsserver[26491] fs/nfsd/main.cc:532 ***** NFS server starting: pid=26491, mapr-version: 4.0.2.29870.GA *****
    2015-05-31 10:16:55,6183 INFO nfsserver[26491] fs/nfsd/main.cc:546 ******* NFS server MAPR_HOME=/opt/mapr, NFS_PORT=2049, NFS_MGMT_PORT=9998, NFSMON_PORT=9997
    2015-05-31 10:16:55,6227 INFO nfsserver[26491] fs/nfsd/nfsserver.cc:927 0.0.0.0[0] running the cmd /opt/mapr/server/maprexecute pmapset set 100003 3 6 2049, ret 25600
    2015-05-31 10:16:55,6237 INFO nfsserver[26491] fs/nfsd/nfsserver.cc:970 0.0.0.0[0] Use32BitFileId is 1
    2015-05-31 10:16:55,6239 INFO nfsserver[26491] fs/nfsd/nfsserver.cc:983 0.0.0.0[0] AutoRefreshExportsTimeInterval is 0
    2015-05-31 10:16:55,6239 ERROR nfsserver[26491] fs/nfsd/main.cc:67 0.0.0.0[0] Error registering NFS program
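
A note on the `ret 25600` in the pmapset line above. The log does not say how that value is encoded, so this is only a sketch under the assumption that it is a raw wait status in the usual Linux layout (exit code in the high byte, terminating signal in the low bits). Under that assumption the `maprexecute pmapset` call exited with a nonzero code just before the "Error registering NFS program" line. What that code means to maprexecute is not something I can tell from these logs.

```shell
# Assumption: 'ret' is a raw wait status (exit code in bits 8-15, signal in bits 0-6).
ret=25600
exit_code=$(( (ret >> 8) & 0xff ))   # high byte: the command's exit code
signal=$(( ret & 0x7f ))             # low bits: nonzero only if killed by a signal
echo "exit_code=$exit_code signal=$signal"
```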

sudo cat /etc/fstab

    #
    # /etc/fstab
    # Created by anaconda on Wed Aug 15 23:28:13 2012
    #
    # Accessible filesystems, by reference, are maintained under '/dev/disk'
    # See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
    #
    LABEL=rootfs         /                       ext4    defaults        1 1
    LABEL=swap      swap                    swap    defaults        0 0
    tmpfs                   /dev/shm                tmpfs   defaults        0 0
    devpts                  /dev/pts                devpts  gid=5,mode=620  0 0
    sysfs                   /sys                    sysfs   defaults        0 0
    proc                    /proc                   proc    defaults        0 0

service nfs status

    rpc.svcgssd is stopped
    rpc.mountd is stopped
    nfsd is stopped
    rpc.rquotad is stopped
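
Since the failure is at RPC registration, one more check that may be worth running is whether the portmapper lists the NFS program at all. The pmapset line in nfsserver.log registers program 100003 (the standard NFS RPC program number), version 3, on port 2049. The helper below is a hypothetical sketch, not a MapR tool: the function name `has_nfs_program` is made up for illustration, and it only inspects `rpcinfo -p` style output fed to it on stdin.

```shell
# Hypothetical helper: reads `rpcinfo -p` style output on stdin and
# succeeds only if RPC program 100003 (NFS) appears in the first column.
has_nfs_program() {
    awk '$1 == 100003 { found = 1 } END { exit found ? 0 : 1 }'
}

# Usage on the node (rpcinfo itself fails if no portmapper is listening):
#   rpcinfo -p localhost | has_nfs_program && echo "NFS registered" || echo "NFS not registered"
```

If `rpcinfo -p` cannot reach a portmapper at all, that would point at rpcbind/portmap rather than at the /opt/mapr permissions, but that is a guess until it is actually run.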

sudo tail /opt/mapr/logs/maprexecute.log

    2015-05-31 10:34:13:INFO:5752: maprexecute renice by uid 2000 gid 2000
    2015-05-31 10:34:13:INFO:5825: maprexecute adjustoom by uid 2000 gid 2000
    Cmd Line: /opt/mapr/server/maprexecute adjustoom -1000 4190
    2015-05-31 10:34:13:INFO:5825: oomPath /proc/4190/oom_score_adj oomValue -1000
    2015-05-31 10:34:13:INFO:5826: maprexecute renice by uid 2000 gid 2000
    2015-05-31 10:34:13:INFO:5827: maprexecute adjustoom by uid 2000 gid 2000
    Cmd Line: /opt/mapr/server/maprexecute adjustoom -1000 5158
    2015-05-31 10:34:13:INFO:5827: oomPath /proc/5158/oom_score_adj oomValue -1000
    2015-05-31 10:34:13:INFO:5828: maprexecute renice by uid 2000 gid 2000
    2015-05-31 10:34:24:INFO:7580: maprexecute renice by uid 2000 gid 2000


ls -als /opt/mapr

    total 228
      4 drwxrwxr-x 25 root mapr   4096 May 31 10:33 .
      4 drwxr-xr-x  7 mapr mapr   4096 May 29 17:06 ..
      4 drwxrwxr-x  3 root mapr   4096 Apr 24 14:25 adminuiapp
      4 drwxrwxr-x  2 root mapr   4096 May 11 17:30 bin
      4 drwxrwxr-x  2 root mapr   4096 Apr 24 14:25 checkservicescripts
      4 drwxrwxr-x  5 mapr mapr   4096 May 31 10:13 conf
      4 drwxrwxr-x  2 root mapr   4096 Apr 24 14:25 conf.new
      4 drwxrwxr-x  2 root mapr   4096 Jan 19 15:17 contrib
      4 drwxrwxr-x  4 root mapr   4096 May 25 19:43 drill
      4 drwxrwxr-x  5 root mapr   4096 Jan 19 15:23 hadoop
      4 drwxrwxr-x  3 root mapr   4096 May 11 17:30 hbase
      4 -r--rw-r--  1 root mapr     17 Apr 24 14:25 hostid
      4 -rw-rw-r--  1 mapr mapr     29 May 31 10:33 hostname
      4 drwxrwxr-x  2 root mapr   4096 Apr 24 14:25 initscripts
      4 drwxrwxr-x  2 root mapr   4096 Apr 24 14:26 lib
      4 drwxrwxr-x  2 root mapr   4096 Apr 24 14:25 libexp
      4 drwxrwxrwt  2 mapr mapr   4096 May 31 10:34 logs
      4 -rw-rw-r--  1 root mapr     15 Jan 19 15:17 MapRBuildVersion
    116 -rw-rw-r--  1 root mapr 118485 Jan 19 15:17 NOTICE.txt
      4 drwxrwxrwx  2 root mapr   4096 May 31 10:35 pid
      4 drwxr-xr-x  2 root root   4096 May 31 10:13 roles
      4 drwxrwxr-x  8 root mapr   4096 Apr 24 14:25 server
      4 drwxr-xr-x  2 root root   4096 May 31 10:13 servicesconf
      4 drwxrwxr-x  3 root mapr   4096 May 29 15:59 spark
      4 drwxrwxr-x  4 root mapr   4096 Apr 24 14:25 support
      4 drwxrwxr-x  2 root mapr   4096 Jan 19 15:35 themes
      4 drwxrwxr-x  3 root mapr   4096 Apr 24 14:25 webapps
      4 drwxrwx---  3 mapr mapr   4096 Apr 24 14:26 zkdata
      4 drwxrwxr-x  3 mapr mapr   4096 Apr 24 14:26 zookeeper

ls -als /opt/mapr/server/maprexecute

    132 -rwsr-x--- 1 root mapr 133256 Jan 19 15:20 /opt/mapr/server/maprexecute





