How to troubleshoot and resolve issues starting MapR's NFS server

Document created by jbubier Employee on Feb 7, 2016
Version 1

Author: Jonathan Bubier

 

Original Publication Date: November 12, 2014

MapR includes an NFS server that lets standard NFS clients access MapR-FS.  NFS clients on any platform can mount MapR's NFS server and work with MapR-FS using standard Linux commands, as with a traditional filesystem.  When warden starts, or when an NFS server is started or restarted manually, the NFS server may fail to initialize.  Symptoms include alarms and a failed service status in the MapR Control System (MCS), and NFS clients that cannot mount the NFS server.  There are several possible causes, and the first place to look for the root cause is the NFS server log, /opt/mapr/logs/nfsserver.log, on the host where the NFS server failed to start.  The typical issues are described below; other issues not covered here may also occur.
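As a starting point for reviewing nfsserver.log, the following is a minimal Python sketch that pulls out the ERROR lines and the final "exiting:" message.  The line pattern is inferred from the log excerpts in this article and is an assumption, not a documented format; the function name find_startup_problems is hypothetical.

```python
import re

# Pattern inferred from the excerpts in this article, for example:
# "2014-11-12 10:34:51,1183 INFO nfsserver[1821] fs/nfsd/nfsha.cc:957 exiting: ..."
# This is an assumption about the format, not a specification.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d+) "
    r"(?P<level>[A-Z]+) nfsserver\[(?P<pid>\d+)\] "
    r"(?P<src>\S+) (?P<msg>.*)$"
)


def find_startup_problems(lines):
    """Return (timestamp, level, message) for ERROR lines and 'exiting:' lines."""
    problems = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that does not look like a log entry
        msg = m.group("msg")
        if m.group("level") == "ERROR" or msg.startswith("exiting:"):
            problems.append((m.group("ts"), m.group("level"), msg))
    return problems


# Possible usage on the affected host:
#   with open("/opt/mapr/logs/nfsserver.log") as f:
#       for ts, level, msg in find_startup_problems(f):
#           print(ts, level, msg)
```

Matching on the "exiting:" prefix as well as the ERROR level matters because, in the license example in this article, the decisive message is logged at INFO level.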

 


1. No license to start NFS server
The following message may appear in nfsserver.log on the host: "No license to run NFS server in servermode".  Ex:


2014-11-12 10:34:45,9602 INFO nfsserver[1821] fs/nfsd/main.cc:535 ***** NFS server starting: pid=1821, mapr-version: 4.0.1.27334.GA *****
2014-11-12 10:34:45,9603 INFO nfsserver[1821] fs/nfsd/main.cc:549 ******* NFS server MAPR_HOME=/opt/mapr, NFS_PORT=2049, NFS_MGMT_PORT=9998, NFSMON_PORT=9997
2014-11-12 10:34:45,9735 INFO nfsserver[1821] fs/nfsd/mount.cc:2147 Export info: /mapr (rw)
2014-11-12 10:34:45,9798 INFO nfsserver[1821] fs/nfsd/mount.cc:1781 CLDB info: host1:7222 host2:7222
2014-11-12 10:34:46,3558 INFO nfsserver[1821] fs/nfsd/nfsha.cc:476 hostname: host1.domain.prv, hostid: 0xa81a38e0d7b6068
2014-11-12 10:34:46,3623 INFO nfsserver[1821] fs/nfsd/requesthandle.cc:468 found NFS_HEAPSIZE env var: 236
2014-11-12 10:34:46,1152 INFO nfsserver[1821] fs/nfsd/main.cc:643 NFS server started ... pid=1821, uid=2147483632
2014-11-12 10:34:45,9720 INFO nfsserver[1821] fs/nfsd/nfsserver.cc:927 0.0.0.0[0] running the cmd /opt/mapr/server/maprexecute pmapset set 100003 3 6 2049, ret 0
2014-11-12 10:34:45,9733 INFO nfsserver[1821] fs/nfsd/nfsserver.cc:971 0.0.0.0[0] Use32BitFileId is 1
2014-11-12 10:34:45,9734 INFO nfsserver[1821] fs/nfsd/nfsserver.cc:984 0.0.0.0[0] AutoRefreshExportsTimeInterval is 0
2014-11-12 10:34:45,9735 INFO nfsserver[1821] fs/nfsd/mount.cc:2177 0.0.0.0[0] Allocating export entry 2651400
2014-11-12 10:34:45,9808 INFO nfsserver[1821] fs/nfsd/mount.cc:1858 0.0.0.0[0] Allocating export entry 26513b0
2014-11-12 10:34:45,9950 INFO nfsserver[1821] fs/nfsd/nfsserver.cc:927 0.0.0.0[0] running the cmd /opt/mapr/server/maprexecute pmapset set 100005 3 6 2049, ret 0
2014-11-12 10:34:46,0061 INFO nfsserver[1821] fs/nfsd/nfsserver.cc:927 0.0.0.0[0] running the cmd /opt/mapr/server/maprexecute pmapset set 100005 1 6 2049, ret 0
2014-11-12 10:34:46,0189 INFO nfsserver[1821] fs/nfsd/nfsserver.cc:927 0.0.0.0[0] running the cmd /opt/mapr/server/maprexecute pmapset set 100005 3 17 2049, ret 0
2014-11-12 10:34:46,0341 INFO nfsserver[1821] fs/nfsd/nfsserver.cc:927 0.0.0.0[0] running the cmd /opt/mapr/server/maprexecute pmapset set 100005 1 17 2049, ret 0
2014-11-12 10:34:46,0353 INFO nfsserver[1821] fs/nfsd/mount.cc:1191 0.0.0.0[0] Setting slash-mapr-clusterid clustername my.cluster.com, id 1012313856
2014-11-12 10:34:46,0519 INFO nfsserver[1821] fs/nfsd/requesthandle.cc:335 0.0.0.0[0] using /etc/mtab to check ramfs mount
2014-11-12 10:34:46,1160 ERROR nfsserver[1821] fs/nfsd/nfsha.cc:847 0.0.0.0[0] Error registering with CLDB: Read-only file system, err=0, status=30 cldb=host1:7222
2014-11-12 10:34:51,1172 INFO nfsserver[1821] fs/nfsd/nfsha.cc:476 hostname: host1.domain.prv, hostid: 0xa81a38e0d7b6068
2014-11-12 10:34:51,1183 INFO nfsserver[1821] fs/nfsd/nfsha.cc:957 exiting: No license to run NFS server in servermode

This indicates that the NFS server attempted to register with the master CLDB and the CLDB responded that it has no free licenses to run an NFS server.  With an M3 license only one active NFS server is allowed, so this error appears if a second NFS server is installed and attempts to start.  With an M5 or M7 license, it is possible that more NFS servers have registered than the installed license provides.  Review the installed licenses in the MCS using the 'Manage Licenses' link in the upper right corner, or run the maprcli license list command.

 

 

2. Linux kernel NFS server already running
The following message may appear in nfsserver.log on the host: "Error (Success) while registering MOUNT_PROGRAM vers=3 with portmapper".  Ex:

2014-11-12 10:37:24,1103 INFO nfsserver[5269] fs/nfsd/nfsserver.cc:927 0.0.0.0[0] running the cmd /opt/mapr/server/maprexecute pmapset set 100005 3 6 2049, ret 256
2014-11-12 10:37:24,1103 ERROR nfsserver[5269] fs/nfsd/mount.cc:2552 0.0.0.0[0] Error (Success) while registering MOUNT_PROGRAM vers=3 with portmapper
2014-11-12 10:37:24,1213 INFO nfsserver[5269] fs/nfsd/nfsserver.cc:945 0.0.0.0[0] running the cmd /opt/mapr/server/maprexecute pmapset unset 100003 3, ret 0
2014-11-12 10:37:24,1213 ERROR nfsserver[5269] fs/nfsd/main.cc:83 0.0.0.0[0] Error registering mount program

 

This indicates that the NFS server attempted to register with portmapper and was unable to do so.  Typically another program is already using the port assigned to the MapR NFS server (TCP 2049), which causes a conflict.  Check whether any service is currently listening on TCP port 2049 using the netstat command.  The output below shows that the Linux kernel NFS server is running and listening on port 2049; the '-' in the PID/program column is consistent with a listener inside the kernel rather than a user process.

 


# netstat -anp | grep 2049
tcp        0      0 0.0.0.0:2049                0.0.0.0:*                   LISTEN      -
tcp        0      0 :::2049                     :::*                        LISTEN      -

If the Linux kernel NFS server is running on the same port as MapR, stop it using the service command, for example service nfs stop on RHEL and CentOS or service nfs-kernel-server stop on Debian and Ubuntu (the service name varies by distribution).  Verify that nothing is listening on the port and then start MapR's NFS server again.
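The "is anything listening on the port" check can also be scripted.  The sketch below is a minimal Python example, not part of MapR's tooling; the function name port_is_listening is hypothetical.  It only tests whether a TCP connection to the port succeeds, so unlike netstat it does not show which program owns the listener.

```python
import socket


def port_is_listening(port, host="127.0.0.1", timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise
        return s.connect_ex((host, port)) == 0


# Possible usage before starting MapR's NFS server:
#   if port_is_listening(2049):
#       print("TCP 2049 is already in use; stop the other NFS server first")
```

Run it on the NFS host itself after stopping the kernel NFS server; a False result for port 2049 suggests the port is free for MapR's NFS server.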

 

 


3. Incorrect permissions on nfsserver binary

 

If nfsserver.log contains no log messages, there may be an underlying problem with warden as it attempts to start the NFS server.  Review /opt/mapr/logs/warden.log on the host for further information.  Ex:

 


2014-11-12 10:17:06,745 INFO  com.mapr.warden.service.baseservice.Service$ServiceRun [nfs_monitor]: Command: [/opt/mapr/initscripts/mapr-nfsserver, start], Directory: /opt/mapr/initscripts
2014-11-12 10:17:06,767 ERROR com.mapr.warden.service.baseservice.Service$ServiceRun run [nfs_monitor]: Error while running command: [/opt/mapr/initscripts/mapr-nfsserver, start]
2014-11-12 10:17:06,767 ERROR com.mapr.warden.service.baseservice.Service$ServiceRun run [nfs_monitor]: /opt/mapr/server/nfsserver not found. exiting

 

These error messages indicate that warden attempted to start the NFS server using the initscript /opt/mapr/initscripts/mapr-nfsserver, which calls the binary /opt/mapr/server/nfsserver, and that the binary was not found or is not executable.  If similar messages appear in warden.log, verify that /opt/mapr/server/nfsserver is present and is executable by the 'mapr' user.  Ex:

# ls -l /opt/mapr/server/nfsserver
-rwxr-xr-x 1 root root 52817988 Sep  5 12:09 /opt/mapr/server/nfsserver
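In the listing above the binary is owned by root:root, so the 'mapr' user can run it only through the execute bit for "others".  The following Python sketch spells out that reasoning using the classic POSIX owner/group/other check.  It ignores ACLs and the special treatment of root, the function name can_execute is hypothetical, and the uid/gid values in the usage comment are placeholders.

```python
import stat


def can_execute(mode, file_uid, file_gid, uid, gids):
    """Classic POSIX check of the execute bit for a given user.

    Only one permission class applies: owner, then group, then others.
    ACLs and root's override are deliberately ignored in this sketch.
    """
    if uid == file_uid:
        return bool(mode & stat.S_IXUSR)
    if file_gid in gids:
        return bool(mode & stat.S_IXGRP)
    return bool(mode & stat.S_IXOTH)


# Possible usage (5000 is a placeholder uid/gid for the 'mapr' user):
#   import os
#   st = os.stat("/opt/mapr/server/nfsserver")
#   print(can_execute(st.st_mode, st.st_uid, st.st_gid, 5000, {5000}))
```

With the mode shown above (0755, root:root) the check passes for a non-root service user; with 0750 it would fail, because that user is neither the owner nor in the root group.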

 

 

4. Incorrect permissions on maprexecute binary

 

The NFS server log may not contain enough detail to explain the failure, only an error message similar to "Error registering NFS program" as seen below.

 


2014-11-12 10:02:02,2267 INFO nfsserver[30630] fs/nfsd/main.cc:535 ***** NFS server starting: pid=30630, mapr-version: 4.0.1.27334.GA *****
2014-11-12 10:02:02,2268 INFO nfsserver[30630] fs/nfsd/main.cc:549 ******* NFS server MAPR_HOME=/opt/mapr, NFS_PORT=2049, NFS_MGMT_PORT=9998, NFSMON_PORT=9997
2014-11-12 10:02:02,2351 INFO nfsserver[30630] fs/nfsd/nfsserver.cc:927 0.0.0.0[0] running the cmd /opt/mapr/server/maprexecute pmapset set 100003 3 6 2049, ret 25856
2014-11-12 10:02:02,2364 INFO nfsserver[30630] fs/nfsd/nfsserver.cc:971 0.0.0.0[0] Use32BitFileId is 1
2014-11-12 10:02:02,2366 INFO nfsserver[30630] fs/nfsd/nfsserver.cc:984 0.0.0.0[0] AutoRefreshExportsTimeInterval is 0
2014-11-12 10:02:02,2366 ERROR nfsserver[30630] fs/nfsd/main.cc:66 0.0.0.0[0] Error registering NFS program

 

This typically indicates that an error occurred when executing the /opt/mapr/server/maprexecute binary and the NFS server could not be started.  Review the maprexecute log, /opt/mapr/logs/maprexecute.log, for possible causes of this error.  One such error is below:

 


2014-11-12 10:01:51:INFO:30450: maprexecute pmapset by uid 2147483632 gid 2147483632
Cmd Line: /opt/mapr/server/maprexecute pmapset unset 100005 3
2014-11-12 10:01:51:ERROR:30450: maprexecute binary should not have write or execute permissions for others.

 

This error message indicates that the maprexecute binary does not have the correct permissions and cannot be used safely.  Verify the current permissions on the binary.  The correct permissions are: all permissions for the owner root, read and execute for the 'mapr' group, no permissions for others, and the setuid bit set.  Ex:

 


# ls -l /opt/mapr/server/maprexecute
-rwsr-x--- 1 root mapr 133272 Sep  5 12:11 /opt/mapr/server/maprexecute
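The -rwsr-x--- mode in this listing corresponds to octal 4750.  As a minimal Python sketch (the function name maprexecute_mode_problems is hypothetical), the mode bits can be compared against that value as follows.  Ownership (root:mapr) is not checked here and still needs to be verified separately.

```python
import stat

# setuid + rwx for the owner + r-x for the group + nothing for others,
# taken from the -rwsr-x--- listing in this article
EXPECTED_MODE = 0o4750


def maprexecute_mode_problems(st_mode):
    """List the ways st_mode deviates from the expected -rwsr-x--- mode."""
    perms = stat.S_IMODE(st_mode)  # drop the file-type bits
    problems = []
    if not perms & stat.S_ISUID:
        problems.append("setuid bit is not set")
    if perms & (stat.S_IWOTH | stat.S_IXOTH):
        problems.append("others have write or execute permission")
    if perms != EXPECTED_MODE:
        problems.append("mode is %04o, expected %04o" % (perms, EXPECTED_MODE))
    return problems


# Possible usage:
#   import os
#   print(maprexecute_mode_problems(os.stat("/opt/mapr/server/maprexecute").st_mode))
```

An empty list means the mode bits match the listing; any entry points at the specific bit to investigate.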

If the permissions on the maprexecute binary do not match the listing above, run the /opt/mapr/server/upgrade2maprexecute script to reset them.  Once the permissions are correct, restart the NFS server.
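The "ret" values logged in nfsserver.log next to each maprexecute pmapset command (0 on success, 256 and 25856 in the failing excerpts in this article) look like raw POSIX wait statuses.  That interpretation is an assumption based on the values themselves and is not confirmed by the log format.  Under that assumption, the exit code of the command is in the high byte, as in this Python sketch (the function name is hypothetical):

```python
def exit_code_from_wait_status(status):
    """Decode a POSIX wait status for a process that exited normally.

    Returns the exit code, or None if the low 7 bits indicate that the
    process was terminated by a signal.
    """
    if status & 0x7F:  # non-zero low 7 bits: killed by a signal
        return None
    return (status >> 8) & 0xFF


# Under the stated assumption:
#   ret 256   would mean the command exited with code 1
#   ret 25856 would mean the command exited with code 101
```

The decoded number only helps to tell different failures apart; the reason for the failure still has to come from maprexecute.log.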

 

 


5. Incorrect permissions on pmapset binary

 

Another example of a possible error in the maprexecute log is the error message seen below.

2014-11-12 10:09:03:INFO:6742: maprexecute pmapset by uid 2147483632 gid 2147483632
Cmd Line: /opt/mapr/server/maprexecute pmapset set 100003 3 6 2049
2014-11-12 10:09:03:ERROR:6742: /opt/mapr/server/pmapset binary should have exec permissions for root
2014-11-12 10:09:03:ERROR:6742: incorrect permissions on target binary /opt/mapr/server/pmapset


This error message indicates that the binary /opt/mapr/server/pmapset does not have the correct permissions; in this case it is not executable by root.  Verify the current permissions on the binary and, once they have been corrected, restart the NFS server.

 

 


6. Incorrect permissions on daemon.conf

 

Another example of a possible error in the maprexecute log is the error message seen below.

2014-11-12 10:04:14:INFO:911: maprexecute pmapset by uid 2147483632 gid 2147483632
Cmd Line: /opt/mapr/server/maprexecute pmapset unset 100005 3
2014-11-12 10:04:14:ERROR:911: Group/others have write permission for dameon.conf /opt/mapr/conf/daemon.conf. mode 0100777
2014-11-12 10:04:14:ERROR:911: user 2147483632 2147483632 not allowed to run maprexecute


This error message indicates that the permissions on /opt/mapr/conf/daemon.conf are set incorrectly, which could present a security issue.  This file specifies the MapR service user and group, and the group it names can gain elevated privileges using the maprexecute binary.  As a result, this file should be modifiable only by 'root'.  The expected permissions are read and write for root and read-only for all other users.  Ex:

 


# ls -l /opt/mapr/conf/daemon.conf
-rw-r--r-- 1 root root 74 Nov  3 15:27 /opt/mapr/conf/daemon.conf

Once the permissions on this file are correct, restart the NFS server.
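The "mode 0100777" value in the maprexecute log message is an octal file mode: 0100000 marks a regular file and 0777 are the permission bits.  The following minimal Python sketch (the function name is hypothetical) shows how the group and others write bits that the message complains about can be read from that value.

```python
import stat


def group_or_other_writable(mode_text):
    """Parse an octal mode string such as '0100777' and test the g+w / o+w bits."""
    mode = int(mode_text, 8)
    return bool(mode & (stat.S_IWGRP | stat.S_IWOTH))


# stat.filemode renders the same value the way 'ls -l' would:
#   stat.filemode(int("0100777", 8)) gives '-rwxrwxrwx'
#   stat.filemode(int("0100644", 8)) gives '-rw-r--r--'
```

The mode from the log (0100777) fails this check, while the expected mode (0100644, shown as -rw-r--r-- in the listing above) passes it.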

 

 


7. Read-only file system errors

 

When reviewing the NFS server log, an error message similar to the one below may be seen.

2014-11-12 10:09:58,3493 INFO nfsserver[7856] fs/nfsd/main.cc:535 ***** NFS server starting: pid=7856, mapr-version: 4.0.1.27334.GA *****
2014-11-12 10:09:58,3494 INFO nfsserver[7856] fs/nfsd/main.cc:549 ******* NFS server MAPR_HOME=/opt/mapr, NFS_PORT=2049, NFS_MGMT_PORT=9998, NFSMON_PORT=9997
2014-11-12 10:09:58,3635 INFO nfsserver[7856] fs/nfsd/mount.cc:2147 Export info: /mapr (rw)
2014-11-12 10:09:58,3649 INFO nfsserver[7856] fs/nfsd/mount.cc:1781 CLDB info: host1:7222 host2:7222
2014-11-12 10:09:58,4245 INFO nfsserver[7856] fs/nfsd/nfsha.cc:476 hostname: host1.domain.prv, hostid: 0xa81a38e0d7b6068
2014-11-12 10:09:58,4253 INFO nfsserver[7856] fs/nfsd/requesthandle.cc:468 found NFS_HEAPSIZE env var: 236
...
2014-11-12 10:09:58,4243 INFO nfsserver[7856] fs/nfsd/mount.cc:1191 0.0.0.0[0] Setting slash-mapr-clusterid clustername my.cluster.com, id 1012313856
2014-11-12 10:09:58,4446 INFO nfsserver[7856] fs/nfsd/requesthandle.cc:335 0.0.0.0[0] using /etc/mtab to check ramfs mount
2014-11-12 10:09:58,5068 ERROR nfsserver[7856] fs/nfsd/nfsha.cc:847 0.0.0.0[0] Error registering with CLDB: Read-only file system, err=0, status=30 cldb=host1:7222
2014-11-12 10:10:03,5080 INFO nfsserver[7856] fs/nfsd/nfsha.cc:476 hostname: host1.domain.prv, hostid: 0xa81a38e0d7b6068

Note that by itself this error message does not indicate a problem and does not mean that MapR-FS is read-only.  It indicates that the NFS server attempted to register with the CLDB specified by 'cldb=host1:7222' and that this CLDB responded that it is read-only and cannot accept the registration request.  This is expected behavior when the CLDB is a slave and not the current master.  No message is logged when the NFS server successfully registers with the master CLDB.  In the excerpt above there are two CLDBs in the cluster, host1 and host2, and the NFS server registers successfully with host2 after receiving a read-only error from host1.

 

If the 'Read-only file system' or a similar error message is seen repeatedly for all CLDBs in the cluster, review the CLDB log (/opt/mapr/logs/cldb.log) on the master to see why the registration request is rejected.  If the error message is seen only for the slave CLDB hosts, it can be safely ignored.
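To tell the harmless case from the one that needs attention, the CLDBs listed on the "CLDB info:" line can be compared with the CLDBs that returned the read-only error.  The following Python sketch does this for a list of nfsserver.log lines.  The patterns are inferred from the excerpts in this article and are assumptions, and the function name is hypothetical.

```python
import re

# Both patterns are inferred from the log excerpts in this article.
CLDB_INFO_RE = re.compile(r"CLDB info: (.+)$")
READ_ONLY_RE = re.compile(
    r"Error registering with CLDB: Read-only file system.*cldb=(\S+)"
)


def cldbs_without_read_only_error(lines):
    """Return the configured CLDBs that never produced the read-only error."""
    configured, read_only = set(), set()
    for line in lines:
        m = CLDB_INFO_RE.search(line)
        if m:
            configured.update(m.group(1).split())
        m = READ_ONLY_RE.search(line)
        if m:
            read_only.add(m.group(1))
    return sorted(configured - read_only)


# Possible usage:
#   with open("/opt/mapr/logs/nfsserver.log") as f:
#       remaining = cldbs_without_read_only_error(f)
#   if not remaining:
#       print("every configured CLDB returned read-only; review cldb.log on the master")
```

An empty result means every configured CLDB rejected the registration.  A non-empty result does not prove that registration succeeded, since no message is logged on success; it only shows that at least one CLDB did not report itself as read-only.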