AnsweredAssumed Answered

nfs not functioning properly

Question asked by mshirley on Apr 19, 2013
Latest reply on May 13, 2013 by mshirley
We are having a problem getting an NFS mount to work properly.

We have successfully added the service, we created the nfs export, and we are able to successfully mount the export.

    [root@tt ~]# mount -o nolock cldb.hostname:/mapr /mnt
    [root@tt ~]# df
    ...
    cldb.hostname:/mapr
    104857600         0 104857600   0% /mnt
    [root@tt ~]# ls /mnt
    clusterName
    [root@tt ~]# ls /mnt/clusterName
    **LOCKS UP**
        

The problem occurs when you attempt to do an ls on the actual /mapr/clusterName directory.  It locks up our session and we have to manually kill our ls pid.

For simplicity sake we are running NFS on the same node as CLDB.

    [root@cl ~]# showmount -e localhost
    Export list for localhost:
    /mapr             *
    /mapr/clusterName *
    [root@cl ~]# maprcli service list -node cldb.hostname
    name         state  logpath                              displayname
    fileserver   2      /opt/mapr/logs/mfs.log               FileServer
    nfs          2      /opt/mapr/logs/nfsserver.log         NFS Gateway
    webserver    0      /opt/mapr/logs/adminuiapp.log        Webserver
    cldb         2      /opt/mapr/logs/cldb.log              CLDB
    tasktracker  0      /opt/mapr/hadoop/hadoop-0.20.2/logs  TaskTracker
    jobtracker   0      /opt/mapr/hadoop/hadoop-0.20.2/logs  JobTracker
    hoststats    2      /opt/mapr/logs/hoststats.log         HostStats

In these logs 1.1.1.1/1.1.2.1 is the CLDB server, 2.2.2.2 is a tasktracker we are using to mount the NFS export.

/opt/mapr/nfsserver.log

    2013-04-19 16:33:22,5520 INFO nfsserver[309] fs/nfsd/main.cc:398 ***** NFS server starting: pid=309, mapr-version: 2.1.1.17042.GA *****
    2013-04-19 16:33:22,5521 INFO nfsserver[309] fs/nfsd/main.cc:412 ******* NFS server MAPR_HOME=/opt/mapr, NFS_PORT=2049, NFS_MGMT_PORT=9998, NFSMON_PORT=9997
    2013-04-19 16:33:22,5661 INFO nfsserver[309] fs/nfsd/mount.cc:2049 Export info: /mapr (rw)
    2013-04-19 16:33:22,5671 INFO nfsserver[309] fs/nfsd/mount.cc:1759 CLDB info: 1.1.1.1:7222
    2013-04-19 16:33:22,6052 INFO nfsserver[309] fs/nfsd/nfsha.cc:405 hostname: cldb.hostname, hostid: 0x44aa172ae0d31477
    2013-04-19 16:33:22,6055 INFO nfsserver[309] fs/nfsd/requesthandle.cc:410 found NFS_HEAPSIZE env var: 236
    2013-04-19 16:33:22,7049 INFO nfsserver[309] fs/nfsd/main.cc:501 NFS server started ... pid=309, uid=1000
    2013-04-19 16:33:24,9110 INFO nfsserver[309] fs/nfsd/cache.cc:1914 cluster clusterName : Setting myTopology to /cldb/topology
    2013-04-19 16:33:24,9111 INFO nfsserver[309] fs/nfsd/cache.cc:1914 cluster clusterName : Setting myTopology to /cldb/topology
    2013-04-19 16:33:22,5647 INFO nfsserver[309] fs/nfsd/nfsserver.cc:899 0.0.0.0[0] running the cmd sudo  /opt/mapr/server/pmapset set 100003 3 6 2049, ret 0
    2013-04-19 16:33:22,5660 INFO nfsserver[309] fs/nfsd/nfsserver.cc:937 0.0.0.0[0] Use32BitFileId is 1
    2013-04-19 16:33:22,5662 INFO nfsserver[309] fs/nfsd/mount.cc:2066 0.0.0.0[0] Allocating export entry 1d2a2d0
    2013-04-19 16:33:22,5676 INFO nfsserver[309] fs/nfsd/mount.cc:1817 0.0.0.0[0] Allocating export entry 1d2c530
    2013-04-19 16:33:22,5771 INFO nfsserver[309] fs/nfsd/nfsserver.cc:899 0.0.0.0[0] running the cmd sudo  /opt/mapr/server/pmapset set 100005 3 6 2049, ret 0
    2013-04-19 16:33:22,5862 INFO nfsserver[309] fs/nfsd/nfsserver.cc:899 0.0.0.0[0] running the cmd sudo  /opt/mapr/server/pmapset set 100005 1 6 2049, ret 0
    2013-04-19 16:33:22,5953 INFO nfsserver[309] fs/nfsd/nfsserver.cc:899 0.0.0.0[0] running the cmd sudo  /opt/mapr/server/pmapset set 100005 3 17 2049, ret 0
    2013-04-19 16:33:22,6043 INFO nfsserver[309] fs/nfsd/nfsserver.cc:899 0.0.0.0[0] running the cmd sudo  /opt/mapr/server/pmapset set 100005 1 17 2049, ret 0
    2013-04-19 16:33:22,6050 INFO nfsserver[309] fs/nfsd/mount.cc:1151 0.0.0.0[0] Setting slash-mapr-clusterid clustername clusterName, id 2775521280
    2013-04-19 16:33:22,6265 INFO nfsserver[309] fs/nfsd/requesthandle.cc:309 0.0.0.0[0] using /etc/mtab to check ramfs mount
    2013-04-19 16:36:15,2129 WARN nfsserver[309] fs/nfsd/nfsserver.cc:540 127.0.0.1[0x24bb97a5] NFS: unsupported NFS version:4, prog=0, bufLen=40
    2013-04-19 16:36:15,2234 INFO nfsserver[309] fs/nfsd/mount.cc:132 127.0.0.1[0xbb421e8d] Mounted /mapr: [udp]
    2013-04-19 16:37:49,2516 INFO nfsserver[309] fs/nfsd/mount.cc:132 2.2.2.2[0x258586c2] Mounted /mapr: [udp]
    2013-04-19 16:38:29,0624 ERROR nfsserver[309] fs/nfsd/cache.cc:1803 2.2.2.2[0xeb87e088] CLDBVolLookup call failed: volname=volume1, err=110, status=0 cldb=1.1.1.1:7222
    2013-04-19 16:38:29,0626 ERROR nfsserver[309] fs/nfsd/cache.cc:1803 2.2.2.2[0xeb87e088] CLDBVolLookup call failed: volname=volume2, err=110, status=0 cldb=1.1.1.1:7222
    2013-04-19 16:38:29,0626 ERROR nfsserver[309] fs/nfsd/cache.cc:1803 2.2.2.2[0xeb87e088] CLDBVolLookup call failed: volname=volume3, err=110, status=0 cldb=1.1.1.1:7222
    2013-04-19 16:38:31,0844 ERROR nfsserver[309] fs/nfsd/nfsha.cc:1178 0.0.0.0[0] Error heartbeating with CLDB: err=Connection timed out(110) cldb=1.1.1.1:7222
    2013-04-19 16:38:32,7314 INFO nfsserver[309] fs/nfsd/cache.cc:1914 cluster clusterName : Setting myTopology to /cldb/topology
    2013-04-19 16:39:10,5086 ERROR nfsserver[309] fs/nfsd/mount.cc:734 2.2.2.2[0xee87e088] Error from server for path=mapr.cluster.root, Connection timed out, err=110, status=0 cldb=1.1.1.1:7222

The first obvious thing I noticed was "Error heartbeating with CLDB: err=Connection timed out(110) cldb=1.1.1.1:7222".  This doesn't make sense since the NFS service is actually running on the same host as the CLDB so why is it timing out?

Also, "Error from server for path=mapr.cluster.root, Connection timed out, err=110, status=0 cldb=1.1.1.1:7222".  This seems to be coming from the TT that is attempting to ls.

/opt/mapr/logs/cldb.log

    2013-04-19 16:49:58,328 INFO ZKDataRetrieval [RPC-thread-2-EventThread]: Process path: /services_config/nfs/cldb.hostname. Event state: SyncConnected. Event type: NodeDataChanged
    2013-04-19 16:50:01,294 ERROR Topology [Monitor Thread for Heartbeat/License]: HeartbeatMonitor: Last heartbeat from cldb.hostname was 63 secs ago. Unregistering NFS Server: host=cldb.hostname,vips=,macs=de:ad:be:ef:00:00,de:ad:be:ef:00:01,
    2013-04-19 16:50:09,697 INFO ZKDataRetrieval [RPC-thread-2-EventThread]: Process path: /services_config/nfs/cldb.hostname. Event state: SyncConnected. Event type: NodeDataChanged
    2013-04-19 16:50:11,277 INFO CLDBServer [RPC-thread-2]: NFSRegister: Request  FSID: 4947792613547906167 isDCA: false NFSHost:Port: 1.1.1.1:2049-1.1.2.1:2049- NFSHostName: cldb.hostname
    2013-04-19 16:50:11,277 INFO Topology [RPC-thread-2]: NFS Registration from: cldb.hostname: with vIPs: []
    2013-04-19 16:50:11,740 INFO CLDBServer [HB-thread-5]: Allocating WorkUnit type : NOCOMPRESS_LIST_UPDATED for container 0 with sequence number 0 to 1.1.1.1:5660-1.1.2.1:5660-
    2013-04-19 16:51:21,315 ERROR Topology [Monitor Thread for Heartbeat/License]: HeartbeatMonitor: Last heartbeat from cldb.hostname was 65 secs ago. Unregistering NFS Server: host=cldb.hostname,vips=,macs=de:ad:be:ef:00:00,de:ad:be:ef:00:01,
    2013-04-19 16:55:24,618 INFO CLDBServer [HB-thread-1]: NFSHeartBeat: Heartbeat from unknown/inactive NFS Server: 1.1.1.1:9998, Requesting registration

The only thing I noticed here is that the registered NFS server keeps becoming unregistered.  I'm not sure if this is an error or if it's normal for an NFS service to drop and reregister regularly.

We even tried to do an NFS mount locally to the NFS server and it results in the same lockup when doing an ls on the cluster directory.

Anyone have any ideas?

Outcomes