Sunday, January 3, 2016

debian - NFS client does not pick up server after restart



EDIT:




To summarize the issue, this is a problem with the NFS server changing IP address and the NFS clients not picking up the new address. I can see via tcpdump that the client still tries to contact the old IP address on port 2049.



We have several NFS mount points defined like this in /etc/fstab. As you can see, this is NFS v3.



storage-1:/data/medias/media /var/www/myproject/data/media nfs rsize=32768,wsize=32768,hard,intr,actimeo=300,nfsvers=3,async,noatime,sec=sys 0 0
storage-1:/data/medias/secure /var/www/myproject/web/secure nfs rsize=32768,wsize=32768,hard,intr,actimeo=300,nfsvers=3,async,noatime,sec=sys 0 0
storage-1:/data/tobeprocessed /var/www/myproject/data/tobeprocessed nfs rsize=32768,wsize=32768,hard,intr,actimeo=300,nfsvers=3,async,noatime,sec=sys 0 0
storage-1:/data/ftp /var/ftp nfs rsize=32768,wsize=32768,hard,intr,actimeo=300,nfsvers=3,async,noatime,sec=sys 0 0



When we restart the NFS server, we have to unmount and remount each mount point on the clients; otherwise they are unable to access the server. I have waited up to 5 minutes after the reboot before unmounting and remounting.



After a restart of the NFS server, a simple ls /var/www/myproject/data/media makes the console hang.



I can also see the following messages in /var/log/syslog:



Sep 16 11:24:36 encoder-1 kernel: [69688.160102] nfs: server storage-1 not responding, still trying
Sep 16 11:30:15 encoder-1 kernel: [70027.744042] nfs: server storage-1 not responding, still trying



When I umount and then mount one of the NFS directories on the client, I can access it again, but the others remain inaccessible until I umount and mount them as well.
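The per-mount umount/mount cycle described above can be scripted. A minimal sketch, assuming the NFS v3 entries show up with fstype nfs in /proc/mounts; it only prints the commands (dry run) so they can be reviewed before running them as root:

```shell
#!/bin/sh
# Sketch: emit an "umount -l; mount" pair for every NFS mount listed in
# /proc/mounts (or in an alternative mounts file given as $1, handy for
# testing). Prints the commands rather than executing them.
list_nfs_remounts() {
  mounts_file="${1:-/proc/mounts}"
  # /proc/mounts fields: device mountpoint fstype options dump pass
  awk '$3 == "nfs" { print "umount -l " $2 "; mount " $2 }' "$mounts_file"
}

list_nfs_remounts "$@"
```

Piping the output to "sudo sh" would apply it; keeping the script itself read-only makes it safe to run for inspection first.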



If anyone knows a possible solution for this, I am all ears. Note that rpcinfo shows that the client is able to contact the server, as shown below.



There is one NFS server and 4 NFS clients, for a total of 12 mount points.



The result of rpcinfo -p storage-1 from a client:



[0]root@encoder-1:/var/log # rpcinfo -p storage-1
   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100000    3   tcp    111  portmapper
    100000    2   tcp    111  portmapper
    100000    4   udp    111  portmapper
    100000    3   udp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp  52115  status
    100024    1   tcp  57907  status
    100003    2   tcp   2049  nfs
    100003    3   tcp   2049  nfs
    100003    4   tcp   2049  nfs
    100227    2   tcp   2049
    100227    3   tcp   2049
    100003    2   udp   2049  nfs
    100003    3   udp   2049  nfs
    100003    4   udp   2049  nfs
    100227    2   udp   2049
    100227    3   udp   2049
    100021    1   udp  59603  nlockmgr
    100021    3   udp  59603  nlockmgr
    100021    4   udp  59603  nlockmgr
    100021    1   tcp  47716  nlockmgr
    100021    3   tcp  47716  nlockmgr
    100021    4   tcp  47716  nlockmgr
    100005    1   udp    892  mountd
    100005    1   tcp    892  mountd
    100005    2   udp    892  mountd
    100005    2   tcp    892  mountd
    100005    3   udp    892  mountd
    100005    3   tcp    892  mountd



When enabling NFS debug traces as explained here, we get the following log messages:



Sep 17 05:35:00 encoder-1 kernel: [135112.160230] nfs: server storage-1 not responding, still trying
Sep 17 05:53:47 encoder-1 kernel: [136240.018538] NFS: nfs_lookup_revalidate(///) is valid
Sep 17 05:53:47 encoder-1 kernel: [136240.018538] NFS: revalidating (0:12/5242881)
Sep 17 05:53:47 encoder-1 kernel: [136240.018538] NFS call getattr
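For reference, NFS client debug traces like the ones above are typically toggled with rpcdebug from nfs-utils, which must be run as root. A minimal sketch that only prints the commands so they can be reviewed first; the choice of modules and flags here is an assumption, adjust to taste:

```shell
# Sketch: print the rpcdebug commands (from nfs-utils) that toggle NFS
# client debug logging. Printing instead of executing avoids needing root
# here; pipe the output to "sudo sh" to apply it.
nfs_debug_cmds() {
  # $1 = on|off
  case "$1" in
    on)  printf '%s\n' "rpcdebug -m nfs -s all" "rpcdebug -m rpc -s all" ;;
    off) printf '%s\n' "rpcdebug -m nfs -c all" "rpcdebug -m rpc -c all" ;;
    *)   echo "usage: nfs_debug_cmds on|off" >&2; return 1 ;;
  esac
}

nfs_debug_cmds on
```

Remember to turn the flags off again afterwards, since full NFS debug logging is very chatty in syslog.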

Answer




I think it may be a problem resolving the hostname. I have noticed that, even if name resolution otherwise seems to work fine on the system and network, the NFS mount processes occasionally appear to have a problem with it. I would change the hostname to the actual IP address and try that out. Let's say the FQDN is storage-1.example.org and it resolves to 192.0.2.11; then do:



192.0.2.11:/data/medias/media /var/www/myproject/data/media nfs bg,rsize=32768,wsize=32768,hard,intr,actimeo=300,nfsvers=3,async,noatime,sec=sys 0 0


Even if that doesn't fix the problem, I personally find using the IP address instead of the hostname or FQDN preferable. But I understand there could be reasons why you wouldn't want to do that.
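If you do stay with the hostname, it is worth verifying what the client actually resolves it to after the server restart. A small sketch using getent; "storage-1" and 192.0.2.11 are the placeholder name and address from above, so substitute your own:

```shell
# Sketch: compare the IPv4 address the resolver returns for the NFS server
# with the address you expect. "storage-1" / 192.0.2.11 are placeholders.
resolve_v4() {
  # First IPv4 address the system resolver returns for a hostname.
  getent ahostsv4 "$1" | awk 'NR == 1 { print $1 }'
}

expected="192.0.2.11"
actual="$(resolve_v4 storage-1)"
if [ "$actual" = "$expected" ]; then
  echo "OK: storage-1 -> $actual"
else
  echo "MISMATCH: storage-1 resolves to '${actual:-nothing}', expected $expected" >&2
fi
```

A mismatch here, while the kernel client still targets the old address on port 2049, would point squarely at the stale-resolution theory.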



Note: I added the bg option, which backgrounds the mount process if it takes too long, in order to speed up booting. It's up to you whether you prefer that. I mention it because when there are a number of NFS mount points and each one takes a long time (or times out) to mount, the boot time can easily exceed an hour.

