EDIT:
To summarize the issue, this is a problem with the NFS server changing IP address and the NFS clients not picking up the new address. I can see via tcpdump that the client still tries to contact the old IP address on port 2049.
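The tcpdump observation can be cross-checked without capturing traffic. On Linux, each NFS entry in /proc/mounts carries an addr= option holding the server address that was in use when the mount was made. The following is a minimal sketch, not a definitive tool: the function names parse_nfs_mounts and stale_mounts are made up for illustration, and it assumes plain hostname:/export device names (no IPv6 literals).

```python
import socket


def parse_nfs_mounts(mounts_text):
    """Return (server, mount_point, addr) for each NFS line in /proc/mounts-style text."""
    entries = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass.
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        server = fields[0].split(":", 1)[0]
        addr = None
        for opt in fields[3].split(","):
            # addr= is the server address recorded at mount time.
            if opt.startswith("addr="):
                addr = opt[len("addr="):]
        entries.append((server, fields[1], addr))
    return entries


def stale_mounts(entries, resolve=socket.gethostbyname):
    """Return entries whose recorded address differs from the current resolution."""
    return [e for e in entries if e[2] is not None and resolve(e[0]) != e[2]]
```

On a client, one would read /proc/mounts, pass its contents to parse_nfs_mounts, and print whatever stale_mounts returns; any mount listed there is still pointed at an address the hostname no longer resolves to.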
We have several NFS mount points defined like this in /etc/fstab. As you can see, this is NFS v3.
storage-1:/data/medias/media /var/www/myproject/data/media nfs rsize=32768,wsize=32768,hard,intr,actimeo=300,nfsvers=3,async,noatime,sec=sys 0 0
storage-1:/data/medias/secure /var/www/myproject/web/secure nfs rsize=32768,wsize=32768,hard,intr,actimeo=300,nfsvers=3,async,noatime,sec=sys 0 0
storage-1:/data/tobeprocessed /var/www/myproject/data/tobeprocessed nfs rsize=32768,wsize=32768,hard,intr,actimeo=300,nfsvers=3,async,noatime,sec=sys 0 0
storage-1:/data/ftp /var/ftp nfs rsize=32768,wsize=32768,hard,intr,actimeo=300,nfsvers=3,async,noatime,sec=sys 0 0
When we restart the server, we have to unmount and remount each mount point; otherwise the clients are unable to access the NFS server. I waited up to 5 minutes after the reboot before unmounting and remounting.
After a restart of the NFS server, a simple ls /var/www/myproject/data/media makes the console hang.
I can also see the following messages in /var/log/syslog:
Sep 16 11:24:36 encoder-1 kernel: [69688.160102] nfs: server storage-1 not responding, still trying
Sep 16 11:30:15 encoder-1 kernel: [70027.744042] nfs: server storage-1 not responding, still trying
When I umount and then mount one of the NFS directories on the client, I can then access it. But I cannot access the others unless I also umount and mount them.
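Since every mount point has to be cycled individually, the workaround can be scripted. The sketch below only builds the command list from fstab-style text; the helper names nfs_mount_points and remount_commands are hypothetical, and it assumes that mount with just a mount point picks up the remaining settings from /etc/fstab.

```python
def nfs_mount_points(fstab_text):
    """Return the mount points of all NFS entries in fstab-style text."""
    mount_points = []
    for line in fstab_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 3 and fields[2] in ("nfs", "nfs4"):
            mount_points.append(fields[1])
    return mount_points


def remount_commands(mount_points):
    """Return an umount command followed by a mount command for each mount point."""
    commands = []
    for mp in mount_points:
        commands.append(["umount", mp])
        commands.append(["mount", mp])
    return commands
```

Each returned command could then be run as root, for example with subprocess.run(cmd, check=True). This automates the workaround only; it does not address why the clients keep using the old address.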
If anyone knows a possible solution for this, I am all ears. Note that rpcinfo shows that the client is able to contact the server, as shown below.
There is one NFS server, 4 NFS clients for a total of 12 mount points.
The result of rpcinfo -p storage-1 from a client:
[0]root@encoder-1:/var/log # rpcinfo -p storage-1
program vers proto port service
100000 4 tcp 111 portmapper
100000 3 tcp 111 portmapper
100000 2 tcp 111 portmapper
100000 4 udp 111 portmapper
100000 3 udp 111 portmapper
100000 2 udp 111 portmapper
100024 1 udp 52115 status
100024 1 tcp 57907 status
100003 2 tcp 2049 nfs
100003 3 tcp 2049 nfs
100003 4 tcp 2049 nfs
100227 2 tcp 2049
100227 3 tcp 2049
100003 2 udp 2049 nfs
100003 3 udp 2049 nfs
100003 4 udp 2049 nfs
100227 2 udp 2049
100227 3 udp 2049
100021 1 udp 59603 nlockmgr
100021 3 udp 59603 nlockmgr
100021 4 udp 59603 nlockmgr
100021 1 tcp 47716 nlockmgr
100021 3 tcp 47716 nlockmgr
100021 4 tcp 47716 nlockmgr
100005 1 udp 892 mountd
100005 1 tcp 892 mountd
100005 2 udp 892 mountd
100005 2 tcp 892 mountd
100005 3 udp 892 mountd
100005 3 tcp 892 mountd
When enabling NFS debug traces as explained here, we get the following log messages:
Sep 17 05:35:00 encoder-1 kernel: [135112.160230] nfs: server storage-1 not responding, still trying
Sep 17 05:53:47 encoder-1 kernel: [136240.018538] NFS: nfs_lookup_revalidate(///) is valid
Sep 17 05:53:47 encoder-1 kernel: [136240.018538] NFS: revalidating (0:12/5242881)
Sep 17 05:53:47 encoder-1 kernel: [136240.018538] NFS call getattr
Answer
I think it may be a problem resolving the hostname. I have noticed that even when name resolution otherwise works fine on the system and network, the NFS mount processes occasionally appear to have a problem with it. I would replace the hostname with the actual IP address and try that. Let's say the FQDN is storage-1.example.org and it resolves to 192.0.2.11; then use:
192.0.2.11:/data/medias/media /var/www/myproject/data/media nfs bg,rsize=32768,wsize=32768,hard,intr,actimeo=300,nfsvers=3,async,noatime,sec=sys 0 0
Even if that doesn't fix the problem I personally find using the IP address instead of the hostname or FQDN to be preferable. But I understand there could be reasons why you wouldn't want to do that.
Note: I added the bg option, which backgrounds the mount process in case a mount takes a long time, in order to speed up booting. Whether to use it is up to you. I mention it because when there are a number of NFS mount points, each taking a long time to mount (or timing out), the boot time can easily exceed one hour.
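With 12 mount points across 4 clients, editing each fstab entry by hand is error-prone, so here is a minimal sketch of the rewrite described above. The function name pin_fstab_line and its parameters are made up for illustration; it swaps the hostname for a given IP address and prepends bg if that option is not already present, leaving every other line untouched.

```python
def pin_fstab_line(line, host, ip, extra_option="bg"):
    """Replace host with ip in an NFS fstab line and add extra_option if missing."""
    fields = line.split()
    # Leave comments, short lines, and non-NFS entries unchanged.
    if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
        return line
    server, sep, export = fields[0].partition(":")
    if not sep or server != host:
        return line
    fields[0] = f"{ip}:{export}"
    options = fields[3].split(",")
    if extra_option and extra_option not in options:
        options.insert(0, extra_option)
    fields[3] = ",".join(options)
    return " ".join(fields)
```

Pass extra_option=None to change only the address. Note that columns are rejoined with single spaces, so any original alignment in the file is not preserved.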