nfs client hanging

Guillaume Rousse Guillaume.Rousse at inria.fr
Wed Jul 2 08:00:40 EDT 2008


Guillaume Rousse wrote:
> Guillaume Rousse wrote:
>> And I keep getting those "RPC: failed to contact local rpcbind server 
>> (errno 5)" error messages in the logs. I'm switching to regular 
>> portmap instead of rpcbind to see if it helps.
> Replacing rpcbind with portmap doesn't help; the problem still happens.
> 
> Here is a summary of the situation:
> - the server is a netapp NAS 270 running DAT 7.2.5
> 
> - the client is a Linux x86_64 host running kernel 2.6.24.5, with 
> the following patches applied
>  * 46f8a64bae11f5c9b15b4401f6e9863281999b66
>  * 4e99a1ff3410c627a428d5ddb6cd2e7bc908a486
>  * 63c86716ea34ad94d52e5b0abbda152574dc42b5
>  * 8e60029f403781b8a63b7ffdb7dc1faff6ca651e
>  * c37dcd334c0b0a46a90cfa13b9f69e2aaa89bc09
> 
> - the client doesn't have CONFIG_SUNRPC_BIND34 enabled
> 
> - there is no filtering between the client and the server
> 
> - delegation is disabled
> 
> - the file system is mounted with the following options: 
> rw,nosuid,nodev,hard,intr
> 
> - when the hang occurs, the server is disconnected (no connection found 
> in netstat -t output)
[..]
> Is there anything else I can do to get information now, before 
> rebooting? And if it is already too late, as I suspect the real issue 
> here is the connection loss, what would be the best setup to catch the 
> problem after the reboot? Setting all rpcdebug flags is way too verbose 
> for a problem occurring only once every 2-3 days...

Actually, I spoke too soon: the problem also occurs with files hosted on 
the Linux server, in a different scenario:
- the connection is still present when the hang occurs
- rpc.idmapd seems to have crashed (no trace in the logs, though, as 
verbosity was set to 0)

Full client (rpc + nfs flags) and server (rpc + nfsd flags) logs are 
available at http://www.zarb.org/~guillomovitch/{client,server}.log.

From what I understand, this is a normal revalidation operation:
Jul  2 04:35:00 chatelet kernel: NFS: revalidating (0:15/7322625)
Jul  2 04:35:02 chatelet kernel: NFS: nfs_update_inode(0:15/7322625 ct=2 info=0xe)
Jul  2 04:35:02 chatelet kernel: NFS: (0:15/7322625) revalidation complete

Starting from 05:13, revalidation on this inode no longer completes, 
which seems to imply the problem occurred between 04:35:02 and 05:13:47:
Jul  2 05:13:47 chatelet kernel: NFS: revalidating (0:15/7322625)
Jul  2 08:26:13 chatelet kernel: NFS: revalidating (0:15/7322625)

But I can't find anything really suspicious :(
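
To narrow down that window without reading the whole log by eye, the start/complete pairing can be checked mechanically. This is only a minimal sketch: it assumes the two message shapes quoted above ("NFS: revalidating (...)" and "NFS: (...) revalidation complete") and a 15-character syslog timestamp at the start of each line. The function name and the SAMPLE list are illustrative, not part of any NFS tool.

```python
import re

# Patterns for the two kernel messages quoted in this mail (assumed shapes).
START = re.compile(r"NFS: revalidating \((?P<inode>[^)]+)\)")
DONE = re.compile(r"NFS: \((?P<inode>[^)]+)\) revalidation complete")


def pending_revalidations(lines):
    """Return {inode: [timestamps]} for revalidations never seen to complete."""
    pending = {}
    for line in lines:
        stamp = line[:15]  # syslog timestamp, e.g. "Jul  2 04:35:00"
        match = START.search(line)
        if match:
            # Remember every start until a completion for this inode shows up.
            pending.setdefault(match.group("inode"), []).append(stamp)
            continue
        match = DONE.search(line)
        if match:
            # A completion clears all outstanding starts for that inode.
            pending.pop(match.group("inode"), None)
    return pending


# The log lines quoted above, used as sample input.
SAMPLE = [
    "Jul  2 04:35:00 chatelet kernel: NFS: revalidating (0:15/7322625)",
    "Jul  2 04:35:02 chatelet kernel: NFS: nfs_update_inode(0:15/7322625 ct=2 info=0xe)",
    "Jul  2 04:35:02 chatelet kernel: NFS: (0:15/7322625) revalidation complete",
    "Jul  2 05:13:47 chatelet kernel: NFS: revalidating (0:15/7322625)",
    "Jul  2 08:26:13 chatelet kernel: NFS: revalidating (0:15/7322625)",
]

for inode, stamps in pending_revalidations(SAMPLE).items():
    print(inode, "first stuck at", stamps[0])
```

Run against the full client log (one line per list element), the first timestamp reported for each inode gives the upper bound of the window in which the hang started.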

Because of a previous reboot, the client was running a standard Mandriva 
2.6.24.5 kernel, without the aforementioned patches applied.

Given that this problem only occurs for home directories on a host with 
a very specific usage (transient ssh logins for svn commits, or web home 
page access), I'm increasingly inclined to suspect an autofs issue with 
repeated mount/umount events, rather than a pure NFS issue. With a 
similar setup, we have no problem with long-lived mount points, such as 
homedirs on workstations. And I just found a report of this kind of 
issue on the autofs ML:
http://linux.kernel.org/pipermail/autofs/2008-June/004814.html
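
One way to test the mount/umount hypothesis would be to record when NFS mounts appear and disappear, and compare those times with the hang. The sketch below only shows the comparison step. It assumes the usual /proc/mounts field order (device, mount point, fs type, options); the host names and paths in the two snapshots are made up for illustration.

```python
def nfs_mounts(proc_mounts_text):
    """Return the set of (mount point, fs type) pairs for NFS file systems."""
    mounts = set()
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        # Field 3 is the file system type; keep NFSv2/3 and NFSv4 entries.
        if len(fields) >= 3 and fields[2] in ("nfs", "nfs4"):
            mounts.add((fields[1], fields[2]))
    return mounts


def mount_events(before, after):
    """Compare two snapshots; return (newly mounted, unmounted), both sorted."""
    return sorted(after - before), sorted(before - after)


# Hypothetical snapshots in /proc/mounts format (names are placeholders).
SNAPSHOT_BEFORE = """\
/dev/sda1 / ext3 rw 0 0
nas.example:/vol/home/alice /home/alice nfs rw,nosuid,nodev,hard,intr 0 0
"""
SNAPSHOT_AFTER = """\
/dev/sda1 / ext3 rw 0 0
srv.example:/home/bob /home/bob nfs4 rw,nosuid,nodev,hard,intr 0 0
"""

mounted, unmounted = mount_events(nfs_mounts(SNAPSHOT_BEFORE),
                                  nfs_mounts(SNAPSHOT_AFTER))
print("mounted:", mounted)
print("unmounted:", unmounted)
```

In practice the snapshots would come from reading /proc/mounts at regular intervals and logging each non-empty difference with a timestamp, which is far less verbose than enabling all rpcdebug flags.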
-- 
Guillaume Rousse
Moyens Informatiques - INRIA Futurs
Tel: 01 69 35 69 62

