nfs client hanging
Guillaume Rousse
Guillaume.Rousse at inria.fr
Wed Jul 2 08:00:40 EDT 2008
Guillaume Rousse wrote:
> Guillaume Rousse wrote:
>> And I keep getting those "RPC: failed to contact local rpcbind server
>> (errno 5)" error messages in the logs. I'm switching to regular
>> portmap instead of rpcbind to see if it helps.
> Replacing rpcbind with portmap doesn't help, the problem still happens.
>
> Here is a summary of the situation:
> - the server is a netapp NAS 270 running DAT 7.2.5
>
> - the client is a linux x86_64 host running kernel 2.6.24.5, with
> the following patches applied
> * 46f8a64bae11f5c9b15b4401f6e9863281999b66
> * 4e99a1ff3410c627a428d5ddb6cd2e7bc908a486
> * 63c86716ea34ad94d52e5b0abbda152574dc42b5
> * 8e60029f403781b8a63b7ffdb7dc1faff6ca651e
> * c37dcd334c0b0a46a90cfa13b9f69e2aaa89bc09
>
> - the client doesn't have CONFIG_SUNRPC_BIND34 enabled
>
> - there is no filtering between the client and the server
>
> - delegation is disabled
>
> - the file system is mounted with the following options:
> rw,nosuid,nodev,hard,intr
>
> - when the hang occurs, the server is disconnected (no connection found
> in netstat -t output)
[..]
> Is there anything else I can do to get information now, before
> rebooting? And if it is already too late, as I suspect the real issue
> here is the connection loss, what would be the best setup to catch the
> problem after the reboot? Setting all rpcdebug flags is way too verbose
> for a problem occurring only once every 2-3 days...
Actually, I spoke too soon: the problem also occurs with files hosted on
the linux server, with a different scenario:
- the connection is still present when the hang occurs
- rpc.idmapd seems to have crashed (no trace in the logs, though, as
verbosity was set to 0)
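
To tell the two scenarios apart when a hang occurs (connection gone versus connection still present), the netstat output can be filtered for the server. This is only a rough sketch: it assumes numeric output as printed by `netstat -tn`, the standard NFS port 2049, and a placeholder server address; the function name is made up for illustration.

```python
def nfs_connections(netstat_output, server_ip, port=2049):
    """Return the TCP states of all connections to server_ip:port.

    Assumes the usual `netstat -tn` column layout:
    Proto Recv-Q Send-Q Local-Address Foreign-Address State
    """
    states = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a TCP connection line.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        if fields[4] == f"{server_ip}:{port}":
            states.append(fields[5])
    return states


if __name__ == "__main__":
    import subprocess

    # 192.0.2.10 is a placeholder, not the real server address.
    out = subprocess.run(["netstat", "-tn"], capture_output=True, text=True).stdout
    print(nfs_connections(out, "192.0.2.10") or "no connection to the server")
```

An empty result corresponds to the first scenario (server disconnected), a non-empty one to the second.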
Full client (rpc + nfs flags) and server (rpc + nfsd flags) logs are
available at http://www.zarb.org/~guillomovitch/{client,server}.log.
From what I understand, this is a normal revalidation operation:
Jul 2 04:35:00 chatelet kernel: NFS: revalidating (0:15/7322625)
Jul 2 04:35:02 chatelet kernel: NFS: nfs_update_inode(0:15/7322625 ct=2 info=0xe)
Jul 2 04:35:02 chatelet kernel: NFS: (0:15/7322625) revalidation complete
Starting from 05:13, revalidation of this inode no longer completes,
which seems to imply the problem occurred between 04:35:02 and 05:13:47:
Jul 2 05:13:47 chatelet kernel: NFS: revalidating (0:15/7322625)
Jul 2 08:26:13 chatelet kernel: NFS: revalidating (0:15/7322625)
But I can't find anything really suspicious in that interval :(
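
For logs of this size, the search for revalidations that never complete can be automated. The following is a minimal sketch, assuming the log lines have exactly the format quoted above and that the client log has been saved locally as client.log; the function name is made up for illustration.

```python
import re

# Patterns for the two kernel messages quoted above.
START = re.compile(
    r"(\w{3}\s+\d+\s+[\d:]+)\s+\S+\s+kernel: NFS: revalidating \((\S+)\)"
)
DONE = re.compile(r"kernel: NFS: \((\S+)\) revalidation complete")


def pending_revalidations(lines):
    """Map each inode id to the timestamps of revalidations never completed."""
    pending = {}
    for line in lines:
        m = START.search(line)
        if m:
            # Remember when a revalidation started for this inode.
            pending.setdefault(m.group(2), []).append(m.group(1))
            continue
        m = DONE.search(line)
        if m:
            # A completion clears everything outstanding for this inode.
            pending.pop(m.group(1), None)
    return pending


if __name__ == "__main__":
    # client.log is an assumed local copy of the client log.
    with open("client.log") as f:
        for inode, stamps in pending_revalidations(f).items():
            print(inode, "stuck since", stamps[0])
```

The earliest pending timestamp for an inode gives the upper bound of the window in which the hang started.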
Because of the previous reboot, the client was running a standard mandriva
2.6.24.5 kernel, without the aforementioned patches applied.
Given that this problem only occurs for home directories on a host with a
very specific usage (transient ssh logins for svn commits, or web home
page access), I increasingly suspect an autofs issue triggered by
repeated mount/umount events, rather than a pure nfs issue. With a
similar setup, we have no problem with long-lived mount points, such as
homedirs on workstations. And I just found a report of this kind of
issue on the autofs mailing list:
http://linux.kernel.org/pipermail/autofs/2008-June/004814.html
--
Guillaume Rousse
Moyens Informatiques - INRIA Futurs
Tel: 01 69 35 69 62