nfs client hanging

Trond Myklebust trond.myklebust at fys.uio.no
Wed Jul 2 13:19:17 EDT 2008


On Wed, 2008-07-02 at 14:00 +0200, Guillaume Rousse wrote:
> Guillaume Rousse wrote:
> > Guillaume Rousse wrote:
> >> And I keep getting those "RPC: failed to contact local rpcbind server 
> >> (errno 5)" error messages in the logs. I'm switching to regular 
> >> portmap instead of rpcbind to see if it helps.
> > Replacing rpcbind with portmap doesn't help, the problem still happens.
> > 
> > Here is a summary of the situation:
> > - the server is a netapp NAS 270 running DAT 7.2.5
> > 
> > - the client is a linux x86_64 host running kernel 2.6.24.5, with 
> > the following patches applied
> >  * 46f8a64bae11f5c9b15b4401f6e9863281999b66
> >  * 4e99a1ff3410c627a428d5ddb6cd2e7bc908a486
> >  * 63c86716ea34ad94d52e5b0abbda152574dc42b5
> >  * 8e60029f403781b8a63b7ffdb7dc1faff6ca651e
> >  * c37dcd334c0b0a46a90cfa13b9f69e2aaa89bc09
> > 
> > - the client doesn't have CONFIG_SUNRPC_BIND34 enabled
> > 
> > - there is no filtering between the client and the server
> > 
> > - delegation is disabled
> > 
> > - the file system is mounted with the following options: 
> > rw,nosuid,nodev,hard,intr
> > 
> > - when the hang occurs, the server is disconnected (no connection found 
> > in netstat -t output)
> [..]
> > Is there anything else I can do to gather information now, before 
> > rebooting? And if it is already too late, since I suspect the real issue 
> > here is the connection loss, what would be the best setup to catch the 
> > problem after the reboot? Setting all rpcdebug flags is far too verbose 
> > for a problem occurring only once every 2-3 days...
> 
> Actually, I spoke too soon: the problem also occurs with files hosted on 
> the linux server, with a different scenario:
> - the connection is still present when the hang occurs
> - rpc.idmapd seems to have crashed (no trace in the logs, though, as 
> verbosity was set to 0)
> 
> Full client (rpc + nfs flags) and server (rpc + nfsd flags) logs are 
> available at http://www.zarb.org/~guillomovitch/{client,server}.log.
> 
> From what I understand, this is a normal revalidation operation:
> Jul  2 04:35:00 chatelet kernel: NFS: revalidating (0:15/7322625)
> Jul  2 04:35:02 chatelet kernel: NFS: nfs_update_inode(0:15/7322625 ct=2 info=0xe)
> Jul  2 04:35:02 chatelet kernel: NFS: (0:15/7322625) revalidation complete
> 
> Starting from 5:13, revalidation on this inode doesn't complete anymore, 
> which seems to imply the problem occurs between 04:35:02 and 05:13:47:
> Jul  2 05:13:47 chatelet kernel: NFS: revalidating (0:15/7322625)
> Jul  2 08:26:13 chatelet kernel: NFS: revalidating (0:15/7322625)
> 
> But I don't find anything really suspicious :(
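
For a problem that only shows up every few days, it may be easier to
search the debug log mechanically than to read it. The following is a
minimal sketch, assuming the two message formats quoted in the log
excerpt above ("NFS: revalidating (...)" and "NFS: (...) revalidation
complete"); the function name and the command-line handling are
hypothetical, not part of any NFS tool. It reports, for each inode, the
first revalidation that never completed.

```python
import re
import sys

# Patterns taken from the two kernel messages quoted in this thread.
START = re.compile(r"NFS: revalidating \((\S+?)\)")
DONE = re.compile(r"NFS: \((\S+?)\) revalidation complete")


def pending_revalidations(lines):
    """Map each inode tag to its first 'revalidating' line that was
    never followed by a matching 'revalidation complete' line."""
    pending = {}
    for line in lines:
        line = line.rstrip("\n")
        started = START.search(line)
        if started:
            # Keep the earliest unmatched start for this inode.
            pending.setdefault(started.group(1), line)
            continue
        finished = DONE.search(line)
        if finished:
            pending.pop(finished.group(1), None)
    return pending


if __name__ == "__main__":
    # Usage (hypothetical file name): python pending.py client.log
    with open(sys.argv[1], errors="replace") as log:
        for inode, first_line in pending_revalidations(log).items():
            print(inode, "stuck since:", first_line)
```

The first stuck line gives an upper bound on when the hang started,
which narrows the part of the log worth reading by hand.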

If the idmapper has crashed, then that is quite sufficient to explain
your trouble. The GETATTR calls behind the revalidations above rely on
the idmapper to translate NFSv4 names into uids and gids, and will hang
if it is not responding.
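
One way to catch this without turning on all the rpcdebug flags would be
to watch for the daemon disappearing. Below is a rough sketch, assuming
the daemon runs under the name rpc.idmapd as mentioned in this thread;
the helper name and the proc_root parameter are hypothetical and exist
only for illustration. It reads /proc/<pid>/cmdline, so it only detects
a missing process, not one that is alive but unresponsive.

```python
import os
import sys


def daemon_running(name, proc_root="/proc"):
    """Return True if some process has argv[0] with basename `name`."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                argv0 = f.read().split(b"\0", 1)[0]
        except OSError:
            continue  # process exited meanwhile, or is not readable
        if os.path.basename(argv0).decode(errors="replace") == name:
            return True
    return False


if __name__ == "__main__":
    if not daemon_running("rpc.idmapd"):
        print("rpc.idmapd is not running", file=sys.stderr)
        sys.exit(1)
```

Run periodically (from cron, for instance), the non-zero exit status
gives a timestamp for the crash that can be matched against the client
log.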

Trond


