nfs client hanging

Guillaume Rousse Guillaume.Rousse at inria.fr
Thu Jul 3 04:29:52 EDT 2008


Trond Myklebust wrote:
> On Wed, 2008-07-02 at 14:00 +0200, Guillaume Rousse wrote:
>> Guillaume Rousse wrote:
>>> Guillaume Rousse wrote:
>>>> And I keep getting those "RPC: failed to contact local rpcbind server 
>>>> (errno 5)" error messages in the logs. I'm switching to regular 
>>>> portmap instead of rpcbind to see if it helps.
>>> Replacing rpcbind with portmap doesn't help, the problem still happens.
>>>
>>> Here is a summary of the situation:
>>> - the server is a netapp NAS 270 running DAT 7.2.5
>>>
>>> - the client is a linux x86_64 host running kernel 2.6.24.5, with
>>> the following patches applied
>>>  * 46f8a64bae11f5c9b15b4401f6e9863281999b66
>>>  * 4e99a1ff3410c627a428d5ddb6cd2e7bc908a486
>>>  * 63c86716ea34ad94d52e5b0abbda152574dc42b5
>>>  * 8e60029f403781b8a63b7ffdb7dc1faff6ca651e
>>>  * c37dcd334c0b0a46a90cfa13b9f69e2aaa89bc09
>>>
>>> - the client doesn't have CONFIG_SUNRPC_BIND34 enabled
>>>
>>> - there is no filtering between the client and the server
>>>
>>> - delegation is disabled
>>>
>>> - the file system is mounted with the following options: 
>>> rw,nosuid,nodev,hard,intr
>>>
>>> - when the hang occurs, the server is disconnected (no connection found 
>>> in netstat -t output)
>> [..]
>>> Is there anything else I can do to get information now, before
>>> rebooting? And if it is already too late, as I suspect the real issue
>>> here is the connection loss, what would be the best setup to catch the
>>> problem after the reboot? Setting all rpcdebug flags is way too verbose
>>> for a problem occurring only once every 2-3 days...
>> Actually, I spoke too fast, the problem also occurs with files hosted on 
>> the linux server, with a different scenario:
>> - the connection is still present when the hang occurs
>> - rpc.idmapd seems to have crashed (no trace in the logs, though, as
>> verbosity was set to 0)
>>
>> Full client (rpc + nfs flags) and server (rpc + nfsd flags) logs are
>> available at http://www.zarb.org/~guillomovitch/{client,server}.log.
>>
>> From what I understand, this is a normal revalidation operation:
>> Jul  2 04:35:00 chatelet kernel: NFS: revalidating (0:15/7322625)
>> Jul  2 04:35:02 chatelet kernel: NFS: nfs_update_inode(0:15/7322625 ct=2 info=0xe)
>> Jul  2 04:35:02 chatelet kernel: NFS: (0:15/7322625) revalidation complete
>>
>> Starting from 05:13, revalidation on this inode no longer completes,
>> which seems to imply the problem occurred between 04:35:02 and 05:13:47:
>> Jul  2 05:13:47 chatelet kernel: NFS: revalidating (0:15/7322625)
>> Jul  2 08:26:13 chatelet kernel: NFS: revalidating (0:15/7322625)
>>
>> But I don't find anything really suspicious :(
> 
> If the idmapper had crashed, then that is quite sufficient to explain
> your trouble. The above GETATTR calls rely on the idmapper to translate
> NFSv4 names into uids and gids, and will hang if it is not responding.
If the idmapper is restarted, are those calls supposed to eventually
succeed, or will they keep failing? And can we have a default uid/gid
on the NFS server, in case this call fails?
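
To narrow down when the hang starts without reading the full debug output by eye, the "revalidating" / "revalidation complete" pairs quoted above can be matched up mechanically. The following is only a minimal sketch: it assumes the syslog line format shown in the excerpts above, with each kernel message on a single line, and the function name pending_revalidations is purely illustrative.

```python
import re

# Patterns modelled on the kernel lines quoted above, e.g.
#   Jul  2 04:35:00 chatelet kernel: NFS: revalidating (0:15/7322625)
#   Jul  2 04:35:02 chatelet kernel: NFS: (0:15/7322625) revalidation complete
START = re.compile(
    r"^(\w{3}\s+\d+ [\d:]{8}) \S+ kernel: NFS: revalidating \((\S+)\)")
DONE = re.compile(
    r"^(\w{3}\s+\d+ [\d:]{8}) \S+ kernel: NFS: \((\S+)\) revalidation complete")


def pending_revalidations(lines):
    """Map each inode to the timestamp of its first unanswered revalidation.

    An inode is reported only if a "revalidating" line was seen for it and
    no "revalidation complete" line followed before the end of the input.
    """
    pending = {}
    for line in lines:
        m = START.match(line)
        if m:
            # Keep the earliest start time of the current unanswered run.
            pending.setdefault(m.group(2), m.group(1))
            continue
        m = DONE.match(line)
        if m:
            # The revalidation finished, so the inode is no longer pending.
            pending.pop(m.group(2), None)
    return pending
```

Assuming a local copy of the client log saved as client.log (a hypothetical file name), it could be used like this:

    with open("client.log") as f:
        for inode, since in pending_revalidations(f).items():
            print(inode, "pending since", since)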


-- 
Guillaume Rousse
Moyens Informatiques - INRIA Futurs
Tel: 01 69 35 69 62

