nfs client hanging

Guillaume Rousse Guillaume.Rousse at inria.fr
Fri Jun 20 08:03:54 EDT 2008


Trond Myklebust wrote:
> On Tue, 2008-06-17 at 14:14 +0200, Guillaume Rousse wrote:
>> Jun 17 12:05:46 chatelet kernel: RPC:       __rpc_wake_up_task done
>> Jun 17 12:05:46 chatelet kernel: RPC:  2605 sync task resuming
>> Jun 17 12:05:46 chatelet kernel: RPC:  2605 return -512, status -512
>> Jun 17 12:05:46 chatelet kernel: RPC:  2605 release task
>> Jun 17 12:05:46 chatelet kernel: RPC:  2605 releasing UNIX cred e4ce8600
>> Jun 17 12:05:46 chatelet kernel: RPC:       rpc_release_client(df59ad00)
>> Jun 17 12:05:46 chatelet kernel: nfs_revalidate_inode: (0:16/100) getattr failed, error=-512
>> Jun 17 12:05:46 chatelet kernel: NFS: revalidating (0:16/100)
>> Jun 17 12:05:46 chatelet kernel: RPC:       new task initialized, procpid 722
>> Jun 17 12:05:46 chatelet kernel: RPC:       allocated task f151fe00
>> Jun 17 12:05:46 chatelet kernel: RPC:     0 looking up UNIX cred
>> Jun 17 12:05:46 chatelet kernel: RPC:  2606 __rpc_execute flags=0x80
>> Jun 17 12:05:46 chatelet kernel: RPC:  2606 call_start nfs4 proc 1 (sync)
>> Jun 17 12:05:46 chatelet kernel: RPC:  2606 call_reserve (status 0)
>> Jun 17 12:05:46 chatelet kernel: RPC:       waiting for request slot
>> Jun 17 12:05:46 chatelet kernel: RPC:  2605 freeing task
>> Jun 17 12:05:46 chatelet kernel: RPC:  2606 sleep_on(queue "xprt_backlog" time 7869098)
>> Jun 17 12:05:46 chatelet kernel: RPC:  2606 added to queue f7d28964 "xprt_backlog"
>> Jun 17 12:05:46 chatelet kernel: RPC:  2606 sync task going to sleep
>>
>> I'm not an expert, but it seems that the 'nfs_revalidate_inode: (0:16/100)
>> getattr failed, error=-512' line, followed by 'added to queue f7d28964
>> "xprt_backlog"', implies that something went wrong and the request was
>> put in a queue to be retried later, which would explain the hang.
> 
> No. It is quite correct: look at the id of the tasks (the first number),
> which clearly shows that these are two different RPC calls.
> 
> However, the fact that everything is being queued on the xprt_backlog
> means that there is heavy congestion and that all the RPC slots are
> full. It supports the suspicion that the server is failing to respond to
> the client, and so the client requests are all backed up.
> 
> Does 'netstat -t' show the client as still being connected to the
> server?
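Trond's point about task IDs can be checked mechanically. The sketch below assumes the syslog format shown in the quoted log; the regular expression and the function names `group_by_task` and `backlogged_tasks` are illustrative, not part of any kernel tool. It groups `RPC:` debug lines by the task number that follows `RPC:` and lists the tasks that were put to sleep on the `xprt_backlog` queue, i.e. tasks that found no free request slot.

```python
import re
from collections import defaultdict

# Matches RPC debug lines that carry a task id, e.g.
# "... kernel: RPC:  2606 call_reserve (status 0)".
# Lines without a task id ("RPC:       waiting for request slot") do not match.
TASK_LINE = re.compile(r"RPC:\s+(\d+)\s+(.*)$")


def group_by_task(lines):
    """Return {task_id: [messages]} for every RPC debug line with a task id."""
    tasks = defaultdict(list)
    for line in lines:
        m = TASK_LINE.search(line)
        if m:
            tasks[int(m.group(1))].append(m.group(2))
    return dict(tasks)


def backlogged_tasks(tasks):
    """Sorted ids of tasks that were queued on xprt_backlog (no free RPC slot)."""
    return sorted(
        tid for tid, msgs in tasks.items()
        if any("xprt_backlog" in msg for msg in msgs)
    )
```

Applied to the quoted excerpt (with the wrapped log lines joined back together), this would show task 2605 returning -512 and being freed, while only task 2606 waits on `xprt_backlog`: two separate RPC calls, as Trond says.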
I just had the problem again, and the client is disconnected from the 
server.
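The information that `netstat -t` prints for IPv4 is also exposed by Linux in `/proc/net/tcp`, so the same check can be scripted. A minimal sketch, assuming the server uses the standard NFS port 2049 and the mount is over IPv4 (IPv6 sockets appear in `/proc/net/tcp6`, which this does not read); `nfs_connections` and `connected_to_nfs_server` are hypothetical helper names.

```python
import socket
import struct

NFS_PORT = 2049          # standard NFS port (assumption: the mount uses it)
TCP_ESTABLISHED = "01"   # state code for ESTABLISHED in /proc/net/tcp


def nfs_connections(proc_net_tcp_text, port=NFS_PORT):
    """Return (remote_ip, state) for each IPv4 TCP socket whose remote port is `port`."""
    result = []
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        rem_ip_hex, rem_port_hex = fields[2].split(":")
        if int(rem_port_hex, 16) != port:
            continue
        # The kernel prints the network-order address as a native-endian
        # integer, so packing it back in native order recovers the bytes.
        ip = socket.inet_ntoa(struct.pack("=I", int(rem_ip_hex, 16)))
        result.append((ip, fields[3]))
    return result


def connected_to_nfs_server(proc_net_tcp_text, port=NFS_PORT):
    """True if at least one socket to `port` is in the ESTABLISHED state."""
    return any(st == TCP_ESTABLISHED
               for _, st in nfs_connections(proc_net_tcp_text, port))
```

Called as `connected_to_nfs_server(open("/proc/net/tcp").read())`, a result of `False` (no socket to port 2049, or none in state `01`) corresponds to the disconnected client described above.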

And I keep getting those "RPC: failed to contact local rpcbind server 
(errno 5)" error messages in the logs. I'm switching to regular portmap 
instead of rpcbind to see if it helps.
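The two error numbers in this thread can be decoded from the Linux errno definitions. A small sketch, assuming a Linux host: `errno 5` in the rpcbind message is the ordinary `EIO`, while the `-512` returned by task 2605 is `ERESTARTSYS`, a kernel-internal code (from `include/linux/errno.h`, never meant to reach user space) that generally means a sleeping call was interrupted by a signal. Python's `errno` module only knows the user-visible codes, so the internal ones are listed by hand; `describe_errno` is a hypothetical helper name.

```python
import errno

# Kernel-internal "restart" codes from include/linux/errno.h. They are not
# exported to user space, so Python's errno module does not define them.
KERNEL_INTERNAL = {
    512: "ERESTARTSYS",
    513: "ERESTARTNOINTR",
    514: "ERESTARTNOHAND",
    516: "ERESTART_RESTARTBLOCK",
}


def describe_errno(code):
    """Name an errno value as printed in kernel logs (often negated, e.g. -512)."""
    n = abs(code)
    if n in KERNEL_INTERNAL:
        return KERNEL_INTERNAL[n]
    return errno.errorcode.get(n, f"unknown ({n})")
```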

-- 
Guillaume Rousse
Moyens Informatiques - INRIA Futurs
Tel: 01 69 35 69 62

