[pnfs] [PATCH 9/9] pnfs client prevent race in sequence slot

Trond Myklebust trond.myklebust at fys.uio.no
Wed Sep 26 19:04:58 EDT 2007


On Wed, 2007-09-26 at 17:26 -0400, William A. (Andy) Adamson wrote:
> Here is what happens when Connectathon lock test #12 fails.
> 
> The client sends a SEQUENCE:PUTFH:GETATTR compound (nfs4_proc_getattr)
> followed by a SEQUENCE:PUTFH:LOCKU compound (nfs4_proc_unlck).
> 
> Wireshark shows the nfs4_proc_getattr as succeeding, and the
> nfs4_proc_unlck as failing with BAD_SESSION. 
> 
> The nfs4_proc_getattr RPC task catches a signal (from the test) and
> returns -ERESTARTSYS without having called decode. As a result, the
> nfs4_proc_getattr local variable nfs41_sequence_res.status is never
> set, so whatever garbage is in the uninitialized
> nfs41_sequence_res.status is what gets passed to nfs41_sequence_done.
> That value happens to be non-zero, and nfs41_proc_sequence_done()
> interprets it as an error. The sequence number for the slot is
> therefore decremented, and the next RPC (nfs4_proc_unlck) sends the
> same sequence number and gets an error (BAD_SESSION on our server).
> 
> The client does not know whether the RPC succeeded, because it never
> decoded the reply. But to process the SEQUENCE operation of the
> nfs4_proc_getattr call without getting out of sync with the server,
> the client needs to know the status the server returned for that
> SEQUENCE operation.
> 
> Suggestions?
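A minimal C sketch of the failure mode and one way to make it at least detectable. Everything here is hypothetical (client_slot, seq_res_init, slot_begin, sequence_done and the SEQ_NOT_DECODED sentinel are illustrative names, not the Linux client's), and it assumes the client bumps the slot seqid before sending and rolls it back when the SEQUENCE status is non-zero, as described above. Initializing the status before the call means an interrupted RPC leaves a well-defined "not decoded" value instead of stack garbage. That does not tell the client what the server did, so the slot is only flagged as needing resynchronization.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model of one client session slot; not the Linux code. */

/* Sentinel meaning "no reply was decoded" (assumption: any value the
 * decoder can never produce would do). */
#define SEQ_NOT_DECODED (-EINPROGRESS)

struct seq_res {
    int status;            /* SEQUENCE status, normally set by the decoder */
};

struct client_slot {
    uint32_t seqid;        /* seqid used by the most recent request */
    bool needs_resync;     /* outcome of that request is unknown */
};

/* Call before sending, so an interrupted RPC never leaves stack garbage. */
static void seq_res_init(struct seq_res *res)
{
    res->status = SEQ_NOT_DECODED;
}

/* Pick the seqid for a new request on this slot. */
static uint32_t slot_begin(struct client_slot *slot)
{
    /* A slot whose last outcome is unknown must be resynchronized first. */
    assert(!slot->needs_resync);
    return ++slot->seqid;
}

static void sequence_done(struct client_slot *slot, const struct seq_res *res)
{
    if (res->status == SEQ_NOT_DECODED) {
        /* Interrupted before decode: don't guess, just flag the slot. */
        slot->needs_resync = true;
    } else if (res->status != 0) {
        /* Decoded SEQUENCE error: roll back, as described in the report. */
        slot->seqid--;
    }
    /* status == 0: the server accepted this seqid; keep it. */
}
```

With garbage in the status field, the interrupted GETATTR would take the rollback branch and the LOCKU would reuse the seqid the server had already accepted; with the sentinel, the slot keeps its seqid and is marked for recovery instead.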

I suggest bringing this question up on the IETF channel. I think it is
of interest to more than just Linux...

That said, how about the following suggestion:

If we have to interrupt an RPC call, then we immediately fire off an
asynchronous RPC call containing a single SEQUENCE operation that uses
the _same_ sa_sequenceid as the cancelled synchronous RPC call (and
drops all the other operations).

AFAICS, the server should then reply with either NFS4_OK or
NFS4ERR_SEQ_FALSE_RETRY. In either case, we should then be guaranteed
that the sa_sequenceid has been bumped by exactly 1, irrespective of
whether or not the server processed the synchronous RPC call.
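A toy C model of this recovery for a single slot, as a sketch under stated assumptions. The server check is a simplified reading of the session slot rules: a seqid one past the cached one is a new request, an equal seqid is a retransmission, and anything else is misordered. The `req` value is a hypothetical stand-in for comparing compound contents, and a real server might replay its cached reply rather than return NFS4ERR_SEQ_FALSE_RETRY. All names (server_slot, server_sequence, resync_slot) are hypothetical. The point it illustrates: whether or not the cancelled call reached the server, the lone SEQUENCE leaves the server slot at the cancelled call's seqid, so the next request can safely use seqid + 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for NFS4_OK, NFS4ERR_SEQ_FALSE_RETRY and
 * NFS4ERR_SEQ_MISORDERED. */
enum seq_status { SEQ_OK, SEQ_FALSE_RETRY, SEQ_MISORDERED };

/* Hypothetical server-side state for one slot. */
struct server_slot {
    uint32_t seqid;        /* last seqid accepted on this slot */
    uint32_t cached_req;   /* id of the compound cached for that seqid */
};

static enum seq_status server_sequence(struct server_slot *s,
                                       uint32_t seqid, uint32_t req)
{
    if (seqid == s->seqid + 1) {        /* new request: advance the slot */
        s->seqid = seqid;
        s->cached_req = req;
        return SEQ_OK;
    }
    if (seqid == s->seqid)              /* retransmission of cached seqid */
        return req == s->cached_req ? SEQ_OK : SEQ_FALSE_RETRY;
    return SEQ_MISORDERED;
}

/* After cancelling a call sent with 'seqid', send a lone SEQUENCE
 * (identified by 'lone_req') reusing that same seqid. */
static bool resync_slot(struct server_slot *s, uint32_t seqid,
                        uint32_t lone_req)
{
    enum seq_status st = server_sequence(s, seqid, lone_req);
    if (st == SEQ_MISORDERED)
        return false;
    /* OK (server never saw the call) or FALSE_RETRY (it did): either
     * way the server slot now sits at the cancelled call's seqid. */
    assert(s->seqid == seqid);
    return true;
}
```

In the test #12 scenario the GETATTR corresponds to the first case (the server processed it), so the lone SEQUENCE gets NFS4ERR_SEQ_FALSE_RETRY and the LOCKU then goes out with the next seqid instead of reusing the old one.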

Cheers
   Trond


