[NFS] Missing handling for NFS4ERR_OLD_STATEID in nfs4_handle_exception?

Jeff Layton jlayton at poochiereds.net
Thu Apr 12 07:59:40 EDT 2007


Frank Filz wrote:
> I'm looking at the following code, and wondering if something is missing
> in the handling of NFS4ERR_OLD_STATEID. The result is that if this error
> occurs, nfs4_map_errors() will print:
> 
> 	nfs4_map_errors could not handle NFSv4 error 10024
> 
> It also looks like the handling of NFS4ERR_DELAY etc. may be wrong,
> since if nfs4_delay() returns without error, it falls through to the
> handling of NFS4ERR_OLD_STATEID.
> 
> Based on the code in nfs4_async_handle_error(), it looks like it might
> be sufficient to set ret = 0 in addition to exception->retry = 1.
> 
> Thanks for any thoughts
> 
> Frank Filz
> 
> /* This is the error handling routine for processes that are allowed
>  * to sleep.
>  */
> int nfs4_handle_exception(const struct nfs_server *server, int errorcode,
> 		struct nfs4_exception *exception)
> {
> 	struct nfs_client *clp = server->nfs_client;
> 	int ret = errorcode;
> 
> 	exception->retry = 0;
> 	switch(errorcode) {
> 		case 0:
> 			return 0;
> 		case -NFS4ERR_STALE_CLIENTID:
> 		case -NFS4ERR_STALE_STATEID:
> 		case -NFS4ERR_EXPIRED:
> 			nfs4_schedule_state_recovery(clp);
> 			ret = nfs4_wait_clnt_recover(server->client, clp);
> 			if (ret == 0)
> 				exception->retry = 1;
> 			break;
> 		case -NFS4ERR_FILE_OPEN:
> 		case -NFS4ERR_GRACE:
> 		case -NFS4ERR_DELAY:
> 			ret = nfs4_delay(server->client, &exception->timeout);
> 			if (ret != 0)
> 				break;
> 		case -NFS4ERR_OLD_STATEID:
> 			exception->retry = 1;
> 	}
> 	/* We failed to handle the error */
> 	return nfs4_map_errors(ret);
> }
> 
> 

This looks pretty much correct to me as-is. If we set ret = 0 for 
-NFS4ERR_OLD_STATEID, then the caller never gets an error code back. That 
assumes every caller of nfs4_handle_exception loops on exception->retry, 
and I'm not sure that's a safe assumption. A better idea *might* be to 
change nfs4_map_errors so that it doesn't print the warning for certain 
errors below -1000, but still returns an error.
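To make the trade-off concrete, here is a minimal user-space model (not the kernel code). Everything in it is hypothetical: exc_sketch stands in for struct nfs4_exception, handle_old_stateid_sketch() keeps only the OLD_STATEID arm with a zero_ret flag modelling the proposed ret = 0, and map_errors_sketch() is the quieter mapping, assuming it still returns -EIO for errors below -1000.

```c
#include <stdio.h>
#include <errno.h>

/* NFSv4 status code from RFC 3530; the client passes it around negated. */
#define NFS4ERR_OLD_STATEID 10024

/* Hypothetical stand-in for struct nfs4_exception: only the retry flag. */
struct exc_sketch {
	int retry;
};

/*
 * Hypothetical quieter mapping: any NFSv4-specific error (< -1000) still
 * becomes a generic errno (-EIO is an assumed choice), but only codes we
 * did not expect produce a warning.
 */
static int map_errors_sketch(int err)
{
	if (err < -1000) {
		if (err != -NFS4ERR_OLD_STATEID)
			fprintf(stderr, "could not handle NFSv4 error %d\n", -err);
		return -EIO;
	}
	return err;
}

/* Reduced model of the OLD_STATEID arm; zero_ret models "ret = 0". */
static int handle_old_stateid_sketch(int errorcode, struct exc_sketch *exc,
				     int zero_ret)
{
	int ret = errorcode;

	exc->retry = 0;
	if (errorcode == 0)
		return 0;
	if (errorcode == -NFS4ERR_OLD_STATEID) {
		exc->retry = 1;
		if (zero_ret)
			ret = 0;
	}
	return map_errors_sketch(ret);
}

/* A caller that does NOT loop on retry: it returns the first result. */
static int one_shot_caller(int server_status, int zero_ret)
{
	struct exc_sketch exc = { 0 };

	return handle_old_stateid_sketch(server_status, &exc, zero_ret);
}

/*
 * A caller that loops on retry. statuses[] is a hypothetical script of
 * what the server returns on each attempt (n must be at least 1).
 */
static int looping_caller(const int *statuses, int n, int zero_ret)
{
	struct exc_sketch exc = { 0 };
	int i = 0;
	int err;

	do {
		err = handle_old_stateid_sketch(statuses[i++], &exc, zero_ret);
	} while (exc.retry && i < n);
	return err;
}
```

With zero_ret set, one_shot_caller() gets 0 back for a request the server rejected, while looping_caller() ends up with the same result either way; that difference is why it's risky to assume every caller loops.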

That sounds like addressing the symptom rather than the real problem, 
however. The real question is why you're getting OLD_STATEID errors back 
from the server here. There can be legitimate reasons, but these errors 
ought to be fairly rare. I've generally seen them only when processes are 
signalled while RPC requests are in flight.

Also, when we hit -NFS4ERR_DELAY, we want to retry, but only if the delay 
itself didn't hit an error. As far as I can tell, nfs4_delay returns an 
error only if the process was signalled while in nfs4_delay, and in that 
case we want to pass -ERESTARTSYS back up the call chain (and not retry). 
When the delay succeeds, falling through into the OLD_STATEID case only 
sets exception->retry = 1, which is what we want. So I think that's also 
correct as-is.
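A reduced model of the DELAY arm shows why the fallthrough is harmless. It is a user-space sketch, not the kernel function: delay_sketch() is a hypothetical stand-in for nfs4_delay() that takes a "signalled" flag instead of sleeping, the timeout doubling is an assumption for illustration, and ERESTARTSYS is defined locally because it is a kernel-internal errno.

```c
/* Kernel-internal errno, not exported to user space (linux/errno.h). */
#ifndef ERESTARTSYS
#define ERESTARTSYS 512
#endif

/* NFSv4 status codes from RFC 3530; the client uses the negated values. */
#define NFS4ERR_DELAY       10008
#define NFS4ERR_OLD_STATEID 10024

/* Hypothetical stand-in for struct nfs4_exception. */
struct delay_exc {
	int retry;
	long timeout;
};

/* Hypothetical stand-in for nfs4_delay(): no real sleep, just a flag. */
static int delay_sketch(long *timeout, int signalled)
{
	if (signalled)
		return -ERESTARTSYS;
	*timeout *= 2;	/* assumed backoff, illustration only */
	return 0;
}

/* Reduced copy of the DELAY and OLD_STATEID arms, keeping the fallthrough. */
static int handle_delay_sketch(int errorcode, struct delay_exc *exc,
			       int signalled)
{
	int ret = errorcode;

	exc->retry = 0;
	switch (errorcode) {
	case 0:
		return 0;
	case -NFS4ERR_DELAY:
		ret = delay_sketch(&exc->timeout, signalled);
		if (ret != 0)
			break;	/* signalled: no retry, pass -ERESTARTSYS up */
		/* fall through: the delay finished, so ask the caller to retry */
	case -NFS4ERR_OLD_STATEID:
		exc->retry = 1;
	}
	return ret;	/* the quoted code passes this through nfs4_map_errors() */
}
```

A successful delay leaves ret = 0 with retry = 1, so a looping caller simply tries again; a signal leaves retry = 0 and hands -ERESTARTSYS back.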

-- Jeff



More information about the NFSv4 mailing list