state recovery failed on NFSv4 server with error 2

Thomas Garner thomas536 at gmail.com
Fri Jun 27 21:21:37 EDT 2008


I know this was a long time ago, but I've been running this patch
against 2.6.24 since then, and I don't think I've seen the problem
resurface.  I do still get lockups, but I have no reason to believe
they are NFS-related: I don't have any debug info for them and am
currently chalking them up to unknown hardware faults.

Thomas

On Sun, Apr 6, 2008 at 3:10 PM, Thomas Garner <thomas536 at gmail.com> wrote:
> I applied the patch last night and recompiled.  We'll see if the
> problem resurfaces.
>
> Thomas
>
> On Sat, Apr 5, 2008 at 4:21 PM, Trond Myklebust
> <trond.myklebust at fys.uio.no> wrote:
>>
>>
>>  On Fri, 2008-04-04 at 14:16 -0400, Thomas Garner wrote:
>>  > Has anyone had any time to look into this?
>>  >
>>  > Thanks!
>>  > Thomas
>>  >
>>  > On Mon, Mar 31, 2008 at 2:00 AM, Thomas Garner <thomas536 at gmail.com> wrote:
>>  > > Ok, this finally re-occurred.
>>  > >
>>  > >
>>  > >  >  Does 'ps -efww' show a thread with a name that starts with the
>>  > >  >  IP-address of the server?
>>  > >
>>  > >  This time, no, but I have seen a process named like that before.
>>  > >
>>  > >
>>  > >  >  Do the syslogs, or does 'dmesg -s 90000' show an Oops involving that
>>  > >  >  process?
>>  > >
>>  > >  No oopses seen in dmesg, /var/log/messages, or /var/log/kern.log.
>>  > >
>>  > >  No lockd running.
>>  > >
>>  > >  Log files: first, a tcpdump capture from the client, taken with the command line:
>>  > >
>>  > >  tcpdump -s 0 -w ~/nfs_dump_argento2 -x -i eth1
>>  > >
>>  > >  http://s120158928.onlinehome.us/nfs_dump_argento2.001.bz2
>>  > >
>>  > >  And then a section of /var/log/messages, which should have been
>>  > >  captured with both of these debug flags turned on:
>>  > >
>>  > >
>>  > >  rpcdebug -m rpc -s all
>>  > >  rpcdebug -m nfs -s all
>>  > >
>>  > >  http://s120158928.onlinehome.us/messages.bz2
>>  > >
>>  > >  Again, I really appreciate you guys looking into this.
>>
>>  The only explanation I can find for this is that the calls to
>>  nfs4_get_renew_cred() and nfs4_get_setclientid_cred() are failing. That
>>  in turn would mean that we've called nfs4_drop_state_owner() in a
>>  situation where we shouldn't...
>>
>>  Could you see if the attached patch helps?
>>
>>  Cheers
>>   Trond
>>
>>
>>
>>
>>
>> ---------- Forwarded message ----------
>> From: Trond Myklebust <Trond.Myklebust at netapp.com>
>> To:
>> Date: Sat, 5 Apr 2008 15:54:17 -0400
>> Subject: No Subject
>> There should be no need to invalidate a perfectly good state owner just
>>  because of a stale filehandle. Doing so can cause the state recovery code
>>  to break, since nfs4_get_renew_cred() and nfs4_get_setclientid_cred() rely
>>  on finding active state owners.
>>
>>  Signed-off-by: Trond Myklebust <Trond.Myklebust at netapp.com>
>>  ---
>>
>>   fs/nfs/nfs4proc.c |    5 +----
>>   1 files changed, 1 insertions(+), 4 deletions(-)
>>
>>  diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>>  index f38d057..424aa20 100644
>>  --- a/fs/nfs/nfs4proc.c
>>  +++ b/fs/nfs/nfs4proc.c
>>  @@ -982,11 +982,8 @@ static int _nfs4_open_expired(struct nfs_open_context *ctx, struct nfs4_state *s
>>         if (IS_ERR(opendata))
>>                 return PTR_ERR(opendata);
>>         ret = nfs4_open_recover(opendata, state);
>>  -       if (ret == -ESTALE) {
>>  -               /* Invalidate the state owner so we don't ever use it again */
>>  -               nfs4_drop_state_owner(state->owner);
>>  +       if (ret == -ESTALE)
>>                 d_drop(ctx->path.dentry);
>>  -       }
>>         nfs4_opendata_put(opendata);
>>         return ret;
>>   }
>>
>>
>
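
A note on the "error 2" in the subject line: on Linux, errno 2 is ENOENT.
The sketch below is a toy userspace model in C, not the kernel code, of the
failure Trond describes above: if the only state owner has been dropped
after an ESTALE, the credential lookup finds nothing and recovery fails.
All names (toy_client, toy_owner, toy_get_cred, toy_recover,
toy_open_expired) are made up for illustration, and the idea that the
kernel reports ENOENT on this path is inferred from the error number, not
confirmed in the thread.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy model only: these types and functions are hypothetical and do not
 * exist in the kernel. They mimic the shape of the reasoning in the patch. */
#define TOY_MAX_OWNERS 4

struct toy_owner {
    int active;   /* 1 while the owner is still attached to the client */
    int cred;     /* stand-in for the credential held by this owner */
};

struct toy_client {
    struct toy_owner owners[TOY_MAX_OWNERS];
    size_t nowners;
};

/* Return the credential of the first active owner, or NULL if none is left.
 * This plays the role of the "find an active state owner" lookups. */
const int *toy_get_cred(const struct toy_client *clp)
{
    assert(clp != NULL);
    for (size_t i = 0; i < clp->nowners; i++) {
        if (clp->owners[i].active)
            return &clp->owners[i].cred;
    }
    return NULL;
}

/* State recovery needs a credential; without one it fails with -ENOENT
 * (assumed here to match the "error 2" in the log message). */
int toy_recover(const struct toy_client *clp)
{
    return toy_get_cred(clp) != NULL ? 0 : -ENOENT;
}

/* React to the result of an open recovery attempt for owner idx.
 * drop_owner_on_estale = 1 models the old behaviour (owner invalidated),
 * drop_owner_on_estale = 0 models the patched behaviour (owner kept). */
void toy_open_expired(struct toy_client *clp, size_t idx, int ret,
                      int drop_owner_on_estale)
{
    assert(clp != NULL && idx < clp->nowners);
    if (ret == -ESTALE && drop_owner_on_estale)
        clp->owners[idx].active = 0;
}
```

With a single owner, calling toy_open_expired(&c, 0, -ESTALE, 1) and then
toy_recover(&c) yields -ENOENT, while passing 0 as the last argument (the
patched behaviour) leaves the owner in place and toy_recover(&c) returns 0.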

