state recovery failed on NFSv4 server with error 2
Thomas Garner
thomas536 at gmail.com
Fri Jun 27 21:21:37 EDT 2008
I know this was ages ago, but I've been running this patch against
2.6.24, and I don't think I've seen the problem resurface. I do still
get lockups, but I have no reason to believe they're NFS-related: I
don't have any debug info for them, and for now I'm chalking them up
to unknown hardware faults.
Thomas
On Sun, Apr 6, 2008 at 3:10 PM, Thomas Garner <thomas536 at gmail.com> wrote:
> I applied the patch last night and recompiled. We'll see if the
> problem resurfaces.
>
> Thomas
>
> On Sat, Apr 5, 2008 at 4:21 PM, Trond Myklebust
> <trond.myklebust at fys.uio.no> wrote:
>>
>>
>> On Fri, 2008-04-04 at 14:16 -0400, Thomas Garner wrote:
>> > Has anyone had any time to look into this?
>> >
>> > Thanks!
>> > Thomas
>> >
>> > On Mon, Mar 31, 2008 at 2:00 AM, Thomas Garner <thomas536 at gmail.com> wrote:
>> > > Ok, this finally re-occurred.
>> > >
>> > >
>> > > > Does 'ps -efww' show a thread with a name that starts with the
>> > > > IP-address of the server?
>> > >
>> > > This time, no, but I have seen a process named like that before.
>> > >
>> > >
>> > > > Do the syslogs, or does 'dmesg -s 90000' show an Oops involving that
>> > > > process?
>> > >
>> > > No oopses seen in dmesg, /var/log/messages, or /var/log/kern.log.
>> > >
>> > > No lockd running.
>> > >
>> > > Log files: first, a tcpdump capture from the client, taken with:
>> > >
>> > > tcpdump -s 0 -w ~/nfs_dump_argento2 -x -i eth1
>> > >
>> > > http://s120158928.onlinehome.us/nfs_dump_argento2.001.bz2
>> > >
>> > > And then a section of /var/log/messages captured with both of these
>> > > debug settings turned on:
>> > >
>> > >
>> > > rpcdebug -m rpc -s all
>> > > rpcdebug -m nfs -s all
>> > >
>> > > http://s120158928.onlinehome.us/messages.bz2
>> > >
>> > > Again, I really appreciate you guys looking into this.
>>
>> The only explanation I can find for this is if the calls to
>> nfs4_get_renew_cred() and nfs4_get_setclientid_cred() are failing. That
>> again would mean that we've called nfs4_drop_state_owner() in a
>> situation where we shouldn't...
>>
>> Could you see if the attached patch helps?
>>
>> Cheers
>> Trond
>>
>> ---------- Forwarded message ----------
>> From: Trond Myklebust <Trond.Myklebust at netapp.com>
>> To:
>> Date: Sat, 5 Apr 2008 15:54:17 -0400
>> Subject: No Subject
>> There should be no need to invalidate a perfectly good state owner just
>> because of a stale filehandle. Doing so can cause the state recovery code
>> to break, since nfs4_get_renew_cred() and nfs4_get_setclientid_cred() rely
>> on finding active state owners.
>>
>> Signed-off-by: Trond Myklebust <Trond.Myklebust at netapp.com>
>> ---
>>
>> fs/nfs/nfs4proc.c | 5 +----
>> 1 files changed, 1 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>> index f38d057..424aa20 100644
>> --- a/fs/nfs/nfs4proc.c
>> +++ b/fs/nfs/nfs4proc.c
>> @@ -982,11 +982,8 @@ static int _nfs4_open_expired(struct nfs_open_context *ctx, struct nfs4_state *s
>>  	if (IS_ERR(opendata))
>>  		return PTR_ERR(opendata);
>>  	ret = nfs4_open_recover(opendata, state);
>> -	if (ret == -ESTALE) {
>> -		/* Invalidate the state owner so we don't ever use it again */
>> -		nfs4_drop_state_owner(state->owner);
>> +	if (ret == -ESTALE)
>>  		d_drop(ctx->path.dentry);
>> -	}
>>  	nfs4_opendata_put(opendata);
>>  	return ret;
>>  }
>>
>>
>
More information about the NFSv4 mailing list