[pnfs] [PATCH] nfs41: Correctly use new session in exception handling
William A. (Andy) Adamson
androsadamson at gmail.com
Wed May 28 08:39:01 EDT 2008
cool - good work! i'll turn on CONFIG_DEBUG_LIST in my kernels.....
-->Andy
On Wed, May 28, 2008 at 4:07 AM, Benny Halevy <bhalevy at panasas.com> wrote:
> Andy,
> I did hit a related problem with latest code on the
> same code path (with CONFIG_DEBUG_LIST=y).
>
> This is caused by lack of reference on clp->cl_cb_xprt that's
> passed down from nfsd4_create_session to do_probe_callback.
> The problem is aggravated by fast timing since the request that
> carried the create_session rpc has to be destroyed before
> do_probe_callback uses (after free) the svc_xprt and that's
> probably why this happens rarely (if at all) with debug
> printouts.
>
> I'll send a patch to fix this in reply to this message.
>
> Benny
>
> On May. 28, 2008, 0:27 +0300, "Halevy, Benny" <bhalevy at panasas.com> wrote:
> > Thanks for trying. I'll try to reproduce tomorrow with the latest code
> > on my setup.
> >
> > Benny
> >
> > ________________________________
> >
> > From: androsadamson at gmail.com on behalf of William A. (Andy) Adamson
> > Sent: Tue 2008-05-27 21:10
> > To: Halevy, Benny
> > Cc: andros at netapp.com; pnfs at linux-nfs.org
> > Subject: Re: [pnfs] [PATCH] nfs41: Correctly use new session in exception
> handling
> >
> >
> > I turned all debug off on both client and server, still was unable to
> reproduce the problem, got the same results as below.... :(
> >
> > -->Andy
> >
> >
> > On Tue, May 27, 2008 at 12:53 PM, Benny Halevy <bhalevy at panasas.com>
> wrote:
> >
> >
> > I forgot to mention that I couldn't reproduce this with debug
> printing on...
> > Did you try with nfsd_debug by any chance?
> > Benny
> >
> >
> > William A. (Andy) Adamson wrote:
> > > Hi Benny
> > >
> > > I can't reproduce the problem. Using todays tree, origin/nfs41
> branch
> > > on client and server, I attempt to mount a non-existing path: the
> > > server does not export a /unix4.
> > >
> > > mount -t nfs4 192.168.1.105:/unix4 /mnt
> > >
> > > What I see is a successful EXCHANGE_ID, CREATE_SESSION (with a
> > > successful callback channel established),
> SEQ/PUTROOTFH/GETFH/GETATTR,
> > > a couple of more successful SEQ/PUTFH/GETATTRS to establish the
> root,
> > > then a PUTFH/LOOKUP/GETATTR for /unix4 that fails, followed by a
> > > DESTROY_SESSION.
> > >
> > > Are you doing something different?
> > >
> > > Here is my /etc/exports file:
> > >
> > > /var/lib/nfs/v4root
> *(fsid=0,rw,insecure,no_root_squash,nohide,crossmnt)
> > > /var/lib/nfs/v4root/spnfs
> *(rw,insecure,no_root_squash,nohide,crossmnt)
> > > /var/lib/nfs/v4root/unix
> *(rw,insecure,no_root_squash,nohide,crossmnt)
> > >
> > > with
> > > /export on /var/lib/nfs/v4root/spnfs type none (rw,bind)
> > > /unix on /var/lib/nfs/v4root/unix type none (rw,bind)
> > >
> > >
> > > -->Andy
> > >
> > > On Sun, May 25, 2008 at 1:19 PM, Benny Halevy <
> bhalevy at panasas.com
> >
> > > <mailto:bhalevy at panasas.com>> wrote:
> > >
> > > Andy, I think that this approach is better than what we had
> > > before but bare in mind that it breaks the server when
> > > the exported path is not found.
> > >
> > > I get the following oops when the client mounts a
> non-existing path:
> > > (with this patch applied)
> > >
> > > May 25 18:48:01 bh-testlin1 kernel: BUG: unable to handle
> kernel
> > > NULL pointer dereference at 00000000000000a0
> > > May 25 18:48:01 bh-testlin1 kernel: IP: [<ffffffff88263233>]
> > > :sunrpc:bc_send_request+0xc0/0x1e2
> > > May 25 18:48:01 bh-testlin1 kernel: PGD 78800067 PUD 78811067
> PMD 0
> > > May 25 18:48:01 bh-testlin1 kernel: Oops: 0000 [1] SMP
> > > May 25 18:48:01 bh-testlin1 kernel: CPU 0
> > > May 25 18:48:01 bh-testlin1 kernel: Modules linked in: nfsd
> > > auth_rpcgss exportfs autofs4 nfs lockd nfs_acl sunrpc ipv6
> > > dm_mirror dm_multipath dm_mod video output sbs sbshc battery
> ac
> > > kvm_amd kvm snd_hda_intel button snd_pcm k8temp i2c_nforce2
> > > i2c_core snd_timer pcspkr snd_page_alloc snd_hwdep sr_mod snd
> > > soundcore hwmon forcedeth serio_raw sg cdrom pata_amd
> ata_generic
> > > pata_acpi sata_nv libata sd_mod scsi_mod ext3 jbd mbcache
> ehci_hcd
> > > ohci_hcd uhci_hcd [last unloaded: scsi_wait_scan]
> > > May 25 18:48:01 bh-testlin1 kernel: Pid: 3271, comm:
> nfs4_cb_probe
> > > Not tainted 2.6.25-nfs41 #39
> > > May 25 18:48:01 bh-testlin1 kernel: RIP:
> 0010:[<ffffffff88263233>]
> > > [<ffffffff88263233>] :sunrpc:bc_send_request+0xc0/0x1e2
> > > May 25 18:48:01 bh-testlin1 kernel: RSP:
> 0018:ffff810078991d10
> > > EFLAGS: 00010246
> > > May 25 18:48:01 bh-testlin1 kernel: RAX: 0000000000000000
> RBX:
> > > 0000000000000074 RCX: 0000000000000005
> > > May 25 18:48:01 bh-testlin1 kernel: RDX: ffff8100788ec008
> RSI:
> > > 0000000000000057 RDI: ffff8100788ec008
> > > May 25 18:48:01 bh-testlin1 kernel: RBP: ffff8100788ec008
> R08:
> > > ffff81007880eb60 R09: ffff81007880eb20
> > > May 25 18:48:01 bh-testlin1 kernel: R10: 0000000000000000
> R11:
> > > ffffffff88267550 R12: ffff81007893e000
> > > May 25 18:48:01 bh-testlin1 kernel: R13: ffff81007893e008
> R14:
> > > 0000000000000000 R15: ffff8100798ee340
> > > May 25 18:48:01 bh-testlin1 kernel: FS:
> 00007f05577b26f0(0000)
> > > GS:ffffffff8139c000(0000) knlGS:0000000000000000
> > > May 25 18:48:01 bh-testlin1 kernel: CS: 0010 DS: 0018 ES:
> 0018
> > > CR0: 000000008005003b
> > > May 25 18:48:01 bh-testlin1 kernel: CR2: 00000000000000a0
> CR3:
> > > 0000000078810000 CR4: 00000000000006e0
> > > May 25 18:48:01 bh-testlin1 kernel: DR0: 0000000000000000
> DR1:
> > > 0000000000000000 DR2: 0000000000000000
> > > May 25 18:48:01 bh-testlin1 kernel: DR3: 0000000000000000
> DR6:
> > > 00000000ffff0ff0 DR7: 0000000000000400
> > > May 25 18:48:01 bh-testlin1 kernel: Process nfs4_cb_probe
> (pid:
> > > 3271, threadinfo ffff810078990000, task ffff810076157360)
> > > May 25 18:48:01 bh-testlin1 kernel: Stack: 00000000ffffffff
> > > ffff810079d1c258 0000000000000286 0000000000000000
> > > May 25 18:48:01 bh-testlin1 kernel: 0000000000000074
> > > ffff81007880eb20 ffff81007880e800 ffff81007893e000
> > > May 25 18:48:01 bh-testlin1 kernel: ffff81007e1f0200
> > > ffff81007893e0b8 0000000000000000 ffffffff8826197b
> > > May 25 18:48:01 bh-testlin1 kernel: Call Trace:
> > > May 25 18:48:01 bh-testlin1 kernel: [<ffffffff8826197b>] ?
> > > :sunrpc:xprt_transmit+0xd3/0x1c1
> > > May 25 18:48:01 bh-testlin1 kernel: [<ffffffff8832873c>] ?
> > > :nfsd:nfs4_xdr_enc_cb_null+0x0/0x50
> > > May 25 18:48:01 bh-testlin1 kernel: [<ffffffff8825f4ab>] ?
> > > :sunrpc:call_transmit+0x200/0x23f
> > > May 25 18:48:01 bh-testlin1 kernel: [<ffffffff88265f3f>] ?
> > > :sunrpc:__rpc_execute+0x7b/0x21a
> > > May 25 18:48:01 bh-testlin1 kernel: [<ffffffff8825fc62>] ?
> > > :sunrpc:rpc_run_task+0x4f/0x56
> > > May 25 18:48:01 bh-testlin1 kernel: [<ffffffff8825fcfe>] ?
> > > :sunrpc:rpc_call_sync+0x3e/0x5b
> > > May 25 18:48:01 bh-testlin1 kernel: [<ffffffff88328e8f>] ?
> > > :nfsd:do_probe_callback+0x250/0x2c6
> > > May 25 18:48:01 bh-testlin1 kernel: [<ffffffff88328c3f>] ?
> > > :nfsd:do_probe_callback+0x0/0x2c6
> > > May 25 18:48:01 bh-testlin1 kernel: [<ffffffff81043ddd>] ?
> > > kthread+0x47/0x76
> > > May 25 18:48:01 bh-testlin1 kernel: [<ffffffff8100cac8>] ?
> > > child_rip+0xa/0x12
> > > May 25 18:48:01 bh-testlin1 kernel: [<ffffffff81043d96>] ?
> > > kthread+0x0/0x76
> > > May 25 18:48:01 bh-testlin1 kernel: [<ffffffff8100cabe>] ?
> > > child_rip+0x0/0x12
> > > May 25 18:48:01 bh-testlin1 kernel:
> > > May 25 18:48:01 bh-testlin1 kernel:
> > > May 25 18:48:01 bh-testlin1 kernel: Code: 49 8b 5d 08 48 63
> c7 c7
> > > 44 24 14 00 80 00 00 48 39 d8 b8 00
> > > 00 00 00 0f 45 44 24 14 89 44 24 14 48 8b 6d 08 49 8b 47 10
> 48 89
> > > ef <4c> 8b a0 a0 00 00 00 e8 61 f2
> > > db f8 48 c1 e8 0c 89 ea 48 be 00
> > > May 25 18:48:01 bh-testlin1 kernel: RIP [<ffffffff88263233>]
> > > :sunrpc:bc_send_request+0xc0/0x1e2
> > > May 25 18:48:01 bh-testlin1 kernel: RSP <ffff810078991d10>
> > > May 25 18:48:01 bh-testlin1 kernel: CR2: 00000000000000a0
> > > May 25 18:48:01 bh-testlin1 kernel: ---[ end trace
> > > e76d1803ca214c10 ]---
> > >
> > > On May. 23, 2008, 21:29 +0300, andros at netapp.com
> >
> > > <mailto:andros at netapp.com> wrote:
> > > > From: Andy Adamson <andros at netapp.com <mailto:
> andros at netapp.com>>
> >
> > > >
> > > > We will need to handle several layers of session recovery
> > > depending on the
> > > > error. As a last resort, we will need to use the big
> hammer; destroy
> > > > an existing session and create a new one.
> > > >
> > > > The current code creates a new session without destroying
> the
> > > old one.
> > > > Destroy the existing session and initialize a new session
> before
> > > calling
> > > > nfs41_recover_session_sync.
> > > >
> > > > Signed-off-by: Andy Adamson<andros at netapp.com
> >
> > > <mailto:andros at netapp.com>>
> >
> > > > ---
> > > > fs/nfs/client.c | 3 ++-
> > > > fs/nfs/internal.h | 4 ++++
> > > > fs/nfs/nfs41_session_recovery.c | 25
> > > +++++++++++++++++++++++++
> > > > fs/nfs/nfs4proc.c | 4 ++--
> > > > include/linux/nfs41_session_recovery.h | 1 +
> > > > 5 files changed, 34 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/fs/nfs/client.c b/fs/nfs/client.c
> > > > index a014f32..d1161ef 100644
> > > > --- a/fs/nfs/client.c
> > > > +++ b/fs/nfs/client.c
> > > > @@ -1116,7 +1116,7 @@ error:
> > > > /*
> > > > * Allocate and initialize a session if required,
> including its
> > > backchannel.
> > > > */
> > > > -static int nfs4_init_session(struct nfs_client *clp,
> struct
> > > nfs4_session **spp,
> > > > +int nfs4_init_session(struct nfs_client *clp, struct
> > > nfs4_session **spp,
> > > > struct rpc_clnt *clnt)
> > > > {
> > > > int error = 0;
> > > > @@ -1148,6 +1148,7 @@ static int nfs4_init_session(struct
> > > nfs_client *clp, struct nfs4_session **spp,
> > > > }
> > > > return error;
> > > > }
> > > > +EXPORT_SYMBOL(nfs4_init_session);
> > > > #endif /* CONFIG_NFS_V4_1 */
> > > >
> > > > /*
> > > > diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
> > > > index 22228bc..735dbb7 100644
> > > > --- a/fs/nfs/internal.h
> > > > +++ b/fs/nfs/internal.h
> > > > @@ -66,6 +66,10 @@ struct nfs_parsed_mount_data {
> > > > };
> > > >
> > > > /* client.c */
> > > > +#ifdef CONFIG_NFS_V4_1
> > > > +extern int nfs4_init_session(struct nfs_client *, struct
> > > nfs4_session **,
> > > > + struct rpc_clnt *);
> > > > +#endif /* CONFIG_NFS_V4_1 */
> > > > extern struct rpc_program nfs_program;
> > > >
> > > > extern void nfs_put_client(struct nfs_client *);
> > > > diff --git a/fs/nfs/nfs41_session_recovery.c
> > > b/fs/nfs/nfs41_session_recovery.c
> > > > index 46ddb55..cf02704 100644
> > > > --- a/fs/nfs/nfs41_session_recovery.c
> > > > +++ b/fs/nfs/nfs41_session_recovery.c
> > > > @@ -22,6 +22,7 @@
> > > > #include <linux/nfs_fs_sb.h>
> > > > #include <linux/nfs41_session_recovery.h>
> > > > #include "nfs4_fs.h"
> > > > +#include "internal.h"
> > > >
> > > > #define NFSDBG_FACILITY NFSDBG_PROC
> > > >
> > > > @@ -191,4 +192,28 @@ int nfs41_recover_session_sync(struct
> > > rpc_clnt *clnt, struct nfs_client *clp,
> > > > }
> > > > EXPORT_SYMBOL(nfs41_recover_session_sync);
> > > >
> > > > +/*
> > > > + * nfs41_new_session()
> > > > + *
> > > > + * Big Hammer. Destroy existing session and create a new
> session
> > > > + */
> > > > +int nfs41_new_session(struct nfs_server *server)
> > > > +{
> > > > + int ret;
> > > > +
> > > > + dprintk(" --> %s\n", __func__);
> > > > +
> > > > + nfs4_put_session(&server->session);
> > > > + ret = nfs4_init_session(server->nfs_client,
> &server->session,
> > > > + server->client);
> > > > + if (ret)
> > > > + goto out;
> > > > + ret = nfs41_recover_session_sync(server->client,
> > > server->nfs_client,
> > > > + server->session);
> > > > +out:
> > > > + dprintk(" <-- %s returns %d\n", __func__, ret);
> > > > + return ret;
> > > > +}
> > > > +EXPORT_SYMBOL(nfs41_new_session);
> > > > +
> > > > #endif /* CONFIG_NFS_V4_1 */
> > > > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> > > > index ccec2e0..9c12bfb 100644
> > > > --- a/fs/nfs/nfs4proc.c
> > > > +++ b/fs/nfs/nfs4proc.c
> > > > @@ -3416,8 +3416,8 @@ static int
> nfs4_handle_exception(const
> > > struct nfs_server *server, int errorcode,
> > > > case -NFS4ERR_RETRY_UNCACHED_REP:
> > > > case -NFS4ERR_TOO_MANY_OPS:
> > > > case -NFS4ERR_OP_NOT_IN_SESSION:
> > > > - ret =
> > > nfs41_recover_session_sync(server->session->clnt,
> > > > - server->nfs_client,
> server->session);
> > > > +
> > > > + ret = nfs41_new_session((struct
> nfs_server
> > > *)server);
> > > > if (!ret) {
> > > > exception->retry = 1;
> > > > break;
> > > > diff --git a/include/linux/nfs41_session_recovery.h
> > > b/include/linux/nfs41_session_recovery.h
> > > > index 2f1df6f..5364b34 100644
> > > > --- a/include/linux/nfs41_session_recovery.h
> > > > +++ b/include/linux/nfs41_session_recovery.h
> > > > @@ -36,6 +36,7 @@ int nfs41_set_session_valid(struct
> > > nfs4_session *);
> > > > int nfs41_recover_session(struct nfs_client *, struct
> > > nfs4_session *);
> > > > int nfs41_recover_session_sync(struct rpc_clnt *, struct
> > > nfs_client *,
> > > > struct nfs4_session *);
> > > > +int nfs41_new_session(struct nfs_server *);
> > > >
> > > > #endif /* CONFIG_NFS_V4_1 */
> > > > #endif /* __NFS41_SESSION_RECOVERY_H__ */
> > >
> > > _______________________________________________
> > > pNFS mailing list
> >
> > > pNFS at linux-nfs.org <mailto:pNFS at linux-nfs.org>
> >
> > > http://linux-nfs.org/cgi-bin/mailman/listinfo/pnfs
> > >
> > >
> >
> >
> >
> >
> >
> > _______________________________________________
> > pNFS mailing list
> > pNFS at linux-nfs.org
> > http://linux-nfs.org/cgi-bin/mailman/listinfo/pnfs
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://linux-nfs.org/pipermail/pnfs/attachments/20080528/df696ad3/attachment-0001.htm
More information about the pNFS
mailing list