[pnfs] [PATCH] nfs41: Correctly use new session in exception handling

William A. (Andy) Adamson androsadamson at gmail.com
Wed May 28 08:39:01 EDT 2008


cool - good work! i'll turn on CONFIG_DEBUG_LIST in my kernels.....

-->Andy

On Wed, May 28, 2008 at 4:07 AM, Benny Halevy <bhalevy at panasas.com> wrote:

> Andy,
> I did hit a related problem with latest code on the
> same code path (with CONFIG_DEBUG_LIST=y).
>
> This is caused by lack of reference on clp->cl_cb_xprt that's
> passed down from nfsd4_create_session to do_probe_callback.
> The problem is aggravated by fast timing since the request that
> carried the create_session rpc has to be destroyed before
> do_probe_callback uses (after free) the svc_xprt and that's
> probably why this happens rarely (if at all) with debug
> printouts.
>
> I'll send a patch to fix this in reply to this message.
>
> Benny
>
> On May. 28, 2008, 0:27 +0300, "Halevy, Benny" <bhalevy at panasas.com> wrote:
> > Thanks for trying.  I'll try to reproduce tomorrow with the latest code
> > on my setup.
> >
> > Benny
> >
> > ________________________________
> >
> > From: androsadamson at gmail.com on behalf of William A. (Andy) Adamson
> > Sent: Tue 2008-05-27 21:10
> > To: Halevy, Benny
> > Cc: andros at netapp.com; pnfs at linux-nfs.org
> > Subject: Re: [pnfs] [PATCH] nfs41: Correctly use new session in exception
> handling
> >
> >
> > I turned all debug off on both client and server, still was unable to
> reproduce the problem, got the same results as below.... :(
> >
> > -->Andy
> >
> >
> > On Tue, May 27, 2008 at 12:53 PM, Benny Halevy <bhalevy at panasas.com>
> wrote:
> >
> >
> >       I forgot to mention that I couldn't reproduce this with debug
> printing on...
> >       Did you try with nfsd_debug by any chance?
> >       Benny
> >
> >
> >       William A. (Andy) Adamson wrote:
> >       > Hi Benny
> >       >
> >       > I can't reproduce the problem. Using todays tree, origin/nfs41
> branch
> >       > on client and server, I attempt to mount a non-existing path: the
> >       > server does not export a /unix4.
> >       >
> >       > mount -t nfs4 192.168.1.105:/unix4 /mnt
> >       >
> >       > What I see is a successful EXCHANGE_ID, CREATE_SESSION (with a
> >       > successful callback channel established),
> SEQ/PUTROOTFH/GETFH/GETATTR,
> >       > a couple of more successful SEQ/PUTFH/GETATTRS to establish the
> root,
> >       > then a PUTFH/LOOKUP/GETATTR for /unix4 that fails, followed by a
> >       > DESTROY_SESSION.
> >       >
> >       > Are you doing something different?
> >       >
> >       > Here is my /etc/exports file:
> >       >
> >       > /var/lib/nfs/v4root
> *(fsid=0,rw,insecure,no_root_squash,nohide,crossmnt)
> >       > /var/lib/nfs/v4root/spnfs
> *(rw,insecure,no_root_squash,nohide,crossmnt)
> >       > /var/lib/nfs/v4root/unix
> *(rw,insecure,no_root_squash,nohide,crossmnt)
> >       >
> >       > with
> >       > /export on /var/lib/nfs/v4root/spnfs type none (rw,bind)
> >       > /unix on /var/lib/nfs/v4root/unix type none (rw,bind)
> >       >
> >       >
> >       > -->Andy
> >       >
> >       > On Sun, May 25, 2008 at 1:19 PM, Benny Halevy <
> bhalevy at panasas.com
> >
> >       > <mailto:bhalevy at panasas.com>> wrote:
> >       >
> >       >     Andy, I think that this approach is better than what we had
> >       >     before but bare in mind that it breaks the server when
> >       >     the exported path is not found.
> >       >
> >       >     I get the following oops when the client mounts a
> non-existing path:
> >       >     (with this patch applied)
> >       >
> >       >     May 25 18:48:01 bh-testlin1 kernel: BUG: unable to handle
> kernel
> >       >     NULL pointer dereference at 00000000000000a0
> >       >     May 25 18:48:01 bh-testlin1 kernel: IP: [<ffffffff88263233>]
> >       >     :sunrpc:bc_send_request+0xc0/0x1e2
> >       >     May 25 18:48:01 bh-testlin1 kernel: PGD 78800067 PUD 78811067
> PMD 0
> >       >     May 25 18:48:01 bh-testlin1 kernel: Oops: 0000 [1] SMP
> >       >     May 25 18:48:01 bh-testlin1 kernel: CPU 0
> >       >     May 25 18:48:01 bh-testlin1 kernel: Modules linked in: nfsd
> >       >     auth_rpcgss exportfs autofs4 nfs lockd nfs_acl sunrpc ipv6
> >       >     dm_mirror dm_multipath dm_mod video output sbs sbshc battery
> ac
> >       >     kvm_amd kvm snd_hda_intel button snd_pcm k8temp i2c_nforce2
> >       >     i2c_core snd_timer pcspkr snd_page_alloc snd_hwdep sr_mod snd
> >       >     soundcore hwmon forcedeth serio_raw sg cdrom pata_amd
> ata_generic
> >       >     pata_acpi sata_nv libata sd_mod scsi_mod ext3 jbd mbcache
> ehci_hcd
> >       >     ohci_hcd uhci_hcd [last unloaded: scsi_wait_scan]
> >       >     May 25 18:48:01 bh-testlin1 kernel: Pid: 3271, comm:
> nfs4_cb_probe
> >       >     Not tainted 2.6.25-nfs41 #39
> >       >     May 25 18:48:01 bh-testlin1 kernel: RIP:
> 0010:[<ffffffff88263233>]
> >       >      [<ffffffff88263233>] :sunrpc:bc_send_request+0xc0/0x1e2
> >       >     May 25 18:48:01 bh-testlin1 kernel: RSP:
> 0018:ffff810078991d10
> >       >      EFLAGS: 00010246
> >       >     May 25 18:48:01 bh-testlin1 kernel: RAX: 0000000000000000
> RBX:
> >       >     0000000000000074 RCX: 0000000000000005
> >       >     May 25 18:48:01 bh-testlin1 kernel: RDX: ffff8100788ec008
> RSI:
> >       >     0000000000000057 RDI: ffff8100788ec008
> >       >     May 25 18:48:01 bh-testlin1 kernel: RBP: ffff8100788ec008
> R08:
> >       >     ffff81007880eb60 R09: ffff81007880eb20
> >       >     May 25 18:48:01 bh-testlin1 kernel: R10: 0000000000000000
> R11:
> >       >     ffffffff88267550 R12: ffff81007893e000
> >       >     May 25 18:48:01 bh-testlin1 kernel: R13: ffff81007893e008
> R14:
> >       >     0000000000000000 R15: ffff8100798ee340
> >       >     May 25 18:48:01 bh-testlin1 kernel: FS:
>  00007f05577b26f0(0000)
> >       >     GS:ffffffff8139c000(0000) knlGS:0000000000000000
> >       >     May 25 18:48:01 bh-testlin1 kernel: CS:  0010 DS: 0018 ES:
> 0018
> >       >     CR0: 000000008005003b
> >       >     May 25 18:48:01 bh-testlin1 kernel: CR2: 00000000000000a0
> CR3:
> >       >     0000000078810000 CR4: 00000000000006e0
> >       >     May 25 18:48:01 bh-testlin1 kernel: DR0: 0000000000000000
> DR1:
> >       >     0000000000000000 DR2: 0000000000000000
> >       >     May 25 18:48:01 bh-testlin1 kernel: DR3: 0000000000000000
> DR6:
> >       >     00000000ffff0ff0 DR7: 0000000000000400
> >       >     May 25 18:48:01 bh-testlin1 kernel: Process nfs4_cb_probe
> (pid:
> >       >     3271, threadinfo ffff810078990000, task ffff810076157360)
> >       >     May 25 18:48:01 bh-testlin1 kernel: Stack:  00000000ffffffff
> >       >     ffff810079d1c258 0000000000000286 0000000000000000
> >       >     May 25 18:48:01 bh-testlin1 kernel:  0000000000000074
> >       >     ffff81007880eb20 ffff81007880e800 ffff81007893e000
> >       >     May 25 18:48:01 bh-testlin1 kernel:  ffff81007e1f0200
> >       >     ffff81007893e0b8 0000000000000000 ffffffff8826197b
> >       >     May 25 18:48:01 bh-testlin1 kernel: Call Trace:
> >       >     May 25 18:48:01 bh-testlin1 kernel:  [<ffffffff8826197b>] ?
> >       >     :sunrpc:xprt_transmit+0xd3/0x1c1
> >       >     May 25 18:48:01 bh-testlin1 kernel:  [<ffffffff8832873c>] ?
> >       >     :nfsd:nfs4_xdr_enc_cb_null+0x0/0x50
> >       >     May 25 18:48:01 bh-testlin1 kernel:  [<ffffffff8825f4ab>] ?
> >       >     :sunrpc:call_transmit+0x200/0x23f
> >       >     May 25 18:48:01 bh-testlin1 kernel:  [<ffffffff88265f3f>] ?
> >       >     :sunrpc:__rpc_execute+0x7b/0x21a
> >       >     May 25 18:48:01 bh-testlin1 kernel:  [<ffffffff8825fc62>] ?
> >       >     :sunrpc:rpc_run_task+0x4f/0x56
> >       >     May 25 18:48:01 bh-testlin1 kernel:  [<ffffffff8825fcfe>] ?
> >       >     :sunrpc:rpc_call_sync+0x3e/0x5b
> >       >     May 25 18:48:01 bh-testlin1 kernel:  [<ffffffff88328e8f>] ?
> >       >     :nfsd:do_probe_callback+0x250/0x2c6
> >       >     May 25 18:48:01 bh-testlin1 kernel:  [<ffffffff88328c3f>] ?
> >       >     :nfsd:do_probe_callback+0x0/0x2c6
> >       >     May 25 18:48:01 bh-testlin1 kernel:  [<ffffffff81043ddd>] ?
> >       >     kthread+0x47/0x76
> >       >     May 25 18:48:01 bh-testlin1 kernel:  [<ffffffff8100cac8>] ?
> >       >     child_rip+0xa/0x12
> >       >     May 25 18:48:01 bh-testlin1 kernel:  [<ffffffff81043d96>] ?
> >       >     kthread+0x0/0x76
> >       >     May 25 18:48:01 bh-testlin1 kernel:  [<ffffffff8100cabe>] ?
> >       >     child_rip+0x0/0x12
> >       >     May 25 18:48:01 bh-testlin1 kernel:
> >       >     May 25 18:48:01 bh-testlin1 kernel:
> >       >     May 25 18:48:01 bh-testlin1 kernel: Code: 49 8b 5d 08 48 63
> c7 c7
> >       >     44 24 14 00 80 00 00 48 39 d8 b8 00
> >       >      00 00 00 0f 45 44 24 14 89 44 24 14 48 8b 6d 08 49 8b 47 10
> 48 89
> >       >     ef <4c> 8b a0 a0 00 00 00 e8 61 f2
> >       >      db f8 48 c1 e8 0c 89 ea 48 be 00
> >       >     May 25 18:48:01 bh-testlin1 kernel: RIP  [<ffffffff88263233>]
> >       >     :sunrpc:bc_send_request+0xc0/0x1e2
> >       >     May 25 18:48:01 bh-testlin1 kernel:  RSP <ffff810078991d10>
> >       >     May 25 18:48:01 bh-testlin1 kernel: CR2: 00000000000000a0
> >       >     May 25 18:48:01 bh-testlin1 kernel: ---[ end trace
> >       >     e76d1803ca214c10 ]---
> >       >
> >       >     On May. 23, 2008, 21:29 +0300, andros at netapp.com
> >
> >       >     <mailto:andros at netapp.com> wrote:
> >       >     > From: Andy Adamson <andros at netapp.com <mailto:
> andros at netapp.com>>
> >
> >       >     >
> >       >     > We will need to handle several layers of session recovery
> >       >     depending on the
> >       >     > error. As a last resort, we will need to use the big
> hammer; destroy
> >       >     > an existing session and create a new one.
> >       >     >
> >       >     > The current code creates a new session without destroying
> the
> >       >     old one.
> >       >     > Destroy the existing session and initialize a new session
> before
> >       >     calling
> >       >     > nfs41_recover_session_sync.
> >       >     >
> >       >     > Signed-off-by: Andy Adamson<andros at netapp.com
> >
> >       >     <mailto:andros at netapp.com>>
> >
> >       >     > ---
> >       >     >  fs/nfs/client.c                        |    3 ++-
> >       >     >  fs/nfs/internal.h                      |    4 ++++
> >       >     >  fs/nfs/nfs41_session_recovery.c        |   25
> >       >     +++++++++++++++++++++++++
> >       >     >  fs/nfs/nfs4proc.c                      |    4 ++--
> >       >     >  include/linux/nfs41_session_recovery.h |    1 +
> >       >     >  5 files changed, 34 insertions(+), 3 deletions(-)
> >       >     >
> >       >     > diff --git a/fs/nfs/client.c b/fs/nfs/client.c
> >       >     > index a014f32..d1161ef 100644
> >       >     > --- a/fs/nfs/client.c
> >       >     > +++ b/fs/nfs/client.c
> >       >     > @@ -1116,7 +1116,7 @@ error:
> >       >     >  /*
> >       >     >   * Allocate and initialize a session if required,
> including its
> >       >     backchannel.
> >       >     >   */
> >       >     > -static int nfs4_init_session(struct nfs_client *clp,
> struct
> >       >     nfs4_session **spp,
> >       >     > +int nfs4_init_session(struct nfs_client *clp, struct
> >       >     nfs4_session **spp,
> >       >     >                            struct rpc_clnt *clnt)
> >       >     >  {
> >       >     >       int error = 0;
> >       >     > @@ -1148,6 +1148,7 @@ static int nfs4_init_session(struct
> >       >     nfs_client *clp, struct nfs4_session **spp,
> >       >     >       }
> >       >     >       return error;
> >       >     >  }
> >       >     > +EXPORT_SYMBOL(nfs4_init_session);
> >       >     >  #endif /* CONFIG_NFS_V4_1 */
> >       >     >
> >       >     >  /*
> >       >     > diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
> >       >     > index 22228bc..735dbb7 100644
> >       >     > --- a/fs/nfs/internal.h
> >       >     > +++ b/fs/nfs/internal.h
> >       >     > @@ -66,6 +66,10 @@ struct nfs_parsed_mount_data {
> >       >     >  };
> >       >     >
> >       >     >  /* client.c */
> >       >     > +#ifdef CONFIG_NFS_V4_1
> >       >     > +extern int nfs4_init_session(struct nfs_client *, struct
> >       >     nfs4_session **,
> >       >     > +                                     struct rpc_clnt *);
> >       >     > +#endif /* CONFIG_NFS_V4_1 */
> >       >     >  extern struct rpc_program nfs_program;
> >       >     >
> >       >     >  extern void nfs_put_client(struct nfs_client *);
> >       >     > diff --git a/fs/nfs/nfs41_session_recovery.c
> >       >     b/fs/nfs/nfs41_session_recovery.c
> >       >     > index 46ddb55..cf02704 100644
> >       >     > --- a/fs/nfs/nfs41_session_recovery.c
> >       >     > +++ b/fs/nfs/nfs41_session_recovery.c
> >       >     > @@ -22,6 +22,7 @@
> >       >     >  #include <linux/nfs_fs_sb.h>
> >       >     >  #include <linux/nfs41_session_recovery.h>
> >       >     >  #include "nfs4_fs.h"
> >       >     > +#include "internal.h"
> >       >     >
> >       >     >  #define NFSDBG_FACILITY              NFSDBG_PROC
> >       >     >
> >       >     > @@ -191,4 +192,28 @@ int nfs41_recover_session_sync(struct
> >       >     rpc_clnt *clnt, struct nfs_client *clp,
> >       >     >  }
> >       >     >  EXPORT_SYMBOL(nfs41_recover_session_sync);
> >       >     >
> >       >     > +/*
> >       >     > + * nfs41_new_session()
> >       >     > + *
> >       >     > + * Big Hammer. Destroy existing session and create a new
> session
> >       >     > + */
> >       >     > +int nfs41_new_session(struct nfs_server *server)
> >       >     > +{
> >       >     > +     int ret;
> >       >     > +
> >       >     > +     dprintk(" --> %s\n", __func__);
> >       >     > +
> >       >     > +     nfs4_put_session(&server->session);
> >       >     > +     ret = nfs4_init_session(server->nfs_client,
> &server->session,
> >       >     > +                             server->client);
> >       >     > +     if (ret)
> >       >     > +             goto out;
> >       >     > +     ret = nfs41_recover_session_sync(server->client,
> >       >     server->nfs_client,
> >       >     > +                             server->session);
> >       >     > +out:
> >       >     > +     dprintk(" <-- %s returns %d\n", __func__, ret);
> >       >     > +     return ret;
> >       >     > +}
> >       >     > +EXPORT_SYMBOL(nfs41_new_session);
> >       >     > +
> >       >     >  #endif /* CONFIG_NFS_V4_1 */
> >       >     > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> >       >     > index ccec2e0..9c12bfb 100644
> >       >     > --- a/fs/nfs/nfs4proc.c
> >       >     > +++ b/fs/nfs/nfs4proc.c
> >       >     > @@ -3416,8 +3416,8 @@ static int
> nfs4_handle_exception(const
> >       >     struct nfs_server *server, int errorcode,
> >       >     >               case -NFS4ERR_RETRY_UNCACHED_REP:
> >       >     >               case -NFS4ERR_TOO_MANY_OPS:
> >       >     >               case -NFS4ERR_OP_NOT_IN_SESSION:
> >       >     > -                     ret =
> >       >     nfs41_recover_session_sync(server->session->clnt,
> >       >     > -                             server->nfs_client,
> server->session);
> >       >     > +
> >       >     > +                     ret = nfs41_new_session((struct
> nfs_server
> >       >     *)server);
> >       >     >                       if (!ret) {
> >       >     >                               exception->retry = 1;
> >       >     >                               break;
> >       >     > diff --git a/include/linux/nfs41_session_recovery.h
> >       >     b/include/linux/nfs41_session_recovery.h
> >       >     > index 2f1df6f..5364b34 100644
> >       >     > --- a/include/linux/nfs41_session_recovery.h
> >       >     > +++ b/include/linux/nfs41_session_recovery.h
> >       >     > @@ -36,6 +36,7 @@ int nfs41_set_session_valid(struct
> >       >     nfs4_session *);
> >       >     >  int nfs41_recover_session(struct nfs_client *, struct
> >       >     nfs4_session *);
> >       >     >  int nfs41_recover_session_sync(struct rpc_clnt *, struct
> >       >     nfs_client *,
> >       >     >                              struct nfs4_session *);
> >       >     > +int nfs41_new_session(struct nfs_server *);
> >       >     >
> >       >     >  #endif       /* CONFIG_NFS_V4_1 */
> >       >     >  #endif       /* __NFS41_SESSION_RECOVERY_H__ */
> >       >
> >       >     _______________________________________________
> >       >     pNFS mailing list
> >
> >       >     pNFS at linux-nfs.org <mailto:pNFS at linux-nfs.org>
> >
> >       >     http://linux-nfs.org/cgi-bin/mailman/listinfo/pnfs
> >       >
> >       >
> >
> >
> >
> >
> >
> > _______________________________________________
> > pNFS mailing list
> > pNFS at linux-nfs.org
> > http://linux-nfs.org/cgi-bin/mailman/listinfo/pnfs
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://linux-nfs.org/pipermail/pnfs/attachments/20080528/df696ad3/attachment-0001.htm 


More information about the pNFS mailing list