[pnfs] Patch to fix ^C crash on mount

Marc Eshel eshel at almaden.ibm.com
Wed Jul 18 20:30:36 EDT 2007


"Iyer, Rahul" <Rahul.Iyer at netapp.com> wrote on 07/18/2007 04:42:46 PM:

> Hi Marc,
> By any chance is your MDS a DS too?

No.

> Thanks
> Regards
> Rahul
> 
> 
> > -----Original Message-----
> > From: Iyer, Rahul 
> > Sent: Wednesday, July 18, 2007 4:02 PM
> > To: Marc Eshel
> > Cc: pnfs at linux-nfs.org
> > Subject: Re: [pnfs] Patch to fix ^C crash on mount
> > 
> > Hmm... That is really odd! I've tried this sequence many 
> > times without a crash. If I do run into it, I'll look into it.
> > Regards
> > Rahul
> > 
> > 
> > > -----Original Message-----
> > > From: Marc Eshel [mailto:eshel at almaden.ibm.com]
> > > Sent: Wednesday, July 18, 2007 12:35 PM
> > > To: Iyer, Rahul
> > > Cc: William A. (Andy) Adamson; pnfs at linux-nfs.org
> > > Subject: Re: [pnfs] Patch to fix ^C crash on mount
> > > 
> > > I still get a crash after mount/umount/mount.
> > > Marc.
> > > 
> > > > Jul 18 11:29:02 fin30 kernel: nfs4filelayout_init: NFSv4 File Layout Driver Registering...
> > > > Jul 18 11:29:18 fin30 kernel: started cb service!
> > > > Jul 18 11:29:19 fin30 kernel: nfs4_proc_create_session session->seqid 2
> > > > Jul 18 11:29:25 fin30 kernel: device_create: exit err 0 clp f74bec00
> > > > Jul 18 11:29:25 fin30 kernel: nfs4_proc_create_session session->seqid 2
> > > > Jul 18 11:29:50 fin30 kernel: started cb service!
> > > > Jul 18 11:29:51 fin30 kernel: nfs4_proc_create_session session->seqid 2
> > > > Jul 18 11:29:57 fin30 kernel: device_create: exit err 0 clp f74bec00
> > > > Jul 18 11:29:57 fin30 kernel: BUG: unable to handle kernel NULL pointer dereference at virtual address 00000010
> > > > Jul 18 11:29:57 fin30 kernel:  printing eip:
> > > > Jul 18 11:29:57 fin30 kernel: f8ac09e7
> > > > Jul 18 11:29:57 fin30 kernel: *pde = 00000000
> > > > Jul 18 11:29:57 fin30 kernel: Oops: 0000 [#1]
> > > > Jul 18 11:29:57 fin30 kernel: SMP
> > > > Jul 18 11:29:57 fin30 kernel: Modules linked in: nfslayoutdriver autofs4 nfs lockd sunrpc tg3 bonding qla2xxx firmware_class ide_cd cdrom sg
> > > > Jul 18 11:29:57 fin30 kernel: CPU:    1
> > > > Jul 18 11:29:57 fin30 kernel: EIP:    0060:[<f8ac09e7>]    Not tainted VLI
> > > > Jul 18 11:29:57 fin30 kernel: EFLAGS: 00010246   (2.6.18.3-largeio-pnfs #8)
> > > > Jul 18 11:29:57 fin30 kernel: EIP is at rpcauth_bindcred+0x74/0xb0 [sunrpc]
> > > > Jul 18 11:29:57 fin30 kernel: eax: 00000000   ebx: f7fbbe00   ecx: 00000000   edx: 00000000
> > > > Jul 18 11:29:57 fin30 kernel: esi: f76d8080   edi: 00000000   ebp: f74bec00   esp: f6f6d324
> > > > Jul 18 11:29:57 fin30 kernel: ds: 007b   es: 007b   ss: 0068
> > > > Jul 18 11:29:57 fin30 kernel: Process mount (pid: 4763, ti=f6f6c000 task=f74d5570 task.ti=f6f6c000)
> > > > Jul 18 11:29:57 fin30 kernel: Stack: 00000000 00000000 f7f06080 00000000 f76d8080 f6f6dc08 f8abb41e f76d8080
> > > > Jul 18 11:29:57 fin30 kernel:        fffffff4 f8abb4bb 00000000 00000000 00000000 00000000 f6f6dc18 f8b485f1
> > > > Jul 18 11:29:57 fin30 kernel:        0002064c f74bec00 00000000 00000000 00000000 00000000 00000000 00000000
> > > > Jul 18 11:29:57 fin30 kernel: Call Trace:
> > > > Jul 18 11:29:57 fin30 kernel:  [<f8abb41e>] rpc_call_setup+0x2d/0x44 [sunrpc]
> > > > Jul 18 11:29:57 fin30 kernel:  [<f8abb4bb>] rpc_call_sync+0x59/0x91 [sunrpc]
> > > > Jul 18 11:29:57 fin30 kernel:  [<f8b485f1>] nfs41_proc_setup_session+0x197/0x2dd [nfs]
> > > > Jul 18 11:29:57 fin30 kernel:  [<c01cbfd3>] vsscanf+0x3b4/0x3f0
> > > > Jul 18 11:29:57 fin30 kernel:  [<f893bc3c>] decode_and_add_device+0x341/0x452 [nfslayoutdriver]
> > > > Jul 18 11:29:57 fin30 kernel:  [<f893bf9c>] decode_and_add_devicelist+0x18/0x37 [nfslayoutdriver]
> > > > Jul 18 11:29:57 fin30 kernel:  [<f893b58a>] filelayout_initialize_mountpoint+0x99/0xd1 [nfslayoutdriver]
> > > > Jul 18 11:29:57 fin30 kernel:  [<f8b55412>] set_pnfs_layoutdriver+0x4e/0xb7 [nfs]
> > > > Jul 18 11:29:57 fin30 kernel:  [<f8b361df>] nfs_sb_init+0x192/0x5f5 [nfs]
> > > > Jul 18 11:29:57 fin30 kernel:  [<f8b36e3a>] nfs4_get_sb+0x48c/0x50f [nfs]
> > > > Jul 18 11:29:57 fin30 kernel:  [<c014fe7f>] vfs_kern_mount+0x39/0x68
> > > > Jul 18 11:29:57 fin30 kernel:  [<c014fee0>] do_kern_mount+0x25/0x36
> > > > Jul 18 11:29:57 fin30 kernel:  [<c01617be>] do_mount+0x581/0x5f1
> > > > Jul 18 11:29:57 fin30 kernel:  [<c0291736>] nf_hook_slow+0x3a/0x90
> > > > Jul 18 11:29:57 fin30 kernel:  [<c02958e0>] ip_rcv_finish+0x0/0x1ce
> > > > Jul 18 11:29:57 fin30 kernel:  [<c029623a>] ip_local_deliver+0x159/0x203
> > > > Jul 18 11:29:57 fin30 kernel:  [<c02960a7>] ip_rcv+0x3aa/0x3e4
> > > > Jul 18 11:29:57 fin30 kernel:  [<c0284732>] netif_receive_skb+0x195/0x1fd
> > > > Jul 18 11:29:57 fin30 kernel:  [<f8950e77>] tg3_poll+0x613/0x669 [tg3]
> > > > Jul 18 11:29:57 fin30 kernel:  [<c0285ea2>] net_rx_action+0x63/0xe0
> > > > Jul 18 11:29:57 fin30 kernel:  [<c011b4b9>] __do_softirq+0x5a/0xbb
> > > > Jul 18 11:29:58 fin30 kernel:  [<c0104b45>] do_IRQ+0x6b/0x76
> > > > Jul 18 11:29:58 fin30 kernel:  [<c01607ef>] copy_mount_options+0xa8/0x109
> > > > Jul 18 11:29:58 fin30 kernel:  [<c016189b>] sys_mount+0x6d/0xaa
> > > > Jul 18 11:29:58 fin30 kernel:  [<c01027e5>] sysenter_past_esp+0x56/0x79
> > > > Jul 18 11:29:58 fin30 kernel: Code: b0 00 00 00 50 68 a8 b0 ac f8 e8 ec 6f 65 c7 83 c4 0c 8b 44 24 08 f0 ff 40 04 0f b7 46 64 83 e0 40 83 f8 01 19 c9 f7 d1 83 e1 02 <8b> 5f 10 89 e2 89 f8 ff 53 14 89 c3 3d 00 f0 ff ff 77 05 89 46
> > > > Jul 18 11:29:58 fin30 kernel: EIP: [<f8ac09e7>] rpcauth_bindcred+0x74/0xb0 [sunrpc] SS:ESP 0068:f6f6d324
> > > 
> > > 
> > > 
> > > Iyer, Rahul wrote:
> > > 
> > > >Hi guys,
> > > > >This patch seems to have the side effect of eliminating the umount
> > > > >crash as well. I'm still investigating why it worked. Either way,
> > > > >the client is now more stable than before. There are still a few
> > > > >issues:
> > > > >- The open sequence counter seems to be encoded as is, so
> > > > >  occasionally it winds up with a seqid of 1 and the server rejects
> > > > >  it with NFSERR_INVAL.
> > > > >- Read and write seem to update the lease value. This is not true in
> > > > >  NFSv4.1 IIRC. The current code does this and has the unfortunate
> > > > >  effect that if pNFS reads or writes run long (> lease time), the
> > > > >  client thinks the lease is up to date and does not send SEQUENCE
> > > > >  ops to the MDS, resulting in
> > > > >  NFSERR_BADSESSION/NFSERR_STALE_CLIENTID.
> > > > >- The server code has a bug in EXCHANGE_ID which results in long
> > > > >  strings of NFSERR_CLID_INUSE errors. The current code does:
> > > >
> > > >        conf = find_confirmed_client_by_str(dname, strhashval);
> > > >        if (conf) {
> > > > >                if (!cmp_creds(&conf->cl_cred, &rqstp->rq_cred) ||
> > > > >                    (ip_addr != conf->cl_addr)) {
> > > > >                        /* Client collision: send nfserr_clid_inuse */
> > > > >                        goto out;
> > > >                }
> > > >
> > > >                if (cmp_verf(&verf, &conf->cl_verifier)) {
> > > >         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > >                        /* Client reboot: destroy old state */
> > > >                        expire_client(conf);
> > > >                        goto out_new;
> > > >                }
> > > >                /* router replay */
> > > >                goto out;
> > > >        }
> > > >
> > > >
> > > > >In the highlighted line, it should be !cmp_verf because cmp_verf
> > > > >returns true if the verifiers are the same.
> > > >
> > > >I'll work on these and send out the patches.
> > > >Regards
> > > >Rahul
> > > >
> > > >
> > > >
> > > >
> > > > 
> > > >
> > > >>-----Original Message-----
> > > >>From: William A. (Andy) Adamson [mailto:andros at citi.umich.edu]
> > > >>Sent: Wednesday, July 18, 2007 9:47 AM
> > > >>To: Iyer, Rahul
> > > >>Cc: pnfs at linux-nfs.org
> > > >>Subject: Re: [pnfs] Patch to fix ^C crash on mount
> > > >>
> > > >>ok. applied to 4.1-sessions branch and merged with master
> > > >>
> > > >>-->Andy
> > > >>
> > > >>
> > > >>On 7/17/07, Iyer, Rahul <Rahul.Iyer at netapp.com> wrote:
> > > >>
> > > >>   Hi,
> > > > >>   If the mount hangs for some reason, and you hit ^C, the client
> > > > >>crashes as it tries to destroy a mempool and a slab that could be
> > > > >>NULL. This patch checks for the values before destroying them.
> > > >>   Regards
> > > >>   Rahul
> > > >> 
> > > >>
> > > >>   _______________________________________________
> > > >>   pNFS mailing list
> > > >>   pNFS at linux-nfs.org
> > > >>   http://linux-nfs.org/cgi-bin/mailman/listinfo/pnfs 
> > > >><http://linux-nfs.org/cgi-bin/mailman/listinfo/pnfs> 
> > > >> 
> > > >> 
> > > >>
> > > >>
> > > >>
> > > >> 
> > > >>
> > > >
> > > >
> > > > 
> > > >
> > > 
> > 
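For reference, the EXCHANGE_ID fix described in the quoted message (negate cmp_verf so that a changed verifier, rather than a matching one, is treated as a client reboot) can be sketched in user-space C. This is only a rough model and not the nfsd code: verifier_t, VERIFIER_SIZE, enum exid_action and classify_exchange_id() are names made up for illustration, and cmp_verf() here just assumes the behavior stated in the thread (it returns true when the verifiers are the same).

```c
#include <stdbool.h>
#include <string.h>

#define VERIFIER_SIZE 8 /* assumed size, for this sketch only */

/* Hypothetical stand-in for the verifier a client sends in EXCHANGE_ID. */
typedef struct {
    unsigned char data[VERIFIER_SIZE];
} verifier_t;

/* Assumed semantics, as stated in the thread: true if the verifiers match. */
static bool cmp_verf(const verifier_t *a, const verifier_t *b)
{
    return memcmp(a->data, b->data, VERIFIER_SIZE) == 0;
}

/* Hypothetical outcomes for a confirmed client whose creds and address match. */
enum exid_action {
    EXID_REPLAY, /* same verifier: treat the request as a replay */
    EXID_REBOOT  /* different verifier: client rebooted, expire old state */
};

static enum exid_action classify_exchange_id(const verifier_t *incoming,
                                             const verifier_t *confirmed)
{
    /* Note the negation: only a verifier that differs signals a reboot. */
    if (!cmp_verf(incoming, confirmed))
        return EXID_REBOOT;
    return EXID_REPLAY;
}
```

Without the negation the two outcomes swap: an unchanged client has its state expired and a rebooted one is treated as a replay.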

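The ^C fix at the bottom of the thread guards the teardown of a mempool and a slab that may never have been created. The patch itself is not shown in this thread, so the following is only a user-space analogue of that pattern under stated assumptions: struct resource, struct session_caches, resource_create(), resource_destroy() and session_caches_teardown() are hypothetical stand-ins, not kernel APIs.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a kernel cache or pool object. */
struct resource {
    int in_use;
};

/* Hypothetical container for the two objects a mount would set up. */
struct session_caches {
    struct resource *slab;
    struct resource *pool;
};

static int destroy_calls; /* counts real teardowns, for demonstration only */

static struct resource *resource_create(void)
{
    struct resource *r = malloc(sizeof(*r));
    if (r)
        r->in_use = 1;
    return r;
}

/* Models a destructor that dereferences its argument and so must not get NULL. */
static void resource_destroy(struct resource *r)
{
    r->in_use = 0;
    free(r);
    destroy_calls++;
}

/* Tear down only what was actually created; safe after an interrupted setup. */
static void session_caches_teardown(struct session_caches *c)
{
    if (c->pool) {
        resource_destroy(c->pool);
        c->pool = NULL;
    }
    if (c->slab) {
        resource_destroy(c->slab);
        c->slab = NULL;
    }
}
```

Resetting each pointer to NULL after destroying it also makes a second teardown call harmless.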

