[pnfs] Patch to fix ^C crash on mount
Marc Eshel
eshel at almaden.ibm.com
Wed Jul 18 15:35:14 EDT 2007
I still get a crash after mount/umount/mount.
Marc.
Jul 18 11:29:02 fin30 kernel: nfs4filelayout_init: NFSv4 File Layout
Driver Registering...
Jul 18 11:29:18 fin30 kernel: started cb service!
Jul 18 11:29:19 fin30 kernel: nfs4_proc_create_session session->seqid 2
Jul 18 11:29:25 fin30 kernel: device_create: exit err 0 clp f74bec00
Jul 18 11:29:25 fin30 kernel: nfs4_proc_create_session session->seqid 2
Jul 18 11:29:50 fin30 kernel: started cb service!
Jul 18 11:29:51 fin30 kernel: nfs4_proc_create_session session->seqid 2
Jul 18 11:29:57 fin30 kernel: device_create: exit err 0 clp f74bec00
Jul 18 11:29:57 fin30 kernel: BUG: unable to handle kernel NULL pointer
dereference at virtual address 00000010
Jul 18 11:29:57 fin30 kernel: printing eip:
Jul 18 11:29:57 fin30 kernel: f8ac09e7
Jul 18 11:29:57 fin30 kernel: *pde = 00000000
Jul 18 11:29:57 fin30 kernel: Oops: 0000 [#1]
Jul 18 11:29:57 fin30 kernel: SMP
Jul 18 11:29:57 fin30 kernel: Modules linked in: nfslayoutdriver autofs4
nfs lockd sunrpc tg3 bonding qla2xxx firmware_class ide_cd cdrom sg
Jul 18 11:29:57 fin30 kernel: CPU: 1
Jul 18 11:29:57 fin30 kernel: EIP: 0060:[<f8ac09e7>] Not tainted VLI
Jul 18 11:29:57 fin30 kernel: EFLAGS: 00010246 (2.6.18.3-largeio-pnfs #8)
Jul 18 11:29:57 fin30 kernel: EIP is at rpcauth_bindcred+0x74/0xb0 [sunrpc]
Jul 18 11:29:57 fin30 kernel: eax: 00000000 ebx: f7fbbe00 ecx:
00000000 edx: 00000000
Jul 18 11:29:57 fin30 kernel: esi: f76d8080 edi: 00000000 ebp:
f74bec00 esp: f6f6d324
Jul 18 11:29:57 fin30 kernel: ds: 007b es: 007b ss: 0068
Jul 18 11:29:57 fin30 kernel: Process mount (pid: 4763, ti=f6f6c000
task=f74d5570 task.ti=f6f6c000)
Jul 18 11:29:57 fin30 kernel: Stack: 00000000 00000000 f7f06080 00000000
f76d8080 f6f6dc08 f8abb41e f76d8080
Jul 18 11:29:57 fin30 kernel: fffffff4 f8abb4bb 00000000 00000000
00000000 00000000 f6f6dc18 f8b485f1
Jul 18 11:29:57 fin30 kernel: 0002064c f74bec00 00000000 00000000
00000000 00000000 00000000 00000000
Jul 18 11:29:57 fin30 kernel: Call Trace:
Jul 18 11:29:57 fin30 kernel: [<f8abb41e>] rpc_call_setup+0x2d/0x44
[sunrpc]
Jul 18 11:29:57 fin30 kernel: [<f8abb4bb>] rpc_call_sync+0x59/0x91 [sunrpc]
Jul 18 11:29:57 fin30 kernel: [<f8b485f1>]
nfs41_proc_setup_session+0x197/0x2dd [nfs]
Jul 18 11:29:57 fin30 kernel: [<c01cbfd3>] vsscanf+0x3b4/0x3f0
Jul 18 11:29:57 fin30 kernel: [<f893bc3c>]
decode_and_add_device+0x341/0x452 [nfslayoutdriver]
Jul 18 11:29:57 fin30 kernel: [<f893bf9c>]
decode_and_add_devicelist+0x18/0x37 [nfslayoutdriver]
Jul 18 11:29:57 fin30 kernel: [<f893b58a>]
filelayout_initialize_mountpoint+0x99/0xd1 [nfslayoutdriver]
Jul 18 11:29:57 fin30 kernel: [<f8b55412>]
set_pnfs_layoutdriver+0x4e/0xb7 [nfs]
Jul 18 11:29:57 fin30 kernel: [<f8b361df>] nfs_sb_init+0x192/0x5f5 [nfs]
Jul 18 11:29:57 fin30 kernel: [<f8b36e3a>] nfs4_get_sb+0x48c/0x50f [nfs]
Jul 18 11:29:57 fin30 kernel: [<c014fe7f>] vfs_kern_mount+0x39/0x68
Jul 18 11:29:57 fin30 kernel: [<c014fee0>] do_kern_mount+0x25/0x36
Jul 18 11:29:57 fin30 kernel: [<c01617be>] do_mount+0x581/0x5f1
Jul 18 11:29:57 fin30 kernel: [<c0291736>] nf_hook_slow+0x3a/0x90
Jul 18 11:29:57 fin30 kernel: [<c02958e0>] ip_rcv_finish+0x0/0x1ce
Jul 18 11:29:57 fin30 kernel: [<c029623a>] ip_local_deliver+0x159/0x203
Jul 18 11:29:57 fin30 kernel: [<c02960a7>] ip_rcv+0x3aa/0x3e4
Jul 18 11:29:57 fin30 kernel: [<c0284732>] netif_receive_skb+0x195/0x1fd
Jul 18 11:29:57 fin30 kernel: [<f8950e77>] tg3_poll+0x613/0x669 [tg3]
Jul 18 11:29:57 fin30 kernel: [<c0285ea2>] net_rx_action+0x63/0xe0
Jul 18 11:29:57 fin30 kernel: [<c011b4b9>] __do_softirq+0x5a/0xbb
Jul 18 11:29:58 fin30 kernel: [<c0104b45>] do_IRQ+0x6b/0x76
Jul 18 11:29:58 fin30 kernel: [<c01607ef>] copy_mount_options+0xa8/0x109
Jul 18 11:29:58 fin30 kernel: [<c016189b>] sys_mount+0x6d/0xaa
Jul 18 11:29:58 fin30 kernel: [<c01027e5>] sysenter_past_esp+0x56/0x79
Jul 18 11:29:58 fin30 kernel: Code: b0 00 00 00 50 68 a8 b0 ac f8 e8 ec
6f 65 c7 83 c4 0c 8b 44 24 08 f0 ff 40 04 0f b7 46 64 83 e0 40 83 f8 01 19
c9 f7 d1 83 e1 02 <8b> 5f 10 89 e2 89 f8 ff 53 14 89 c3 3d 00 f0 ff ff 77
05 89 46
Jul 18 11:29:58 fin30 kernel: EIP: [<f8ac09e7>]
rpcauth_bindcred+0x74/0xb0 [sunrpc] SS:ESP 0068:f6f6d324
Iyer, Rahul wrote:
>Hi guys,
>This patch seems to have the side effect of eliminating the umount crash
>as well. I'm still investigating why it worked. Either way, the
>client is now more stable than before. There are still a few issues:
>- The open sequence counter seems to be encoded as is. So occasionally,
>it winds up with a seqid of 1 and the server rejects it with
>NFSERR_INVAL.
>- Read and write seem to update the lease value. This is not true in
>NFSv4.1 IIRC. The current code does this and has the unfortunate effect
>that if pNFS reads or writes run long (> lease time), then the client
>would think the lease is up to date and not send SEQUENCE ops to the
>MDS, resulting in NFSERR_BADSESSION/NFSERR_STALE_CLIENTID.
>- The server code has a bug in EXCHANGE_ID which results in long
>strings of NFSERR_CLID_INUSE errors. The current code does:
>
>    conf = find_confirmed_client_by_str(dname, strhashval);
>    if (conf) {
>        if (!cmp_creds(&conf->cl_cred, &rqstp->rq_cred) ||
>            (ip_addr != conf->cl_addr)) {
>            /* Client collision: send nfserr_clid_inuse */
>            goto out;
>        }
>
>        if (cmp_verf(&verf, &conf->cl_verifier)) {
>        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>            /* Client reboot: destroy old state */
>            expire_client(conf);
>            goto out_new;
>        }
>        /* router replay */
>        goto out;
>    }
>
>
>In the highlighted line, it should be !cmp_verf because cmp_verf returns
>true if the verifiers are the same.
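
[Illustration only, not the server code. A minimal userspace sketch of the
verifier check with the negation applied, assuming cmp_verf() behaves as
described in the quoted text (nonzero when the verifiers match). The names
struct verifier, enum clid_action and classify_confirmed() are hypothetical
stand-ins introduced for this example.]

```c
#include <assert.h>
#include <string.h>

#define VERIFIER_SIZE 8   /* NFSv4 verifiers are 8 opaque bytes */

/* Hypothetical stand-in for the server's verifier type. */
struct verifier {
	unsigned char data[VERIFIER_SIZE];
};

/* Hypothetical outcome of looking at an already confirmed client. */
enum clid_action {
	CLID_REPLAY,  /* same verifier: retransmitted request, keep state */
	CLID_REBOOT   /* different verifier: client rebooted, expire state */
};

/* Returns nonzero when the two verifiers are identical. */
static int cmp_verf(const struct verifier *v1, const struct verifier *v2)
{
	assert(v1 != NULL && v2 != NULL);
	return memcmp(v1->data, v2->data, sizeof(v1->data)) == 0;
}

/* The reboot path is taken only when the verifiers differ. */
static enum clid_action classify_confirmed(const struct verifier *incoming,
					   const struct verifier *confirmed)
{
	if (!cmp_verf(incoming, confirmed))
		return CLID_REBOOT;
	return CLID_REPLAY;
}
```

With the un-negated test, a retransmission carrying the same verifier would
be treated as a reboot and a real reboot would fall through to the replay
path, which is the inversion described in the quoted text.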
>
>I'll work on these and send out the patches.
>Regards
>Rahul
>
>>-----Original Message-----
>>From: William A. (Andy) Adamson [mailto:andros at citi.umich.edu]
>>Sent: Wednesday, July 18, 2007 9:47 AM
>>To: Iyer, Rahul
>>Cc: pnfs at linux-nfs.org
>>Subject: Re: [pnfs] Patch to fix ^C crash on mount
>>
>>ok. applied to 4.1-sessions branch and merged with master
>>
>>-->Andy
>>
>>
>>On 7/17/07, Iyer, Rahul <Rahul.Iyer at netapp.com> wrote:
>>
>> Hi,
>> If the mount hangs for some reason, and you hit ^C, the
>>client crashes as it tries to destroy a mempool and a slab
>>that could be NULL. This patch checks both values before
>>destroying them.
>> Regards
>> Rahul
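
[Illustration only, not the patch itself. A userspace sketch of the guard
described above: teardown checks each resource before destroying it, so an
interrupted setup that never allocated them does not crash. struct pool,
struct cache, pool_destroy(), cache_destroy() and client_teardown() are
hypothetical stand-ins for the mempool and slab objects; the destroy
helpers assert on NULL to mimic a destructor that cannot tolerate it.]

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a mempool and the slab cache behind it. */
struct pool  { int unused; };
struct cache { int unused; };

static int destroyed; /* number of destroy calls, for demonstration */

/* Like the destructors in the report, these must not be given NULL. */
static void pool_destroy(struct pool *p)
{
	assert(p != NULL);
	free(p);
	destroyed++;
}

static void cache_destroy(struct cache *c)
{
	assert(c != NULL);
	free(c);
	destroyed++;
}

/* Hypothetical per-mount state; either pointer may still be NULL
 * if setup was interrupted (for example by ^C during mount). */
struct client_state {
	struct pool *slot_pool;
	struct cache *slot_cache;
};

/* Destroy only what was actually created, pool first, then its cache. */
static void client_teardown(struct client_state *cs)
{
	if (cs->slot_pool) {
		pool_destroy(cs->slot_pool);
		cs->slot_pool = NULL;
	}
	if (cs->slot_cache) {
		cache_destroy(cs->slot_cache);
		cs->slot_cache = NULL;
	}
}
```

Clearing each pointer after destroying it also makes a second teardown
call harmless.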
>>
>>
>> _______________________________________________
>> pNFS mailing list
>> pNFS at linux-nfs.org
>> http://linux-nfs.org/cgi-bin/mailman/listinfo/pnfs