[pnfs] callback problem
William A. (Andy) Adamson
andros at citi.umich.edu
Wed May 23 17:07:31 EDT 2007
Hi Rahul,
Which branch are you on?
On 5/23/07, Iyer, Rahul <Rahul.Iyer at netapp.com> wrote:
>
> Hi Andy,
> I get a NULL pointer dereference on the server side.
>
> Here's what I see on the server side:
>
> <start>
> Too many channel attr bitmaps!
> Unable to handle kernel NULL pointer dereference at 00000000000000b0
> RIP:
> [<ffffffff880568e2>] :nfsd:cmp_creds+0x2/0x10
> PGD 1fc581067 PUD 1fc55f067 PMD 0
> Oops: 0000 [1] PREEMPT SMP
> CPU 1
> Modules linked in: nfs nfsd exportfs lockd sunrpc
> Pid: 4910, comm: nfsd Not tainted 2.6.18.3 #47
> RIP: 0010:[<ffffffff880568e2>] [<ffffffff880568e2>] :nfsd:cmp_creds+0x2/0x10
> RSP: 0018:ffff8101fce01a78 EFLAGS: 00010286
> RAX: 0000000000000000 RBX: ffff8101fe7bf448 RCX: 0000000000000000
> RDX: 2222222222222222 RSI: ffff8101fe7bf448 RDI: 00000000000000b0
> RBP: ffff8101fce01b10 R08: ffffffff88079990 R09: ffffffff88079990
> R10: ffff8101fce01a90 R11: ffff8101fe7e9800 R12: ffff8101fcf23080
> R13: ffff8101fe7bf400 R14: ffff8101fce01a90 R15: 0000000000000003
> FS: 00002baa68a666d0(0000) GS:ffff8101ffc01340(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00000000000000b0 CR3: 00000001fcab6000 CR4: 00000000000006e0
> Process nfsd (pid: 4910, threadinfo ffff8101fce00000, task ffff8101fde60140)
> Stack: ffffffff8805a14d ffff8101fe7e9800 190c1eac80244b79 3864663632396533
> 3436393164623462 6335633234653932 3730383534633231 ffff8101fe7bf400
> ffff8101fcf23000 ffff810100000024 ffff8101fce5c088 590e5b2216a85446
> Call Trace:
> [<ffffffff8805a14d>] :nfsd:nfsd4_exchange_id+0x16d/0x2a0
> [<ffffffff8804fd5a>] :nfsd:nfsd4_proc_compound+0x15ba/0x1820
> [<ffffffff8024465f>] __lock_acquire+0xc3f/0xd20
> [<ffffffff80487e5f>] release_sock+0x2f/0xe0
> [<ffffffff88009fd9>] :sunrpc:svc_sock_enqueue+0x49/0x2c0
> [<ffffffff80242ba0>] check_usage+0x40/0x2b0
> [<ffffffff80244d7b>] trace_hardirqs_on+0x11b/0x150
> [<ffffffff8024465f>] __lock_acquire+0xc3f/0xd20
> [<ffffffff8800a22d>] :sunrpc:svc_sock_enqueue+0x29d/0x2c0
> [<ffffffff8800ff08>] :sunrpc:sunrpc_cache_lookup+0x98/0x170
> [<ffffffff804f39e0>] _read_unlock+0x30/0x60
> [<ffffffff8800ff08>] :sunrpc:sunrpc_cache_lookup+0x98/0x170
> [<ffffffff8800c801>] :sunrpc:ip_map_lookup+0xc1/0xd0
> [<ffffffff8805557a>] :nfsd:nfs4svc_decode_compoundargs+0xfaa/0x1080
> [<ffffffff8803e121>] :nfsd:nfsd_dispatch+0x101/0x1e0
> [<ffffffff880093fa>] :sunrpc:svc_process+0x46a/0x740
> [<ffffffff8803e6f8>] :nfsd:nfsd+0x1d8/0x380
> [<ffffffff8020ad2c>] child_rip+0xa/0x12
> [<ffffffff804f3bbb>] _spin_unlock_irq+0x2b/0x60
> [<ffffffff8020a33c>] restore_args+0x0/0x30
> [<ffffffff8803e520>] :nfsd:nfsd+0x0/0x380
> [<ffffffff8020ad22>] child_rip+0x0/0x12
>
>
> Code: 39 07 55 48 89 e5 c9 0f 94 c0 0f b6 c0 c3 48 8b 05 b1 a4 02
> RIP [<ffffffff880568e2>] :nfsd:cmp_creds+0x2/0x10
> RSP <ffff8101fce01a78>
> CR2: 00000000000000b0
>
> =====================================
> [ BUG: lock held at task exit time! ]
> -------------------------------------
> nfsd/4910 is exiting with locks still held!
> 2 locks held by nfsd/4910:
> #0: (hash_sem){..--}, at: [<ffffffff88045200>] exp_readlock+0x10/0x20 [nfsd]
> #1: (client_mutex){--..}, at: [<ffffffff804f2015>] mutex_lock+0x25/0x30
>
> stack backtrace:
>
> Call Trace:
> [<ffffffff80241ab9>] debug_check_no_locks_held+0x89/0xa0
> [<ffffffff8022acf3>] do_exit+0x923/0x990
> [<ffffffff8021d4b3>] do_page_fault+0x7e3/0x8e0
> [<ffffffff8024465f>] __lock_acquire+0xc3f/0xd20
> [<ffffffff8026fae8>] kfree+0xf8/0x110
> [<ffffffff804f1fbb>] __mutex_lock_slowpath+0x23b/0x270
> [<ffffffff80244b79>] mark_held_locks+0x79/0xa0
> [<ffffffff8020ab5d>] error_exit+0x0/0x96
> [<ffffffff880568e2>] :nfsd:cmp_creds+0x2/0x10
> [<ffffffff8805a14d>] :nfsd:nfsd4_exchange_id+0x16d/0x2a0
> [<ffffffff8804fd5a>] :nfsd:nfsd4_proc_compound+0x15ba/0x1820
> [<ffffffff8024465f>] __lock_acquire+0xc3f/0xd20
> [<ffffffff80487e5f>] release_sock+0x2f/0xe0
> [<ffffffff88009fd9>] :sunrpc:svc_sock_enqueue+0x49/0x2c0
> [<ffffffff80242ba0>] check_usage+0x40/0x2b0
> [<ffffffff80244d7b>] trace_hardirqs_on+0x11b/0x150
> [<ffffffff8024465f>] __lock_acquire+0xc3f/0xd20
> [<ffffffff8800a22d>] :sunrpc:svc_sock_enqueue+0x29d/0x2c0
> [<ffffffff8800ff08>] :sunrpc:sunrpc_cache_lookup+0x98/0x170
> [<ffffffff804f39e0>] _read_unlock+0x30/0x60
> [<ffffffff8800ff08>] :sunrpc:sunrpc_cache_lookup+0x98/0x170
> [<ffffffff8800c801>] :sunrpc:ip_map_lookup+0xc1/0xd0
> [<ffffffff8805557a>] :nfsd:nfs4svc_decode_compoundargs+0xfaa/0x1080
> [<ffffffff8803e121>] :nfsd:nfsd_dispatch+0x101/0x1e0
> [<ffffffff880093fa>] :sunrpc:svc_process+0x46a/0x740
> [<ffffffff8803e6f8>] :nfsd:nfsd+0x1d8/0x380
> [<ffffffff8020ad2c>] child_rip+0xa/0x12
> [<ffffffff804f3bbb>] _spin_unlock_irq+0x2b/0x60
> [<ffffffff8020a33c>] restore_args+0x0/0x30
> [<ffffffff8803e520>] :nfsd:nfsd+0x0/0x380
> [<ffffffff8020ad22>] child_rip+0x0/0x12
>
>
> <end>
>
> Regards
> Rahul
>
>
> > -----Original Message-----
> > From: William A. (Andy) Adamson [mailto:andros at citi.umich.edu]
> > Sent: Wednesday, May 23, 2007 1:42 PM
> > To: Marc Eshel
> > Cc: Iyer, Rahul; pnfs at linux-nfs.org
> > Subject: Re: [pnfs] callback problem
> >
> >
> >
> > On 5/23/07, Marc Eshel <eshel at almaden.ibm.com> wrote:
> >
> > Some more observations; I will continue to debug.
> >
> > On the client side I got the following messages after mount, and 2 of
> > them were incorrect. The first one, "Couldn't mount....", is wrong
> > because the mount did complete and I was able to use it. The second
> > one, "started cb service!", is wrong because the callback failed,
> > since the IP address was 0.0.0.0 on the server side.
> >
> > [root at fin30 ~]# mount -t nfs4 fin18:/ /mnt
> > Message from syslogd at fin30 at Wed May 23 11:34:03 2007 ...
> > fin30 kernel: started cb service!
> > Message from syslogd at fin30 at Wed May 23 11:34:05 2007 ...
> > fin30 kernel: Couldn't mount using minorversion 1
> > Message from syslogd at fin30 at Wed May 23 11:34:05 2007 ...
> > fin30 kernel: started cb service!
> >
> >
> >
> > On the server side I had to comment out the list_move_tail line,
> > since it was crashing with a NULL pointer dereference.
> >
> > static inline void
> > renew_client(struct nfs4_client *clp)
> > {
> >         /*
> >          * Move client to the end to the LRU list.
> >          */
> >         dprintk("renewing client (clientid %08x/%08x)\n",
> >                 clp->cl_clientid.cl_boot,
> >                 clp->cl_clientid.cl_id);
> >         //??? list_move_tail(&clp->cl_lru, &client_lru);
> >         clp->cl_time = get_seconds();
> > }
> >
> >
> > I think there is a race on the server regarding the nfs_client. I've
> > seen failures in:
> >
> > 1) find_confirmed_client_by_str, called by nfsd4_exchange_id
> > 2) nfsd4_sequence, after the call to renew_client had succeeded.
> >
> > -->Andy
> >
> >
> >
> >
>
>
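One possible reading of the oops quoted above: CR2 and RDI are both 0xb0 and the fault is at cmp_creds+0x2, which is consistent with cmp_creds being handed the address of a member that sits 0xb0 bytes into a structure whose base pointer was NULL. The sketch below only illustrates that arithmetic. The demo_client and demo_cred types, the padding, and the NULL-tolerant demo_cmp_creds helper are all hypothetical stand-ins; the real struct nfs4_client layout and the actual call site in nfsd4_exchange_id are not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real nfsd structures are laid out differently. */
struct demo_cred {
	unsigned int cr_uid;
};

struct demo_client {
	char pad[0xb0];            /* assumed filler so cl_cred lands at 0xb0 */
	struct demo_cred cl_cred;
};

/*
 * Address that "&clp->cl_cred" would have for a given base pointer value,
 * computed arithmetically so that nothing is dereferenced.
 */
static uintptr_t member_addr(uintptr_t clp)
{
	return clp + offsetof(struct demo_client, cl_cred);
}

/* Comparison that rejects NULL arguments instead of faulting on them. */
static int demo_cmp_creds(const struct demo_cred *a, const struct demo_cred *b)
{
	if (a == NULL || b == NULL)
		return 0;
	return a->cr_uid == b->cr_uid;
}

/* Sanity check of the layout assumption made above. */
static void demo_check(void)
{
	assert(offsetof(struct demo_client, cl_cred) == 0xb0);
	(void)demo_cmp_creds;
}
```

With this layout, member_addr(0) evaluates to 0xb0, the same small address the kernel reported. If that reading is right, the place to check would be the caller (whichever client lookup returned NULL before the credentials were compared) rather than cmp_creds itself.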