[pnfs] callback problem

William A. (Andy) Adamson andros at citi.umich.edu
Wed May 23 17:17:08 EDT 2007


On 5/23/07, Iyer, Rahul <Rahul.Iyer at netapp.com> wrote:
>
>
>
> > -----Original Message-----
> > From: William A. (Andy) Adamson [mailto:andros at citi.umich.edu]
> > Sent: Wednesday, May 23, 2007 2:08 PM
> > To: Iyer, Rahul
> > Cc: pnfs at linux-nfs.org
> > Subject: Re: [pnfs] callback problem
> >
> > hi rahul
> >
> > what branch?
> >
> It's the 4.1-sessions branch. Also, oddly enough, I'm not hitting the
> mempool error.



i got the mempool error when exchange_id failed in setup_session. the "Too
many channel attr bitmaps" message is in nfsd4_decode_create_session which
implies that exchange_id succeeded.

-->Andy


> This time around, I had the v4.1 mount fail and the v4.0
> mount succeed. The server did have a NULL pointer dereference though.
> Regards
> Rahul.
>
> >
> >
> > On 5/23/07, Iyer, Rahul <Rahul.Iyer at netapp.com> wrote:
> >
> >       Hi Andy,
> >       I get a NULL pointer dereference on the server side.
> >
> >       Here's what I see on the server side:
> >
> >       <start>
> >       Too many channel attr bitmaps!
> >       Unable to handle kernel NULL pointer dereference at 00000000000000b0
> >       RIP:
> >       [<ffffffff880568e2>] :nfsd:cmp_creds+0x2/0x10
> >       PGD 1fc581067 PUD 1fc55f067 PMD 0
> >       Oops: 0000 [1] PREEMPT SMP
> >       CPU 1
> >       Modules linked in: nfs nfsd exportfs lockd sunrpc
> >       Pid: 4910, comm: nfsd Not tainted 2.6.18.3 #47
> >       RIP: 0010:[<ffffffff880568e2>]  [<ffffffff880568e2>] :nfsd:cmp_creds+0x2/0x10
> >       RSP: 0018:ffff8101fce01a78  EFLAGS: 00010286
> >       RAX: 0000000000000000 RBX: ffff8101fe7bf448 RCX: 0000000000000000
> >       RDX: 2222222222222222 RSI: ffff8101fe7bf448 RDI: 00000000000000b0
> >       RBP: ffff8101fce01b10 R08: ffffffff88079990 R09: ffffffff88079990
> >       R10: ffff8101fce01a90 R11: ffff8101fe7e9800 R12: ffff8101fcf23080
> >       R13: ffff8101fe7bf400 R14: ffff8101fce01a90 R15: 0000000000000003
> >       FS:  00002baa68a666d0(0000) GS:ffff8101ffc01340(0000) knlGS:0000000000000000
> >       CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> >       CR2: 00000000000000b0 CR3: 00000001fcab6000 CR4: 00000000000006e0
> >       Process nfsd (pid: 4910, threadinfo ffff8101fce00000, task ffff8101fde60140)
> >       Stack:  ffffffff8805a14d ffff8101fe7e9800 190c1eac80244b79 3864663632396533
> >        3436393164623462 6335633234653932 3730383534633231 ffff8101fe7bf400
> >        ffff8101fcf23000 ffff810100000024 ffff8101fce5c088 590e5b2216a85446
> >       Call Trace:
> >       [<ffffffff8805a14d>] :nfsd:nfsd4_exchange_id+0x16d/0x2a0
> >       [<ffffffff8804fd5a>] :nfsd:nfsd4_proc_compound+0x15ba/0x1820
> >       [<ffffffff8024465f>] __lock_acquire+0xc3f/0xd20
> >       [<ffffffff80487e5f>] release_sock+0x2f/0xe0
> >       [<ffffffff88009fd9>] :sunrpc:svc_sock_enqueue+0x49/0x2c0
> >       [<ffffffff80242ba0>] check_usage+0x40/0x2b0
> >       [<ffffffff80244d7b>] trace_hardirqs_on+0x11b/0x150
> >       [<ffffffff8024465f>] __lock_acquire+0xc3f/0xd20
> >       [<ffffffff8800a22d>] :sunrpc:svc_sock_enqueue+0x29d/0x2c0
> >       [<ffffffff8800ff08>] :sunrpc:sunrpc_cache_lookup+0x98/0x170
> >       [<ffffffff804f39e0>] _read_unlock+0x30/0x60
> >       [<ffffffff8800ff08>] :sunrpc:sunrpc_cache_lookup+0x98/0x170
> >       [<ffffffff8800c801>] :sunrpc:ip_map_lookup+0xc1/0xd0
> >       [<ffffffff8805557a>] :nfsd:nfs4svc_decode_compoundargs+0xfaa/0x1080
> >       [<ffffffff8803e121>] :nfsd:nfsd_dispatch+0x101/0x1e0
> >       [<ffffffff880093fa>] :sunrpc:svc_process+0x46a/0x740
> >       [<ffffffff8803e6f8>] :nfsd:nfsd+0x1d8/0x380
> >       [<ffffffff8020ad2c>] child_rip+0xa/0x12
> >       [<ffffffff804f3bbb>] _spin_unlock_irq+0x2b/0x60
> >       [<ffffffff8020a33c>] restore_args+0x0/0x30
> >       [<ffffffff8803e520>] :nfsd:nfsd+0x0/0x380
> >       [<ffffffff8020ad22>] child_rip+0x0/0x12
> >
> >
> >       Code: 39 07 55 48 89 e5 c9 0f 94 c0 0f b6 c0 c3 48 8b 05 b1 a4 02
> >       RIP  [<ffffffff880568e2>] :nfsd:cmp_creds+0x2/0x10
> >       RSP <ffff8101fce01a78>
> >       CR2: 00000000000000b0
> >
> >       =====================================
> >       [ BUG: lock held at task exit time! ]
> >       -------------------------------------
> >       nfsd/4910 is exiting with locks still held!
> >       2 locks held by nfsd/4910:
> >       #0:  (hash_sem){..--}, at: [<ffffffff88045200>] exp_readlock+0x10/0x20 [nfsd]
> >       #1:  (client_mutex){--..}, at: [<ffffffff804f2015>] mutex_lock+0x25/0x30
> >
> >       stack backtrace:
> >
> >       Call Trace:
> >       [<ffffffff80241ab9>] debug_check_no_locks_held+0x89/0xa0
> >       [<ffffffff8022acf3>] do_exit+0x923/0x990
> >       [<ffffffff8021d4b3>] do_page_fault+0x7e3/0x8e0
> >       [<ffffffff8024465f>] __lock_acquire+0xc3f/0xd20
> >       [<ffffffff8026fae8>] kfree+0xf8/0x110
> >       [<ffffffff804f1fbb>] __mutex_lock_slowpath+0x23b/0x270
> >       [<ffffffff80244b79>] mark_held_locks+0x79/0xa0
> >       [<ffffffff8020ab5d>] error_exit+0x0/0x96
> >       [<ffffffff880568e2>] :nfsd:cmp_creds+0x2/0x10
> >       [<ffffffff8805a14d>] :nfsd:nfsd4_exchange_id+0x16d/0x2a0
> >       [<ffffffff8804fd5a>] :nfsd:nfsd4_proc_compound+0x15ba/0x1820
> >       [<ffffffff8024465f>] __lock_acquire+0xc3f/0xd20
> >       [<ffffffff80487e5f>] release_sock+0x2f/0xe0
> >       [<ffffffff88009fd9>] :sunrpc:svc_sock_enqueue+0x49/0x2c0
> >       [<ffffffff80242ba0>] check_usage+0x40/0x2b0
> >       [<ffffffff80244d7b>] trace_hardirqs_on+0x11b/0x150
> >       [<ffffffff8024465f>] __lock_acquire+0xc3f/0xd20
> >       [<ffffffff8800a22d>] :sunrpc:svc_sock_enqueue+0x29d/0x2c0
> >       [<ffffffff8800ff08>] :sunrpc:sunrpc_cache_lookup+0x98/0x170
> >       [<ffffffff804f39e0>] _read_unlock+0x30/0x60
> >       [<ffffffff8800ff08>] :sunrpc:sunrpc_cache_lookup+0x98/0x170
> >       [<ffffffff8800c801>] :sunrpc:ip_map_lookup+0xc1/0xd0
> >       [<ffffffff8805557a>] :nfsd:nfs4svc_decode_compoundargs+0xfaa/0x1080
> >       [<ffffffff8803e121>] :nfsd:nfsd_dispatch+0x101/0x1e0
> >       [<ffffffff880093fa>] :sunrpc:svc_process+0x46a/0x740
> >       [<ffffffff8803e6f8>] :nfsd:nfsd+0x1d8/0x380
> >       [<ffffffff8020ad2c>] child_rip+0xa/0x12
> >       [<ffffffff804f3bbb>] _spin_unlock_irq+0x2b/0x60
> >       [<ffffffff8020a33c>] restore_args+0x0/0x30
> >       [<ffffffff8803e520>] :nfsd:nfsd+0x0/0x380
> >       [<ffffffff8020ad22>] child_rip+0x0/0x12
> >
> >
> >       <end>
> >
> >       Regards
> >       Rahul
> >
> >
> >       > -----Original Message-----
> >       > From: William A. (Andy) Adamson [mailto:andros at citi.umich.edu]
> >       > Sent: Wednesday, May 23, 2007 1:42 PM
> >       > To: Marc Eshel
> >       > Cc: Iyer, Rahul; pnfs at linux-nfs.org
> >       > Subject: Re: [pnfs] callback problem
> >       >
> >       >
> >       >
> >       > On 5/23/07, Marc Eshel <eshel at almaden.ibm.com> wrote:
> >       >
> >       >       Some more observations; I will continue to debug.
> >       >
> >       >       On the client side I got the following messages after mount,
> >       >       and 2 of them were incorrect. The first one, "Couldn't
> >       >       mount....", appeared even though the mount did complete and
> >       >       I was able to use it. The second one, "started cb service!",
> >       >       is wrong: the callback failed since the IP address was
> >       >       0.0.0.0 on the server side.
> >       >
> >       >       [root at fin30 ~]# mount -t nfs4 fin18:/ /mnt
> >       >       Message from syslogd at fin30 at Wed May 23 11:34:03 2007 ...
> >       >       fin30 kernel: started cb service!
> >       >       Message from syslogd at fin30 at Wed May 23 11:34:05 2007 ...
> >       >       fin30 kernel: Couldn't mount using minorversion 1
> >       >       Message from syslogd at fin30 at Wed May 23 11:34:05 2007 ...
> >       >       fin30 kernel: started cb service!
> >       >
> >       >
> >       >
> >       >       On the server side I had to comment out the list_move_tail
> >       >       line since it was crashing with a null pointer.
> >       >
> >       >       static inline void
> >       >       renew_client(struct nfs4_client *clp)
> >       >       {
> >       >               /*
> >       >               * Move client to the end of the LRU list.
> >       >               */
> >       >               dprintk("renewing client (clientid %08x/%08x)\n",
> >       >                               clp->cl_clientid.cl_boot,
> >       >                               clp->cl_clientid.cl_id);
> >       >       //???   list_move_tail(&clp->cl_lru, &client_lru);
> >       >               clp->cl_time = get_seconds();
> >       >       }
> >       >
> >       >
> >       > i think there is a race on the server regarding the
> >       > nfs_client - i've seen it fail in two places:
> >       >
> >       > 1) in find_confirmed_client_by_str, called by nfsd4_exchange_id
> >       > 2) in nfsd4_sequence, after the call to renew_client had succeeded
> >       >
> >       > -->Andy
> >       >
> >       >
> >       >
> >       >
> >
> >
> >
> >
> >
>
>