[pnfs] [PATCH] nfsd: do not poison so_close_lru in release_stateowner
J. Bruce Fields
bfields at fieldses.org
Fri Jan 25 15:28:00 EST 2008
On Thu, Jan 24, 2008 at 12:51:43PM +0200, Benny Halevy wrote:
> When poisoned, the following crash can happen later
> in move_to_close_lru when the list becomes poisoned rather
> than empty.
>
> Jan 23 12:37:16 bh-testlin1 kernel: Unable to handle kernel paging request at 0000000000100108 RIP:
> Jan 23 12:37:16 bh-testlin1 kernel: [<ffffffff8824c604>] :nfsd:nfsd4_close+0xd3/0x123
> Jan 23 12:37:16 bh-testlin1 kernel: PGD 73827067 PUD 79cee067 PMD 0
> Jan 23 12:37:16 bh-testlin1 kernel: Oops: 0002 [1] SMP
> Jan 23 12:37:16 bh-testlin1 kernel: CPU 0
> Jan 23 12:37:16 bh-testlin1 kernel: Modules linked in: panfs(P) panlayoutdriver vmnet(P) parport_pc parport vmmon(P) nfsd auth_rpcgss exportfs autofs4 nfs lockd nfs_acl sunrpc ipv6 video output sbs sbshc battery ac sr_mod k8temp i2c_nforce2 hwmon i2c_core pcspkr forcedeth button cdrom pata_amd ata_generic sata_nv libata sd_mod scsi_mod ext3 jbd mbcache ehci_hcd ohci_hcd uhci_hcd
> Jan 23 12:37:16 bh-testlin1 kernel: Pid: 2163, comm: nfsd Tainted: P 2.6.24-rc8-panlayout #8
> Jan 23 12:37:16 bh-testlin1 kernel: RIP: 0010:[<ffffffff8824c604>] [<ffffffff8824c604>] :nfsd:nfsd4_close+0xd3/0x123
> Jan 23 12:37:16 bh-testlin1 kernel: RSP: 0018:ffff81007d947dc0 EFLAGS: 00010246
> Jan 23 12:37:16 bh-testlin1 kernel: RAX: 0000000000200200 RBX: ffff810028118000 RCX: ffffffff8824c5d0
> Jan 23 12:37:16 bh-testlin1 kernel: RDX: 0000000000100100 RSI: ffffe20000909cd0 RDI: ffff810028118058
> Jan 23 12:37:16 bh-testlin1 kernel: RBP: ffff81007c524290 R08: 0000000000000166 R09: ffff810029516000
> Jan 23 12:37:16 bh-testlin1 kernel: R10: 0000000000000000 R11: ffff81007c524290 R12: ffff81007c4f0400
> Jan 23 12:37:16 bh-testlin1 kernel: R13: 0000000000000000 R14: ffff81007c4f0400 R15: ffff81007c4c4000
> Jan 23 12:37:16 bh-testlin1 kernel: FS: 00002ab8fd1ea6f0(0000) GS:ffffffff81371000(0000) knlGS:00000000f3a2cb90
> Jan 23 12:37:16 bh-testlin1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> Jan 23 12:37:16 bh-testlin1 kernel: CR2: 0000000000100108 CR3: 0000000073838000 CR4: 00000000000006e0
> Jan 23 12:37:16 bh-testlin1 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Jan 23 12:37:16 bh-testlin1 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Jan 23 12:37:16 bh-testlin1 kernel: Process nfsd (pid: 2163, threadinfo ffff81007d946000, task ffff81007d944000)
> Jan 23 12:37:16 bh-testlin1 kernel: Stack: 0000000000000000 ffffffff88264ec0 ffff810029516000 ffffffff88264da0
> Jan 23 12:37:16 bh-testlin1 kernel: ffff81007c525000 ffff81007c524000 ffff81007c4f0400 ffffffff8823fef0
> Jan 23 12:37:16 bh-testlin1 kernel: ffffffff882650f8 ffff81007c524288 ffff81007b8892c0 ffff81007c4c4000
> Jan 23 12:37:16 bh-testlin1 kernel: Call Trace:
> Jan 23 12:37:16 bh-testlin1 kernel: [<ffffffff8823fef0>] :nfsd:nfsd4_proc_compound+0x2b1/0x476
> Jan 23 12:37:16 bh-testlin1 kernel: [<ffffffff88231245>] :nfsd:nfsd_dispatch+0xde/0x1b6
> Jan 23 12:37:16 bh-testlin1 kernel: [<ffffffff88187b9d>] :sunrpc:svc_process_common+0x2fc/0x5bd
> Jan 23 12:37:16 bh-testlin1 kernel: [<ffffffff88188cce>] :sunrpc:svc_process+0x101/0x143
> Jan 23 12:37:16 bh-testlin1 kernel: [<ffffffff88231819>] :nfsd:nfsd+0x1a1/0x2bc
> Jan 23 12:37:16 bh-testlin1 kernel: [<ffffffff8100ccd8>] child_rip+0xa/0x12
> Jan 23 12:37:16 bh-testlin1 kernel: [<ffffffff88231678>] :nfsd:nfsd+0x0/0x2bc
> Jan 23 12:37:16 bh-testlin1 kernel: [<ffffffff8100ccce>] child_rip+0x0/0x12
> Jan 23 12:37:16 bh-testlin1 kernel:
> Jan 23 12:37:16 bh-testlin1 kernel:
> Jan 23 12:37:16 bh-testlin1 kernel: Code: 48 89 42 08 48 8b 35 69 53 02 00 48 89 10 48 c7 c2 70 19 27
> Jan 23 12:37:16 bh-testlin1 kernel: RIP [<ffffffff8824c604>] :nfsd:nfsd4_close+0xd3/0x123
> Jan 23 12:37:16 bh-testlin1 kernel: RSP <ffff81007d947dc0>
> Jan 23 12:37:16 bh-testlin1 kernel: CR2: 0000000000100108
> Jan 23 12:37:16 bh-testlin1 kernel: ---[ end trace 90ea1dfbd28e9e52 ]---
>
> Thanks to Neil Brown for spotting the poison values and suggesting
> this fix.
>
> Signed-off-by: Benny Halevy <bhalevy at panasas.com>
> ---
>
> I reproduced this bug by interrupting nfsd_close while it
> was blocked on my exported file system's file_operations.release()
> and this fix solves the crash as __list_del indeed copes
> with the the empty so_close_lru list.
OK, apologies, I may be dense, but I'm not seeing how this is possible.
For it to happen, release_stateowner() must have been called on this
open owner before the move_to_close_lru() in nfsd4_close() was called.
That call to move_to_close_lru() is made under the big nfsv4 state lock.
So is every call to release_stateowner(). Therefore the call to
release_stateowner() must have happened before the nfs4_lock_state() at
the start of nfsd4_close(). In particular, it must have happened before
the call to nfs4_preprocess_seqid_op(). So nfs4_preprocess_seqid_op()
must have returned with return value 0 and a stateowner that had already
had release_stateowner() called on it.
In particular, that means the find_stateid() call at the beginning of
nfs4_preprocess_seqid_op() must have found a stateid with
stp->st_stateowner equal to this stateowner. Such a stateid would also
be on the stateowners so_stateids list. But any previous
release_stateowner() call would have (in unhash_stateowner) destroyed
all such stateid's (removing them, in the process, from the hash table
which find_stateid() uses to find open stateid's).
So I don't understand the root cause of your oops yet.
>
> That said, do we still want to "touch" sop->so_time in move_to_close_lru
> even when so_close_lru is empty?
Hm. Actually I think that list_move_tail should really be a
list_add_tail--I think so_close_lru is normally empty at this point.
--b.
>
> Benny
>
> ---
> fs/nfsd/nfs4state.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 27f284f..48a5322 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -1677,7 +1677,7 @@ static void
> release_stateowner(struct nfs4_stateowner *sop)
> {
> unhash_stateowner(sop);
> - list_del(&sop->so_close_lru);
> + list_del_init(&sop->so_close_lru);
> nfs4_put_stateowner(sop);
> }
> --
>
> On Jan. 24, 2008, 11:33 +0200, Benny Halevy <bhalevy at panasas.com> wrote:
> > On Jan. 24, 2008, 5:50 +0200, Neil Brown <neilb at suse.de> wrote:
> >> On Wednesday January 23, bhalevy at panasas.com wrote:
> >>> Apparently, fs/nfsd/nfs4state.c:move_to_close_lru
> >>> may be called when sop->so_close_lru is empty.
> >> Not empty - invalid!
> >
> > I think you're right.
> >
> > here's the relevant assembly code in my tree:
> >
> > .file 1 "fs/nfsd/nfs4state.c"
> > .file 3 "include/linux/list.h"
> >
> > .loc 1 3136 0
> > movq 24(%rbp), %rbx # <variable>.cl_stateowner, temp.3156
> > leaq 56(%rbx), %rax #, head
> > .LVL1000:
> > cmpq %rax, 56(%rbx) # head, <variable>.next
> > jne .L1115 #,
> > if (list_empty(&close->cl_stateowner->so_stateids))
> > move_to_close_lru(close->cl_stateowner);
> > .LBB1951:
> > .LBB1952:
> > .loc 1 1740 0
> > static void
> > move_to_close_lru(struct nfs4_stateowner *sop)
> > {
> > testb $16, nfsd_debug(%rip) #, nfsd_debug
> > je .L1118 #,
> > movq %rbx, %rsi # temp.3156, temp.3156
> > movq $.LC89, %rdi #,
> > xorl %eax, %eax #
> > .LVL1001:
> > call printk #
> > dprintk("NFSD: move_to_close_lru nfs4_stateowner %p\n", sop);
> > .LVL1002:
> > .L1118:
> > leaq 88(%rbx), %rdi #, new
> > list_move_tail(&sop->so_close_lru, &close_lru);
> > .LVL1003:
> > .LBB1953:
> > .LBB1954:
> > .loc 3 279 0
> > static inline void list_move_tail(struct list_head *list,
> > struct list_head *head)
> > {
> > movq 88(%rbx), %rdx # <variable>.next, next
> > .LVL1004:
> > movq 8(%rdi), %rax # <variable>.prev, prev
> > __list_del(list->prev, list->next);
> > .LVL1005:
> > .LBB1955:
> > .LBB1956:
> > .loc 3 157 0
> > static inline void __list_del(struct list_head * prev, struct list_head * next)
> > {
> > movq %rax, 8(%rdx) # prev, <variable>.prev
> > ^^^^^^^^^^^^^^^^^^^^^
> > next->prev = prev;
> >
> > so apparently sop->so_close_lru is poisoned.
> >
> >>> Jan 23 12:37:16 bh-testlin1 kernel: RAX: 0000000000200200 RBX: ffff810028118000 RCX: ffffffff8824c5d0
> >> ^^LIST_POISON1^^
> >>> Jan 23 12:37:16 bh-testlin1 kernel: RDX: 0000000000100100 RSI: ffffe20000909cd0 RDI: ffff810028118058
> >> ^^LIST_POISON2^^
> >>> Jan 23 12:37:16 bh-testlin1 kernel: Code: 48 89 42 08 48 8b 35 69 53 02 00 48 89 10 48 c7 c2 70 19 27
> >> ^^^^^^^^^^^
> >> 0: 48 89 42 08 mov %rax,0x8(%rdx)
> >>
> >> %rdx == LIST_POISON2
> >>
> >> Could this be caused by the "list_del" in release_stateowner???
> >>
> >> If so, list_del_init would fix it.
> >
> > Very likely...
> > I'll try to reproduce that and test the fix.
> >
> > Benny
> >
> >> NeilBrown
> >
> > _______________________________________________
> > pNFS mailing list
> > pNFS at linux-nfs.org
> > http://linux-nfs.org/cgi-bin/mailman/listinfo/pnfs
> >
>
More information about the pNFS
mailing list