critical kernel bug in encode_lookup
Gabriel Barazer
gabriel at oxeva.fr
Wed Aug 15 13:24:29 EDT 2007
On 08/15/2007 2:25:51 PM +0200, Trond Myklebust
<trond.myklebust at fys.uio.no> wrote:
> On Wed, 2007-08-15 at 12:19 +0200, Gabriel Barazer wrote:
>> On 08/15/2007 4:39:06 AM +0200, Trond Myklebust
>> <trond.myklebust at fys.uio.no> wrote:
>>> On Wed, 2007-08-15 at 02:45 +0200, Gabriel Barazer wrote:
>>>> Hi there,
>>>>
>>>> I have hit a bug under fairly heavy load (but only sometimes),
>>>> related to the encode_lookup function, which causes the nfs4
>>>> client to stop and block. Here is a copy of the bug message:
>>>> RESERVE_SPACE(429) failed in function encode_lookup
>>> Have you tried 2.6.23-rc1 or later? I've fixed quite a few bugs in the
>>> NFSv4 space reservations.
>> I will try it ASAP. Do I have to test it on the client, the server, or
>> both? Obviously, it is a bit more difficult for me to test on the
>> server, as 10+ clients are connected to it.
>>
>> Here is another bug, which also happened quite randomly, in
>> __nfs4_find_lock_state (the full call traces are attached; again, very
>> similar but not the same). This time it happened in the middle of the
>> night (at about 10% of the daily load).
>> Could this be related?
>
> That too is likely to have changed (and is hopefully fixed) since I also
> tightened up the locking in that area.
>
I installed and tested 2.6.23-rc3, but I have not had time to try to
reproduce the other bugs: I hit one almost immediately after a system
reboot (under some HTTP load). Here it is:
Unable to handle kernel NULL pointer dereference at 0000000000000108 RIP:
[<ffffffff8026b93e>] __dentry_open+0x44/0x15f
PGD 58a03067 PUD 57227067 PMD 0
Oops: 0000 [1] SMP
CPU 0
Pid: 1557, comm: php-cgi Not tainted 2.6.23-rc3 #1
RIP: 0010:[<ffffffff8026b93e>] [<ffffffff8026b93e>] __dentry_open+0x44/0x15f
RSP: 0018:ffff810058cdfaa8 EFLAGS: 00010246
RAX: 0000000000008001 RBX: ffff81004dc1bb80 RCX: ffff81004dc1bb80
RDX: 0000000000000000 RSI: ffff81007d417bc0 RDI: ffff81004c0772d8
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000012c5fdc7300
R10: ffff81004c0772d8 R11: 0000000000000246 R12: ffff810058cdfea8
R13: 0000000000000000 R14: ffff81007d417bc0 R15: ffff81004c0772d8
FS: 00002b1f1926e160(0000) GS:ffffffff80542000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000108 CR3: 0000000058aa5000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process php-cgi (pid: 1557, threadinfo ffff810058cde000, task ffff81005a417100)
Stack: 0000000000000246 ffff81004c0772d8 ffff810058cdfea8 ffff810058cdfea8
ffff810058cdfb88 ffff810058cdfea8 ffff810058cdfb88 ffffffff8026ca36
ffff81005f4d45f8 ffff8100792f8d00 ffff810076ec9480 ffffffff802c91c7
Call Trace:
[<ffffffff8026ca36>] lookup_instantiate_filp+0x5a/0x80
[<ffffffff802c91c7>] nfs4_intent_set_file+0x40/0x73
[<ffffffff802ca071>] nfs4_atomic_open+0x128/0x13a
[<ffffffff80443187>] put_rpccred+0x34/0xf4
[<ffffffff802ca171>] nfs4_open_revalidate+0xee/0x142
[<ffffffff802b6452>] nfs_atomic_lookup+0x10e/0x189
[<ffffffff80273a52>] do_lookup+0xc4/0x1ae
[<ffffffff8027586e>] __link_path_walk+0x849/0xcda
[<ffffffff80275d57>] link_path_walk+0x58/0xe0
[<ffffffff8026b85a>] get_unused_fd_flags+0x7c/0x115
[<ffffffff80276107>] do_path_lookup+0x1a0/0x1c2
[<ffffffff80276ad8>] __path_lookup_intent_open+0x56/0x96
[<ffffffff80276c68>] open_namei+0x75/0x607
[<ffffffff80228a99>] update_curr+0xf4/0x117
[<ffffffff8026baff>] do_filp_open+0x1c/0x3d
[<ffffffff8026b85a>] get_unused_fd_flags+0x7c/0x115
[<ffffffff8026bb66>] do_sys_open+0x46/0xca
[<ffffffff8020bf2e>] system_call+0x7e/0x83
Code: 48 8b 85 08 01 00 00 4c 89 7b 18 48 89 df 4c 89 73 10 48 c7
RIP [<ffffffff8026b93e>] __dentry_open+0x44/0x15f
RSP <ffff810058cdfaa8>
CR2: 0000000000000108
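A rough reading of the dump above, offered as a sketch rather than a
confirmed diagnosis: the first bytes of the Code: line look like a
64-bit load from offset 0x108 off RBP, and RBP is 0 in the register
dump, which would match the faulting address in CR2. The small Python
script below checks that arithmetic. It assumes the Code: bytes start
at the faulting RIP, and the helper decode_mov_load_disp32 is a
hypothetical name that handles only this one instruction form. Which
kernel pointer was NULL cannot be read from the dump alone.

```python
import struct

# Values copied verbatim from the oops above.
CODE_BYTES = bytes.fromhex(
    "48 8b 85 08 01 00 00 4c 89 7b 18 48 89 df 4c 89 73 10 48 c7"
)
RBP = 0x0000000000000000
CR2 = 0x0000000000000108

# Register numbering for ModR/M fields when no REX.R/REX.B bit is set.
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_mov_load_disp32(code):
    """Decode only `REX.W 8B /r` with mod=10 (base register + disp32).

    Hypothetical helper for illustration; not a general disassembler.
    Returns (dest_reg, base_reg, displacement).
    """
    rex, opcode, modrm = code[0], code[1], code[2]
    if rex != 0x48 or opcode != 0x8B:
        raise ValueError("not a plain 64-bit MOV r64, r/m64")
    mod = modrm >> 6
    reg = (modrm >> 3) & 7
    rm = modrm & 7
    # rm == 4 would mean a SIB byte follows, which this sketch skips.
    if mod != 0b10 or rm == 0b100:
        raise ValueError("unsupported addressing mode")
    (disp,) = struct.unpack_from("<i", code, 3)  # little-endian signed 32-bit
    return REG64[reg], REG64[rm], disp


dest, base, disp = decode_mov_load_disp32(CODE_BYTES)
fault_addr = (RBP + disp) & 0xFFFFFFFFFFFFFFFF
print(f"mov {disp:#x}(%{base}),%{dest} -> accesses {fault_addr:#x}")
print("matches CR2:", fault_addr == CR2)
```

If the assumption about the Code: bytes holds, the load is a
dereference of a NULL structure pointer at member offset 0x108 inside
__dentry_open.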
I hit it so soon that I could not verify anything else.
Gabriel