critical kernel bug in encode_lookup

Gabriel Barazer gabriel at oxeva.fr
Wed Aug 15 13:24:29 EDT 2007


On 08/15/2007 2:25:51 PM +0200, Trond Myklebust 
<trond.myklebust at fys.uio.no> wrote:
> On Wed, 2007-08-15 at 12:19 +0200, Gabriel Barazer wrote:
>> On 08/15/2007 4:39:06 AM +0200, Trond Myklebust 
>> <trond.myklebust at fys.uio.no> wrote:
>>> On Wed, 2007-08-15 at 02:45 +0200, Gabriel Barazer wrote:
>>>> Hi there,
>>>>
>>>> I have experienced some bug under a quite heavy load (but only 
>>>> sometimes), related to the encode_lookup function, which cause the nfs4 
>>>> client to stop and block. Here is a copy of this bug :
>>>> RESERVE_SPACE(429) failed in function encode_lookup
>>> Have you tried 2.6.23-rc1 or later? I've fixed quite a few bugs in the
>>> NFSv4 space reservations.
>> I will try it ASAP. Do I have to test it on the client, the server or 
>> both ? Obviously, this a bit more difficult for me to test it on the 
>> server as 10+ clients are connected on it.
>>
>> Here another bug which happened quite randomly too on 
>> __nfs4_find_lock_state (attached are the full call traces, again, very 
>> similar but not the same). This time it happened in the middle of the 
>> night (about 10% of the daily load), very randomly too.
>> Could this be related ?
> 
> That too is likely to have changed (and is hopefully fixed) since I also
> tightened up the locking in that area.
> 

I installed and tested 2.6.23-rc3, but I haven't had time to check for 
the other bugs: I hit a new one almost immediately after the system 
rebooted (under some HTTP load). Here it is:
Unable to handle kernel NULL pointer dereference at 0000000000000108 RIP:
  [<ffffffff8026b93e>] __dentry_open+0x44/0x15f
PGD 58a03067 PUD 57227067 PMD 0
Oops: 0000 [1] SMP
CPU 0
Pid: 1557, comm: php-cgi Not tainted 2.6.23-rc3 #1
RIP: 0010:[<ffffffff8026b93e>]  [<ffffffff8026b93e>] __dentry_open+0x44/0x15f
RSP: 0018:ffff810058cdfaa8  EFLAGS: 00010246
RAX: 0000000000008001 RBX: ffff81004dc1bb80 RCX: ffff81004dc1bb80
RDX: 0000000000000000 RSI: ffff81007d417bc0 RDI: ffff81004c0772d8
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000012c5fdc7300
R10: ffff81004c0772d8 R11: 0000000000000246 R12: ffff810058cdfea8
R13: 0000000000000000 R14: ffff81007d417bc0 R15: ffff81004c0772d8
FS:  00002b1f1926e160(0000) GS:ffffffff80542000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000108 CR3: 0000000058aa5000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process php-cgi (pid: 1557, threadinfo ffff810058cde000, task ffff81005a417100)
Stack:  0000000000000246 ffff81004c0772d8 ffff810058cdfea8 ffff810058cdfea8
  ffff810058cdfb88 ffff810058cdfea8 ffff810058cdfb88 ffffffff8026ca36
  ffff81005f4d45f8 ffff8100792f8d00 ffff810076ec9480 ffffffff802c91c7
Call Trace:
  [<ffffffff8026ca36>] lookup_instantiate_filp+0x5a/0x80
  [<ffffffff802c91c7>] nfs4_intent_set_file+0x40/0x73
  [<ffffffff802ca071>] nfs4_atomic_open+0x128/0x13a
  [<ffffffff80443187>] put_rpccred+0x34/0xf4
  [<ffffffff802ca171>] nfs4_open_revalidate+0xee/0x142
  [<ffffffff802b6452>] nfs_atomic_lookup+0x10e/0x189
  [<ffffffff80273a52>] do_lookup+0xc4/0x1ae
  [<ffffffff8027586e>] __link_path_walk+0x849/0xcda
  [<ffffffff80275d57>] link_path_walk+0x58/0xe0
  [<ffffffff8026b85a>] get_unused_fd_flags+0x7c/0x115
  [<ffffffff80276107>] do_path_lookup+0x1a0/0x1c2
  [<ffffffff80276ad8>] __path_lookup_intent_open+0x56/0x96
  [<ffffffff80276c68>] open_namei+0x75/0x607
  [<ffffffff80228a99>] update_curr+0xf4/0x117
  [<ffffffff8026baff>] do_filp_open+0x1c/0x3d
  [<ffffffff8026b85a>] get_unused_fd_flags+0x7c/0x115
  [<ffffffff8026bb66>] do_sys_open+0x46/0xca
  [<ffffffff8020bf2e>] system_call+0x7e/0x83


Code: 48 8b 85 08 01 00 00 4c 89 7b 18 48 89 df 4c 89 73 10 48 c7
RIP  [<ffffffff8026b93e>] __dentry_open+0x44/0x15f
  RSP <ffff810058cdfaa8>
CR2: 0000000000000108

I hit it really soon so I couldn't verify anything else.
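
As a quick sanity check on the trace above, the first instruction in the 
Code: line can be decoded by hand. This is only a rough sketch in Python 
(nothing from the kernel tree), and it assumes the Code: bytes begin at 
the faulting RIP; the variable names are made up for illustration. It 
suggests the fault is a 64-bit load at offset 0x108 from RBP, which is 0 
in the register dump, and that matches CR2.

```python
import struct

# Values copied from the oops above.
code_bytes = bytes.fromhex("488b85080100004c897b184889df4c89731048c7")
cr2 = 0x108   # faulting address reported in CR2
rbp = 0x0     # RBP from the register dump

# Assumed layout of the first instruction (x86-64 encoding):
#   48          REX.W prefix (64-bit operand size)
#   8b          MOV r64, r/m64
#   85          ModRM byte
#   08 01 00 00 32-bit little-endian displacement
rex, opcode, modrm = code_bytes[0], code_bytes[1], code_bytes[2]
disp = struct.unpack_from("<i", code_bytes, 3)[0]

mod = modrm >> 6        # 2 -> register + disp32 addressing
reg = (modrm >> 3) & 7  # 0 -> rax (destination)
rm = modrm & 7          # 5 -> rbp (base register)

fault_addr = rbp + disp
print("mod=%d reg=%d rm=%d disp=%#x fault=%#x" % (mod, reg, rm, disp, fault_addr))
print("matches CR2:", fault_addr == cr2)
```

If that reading is right, __dentry_open loaded a field 0x108 bytes into 
a structure through a NULL pointer held in RBP. Which structure that is 
would still need to be confirmed against the 2.6.23-rc3 source.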

Gabriel

