Regression test results for linux-2.6.18-rc4-CITI_NFS4_ALL-1 (and before)

Trond Myklebust trond.myklebust at fys.uio.no
Tue Aug 22 20:46:48 EDT 2006


On Tue, 2006-08-22 at 17:30 -0700, Bryce Harrington wrote:
> Now for the issues:
> 
> 1.  Failure during holey file test  (Bugzilla #119)
> ==================================
> Starting with the linux-2.6.18-rc2-CITI_NFS4_ALL-1 patch, Connectathon
> 04 has been failing during the "holey file test", with the following
> output:
> 
>   test holey file support
>   read (hole) offset 8192, sz = 56667, bytes = 5141 (ret -1), holesz = 9012
>   read: Input/output error
>   special tests failed
>   Tests failed, leaving /mnt/nfs02 mounted
> 
> This is a known issue, reported back in June on the
> linux-2.6.17-CITI_NFS4_ALL-1 patchset.  It has something to do with the
> fix_short_read_drecovery.dif patch:
> 
>   http://linux-nfs.org/pipermail/nfsv4/2006-June/004587.html

I've just uploaded the fix for this issue to bugzilla, git and http.

> 4.  Iozone crashes  (Bugzilla #122)
> ==================
> We have been seeing a variety of crashes with Iozone in recent tests.
> In both linux-2.6.18-rc2-CITI_NFS4_ALL-1 and
> linux-2.6.18-rc4-CITI_NFS4_ALL-1, iozone is generating a kernel Oops
> followed by a sequence of BUGs, until the machine is power cycled.
> To see the output, load the following console log and search for the
> '19:25:51' timestamp.
> 
>   http://crucible.osdl.org/runs/1431/sysinfo/nfs03.console
> 
> It appears that there is a slab corruption, followed by a NULL pointer
> dereference BUG, and then an Oops.
> 
> nfs03 login: nfsd: last server has exited
> nfsd: unexporting all filesystems
> NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery
> directory
> NFSD: starting 90-second grace period
> Slab corruption: start=f72bec84, len=512
> Redzone: 0x5a2cf071/0x5a2cf071.
> Last user: [<c01bce01>](nfs_free_client+0xc8/0xe5)
> 0c0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 8c c0 a2 f7
> Prev obj: start=f72bea78, len=512
> Redzone: 0x5a2cf071/0x5a2cf071.
> Last user: [<c038f608>](skb_release_data+0x8b/0x93)
> 000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> 010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> Slab corruption: start=f72bec84, len=512
> Redzone: 0x5a2cf071/0x5a2cf071.
> Last user: [<c038f608>](skb_release_data+0x8b/0x93)
> 0c0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 5c 46 5b c0
> Prev obj: start=f72bea78, len=512
> Redzone: 0x5a2cf071/0x5a2cf071.
> Last user: [<c038f608>](skb_release_data+0x8b/0x93)
> 000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> 010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6bBUG: unable to handle kernel
> NULL pointer dereference at virtual address 00000000
> printing eip:
> c0124732
> *pde = 00000000
> Oops: 0002 [#1]
> PREEMPT SMP 
> Modules linked in:
> CPU:    0
> EIP:    0060:[<c0124732>]    Not tainted VLI
> EFLAGS: 00010086   (2.6.18-rc2-CITI_NFS4_ALL-1 #1) 
> EIP is at try_to_del_timer_sync+0x32/0x54
> eax: 00000000   ebx: f72bed50   ecx: c05b3e00   edx: c05b465c
> bp: f7ba6f54   esp: f7ba6f48
> ds: 007b   es: 007b   ss: 0068
> Process 192.168.254.2-r (pid: 7857, ti=f7ba6000 task=c1bee000
> task.ti=f7ba6000)
> Stack: 00000213 f72bed50 f72bec84 f7ba6f64 c0124761 f72bed50 00003a98
> f7ba6f80 
> c01dc3e3 f72bed50 f72bed1c 00000000 f72bec84 f7add3d8 f7ba6f98 c01db24e 
> f72bec84 f7add3d8 f72bec84 fffffffe f7ba6fc0 c01dc0a3 f72bec84 f7add3d8 
> Call Trace:
> [<c01037da>] show_stack_log_lvl+0x8a/0x92
> [<c010393b>] show_registers+0x11d/0x186
> [<c0103b27>] die+0x10c/0x1da
> [<c0113f35>] do_page_fault+0x3e0/0x4bc
> [<c01034ad>] error_code+0x39/0x40
> [<c0124761>] del_timer_sync+0xd/0x1b
> [<c01dc3e3>] nfs4_schedule_state_renewal+0x86/0xad
> [<c01db24e>] nfs4_init_client+0x3e/0x49
> [<c01dc0a3>] reclaimer+0xc1/0x1d6
> [<c012e4a6>] kthread+0x84/0xad
> [<c0100f19>] kernel_thread_helper+0x5/0xb
> Code: 53 8d 45 f4 51 8b 5d 08 50 53 e8 40 fe ff ff 89 c1 58 39 59 14 5a
> 74 22 8b 13 31 f6 85 d2 74 1a 8b 43 04 be 01 00 00 00 89 42 04 <89> 10
> c7 43 04 00 02 20 00 c7 03 00 00 00 00 8b 55 f4 89 c8 e8 
> EIP: [<c0124732>] try_to_del_timer_sync+0x32/0x54 SS:ESP 0068:f7ba6f48
> <6>note: 192.168.254.2-r[7857] exited with preemBUG: soft lockup
> detected on CPU#0!
> [<c010374e>] show_trace+0x16/0x18
> [<c010381c>] dump_stack+0x19/0x1b
> [<c013d195>] softlockup_tick+0x9a/0xae
> [<c0125758>] run_local_timers+0x12/0x14
> [<c0125592>] update_process_times+0x3e/0x65
> [<c0110277>] smp_apic_timer_interrupt+0x54/0x5f
> [<c01033fb>] apic_timer_interrupt+0x1f/0x24
> [<c03f8dd7>] _spin_lock_irq+0x8/0xa
> [<c01255fc>] run_timer_softirq+0x36/0x180
> [<c0121313>] __do_softirq+0x5d/0xc6
> [<c0104cc2>] do_softirq+0x5b/0xaa
> =======================
> [<c01213b5>] irq_exit+0x39/0x46
> [<c011027d>] smp_apic_timer_interrupt+0x5a/0x5f
> [<c01033fb>] apic_timer_interrupt+0x1f/0x24
> [<c011ed41>] exit_notify+0x22e/0x272
> [<c011f10c>] do_exit+0x387/0x404
> [<c0103bed>] die+0x1d2/0x1da
> [<c0113f35>] do_page_fault+0x3e0/0x4bc
> [<c01034ad>] error_code+0x39/0x40
> [<c0124761>] del_timer_sync+0xd/0x1b
> [<c01dc3e3>] nfs4_schedule_state_renewal+0x86/0xad
> [<c01db24e>] nfs4_init_client+0x3e/0x49
> [<c01dc0a3>] reclaimer+0xc1/0x1d6
> [<c012e4a6>] kthread+0x84/0xad
> [<c0100f19>] kernel_thread_helper+0x5/0xb
> 
> 
> Possibly this issue also leaves the server in a bad state, because there
> is a kernel panic when the client is rebooted.  This time it is a BUG at
> kernel/timer.c:397 due to 'invalid opcode: 0000 [#1]'.  This is very
> similar to one reported in July:
> 
>   http://linux-nfs.org/pipermail/nfsv4/2006-July/004702.html
> 
> kernel BUG at kernel/timer.c:397!
> invalid opcode: 0000 [#1]
> PREEMPT SMP 
> Modules linked in:
> CPU:    0
> EIP:    0060:[<c01247af>]    Not tainted VLI
> EFLAGS: 00010883   (2.6.18-rc2-CITI_NFS4_ALL-1 #1) 
> EIP is at cascade+0x40/0x65
> eax: f72d1a38   ebx: c04d9f18   ecx: c0565f94   edx: c05b3e00
> esi: c0565f94   edi: 00000035   ebp: c0565fa8   esp: c0565f94
> ds: 007b   es: 007b   ss: 0068
> Process swapper (pid: 0, ti=c0565000 task=c04938a0 task.ti=c052e000)
> Stack: f72d1a38 c04d4578 00000000 c05b3e00 c055d380 c0565fd8 c012562a
> c05b3e00 
> c05b461c 00000035 c0565fd8 c0394fd5 00007500 0000012b c0523b08 00000011 
> c055d380 c0565ff8 c0121313 c0523b08 00000000 0000000a c052ef6c c052e000 
> Call Trace:
> [<c01037da>] show_stack_log_lvl+0x8a/0x92
> [<c010393b> error_code+0x39/0x40
> [<c012562a>] run_timer_softirq+0x64/0x180
> [<c0121313>] __do_softirq+0x5d/0xc6
> [<c0104cc2>] do_softirq+0x5b/0xaa
> =======================
> [<c01213b5>] irq_exit+0x39/0x46
> [<c011027d>] smp_apic_timer_interrupt+0x5a/0x5f
> [<c01033fb>] apic_timer_interrupt+0x1f/0x24
> [<c0100c3f>] cpu_idle+0xae/0xdb
> [<c01002d2>] _stext+0x3a/0x3c
> [<c05337f4>] start_kernel+0x184/0x186
> [<c0100210>] 0xc0100210
> Code: 00 00 03 45 0c 8b 10 89 55 ec 89 4a 04 8b 50 04 89 0a 89 55 f0 89
> 00 89 40 04 8b 45 ec 39 c8 8b 18 74 23 8b 55 08 39 50 14 74 08 <0f> 0b
> 8d 01 78 4b 42 c0 50 ff 75 08 e8 d8 fc ff ff 59 58 89 d8 
> EIP: [<c01247af>] cascade+0x40/0x65 SS:ESP 0068:c0565f94
> <0>Kernel panic - not syncing: Fatal exception in interrupt

I'll have a look at these. Looks like a use-after-free issue with
nfs_free_client. Was this on NFSv4 only?
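
The slab dump quoted above is consistent with that reading. Freed slab
objects are filled with the poison byte 0x6b when slab debugging is on,
so any other value in the dump marks a write that happened after the
free. The sketch below is only an illustration: it assumes the
"offset: hex bytes" dump format seen in the log, and the helper name
non_poison_bytes is made up for this example.

```python
POISON_FREE = 0x6B  # byte the slab debug code writes over freed objects


def non_poison_bytes(dump_line):
    """Return [(offset, value), ...] for bytes in one dump line that are not 0x6b.

    Hypothetical helper; expects a line such as "0c0: 6b 6b ... f7".
    """
    offset_text, _, hex_text = dump_line.partition(":")
    base = int(offset_text.strip(), 16)
    hits = []
    for i, token in enumerate(hex_text.split()):
        value = int(token, 16)
        if value != POISON_FREE:
            hits.append((base + i, value))
    return hits


# Dump line taken from the first "Slab corruption" report in the log.
line = "0c0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 8c c0 a2 f7"
slab_start = 0xF72BEC84  # "start=" value from the same report

for offset, value in non_poison_bytes(line):
    print(f"object+0x{offset:03x} (addr 0x{slab_start + offset:08x}): 0x{value:02x}")
```

The first overwritten byte sits at offset 0xcc, and 0xf72bec84 + 0xcc is
0xf72bed50, the same value that appears in ebx and on the stack in the
try_to_del_timer_sync Oops. That suggests the timer embedded in the
freed object is still being touched.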

Cheers,
 Trond


