Regression test results for linux-2.6.18-rc4-CITI_NFS4_ALL-1 (and
before)
Bryce Harrington
bryce at osdl.org
Tue Aug 22 20:30:20 EDT 2006
On Mon, Aug 21, 2006 at 09:50:54PM -0400, J. Bruce Fields wrote:
> http://www.citi.umich.edu/projects/nfsv4/linux/kernel-patches/2.6.18-rc4-1/linux-2.6.18-rc4-CITI_NFS4_ALL-1.diff
>
> Changes since 2.6.18-rc2-CITI_NFS4_ALL-1
> - update to 2.6.18-rc4 and Trond's latest
> - Full server NFSv4 ACL support. Well, maybe: we're still mapping to
> backend POSIX draft ACLs, but using an algorithm that maps an
> arbitrary NFSv4 ACL to the closest POSIX ACL that is no more
> permissive than the given ACL.
> - More flexible handling of ACL inheritance flags, which allows
> accepting a few more inheritable ACLs.
> - Various ACL code cleanup
Hi,
Here is the report for this patch release (and the two prior releases):
linux-2.6.18-rc4-CITI_NFS4_ALL-1 http://crucible.osdl.org/runs/1589/
linux-2.6.18-rc2-CITI_NFS4_ALL-1 http://crucible.osdl.org/runs/1431/
linux-2.6.18-rc1-CITI_NFS4_ALL-1 http://crucible.osdl.org/runs/1096/
The latest test run shows a number of improvements over the previous two
runs, particularly in PyNFS and LTP. However, compared with the
linux-2.6.17-rc2-CITI_NFS4_ALL-1 patch, we've accumulated a number of
regressions. Most or all of these have already been reported, but I will
try to summarize what is currently being found.
First the good news:
* The issue that had been causing NewPyNFS to skip 151 tests has
gone away. This issue seemed to have been due to a MKFILE
st_open.testOpen dependency failure.
* LTP is now passing 18/34 tests, compared with 10/34 previously.
Now for the issues:
1. Failure during holey file test (Bugzilla #119)
==================================
Starting with the linux-2.6.18-rc2-CITI_NFS4_ALL-1 patch, Connectathon
04 has been failing during the "holey file test", with the following
output:
test holey file support
read (hole) offset 8192, sz = 56667, bytes = 5141 (ret -1), holesz = 9012
read: Input/output error
special tests failed
Tests failed, leaving /mnt/nfs02 mounted
This is a known issue, reported back in June on the
linux-2.6.17-CITI_NFS4_ALL-1 patchset. It has something to do with the
fix_short_read_drecovery.dif patch:
http://linux-nfs.org/pipermail/nfsv4/2006-June/004587.html
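For anyone wanting to reproduce this outside Connectathon, the kind of check
involved can be sketched in a few lines of Python. This is not the
Connectathon code; the function name check_hole and its defaults are
hypothetical, with the offset (8192) and hole size (9012) borrowed from the
output above. It writes data, seeks past a gap, writes more data, and then
reads the gap back expecting zeros. A failing read on an affected mount would
presumably surface as an OSError.

```python
import os
import tempfile


def check_hole(path, data_size=8192, hole_size=9012, chunk=4096):
    """Create a file with a hole and read the hole back.

    Returns the number of bytes read from the hole. Raises OSError if a
    read fails, or AssertionError if the hole is short or non-zero.
    """
    with open(path, "wb") as f:
        f.write(b"\xaa" * data_size)      # leading data region
        f.seek(hole_size, os.SEEK_CUR)    # skip ahead, leaving a hole
        f.write(b"\xbb" * data_size)      # trailing data region

    total = 0
    with open(path, "rb") as f:
        f.seek(data_size)                 # position at the start of the hole
        while total < hole_size:
            buf = f.read(min(chunk, hole_size - total))
            if not buf:
                raise AssertionError("unexpected EOF inside hole")
            assert buf == b"\x00" * len(buf), "hole contains non-zero bytes"
            total += len(buf)
    return total


if __name__ == "__main__":
    # Point this at a file on the NFS mount under test. A temporary
    # directory is used here only so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as d:
        print(check_hole(os.path.join(d, "holey")))
```

On a local filesystem this only confirms the sketch itself works; the
interesting case is a path under the NFSv4 mount.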
2. SETCLIENTID failures (Bugzilla #120)
========================
The following cases are failing due to issues related to SETCLIENTID:
CID3 st_setclientid.testLoseAnswer : FAILURE
SETCLIENTID case not covered in RFC should return
NFS4_OK, instead got NFS4ERR_INVAL
CID4 st_setclientid.testAllCases : FAILURE
OP_SETCLIENTID should return NFS4_OK, instead got
NFS4ERR_INVAL
LKT2b st_lockt.testBlock : FAILURE
operation OP_SETCLIENTID_CONFIRM should return
NFS4_OK, instead got NFS4ERR_CLID_INUSE
OPEN7c st_open.testChar : FAILURE
operation OP_SETCLIENTID_CONFIRM should return
NFS4_OK, instead got NFS4ERR_CLID_INUSE
OPEN11 st_open.testLongName : FAILURE
operation OP_SETCLIENTID_CONFIRM should return
NFS4_OK, instead got NFS4ERR_CLID_INUSE
All of these test cases were passing in runs against the
linux-2.6.17-rc2-CITI_NFS4_ALL-1 patchset. The first four were also
passing in the prior patchset, linux-2.6.17-rc1-CITI_NFS4_ALL-1, while
OPEN11 was failing there for a different reason (got NFS4ERR_ACCESS).
This was also reported in June, with a suggestion that it might be
resolvable with an nfs-utils patch from Neil, although the exact exit
path causing the error message is not obvious:
http://linux-nfs.org/pipermail/nfsv4/2006-June/004588.html
http://linux-nfs.org/pipermail/nfsv4/2006-July/004649.html
I also wonder if it may be related to Bugzilla #71, which deals with
stateids that are forgotten due to lease expiration:
http://bugzilla.linux-nfs.org/show_bug.cgi?id=71
In case it is a unique bug, I've entered a new ticket on it; if someone
knows for sure it is a duplicate of an earlier reported bug, please mark
it a dup.
3. ACL Attr failures (Bugzilla #121)
=====================
NewPyNFS's setattr tests check for several unsupported features, which
should return NFS4ERR_ATTRNOTSUPP. These are currently failing because
they are instead seeing NFS4ERR_NOTSUPP:
SATT11a st_setattr.testUnsupportedLink : FAILURE
SETATTR with unsupported attr acl should return
NFS4ERR_ATTRNOTSUPP, instead got NFS4ERR_NOTSUPP
SATT11b st_setattr.testUnsupportedBlock : FAILURE
SETATTR with unsupported attr acl should return
NFS4ERR_ATTRNOTSUPP, instead got NFS4ERR_NOTSUPP
SATT11c st_setattr.testUnsupportedChar : FAILURE
SETATTR with unsupported attr acl should return
NFS4ERR_ATTRNOTSUPP, instead got NFS4ERR_NOTSUPP
SATT11d st_setattr.testUnsupportedDir : FAILURE
SETATTR with unsupported attr acl should return
NFS4ERR_ATTRNOTSUPP, instead got NFS4ERR_NOTSUPP
SATT11f st_setattr.testUnsupportedFifo : FAILURE
SETATTR with unsupported attr acl should return
NFS4ERR_ATTRNOTSUPP, instead got NFS4ERR_NOTSUPP
SATT11r st_setattr.testUnsupportedFile : FAILURE
SETATTR with unsupported attr acl should return
NFS4ERR_ATTRNOTSUPP, instead got NFS4ERR_NOTSUPP
SATT11s st_setattr.testUnsupportedSocket : FAILURE
SETATTR with unsupported attr acl should return
NFS4ERR_ATTRNOTSUPP, instead got NFS4ERR_NOTSUPP
I'm assuming these have to do with the ACL implementation work done in
this patchset?
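To make listings like the ones above easier to compare between runs, the
failures can be tallied by (expected status, received status). The following
is only a rough sketch: it assumes the result format is exactly what is
quoted in this mail (a "CODE st_module.testName : RESULT" header followed by
wrapped description lines), and tally_failures is a hypothetical helper, not
part of PyNFS.

```python
import re
from collections import Counter

# Header line, e.g. "SATT11a st_setattr.testUnsupportedLink : FAILURE"
HEADER = re.compile(r"^(\S+)\s+(st_\w+\.\w+)\s*:\s*(\w+)\s*$")
# Status pair inside the (possibly wrapped) description text
STATUS = re.compile(r"should return\s+(NFS4\w+), instead got\s+(NFS4\w+)")

SAMPLE = """\
SATT11a st_setattr.testUnsupportedLink : FAILURE
SETATTR with unsupported attr acl should return
NFS4ERR_ATTRNOTSUPP, instead got NFS4ERR_NOTSUPP
SATT11b st_setattr.testUnsupportedBlock : FAILURE
SETATTR with unsupported attr acl should return
NFS4ERR_ATTRNOTSUPP, instead got NFS4ERR_NOTSUPP
LKT2b st_lockt.testBlock : FAILURE
operation OP_SETCLIENTID_CONFIRM should return
NFS4_OK, instead got NFS4ERR_CLID_INUSE
"""


def tally_failures(text):
    """Count FAILURE results by (expected status, received status)."""
    blocks = []  # each entry: [test code, result, joined description]
    for line in text.splitlines():
        line = line.strip()
        header = HEADER.match(line)
        if header:
            blocks.append([header.group(1), header.group(3), ""])
        elif blocks and line:
            blocks[-1][2] += " " + line  # rejoin a wrapped description
    counts = Counter()
    for _code, result, desc in blocks:
        pair = STATUS.search(desc)
        if result == "FAILURE" and pair:
            counts[pair.groups()] += 1
    return counts


if __name__ == "__main__":
    for (expected, got), n in tally_failures(SAMPLE).most_common():
        print(f"{n:3d}  expected {expected}, got {got}")
```

Grouping this way puts all of the SATT11 cases into a single bucket, which
makes it easier to see that they share one symptom.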
4. Iozone crashes (Bugzilla #122)
==================
We have been seeing a variety of crashes with Iozone in recent tests.
In both linux-2.6.18-rc2-CITI_NFS4_ALL-1 and
linux-2.6.18-rc4-CITI_NFS4_ALL-1, iozone is generating a kernel Oops
followed by a sequence of BUGs, until the machine is power cycled.
To see the output, load the following console log and search for the
'19:25:51' timestamp.
http://crucible.osdl.org/runs/1431/sysinfo/nfs03.console
It appears that there is a slab corruption, followed by a NULL pointer
dereference BUG, and then an Oops.
nfs03 login: nfsd: last server has exited
nfsd: unexporting all filesystems
NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery
directory
NFSD: starting 90-second grace period
Slab corruption: start=f72bec84, len=512
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c01bce01>](nfs_free_client+0xc8/0xe5)
0c0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 8c c0 a2 f7
Prev obj: start=f72bea78, len=512
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c038f608>](skb_release_data+0x8b/0x93)
000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
Slab corruption: start=f72bec84, len=512
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c038f608>](skb_release_data+0x8b/0x93)
0c0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 5c 46 5b c0
Prev obj: start=f72bea78, len=512
Redzone: 0x5a2cf071/0x5a2cf071.
Last user: [<c038f608>](skb_release_data+0x8b/0x93)
000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
BUG: unable to handle kernel NULL pointer dereference at virtual address
00000000
printing eip:
c0124732
*pde = 00000000
Oops: 0002 [#1]
PREEMPT SMP
Modules linked in:
CPU: 0
EIP: 0060:[<c0124732>] Not tainted VLI
EFLAGS: 00010086 (2.6.18-rc2-CITI_NFS4_ALL-1 #1)
EIP is at try_to_del_timer_sync+0x32/0x54
eax: 00000000 ebx: f72bed50 ecx: c05b3e00 edx: c05b465c
bp: f7ba6f54 esp: f7ba6f48
ds: 007b es: 007b ss: 0068
Process 192.168.254.2-r (pid: 7857, ti=f7ba6000 task=c1bee000
task.ti=f7ba6000)
Stack: 00000213 f72bed50 f72bec84 f7ba6f64 c0124761 f72bed50 00003a98
f7ba6f80
c01dc3e3 f72bed50 f72bed1c 00000000 f72bec84 f7add3d8 f7ba6f98 c01db24e
f72bec84 f7add3d8 f72bec84 fffffffe f7ba6fc0 c01dc0a3 f72bec84 f7add3d8
Call Trace:
[<c01037da>] show_stack_log_lvl+0x8a/0x92
[<c010393b>] show_registers+0x11d/0x186
[<c0103b27>] die+0x10c/0x1da
[<c0113f35>] do_page_fault+0x3e0/0x4bc
[<c01034ad>] error_code+0x39/0x40
[<c0124761>] del_timer_sync+0xd/0x1b
[<c01dc3e3>] nfs4_schedule_state_renewal+0x86/0xad
[<c01db24e>] nfs4_init_client+0x3e/0x49
[<c01dc0a3>] reclaimer+0xc1/0x1d6
[<c012e4a6>] kthread+0x84/0xad
[<c0100f19>] kernel_thread_helper+0x5/0xb
Code: 53 8d 45 f4 51 8b 5d 08 50 53 e8 40 fe ff ff 89 c1 58 39 59 14 5a
74 22 8b 13 31 f6 85 d2 74 1a 8b 43 04 be 01 00 00 00 89 42 04 <89> 10
c7 43 04 00 02 20 00 c7 03 00 00 00 00 8b 55 f4 89 c8 e8
EIP: [<c0124732>] try_to_del_timer_sync+0x32/0x54 SS:ESP 0068:f7ba6f48
<6>note: 192.168.254.2-r[7857] exited with preem
BUG: soft lockup detected on CPU#0!
[<c010374e>] show_trace+0x16/0x18
[<c010381c>] dump_stack+0x19/0x1b
[<c013d195>] softlockup_tick+0x9a/0xae
[<c0125758>] run_local_timers+0x12/0x14
[<c0125592>] update_process_times+0x3e/0x65
[<c0110277>] smp_apic_timer_interrupt+0x54/0x5f
[<c01033fb>] apic_timer_interrupt+0x1f/0x24
[<c03f8dd7>] _spin_lock_irq+0x8/0xa
[<c01255fc>] run_timer_softirq+0x36/0x180
[<c0121313>] __do_softirq+0x5d/0xc6
[<c0104cc2>] do_softirq+0x5b/0xaa
=======================
[<c01213b5>] irq_exit+0x39/0x46
[<c011027d>] smp_apic_timer_interrupt+0x5a/0x5f
[<c01033fb>] apic_timer_interrupt+0x1f/0x24
[<c011ed41>] exit_notify+0x22e/0x272
[<c011f10c>] do_exit+0x387/0x404
[<c0103bed>] die+0x1d2/0x1da
[<c0113f35>] do_page_fault+0x3e0/0x4bc
[<c01034ad>] error_code+0x39/0x40
[<c0124761>] del_timer_sync+0xd/0x1b
[<c01dc3e3>] nfs4_schedule_state_renewal+0x86/0xad
[<c01db24e>] nfs4_init_client+0x3e/0x49
[<c01dc0a3>] reclaimer+0xc1/0x1d6
[<c012e4a6>] kthread+0x84/0xad
[<c0100f19>] kernel_thread_helper+0x5/0xb
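The order of events in a console log like the one above can be pulled out
with a short script. This is a minimal sketch: the marker patterns are only
the strings that appear in this log, and scan_console is a hypothetical
helper. Patterns are searched anywhere in the line because console output
is sometimes interleaved.

```python
import re

# (label, pattern) pairs for the kernel messages seen in this report
MARKERS = [
    ("slab corruption", re.compile(r"Slab corruption:")),
    ("BUG", re.compile(r"kernel BUG at|BUG: unable to handle")),
    ("oops", re.compile(r"Oops: [0-9a-f]{4}")),
    ("soft lockup", re.compile(r"BUG: soft lockup")),
    ("panic", re.compile(r"Kernel panic")),
]

SAMPLE = """\
NFSD: starting 90-second grace period
Slab corruption: start=f72bec84, len=512
010: 6b 6b 6b 6b 6bBUG: unable to handle kernel
Oops: 0002 [#1]
<6>note: 192.168.254.2-r[7857] exited with preemBUG: soft lockup
""".splitlines()


def scan_console(lines):
    """Return (line number, label) for each marker hit, in log order."""
    events = []
    for lineno, line in enumerate(lines, start=1):
        for label, pattern in MARKERS:
            if pattern.search(line):
                events.append((lineno, label))
    return events


if __name__ == "__main__":
    for lineno, label in scan_console(SAMPLE):
        print(f"line {lineno}: {label}")
```

Run against a full console log (for example a saved copy of nfs03.console),
this gives a quick timeline of which failure appeared first.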
This issue possibly also leaves the server in a bad state, because there
is a kernel panic when the client is rebooted. This time it is a BUG at
kernel/timer.c:397, reported as 'invalid opcode: 0000 [#1]'. This is very
similar to one reported in July:
http://linux-nfs.org/pipermail/nfsv4/2006-July/004702.html
kernel BUG at kernel/timer.c:397!
invalid opcode: 0000 [#1]
PREEMPT SMP
Modules linked in:
CPU: 0
EIP: 0060:[<c01247af>] Not tainted VLI
EFLAGS: 00010883 (2.6.18-rc2-CITI_NFS4_ALL-1 #1)
EIP is at cascade+0x40/0x65
eax: f72d1a38 ebx: c04d9f18 ecx: c0565f94 edx: c05b3e00
esi: c0565f94 edi: 00000035 ebp: c0565fa8 esp: c0565f94
ds: 007b es: 007b ss: 0068
Process swapper (pid: 0, ti=c0565000 task=c04938a0 task.ti=c052e000)
Stack: f72d1a38 c04d4578 00000000 c05b3e00 c055d380 c0565fd8 c012562a
c05b3e00
c05b461c 00000035 c0565fd8 c0394fd5 00007500 0000012b c0523b08 00000011
c055d380 c0565ff8 c0121313 c0523b08 00000000 0000000a c052ef6c c052e000
Call Trace:
[<c01037da>] show_stack_log_lvl+0x8a/0x92
[<c010393b> error_code+0x39/0x40
[<c012562a>] run_timer_softirq+0x64/0x180
[<c0121313>] __do_softirq+0x5d/0xc6
[<c0104cc2>] do_softirq+0x5b/0xaa
=======================
[<c01213b5>] irq_exit+0x39/0x46
[<c011027d>] smp_apic_timer_interrupt+0x5a/0x5f
[<c01033fb>] apic_timer_interrupt+0x1f/0x24
[<c0100c3f>] cpu_idle+0xae/0xdb
[<c01002d2>] _stext+0x3a/0x3c
[<c05337f4>] start_kernel+0x184/0x186
[<c0100210>] 0xc0100210
Code: 00 00 03 45 0c 8b 10 89 55 ec 89 4a 04 8b 50 04 89 0a 89 55 f0 89
00 89 40 04 8b 45 ec 39 c8 8b 18 74 23 8b 55 08 39 50 14 74 08 <0f> 0b
8d 01 78 4b 42 c0 50 ff 75 08 e8 d8 fc ff ff 59 58 89 d8
EIP: [<c01247af>] cascade+0x40/0x65 SS:ESP 0068:c0565f94
<0>Kernel panic - not syncing: Fatal exception in interrupt
Bryce