Heavy Bug Encountered on NFSv4 Server
Thomas Hoppe
raphead at gmx.net
Wed May 30 19:26:41 EDT 2007
Hi Bruce,
I have switched from release R6 to release R8 of the 2.6.20 Gentoo kernel,
and so far I cannot reproduce the bug. That doesn't mean much, though:
before, the server also ran for about a week without this problem, and
then it occurred every 3 days.
What about the compiler? I have compiled my whole
system with -O3.
I will keep you updated.
Thomas
J. Bruce Fields wrote:
> On Fri, May 25, 2007 at 11:30:23PM +0200, Thomas Hoppe wrote:
>
>> From time to time I get the following error logged:
>> After an outage of the NFSD, I have to reboot the machine.
>>
>> May 24 13:40:42 dbserver1 ------------[ cut here ]------------
>> May 24 13:40:42 dbserver1 kernel BUG at fs/locks.c:172!
>>
>
> Hm, so in 2.6.20, this is the line
>
> BUG_ON(!list_empty(&fl->fl_block));
>
> from fs/locks.c:locks_free_lock().
>
>
>> May 24 13:40:42 dbserver1 [<c01719f1>] __break_lease+0x81/0x320
>>
>
> But locks_free_lock() is called from __break_lease() only to destroy
> new_fl. When allocated, list_empty(&new_fl->fl_block) should be true.
> The only time new_fl is touched in __break_lease() is in
> locks_block_on_timeout(), but that's never called in this case since:
>
>
>> May 24 13:40:42 dbserver1 [<c01abbd0>] reiserfs_find_actor+0x0/0x20
>> May 24 13:40:42 dbserver1 [<c01abbd0>] reiserfs_find_actor+0x0/0x20
>> May 24 13:40:42 dbserver1 [<c0174e70>] ifind+0x50/0xa0
>> May 24 13:40:42 dbserver1 [<c0222d25>] nfsd_open+0x95/0x180
>> May 24 13:40:42 dbserver1 [<c02382c4>] nfsd4_process_open2+0x8a4/0x970
>>
>
> we seem to have been called from nfsd_open(), which always calls
> break_lease with the O_NONBLOCK flag set.
>
> So the only way I can see this happening is if somebody else is
> modifying this file_lock. So probably somebody's using it after it's
> been freed.
>
> The one thing that looks suspicious to me here is the delegation code--I
> think a delegation can outlast the lease it points to. Which is
> obviously a problem, but I don't see how that can explain this
> particular bug.
>
> Anyway, so I have two things to try:
> - Fix that delegation bug, and see if that helps.
> - Try fooling with some delegation tests to see if we can
> trigger the problem here.
>
> It would be helpful to know whether you can reproduce this on more
> recent kernels, or whether you can figure out a way to reliably produce
> this bug.
>
> Also, you might try running:
>
> echo 0 >/proc/sys/fs/leases-enable
>
> before you ever start the nfs server, and see if that eliminates the
> bug. If so, that would help pin the blame on the delegation code.
>
> --b.
>
>
>
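For readers following the reasoning quoted above, here is a minimal userspace sketch of the invariant behind the BUG_ON. It is not kernel code: struct toy_lock and toy_lock_can_free() are hypothetical names made up for illustration, and the list helpers are simplified re-implementations modeled on the kernel's list_head primitives. The sketch only shows that a freshly initialized fl_block is empty, and that it becomes non-empty once something links it onto another lock's block list, which is the state the check in locks_free_lock() refuses to free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model of a circular doubly linked list node. */
struct list_head {
    struct list_head *next, *prev;
};

/* An initialized, unlinked node points at itself. */
static void init_list_head(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/* True when the node is not linked to any other node. */
static bool list_empty(const struct list_head *h)
{
    return h->next == h;
}

/* Link node in just before head, i.e. at the tail of head's list. */
static void list_add_tail(struct list_head *node, struct list_head *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Unlink node and reinitialize it so that list_empty(node) holds again. */
static void list_del_init(struct list_head *node)
{
    node->prev->next = node->next;
    node->next->prev = node->prev;
    init_list_head(node);
}

/* Hypothetical stand-in for struct file_lock; only fl_block is modeled. */
struct toy_lock {
    struct list_head fl_block;
};

/*
 * Mirrors the condition of the quoted BUG_ON: returns true when freeing
 * would be safe, false in the state where the kernel check would fire.
 */
static bool toy_lock_can_free(const struct toy_lock *fl)
{
    return list_empty(&fl->fl_block);
}
```

Calling init_list_head() on two toy_lock objects, linking one onto the other with list_add_tail(), and querying toy_lock_can_free() before and after list_del_init() walks through the three states discussed in the quote: freshly allocated, blocked on another lock, and unlinked again.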
--
Thomas Hoppe
Richard-Wagner-Str. 35
73207 Plochingen
Phone: +49.(0).7153.615928
Mobile: +49.(0).176.27233907