Heavy Bug Encountered on NFSv4 Server
J. Bruce Fields
bfields at fieldses.org
Wed May 30 18:57:49 EDT 2007
On Fri, May 25, 2007 at 11:30:23PM +0200, Thomas Hoppe wrote:
> From time to time I get the following error logged:
> After an outage of the NFSD, I have to reboot the machine.
>
> May 24 13:40:42 dbserver1 ------------[ cut here ]------------
> May 24 13:40:42 dbserver1 kernel BUG at fs/locks.c:172!
Hm, so in 2.6.20, this is the line
	BUG_ON(!list_empty(&fl->fl_block));
from fs/locks.c:locks_free_lock().
> May 24 13:40:42 dbserver1 [<c01719f1>] __break_lease+0x81/0x320
But locks_free_lock() is called from __break_lease() only to destroy
new_fl. When allocated, list_empty(&new_fl->fl_block) should be true.
The only time new_fl is touched in __break_lease() is in
locks_block_on_timeout(), but that's never called in this case since:
> May 24 13:40:42 dbserver1 [<c01abbd0>] reiserfs_find_actor+0x0/0x20
> May 24 13:40:42 dbserver1 [<c01abbd0>] reiserfs_find_actor+0x0/0x20
> May 24 13:40:42 dbserver1 [<c0174e70>] ifind+0x50/0xa0
> May 24 13:40:42 dbserver1 [<c0222d25>] nfsd_open+0x95/0x180
> May 24 13:40:42 dbserver1 [<c02382c4>] nfsd4_process_open2+0x8a4/0x970
we seem to have been called from nfsd_open(), which always calls
break_lease with the O_NONBLOCK flag set.
So the only way I can see this happening is if somebody else is
modifying this file_lock. So probably somebody's using it after it's
been freed.
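To make that reasoning concrete, here is a rough userspace sketch, not the kernel code. The list helpers are simplified stand-ins for the kernel's, the struct is cut down to the one field the BUG_ON looks at, and alloc_lock(), free_check_passes() and break_lease_sketch() are names made up for illustration. It assumes the nonblocking path returns -EWOULDBLOCK without ever linking new_fl onto anything, which is how I read __break_lease().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Simplified userspace stand-ins for the kernel's list primitives. */
struct list_head { struct list_head *next, *prev; };

static void init_list_head(struct list_head *h) { h->next = h; h->prev = h; }

static bool list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Hypothetical cut-down file_lock: only the field the BUG_ON inspects. */
struct file_lock { struct list_head fl_block; };

/* Allocate a lock with an empty (self-pointing) fl_block list. */
static struct file_lock *alloc_lock(void)
{
	struct file_lock *fl = malloc(sizeof(*fl));
	if (fl)
		init_list_head(&fl->fl_block);
	return fl;
}

/* True where the free-time check would pass; false where the kernel would BUG. */
static bool free_check_passes(const struct file_lock *fl)
{
	return list_empty(&fl->fl_block);
}

/* Model only: new_fl is linked onto the blocker solely on the blocking path. */
static int break_lease_sketch(struct file_lock *blocker,
			      struct file_lock *new_fl, bool nonblock)
{
	if (nonblock)
		return -EWOULDBLOCK;	/* new_fl is never touched */
	list_add_tail(&new_fl->fl_block, &blocker->fl_block);
	return 0;
}
```

In this model, calling break_lease_sketch() with nonblock set leaves free_check_passes(new_fl) true, and only the blocking call makes it false. Since the trace shows the nonblocking path, whatever made the list non-empty has to come from outside this function.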
The one thing that looks suspicious to me here is the delegation code: I
think a delegation can outlast the lease it points to. That is
obviously a problem, but I don't see how it can explain this
particular bug.
Anyway, so I have two things to try:
	- Fix that delegation bug, and see if that helps.
	- Try fooling with some delegation tests to see if we can
	  trigger the problem here.
It would be helpful to know whether you can reproduce this on more
recent kernels, or whether you can figure out a way to reliably trigger
this bug.
Also, you might try running:
	echo 0 >/proc/sys/fs/leases-enable
before you ever start the nfs server, and see if that eliminates the
bug. If so, that would help pin the blame on the delegation code.
--b.