[PATCH] fix kernel BUG at fs/nfs/namespace.c:108! - can be triggered by bad server for NFS v4 (was Re: File open issue on 2.6.23 with NFS v4)
Frank Filz
ffilzlnx at us.ibm.com
Thu Jan 31 14:22:24 EST 2008
On Mon, 2008-01-28 at 16:19 -0800, Frank Filz wrote:
> Ok, I've found a new way to hit kernel BUG at fs/nfs/namespace.c:108:
>
> server=2.6.23
> mount --bind /home /export/home
> /etc/exports:
> #simple
> /export/home gss/krb5(nohide,insecure,no_subtree_check,no_root_squash,async,rw)
> /export gss/krb5(crossmnt,fsid=0,insecure,no_subtree_check,no_root_squash,async,rw)
>
> client=2.6.24-rc5 with the previous patches
> on client:
> mount -t nfs4 -osec=krb5 server:/home /mnt/home
> ls -l /mnt/home
>
> Looking at a tcpdump trace, it appears there is no network traffic for
> the ls and it crashes immediately.
>
> Any thoughts?

One thing I discovered: even issuing umount immediately will trigger the
BUG. The root dentry for the mount is basically useless.

It turns out this was caused by a bad bug in nfs-utils, in
utils/mountd/cache.c (I am working on some changes in that file). It
caused user space to write a bad export into the kernel. Even so, the
client should not crash in this case.

Perhaps that BUG_ON at fs/nfs/namespace.c:108 should be changed. I tried
returning an error instead of calling BUG, but that didn't seem to work.

While trying to debug the client, I did try this:
--- ./fs/nfs/getroot.c.orig	2008-01-30 16:57:25.000000000 -0800
+++ ./fs/nfs/getroot.c	2008-01-30 12:18:28.000000000 -0800
@@ -270,6 +270,13 @@ struct dentry *nfs4_get_root(struct supe
 		return ERR_PTR(error);
 	}
 
+	//FSFTEMP try this out
+	if (!nfs_fsid_equal(&server->fsid, &fattr.fsid)) {
+		printk(KERN_WARNING "FSFTEMP trying to fix fsid=%lld:%lld to fsid=%lld:%lld\n",
+		       server->fsid.major, server->fsid.minor,
+		       fattr.fsid.major, fattr.fsid.minor);
+		memcpy(&server->fsid, &fattr.fsid, sizeof(server->fsid));
+	}
 	inode = nfs_fhget(sb, mntfh, &fattr);
 	if (IS_ERR(inode)) {
 		dprintk("nfs_get_root: get root inode failed\n");

This at least keeps the client from hitting the BUG_ON. With the
messed-up server and this change, the ls shown above lists the contents
of server:/ (which is on the file system whose fsid server->fsid ends up
being changed to).

While debugging, I took a network trace. The client does a LOOKUP of
home followed by a GETATTR, which reports fsid 8:3, the correct value
for server:/home. Later, the client does another GETATTR (just before
the patched code in nfs4_get_root()), which reports fsid 8:1.
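
To make the mismatch concrete, here is a minimal userspace sketch in C of
the check the debug hunk performs. It is not kernel code: struct fsid,
fsid_equal(), fsid_fixup() and replay_trace() are hypothetical names, and
the comparison is only assumed to match nfs_fsid_equal() (major and minor
both equal). Treating 8:3 as the cached server->fsid and 8:1 as the value
from the later GETATTR is an inference from the trace description, not
something the trace states outright.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Userspace stand-in for the kernel's fsid (a major/minor pair). */
struct fsid {
	uint64_t major;
	uint64_t minor;
};

/* Assumed to mirror nfs_fsid_equal(): both halves must match. */
static bool fsid_equal(const struct fsid *a, const struct fsid *b)
{
	return a->major == b->major && a->minor == b->minor;
}

/*
 * Hypothetical helper modelled on the debug hunk: if the cached fsid
 * differs from the one just reported, log the mismatch and adopt the
 * reported value. Returns true when the cached value was overwritten.
 */
static bool fsid_fixup(struct fsid *cached, const struct fsid *reported)
{
	if (fsid_equal(cached, reported))
		return false;

	fprintf(stderr,
		"fsid mismatch: cached %llu:%llu, server reports %llu:%llu\n",
		(unsigned long long)cached->major,
		(unsigned long long)cached->minor,
		(unsigned long long)reported->major,
		(unsigned long long)reported->minor);
	*cached = *reported;
	return true;
}

/* Replays the values from the trace; returns 0 if a fixup happened. */
static int replay_trace(void)
{
	struct fsid cached = { 8, 3 };         /* assumed: LOOKUP home + GETATTR */
	const struct fsid reported = { 8, 1 }; /* assumed: the later GETATTR */

	bool changed = fsid_fixup(&cached, &reported);
	assert(fsid_equal(&cached, &reported));
	return changed ? 0 : 1;
}
```

Calling replay_trace() from a small main() shows the cached value being
silently replaced by 8:1. That would explain why the ls ends up listing
server:/ rather than /home, and why overwriting the fsid hides the
inconsistency instead of resolving it.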

I don't think this is a real fix. Fortunately, with a correct nfs-utils
on the server the client doesn't hit this, but it probably bears some
investigation to find a proper fix.

Frank Filz