[pnfs] [PATCH 2/2] NFS: Fix a pNFS deadlock due to violations of RPC asynchronous i/o rules

Trond Myklebust Trond.Myklebust at netapp.com
Tue Jul 15 09:35:49 EDT 2008


On Tue, 2008-07-15 at 09:41 +0300, Benny Halevy wrote:
> > +	/* FIXME: This lock is just so wrong... */
> > +	spin_lock(&nfsi->lo_lock);
> 
> What happens here is that the lo_lock was dropped in
> pnfs_update_layout before calling get_layout that's
> doing kmalloc and an rpc hence it must not hold the spin
> lock.  We reacquire it here for pnfs_inject_layout that
> inserts the retrieved layout onto the file's layout.

...and hence you end up holding it across an allocation inside
pnfs_inject_layout(). It would be far better to move the lock to where
it is needed inside pnfs_inject_layout() _after_ the allocation has been
made.

> > +
> > +	/* FIXME: WTF? */
> > +	BUG_ON(nfsi->current_layout != lo);
> 
> While dropping the spin lock we do keep a reference count
> on the layout structure to keep it around and valid and the
> BUG_ON is here to check we're sane since current_layout is
> not supposed to change while referenced (in the present
> implementation).  That said, I've no objection to remove
> the BUG_ON since I believe this code should work even
> this assumption is changed and we might be referencing
> a layout structure detached from the nfs_inode.

So who is holding the reference? Wouldn't it make sense to hold a
reference in the 'struct nfs4_pnfs_layoutget' itself, so that you don't
have to worry about sanity checks like the above?

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust at netapp.com
www.netapp.com


More information about the pNFS mailing list