[pnfs] [PATCH 2/2] NFS: Fix a pNFS deadlock due to violations of RPC asynchronous i/o rules
Trond Myklebust
Trond.Myklebust at netapp.com
Tue Jul 15 09:35:49 EDT 2008
On Tue, 2008-07-15 at 09:41 +0300, Benny Halevy wrote:
> > + /* FIXME: This lock is just so wrong... */
> > + spin_lock(&nfsi->lo_lock);
>
> What happens here is that the lo_lock was dropped in
> pnfs_update_layout before calling get_layout that's
> doing kmalloc and an rpc hence it must not hold the spin
> lock. We reacquire it here for pnfs_inject_layout that
> inserts the retrieved layout onto the file's layout.
...and hence you end up holding it across an allocation inside
pnfs_inject_layout(). It would be far better to move the lock to where
it is needed inside pnfs_inject_layout() _after_ the allocation has been
made.
> > +
> > + /* FIXME: WTF? */
> > + BUG_ON(nfsi->current_layout != lo);
>
> While dropping the spin lock we do keep a reference count
> on the layout structure to keep it around and valid and the
> BUG_ON is here to check we're sane since current_layout is
> not supposed to change while referenced (in the present
> implementation). That said, I've no objection to remove
> the BUG_ON since I believe this code should work even
> this assumption is changed and we might be referencing
> a layout structure detached from the nfs_inode.
So who is holding the reference? Wouldn't it make sense to hold a
reference in the 'struct nfs4_pnfs_layoutget' itself, so that you don't
have to worry about sanity checks like the above?
--
Trond Myklebust
Linux NFS client maintainer
NetApp
Trond.Myklebust at netapp.com
www.netapp.com
More information about the pNFS
mailing list